It is possible to convert between any two of the following encodings:
Alias for UTF-8.
Big-endian.
Little-endian.
Switches between BE and LE on FFFE/FEFF byte order marks, which can appear anywhere in the stream. The default is big-endian.
Analogous to UTF-16.
8-bit encoding as specified in Annex D.7 of the PDF spec. The codepoints 0x7f, 0x9f and 0xad are left undefined.
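The byte-order switching described for UTF-16 above can be illustrated with a small, self-contained sketch. The helper name utf16_units_with_bom is hypothetical and not part of pdfout. It is also an assumption that the byte order marks are dropped from the output, and pdfout's actual handling may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of pdfout: split a UTF-16 byte stream into
   16-bit code units, honoring byte order marks anywhere in the stream.
   Decoding starts big-endian. The marks themselves are dropped (assumption).
   Returns the number of units written to out, or -1 if srclen is odd or
   out is too small.  */
static int
utf16_units_with_bom (const unsigned char *src, size_t srclen,
                      uint16_t *out, size_t outlen)
{
  assert (src != NULL && out != NULL);
  if (srclen % 2 != 0)
    return -1;

  int big_endian = 1;           /* default byte order */
  size_t n = 0;
  for (size_t i = 0; i < srclen; i += 2)
    {
      unsigned char a = src[i], b = src[i + 1];
      if (a == 0xFE && b == 0xFF)
        {
          big_endian = 1;       /* bytes FE FF: big-endian from here on */
          continue;
        }
      if (a == 0xFF && b == 0xFE)
        {
          big_endian = 0;       /* bytes FF FE: little-endian from here on */
          continue;
        }
      if (n == outlen)
        return -1;
      out[n++] = big_endian ? (uint16_t) ((a << 8) | b)
                            : (uint16_t) ((b << 8) | a);
    }
  return (int) n;
}
```

The byte pair FE FF always selects big-endian and FF FE always selects little-endian, whatever the current state, because U+FFFE is a noncharacter and can only be a byte-swapped mark.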
void pdfout_char_conv_buffer (fz_context *ctx, const char *fromcode,
const char *tocode, const char *src, int srclen,
fz_buffer *buf);
This is the most generic function; it is used internally by all functions below.
Convert the string src of length srclen from encoding fromcode to encoding tocode. The resulting bytes are appended to buf.
Throw FZ_ERROR_ABORT if a codepoint is not valid in the target encoding. Throw FZ_ERROR_GENERIC on all other errors.
char *pdfout_char_conv (fz_context *ctx, const char *fromcode, const char *tocode,
const char *src, int srclen, int *lengthp);
Like pdfout_char_conv_buffer, but returns its result as a char pointer. The length of the result is stored in *lengthp. The resulting string is always null-terminated.
pdf_obj *pdfout_utf8_to_str_obj (fz_context *ctx, pdf_document *doc,
const char *inbuf, int inbuf_len);
Convert a UTF-8 string to a PDF string object.
char *pdfout_str_obj_to_utf8 (fz_context *ctx, pdf_obj *string, int *len);
Convert a PDF string object to a UTF-8 string.
char *pdfout_pdf_to_utf8 (fz_context *ctx, const char *inbuf, int inbuf_len,
int *outbuf_len);
Convert PDF string (either PDFDOC or UTF-16BE) to null-terminated UTF-8.
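The two input encodings can be told apart by the byte order mark: in the PDF specification, a text string encoded in UTF-16BE begins with the bytes FE FF. The following is a minimal sketch of such a check. The function name is hypothetical, and whether pdfout decides in exactly this way is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of pdfout: return 1 if the PDF string
   starts with the UTF-16BE byte order mark FE FF, 0 otherwise.  A string
   without the mark would then be treated as PDFDOC (assumption).  */
static int
pdf_string_has_utf16be_bom (const char *inbuf, int inbuf_len)
{
  assert (inbuf != NULL || inbuf_len == 0);
  return inbuf_len >= 2
         && (unsigned char) inbuf[0] == 0xFE
         && (unsigned char) inbuf[1] == 0xFF;
}
```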
char *pdfout_utf8_to_pdf (fz_context *ctx, const char *inbuf, int inbuf_len,
int *outbuf_len);
Convert a UTF-8 string to a PDF string. Use PDFDOCENCODING if possible; otherwise fall back to UTF-16.
char *pdfout_check_utf8 (const char *s, size_t n);
Check whether the string s of length n is valid UTF-8. If it is not, return a pointer to the first invalid unit. Return NULL if the string is valid.
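To make the return convention concrete, here is a self-contained sketch of a validator with the same contract. The name check_utf8_sketch is hypothetical and this is not pdfout's implementation. It returns a pointer to the start of the first invalid sequence and rejects overlong forms, surrogates, truncated sequences and codepoints above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for illustration, not pdfout's implementation.
   Returns NULL if s[0..n) is valid UTF-8, otherwise a pointer to the
   start of the first invalid sequence.  */
static const char *
check_utf8_sketch (const char *s, size_t n)
{
  assert (s != NULL || n == 0);
  const unsigned char *p = (const unsigned char *) s;
  size_t i = 0;

  while (i < n)
    {
      unsigned char c = p[i];
      size_t len;
      unsigned long cp, min;

      if (c < 0x80)
        {
          i++;                  /* ASCII */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { len = 2; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { len = 3; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { len = 4; cp = c & 0x07; min = 0x10000; }
      else
        return s + i;           /* stray continuation or invalid lead byte */

      if (n - i < len)
        return s + i;           /* truncated sequence */

      for (size_t k = 1; k < len; k++)
        {
          if ((p[i + k] & 0xC0) != 0x80)
            return s + i;       /* missing continuation byte */
          cp = (cp << 6) | (p[i + k] & 0x3F);
        }

      /* Reject overlong forms, surrogates and out-of-range codepoints.  */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return s + i;

      i += len;
    }
  return NULL;
}
```

A caller can compute the offset of the error as the returned pointer minus s.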
int pdfout_uctomb (fz_context *ctx, uint8_t *buf, ucs4_t uc, int n);
Convert the Unicode codepoint uc to UTF-8. Store the result in buf, which has length n. Throw if uc is an invalid codepoint or if n is too small. Always use n = 4.
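The advice to use n = 4 follows from the fact that no codepoint needs more than four bytes in UTF-8. The sketch below shows the encoding step in isolation. The name uctomb_sketch is hypothetical, and it returns -1 in the cases where the documented function throws. That it returns the number of bytes written on success is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch, not pdfout's implementation: encode uc as UTF-8
   into buf of length n.  Returns the number of bytes written (1 to 4),
   or -1 if uc is a surrogate, is above U+10FFFF, or does not fit.  */
static int
uctomb_sketch (uint8_t *buf, uint32_t uc, int n)
{
  assert (buf != NULL);
  if (uc > 0x10FFFF || (uc >= 0xD800 && uc <= 0xDFFF))
    return -1;

  int len = uc < 0x80 ? 1 : uc < 0x800 ? 2 : uc < 0x10000 ? 3 : 4;
  if (n < len)
    return -1;

  switch (len)
    {
    case 1:
      buf[0] = (uint8_t) uc;
      break;
    case 2:
      buf[0] = (uint8_t) (0xC0 | (uc >> 6));
      buf[1] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    case 3:
      buf[0] = (uint8_t) (0xE0 | (uc >> 12));
      buf[1] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      buf[2] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    default:
      buf[0] = (uint8_t) (0xF0 | (uc >> 18));
      buf[1] = (uint8_t) (0x80 | ((uc >> 12) & 0x3F));
      buf[2] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      buf[3] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    }
  return len;
}
```

With a four-byte buffer the size check can never fail, so the only remaining error is an invalid codepoint.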