Character Set Conversion

It is possible to convert between any two of the following encodings:

ASCII
UTF-8
C

Alias for UTF-8.

UTF-16BE

Big-endian.

UTF-16LE

Little-endian.

UTF-16

Switches between big-endian and little-endian whenever an FFFE or FEFF byte order mark is encountered; such marks may appear anywhere in the stream. The default is big-endian.

UTF-32BE
UTF-32LE
UTF-32

Analogous to UTF-16.

PDFDOCENCODING
PDFDOC

8-bit encoding as specified in Annex D.7 of the PDF specification. The code points 0x7f, 0x9f and 0xad are left undefined.
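
The byte order switching described for UTF-16 can be illustrated with a short Python sketch. It is not this library's implementation: the function name decode_utf16_switching is hypothetical, and dropping the byte order marks from the output is an assumption, since the text above only says that they switch the byte order.

```python
def decode_utf16_switching(data: bytes) -> str:
    """Decode UTF-16 bytes, re-reading the byte order at every BOM.

    Illustrative sketch only. Assumption: byte order marks are consumed
    and do not appear in the decoded text.
    """
    if len(data) % 2:
        raise ValueError("UTF-16 input must have an even number of bytes")

    order = "big"  # default when no byte order mark has been seen yet
    units = []
    for i in range(0, len(data), 2):
        pair = data[i:i + 2]
        if pair == b"\xfe\xff":
            # Bytes FE FF: the stream continues in big-endian order.
            order = "big"
            continue
        if pair == b"\xff\xfe":
            # Bytes FF FE: the stream continues in little-endian order.
            order = "little"
            continue
        units.append(int.from_bytes(pair, order))

    # Re-serialise the code units in one fixed order so that the standard
    # codec can combine surrogate pairs.
    normalised = b"".join(u.to_bytes(2, "big") for u in units)
    return normalised.decode("utf-16-be")


# "A" in the default big-endian order, then "B" after a little-endian
# mark, then "C" after a big-endian mark.
sample = b"\x00A" + b"\xff\xfe" + b"B\x00" + b"\xfe\xff" + b"\x00C"
print(decode_utf16_switching(sample))  # prints ABC
```

The same approach would carry over to UTF-32 with four-byte units and the marks 00 00 FE FF and FF FE 00 00.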

Conversion Functions

Lightweight UTF-8 support functions