man m17nCharset () - Charset objects and API for them.

NAME

Charset - Charset objects and API for them.

Variables: Symbols representing a charset.

Each of the following symbols represents a predefined charset.

MSymbol Mcharset_ascii

Symbol representing the charset ASCII.

MSymbol Mcharset_iso_8859_1

Symbol representing the charset ISO/IEC 8859-1.

MSymbol Mcharset_unicode

Symbol representing the charset Unicode.

MSymbol Mcharset_m17n

Symbol representing the largest charset.

MSymbol Mcharset_binary

Symbol representing the charset for ill-decoded characters.

Variables: Parameter keys for mchar_define_charset().

These are the predefined symbols to use as parameter keys for the function mchar_define_charset() (which see).

MSymbol Mmethod

MSymbol Mdimension

MSymbol Mmin_range

MSymbol Mmax_range

MSymbol Mmin_code

MSymbol Mmax_code

MSymbol Mascii_compatible

MSymbol Mfinal_byte

MSymbol Mrevision

MSymbol Mmin_char

MSymbol Mmapfile

MSymbol Mparents

MSymbol Msubset_offset

MSymbol Mdefine_coding

MSymbol Maliases

Variables: Symbols representing charset methods.

These are the predefined symbols that can be given as the value of the Mmethod parameter of a charset in an argument to the mchar_define_charset() function.

A method specifies how code-points and character codes are converted. See the documentation of the mchar_define_charset() function for the details.

MSymbol Moffset

Symbol for the offset type method of charset.

MSymbol Mmap

Symbol for the map type method of charset.

MSymbol Munify

Symbol for the unify type method of charset.

MSymbol Msubset

Symbol for the subset type method of charset.

MSymbol Msuperset

Symbol for the superset type method of charset.

Defines

#define MCHAR_INVALID_CODE

Invalid code-point.

Functions

MSymbol mchar_define_charset (const char *name, MPlist *plist)

Define a charset.

MSymbol mchar_resolve_charset (MSymbol symbol)

Resolve charset name.

int mchar_list_charset (MSymbol **symbols)

List symbols representing charsets.

int mchar_decode (MSymbol charset_name, unsigned code)

Decode a code-point.

unsigned mchar_encode (MSymbol charset_name, int c)

Encode a character code.

int mchar_map_charset (MSymbol charset_name, void(*func)(int from, int to, void *arg), void *func_arg)

Call a function for all the characters in a specified charset.

Variables

MSymbol Mcharset

The symbol Mcharset.

Detailed Description

The m17n library uses charset objects to represent coded character sets (CCS). The m17n library supports many predefined coded character sets. Moreover, application programs can add other charsets. A character can belong to multiple charsets.

The m17n library distinguishes the following three concepts:

A code-point is a number assigned by the CCS to each character. Code-points may or may not be contiguous. The type unsigned is used to represent a code-point. An invalid code-point is represented by the macro MCHAR_INVALID_CODE.
A character index is the canonical index of a character in a CCS. The character that has the character index N occupies the Nth position when all the characters in the CCS are sorted by their code-points. Character indices in a CCS are contiguous and start with 0.
A character code is the internal representation in the m17n library of a character. A character code is a signed integer of 21 bits or longer.

Each charset object defines how characters are converted between code-points and character codes. To decode means converting a code-point to a character code, and to encode means converting a character code to a code-point; this matches the argument and return types of mchar_decode() and mchar_encode().
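
To make the difference between a code-point and a character index concrete, the following is a minimal sketch for a hypothetical two-byte charset in which each byte ranges over 0x21..0x7E. The charset, the constants, and the helper names code_point_to_index() and index_to_code_point() are assumptions made for illustration; they are not part of the m17n API.

```c
#include <assert.h>

/* Hypothetical two-byte charset: each byte ranges over 0x21..0x7E, so
   valid code-points are not contiguous (0x217E is followed by 0x2221). */
enum {
    BYTE_MIN = 0x21,
    BYTE_MAX = 0x7E,
    BYTES_PER_ROW = BYTE_MAX - BYTE_MIN + 1   /* 94 */
};

/* Return the character index of CODE, or -1 if CODE is not a valid
   code-point of this hypothetical charset. */
int code_point_to_index(unsigned code)
{
    unsigned hi = code >> 8;
    unsigned lo = code & 0xFF;

    if (hi < BYTE_MIN || hi > BYTE_MAX || lo < BYTE_MIN || lo > BYTE_MAX)
        return -1;
    return (int)((hi - BYTE_MIN) * BYTES_PER_ROW + (lo - BYTE_MIN));
}

/* Inverse mapping: character index back to code-point. */
unsigned index_to_code_point(int index)
{
    assert(index >= 0 && index < BYTES_PER_ROW * BYTES_PER_ROW);

    unsigned hi = (unsigned)index / BYTES_PER_ROW + BYTE_MIN;
    unsigned lo = (unsigned)index % BYTES_PER_ROW + BYTE_MIN;
    return (hi << 8) | lo;
}
```

In this sketch the code-points 0x217E and 0x2221 are far apart numerically, yet their character indices (93 and 94) are adjacent, which is the property the definition of a character index describes.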

Define Documentation

#define MCHAR_INVALID_CODE

The macro MCHAR_INVALID_CODE gives the invalid code-point.

Variable Documentation

MSymbol Mcharset

Any decoded M-text has a text property whose key is the predefined symbol Mcharset. The name of Mcharset is 'charset'.

MSymbol Mcharset_ascii

The symbol Mcharset_ascii has name 'ascii' and represents the charset ISO 646, USA Version X3.4-1968 (ISO-IR-6).

MSymbol Mcharset_iso_8859_1

The symbol Mcharset_iso_8859_1 has name 'iso-8859-1' and represents the charset ISO/IEC 8859-1:1998.

MSymbol Mcharset_unicode

The symbol Mcharset_unicode has name 'unicode' and represents the charset Unicode.

MSymbol Mcharset_m17n

The symbol Mcharset_m17n has name 'm17n' and represents the charset that contains all characters supported by the m17n library.

MSymbol Mcharset_binary

The symbol Mcharset_binary has name 'binary' and represents the pseudo-charset that the decoding functions attach to an M-text as a text property when they encounter an invalid byte or byte sequence. See Code Conversion for more details.

MSymbol Mmethod

Parameter key for mchar_define_charset() (which see).

MSymbol Mdimension

Parameter key for mchar_define_charset() (which see).

MSymbol Mmin_range

Parameter key for mchar_define_charset() (which see).

MSymbol Mmax_range

Parameter key for mchar_define_charset() (which see).

MSymbol Mmin_code

Parameter key for mchar_define_charset() (which see).

MSymbol Mmax_code

Parameter key for mchar_define_charset() (which see).

MSymbol Mascii_compatible

Parameter key for mchar_define_charset() (which see).

MSymbol Mfinal_byte

Parameter key for mchar_define_charset() (which see).

MSymbol Mrevision

Parameter key for mchar_define_charset() (which see).

MSymbol Mmin_char

Parameter key for mchar_define_charset() (which see).

MSymbol Mmapfile

Parameter key for mchar_define_charset() (which see).

MSymbol Mparents

Parameter key for mchar_define_charset() (which see).

MSymbol Msubset_offset

Parameter key for mchar_define_charset() (which see).

MSymbol Mdefine_coding

Parameter key for mchar_define_charset() (which see).

MSymbol Maliases

Parameter key for mchar_define_charset() (which see).

MSymbol Moffset

The symbol Moffset has the name 'offset'. When used as the value of the Mmethod parameter of a charset, it means that code-points and character codes of the charset are converted by this calculation:

CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR

where MIN-CODE is the value of the Mmin_code parameter of the charset, and MIN-CHAR is the value of the Mmin_char parameter.
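
The calculation above can be sketched in plain C as follows. This is an illustration only, not the library's implementation: the struct, the function names, the sample parameter values, the -1 failure value, and the SKETCH_INVALID_CODE stand-in for MCHAR_INVALID_CODE are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for MCHAR_INVALID_CODE; the real value is defined by the library. */
#define SKETCH_INVALID_CODE (~0u)

/* Parameters of a hypothetical offset-type charset. */
struct offset_charset {
    unsigned min_code;  /* plays the role of the Mmin_code value */
    unsigned max_code;  /* plays the role of the Mmax_code value */
    int min_char;       /* plays the role of the Mmin_char value */
};

/* Illustrative values: code-points 0xA1..0xDF map to 0xFF61..0xFF9F. */
const struct offset_charset SKETCH_KANA = { 0xA1, 0xDF, 0xFF61 };

/* Code-point -> character code: CODE-POINT - MIN-CODE + MIN-CHAR.
   Returns -1 (a choice made by this sketch) for an out-of-range code-point. */
int offset_decode(const struct offset_charset *cs, unsigned code)
{
    assert(cs != NULL);
    if (code < cs->min_code || code > cs->max_code)
        return -1;
    return (int)(code - cs->min_code) + cs->min_char;
}

/* Character code -> code-point: the same formula solved for CODE-POINT. */
unsigned offset_encode(const struct offset_charset *cs, int c)
{
    assert(cs != NULL);
    int max_char = cs->min_char + (int)(cs->max_code - cs->min_code);

    if (c < cs->min_char || c > max_char)
        return SKETCH_INVALID_CODE;
    return (unsigned)(c - cs->min_char) + cs->min_code;
}
```

With the sample values, offset_decode(&SKETCH_KANA, 0xA1) yields 0xFF61 and offset_encode(&SKETCH_KANA, 0xFF71) yields 0xB1, so the two functions are inverses over the valid range.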

MSymbol Mmap

The symbol Mmap has the name 'map'. When used as the value of the Mmethod parameter of a charset, it means that code-points and character codes of the charset are converted by map lookup. The map must be given by the Mmapfile parameter.

MSymbol Munify

The symbol Munify has the name 'unify'. When used as the value of the Mmethod parameter of a charset, it means that code-points and character codes of the charset are converted by a combination of map lookup and offsetting. The map must be given by the Mmapfile parameter. For this kind of charset, a unique contiguous character code space is assigned for all of its characters.

If the map has an entry for a code-point, the conversion is done by looking up the map. Otherwise, the conversion is done by this calculation:

CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE

where MIN-CODE is the value of the Mmin_code parameter of the charset, and LOWEST-CHAR-CODE is the lowest character code of the assigned code space.
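
The two-step rule (map entry first, offset calculation otherwise) might be sketched as below. The in-memory table stands in for the contents of a map file, and the struct layout, names, sample values, linear search, and -1 failure value are assumptions of this sketch rather than the library's behavior.

```c
#include <assert.h>
#include <stddef.h>

/* One map entry: a code-point and the character code it maps to. */
struct map_entry {
    unsigned code;
    int c;
};

/* Parameters of a hypothetical unify-type charset. */
struct unify_charset {
    unsigned min_code;            /* plays the role of the Mmin_code value */
    unsigned max_code;            /* plays the role of the Mmax_code value */
    int lowest_char_code;         /* start of the assigned code space */
    const struct map_entry *map;  /* stands in for the Mmapfile contents */
    size_t map_len;
};

/* Illustrative map with two entries. */
const struct map_entry SKETCH_MAP[] = {
    { 0x80, 0x20AC },
    { 0x82, 0x201A },
};

const struct unify_charset SKETCH_UNIFY = {
    0x80, 0xFF, 0x110000,
    SKETCH_MAP, sizeof SKETCH_MAP / sizeof SKETCH_MAP[0]
};

/* Code-point -> character code. A map entry wins if one exists; otherwise
   CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE is used. */
int unify_decode(const struct unify_charset *cs, unsigned code)
{
    assert(cs != NULL);
    if (code < cs->min_code || code > cs->max_code)
        return -1;

    for (size_t i = 0; i < cs->map_len; i++)
        if (cs->map[i].code == code)
            return cs->map[i].c;

    return (int)(code - cs->min_code) + cs->lowest_char_code;
}
```

Here 0x80 and 0x82 are resolved through the map, while the unmapped code-point 0x81 falls back to the offset rule and lands at 0x110001 in the assigned code space.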

MSymbol Msubset

The symbol Msubset has the name 'subset'. When used as the value of the Mmethod parameter of a charset, it means that the charset is a subset of a parent charset. The parent charset must be given by the Mparents parameter. Conceptually, code-points and character codes of the charset are converted by this calculation:

CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET

where PARENT-CODE is a pseudo-function that returns the character code of CODE-POINT in the parent charset, and SUBSET-OFFSET is the value given by the Msubset_offset parameter.
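
As a rough sketch of this conceptual calculation, the parent charset can be modelled as a function pointer that plays the role of PARENT-CODE. Everything here is hypothetical: the struct, the identity-style parent, the sample offset, and the assumption that the subset is delimited by a minimum and maximum code-point are choices made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Plays the role of PARENT-CODE: code-point -> character code in the
   parent charset, or -1 if the parent has no such code-point. */
typedef int (*parent_decode_fn)(unsigned code);

/* Parameters of a hypothetical subset-type charset. */
struct subset_charset {
    parent_decode_fn parent_code;
    unsigned min_code;   /* first code-point assumed to belong to the subset */
    unsigned max_code;   /* last code-point assumed to belong to the subset */
    int subset_offset;   /* plays the role of the Msubset_offset value */
};

/* Hypothetical parent: maps code-points 0x00..0xFF to themselves. */
int sketch_parent_decode(unsigned code)
{
    return code <= 0xFF ? (int)code : -1;
}

/* Illustrative subset covering 0xA0..0xFF with an offset of 0x100. */
const struct subset_charset SKETCH_SUBSET = {
    sketch_parent_decode, 0xA0, 0xFF, 0x100
};

/* Code-point -> character code: PARENT-CODE (CODE-POINT) + SUBSET-OFFSET. */
int subset_decode(const struct subset_charset *cs, unsigned code)
{
    assert(cs != NULL && cs->parent_code != NULL);
    if (code < cs->min_code || code > cs->max_code)
        return -1;

    int parent_char = cs->parent_code(code);
    if (parent_char < 0)
        return -1;
    return parent_char + cs->subset_offset;
}
```

With these sample values, the code-point 0xA0 decodes to 0x1A0, and a code-point outside the subset's range such as 0x41 is rejected even though the parent could decode it.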

MSymbol Msuperset

The symbol Msuperset has the name 'superset'. When used as the value of the Mmethod parameter of a charset, it means that the charset is a superset of its parent charsets. The parent charsets must be given by the Mparents parameter.