NAME
Xml_lexer - Simple XML lexer
Module
Module Xml_lexer
Documentation
Module Xml_lexer : sig end
Simple XML lexer
=== This module provides an ocamllex lexer for XML files. It supports only the most basic features of the XML specification. The lexer silently ignores the following 'events': comments, processing instructions, the XML prolog and the doctype declaration. The predefined entities (&amp;, &lt;, etc.) are supported. Replacement text for other entities whose value consists of character data can be supplied to the lexer (see Xml_lexer.entities). Internal entity declarations are not taken into account (the lexer just skips the doctype declaration). CDATA sections and character references are supported. See Xml_lexer.strip_ws for whitespace handling. ===
=== Error reporting ===
type error =
 | Illegal_character of char
 | Bad_entity of string
 | Unterminated of string
 | Tag_expected
 | Attribute_expected
 | Other of string
val error_string : error -> string
exception Error of error * int
This exception is raised when an error occurs during parsing. The int argument is the character position in the buffer. Note that some non-conforming XML documents might not trigger an error.
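The following is a minimal sketch of how a caller might turn this exception into a readable message. The declarations are repeated locally so that the sketch compiles without the library; real code would use Xml_lexer.error, Xml_lexer.Error and Xml_lexer.error_string instead. The functions describe and protect are hypothetical helpers, the message wording is an assumption, and the Result module requires OCaml 4.08 or later.

```ocaml
(* Local mirror of the declarations above, so that the sketch compiles
   without the library. Real code would use Xml_lexer.error and
   Xml_lexer.Error. *)
type error =
  | Illegal_character of char
  | Bad_entity of string
  | Unterminated of string
  | Tag_expected
  | Attribute_expected
  | Other of string

exception Error of error * int

(* Hypothetical stand-in for Xml_lexer.error_string; the messages produced
   by the library may differ. *)
let describe = function
  | Illegal_character c -> Printf.sprintf "illegal character %C" c
  | Bad_entity name -> Printf.sprintf "bad entity %S" name
  | Unterminated what -> Printf.sprintf "unterminated %s" what
  | Tag_expected -> "tag expected"
  | Attribute_expected -> "attribute expected"
  | Other msg -> msg

(* Run [f ()] and convert a lexer error into a [result] whose error case
   carries the character position and a description. *)
let protect f =
  try Ok (f ())
  with Error (err, pos) ->
    Result.Error
      (Printf.sprintf "XML error at character %d: %s" pos (describe err))
```

With the real module, f would typically be a closure that calls Xml_lexer.token on a lexbuf.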
=== API ===
type token =
 | Tag of string * (string * string) list * bool  (* Tag (name, attributes, empty) denotes an opening tag with the specified name and attributes. If empty is true, the tag ended in "/>", meaning that it has no sub-elements. *)
 | Chars of string  (* Some text between the tags *)
 | Endtag of string  (* A closing tag *)
 | EOF  (* End of input *)

The type of the XML document elements.
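As an illustration of how these tokens fit together, here is a sketch that assembles a token stream into a tree. The token type is mirrored locally so that the example compiles on its own; the node type and the functions parse_nodes and feed are hypothetical and not part of Xml_lexer.

```ocaml
(* Local mirror of Xml_lexer.token. *)
type token =
  | Tag of string * (string * string) list * bool
  | Chars of string
  | Endtag of string
  | EOF

(* Hypothetical tree type; not part of Xml_lexer. *)
type node =
  | Element of string * (string * string) list * node list
  | Text of string

(* [next ()] must return successive tokens. With the real lexer this would
   be [fun () -> Xml_lexer.token lexbuf]. *)
let parse_nodes (next : unit -> token) : node list =
  (* Read sibling nodes until the closing tag named by [closing], or until
     EOF when [closing] is [None] (top level). *)
  let rec siblings closing acc =
    match next (), closing with
    | EOF, None -> List.rev acc
    | EOF, Some name -> failwith ("missing </" ^ name ^ ">")
    | Endtag name, Some expected when name = expected -> List.rev acc
    | Endtag name, _ -> failwith ("unexpected </" ^ name ^ ">")
    | Chars s, _ -> siblings closing (Text s :: acc)
    | Tag (name, attrs, true), _ ->
        (* Empty tag: no children and no matching Endtag to wait for. *)
        siblings closing (Element (name, attrs, []) :: acc)
    | Tag (name, attrs, false), _ ->
        let children = siblings (Some name) [] in
        siblings closing (Element (name, attrs, children) :: acc)
  in
  siblings None []

(* Helper for experiments: serve tokens from a list, then EOF forever. *)
let feed tokens =
  let rest = ref tokens in
  fun () ->
    match !rest with
    | [] -> EOF
    | t :: tl ->
        rest := tl;
        t
```

The empty flag is what lets the reader skip the search for a closing tag; for a non-empty Tag, the children are everything up to the matching Endtag.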
val strip_ws : bool Pervasives.ref

Whitespace handling: if strip_ws is true (the default), whitespace next to a tag is ignored. Character data consisting only of whitespace is thus suppressed (i.e. the corresponding Chars tokens are skipped).
val entities : (string * string) list Pervasives.ref

An association list of entity definitions. Initially, it contains the predefined entities (["amp", "&"; "lt", "<"; ...]).
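A small sketch of how such an association list can be extended and queried. The entities reference below is a local stand-in for Xml_lexer.entities; its initial content is assumed to be the five entities predefined by the XML specification. The functions register and replacement are hypothetical helpers, and how the lexer itself looks names up is not specified here.

```ocaml
(* Local stand-in for Xml_lexer.entities. The initial content is an
   assumption based on the five entities predefined by XML. *)
let entities : (string * string) list ref =
  ref [ ("amp", "&"); ("lt", "<"); ("gt", ">"); ("apos", "'"); ("quot", "\"") ]

(* Hypothetical helper: add a definition in front of the existing ones. *)
let register name text = entities := (name, text) :: !entities

(* Hypothetical helper: look up the replacement text for an entity name.
   List.assoc_opt returns the first match, so newer definitions win. *)
let replacement name = List.assoc_opt name !entities

(* Define &copy; with a plain-text replacement. *)
let () = register "copy" "(c)"
```

With the real module, the equivalent update would be Xml_lexer.entities := ("copy", "(c)") :: !Xml_lexer.entities, performed before lexing starts.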
val token : Lexing.lexbuf -> token

The entry point of the lexer. Returns the next token in the buffer. Raises Error in case of an invalid XML document.
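Since each call returns a single token, a caller normally loops until the end-of-input token appears. The sketch below shows one way to write that loop. The function drain is hypothetical and takes the lexer function and the end-of-input test as arguments, so that it compiles without the library.

```ocaml
(* Hypothetical helper: call [next lexbuf] repeatedly and collect the
   results, in order, until [is_eof] holds for the token just read. The
   end-of-input token itself is not included in the result. *)
let drain ~next ~is_eof (lexbuf : Lexing.lexbuf) =
  let rec loop acc =
    let tok = next lexbuf in
    if is_eof tok then List.rev acc else loop (tok :: acc)
  in
  loop []
```

With the real module, a call might look like drain ~next:Xml_lexer.token ~is_eof:(fun t -> t = Xml_lexer.EOF) (Lexing.from_string "<a>hi</a>"). Any Xml_lexer.Error raised by the lexer propagates to the caller.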