man Xml_lexer () - Simple XML lexer

NAME

Xml_lexer - Simple XML lexer

Module

Module Xml_lexer

Documentation

Module Xml_lexer : sig end

Simple XML lexer

This module provides an ocamllex lexer for XML files. It supports only the most basic features of the XML specification.

The lexer ignores the following 'events' altogether: comments, processing instructions, the XML prolog and the doctype declaration.

The predefined entities (&amp;, &lt;, etc.) are supported. The replacement text for other entities whose entity value consists of character data can be provided to the lexer (see Xml_lexer.entities). Internal entity declarations are not taken into account (the lexer simply skips the doctype declaration). CDATA sections and character references are supported.

See Xml_lexer.strip_ws for how whitespace is handled.

=== Error reporting ===

type error =
  | Illegal_character of char
  | Bad_entity of string
  | Unterminated of string
  | Tag_expected
  | Attribute_expected
  | Other of string

val error_string : error -> string

exception Error of error * int

This exception is raised when an error occurs during parsing. The int argument indicates the character position in the buffer. Note that some non-conforming XML documents might not trigger an error.
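A minimal sketch of how a caller might turn a lexing failure into a readable message that includes the reported position. So that it compiles on its own, the error type and exception are re-declared in a hypothetical Xml_lexer_stub module; real code would match on Xml_lexer.Error and would normally use Xml_lexer.error_string, whose wording may differ from the hypothetical describe function below.

```ocaml
(* Hypothetical stand-in for Xml_lexer: only the error type and the
   exception are re-declared, with the signatures documented above. *)
module Xml_lexer_stub = struct
  type error =
    | Illegal_character of char
    | Bad_entity of string
    | Unterminated of string
    | Tag_expected
    | Attribute_expected
    | Other of string

  exception Error of error * int
end

(* Hypothetical message formatter; Xml_lexer.error_string fills this role
   in real code and its wording may differ. *)
let describe (e : Xml_lexer_stub.error) : string =
  let open Xml_lexer_stub in
  match e with
  | Illegal_character c -> Printf.sprintf "illegal character %C" c
  | Bad_entity name -> Printf.sprintf "bad entity %S" name
  | Unterminated what -> Printf.sprintf "unterminated %s" what
  | Tag_expected -> "tag expected"
  | Attribute_expected -> "attribute expected"
  | Other msg -> msg

(* Run [f]; if it raises the lexer's Error exception, return a message
   that includes the character position carried by the exception. *)
let protect (f : unit -> 'a) : ('a, string) result =
  try Ok (f ())
  with Xml_lexer_stub.Error (err, pos) ->
    Error (Printf.sprintf "at character %d: %s" pos (describe err))
```

With the real module, f would typically be a loop calling Xml_lexer.token on a lexbuf, and the exception pattern would be Xml_lexer.Error (err, pos).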

=== API ===

type token =
  | Tag of string * (string * string) list * bool
      (* Tag (name, attributes, empty) denotes an opening tag with the specified name and attributes. If empty, then the tag ended in "/>", meaning that it has no sub-elements. *)
  | Chars of string  (* Some text between the tags *)
  | Endtag of string  (* A closing tag *)
  | EOF  (* End of input *)

The type of the tokens returned by the lexer.
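One way a client might consume these tokens is to fold them into an element tree. The sketch below copies the token type from the declaration above; the node type, the Malformed exception and the parse_node / parse_children / build functions are hypothetical and not part of Xml_lexer. An empty tag (Tag (_, _, true)) becomes a leaf element, while a non-empty tag collects children until the matching Endtag.

```ocaml
(* Copy of the token type declared above. *)
type token =
  | Tag of string * (string * string) list * bool
  | Chars of string
  | Endtag of string
  | EOF

(* Hypothetical tree type built from the token stream. *)
type node =
  | Element of string * (string * string) list * node list
  | Text of string

exception Malformed of string

(* Collect the children of element [name] until its Endtag; returns the
   children in document order together with the remaining tokens. *)
let rec parse_children name acc = function
  | Endtag n :: rest when n = name -> (List.rev acc, rest)
  | Endtag n :: _ -> raise (Malformed ("unexpected </" ^ n ^ ">"))
  | EOF :: _ | [] -> raise (Malformed ("unclosed <" ^ name ^ ">"))
  | toks ->
      let (child, rest) = parse_node toks in
      parse_children name (child :: acc) rest

(* Parse a single node from the front of the token list. *)
and parse_node = function
  | Tag (name, attrs, true) :: rest -> (Element (name, attrs, []), rest)
  | Tag (name, attrs, false) :: rest ->
      let (kids, rest) = parse_children name [] rest in
      (Element (name, attrs, kids), rest)
  | Chars s :: rest -> (Text s, rest)
  | (Endtag _ | EOF) :: _ | [] -> raise (Malformed "node expected")

(* A whole document: exactly one root node followed by EOF. *)
let build (tokens : token list) : node =
  match parse_node tokens with
  | (root, [ EOF ]) -> root
  | _ -> raise (Malformed "trailing content after root element")
```

For example, the list [Tag ("a", [("x", "1")], false); Tag ("b", [], true); Chars "hi"; Endtag "a"; EOF], which is what one would expect the lexer to produce for <a x="1"><b/>hi</a> with strip_ws at its default, builds Element ("a", [("x", "1")], [Element ("b", [], []); Text "hi"]).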

val strip_ws : bool Pervasives.ref

Whitespace handling: if strip_ws is true (the default), whitespace next to a tag is ignored. Character data consisting only of whitespace is thus suppressed (i.e. no Chars token is returned for it).
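Since strip_ws is a bool reference, it is switched off with Xml_lexer.strip_ws := false before lexing. To make the default rule concrete, the hypothetical function below models its effect on the text found between two tags. It assumes that "whitespace" means the characters String.trim removes; the lexer's own definition may differ, and this is an illustration rather than the lexer's implementation.

```ocaml
(* Hypothetical model of strip_ws = true applied to the text between two
   tags: whitespace adjacent to either tag is dropped, and if nothing is
   left, no Chars token is produced (modelled here as None). *)
let strip_between_tags (text : string) : string option =
  (* String.trim removes leading and trailing ' ', '\t', '\n', '\r', '\012'. *)
  match String.trim text with
  | "" -> None
  | s -> Some s
```

With strip_ws set to false, the text would instead be kept as is, including whitespace-only runs such as the indentation between nested elements.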

val entities : (string * string) list Pervasives.ref

An association list of entity definitions, mapping each entity name to its replacement text. Initially, it contains the predefined entities ([("amp", "&"); ("lt", "<"); ...]).
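Because entities is an ordinary association-list reference, extra definitions can be added before lexing by consing onto it, e.g. Xml_lexer.entities := ("ellip", "...") :: !Xml_lexer.entities (the "ellip" entity is only an illustrative example). The self-contained sketch below models that pattern with a local reference holding the five XML predefined entities; the exact initial contents of Xml_lexer.entities beyond the ones shown above are an assumption, and the define and resolve helpers are hypothetical.

```ocaml
(* Local model of Xml_lexer.entities: entity name -> replacement text,
   initialised here with the five entities predefined by XML. *)
let entities : (string * string) list ref =
  ref [ ("amp", "&"); ("lt", "<"); ("gt", ">"); ("quot", "\""); ("apos", "'") ]

(* Register an extra entity. Consing onto the front means a new definition
   shadows any earlier one with the same name under List.assoc_opt. *)
let define (name : string) (text : string) : unit =
  entities := (name, text) :: !entities

(* Hypothetical lookup: the replacement text for "&name;", if defined. *)
let resolve (name : string) : string option = List.assoc_opt name !entities
```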

val token : Lexing.lexbuf -> token

The entry point of the lexer. Returns the next token in the buffer. Raises Error in case of an invalid XML document.
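A client typically calls token repeatedly until it returns EOF. The sketch below writes that loop against any function with the type of token, so it can be exercised without the real lexer; in practice one would pass Xml_lexer.token and a buffer built with Lexing.from_channel or Lexing.from_string. The tokens helper and the scripted fake_next used for testing are hypothetical.

```ocaml
(* Copy of the token type declared above. *)
type token =
  | Tag of string * (string * string) list * bool
  | Chars of string
  | Endtag of string
  | EOF

(* Pull tokens from [next] until EOF; the EOF token itself is not returned.
   With the real lexer, [next] would be Xml_lexer.token, which may raise
   Xml_lexer.Error on an invalid document. *)
let tokens (next : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : token list =
  let rec loop acc =
    match next lexbuf with
    | EOF -> List.rev acc
    | tok -> loop (tok :: acc)
  in
  loop []

(* Hypothetical scripted lexer for testing: ignores the buffer and returns
   the given tokens one at a time, then EOF on every later call. *)
let fake_next (script : token list) : Lexing.lexbuf -> token =
  let remaining = ref script in
  fun _lexbuf ->
    match !remaining with
    | [] -> EOF
    | tok :: rest -> remaining := rest; tok
```

With the real module, a call would look like tokens Xml_lexer.token (Lexing.from_string "<a>hi</a>").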