4.1 Expat Basics

Parsing of XML documents using Expat is based on user-defined callback functions. You create a parser object and associate callback (or handler) functions with the events you are interested in. Such events may be, for instance, encountering an opening or closing tag, encountering a comment block, etc. Once the parser object is ready, you start feeding the document to it. As the parser recognizes XML constructs, it calls the callbacks registered for them.

Parsers are created using the xml-make-parser function. In the simplest case, it takes no arguments, e.g.:

 
(let ((parser (xml-make-parser)))
  …

The function xml-parse takes the parser as its argument, reads the document from the current input stream and feeds it to the parser. Thus, the simplest program for parsing XML documents is:

 
(use-modules (gamma expat))
(xml-parse (xml-make-parser))

This program is perhaps not very useful, but you can already use it to check whether its input is a well-formed XML document. If xml-parse encounters an error, it signals the gamma-xml-error error. See section error handling, for a discussion on how to handle it.
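The well-formedness check can be sketched as a small predicate. This is only a sketch under stated assumptions: it assumes that gamma-xml-error is delivered as an ordinary Guile throw key that the standard catch can intercept, and it ignores the arguments passed along with the error. The name well-formed? is a hypothetical helper, not part of the library.

```scheme
(use-modules (gamma expat))

;; Hypothetical helper: return #t if STR parses without error, #f otherwise.
;; Assumption: gamma-xml-error can be intercepted with the standard `catch'.
(define (well-formed? str)
  (catch 'gamma-xml-error
    (lambda ()
      ;; xml-parse reads from the current input port, so rebind it to STR.
      (with-input-from-string str
        (lambda ()
          (xml-parse (xml-make-parser))
          #t)))
    ;; The error arguments are not inspected in this sketch.
    (lambda (key . args)
      #f)))

(display (well-formed? "<a><b/></a>"))   ; properly nested tags
(newline)
(display (well-formed? "<a><b></a>"))    ; mismatched closing tag
(newline)
```

Under that assumption, the first call should report a well-formed document and the second, with its mismatched tags, should not.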

The xml-make-parser function takes optional arguments, which let you set callback functions for the new parser. For example, the following code sets the function ‘elt-start’ as a handler for start elements:

 
(xml-make-parser #:start-element-handler elt-start)

The #:start-element-handler keyword informs the function that the argument following it is a handler for the start of XML elements. Any number of handlers may be set this way, e.g.:

 
(xml-make-parser #:start-element-handler elt-start
                 #:end-element-handler elt-end
                 #:comment-handler comment)

Definitions of particular handler functions differ depending on their purpose, i.e. on the event they are defined to handle. For example, a start element handler must be defined as taking two arguments. The first is the name of the tag, and the second is a list of attributes supplied for that tag. Thus, for example, the following start handler prints the tag name and the number of attributes:

 
(define (elt-start name attrs)
  (format #t "~A (~A)~%" name (length attrs)))
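To try this handler without a separate input file, the document can be supplied from a string. The following is a minimal sketch: the sample document is made up for illustration, and with-input-from-string (a standard Guile procedure) is used only because xml-parse reads from the current input stream.

```scheme
(use-modules (gamma expat))

;; Start handler from the text: print the tag name and attribute count.
(define (elt-start name attrs)
  (format #t "~A (~A)~%" name (length attrs)))

;; Illustrative sample document, not taken from the manual.
(define sample "<doc id=\"1\" lang=\"en\"><item/></doc>")

;; Rebind the current input port to SAMPLE and parse it.
(with-input-from-string sample
  (lambda ()
    (xml-parse (xml-make-parser #:start-element-handler elt-start))))
```

The handler should be invoked once per start tag, so this run is expected to print one line for doc and one for item.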

For a detailed description of all available handlers and handler keywords, see Expat Handlers.

To further improve our example, suppose you need a program that will take an XML document as its input and create a description of its structure on output, showing element nesting levels by indenting their description. Here is how to write it.

First, define handlers for start and end elements. The start element handler prints two indenting spaces for each level of ancestor elements, followed by the element name, its attributes and a newline. It then increases the global level variable:

 
(define level 0)

(define (elt-start name attrs)
  (display (make-string (* 2 level) #\space))
  (display name)
  (for-each
   (lambda (x)
    (display " ")
    (display (car x))
    (display "=")
    (display (cdr x)))
   attrs)
  (newline)
  (set! level (1+ level)))

The handler for end tags is simpler: it must only decrease the level:

 
(define (elt-end name)
  (set! level (1- level)))

Finally, create a parser and parse the input:

 
(xml-parse (xml-make-parser #:start-element-handler elt-start
                            #:end-element-handler elt-end))
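As a variation on the program above, the nesting level can be kept in a closure instead of a global variable, so that every parser starts at level 0. This is a sketch of one possible design, not the manual's own method: make-outline-parser is a hypothetical helper, the sample document is invented, and attributes are assumed to arrive as (name . value) pairs, as the handler above implies.

```scheme
(use-modules (gamma expat))

;; Hypothetical helper: build a parser whose nesting level lives in a
;; closure, so no global state is shared between parsers.
(define (make-outline-parser)
  (let ((level 0))
    (xml-make-parser
     #:start-element-handler
     (lambda (name attrs)
       ;; Two spaces of indentation per ancestor element.
       (display (make-string (* 2 level) #\space))
       (display name)
       ;; Assumption: each attribute is a (name . value) pair.
       (for-each (lambda (x)
                   (display " ")
                   (display (car x))
                   (display "=")
                   (display (cdr x)))
                 attrs)
       (newline)
       (set! level (1+ level)))
     #:end-element-handler
     (lambda (name)
       (set! level (1- level))))))

;; Illustrative sample document.
(define sample "<doc id=\"1\"><item/><item n=\"2\"/></doc>")

(with-input-from-string sample
  (lambda ()
    (xml-parse (make-outline-parser))))
```

Each element is printed on its own line, with the two item elements indented one step further than doc.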