Experimental Expat binding for Standard ML

Florian Weimer

After a long leave of absence, I have taken up programming in Standard ML again. A discussion on the MLton development list prompted me to publish my current software toy.

The implementation of the binding to Expat, a minimal XML parser library written in C, is a bit unusual in that it tries to avoid transitions between Standard ML and C code. This is tricky because the Expat interface itself is callback-based. The approach I use is somewhat similar to the Go lexer described in "An observation about tail calls": on the C side, events are serialized into a buffer (which takes the place of the channel in the original Go code), and the SML side replays the events from the buffer. If the event buffer is exhausted, the SML code calls Expat again, feeding it chunks of the XML text, until the buffer is no longer empty.
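The buffering idea can be sketched in plain C as follows. This is only an illustration under my own assumptions, not the code from the repository: the record layout (kind byte, length byte, payload), the names evbuf, evbuf_put, evbuf_next and the on_* callbacks are all hypothetical, the callbacks are simplified compared to Expat's real handler signatures (no attribute array, for instance), and fake_parse_chunk merely stands in for a real parser call on one chunk of input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event kinds; the real binding may encode events differently. */
enum ev_kind { EV_START = 1, EV_END = 2, EV_TEXT = 3 };

/* Fixed-size event buffer. Each record is: kind byte, length byte, payload. */
struct evbuf {
    unsigned char data[256];
    size_t used;  /* bytes written by the callbacks */
    size_t pos;   /* bytes already replayed by the consumer */
};

/* Append one event. Returns 0 on success, -1 if the record does not fit. */
static int evbuf_put(struct evbuf *b, enum ev_kind kind,
                     const char *s, size_t len)
{
    assert(b != NULL && s != NULL);
    if (len > 255 || b->used + 2 + len > sizeof b->data)
        return -1;
    b->data[b->used++] = (unsigned char)kind;
    b->data[b->used++] = (unsigned char)len;
    memcpy(b->data + b->used, s, len);
    b->used += len;
    return 0;
}

/* SAX-style callbacks: they only record the event and never call back
   into the consumer. Overflow handling is omitted in this sketch. */
static void on_start(void *user, const char *name)
{
    (void)evbuf_put(user, EV_START, name, strlen(name));
}

static void on_end(void *user, const char *name)
{
    (void)evbuf_put(user, EV_END, name, strlen(name));
}

static void on_text(void *user, const char *s, size_t len)
{
    (void)evbuf_put(user, EV_TEXT, s, len);
}

/* Stand-in for one parser call on a chunk of input: fires the callbacks
   a real parser would fire for "<a>hi</a>". */
static void fake_parse_chunk(struct evbuf *b)
{
    on_start(b, "a");
    on_text(b, "hi", 2);
    on_end(b, "a");
}

/* Replay the next event. Returns 1 and fills the out parameters, or
   returns 0 when the buffer is exhausted; the buffer is then reset and
   the caller is expected to feed the parser another chunk. */
static int evbuf_next(struct evbuf *b, enum ev_kind *kind,
                      const char **s, size_t *len)
{
    if (b->pos == b->used) {
        b->pos = b->used = 0;
        return 0;
    }
    *kind = (enum ev_kind)b->data[b->pos];
    *len = b->data[b->pos + 1];
    *s = (const char *)b->data + b->pos + 2;
    b->pos += 2 + *len;
    return 1;
}
```

A consumer would call evbuf_next in a loop and, whenever it returns 0, run the parser on the next chunk of input before trying again. In the binding described above, that replay loop lives on the SML side, so a single transition into C can yield many events.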

The current state is available as a Git repository at:

The example application might eventually replace the Perl script which I currently use to generate the HTML code for these web pages. As it is currently written, the code is very much specific to MLton because it relies on the foreign function interface of this implementation of Standard ML.

As it stands, I am not entirely happy with the code. I will have to introduce finalization, both for the struct smlbuf objects (behind the Buffer structure) and for the Expat parser wrappers. Finalization for Buffers will hopefully make liveness checks unnecessary (both the existing ones and those I still have to add).

For me, this is a rather strange experience. For a long time, I have quietly glorified Standard ML slightly beyond reason (an approach that is more often associated with (Common) Lisp), mostly due to its simplicity, clear and uncompromising semantics, and expressiveness. However, the actual experience of writing SML code is a bit sobering. The language lacks quite a few convenience features, such as ordered enums with conversion to and from strings and integers, a feature that is available in Ada, Java, and Haskell and is quite easy to build on your own in any dynamic programming language.

At least I rediscovered the -const 'Exn.keepHistory true' switch for MLton, which instructs MLton to provide function-level stack traces when reporting unhandled exceptions at run time. This reduces the amount of print debugging and guesswork required when things do not quite work out as expected.

