An observation about tail calls

Florian Weimer

Go lacks proper tail calls: calls in tail position grow the stack, at least in the standard implementation. This prevents the direct implementation of a state machine as a collection of functions recursively calling each other. Surprisingly, the usual workaround, a mini-interpreter, makes it straightforward to execute the state machine incrementally, and not just to completion, without relying on additional language features such as coroutines.

The lexer in the template package encodes part of its state in the instruction pointer. To work around the lack of tail calls, it has to use a mini-interpreter to avoid unbounded stack growth, something like this:

type stateFn func(*lexer) stateFn
// ...
var state stateFn
// ...
for state != nil {
	state = state(l)
}

According to Rob Pike, Lexical Scanning in Go <http://cuddle.googlecode.com/hg/talk/lex.html#slide-17> (August 2011, retrieved 2011-10-09, video <http://www.youtube.com/watch?v=HxaD_trXwRE>), this is more or less how the lexer started. The loop above is a standard way to emulate proper tail calls on top of systems which lack them. It has been used by GHC with the C backend, and by various functional language implementations on the JVM.
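To make the pattern concrete, here is a complete, runnable sketch of the trampoline with invented states (stateEven and stateOdd are placeholders for illustration, not code from the template package):

```go
package main

import "fmt"

// lexer holds the machine's mutable data; out collects a
// trace of the visited states for demonstration purposes.
type lexer struct {
	pos, limit int
	out        []string
}

// stateFn is the curiously recursive type: each state function
// returns the next state, or nil when the machine is done.
type stateFn func(*lexer) stateFn

// stateEven and stateOdd alternate until pos reaches limit.
func stateEven(l *lexer) stateFn {
	if l.pos >= l.limit {
		return nil
	}
	l.out = append(l.out, fmt.Sprintf("even %d", l.pos))
	l.pos++
	return stateOdd
}

func stateOdd(l *lexer) stateFn {
	l.out = append(l.out, fmt.Sprintf("odd %d", l.pos))
	l.pos++
	return stateEven
}

// run is the mini-interpreter loop from above: no call ever
// nests inside another, so the stack stays flat.
func run(l *lexer) []string {
	for state := stateFn(stateEven); state != nil; {
		state = state(l)
	}
	return l.out
}

func main() {
	fmt.Println(run(&lexer{limit: 4}))
	// prints [even 0 odd 1 even 2 odd 3]
}
```

Each state function returns its successor instead of calling it, and the single loop in run performs every call at the same stack depth.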

Curiously, the stateFn type cannot be expressed in Standard ML, even when the option type in the function result type is made explicit. It appears to be a counterexample to the widespread notion that something which cannot be typed in Standard ML is not worth typing at all, though.

To use this mini-interpreter, the actions have to be changed from tail calls

func lexLeftDelim(l *lexer) {
	if strings.HasPrefix(l.input[l.pos:], l.leftDelim+leftComment) {
		lexComment(l)
		return
	}
	l.pos += len(l.leftDelim)
	l.emit(itemLeftDelim)
	lexInsideAction(l)
}

to the formulation used in the current Go sources:

func lexLeftDelim(l *lexer) stateFn {
	if strings.HasPrefix(l.input[l.pos:], l.leftDelim+leftComment) {
		return lexComment
	}
	l.pos += len(l.leftDelim)
	l.emit(itemLeftDelim)
	return lexInsideAction
}

Such manual tail call elimination has the drawback that it is somewhat inefficient: the indirect call state(l) is likely to be mispredicted, whereas the original calls were direct calls. So it is tempting to ask for direct language support for proper tail calls.

But there's a small bit of magic in this code fragment: the call to lexer.emit(itemLeftDelim). This is effectively a coroutine yield, transferring a result out of the state machine loop to the caller of the lexer. It makes the lexer incremental, returning results as soon as they become available, instead of collecting all tokens and returning them when the end of input is reached. The implementation of lexer.emit() could just append the item to an array, but then the lexer would not be incremental anymore. The coroutine yield is emulated in Go with a goroutine and a channel. The goroutine executes the mini-interpreter, and the channel carries the lexer items to the main program. See the slides <http://cuddle.googlecode.com/hg/talk/lex.html#slide-22> for details.
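A minimal sketch of that arrangement, with invented names (item, stateCount, and lex are placeholders for illustration, not the template package's actual code):

```go
package main

import "fmt"

// item is a token produced by the lexer (a stand-in for the
// template package's richer item type).
type item string

// lexer runs its mini-interpreter in a goroutine; emitted
// items travel over the channel to the consumer.
type lexer struct {
	items chan item
	n     int
}

type stateFn func(*lexer) stateFn

// emit is the coroutine yield: sending on the unbuffered
// channel blocks until the consumer receives the item, then
// the state machine resumes.
func (l *lexer) emit(it item) {
	l.items <- it
}

// stateCount is an invented state that emits three tokens.
func stateCount(l *lexer) stateFn {
	if l.n >= 3 {
		return nil
	}
	l.emit(item(fmt.Sprintf("tok%d", l.n)))
	l.n++
	return stateCount
}

// lex starts the mini-interpreter loop in its own goroutine
// and hands the consumer the channel end.
func lex() *lexer {
	l := &lexer{items: make(chan item)}
	go func() {
		for state := stateFn(stateCount); state != nil; {
			state = state(l)
		}
		close(l.items)
	}()
	return l
}

func main() {
	for it := range lex().items {
		fmt.Println(it)
	}
}
```

The consumer sees each token as soon as the corresponding emit executes, long before the goroutine reaches the end of its input.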

However, this doesn't work very well in Go because goroutines do more than mere coroutines (they provide parallelism) and have to be disallowed during package initialization. Recall that the code examples come from the lexer of the template package. Package initialization is the time when you want to parse templates to get early errors, not just when the template is actually used. Therefore, it is desirable to avoid goroutines in the template package.

It turns out that the mini-interpreter can be extended easily to do away with the need for the goroutine:

for {
	select {
	case item := <-l.items:
		return item
	default:
		l.state = l.state(l)
	}
}

The lexer actions do not need any changes. They can continue to use the existing lexer.emit() function. To a certain degree, this appears to be an accident. In the earlier version of the code, which used a goroutine, it would not have been a problem to loop around calls to the emit function, but with the new mini-interpreter, such loops would need unbounded buffering. A bounded number of calls to the emit function from a single action, before that action returns to the mini-interpreter, is acceptable, though.

Using a channel and a conditional receive is not essential here; it is merely a convenience. See Nigel Tao, replace the lexer's item channel with an inline ringbuffer <http://codereview.appspot.com/4657082/> (July 2011, retrieved 2011-10-09).
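Here is a self-contained sketch of the goroutine-free arrangement, again with invented names. The channel is buffered so that a bounded number of emits per action cannot block, and a nil-state check (left implicit in the fragment above) ends the machine:

```go
package main

import "fmt"

type item string

// lexer now stores the current state explicitly; there is no
// goroutine, so the machine advances only inside nextItem.
type lexer struct {
	items chan item
	state stateFn
	n     int
}

type stateFn func(*lexer) stateFn

// emit deposits an item in the buffered channel; the buffer
// bounds how many items one action may emit before returning.
func (l *lexer) emit(it item) {
	l.items <- it
}

// stateCount is an invented state that emits three tokens.
func stateCount(l *lexer) stateFn {
	if l.n >= 3 {
		return nil
	}
	l.emit(item(fmt.Sprintf("tok%d", l.n)))
	l.n++
	return stateCount
}

// nextItem runs the machine just far enough to produce the
// next item, returning false once the machine has finished.
func (l *lexer) nextItem() (item, bool) {
	for {
		select {
		case it := <-l.items:
			return it, true
		default:
			if l.state == nil {
				return "", false
			}
			l.state = l.state(l)
		}
	}
}

func newLexer() *lexer {
	// Buffer of 2 as an arbitrary bound on emits per action.
	return &lexer{items: make(chan item, 2), state: stateCount}
}

func main() {
	l := newLexer()
	for it, ok := l.nextItem(); ok; it, ok = l.nextItem() {
		fmt.Println(it)
	}
}
```

Everything now runs on the caller's goroutine, which is what makes the construction usable during package initialization.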

Apart from the curiously infinite type stateFn (which can be encoded in some way in most languages that lack direct support for expressing it), the entire construction does not need any special language features. Not even closures are required; bare function values are sufficient.

The key observation is that this approach is impossible if direct tail calls are used instead of a mini-interpreter. There is just no place at which the loop can be broken. The direct tail call could be replaced with a tail call to a function which checks if there is an emitted token, suspending the loop in that case. But this would just reimplement the mini-interpreter loop given above using tail-call recursion, so it is not a real improvement.

To me, this little anecdote suggests that tail calls are not as essential as they might seem in a programming language. State machines which can be executed step-by-step have much wider use than those that run until they reach some end state. Lack of proper tail calls can be an unexpected virtue.

Addendum: There is a fairly straightforward way of encoding the stateFn type and the original run function in Standard ML:

datatype stateFn = StateFn of lexer -> stateFn option

fun run f l =
    case f l of
	SOME (StateFn f) => run f l
      | NONE => ()
