Evaluating Expressions
You are my creator, but I am your master; Obey!
Mary Shelley, Frankenstein
If you want to properly set the mood for this chapter, try to conjure up a thunderstorm, one of those swirling tempests that likes to yank open shutters at the climax of the story. Maybe toss in a few bolts of lightning. In this chapter, our interpreter will take breath, open its eyes, and execute some code.
There are all manner of ways that language implementations make a computer do what the user’s source code commands. They can compile it to machine code, translate it to another high-level language, or reduce it to some bytecode format for a virtual machine to run. For our first interpreter, though, we are going to take the simplest, shortest path and execute the syntax tree itself.
Right now, our parser only supports expressions. So, to “execute” code, we will evaluate an expression and produce a value. For each kind of expression syntax we can parse—literal, operator, etc.—we need a corresponding chunk of code that knows how to evaluate that tree and produce a result. That raises two questions:
-
What kinds of values do we produce?
-
How do we organize those chunks of code?
Taking them on one at a time . . .
7 . 1Representing Values
In Lox, values are created by literals, computed by expressions, and stored in variables. The user sees these as Lox objects, but they are implemented in the underlying language our interpreter is written in. That means bridging the lands of Lox’s dynamic typing and Java’s static types. A variable in Lox can store a value of any (Lox) type, and can even store values of different types at different points in time. What Java type might we use to represent that?
Given a Java variable with that static type, we must also be able to determine
which kind of value it holds at runtime. When the interpreter executes a +
operator, it needs to tell if it is adding two numbers or concatenating two
strings. Is there a Java type that can hold numbers, strings, Booleans, and
more? Is there one that can tell us what its runtime type is? There is! Good old
java.lang.Object.
In places in the interpreter where we need to store a Lox value, we can use Object as the type. Java has boxed versions of its primitive types that all subclass Object, so we can use those for Lox’s built-in types:
Lox type | Java representation |
Any Lox value | Object |
nil |
null |
Boolean | Boolean |
number | Double |
string | String |
Given a value of static type Object, we can determine if the runtime value is a
number or a string or whatever using Java’s built-in instanceof
operator. In
other words, the JVM’s own object representation
conveniently gives us everything we need to implement Lox’s built-in types.
We’ll have to do a little more work later when we add Lox’s notions of
functions, classes, and instances, but Object and the boxed primitive classes
are sufficient for the types we need right now.
7 . 2Evaluating Expressions
Next, we need blobs of code to implement the evaluation logic for each kind of
expression we can parse. We could stuff that code into the syntax tree classes
in something like an interpret()
method. In effect, we could tell each syntax
tree node, “Interpret thyself”. This is the Gang of Four’s
Interpreter design pattern. It’s a neat pattern, but like I mentioned
earlier, it gets messy if we jam all sorts of logic into the tree classes.
Instead, we’re going to reuse our groovy Visitor pattern. In the previous chapter, we created an AstPrinter class. It took in a syntax tree and recursively traversed it, building up a string which it ultimately returned. That’s almost exactly what a real interpreter does, except instead of concatenating strings, it computes values.
We start with a new class.
create new file
package com.craftinginterpreters.lox; class Interpreter implements Expr.Visitor<Object> { }
The class declares that it’s a visitor. The return type of the visit methods will be Object, the root class that we use to refer to a Lox value in our Java code. To satisfy the Visitor interface, we need to define visit methods for each of the four expression tree classes our parser produces. We’ll start with the simplest . . .
7 . 2 . 1Evaluating literals
The leaves of an expression tree—the atomic bits of syntax that all other expressions are composed of—are literals. Literals are almost values already, but the distinction is important. A literal is a bit of syntax that produces a value. A literal always appears somewhere in the user’s source code. Lots of values are produced by computation and don’t exist anywhere in the code itself. Those aren’t literals. A literal comes from the parser’s domain. Values are an interpreter concept, part of the runtime’s world.
So, much like we converted a literal token into a literal syntax tree node in the parser, now we convert the literal tree node into a runtime value. That turns out to be trivial.
in class Interpreter
@Override public Object visitLiteralExpr(Expr.Literal expr) { return expr.value; }
We eagerly produced the runtime value way back during scanning and stuffed it in the token. The parser took that value and stuck it in the literal tree node, so to evaluate a literal, we simply pull it back out.
7 . 2 . 2Evaluating parentheses
The next simplest node to evaluate is grouping—the node you get as a result of using explicit parentheses in an expression.
in class Interpreter
@Override public Object visitGroupingExpr(Expr.Grouping expr) { return evaluate(expr.expression); }
A grouping node has a reference to an inner node for the expression contained inside the parentheses. To evaluate the grouping expression itself, we recursively evaluate that subexpression and return it.
We rely on this helper method which simply sends the expression back into the interpreter’s visitor implementation:
in class Interpreter
private Object evaluate(Expr expr) { return expr.accept(this); }
7 . 2 . 3Evaluating unary expressions
Like grouping, unary expressions have a single subexpression that we must evaluate first. The difference is that the unary expression itself does a little work afterwards.
add after visitLiteralExpr()
@Override public Object visitUnaryExpr(Expr.Unary expr) { Object right = evaluate(expr.right); switch (expr.operator.type) { case MINUS: return -(double)right; } // Unreachable. return null; }
First, we evaluate the operand expression. Then we apply the unary operator itself to the result of that. There are two different unary expressions, identified by the type of the operator token.
Shown here is -
, which negates the result of the subexpression. The
subexpression must be a number. Since we don’t statically know that in Java,
we cast it before performing the operation. This type
cast happens at runtime when the -
is evaluated. That’s the core of what makes
a language dynamically typed right there.
You can start to see how evaluation recursively traverses the tree. We can’t evaluate the unary operator itself until after we evaluate its operand subexpression. That means our interpreter is doing a post-order traversal—each node evaluates its children before doing its own work.
The other unary operator is logical not.
switch (expr.operator.type) {
in visitUnaryExpr()
case BANG: return !isTruthy(right);
case MINUS:
The implementation is simple, but what is this “truthy” thing about? We need to make a little side trip to one of the great questions of Western philosophy: What is truth?
7 . 2 . 4Truthiness and falsiness
OK, maybe we’re not going to really get into the universal question, but at
least inside the world of Lox, we need to decide what happens when you use
something other than true
or false
in a logic operation like !
or any
other place where a Boolean is expected.
We could just say it’s an error because we don’t roll with implicit conversions, but most dynamically typed languages aren’t that ascetic. Instead, they take the universe of values of all types and partition them into two sets, one of which they define to be “true”, or “truthful”, or (my favorite) “truthy”, and the rest which are “false” or “falsey”. This partitioning is somewhat arbitrary and gets weird in a few languages.
Lox follows Ruby’s simple rule: false
and nil
are falsey, and everything else
is truthy. We implement that like so:
add after visitUnaryExpr()
private boolean isTruthy(Object object) { if (object == null) return false; if (object instanceof Boolean) return (boolean)object; return true; }
7 . 2 . 5Evaluating binary operators
On to the last expression tree class, binary operators. There’s a handful of them, and we’ll start with the arithmetic ones.
add after evaluate()
@Override public Object visitBinaryExpr(Expr.Binary expr) { Object left = evaluate(expr.left); Object right = evaluate(expr.right); switch (expr.operator.type) { case MINUS: return (double)left - (double)right; case SLASH: return (double)left / (double)right; case STAR: return (double)left * (double)right; } // Unreachable. return null; }
I think you can figure out what’s going on here. The main difference from the unary negation operator is that we have two operands to evaluate.
I left out one arithmetic operator because it’s a little special.
switch (expr.operator.type) { case MINUS: return (double)left - (double)right;
in visitBinaryExpr()
case PLUS: if (left instanceof Double && right instanceof Double) { return (double)left + (double)right; } if (left instanceof String && right instanceof String) { return (String)left + (String)right; } break;
case SLASH:
The +
operator can also be used to concatenate two strings. To handle that, we
don’t just assume the operands are a certain type and cast them, we
dynamically check the type and choose the appropriate operation. This is why
we need our object representation to support instanceof
.
Next up are the comparison operators.
switch (expr.operator.type) {
in visitBinaryExpr()
case GREATER: return (double)left > (double)right; case GREATER_EQUAL: return (double)left >= (double)right; case LESS: return (double)left < (double)right; case LESS_EQUAL: return (double)left <= (double)right;
case MINUS:
They are basically the same as arithmetic. The only difference is that where the arithmetic operators produce a value whose type is the same as the operands (numbers or strings), the comparison operators always produce a Boolean.
The last pair of operators are equality.
in visitBinaryExpr()
case BANG_EQUAL: return !isEqual(left, right); case EQUAL_EQUAL: return isEqual(left, right);
Unlike the comparison operators which require numbers, the equality operators
support operands of any type, even mixed ones. You can’t ask Lox if 3 is less
than "three"
, but you can ask if it’s equal to
it.
Like truthiness, the equality logic is hoisted out into a separate method.
add after isTruthy()
private boolean isEqual(Object a, Object b) { if (a == null && b == null) return true; if (a == null) return false; return a.equals(b); }
This is one of those corners where the details of how we represent Lox objects in terms of Java matter. We need to correctly implement Lox’s notion of equality, which may be different from Java’s.
Fortunately, the two are pretty similar. Lox doesn’t do implicit conversions in
equality and Java does not either. We do have to handle nil
/null
specially
so that we don’t throw a NullPointerException if we try to call equals()
on
null
. Otherwise, we’re fine. Java’s equals()
method
on Boolean, Double, and String have the behavior we want for Lox.
And that’s it! That’s all the code we need to correctly interpret a valid Lox expression. But what about an invalid one? In particular, what happens when a subexpression evaluates to an object of the wrong type for the operation being performed?
7 . 3Runtime Errors
I was cavalier about jamming casts in whenever a subexpression produces an Object and the operator requires it to be a number or a string. Those casts can fail. Even though the user’s code is erroneous, if we want to make a usable language, we are responsible for handling that error gracefully.
It’s time for us to talk about runtime errors. I spilled a lot of ink in the previous chapters talking about error handling, but those were all syntax or static errors. Those are detected and reported before any code is executed. Runtime errors are failures that the language semantics demand we detect and report while the program is running (hence the name).
Right now, if an operand is the wrong type for the operation being performed, the Java cast will fail and the JVM will throw a ClassCastException. That unwinds the whole stack and exits the application, vomiting a Java stack trace onto the user. That’s probably not what we want. The fact that Lox is implemented in Java should be a detail hidden from the user. Instead, we want them to understand that a Lox runtime error occurred, and give them an error message relevant to our language and their program.
The Java behavior does have one thing going for it, though. It correctly stops executing any code when the error occurs. Let’s say the user enters some expression like:
2 * (3 / -"muffin")
You can’t negate a muffin, so we need to report a
runtime error at that inner -
expression. That in turn means we can’t evaluate
the /
expression since it has no meaningful right operand. Likewise for the
*
. So when a runtime error occurs deep in some expression, we need to escape
all the way out.
We could print a runtime error and then abort the process and exit the application entirely. That has a certain melodramatic flair. Sort of the programming language interpreter equivalent of a mic drop.
Tempting as that is, we should probably do something a little less cataclysmic. While a runtime error needs to stop evaluating the expression, it shouldn’t kill the interpreter. If a user is running the REPL and has a typo in a line of code, they should still be able to keep the session going and enter more code after that.
7 . 3 . 1Detecting runtime errors
Our tree-walk interpreter evaluates nested expressions using recursive method calls, and we need to unwind out of all of those. Throwing an exception in Java is a fine way to accomplish that. However, instead of using Java’s own cast failure, we’ll define a Lox-specific one so that we can handle it how we want.
Before we do the cast, we check the object’s type ourselves. So, for unary -
,
we add:
case MINUS:
in visitUnaryExpr()
checkNumberOperand(expr.operator, right);
return -(double)right;
The code to check the operand is:
add after visitUnaryExpr()
private void checkNumberOperand(Token operator, Object operand) { if (operand instanceof Double) return; throw new RuntimeError(operator, "Operand must be a number."); }
When the check fails, it throws one of these:
create new file
package com.craftinginterpreters.lox; class RuntimeError extends RuntimeException { final Token token; RuntimeError(Token token, String message) { super(message); this.token = token; } }
Unlike the Java cast exception, our class tracks the token that identifies where in the user’s code the runtime error came from. As with static errors, this helps the user know where to fix their code.
We need similar checking for the binary operators. Since I promised you every single line of code needed to implement the interpreters, I’ll run through them all.
Greater than:
case GREATER:
in visitBinaryExpr()
checkNumberOperands(expr.operator, left, right);
return (double)left > (double)right;
Greater than or equal to:
case GREATER_EQUAL:
in visitBinaryExpr()
checkNumberOperands(expr.operator, left, right);
return (double)left >= (double)right;
Less than:
case LESS:
in visitBinaryExpr()
checkNumberOperands(expr.operator, left, right);
return (double)left < (double)right;
Less than or equal to:
case LESS_EQUAL:
in visitBinaryExpr()
checkNumberOperands(expr.operator, left, right);
return (double)left <= (double)right;
Subtraction:
case MINUS:
in visitBinaryExpr()
checkNumberOperands(expr.operator, left, right);
return (double)left - (double)right;
Division:
case SLASH:
in visitBinaryExpr()
checkNumberOperands(expr.operator, left, right);
return (double)left / (double)right;
Multiplication:
case STAR:
in visitBinaryExpr()
checkNumberOperands(expr.operator, left, right);
return (double)left * (double)right;
All of those rely on this validator, which is virtually the same as the unary one:
add after checkNumberOperand()
private void checkNumberOperands(Token operator, Object left, Object right) { if (left instanceof Double && right instanceof Double) return; throw new RuntimeError(operator, "Operands must be numbers."); }
The last remaining operator, again the odd one out, is addition. Since +
is
overloaded for numbers and strings, it already has code to check the types. All
we need to do is fail if neither of the two success cases match.
return (String)left + (String)right; }
in visitBinaryExpr()
replace 1 line
throw new RuntimeError(expr.operator, "Operands must be two numbers or two strings.");
case SLASH:
That gets us detecting runtime errors deep in the innards of the evaluator. The errors are getting thrown. The next step is to write the code that catches them. For that, we need to wire up the Interpreter class into the main Lox class that drives it.
7 . 4Hooking Up the Interpreter
The visit methods are sort of the guts of the Interpreter class, where the real work happens. We need to wrap a skin around them to interface with the rest of the program. The Interpreter’s public API is simply one method.
in class Interpreter
void interpret(Expr expression) { try { Object value = evaluate(expression); System.out.println(stringify(value)); } catch (RuntimeError error) { Lox.runtimeError(error); } }
This takes in a syntax tree for an expression and evaluates it. If that
succeeds, evaluate()
returns an object for the result value. interpret()
converts that to a string and shows it to the user. To convert a Lox value to a
string, we rely on:
add after isEqual()
private String stringify(Object object) { if (object == null) return "nil"; if (object instanceof Double) { String text = object.toString(); if (text.endsWith(".0")) { text = text.substring(0, text.length() - 2); } return text; } return object.toString(); }
This is another of those pieces of code like isTruthy()
that crosses the
membrane between the user’s view of Lox objects and their internal
representation in Java.
It’s pretty straightforward. Since Lox was designed to be familiar to someone
coming from Java, things like Booleans look the same in both languages. The two
edge cases are nil
, which we represent using Java’s null
, and numbers.
Lox uses double-precision numbers even for integer values. In that case, they
should print without a decimal point. Since Java has both floating point and
integer types, it wants you to know which one you’re using. It tells you by
adding an explicit .0
to integer-valued doubles. We don’t care about that, so
we hack it off the end.
7 . 4 . 1Reporting runtime errors
If a runtime error is thrown while evaluating the expression, interpret()
catches it. This lets us report the error to the user and then gracefully
continue. All of our existing error reporting code lives in the Lox class, so we
put this method there too:
add after error()
static void runtimeError(RuntimeError error) { System.err.println(error.getMessage() + "\n[line " + error.token.line + "]"); hadRuntimeError = true; }
We use the token associated with the RuntimeError to tell the user what line of code was executing when the error occurred. Even better would be to give the user an entire call stack to show how they got to be executing that code. But we don’t have function calls yet, so I guess we don’t have to worry about it.
After showing the error, runtimeError()
sets this field:
static boolean hadError = false;
in class Lox
static boolean hadRuntimeError = false;
public static void main(String[] args) throws IOException {
That field plays a small but important role.
run(new String(bytes, Charset.defaultCharset())); // Indicate an error in the exit code. if (hadError) System.exit(65);
in runFile()
if (hadRuntimeError) System.exit(70);
}
If the user is running a Lox script from a file and a runtime error occurs, we set an exit code when the process quits to let the calling process know. Not everyone cares about shell etiquette, but we do.
7 . 4 . 2Running the interpreter
Now that we have an interpreter, the Lox class can start using it.
public class Lox {
in class Lox
private static final Interpreter interpreter = new Interpreter();
static boolean hadError = false;
We make the field static so that successive calls to run()
inside a REPL
session reuse the same interpreter. That doesn’t make a difference now, but it
will later when the interpreter stores global variables. Those variables should
persist throughout the REPL session.
Finally, we remove the line of temporary code from the last chapter for printing the syntax tree and replace it with this:
// Stop if there was a syntax error. if (hadError) return;
in run()
replace 1 line
interpreter.interpret(expression);
}
We have an entire language pipeline now: scanning, parsing, and execution. Congratulations, you now have your very own arithmetic calculator.
As you can see, the interpreter is pretty bare bones. But the Interpreter class and the Visitor pattern we’ve set up today form the skeleton that later chapters will stuff full of interesting guts—variables, functions, etc. Right now, the interpreter doesn’t do very much, but it’s alive!
Challenges
-
Allowing comparisons on types other than numbers could be useful. The operators might have a reasonable interpretation for strings. Even comparisons among mixed types, like
3 < "pancake"
could be handy to enable things like ordered collections of heterogeneous types. Or it could simply lead to bugs and confusion.Would you extend Lox to support comparing other types? If so, which pairs of types do you allow and how do you define their ordering? Justify your choices and compare them to other languages.
-
Many languages define
+
such that if either operand is a string, the other is converted to a string and the results are then concatenated. For example,"scone" + 4
would yieldscone4
. Extend the code invisitBinaryExpr()
to support that. -
What happens right now if you divide a number by zero? What do you think should happen? Justify your choice. How do other languages you know handle division by zero, and why do they make the choices they do?
Change the implementation in
visitBinaryExpr()
to detect and report a runtime error for this case.
Design Note: Static and Dynamic Typing
Some languages, like Java, are statically typed which means type errors are detected and reported at compile time before any code is run. Others, like Lox, are dynamically typed and defer checking for type errors until runtime right before an operation is attempted. We tend to consider this a black-and-white choice, but there is actually a continuum between them.
It turns out even most statically typed languages do some type checks at runtime. The type system checks most type rules statically, but inserts runtime checks in the generated code for other operations.
For example, in Java, the static type system assumes a cast expression will always safely succeed. After you cast some value, you can statically treat it as the destination type and not get any compile errors. But downcasts can fail, obviously. The only reason the static checker can presume that casts always succeed without violating the language’s soundness guarantees, is because the cast is checked at runtime and throws an exception on failure.
A more subtle example is covariant arrays in Java and C#. The static subtyping rules for arrays allow operations that are not sound. Consider:
Object[] stuff = new Integer[1]; stuff[0] = "not an int!";
This code compiles without any errors. The first line upcasts the Integer array
and stores it in a variable of type Object array. The second line stores a
string in one of its cells. The Object array type statically allows that—strings are Objects—but the actual Integer array that stuff
refers to
at runtime should never have a string in it! To avoid that catastrophe, when you
store a value in an array, the JVM does a runtime check to make sure it’s an
allowed type. If not, it throws an ArrayStoreException.
Java could have avoided the need to check this at runtime by disallowing the cast on the first line. It could make arrays invariant such that an array of Integers is not an array of Objects. That’s statically sound, but it prohibits common and safe patterns of code that only read from arrays. Covariance is safe if you never write to the array. Those patterns were particularly important for usability in Java 1.0 before it supported generics. James Gosling and the other Java designers traded off a little static safety and performance—those array store checks take time—in return for some flexibility.
There are few modern statically typed languages that don’t make that trade-off somewhere. Even Haskell will let you run code with non-exhaustive matches. If you find yourself designing a statically typed language, keep in mind that you can sometimes give users more flexibility without sacrificing too many of the benefits of static safety by deferring some type checks until runtime.
On the other hand, a key reason users choose statically typed languages is because of the confidence the language gives them that certain kinds of errors can never occur when their program is run. Defer too many type checks until runtime, and you erode that confidence.