Building a Lisp to JavaScript Compiler
Do you enjoy programming in Clojure but find the JVM’s slow start-up time a hindrance? In this blog post, I’ll walk you through building your own Lisp to JavaScript compiler. A PEG grammar parses the Lisp source code into an Abstract Syntax Tree (AST), and escodegen turns that AST into well-formatted JavaScript. The result is Clojure-inspired code that runs on Node.js, with no JVM start-up cost.
Why This Lisp to JavaScript Compiler Is Special
There are plenty of Lisp to JavaScript compilers already, so what sets this one apart? It uses a PEG grammar to parse the Lisp source code directly into an AST that follows the Parser API. That is the AST format exposed by SpiderMonkey’s Reflect.parse, and it is the basis of today’s ESTree specification. Escodegen then converts that AST into clean, well-structured JavaScript.
This approach has two advantages. First, the compiler never has to deal with the finer points of JavaScript syntax: escodegen inserts semicolons and parentheses and keeps the formatting of the output consistent. Second, because parsing and JavaScript generation are separate steps, you can replace the code generator with other software that consumes the same AST.
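A toy back end makes the separation easier to see. The generate function below is a hypothetical stand-in for escodegen, not its implementation. It covers only five Parser API node types, but anything that accepts the same AST could take its place, including escodegen's own generate(ast).

```javascript
// Toy code generator for a handful of Parser API (ESTree) node types.
// Hypothetical stand-in for escodegen, which covers the whole specification.
function generate(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(generate).join("\n");
    case "ExpressionStatement":
      // The generator, not the parser, decides where semicolons go.
      return generate(node.expression) + ";";
    case "CallExpression":
      return generate(node.callee) + "(" + node.arguments.map(generate).join(", ") + ")";
    case "Identifier":
      return node.name;
    case "Literal":
      // JSON.stringify quotes strings and leaves numbers as they are.
      return JSON.stringify(node.value);
    default:
      throw new Error("Unsupported node type: " + node.type);
  }
}

// AST for the Lisp program (greet "honza"), in the shape the parser produces.
const ast = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "greet" },
        arguments: [{ type: "Literal", value: "honza" }],
      },
    },
  ],
};

console.log(generate(ast)); // greet("honza");
```

Swapping back ends means replacing generate and leaving the parser untouched.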
Lisp Basics: Primer for Beginners
If you’re already familiar with Lisp, feel free to skip this section and proceed to the next.
Lisp source code consists of s-expressions, which are essentially lists. In an s-expression, the first element represents a function, while the remaining elements are the function’s arguments.
For example, (greet "honza") is a list with two items: greet is the function name, and “honza” is its argument. In many other programming languages, this would be written as greet("honza").
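The list structure is easier to see with a small hand-written reader that turns Lisp source text into nested JavaScript arrays. This is an illustration, not the PEG grammar this compiler uses. The names tokenize and read are hypothetical, and the reader only understands lists, integers, double-quoted strings without escape sequences, and symbols.

```javascript
// Hypothetical tokenizer: extracts parentheses, double-quoted strings
// (no escape sequences) and bare words such as greet, + or 42.
// Whitespace is skipped because the pattern never matches it.
function tokenize(source) {
  return source.match(/\(|\)|"[^"]*"|[^\s()"]+/g) || [];
}

// Hypothetical reader: lists become arrays, integers become numbers,
// strings become JS strings, and any other word becomes a { symbol } object.
function read(source) {
  const tokens = tokenize(source);
  let pos = 0;

  function readForm() {
    if (pos >= tokens.length) throw new SyntaxError("Unexpected end of input");
    const token = tokens[pos++];
    if (token === "(") {
      const list = [];
      while (tokens[pos] !== ")") {
        if (pos >= tokens.length) throw new SyntaxError("Missing )");
        list.push(readForm());
      }
      pos++; // consume the closing ")"
      return list;
    }
    if (token === ")") throw new SyntaxError("Unexpected )");
    if (token.startsWith('"')) return token.slice(1, -1);
    if (/^\d+$/.test(token)) return parseInt(token, 10);
    return { symbol: token };
  }

  const forms = [];
  while (pos < tokens.length) forms.push(readForm());
  return forms;
}

console.log(JSON.stringify(read('(greet "honza")')));
// [[{"symbol":"greet"},"honza"]]
```

Each inner array is one s-expression. Its first element is the function to call, and the remaining elements are the arguments.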
Lisp uses s-expressions for almost everything, including function definitions, if statements, assignments, and binary expressions. Typical examples include:
- Defining a variable called “name” and assigning it the value “honza”.
- Adding 1 and 2 and returning the result.
- A conditional that prints “hey honza” if the variable “name” is equal to “honza” and “hey stranger” otherwise.
- Defining a function called “greet” that takes one parameter called “name”.
In Lisp, a function body can contain multiple s-expressions, but only the value of the last one is returned, so unlike many other languages, Lisp doesn’t need a return keyword. Binary operators are ordinary functions, and special forms like “if” also evaluate to a value.
Unleashing the Mighty PEG Grammar
The grammar begins with the program rule. In PEG.js, the first rule of a grammar is the start rule, so parsing starts here.
A Lisp program consists of one or more s-expressions, optionally followed by a newline. The matched list of s-expressions is bound to the label s, and the rule’s action returns a JavaScript object with two properties, type and body. Because this is the top level, type is “Program” and body holds the matched s-expressions.
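PEG.js compiles each rule into a parsing function. The function below is a rough hand-written sketch of what the program rule does, assuming the rule reads roughly program = s:sexp+ "\n"?. It applies sexp greedily, requires at least one match, accepts one optional trailing newline, and builds the Program node. Both parseProgram and the toy parseNumberSexp stand-in are hypothetical.

```javascript
// Toy stand-in for the sexp rule (hypothetical): matches one integer with
// optional spaces around it. Returns a Literal node plus the new position,
// or null when nothing matches. It always consumes input on success.
function parseNumberSexp(input, pos) {
  const match = /^ *(\d+) */.exec(input.slice(pos));
  if (match === null) return null;
  return {
    node: { type: "Literal", value: parseInt(match[1], 10) },
    pos: pos + match[0].length,
  };
}

// Hand-written equivalent of a rule like:  program = s:sexp+ "\n"?
function parseProgram(input, parseSexp) {
  let pos = 0;
  const s = [];
  // sexp+ : PEG repetition is greedy, so keep matching while possible...
  for (;;) {
    const result = parseSexp(input, pos);
    if (result === null) break;
    s.push(result.node);
    pos = result.pos;
  }
  // ...but at least one s-expression must match.
  if (s.length === 0) throw new SyntaxError("Expected at least one s-expression");
  // "\n"? : an optional trailing newline.
  if (input[pos] === "\n") pos++;
  // A PEG.js parser also fails if the start rule leaves input unconsumed.
  if (pos !== input.length) throw new SyntaxError("Unexpected input at position " + pos);
  // The rule's action: a Program node whose body is the matched list.
  return { type: "Program", body: s };
}

console.log(JSON.stringify(parseProgram("1 2 3\n", parseNumberSexp)));
// {"type":"Program","body":[{"type":"Literal","value":1},{"type":"Literal","value":2},{"type":"Literal","value":3}]}
```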
To successfully compile this grammar into a parser, we need to define the structure of an s-expression.
An s-expression can be an atom, a list, a vector, or an object. Each of these can be preceded and followed by any amount of whitespace.
Let’s examine the atom:
An atom can be a sequence of digits, a string enclosed in double quotes, or a valid identifier. A matched digit sequence is bound to the label d, and the numberify function concatenates the digits and converts them into an integer. Numbers and strings are both literal values and are returned as Literal nodes. Identifiers, which name variables and functions, are returned as Identifier nodes.
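Here is a sketch of the values the atom rule's actions might return. The post names numberify, but the function body here is an assumption. The literal and identifier functions are hypothetical helpers that build Parser API Literal and Identifier nodes. In PEG.js, a labeled repetition such as d:[0-9]+ reaches the action as an array of one-character strings, which is why the digits have to be joined before conversion.

```javascript
// numberify is named in the post; this body is an assumption.
// PEG.js passes a labeled [0-9]+ match as an array like ["4", "2"].
function numberify(digits) {
  return parseInt(digits.join(""), 10);
}

// Hypothetical helpers that build Parser API nodes.
function literal(value) {
  return { type: "Literal", value: value };
}

function identifier(name) {
  return { type: "Identifier", name: name };
}

// What each alternative of the atom rule would return:
const fromDigits = literal(numberify(["4", "2"])); // { type: "Literal", value: 42 }
const fromString = literal("honza");              // { type: "Literal", value: "honza" }
const fromName = identifier("greet");             // { type: "Identifier", name: "greet" }
```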
Moving on to vectors and objects:
A vector can be an empty array, an array of one or more atoms, or an array of one or more objects. The makeObject function walks the matched elements in pairs. It turns the first item of each pair into an object key and uses the second item as that key's value. If the number of elements is not divisible by 2, it throws an error.
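Here is a minimal sketch of makeObject. It assumes the function receives the already-parsed elements as a flat array [key1, value1, key2, value2, ...] and returns a Parser API ObjectExpression. That signature is an assumption; only the pairing and the even-length check come from the description above.

```javascript
// Sketch of makeObject (signature assumed): pairs up a flat list of parsed
// nodes and builds a Parser API ObjectExpression from them.
function makeObject(items) {
  if (items.length % 2 !== 0) {
    throw new Error("An object needs an even number of forms, got " + items.length);
  }
  const properties = [];
  for (let i = 0; i < items.length; i += 2) {
    properties.push({
      type: "Property",
      key: items[i],       // first item of the pair becomes the key
      value: items[i + 1], // second item becomes its value
      kind: "init",        // a plain data property, not a getter or setter
    });
  }
  return { type: "ObjectExpression", properties: properties };
}

const obj = makeObject([
  { type: "Literal", value: "name" }, { type: "Literal", value: "honza" },
  { type: "Literal", value: "age" }, { type: "Literal", value: 30 },
]);
console.log(obj.properties.length); // 2
```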
Lastly, let’s explore lists. Lists are special because the first item represents the name of a function.
A list is either empty or contains one or more s-expressions. For a non-empty list, the first element decides how the list is compiled. If it is “def,” the list becomes a variable declaration. If it is “fn,” the list becomes an anonymous function. Anything else is treated as a call to a built-in or user-defined function. The processCallExpression function builds these calls and decides whether each one is a statement or an expression.
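The dispatch on the first element can be sketched as follows. The name processList is hypothetical, and the sketch assumes its input is an array of already-parsed nodes. Compiling def to a var declaration is also an assumption. The fn case is left out because it also needs the implicit-return handling discussed under Overcoming Obstacles.

```javascript
// Hypothetical dispatch for a non-empty list of already-parsed nodes.
function processList(items) {
  const head = items[0];
  const args = items.slice(1);

  if (head.type === "Identifier" && head.name === "def") {
    // (def name value) -> var name = value;   (var is an assumption)
    return {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        { type: "VariableDeclarator", id: args[0], init: args[1] === undefined ? null : args[1] },
      ],
    };
  }

  // The fn case would go here; it is omitted from this sketch.

  // Anything else is a call to a built-in or user-defined function.
  return { type: "CallExpression", callee: head, arguments: args };
}

const id = (name) => ({ type: "Identifier", name: name });
const lit = (value) => ({ type: "Literal", value: value });

console.log(processList([id("def"), id("name"), lit("honza")]).type); // VariableDeclaration
console.log(processList([id("greet"), lit("honza")]).type);           // CallExpression
```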
To complete the grammar, we define whitespace:
Whitespace consists of zero or more newline, comma, or space characters.
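The whitespace rule maps directly onto a regular-expression character class. In this sketch, WHITESPACE, skipWhitespace, and vectorItems are hypothetical names. Commas count as whitespace, as they do in Clojure, so [1, 2, 3] and [1 2 3] read the same way. Taken literally, the rule does not include tabs or carriage returns.

```javascript
// The whitespace rule as a regular expression: zero or more newline,
// comma, or space characters (no tabs, no carriage returns).
const WHITESPACE = /^[\n, ]*/;

// Hypothetical helper: position just past any whitespace starting at pos.
function skipWhitespace(input, pos) {
  // The * quantifier means this always matches, possibly the empty string.
  return pos + WHITESPACE.exec(input.slice(pos))[0].length;
}

// Hypothetical helper: split the inside of a vector on the same characters.
function vectorItems(body) {
  return body.split(/[\n, ]+/).filter((item) => item.length > 0);
}

console.log(JSON.stringify(vectorItems("1, 2, 3"))); // ["1","2","3"]
console.log(JSON.stringify(vectorItems("1 2 3")));   // ["1","2","3"]
```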
Overcoming Obstacles
Converting the parsed source code into a Parser API tree ran into several obstacles. Lisp and JavaScript do not map onto each other perfectly, so some constructs need additional post-processing.
Statement vs. Expression
In Lisp, everything is an expression, while JavaScript has both expressions and statements. The tricky case is the function call, which can be either a statement or an expression depending on where it is used. To handle this, the compiler uses a function called processCallExpression.
The processCallExpression function checks whether any of the arguments passed to a call are themselves calls. A call nested inside another call is represented as a plain CallExpression, while a call that stands on its own is wrapped in an ExpressionStatement. Because the PEG parser does not know the context in which a rule matched, this distinction has to be made explicitly.
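The sketch below takes a slightly different route from the processCallExpression described above. Instead of inspecting a call's arguments, it builds every call as a bare CallExpression and wraps only the node that ends up in statement position. Both helpers, callExpression and toStatement, are hypothetical, as is the print function in the example. The result is the same split: nested calls stay expressions, and standalone calls become ExpressionStatements.

```javascript
// Hypothetical helper: every call starts out as a bare CallExpression.
function callExpression(callee, args) {
  return { type: "CallExpression", callee: callee, arguments: args };
}

// Hypothetical helper: a Program body holds statements, so a call in
// statement position is wrapped in an ExpressionStatement. Calls nested
// as arguments are never passed here and stay bare.
function toStatement(node) {
  if (node.type === "CallExpression") {
    return { type: "ExpressionStatement", expression: node };
  }
  return node; // e.g. a VariableDeclaration is already a statement
}

// (print (greet "honza"))  ->  print(greet("honza"));
const inner = callExpression({ type: "Identifier", name: "greet" }, [{ type: "Literal", value: "honza" }]);
const outer = callExpression({ type: "Identifier", name: "print" }, [inner]);
const statement = toStatement(outer);

console.log(statement.type);                         // ExpressionStatement
console.log(statement.expression.arguments[0].type); // CallExpression
```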
Implicit Return
In Lisp, the last s-expression in a function’s body is implicitly returned. There’s no need to indicate this with a return statement; it’s built into the language. For function declarations, additional processing is required to identify the last expression and wrap it in a ReturnStatement.
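Here is a sketch of the implicit-return step. It assumes every body form is already an expression node. The helpers makeFunctionBody and makeFunction are hypothetical. Every form except the last becomes an ExpressionStatement, and the last one is wrapped in a ReturnStatement.

```javascript
// Hypothetical helper: compile a function's body forms, assuming each form
// is an expression node. The last form is returned (implicit return).
function makeFunctionBody(forms) {
  const body = forms.map((form, i) =>
    i === forms.length - 1
      ? { type: "ReturnStatement", argument: form }
      : { type: "ExpressionStatement", expression: form }
  );
  return { type: "BlockStatement", body: body };
}

// Hypothetical helper: wrap the compiled body in an anonymous function.
function makeFunction(params, forms) {
  return { type: "FunctionExpression", id: null, params: params, body: makeFunctionBody(forms) };
}

// Roughly (fn [name] (log name) (greet name)), with hypothetical log and greet:
//   function (name) { log(name); return greet(name); }
const nameId = { type: "Identifier", name: "name" };
const call = (fnName, arg) => ({
  type: "CallExpression",
  callee: { type: "Identifier", name: fnName },
  arguments: [arg],
});
const fn = makeFunction([nameId], [call("log", nameId), call("greet", nameId)]);

console.log(JSON.stringify(fn.body.body.map((s) => s.type)));
// ["ExpressionStatement","ReturnStatement"]
```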
If Statement as an Expression
In Lisp, “if” is an expression, just like a function call: the value of whichever branch runs is returned to the caller. In JavaScript, if is a statement, so the compiler adds extra wrapping around the statement and encloses each branch expression in a return statement.
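One common way to do that wrapping is an immediately invoked function expression. This sketch assumes that approach; it is not taken from the compiler's source. The IfStatement goes inside a function body so that the return statements are legal, and the function is called on the spot. The names makeIfExpression and ok are hypothetical.

```javascript
// Hypothetical helper, assuming an immediately invoked function wrapper:
//   (if test a b)  ->  (function () { if (test) { return a; } else { return b; } })()
// `return` is only legal inside a function, hence the wrapper function.
function makeIfExpression(test, consequent, alternate) {
  const returning = (expr) => ({
    type: "BlockStatement",
    body: [{ type: "ReturnStatement", argument: expr }],
  });
  const ifStatement = {
    type: "IfStatement",
    test: test,
    consequent: returning(consequent),
    // With no else branch, the wrapper returns undefined when test is falsy.
    alternate: alternate === undefined ? null : returning(alternate),
  };
  const wrapper = {
    type: "FunctionExpression",
    id: null,
    params: [],
    body: { type: "BlockStatement", body: [ifStatement] },
  };
  // Calling the function immediately turns the statement back into an expression.
  return { type: "CallExpression", callee: wrapper, arguments: [] };
}

const node = makeIfExpression(
  { type: "Identifier", name: "ok" },
  { type: "Literal", value: "hey honza" },
  { type: "Literal", value: "hey stranger" }
);
console.log(node.type);                     // CallExpression
console.log(node.callee.body.body[0].type); // IfStatement
```

When both branches are single expressions, JavaScript's conditional operator (test ? a : b) would avoid the extra function call. The function wrapper leaves room for branches that need more than one statement.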
Standard Library
To make the Lisp experience more pleasant, the compiler comes with a standard library. It is stored in a file called lib.js, and its functions can be called from any Lisp program you write, so it is a natural home for functional programming helpers that extend what the language can do.
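The post does not list the contents of lib.js, so the helpers below are hypothetical, Clojure-style examples of what such a library might contain. The compiler prepends lib.js to every compiled program, so functions like these become ordinary globals that compiled code can call.

```javascript
// Hypothetical, Clojure-style helpers of the kind a lib.js might define.
// None of these names come from the actual lib.js.
function first(coll) {
  return coll[0];
}

function rest(coll) {
  return coll.slice(1);
}

function inc(n) {
  return n + 1;
}

// Concatenate all arguments into one string, like Clojure's str.
function str(...parts) {
  return parts.join("");
}

// Once lib.js is prepended, compiled code can call these directly:
console.log(str("hey ", first(["honza", "stranger"]))); // hey honza
console.log(JSON.stringify(rest([1, 2, 3]).map(inc)));  // [3,4]
```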
Putting It All Together
Here’s an overview of the process:
- Use PEG.js to compile the grammar into a parser.
- Combine the parser with the compiler program.
- The compiler program is a command-line utility that handles compilation, CLI flags, and so on. It can output the AST instead of JavaScript, or perform additional tasks such as code minification.
- To use the compiler, run it from the command line.
This prints the resulting JavaScript code to stdout. The compiler program also prepends the standard library code from lib.js to the compiled JavaScript, so the library’s functions are available to every program.
Feel free to reach out via email, or better yet, write a response article of your own. Let’s keep blogging and keep advancing our programming knowledge together.