This is a series of tutorials on how to implement a programming language. If you’ve ever written an interpreter or compiler, there’s probably nothing new here. However, if you use regular expressions to “parse” anything that looks like a programming language, read at least the parsing section. Let’s write less error code!

The target audience is the average JavaScript/NodeJS programmer.

What are we going to learn?

  • What a parser is, and how to write one.
  • How to write an interpreter.
  • Why they matter.
  • Write a compiler.
  • How to convert code to continuation style.
  • Some basic optimization techniques.

In between, I’ll argue about why Lisp is a good programming language. However, the language we are going to use is not Lisp. It has a richer syntax (the classic infix notation that everyone knows about) and, except for macros, is comparable to Scheme. Sadly, macros are the ultimate bastion of Lisp that no other language can overcome (unless they are called Lisp dialects).

First, let’s think about what the programming language we want to implement should look like.

We should think clearly about what we want to achieve. It’s a good idea to put together a strict description of the syntax, but I’ll make it simpler in this tutorial. Here’s an example of how we’ll implement the “λ” language:

# this is a comment println("Hello World!" ); println(2 + 3 * 4); # functions are introduced with 'lambda' or 'λ' fib = lambda (n) if n < 2 then N else FIB (n-1) + FIB (n-2); println(fib(15)); Print - range = lambda (a, B) # 'λ' is synonym to 'lambda' if a <= b then {# 'then' here is optional as you can see below print(a); if a + 1 <= b { print(", "); print-range(a + 1, b); } else println(""); # newline }; print-range(1, 5);Copy the code

Note that the identifier name can contain a negative character (print-range). It’s a matter of personal taste: I always put Spaces next to operators, I don’t like too many camelCaseNames, and I think the dash is better than the underline. The great thing about writing your own language is that you can do it as much as you want. 🙂

The output is:

Hello World!
14
610
1, 2, 3, 4, 5
Copy the code

The language looks a bit like JavaScript, but it’s different. First of all, there are no declarations, just expressions. An expression returns a value that can be used in place of any other expression. A semicolon is needed to separate expressions in a sequence. The curly braces {and} create such a sequence, which is itself an expression. Its value is the value returned by the last expression. Here’s a program that works:

a = {
  fib(10);  # has no side-effects, but it's computed anyway
  fib(15)   # the last semicolon can be missing
};
print(a); # prints 610
Copy the code

The function is introduced with one of the keywords lambda or λ (which are synonyms). After the keyword, there must be a (possibly empty) comma-separated list of variable names (possibly empty), just as in JavaScript — these are parameter names. The function body is a single expression, but it can be an expression wrapped around {… }. No return statement – the function returns the value given by the last expression.

There is no var. To introduce new variables, you can use what JSer calls a “function now” approach. Just like in JavaScript, use a lambda, declare variables as arguments. Variables have function scope, and functions are closures.

Even if itself is an expression. In JavaScript, you can use the ternary operator to get this effect:

a = foo() ? bar() : baz(); // JavaScript a = if foo() then bar() else baz(); # lambda anguageCopy the code

The then keyword is optional when the branch starts with a curly brace, as you can see above print-range, otherwise it is required. The else keyword is required when alternative branches exist. Again, the subject of the then else branch is a single expression, but the curly braces “{}” can also contain multiple branches through “;” Delimited expression. The result of the if expression is false when the else branch does not exist and the result of the if condition is false. Thus, false is a keyword that represents the unique Falsy value in our language:

if foo() then print("OK");
Copy the code

When foo() is not false, the program will print “OK”. There is also the keyword true for “true”, and all values that are not false (when using the JavaScript === operator) are interpreted as true (including the number 0 and the empty string “”).

Also note that wrapping an if condition in parentheses makes no sense. But while they are redundant, you can add them without error.

The entire program is parsed as if it were embedded in braces, so you need to place a semicolon after each expression except the last one.

Well, this is our little λ language. It doesn’t have to be perfect. The syntax looks cute, but there are pitfalls. It has a lot of missing functionality, like objects or arrays; We don’t focus on them because they’re not important to our journey. Once you’ve mastered everything in this tutorial, you can easily implement these missing features.

In the next section, we’ll write a parser for this language.

Related articles

  • How to Implement a Programming language using JavaScript (1) — Introduction
  • How to implement a programming language using JavaScript (2) — write a parser
  • How to implement a programming language using JavaScript (3) — input-stream
  • How to implement a programming language using JavaScript (4) — token-Stream
  • How to Implement a programming language using JavaScript (5) — AST
  • How to Implement a programming language using JavaScript (6) — Interpreter

Original link: lisperator.net/pltut/