LISP is simple and difficult, elegant and ad hoc; it is a beautiful blend of foresight and fortuity.¹

Whether you agree with this statement or not, Lisp’s historical importance is indisputable. Even though a beautiful language in its own right, Lisp turned to be one of the most influential modelsof computer programming. It also gave birth to many widespread concepts in the industry today and, as mainstream is steadily shifting from von Neumann style to λ-calculus, we can suspect many to be adopted in the future. In effect, Lisp is of interest to both a historian and a futurist. Paul Graham suggested that what John McCarthy did for programming was something like what Euclid did for geometry². What is so important that McCarthy discovered? To answer this question we have to put aside decades’ worth of Internet mythos and take a look at Lisp’s humble beginnings — Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I³ by John McCarthy. In this article, I will follow his steps in order to explain how Lisp was constructed.

I’m writing this article to practice my understanding and appreciation of McCarthy’s work. The goal of this article is to explain how Lisp came to be and discuss properties that make it special. Hopefully, someone will find it useful. I highly recommend reading the original paper, and Paul Graham’s take on it, after this short introduction (see References).

1. Preliminaries

Lisp is constructed from the bottom. It is only appropriate that we start by laying out basic ideas and notations concerning functions which we will use later on. Most of the concepts presented here are well-known, so feel free to skip this section.

1.1. Partial functions

Partial functions (not to be confused with the partial application!) are functions that are defined only for parts of their domains. The domain of a functionis a collection of all possible arguments to a function. Most of the functions in computer programs are partial (at least in imperative languages). Consider the following example (in Python):

def one_nth(n):
return 1 / n

or, a bit more involved:

def query_purchases(client: BillingClient) -> List[Purchase]:
if client.is_initialized:
return client.query_purchases()
else:
raise Exception('Billing client is not set up')

The behavior of these functions (in a mathematical sense) is undefined for certain inputs, as passing zero to one_nth or uninitialized client to query_purchases will result in an exception. Another possibility is that functions may not terminate, e.g. when caught in infinite recursion. It derives from the fact that the type systems are often too weak to constrain the type as much as the programmer would like or the penalty of assigning more precise type is too high. In mathematics, we don’t have this problem. In case of one_nth, we could simply say that the domain of the function is ℝ \ {0}.

One example of a more “expressive type” system is Idris’, which supports dependent types. Although dependent types are a story for another time, in short, it simply means that types themselves are dynamic and first-class, in the sense that they possess properties related to values they take. For example, a function taking argument N may return the type of list which is no longer than N.


In general, any function that can result in an error, or does not terminate for some of its inputs, is considered partial.

In some languages, such as Haskell, a computation which never completes successfully is referred to as “bottom” and denoted with “⊥”. Therefore, when the evaluation of a partial function results in a partial behavior (such as a crash), its value is ⊥.

In Racket, Lisp dialect, the programmer can explicitly mark a function as partial.

Sometimes, when it is desirable to suppress errors and exceptions, a special value might be returned acting as a bottom, such as IEEE floating point’s not-a-number (NaN).


Analogically, when a function is total, we can say that it produces a valid result for all possible inputs. For example, the function not is obviously defined for both True and False, which are the only possible values of the bool type. Total functions allow a programmer to reason about programs using the types alone; a function with the type A B implies that it is always possible to get a B when you have an A.

1.2. Predicates

A predicate is a function whose codomain (set of all possible outputs) consists only of truth values (i.e., truth or falsehood). A common convention is to denote truth with t and falsehood with f (in the original paper McCarthy used uppercase letters). Simply put, a predicate must return a Boolean.

1.3. Pure functions

One of the most important advantages of purely functional approach over imperative one (and by extension OOP) is referential transparency, which guarantees that for a specific set of arguments a function (or generally, an expression) will always return the same result (it is entirely predictable). In other words, an evaluation of an expression can have no effect other than to compute its value. As a result, one can freely replace expressions by their values. Functions with this property are called pure. Almost all functions in imperative environments are impure.

1.4. Conditional Expressions

Image for post

The function above is a well-known absolute value. Because its value depends on the condition (whether x is greater-or-equal to zero), we can say that it is conditional in its behavior. However, this notation is not suited for expressing symbolically the dependence of quantities on truth values — it has to rely on English words.

A better notation for representing conditional expressions was proposed by McCarthy (now referred to as McCarthy formalism):

(p₁ → e₁, … , pₙ → eₙ)

Here, the p’s are propositional expressions and the e’s are expressions of any kind. It may be read as:

If p₁ then e₁ otherwise if p₂ then e₂, … , otherwise, if pₙ then eₙ.

Notice how we are assuming here the evaluation order — we consider pᵢ from left to right. With this notation under our belt, we can express the absolute value function simply as:

(x ≥ 0 → x, x < 0 → -x)

More importantly, however, this notation is sufficient to express any if-then statement. For example:

if n % 3 == 0 and n % 5 == 0:
return 'FizzBuzz'
elif n % 3 == 0:
return 'Fizz'
elif n % 5 == 0:
return 'Buzz'
else:
return n

…can be expressed as:

(3 ∣ n ∧ 5 ∣ n → 'Fizzbuzz',
3 ∣ n → 'Fizz',
5 ∣ n → 'Buzz',
t → n)

m | n means that n is divisible by m, or m divides n. is standard logical conjunction, or AND in Boolean algebra.

Notice how by exploiting the left-to-right evaluation order we can encode the else clause using the t value.

Conditionals were introduced by Lisp. In effect, if statements, as we know them today, are thanks to McCarthy. As of 1958, Fortran only had a conditional goto which was based on the branch instruction in the underlying hardware. McCarthy also got conditionals into Algol 60, whence they spread to most other languages⁸.

1.5. Recursive functions and label

Thanks to conditional expressions given above, we can now define functions recursively. For example, we can define factorial as:

n! = (n = 0 → 1, t → n · (n − 1)!)

Note, however, that we are making here one very important assumption — the body of the function can reference the function itself by its name (or symbol — ! in this case).

In most actual languages this is not a problem, but remember — we do not have any language yet!

In general, we can solve this problem with something called Y combinator, but that’s a topic for another time. As McCarthy most likely was not aware of Y combinator at the time of writing his paper, he introduced his notation instead.

Image for post
Fig 1. McCarthy in his AI laboratory at Stanford

label(a, e) denotes the expression e, with the special property that all occurrences of awithin e are to be interpreted as referring to the expression e itself. Therefore, by combining label with conditional expressions, we can define Euclid’s algorithm for calculating the greatest common divisor as follows:

label(gcd, (m > n → gcd(n, m),
m | n → m,
t → gcd(n % m, m)))

It is important to note that although recursion existed as a mathematical concept before Lisp, Lisp was the first programming language to support it.

2. S-expressions

First, we define a symbolic expression (s-expression, or just expression forshort). An s-expression is either:

  • an atom, which is a sequence of characters, e.g. spam, ham, eggs or f*&o$o. An atom cannot start with (, which is reserved for lists.
  • a list of zero or more s-expressions which are separated by whitespace and enclosed by parentheses. For example: (abc) , (), (a (bc)).

In simplified extended Backus-Naur form (commonly used for describing the syntax of a language), s-expressions can be represented as:

Image for post
Fig. 2 Simplified EBNF for s-expressions

— which is the exact restatement of the definition given above (except that atoms are assumed to be alphanumeric).

The above definition is recursive, as we are referring to s-expressions while defining lists of s-expressions. Some examples of valid s-expressions include:

()
foo
(bar)
(foo bar)
((spam) (ham ()) eggs ())

— but also:

(+ 1 2 3 5 8 13)
(> 24 512 12)
(and true false true)

3. Abstract machine

“The most practical way to think about the meaning of a program is what it does — when we run the program, what do we expect to happen? How do different constructs in the programming language behave at run time, and what effect do they have when they’re plugged together to make larger programs? This is the basis of operational semantics, a way of capturing the meaning of a programming language by defining rules for how its programs execute on some kind of device. This device is often an abstract machine: an imaginary, idealized computer that is designed for the specific purpose of explaining how the language’s programs will execute. Different kinds of a programming language will usually require different designs of an abstract machine in order to neatly capture their runtime behavior.”⁷ In order to construct Lisp, we first construct an imaginary machine on which it will operate.

Image for post
Fig. 3 ST µA741 op-amp, or: imaginary Lisp machine

3.1. Elementary functions

We start by defining seven (The Magnificent Seven, in fact) elementary functions on s-expressions. We call them elementary because we assume they exist on the abstract machine (just like we assume axioms to be true in some mathematical system). A function on s-expression receives an s-expression and returns a new one. By convention, we will refer to such functions as operators.

To keep our notation simple, we represent operators using atoms and lists — if an s-expression is a list, the first element is an operator, and remaining ones are arguments. This means that functions on s-expressions are encoded using s-expressions themselves. For example, the car operator expects one argument, and so its invocation will take the form:

(car x)

Atom “car” on its own refers to the function itself. If car expected an argument which is a function (it doesn’t), we could write (car car).

atom

atom is a very simple operator; it just returns t if the argument passed is an atom, and otherwise.

> (atom f)
t
> (atom foo)
t
> (atom atom)
t

quote

quote is even simpler, it returns whatever was passed into it.

> (quote x)
x> (quote (atom x))
(atom x)> (quote (1 2 3 4 5))
(1 2 3 4 5)

It is crucial to understand how quote behaves and why is useful. The second example — (quote (atom x)) —will reduce to the list containing atom and x. (atom x) is not “evaluated” further (at this point we do not know what evaluation even is, but let us ignore that for a second). In essence, quote is used to stop the evaluation — normally in(1 2 3), 1 would be assumed to be an operator but we want to treat the list as a regular list. This is why we would use quote. It is usually abbreviated with ', which gives us:

> '(1 2 3)
(1 2 3)

Now that we have quote , we can go back to atom for one more example:

> (atom '(foo bar baz))
f

If the code would eventually be evaluated by some kind of program capable of doing that, quote would prevent evaluation of any s-expression given as its argument, which is very important when passing data around.

eq

eq returns t if the values of its arguments are the same atom and f otherwise.

Some languages, such as Scheme, use eq?. Clojure uses =.

car

car expects a list as an argument and returns its first element. It’s name, similar to cdr, is a historical legacy.

> (car '(1 2 3 4))
1

Certain languages, such as Clojure, rename car with something more intuitive, e.g. first. The “car” and “cdr” names are derived from the machine architecture that Lisp was designed for and nowadays languages use more familiar names.

cdr

cdr expects a list as an argument and returns its back without the first element. For example:

> (cdr '(1 2 3 4))
(2 3 4)

In Clojure, cdr behavior is provided viarest.

cons

cons (construct) takes two arguments, second of which must be a list, and prepends the list with the first argument:

> (cons 1 '(2 3))
(1 2 3)

Why would we prepend to lists instead of appending them is a topic on its own. In short, lists constructed as the head and rest are simpler to manipulate recursively, which is a huge benefit for a language such as Lisp.

cond

We will now translate conditional expressions into s-expressions using cond. cond has the following form:

(cond (p₁ e₁) ... (pₙ eₙ))

The value is evaluated as follows. We evaluate p expressions from left to right. If any predicate evaluates to t, the value of the cond expression is taken to be the value of the corresponding expression e. For example:

> (cond (t 1) (f 2))
1

Moreover, because we evaluate from left to right:

> (cond (t 1) (t 2))
1

3.2. Denoting non-elementary functions

With basic functionality in place, we define a general scheme for defining new functions on the abstract machine. A function is expressed as

(lambda (p₁ … pₙ) e)

where p₁ -pₙ are called parameters and e is an expression. If you ever tinkered with lambda calculus (or with lambdas/anonymous functions in pretty much any modern programming language), you will recognize this notation.

An expression whose first element is a lambda, i.e., is of the form:

Image for post

— is called a function call and its value is computed as follows. Each expression aᵢ is evaluated. Then e is evaluated. During the evaluation of e, the value of any occurrence of one of the pᵢ is the value of the corresponding aᵢ in the most recent function call.

Do you remember label notation? We can now translate it to s-expression form and use it on our abstract machine to obtain a general method for defining functions. The label s-expression will have the following form:

(label name lambda)

— where nameis an atom representing the name of the function and lambda is the function body, also called to form.

Using label and the Magnificent Seven our expressive power is limitless.

For example, we can define the Boolean operator NOT:

(label not (lambda (x)
(cond (x f) (t t))))> (not t)
f> (not f)
t

As you can see, NOT expects one argument — x. In the body of the function, we perform a conditional check of x. If the value is equal to t, we return f and otherwise t (t in the last positional of the cond acts as the else clause in modern languages, since the condition is always true).

Next, we can define a bit more involved Boolean AND:

(label and (lambda (x y)
(cond (x (cond (y t) (t f)))
(t f))))> (and t f)
f> (and t t)
t

This body can be summarized as: If x is true, check if y is true and if so, return t. If y is not true, or x is not true, return f (we have two else clauses).

We can write functions to manipulate lists, Boolean expressions, pair elements, rewrite expressions, and so on. For example:

  • (not x) returns t if its argument returns f and t otherwise.
  • (pair x y) takes two lists of the same length and returns a third one containing two-element lists of corresponding elements in argument lits. Nowadays we would more likely call this function zip.
  • (append x y) takes two lists and concatenates them.
  • (assoc x y) takes an atom x and a y of the form return by pair and returns a second-element from a pair in y whose first element is x. This function emulates a lookup into a dictionary (alternatively called a hash table or associative array).

It is important to note that in Lisp, functions are first class objects — they’re data, just like numbers and strings. Therefore, functions can be stored in variables, passed around as arguments, and so on. The idea of a first-class function was pioneered by Lisp.

3.3. Evaluation

More importantly, however, we can write a function that takes arbitrary Lisp expression and returns its value:https://medium.com/media/e32cbaa4fa84da8db305b5db750c85e1

That is a bit of mouthful. If you want a step by step description, which is beyond the scope of this article, I recommend reading Recursive Functions… by McCarthy³ and Paul Graham’s The Roots of Lisp², which take a thorough look at eval‘s inner-workings.

The important thing is this: eval is a Lisp interpreter, and thus, implements the language. Any valid Lisp code can be passed to it for evaluation and we don’t need any other infrastructure. With only seven functions and a bit of notation, we were able to implement the whole language and allow it to define new functions. That’s neat and very elegant, but so what? Now that we understand how Lisp operates, let’s step back and consider what it means.

If you’re curious how Lisp can be implemented in a few hundred lines of code, check out my toy Lisp implementation: Kite.

4. Chemical X: Homoiconicity

In Lisp, as we saw, both code and data are represented in the same form: as s-expressions (forming trees) which are available for unrestricted mutation and execution. This means that the Lisp code is first-class. It can be explicitly constructed from scratch, passed around, modified and executed. In other words, in Lisp everything is data. A language in which the internal «representation matches the external representation is called homoiconic (from the Greek words homo meaning “the same” and icon meaning “representation”). Other examples of a language with this property include Prolog (logic programming) and Julia (numerical analysis and computational science). One of the immediate consequences of homoiconicity is that by using just a few functions: quote, atom, eq, car, cdr, cons, and cond, we can define eval, that fully implements the language, and with label and eval we can define any additional function we need. In fact, it is a famous fact that the core of the Lisp can be written in about 20–30 lines of Lisp. In (an unfair) comparison, the average person who writes a C compiler or interpreter requires about 20,000 lines of C to do so and must be moderately expert about compilers or interpreters (of course, we rely on meta-circularity in doing so, but that’s a topic for another post). At this point, you may be asking yourself — what’s the big deal?

The big deal, as it turns out, that when the representation of the program (code) is the same as the representation for that data, you can manipulate code like you would manipulate data. This allows, for example, for syntactic extensibility, which is the ability to extend (or completely) change the syntax of the language. For example, in a normal language, if you wished that you had the for-each construct, you would have to wait until creators of the language add it in the next version. In Lisp, you can write a macro in a matter of minutes. In many Lisp languages, program transformation is an explicit part of the evaluation strategy of the language.

Moreover, in Lisp, programs composed of expressions. Lisp programs are trees of expressions, each of which returns a value. At the time, this was a revolutionary idea, in contrast to languages such as Fortran which distinguish between expressions and statements. This was largely dictated by the limitations of the hardware available in the late 1950s.

Finally, what McCarthy discovered is not just a programming language but a very elegant model of computation, just a notch above lambda calculus. We can say that Lisp was discovered, and not invented, because McCarthy built it from axioms so fundamental that it would be very hard to make an argument for the otherwise.

Core Lisp corresponds to an untyped, call-by-value lambda calculus extended with constants. As it turned out, lambda calculus* corresponds to natural deduction in logic. Specifically, by Curry-Howard isomorphism, types in a program are propositions, programs are proofs and evaluation of programs is the simplification of proofs. This discovery is phenomenal. Among others, it means that we can reason about programs using logic and that we get theorems in logic for free in programming (and vice versa).

The big difference between Lips and lambda calculus is that lambda calculus doesn’t contain any non-functional data. In lambda calculus, everything is a function, even a number (if you’re curious how this is achieved see Church encodings or Mogensen-Scott encodings). Lisp, on the other hand, operates on atoms and numbers (hence, constants). This is most likely what makes Lisp readable to mortal humans — a property that lambda calculus lacks.

The adherence to mathematically sound principles of computation allows Lisp to reap the benefits in the form of simple semantics, clarity, consistency, and other nice things such as referential transparency (original Lisp formulation is completely value-based). In contrast, almost all imperative languages are “just” tools (sometimes very good tools nonetheless) with undefined formal semantics, which makes reasoning about them rigorously very hard or almost impossible.


Thank you for reading. Please check out the references and maybe my other stuff. If you have any questions, please feel free to send me an e-mail at hi@iwoherka.eu or leave a comment.

*Specifically, typed lambda calculus. For more details, I highly recommend great talk by Philip Wadler, co-creator of Haskell — Propositions as Types.


References

  1. BYTE Magazine, August 1979
  2. The Roots of Lisp
  3. Recursive functions of symbolic expressions and their computation by machine, Part I
  4. How Lisp Became God’s Own Programming Language
  5. Homoiconic Languages
  6. On Lisp
  7. Understanding Computation
  8. What Made Lisp Different

Thanks to Mateusz Papiernik, Michał Moroz, and Marcin Struś. 

If you have any questions or an idea we could help you with…

Let’s talk!

As the Principal Software Engineer at Makimo, Iwo Herka thrives in the rich terrain of Elixir and functional programming. He employs a scientific approach to demystify its complexities, chronicling his findings in in-depth articles, insightful talks, and informative videos. When he steps away from the pulsating world of code, Iwo can be found trekking nature's trails or engrossed in a good book. For Iwo, programming isn't merely a job — it's a daily adventure he passionately embarks on.