Freitag, Februar 23, 2007

Uniform Syntax

The syntax of a programming language defines (a) valid notational elements for use and (b) rules of their arrangement. By the way, this definition of syntax resembles very much the notion of a protocol. A protocol defines (a) messages and their constituting elements and (b) rules of valid message sequences. In an interactive programming environment (like if you program in Python interactively via the console), you communicate with the programming system according to a protocol; and the protocol is given by the syntax. There is an interesting duality between the notion of a protocol and the notion of syntax.

If you program in Scheme or Lisp, the syntax has a very interesting form: everything you type in in Scheme has to follow the syntactical form of lists. In Scheme and Lisp, lists are notated in the following way:

(first_element second_element third_element ...)

Lists can be nested: any element in a list can be a list. This is almost everything you need to know, in order to write down syntactically correct List/Scheme programms. All you have to take care about is that the number of left parentheses match the number of right parentheses.

The uniform syntax of Scheme and Lisp has very interesting consequences. First, program transformation is very easy. You just have to transform lists into lists. The way you do program transformation in Scheme and Lisp is via macros. Macros are a powerful tool in Scheme and Lisp. That is, why generative programming is something, Lisp and Scheme programmers are used to since decades. Second, since the uniform syntax of lists does not provide any kind of hint, what is actually meant by a list, what it is used for, we can't rely on the syntax alone. The syntax is, so to speak, too primitive. In order to understand what a list means in terms of programming instructions, we have to introduce a sort of second layer of "syntax". Certain conventions in the use of lists have to be introduced. For example, if a "if" symbol is the very first element in a list, we have an if-expression at hand, which expects at least two and at most three further list elements: the second is the test, the third the so called consequent, the (optional) forth the alternate.

The point here is that this second layer of syntax does not break the first layer of syntax, the notational form of lists. In other words, the syntax of Lisp and Scheme does not shape and structure the notational form of programs beyond the form of lists. This explains, why Lisp and Scheme programs are not so easy to read. Human beings are very sensitive on visual shape and form; it helps us easily recognize a pattern. In Scheme and Lisp, you need to look "through" all the parentheses and train your brain on the "hidden" layer of syntax; and that takes some time. It's also an ability that gets quite easily lost if not trained permanently.

So, there is a benefit and a drawback with uniform syntax: you get a powerful transformation system, but you loose readability of your programs.

By the way, if you want to, you can benefit from the power of a uniform form (I don't like to say "syntax" in this context) in your favorite OO language as well. Program your conceptions as classes (your second layer) and enable object configurations solely via constructors (your first layer, the uniform "syntax"). (In Python, it's the "__init__"-Method.) Instantiation would look something like this:


The result is that you get an Abstract Syntax Tree (AST) for free and that you can do program transformations if you like. It's a cool technique for creating a Domain Specific Language (DSL). It puts you somewhat closer to the power of Lisp and Scheme.

I'm -- of course -- not the first, who describes the problems of uniform syntax. You can, for example, read about it in the book from Peter van Roy and Seif Haridi, "Concepts, Techniques, and Models of Computer Programming", MIT Press, 2004, p. 544f.


Mark hat gesagt…

Yes, I think you're post has captured the essence of why it is so easy to write DSLs in Lisp and indeed it is a very common technique among Lisp programmers.

It's perhaps worth mentioning that these DSLs can be used (a) to generate other Lisp expressions and (b) expressions in non-Lisp languages. The latter is something I've used to greate effect to generate SQL, VB, C++, XML, COM Type libraries, documentation etc.

A couple of minor points:

- Most Lisps can also be used interactively via the console - and one of the best times to use this capability is in the middle of a program break where you can write Lisp to do interesting things when you continue the program.

- Taking care that parentheses match isn't a problem - many IDEs will help you with this, and the (pretty-printed) layout of a Lisp program on the page also helps make this clear.

- I certainly don't agree that Lisp programs are hard to read. But - I have been programming in Lisp a while :-).

In fact, like most Lisp programmers I find programs in C-like languages difficult to read - compared to the simple beauty of a Lisp program a C program looks to me like a spider has walked across the page.

I'd be interested to hear your views on the difficulty that more complex syntaxes present to the (in-language) DSL developer. I've heard it said that in languages with C-like syntaxes it is very hard to write DSLs.

Mark Dalgarno
Editor, Code Generation Network

Writing (mainly) on software at The Variation Point

dh hat gesagt…

Hi Mark,

thank you very much for your comments.

I especially like your comment about the readability of C-like languages. The use of a non-uniform syntax is not a guarantee for improved readability. It leads to a very interesting question: What makes a language readable?

Regarding more complex syntaxes. In my post I roughly sketched how you can design your own DSL in Python by using the "uniform syntax" of constructors. I think, you can transfer this technique to many other languages. It should work for C as well. This is at least a very simple way to prototype DSLs within your favorite programming language. Writing DSL-code via class constructors isn't very comfortable (for prototyping, it's ok, I think), but you can print out your AST in a concrete syntax using (in Python) the __repr__-Method. This is also an easy way to experiment with a concrete syntax for your DSL. After that you can write a parser for your DSL if you like.

-- dh