Monday, July 02, 2007
Editing and Persistence
Most of you know HTML, the HyperText Markup Language, which is at the heart of web technology. For those who don't: the content of a web page is described with HTML. HTML structures the content of a web page hierarchically. Furthermore, the content is annotated with so-called tags. A tag indicates the logical purpose of a piece of content: is it supposed to be a header, a paragraph, a highlighted word, and so on? For further info see e.g. Wikipedia on HTML or any of the hundreds of other resources about HTML.
We call the logical model of a language (like HTML) its form. The form contains the language's concepts and their relations. Together with a syntax definition, that is, a notation, the logical model can be serialized as a stream of characters. This works in both directions: the character stream can be parsed into a form, and the form can be serialized back into a character stream. In the case of HTML, it is the HTML syntax, the textual notation, that is sent in messages via HTTP.
To produce a projection, a visual representation of the form, the rendering engine processes the form together with a style definition, such as CSS. That is how something appears on the screen.
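The round trip between character stream and form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard library's XML parser as a stand-in for a real HTML parser, so it only accepts well-formed markup, and the sample snippet is made up for the example.

```python
import xml.etree.ElementTree as ET

# The character stream: what would travel over the wire.
source = "<p>Hello <em>world</em></p>"

# Parsing turns the character stream into a form (here: an element tree).
form = ET.fromstring(source)

# The form is a hierarchy of tagged content, not a string.
print(form.tag)      # the root element's tag
print(form[0].tag)   # its first child element's tag

# Serializing turns the form back into a character stream.
serialized = ET.tostring(form, encoding="unicode")
print(serialized)
```

For this small input, the serialized text equals the original text, which is what makes the syntax usable as a persistence format for the form.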
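As a toy illustration of "form plus style gives projection", the sketch below renders the same form twice with two different style definitions. The `render` function and the style dictionaries are hypothetical names invented for this example; a real rendering engine and CSS are of course far richer, and the XML parser again stands in for an HTML parser.

```python
import xml.etree.ElementTree as ET

def render(node, style):
    """Project a form node to plain text, decorating it according to the style."""
    # A node's content is its own text, its rendered children, and their tails.
    parts = [node.text or ""]
    for child in node:
        parts.append(render(child, style))
        parts.append(child.tail or "")
    # Look up how this tag should appear; undecorated if the style says nothing.
    decorate = style.get(node.tag, lambda text: text)
    return decorate("".join(parts))

form = ET.fromstring("<p>Hello <em>world</em></p>")

# Two style definitions for the same form (both made up for this sketch).
starred_style = {"em": lambda text: f"*{text}*"}
loud_style = {"em": str.upper}

print(render(form, starred_style))  # Hello *world*
print(render(form, loud_style))     # Hello WORLD
```

The form stays untouched in both cases; only the style definition changes what ends up on the screen.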
The upper part of our figures makes up what can be called editing; the lower part realizes persistence.
Until just a few years ago, the traditional way of writing code rested on a misconception (and it is still widespread): to edit code, one uses a more or less advanced text editor. If you call the structure of a programming language its form and its textual notation its syntax, it is somewhat strange that one would want to work on a serialization format rather than on the form directly, benefiting from advanced projections and a powerful I/O model. That is where modern IDEs like Eclipse are heading. Internally, Eclipse builds up a model of the code that represents its form; technically speaking, it is an Abstract Syntax Tree (AST). This gives you cool features like automated refactoring, word completion, etc. However, the actual syntax of, say, Java or C# is too restricted and too simple to be a suitable snapshot of the code's form. And that's a pity. There are only very few new approaches, like Intentional Programming, that have recognized the limitations of today's code syntaxes.
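To make the idea of working on the form rather than on the text concrete, here is a minimal sketch of a rename refactoring done on an AST. It uses Python's own `ast` module rather than Eclipse's Java model, assumes Python 3.9 or later (for `ast.unparse`), and the `RenameVariable` class and the sample statement are invented for the example.

```python
import ast

class RenameVariable(ast.NodeTransformer):
    """Rename every use of one variable by editing the tree, not the text."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name

    def visit_Name(self, node):
        # Name nodes are variable references; string contents are never touched.
        if node.id == self.old_name:
            node.id = self.new_name
        return node

source = "total = price * quantity"

# Character stream -> form (the AST).
tree = ast.parse(source)

# Edit the form directly.
tree = RenameVariable("price", "unit_price").visit(tree)

# Form -> character stream again.
result = ast.unparse(tree)
print(result)  # total = unit_price * quantity
```

A plain text search-and-replace would also hit the word "price" inside strings or comments; operating on the form avoids that because the tree knows which pieces of text are variable references.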
To summarize what we discussed here:
HTML is just a serialization format for the exchange of content that is otherwise represented as an object model called the DOM. This serialization format is the prerequisite for sending content around in HTTP messages. But the HTML format is not what a browser works with internally; it works with the object model instead.
If we apply the editing/persistence model to programming languages, we have to conclude that programming languages are historically driven by a text-based serialization format, which is no longer appropriate today.