
Complexity, Size and Focus

Why is it so hard to understand software? Or, more to the point: what makes it so hard to write understandable software? This question is the driving theme of this article: the quest to uncover techniques of complexity management that go beyond established principles of software engineering, such as information hiding and modularization, and that are not bound to specific paradigms like object-orientation or functional programming.

For that reason I studied two kinds of software systems in depth: telecommunication systems on the one hand, and modeling and programming languages on the other. I chose them for two reasons. First, telecommunication systems are among the oldest, largest and most successful software systems. They are highly reliable, robust and scalable. What are the key design principles that let them meet such demanding requirements? Second, each modeling and programming language is its creator's attempt to provide a set of language concepts to address and manage complexity. A language, be it a programming or a modeling language, is, so to speak, the distillate of another person's opinion, experience and expert knowledge about how to write "better" programs or how to design "better" software.

My observation and claim is that two factors are most responsible for what I capture under the term "complexity": size and focus. Size refers to the number of lines of code and the number of features; focus refers to intellectual comprehensibility.

The code bases of today's operating systems run to millions of lines of code. Recently, Linux was reported to have reached 10 million lines of code. Windows 7 is estimated to be of the same size. These numbers do not include the application software typically shipped with these operating systems. Applications such as OpenOffice also amount to some ten million lines of code. The latest release of the Eclipse IDE (Integrated Development Environment), version 3.6, comprises 33 million lines of code. Since the number of faults to be expected grows with code size (reported fault densities vary from about 25 down to about 1 fault per 1000 lines of code), software of this size inevitably contains a large number of faults.
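To make the fault estimate concrete, here is a minimal sketch of the arithmetic, assuming that faults scale linearly with code size and using only the figures quoted above. The function name `expected_faults` and the dictionary are illustrative choices, not taken from any tool.

```python
def expected_faults(lines_of_code: int, faults_per_kloc: float) -> float:
    """Estimate faults, assuming a constant fault density per 1000 lines."""
    return lines_of_code / 1000 * faults_per_kloc


# Code sizes as quoted in the text above.
SYSTEMS = {"Linux": 10_000_000, "Eclipse 3.6": 33_000_000}

for name, loc in SYSTEMS.items():
    low = expected_faults(loc, 1)    # optimistic density: 1 fault per KLOC
    high = expected_faults(loc, 25)  # pessimistic density: 25 faults per KLOC
    print(f"{name}: {low:,.0f} to {high:,.0f} expected faults")
```

Even under the optimistic density, a system of 10 million lines would be expected to carry about 10,000 faults.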

Another dimension of size is the number of features: today's software is extremely feature-rich. A whole industry of education and consulting is built around teaching and configuring software that is too feature-rich to use out of the box. Office applications and SAP R/3 come to mind. This observation also applies to the programming languages used to build these systems. The most widespread languages are C, C++, C# and Java. Java, for instance, is so feature-rich that it requires a programmer to learn and understand a language specification of almost 700 pages. It is no exaggeration to say that most Java programmers master only a personal subset of these 700 pages.

The sheer volume of code in today's software is overwhelming: no software developer can understand such a system in its entirety. It is a valid question whether this size complexity is inherent to the problem domain or a symptom of a certain design philosophy that has become mainstream and is manifested in languages like C(++), C# and Java. Alternative approaches suggest the latter. TeX, a typesetting system designed in the 1980s by Donald E. Knuth, is still top-class in its typesetting quality and widely used in academia and among textbook authors; many publishers prefer manuscripts produced in TeX. TeX is based on a language kernel with primitives for typesetting, and it can easily be extended via a powerful macro system. Apart from bug fixes, the TeX kernel has been kept stable for almost 30 years now. Nonetheless, the system has constantly adapted, via its macro system, to growing demands and new technologies. Another example is PostScript.

There are also alternatives to feature-rich languages like C(++), C# and Java. Kernel-based languages like Lisp/Scheme, Prolog, Forth and Smalltalk are easy to understand. Their implementations fit on a few pages of code. Thanks to their extensibility, they easily incorporated new paradigms and trends (e.g. object-orientation and aspect-orientation).
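The idea of a small kernel plus extensibility can be illustrated with a toy stack language. The sketch below is only loosely inspired by Forth and is not an implementation of it: the word set, the `: name ... ;` definition syntax and the names `make_kernel` and `run` are assumptions made for this example. The kernel consists of three primitive words; everything else is defined in the language itself.

```python
from typing import Callable, Dict, List, Optional

Stack = List[int]
Words = Dict[str, Callable[[Stack], object]]


def make_kernel() -> Words:
    """Return the fixed set of primitive words (the 'kernel')."""
    def binary(op: Callable[[int, int], int]) -> Callable[[Stack], None]:
        def word(stack: Stack) -> None:
            b, a = stack.pop(), stack.pop()
            stack.append(op(a, b))
        return word

    def dup(stack: Stack) -> None:
        stack.append(stack[-1])  # duplicate the top of the stack

    return {
        "+": binary(lambda a, b: a + b),
        "*": binary(lambda a, b: a * b),
        "dup": dup,
    }


def run(source: str, words: Words, stack: Optional[Stack] = None) -> Stack:
    """Execute whitespace-separated tokens against a stack."""
    stack = [] if stack is None else stack
    tokens = source.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == ":":
            # ": name body ;" adds a new word built from existing words.
            end = tokens.index(";", i)
            name, body = tokens[i + 1], " ".join(tokens[i + 2:end])
            words[name] = lambda s, body=body: run(body, words, s)
            i = end  # skip past the definition
        elif token in words:
            words[token](stack)
        else:
            stack.append(int(token))  # anything else is an integer literal
        i += 1
    return stack


words = make_kernel()
print(run(": square dup * ; : cube dup square * ; 3 cube", words))  # [27]
```

The kernel never changes when `square` and `cube` are added; the vocabulary grows in user code, which is the kind of extensibility the paragraph above attributes to kernel-based languages.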

Another aspect of complexity management is focus. From a cognitive viewpoint, complexity is a human being's inability to intellectually manage information that is (a) too much and (b) spread in time and space. It is a combination of information overload and a failure to recognize temporal and/or spatial patterns. Two techniques address these issues: one is condensation, the other is localization. Condensation comes in two forms: abstraction building and modeling. Abstraction building can be reversed by refinement without loss of information; modeling condenses at the price of losing information, thereby simplifying things. A simplification introduces faults and errors; an oversimplification introduces more incorrectness than is acceptable. Localization is a technique to bring together (to bring into focus) what was previously spread out and distributed and thus appeared to be unrelated and unconnected. To concentrate on a problem (domain) means to put it in focus, to single out and localize the relevant parts and to highlight their relations, which might be spatial (i.e. structural) and/or temporal (behavioral). The act of localization establishes a new context, a new perspective or point of view, a new universe of discourse, a new domain.
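The difference between the two forms of condensation can be shown with a deliberately small analogy. The data and the function names `refine` and `build_model` are made up for this sketch; it is one possible reading of the distinction, not a formal definition.

```python
from statistics import mean
from typing import List

samples = [2, 4, 6, 8, 10]

# Abstraction building: a compact rule that stands for the samples.
rule = range(2, 11, 2)


def refine(abstraction: range) -> List[int]:
    """Refinement expands the rule back into the full data."""
    return list(abstraction)


def build_model(data: List[int]) -> float:
    """Modeling keeps a single summary value and drops the rest."""
    return mean(data)


# Refinement recovers the original without loss of information.
assert refine(rule) == samples

# The model cannot be reversed: different data yield the same model.
other_samples = [6, 6, 6, 6, 6]
assert build_model(samples) == build_model(other_samples)
```

The rule is shorter than the data it describes and still determines it completely. The mean is shorter as well, but nothing in it says which of the many possible data sets it came from; that loss is the price of the simplification.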

In software engineering, several techniques have been developed for abstraction and localization. Among many other ideas, I would just like to mention abstract data types, object-orientation and meta-object protocols, aspect-orientation, meta-programming and macro systems. All these approaches have one thing in common: they try to rearrange the parts of a software description, they try to bring things into focus, they localize. I call the flexibility of a language to adapt to different localization needs its expressiveness.

Interestingly, the other aspect of condensation, modeling, is rarely used in software engineering in a systematic manner, that is, with a clear understanding of the degree of incorrectness and imprecision a model introduces. This understanding of modeling differs significantly from the common interpretation of the term. Typically, modeling is understood rather as a form of visual programming or as a means of visually creating code templates.

My assumption is that small systems are a natural consequence of designing with extremely expressive languages. Empirical data point in this direction: systems developed in expressive languages like Lisp/Scheme, Prolog, Python or Ruby are reported to need less code than comparable systems written in languages like C, C++, C# and Java. The former languages (Lisp etc.) are quite expressive, whereas C and the others strictly separate the language from the problem domain. If a certain localization need is not covered by language features, frameworks have to be designed and implemented to simulate expressiveness.

I think that software engineering has so far underestimated the use and the value of highly expressive languages and highly extensible kernel-based systems.
