Donnerstag, Dezember 07, 2006

Inversion of Control

Did you every try to implement an interpreter for the Scheme programming language yourself? I enjoyed it a lot -- and still enjoy it, since I haven't finished my little Scheme project yet. It is a lot of fun to implement Scheme in Scheme, but I use Python in order to make sure I understand each and every bit of the Scheme language. While a basic Scheme implementation is easy to realize, a more advanced implementation supporting so-called continuations is a nice challenge. First you have to understand the concept of continuations, which is quite a challenge in its own. Second, you need to have an idea how to make them happen in your host language.

Just to give you a rough idea: In Scheme, you get a sort of read and restore access to the current state of the computational process. There is a special procedure in Scheme, which you can use anywhere in a Scheme program; the procedure is called "call with current continuation" ("call/cc" for short) -- but that's not so important here. When executed, this special procedure saves the current state of the computational process and packages this information within something called a continuation. A continuation is a first-class entity in Scheme; you can pass it around and store. Once a continuation gets called, the state of the computational process stored within the continuation is restored. Simply speaking, program execution continues at exactly that point, where the continuation was created. To give you an analogy, think of exceptions as you know them from other programming languages. When you "try" something, the "except" clause restores the computational state of affairs in case an exception is thrown, no matter when and where deeper down in the call chain the exception is raised. Unfortunately, exceptions are functionally weaker than continuations are. You can build an exception system with continuations, but not the other way around. You cannot store an exception, raise it outside the calling context of a "try" and throw it again later on. That's the kind of thing you can do with continuations. Exceptions are a poor man's continuations.

Besides Scheme, there are only some few languages, which natively support continuations. Among those is Ruby. In most cases however, you are left without any support for continuations -- and that's not too bad. It forces you to really understand what continuations are and how to implement them. That's the situation I was faced with.

If you do not have full control over the computational process, as is needed for an implementation of continuations, you have to think of a way to regain execution control. What you end up with is either writing a compiler, which transforms Scheme programs into a form called Continuation Passing Style (CPS), or you write an interpreter maintaining its own call stack. The call stack represents the state of the computational process. Having access to the call stack is key for an implementation of continuations. The current continuation is the current situation of the call stack.

I opted for writing an interpreter. In the following, I will show you the way how I did this with Python. But before you hesitate to read on: You do not need to know or understand Scheme. What I'm after in this post is not Scheme; it's the way how to regain execution control. Scheme and the implementation of continuations is just one motivating example. I'll give you more examples later on; examples, which are relevant for anybody with an interest in software design.

For setting the scene, I need to tell you something about a cool feature in Python, namely generators. A generator looks like a regular function definition. As you know from other languages, a "return" statement within a function terminates execution of the function and possibly returns some data. Within a generator definition you do not use "return" statements. Instead, you use "yield" statements much like return statements, but a "yield" works differently. A "yield" returns data but it does not terminate the generator. The execution state of the generator is preserved and execution can be resumed later on. (Sounds like another poor man's continuations, eh?!) To give you an example, here is an extract of an interactive Python session:


>>> def countdown():
... yield 2
... yield 1
... yield 0
...


If try to call this function, it won't work as you might expect. It doesn't return a 2 or something like that. A generator object is returned instead.


>>> countdown()



With the generator object you can iteratively step from one "yield" statement to the next via the generator's "next()"-method, get some value back from the "yield" statement and, later on, resume execution where it left-off.


>>> cd = countdown()
>>> cd.next()
2
>>> cd.next()
1
>>> cd.next()
0
>>> cd.next()
Traceback (most recent call last):
File "", line 1, in
StopIteration
>>>


The exception "StopIteration" indicates that the generator is done. Generators support the same protocol as Iterators do. So, a generator can be used within a "for" statement. Creating the generator object, calling "next()" and exiting when StopIteration is raised, all this is done automagically for you in the background.


>>> for i in countdown(): print i
...
2
1
0


A more flexible definition of countdown could look like this:


>>> def countdown(n):
... assert n >= 0
... while n >= 0:
... yield n
... n = n - 1
...


I think that you get the basic idea. With "yield", a "function" (actually a generator) can return control to the caller of the "function" at any time on appearance of a "yield" statement. A "function" does not need to finish its computation before it can return something.

In my Scheme interpreter I use generators to return control, whenever the evaluator of a basic Scheme expression needs to call the evaluator of another Scheme expression. Instead of doing something like this (what you would conventionally do) ...


def do_something(n):
result = do_something_else(n)
return result


... I do that, a slight modification to the code in order to return control, whenever I call another function.


def do_something():
result = (yield do_something_else,n)
yield result


Got it? Control is returned to the caller of "do_something". The generator hands over the method object and the arguments to the caller. Now, it is up to the caller to call "do_something_else" on behalf of "do_something". After that the result of "do_something_else" is sent into the generator and the generator proceeds. Just to give you an insight how I use this technique for my Scheme interpreter, here is a code snippet without further comment.


class If(object):
def __init__(self,test,consequent,alternate=None):
self.test = test
self.consequent = consequent
self.alternate = alternate

def eval(self,env):
if (yield Command(eval,self.test,env)) != False:
yield Command(eval,self.consequent,env)
elif self.alternate:
yield Command(eval,self.alternate,env)

def __repr__(self):
repr = "(if %s %s)" % (self.test,self.consequent)
return self.alternate and (repr[:-1]+" %s)" \
% self.alternate) or repr


The overall pattern behind this approach is Inversion of Control (IoC): Delegate method calls to a centralized instance (in this case I used "yield" statements for that purpose) so that this instance gets full control over who is when to call next.

By the way, I don't feel comfortable calling Inversion of Control a pattern. It's not that easy to boil IoC down. It cannot be captured by a simple code template. I would say it's a meta-pattern or a design principle. There are other techniques to achieve IoC. Metaprogramming is one example.

Whatever you call it, Inversion of Control is a very powerful thing. IoC is about getting control over the flow of execution. You can use it to achieve quite fancy things. It might be the basis for an architectural infrastructure. Let me give you some examples.

Suppose you do not want to use threads, since there is some cost penalty associated to them. You choose to go for cooperative multithreading. What is cooperative multithreading about? The basic idea is that only one task of multiple concurrent tasks actually gets the control thread. The task returns control on own initiative back to a dispatcher and scheduler. That's why it is called "cooperative": Each task is assumed to give control back as soon as possible, so that other tasks on the scheduler's list can continue. Guess how one could implement this? With the help of generators using the technique I just showed you.

Another example: Your language does not support pre- and post-conditions (see my post about "Netze spannen mit Design by Contract"). Again, use Inversion of Control. It's up to the controller to call a method. Before calling a method, the controller just needs to check if there is a method registered representing the pre-condition and call it beforehand. Similarly, the method representing the post-condition is called after the main method. By the way, you can use this to add aspect-orientation to your system!

A final example: Suppose that you want to monitor a certain component in your system. You are interested in all methods calls going out and coming in. Since the interaction with the environment is complex, you log all method calls and the results they return. You record everything. Now, you can remove the environment and let the component re-run a certain scenario. The controller can now "replay" the recorded reactions and stimuli of the environment. Thereby, you can test and debug your component under test in isolation.

This was a rather long post. But I hope that you now understand that without Inversion of Control, there is something very valuable missing in your toolbox. Inversion of Control is an important structuring technique of your code. Some projects make heavily use of this principle (see e.g. Apache Excalibur). It is one of this magic techniques, wizards use ;-)

1 Kommentar:

Dominik Klein hat gesagt…

Funny - back in the days when we were working on the implementation of MPPM we also looked into implementing continuations in Python.

If I remember correctly, we opted for the "easy way out" and used multiple Threads in the end which we suspended and resumed as needed. The generator approach would have been the more elegant solution, though.

By the way - as far as I know "Stackless Python" offers continuations straight out of the box.