Speaking in code

Once upon a time there was a guy called Donald Knuth. Besides being the author of our discipline’s most prominent work, The Art of Computer Programming, he also invented Literate Programming, of which he says:

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: “Literate Programming.”

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

He hand-crafted his own tools for literate programming: the TeX typesetting system and CWEB, a system that interleaves TeX documentation with program code and generates both a typeset document and compilable source from the same file. TeX, together with his Metafont typeface system, was so good that many professional publishing houses still use it today. CWEB didn’t fare quite as well and has faded into obscurity.

Very few people really did literate programming, and although there are more modern tools like doxygen and pydoc for generating documentation from comments in code, this is pretty much exclusively used for API documentation… It is documentation embedded in code instead of code embedded in documentation. It’s the difference between a tutorial and a reference library; which would you rather learn a system from?

I think we’re missing something here…

In the last 10 years, the single most important development in the field of software development has, in my opinion, been the emergence of agile methodologies such as XP. Much good came from this, including the idea that developers should take ownership of building automated tests for all of their functions, use cases, and bug fixes, depending on who you ask.

Test-Driven Development (TDD) takes this to the extreme by forcing you to write the tests just before you write the code they exercise… or is it really extreme?

How do you plan or design a system? You probably walk through some use cases, doing “what if” type of experiments in your head, while identifying objects, classes, and interfaces. You can document these “scenarios” using fancy UML sequence diagrams in a tool that eats hundreds of MB of RAM and takes minutes to start up… or you can scribble it down in pseudo code in Notepad, TextEdit, Outlook Mail, Emacs, vi, or on a piece of paper. Say you wanted to design a system for students to take courses:


  math101 = Course("Math 101")
  albert = Student("Albert")
  albert.take(math101)
  print(math101.students())
  # should output something like <Student: Albert>
 

Aside: What if you could execute these examples? Well, if you’re programming in Python, you actually can execute the above example (given suitable Course and Student classes). With modern dynamic languages you’ll find that if you program close to the problem domain, your code ends up looking like pseudo code; what remains is the essential complexity of your algorithm. In the words of the Programmer’s Stone, the tool has got out of your way and you can reach the quality plateau.
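To make the aside concrete, here is a minimal sketch of two classes under which the scenario above runs. Only Course, Student, take, and students come from the example; the enroll method, the attribute names, and the exact repr format are assumptions made for illustration, not a prescribed design.

```python
class Student:
    """A student identified by name (hypothetical minimal design)."""

    def __init__(self, name):
        self.name = name

    def take(self, course):
        # Delegate to the course so that it owns its own roster.
        course.enroll(self)

    def __repr__(self):
        return "<Student: %s>" % self.name


class Course:
    """A course that keeps a list of enrolled students."""

    def __init__(self, title):
        self.title = title
        self._students = []

    def enroll(self, student):
        # Assumed helper: enrolling the same student twice has no effect.
        if student not in self._students:
            self._students.append(student)

    def students(self):
        # Return a copy so callers cannot modify the roster directly.
        return list(self._students)


math101 = Course("Math 101")
albert = Student("Albert")
albert.take(math101)
print(math101.students())  # prints [<Student: Albert>]
```

Notice that the scenario came first and the classes were shaped to fit it, which is the order of work this post argues for.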

With some experience you find that by writing out such examples/scenarios you can more easily find natural interfaces and designs. If you went straight for classes you’d have to keep all these examples in your head, and that’s not easy. Oftentimes you’d probably end up with objects that didn’t plug into each other well.

This is what I think TDD should help you with: thinking in terms of examples, of usages, instead of individual class designs. Unfortunately, the word test seems to imply something that is done after you finish your work. How can you test something that isn’t even done yet? It doesn’t make sense.

Some people are so unhappy with the word that they invented a whole new discipline, Behavior-Driven Development (BDD), to replace TDD. Tests are no longer tests but… er… behaviors. Someone else also calls them (checked) examples, which to me seems more natural. (It follows that the practice should be called Example-Driven Development then.) The BDD crowd also invented a whole new set of tools, which lets you write test cases in a more natural language.

Python has long come bundled with a module called doctest, which works perfectly well for writing tests that look like examples.
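As a rough sketch of what that looks like, the student/course scenario from earlier can be pasted into a docstring as an interactive session, and the standard doctest module will replay it and compare the output. The class bodies here, including the enroll helper and the repr format, are assumptions for illustration only.

```python
import doctest


class Student:
    def __init__(self, name):
        self.name = name

    def take(self, course):
        course.enroll(self)

    def __repr__(self):
        return "<Student: %s>" % self.name


class Course:
    """A course that students can take.

    The example below is both documentation and a checked test:

    >>> math101 = Course("Math 101")
    >>> albert = Student("Albert")
    >>> albert.take(math101)
    >>> math101.students()
    [<Student: Albert>]
    """

    def __init__(self, title):
        self.title = title
        self._students = []

    def enroll(self, student):
        # Assumed helper, not part of the original scenario.
        self._students.append(student)

    def students(self):
        return list(self._students)


if __name__ == "__main__":
    # Replays every ">>>" example found in this module's docstrings
    # and reports any whose actual output differs from the text.
    doctest.testmod()
```

Saved as a file, this can also be checked with `python -m doctest -v` followed by the file name. If someone later changes the repr format or breaks enrollment, the example in the docstring fails, so the documentation cannot silently drift away from the code.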

Phew… To summarize this long (Python) snake of a thought thread, here are two recommendations for better code, the first inspired by the agile movement guys, the second by Donald Knuth:

  1. Practice TDD with doctests. Read “example” instead of “test”.

    • Your interfaces and designs are likely to be more natural.
    • You’ll get automated tests so you can refactor mercilessly.
    • Your tests are your documentation, so they are never out of sync.
  2. Structure your examples and documentation to form a tutorial to your code. Your fellow programmers will:

    • … love reading your code
    • … actually understand your design
    • … quickly become capable of working with your code

For extra points, see how you can extend automated testing all the way into the realm of real customer requirements; check out FIT.


One Response to “Speaking in code”

  1. Ionut Bizau Says:

    Doctests are pretty impressive - not that the idea was new, but I find the implementation really innovative!
