Publications
of Jon Jagger
jon@jaggersoft.com
Appeared in CVu 11.2, January 1999

Abstraction, syntax, intent

Deep wisdom

There is a fairly well-known writing guide called The Elements of Style [1]. It contains the proverb "Omit needless words". This appears to be a good piece of advice for a novice writer. But is it? Think about it. Does a novice writer know which words are needless? Probably not. What about an experienced writer? There's a deeper meaning: good writers know which words are needless. Many expressions of deep wisdom are like this, only revealing the deep meaning when experience allows you to see past the shallow meaning.

One of the most frequently used examples of abstraction is to separate an interface from an implementation, the type from the class. Just like "Omit needless words", it's hard to grasp the depth of this advice. It's easy to say, and easy to think you understand it. But do you? Can you clearly separate out in your mind what's interface and what's not? It can be hard, especially if you have a long procedural background. I know it was for me.

What's interface and what's not?

Read this fragment and decide.
size_t strlen(const char *);
const char * greeting = "hello";
strlen(greeting);
Well, you can see it all, so surely it's all interface. But hang on, lurking in there is a char *, which is a representation of the string concept. If you're a hardened C programmer it can be hard to really appreciate this. The char * has been right in front of your nose for so long that you no longer even notice it. But think, how do you create and use a double? The answer is without needing to know how a double is represented. You write:
double sin(double);
double tax = 42.24;
sin(tax); 
You don't need to write [2]:
struct double
{	
    unsigned int sign     :  1;
    unsigned int exponent :  7;
    unsigned int mantissa : 24;
};

The syntax is different

How does the string example look in C and in C++? Essentially like this:
/* C */
size_t strlen(const char *);
const char * greeting = "hello";
strlen(greeting);
// C++
size_t string::length() const;
string greeting("hello");
greeting.length();
There is a lot of similarity between the C and C++ versions. If the C++ string representation is a single character pointer, the immediate memory footprint of greeting will be the same in both cases. The implementation of string::length() might well delegate to strlen(). In other words, under the hood they're very similar, and the differences seem syntactic, superficial. If you look at the assembler generated by a C compiler you'll find memory values, memory value operations, and jumps. If you look at the assembler generated by a C++ compiler you'll find... exactly the same.

But you weren't programming in assembler: that's the point. And like the char * in front of your nose, it's so obvious it's easy to miss. A vital difference is that the code is different. It just is different. The syntax is different.
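To make the "similar under the hood" point concrete, here is a minimal sketch of a class whose only data member is a character pointer and whose length() delegates to strlen(). The class name text, the member chars, and the helper duplicate() are hypothetical names chosen for this illustration; this is not how any particular library implements its string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class, called text so it cannot be confused with std::string.
class text
{
public:
    explicit text(const char * s)
        : chars(duplicate(s))
    {
    }
    ~text()
    {
        delete[] chars;
    }
    text(const text &) = delete;             // copying left out to keep the sketch short
    text & operator=(const text &) = delete;

    // Same job as strlen(), reached through the type instead of the pointer.
    std::size_t length() const
    {
        return std::strlen(chars);
    }

private: // representation
    // Makes a private, null-terminated copy of s.
    static char * duplicate(const char * s)
    {
        assert(s != nullptr);
        char * copy = new char[std::strlen(s) + 1];
        std::strcpy(copy, s);
        return copy;
    }

    char * chars;
};
```

With this sketch, text greeting("hello"); followed by greeting.length() does the same work as strlen(greeting) in the C version. On mainstream compilers sizeof(text) equals sizeof(char *), although the language does not promise that.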

The job of syntax

At a superficial level syntax matters because it's what the programmer sees. It's what the programmer reads. The syntax is the interface to the language. But syntax has a deeper purpose: to express ideas, to support design. The more expressive the language, the smaller the gap between a concept and the expression of the concept in the syntax of the language. In other words, the job of syntax is to directly and explicitly express your intent. I don't think it's by chance that we use the words statement, declaration, and expression when we talk about C and C++. The terminology is deliberate. The declarative power of C++ allows you to raise the level of abstraction in exactly the same way that C did over assembler. When you program at a higher level of abstraction you can capture more of the concept simply because the syntax allows you to express more of the concept.

Let's look at the C and C++ fragment again, this time looking past the superficial difference in syntax. Think about why the syntax is different. What concept does the difference express? Here's the C again:

size_t strlen(const char *);     
Let's break the code into individual symbols.

The char * is the representation of the concept string, but is not the name of the concept.

The const represents the function semantics on the target string we're finding the length of. Notice that for functions with more than one parameter, there is nothing to distinguish the parameter representing the target from other parameters.

The str of strlen refers to the name of the string concept, while the len of strlen refers to the semantics of the function. Only a human can make this simple analysis. There is nothing in the syntax of the single token strlen that the compiler can use to separate str from len, or st from rlen, etc. There is nothing in the syntax of C to group strlen and strchr more strongly than, say, strlen and strerror.

A single token has no declarative power, even if it is a keyword [3]. To make your code more declarative you first have to break it down into separate elements. The declarative power arises through the very act of explicitly combining the separate elements together. C++ is very expressive primarily because its syntax offers lots of ways to combine elements.

size_t string::length() const;
The token strlen has been replaced by the tokens string and length. There is now a clear separation of concerns: string names the type, length names the function, and vitally, the :: glues them together. The :: syntactically groups string and length together. We have said it in code. The length of string::length and the find of string::find are strongly bound together because both are bound to string using the :: binding. Notice also that there is nothing called error bound to string.
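The grouping can be seen in a small sketch. The class text and the free function error_text below are hypothetical names invented for this illustration; the point is only that the text:: qualifier binds length and find to the type, while nothing binds error_text to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class, called text so it cannot be confused with std::string.
class text
{
public:
    explicit text(const char * s)
        : chars(s)
    {
        assert(s != nullptr);
    }
    std::size_t length() const;
    const char * find(char c) const;

private: // representation (non-owning, to keep the sketch short)
    const char * chars;
};

// The text:: qualifier says, in code, that both functions belong to text.
std::size_t text::length() const
{
    return std::strlen(chars);
}

const char * text::find(char c) const
{
    return std::strchr(chars, c);
}

// Nothing binds this function to text, so the compiler sees no
// relationship between the two.
const char * error_text(int errnum)
{
    return std::strerror(errnum);
}
```

Writing text::error_text would be rejected by the compiler, whereas in C the names strlen, strchr, and strerror all sit side by side in one flat namespace.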

The const represents the semantics of the target string we're finding the length of. Notice that for functions with more than one parameter, there is now a syntactic difference between the target and other parameters. Again, we've said it in code.
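A function with two strings shows the difference more clearly. In the sketch below, str_contains, text, contains, and append are hypothetical names made up for this illustration, and std::string is used as the hidden representation purely to keep the code short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// C style: target and argument are both just const char * parameters.
// Only their names and their order tell a human reader which is which.
bool str_contains(const char * target, const char * other)
{
    assert(target != nullptr && other != nullptr);
    return std::strstr(target, other) != nullptr;
}

// Hypothetical class, called text so it cannot be confused with std::string.
class text
{
public:
    explicit text(const char * s)
        : chars(s)
    {
    }

    // Trailing const: the target is not modified.
    // Parameter const: the argument is not modified.
    bool contains(const text & other) const
    {
        return chars.find(other.chars) != std::string::npos;
    }

    // No trailing const: the target is modified, the argument is not.
    void append(const text & other)
    {
        chars += other.chars;
    }

    std::size_t length() const
    {
        return chars.length();
    }

private: // representation (std::string only to keep the sketch short)
    std::string chars;
};
```

In greeting.append(suffix) the target sits in front of the dot and the other string sits inside the parentheses, and the two const qualifiers live in visibly different places in the declaration.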

The char * has vanished. It's nicely hidden inside the definition of class string.

class string
{
    ...
private: // representation
    char * text;
};

Summary

Here's my definition of design: consciously joining separate elements to create structure. Note that the elements need to be separate to be able to be joined together. This relates to simplicity. If an element is too large it tends to be hard to understand, hard to maintain, it loses structure, it rusts. Break a large element down into separate pieces and each piece becomes simpler, easier to understand, easier to maintain, more cohesive. Then put the pieces back together in different ways to create different structures with different strengths and weaknesses. This is easy to say, but it's harder to do. Most human beings seem to have great difficulty really embracing simplicity. The best designers seem to have a knack for seeing the underlying three or four concepts when you just see one.

That's all for now.
Cheers
Jon Jagger
jon@jaggersoft.com

[1] The Elements of Style, William Strunk Jr., E. B. White, Macmillan, ISBN 0-02-418200-1

[2] This is a guess. I really don't know that much about floating point representation.

[3] Try it. Go on. What's the fewest number of tokens that form a legal statement?