One of the most frequently used examples of abstraction is to separate an interface from an implementation, the type from the class. Just like "Omit needless words" [1], it's hard to grasp the depth of this advice. It's easy to say, and easy to think you understand it. But do you? Can you clearly separate out in your mind what's interface and what's not? It can be hard, especially if you have a long procedural background. I know it was for me.
size_t strlen(const char *);
const char * greeting = "hello";
strlen(greeting);
Well, you can see it all, so surely it's all interface. But hang on, lurking in there is a char *, which is a representation of the string concept. If you're a hardened C programmer it can be hard to really appreciate this. The char * has been right in front of your nose for so long that you no longer even notice it. But think, how do you create and use a double? The answer is without needing to know how a double is represented. You write:
double sin(double);
double tax = 42.24;
sin(tax);
You don't need to write [2]:
struct double
{
unsigned int sign : 1;
unsigned int exponent : 7;
unsigned int mantissa : 24;
};
/* C */
size_t strlen(const char *);
const char * greeting = "hello";
strlen(greeting);
// C++
size_t string::length() const;
string greeting("hello");
greeting.length();
There is a lot of similarity between the C and C++ versions. If the C++ string
representation is a single character pointer, the immediate memory footprint of
greeting will be the same in both cases. The implementation of
string::length() might well delegate to strlen(). In other words,
under the hood they're very similar; the differences seem syntactic, superficial.
If you look at the assembler generated by a C compiler you'll find memory values,
memory value operations, and jumps. If you look at the assembler generated by a
C++ compiler you'll find... exactly the same.
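To make the "under the hood" similarity concrete, here's a minimal sketch. It is not the real library string: the namespace footprint, the constructor, and the helper same_length_both_ways() are names I've made up for illustration, and the pointer is const and non-owning so the sketch stays safe with string literals. It assumes, as almost every compiler does, that a class holding one pointer is no bigger than the pointer itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace footprint {

// Hypothetical string class whose entire representation is one character pointer.
class string {
public:
    explicit string(const char* literal) : text(literal) {}

    // The interface is C++; the implementation simply delegates to C's strlen().
    std::size_t length() const { return std::strlen(text); }

private:
    const char* text;  // non-owning; assumed to outlive the object
};

// Assumption: no padding is added around a single pointer member.
static_assert(sizeof(string) == sizeof(const char*), "one pointer, nothing more");

// Returns true when the C spelling and the C++ spelling agree on the length.
inline bool same_length_both_ways() {
    const char* c_greeting = "hello";
    string cpp_greeting("hello");
    return std::strlen(c_greeting) == cpp_greeting.length();
}

// Small usage check.
inline void demo() { assert(same_length_both_ways()); }

}  // namespace footprint
```

Same footprint, same work done at run time. The only thing that has changed is what the source code says.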
But you weren't programming in assembler: that's the point. And like the char * in front of your nose, it's so obvious it's easy to miss. A vital difference is that the code is different. It just is different. The syntax is different.
Let's look at the C and C++ fragments again, this time looking past the superficial difference in syntax. Think about why the syntax is different. What concept does the difference express? Here's the C again:
size_t strlen(const char *);
Let's break the code into individual symbols.
The char * is the representation of the concept string, but is not the name of the concept.
The const represents the function semantics on the target string we're finding the length of. Notice that for functions with more than one parameter, there is nothing to distinguish the parameter representing the target from other parameters.
The str of strlen refers to the name of the string concept, while the len of strlen refers to the semantics of the function. Only a human can make this simple analysis. There is nothing in the syntax of the single token strlen that the compiler can use to separate str from len, or st from rlen, etc. There is nothing in the syntax of C to group strlen and strchr more strongly than, say, strlen and strerror.
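The point about strlen, strchr and strerror can be seen in a few lines. This is only a sketch: the function name same_prefix_demo() is made up, and ERANGE is just a convenient error number to pass. All three library functions are the standard ones, reached here through the C++ header.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>

// Three standard functions that share the "str" prefix. Two of them operate
// on a string; the third takes an int. Only the reader knows which is which.
inline void same_prefix_demo() {
    const char* greeting = "hello";

    // str + len: the length of a string.
    std::size_t length = std::strlen(greeting);

    // str + chr: the first occurrence of a character in a string.
    const char* first_l = std::strchr(greeting, 'l');

    // str + error: the message for an error number. Its argument is an int,
    // not a string, yet the name groups it with the other two.
    const char* message = std::strerror(ERANGE);

    assert(length == 5);
    assert(first_l == greeting + 2);
    assert(message != nullptr);
}
```

As far as the compiler is concerned these are three unrelated names. The grouping lives in a naming convention, and the convention doesn't even hold.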
A single token has no declarative power, even if it is a keyword [3]. To make your code more declarative you first have to break it down into separate elements. The declarative power arises through the very act of explicitly combining the separate elements together. C++ is very expressive primarily because its syntax offers lots of ways to combine elements.
size_t string::length() const;
The token strlen has been replaced by the tokens string and length. There is now a clear separation of concerns: string names the type, length names the function, and vitally, the :: glues them together. The :: syntactically groups string and length together. We have said it in code. The length of string::length and the find of string::find are strongly bound together because both are bound to string using the :: binding. Notice also that there is nothing called error bound to string.
The const represents the semantics of the target string we're finding the length of. Notice that for functions with more than one parameter, there is now a syntactic difference between the target and other parameters. Again, we've said it in code.
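Here's a sketch of that syntactic difference for a function with more than one string involved. The class, the namespace target, and the member contains() are all hypothetical names chosen for illustration; the implementation leans on the standard strstr(), whose two parameters have exactly the same type.

```cpp
#include <cassert>
#include <cstring>

namespace target {

// Hypothetical string class, just big enough to show where each const goes.
class string {
public:
    explicit string(const char* literal) : text(literal) {}

    // The trailing const is a promise about the target (the object before
    // the dot). The const on the parameter is a promise about the other
    // string. The two roles now occupy different places in the syntax.
    bool contains(const string& other) const {
        return std::strstr(text, other.text) != nullptr;
    }

private:
    const char* text;  // non-owning; assumed to outlive the object
};

// Small usage check: the target is written first, the argument in parentheses.
inline void demo() {
    string haystack("hello world");
    string needle("world");
    assert(haystack.contains(needle));
    assert(!needle.contains(haystack));
}

}  // namespace target
```

Compare haystack.contains(needle) with strstr(haystack, needle). In the C call, only the documentation tells you which argument is being searched. In the C++ call, the position relative to the dot tells you.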
The char * has vanished. It's nicely hidden inside the definition of class string.
class string
{
...
private: // representation
char * text;
};
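One consequence of hiding the representation is that it can change without the calling code changing. The sketch below is an assumption of mine, not the class above filled in: it caches the length next to the pointer instead of delegating to strlen() on every call. The namespace hidden, the member size, and the accessor c_str() are illustrative names, and the pointer is again const and non-owning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace hidden {

// Hypothetical string class with a different private representation.
class string {
public:
    explicit string(const char* literal)
        : text(literal), size(std::strlen(literal)) {}

    // Same signature as before; callers cannot tell the body has changed.
    std::size_t length() const { return size; }

    // Read-only access to the characters.
    const char* c_str() const { return text; }

private:  // representation
    const char* text;  // non-owning; assumed to outlive the object
    std::size_t size;  // length computed once, at construction
};

// Small usage check: the calling code reads exactly as it did before.
inline void demo() {
    string greeting("hello");
    assert(greeting.length() == 5);
}

}  // namespace hidden
```

The line greeting.length() is untouched. With the C version, a change like this would mean changing every char * in sight.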
That's all for now.
Cheers
Jon Jagger
jon@jaggersoft.com
[1] The Elements of Style, William Strunk Jr. and E. B. White, Macmillan, ISBN 0-02-418200-1
[2] This is a guess. I really don't know that much about floating point representation.
[3] Try it. Go on. What's the smallest number of tokens that form a legal statement?