Publications
of Jon Jagger
jon@jaggersoft.com
Appeared in Overload 32, June 1999

A Review of cow_ptr<type>

Introduction

In my previous article I looked at a copy-on-write pointer class. I had planned that this article would follow in the normal way. Things rarely happen exactly as you plan them. Instead, I'm going to use this article to review the previous article (I'm tempted to repeat this as a pattern: article,review; article,review; article,review...) I think this is an honest approach. It is after all how most developers work. You don't write code in a straight line.

Style follows audience

Astute reader Mr Kevlin Henney of Bristol kindly read through the previous article before I submitted it. He asked why I mixed the style of my const declarations. Ah yes. In the previous article at one point I wrote this...

class string::char_reference
{
    ...
private:
    string * const target;
    size_t const index;
};

I think my train of thought for this experiment started with target. There is no const choice for target. If I want target to be const it has to be as it is. Ok, but what about index? If I want index to be constthere are two choices...

const size_t index;  // conventional
size_t const index;  // not-so-conventional

The first obvious point is that using the not-so-conventional style for index means that target and index acquire similar looking declarations. Thereís something else too though. It was pretty subconscious but looking at it now and trying to vocalise the difference I think I can see it. Itís a matter of emphasis. The not-so-conventional style places the const near to the identifier (with the effect that the type-name is always the leftmost identifier), whereas the conventional-style places the const near to the type-name. The question is what is const? Is it the type or is it the entity referred to by index? Itís both really, but it comes back to types and expressions. Do you want to focus on the type of index or on the use of index in expressions? Given that index is a private data member itís fair to say that the primary audience for index is the class-implementor. And of course they will be using index in expressions. Clear as mud?

However, at another point in the article I wrote...

cow_ptr(const cow_ptr & other); // cítor declaration

To make this consistent with the index-style, surely it would need to be...

cow_ptr(cow_ptr const & other);

I donít think so. This is a very different case. This is public method. I think itís fair to say that the primary audience for this is the class-clients. They wonít be using the parameter rhs in expressions. Theyíll be supplying an argument for rhs. Their main consideration is the type of other. So, given that Iíve chosen to use the expression-style when I want to focus on the implementation it makes sense to use the type-style when I want to focus on the interface. It provides a nice interface/implementation distinction too.

There something else to add to this. Itís the observation that these function declarations must have a matching function definition. Should the function definition use the type-style or the expression-style? I think that it should use the type-style...

template<typename type>
cow_ptr<type>::cow_ptr(const cow_ptr & other)
    ... 
{
    ... 
}

Note though that Iím happy for the names of the parameters to be different from declaration to definition. I think thatís entirely reasonable. Again itís all about the target audience. For the declaration the audience are the users, and their focus should be what is happening. For the definition the audience is the implementor, and their focus is how itís happening. I think thatís a healthy difference.

So there you have it; my limited excursion into expression-style. I donít go the whole hog with it. But some do. Dan Saks does. Francis uses it in his EXE columns too. I recall one reason for their enthusiastic embrace was based on reading declarations. The argument is that it helps when reading the declarations from right to left.

size_t const index; 

Let's try it: index is a const size_t. It does appear to work. Donít get me wrong now, I think reading code is very important. Code is rarely spoken aloud - itís read. Which is why how code looks is so important. Layout matters. Itís just that I donít think that you read code right-to-left or left-to-right! I think an experienced C/C++ programmer slurps up whole statements at a time.

Style follows style

One more style issue. Member initialisation lists. In my original Overload 31 submission I wrote...

template<typename type>
cow_ptr<type>::cow_ptr(const cow_ptr & rhs)
  : ptr(rhs.ptr)
  , count(rhs.count)
{
    ...
}

However, in the published article the editor edited it (as they tend to do) into...

template<typename type>
cow_ptr<type>::cow_ptr(const cow_ptr & rhs)
    : ptr(rhs.ptr), count(rhs.count)
{
    ...
}

The former style, with each initialisation on its own line, is the style I prefer for member initialisation lists. There are a couple of reasons. Firstly it just fits on the page better. Particularly if I have a couple of longish identifiers. Secondly it separates the colon-comma punctuation from the essential initialisations. Thirdly it matches closely the declaration style I use in the class definition. In the class definition I declare identifiers one per line. I write...

template<typename type>
class cow_ptr
{
    ...
private: // state
    type * ptr;
    size_t * count;
};

I donít write...

template<typename type>
class cow_ptr
{
    ...
private: // state
    type * ptr;  size_t * count;
};

It all comes back to simplicity. See below.

Exception Unsafety

Hereís the constructor for cow_ptr<> again.

template<typename type>
cow_ptr<type>::cow_ptr(type * p)
  : ptr(p)
  , count(new size_t(1))
{
    // all done
}

and hereís how it was used...

cow_ptr<type> unshared(new type(*ptr));

This is not exception safe. And remember this is a template class. We donít know how big the potential leak is because we donít know what type is. Ooops.

template<typename type>
cow_ptr<type>::cow_ptr(type * p)
{
    auto_ptr<type> safe(p);
    count = new size_t(1);
    ptr = safe.release();
}

Thatís better.

Refactor

The longer I work in software the more I become convinced that simplicity is the key. Yet somehow itís really hard to write really simple code. It's as though the programmer's psyche somehow rejects simplicity. I can imagine a hard-core hacker pointing to a particularly dense piece of code they've just written and proclaiming loudly

“I wrote that. Bet you can't understand it. I can. I'm dead hard me.”
Simplicity is one of those things that is hard to define but you know it when you see it.

The minimum could be defined as the perfection that an artefact achieves when it is no longer possible to improve it by subtraction.

Minimum, John Pawson
Phaidon Press, 0-7148-3817-9

Code-simplicity and code-duplication don't sit well together. I'm afraid there was some code-duplication in my previous article. I wrote...

template<typename type>
cow_ptr<type> &
cow_ptr<type>::operator=(const cow_ptr & rhs)
{
    ++*rhs.count;
    if (--*count == 0)
    {
        delete count;
        delete ptr;
    }
    ptr = rhs.ptr;
    count = rhs.count;
    return *this;
}

template<typename type>
cow_ptr<type> &
cow_ptr<type>::~cow_ptr()
{
    if (--*count == 0)
    {
        delete count;
        delete ptr;
    }
}

The duplication is plain to see. Letís try refactoring to avoid the duplication. One approach is to simply extract the common code and put it in a new private method.

template<typename type>
void cow_ptr<type>::decrement() const
{
    if (--*count == 0)
    {
        delete count;
        delete ptr;
    }
}

template<typename type>
cow_ptr<type> &
cow_ptr<type>::operator=(const cow_ptr & rhs)
{
    ++*rhs.count;
    decrement();
    ptr = rhs.ptr;
    count = rhs.count;
    return *this;
}

template<typename type>
cow_ptr<type> &
cow_ptr<type>::~cow_ptr()
{
    decrement();
}

Now suppose we wanted to clear the cow_ptr pointer, that is to reset itís pointer to null. At the moment we would have to jump through a small hoop...

cow_ptr<type> ptr(new type(...));
...
ptr = cow_ptr<type>(0);  // clear pointer

Itís a small but perhaps not-so-obvious step to promote the private decrement() into a public clear().

template<typename type>
cow_ptr<type> &
cow_ptr<type>::~cow_ptr()
{
    clear();
}

template<typename type>
cow_ptr<type>::operator=(const cow_ptr & rhs)
{
    ++*rhs.count;
    clear();
    ptr = rhs.ptr; 
    count = rhs.count;
    return *this;
}

template<typename type>
void cow_ptr<type>::clear()
{
    if (--*count == 0)
    {
        delete count;  // This contains a BUG
        delete ptr;    // See the fix in
        count = 0;     // next article
        ptr = 0;
    }
}

Thereís still a potential problem with the assignment operator though. Itís not exception safe. To fix this we can use an approach based on the "Resource Acquisition is Initialization" idiom (what a catchy name). This requires a new constructor, which would be private...

tempate<typename type>
cow_ptr<type>::cow_ptr(type * p, size_t * c) throw()
  : ptr(p)
  , count(c)
{
    // all done
}

tempate<typename type>
cow_ptr<type> &
cow_ptr<type>::operator=(const cow_ptr & rhs)
{
    cow_ptr old_self(ptr, count);
    ptr = rhs.ptr;
    count = rhs.count;
    ++*count;
    return *this;
}

Refactor

I'm afraid there was also some duplication elsewhere...

string::body::body(const char * literal)
{
    cached_length = ::strlen(literal);
    text = new char[cached_length + 1];
    ::strcpy(text, literal);
}

string::body::body(const body & rhs)
  : text(new char[rhs.cached_length + 1])
  , cached_length(rhs.cached_length)
{
    ::strcpy(text, rhs.text);
}

Code on a printed page feels different to code on the screen. These grate on my sensibilities. The first assigns to cached_length and then to text. The second initializes text and then cached_length. Refactor. Here's my first attempt...

void 
string::body::initialize(const char * literal)
{
    cached_length = ::strlen(literal) ;
    text = new char[cached_length + 1];
    ::strcpy(text, literal);
}

string::body::body(const char * literal)
{
    initialize(literal);
}

string::body::body(const body & rhs)
{
    initialize(rhs.text);
}

This is better. I can still see a flaw though. The copy constructor is needlessly recalculating the length of the text. Here's my second attempt...

void 
string::body::initialise(const char * literal, size_t length)
{
    cached_length = length;
    text = new char[length + 1];
    ::strcpy(text, literal);
}

string::body::body(const char * literal)
{
    initialise(literal, ::strlen(literal));
}

string::body::body(const body & rhs)
{
    initialise(rhs.text, rhs.cached_length);
}

Ban all duplicate code.

Explicit

Following on nicely let's have a careful look at those two string::body constructors. The sole purpose of these two constructors is to be called explicitly. There is no reason I can think of why you'd want to use string::body objects as value parameters to a function, or as value return types from a function. They can be made explicit.

struct string::body
{
    explicit body(const char * literal);
    explicit body(const body & rhs);
    ...
};
My thanks to Kevlin for several comments that improved this article.

That's all for now.
Cheers
Jon Jagger
jon@jaggersoft.com