Publications
of Jon Jagger
jon@jaggersoft.com
Appeared in Overload 31, April 1999

copy_on_write_ptr<type>

Recap

In my first counted_ptr<> article (way back now) I created a flawed string class that reference counted shared state. The flaw in this naďve attempt was that it failed to consider an important question: what happens when you change shared state? For my toddler string class it was an internal implementation detail. This meant quite simply that the string modifier-methods had to obtain a personal, unshared copy of state before doing their modification. My previous article focused on just one of the string modifier-methods; the subscript operator. At the end of that article the string class looked like this:
// string.hpp
namespace accu
{
    class string
    {
    public: // types
	
        class char_reference;
        typedef char_reference reference;
        ...
    public: // access, idiomatic and primitive

        char_reference operator[](size_t index);
        char           operator[](size_t index) const;

        void assign_at(size_t, char); // [1]

    private: // plumbing

        void bounds_check(size_t index) const;
        void unshare_state();

    private: // shared representation
		
        char * text;	
        size_t * count;	
    };

    class string::char_reference
    {
    public:
        char_reference(string * target, size_t index);
        ...
        char_reference operator=(char new_value)
        operator char() const;
        ...
    private:

        string * const target;
        size_t const index;
    };
}
The essence of the second article was that in a code fragment like this:
string theory("hello");
theory[0] = 'C';      // expression 1
cout << theory[0];    // expression 2
expression 1 was syntactic shorthand for
string::char_reference _temp = theory[0];
temp.operator=('C'); 
// delegates to theory.assign_at(0, 'C')
and expression 2 was syntactic shorthand for
string::char_reference _temp1 = theory[0];
char _temp2 = _temp1.operator char();
cout << _temp2;

Intro

In this article I'm going to address a related issue. I've already hinted at it: what if the class you are putting the counted_ptr<> inside has lots [2] of modifier methods? I mean, I haven't even considered string append yet. Is that going to be as tricky as the subscript operator was? This technique of reference counting seems fine for the queries but the modifiers are definitely tricky. I will address this issue. I promise. But first I can't resist polishing the string class just a little more.

Caching in

Suppose we want to cache the length of the string. We could design the cache as, say, a size_t. And a cache value of zero could indicate a "stale" cache. The length query simply refreshes a stale cache (if it needs to) by calling strlen. The cache will need to be declared mutable. All pretty standard stuff. But hang on. The cache will need to be mutable? Will it? The text is shared, and the cache is caching the length of that, so shouldn’t the cache be shared too?
// string.hpp
namespace accu
{
    class string
    {
    public:
        string(const char * other = "");
        ...
        char_reference operator[](size_t index);
        void assign_at(size_t index, char new_value);
        ...
        size_t length() const;
        ...
    private:

        char * text;
        size_t * cached_length;	
        size_t * count;
    };
}
The need for mutable vanishes because const-ness does not traverse a pointer. The string::length() query cannot modify that state of the string object pointed to by this. In other words, string::length() cannot change cached_length (which is a pointer), but it can change what cached_length points to. Hence we arrive at:
// string.cpp
namespace accu
{
    size_t string::length() const
    {
        if (*cached_length == 0)
        {
            *cached_length = strlen(text);
        }
        return *cached_length;
    }

    void string::assign_at(size_t index, char new_value)
    {
        bounds_check(index);
        unshare_state();
        text[index] = new_value;
        *cached_length = 0;
    }
}
This is fine as far as it goes, but it goes too far. It's doing more work than it needs to. There are two cases to consider. Firstly, what if new_value (of assign_at) is not the nul character? Answer: *cached_length is needlessly resetting the cache. Secondly, what if new_value (of assign_at) is the nul character? Answer: *cached_length does (surely?) need to be reset. Yes, but does it need to reset it to zero? Follow it through; what will length recalculate the cache value as? Well, to strlen(text). Yes yes yes, but what's that? Answer: the value of index in the previous assign_at where new_value was the nul character! Hence:
// string.cpp
namespace accu
{
    size_t string::length() const
    {
        return *cached_length;  
    }

    void string::assign_at(size_t index, char new_value)
    {
        bounds_check(index);
        unshare_state();
        text[index] = new_value;
        if (new_value == '\0')
        {
            *cached_length = index;
        }
    }
}
Neat eh? Of course all this has nothing to be with reference counting. It rests on the char_reference proxy class. Enough polishing...for now.

Putting the counted_ptr<> back in again

The string class is starting to get messy; at least two private methods (not great on a user class) and three data members (if we leave the cache in) that are all pointers. Time to simplify. Time to put the counted_ptr<> back in.
// string.hpp
#include "counted_ptr.hpp"
namespace accu
{
    class string
    {
    public:
        ...
        size_t length() const;
        ...
        void assign_at(size_t index, char new_value);
        ...
    private:

        void unshare_state();
        ...
    private:

        class body;
        counted_ptr<body> ptr;
    
    };
}
// string.cpp
namespace accu
{
    struct string::body
    {
        char * text; 
        size_t cached_length; // strlen(text)
        ...
        void assign_at(size_t index, char new_value);
        ...
    };
}

Simpler still ?

This is much better. But unshare_state is still there. Is there any way we get rid of that too? There is. First take a look at a fragment of counted_ptr<>
namespace accu
{
    template<typename type>
    class counted_ptr
    {
    public:
        ...
        type * operator->() const;
        type & operator*() const;
    private:
        type * ptr;
        size_t * count;	
    };
}
Both operators are const queries, but both give non-const access to the pointed to object. This is perfectly reasonable. The counted pointer is behaving exactly like a raw pointer. What we have here is the hidden handle/body idiom. The user uses the handle (string) directly as a value, and the body (string::body) is effectively invisible. Hidden handle/body means the body is hidden from the user of the handle. In this case, the counted_ptr is the link between the string-handle and the body-body. More specifically, counted_ptr::operator->() is the usual way the string traverses to the body. We can see this in the implementation of the string methods. They all just delegate to the body through operator->()
// string.cpp
namespace accu
{
    ...
    size_t string::length() const
    {
        return ptr->cached_length;
    }
	
    void string::assign_at(size_t index, char new_value)
    {
        ptr->assign_at(index, new_value);
    }
}
But there's a mistake here. It's so easy to make - string::assign_at() hasn't called unshare_state() before delegating across operator->(). Remember string::assign_at() is a modifier. The user is writing to the shared state, so the modifier must make its own personal copy of the state. Copy On Write. In other words, the programmer must remember to unshare in every modifier. That's not that much to ask. But it is duplication. And duplication is boring. And when the mind gets bored it makes mistakes. It would be better if we could make it happen automatically. Can we? Well, as it turns out we can. One way is to overload on const: provide two versions of operator->()

cow_ptr - Copy On Write pointer

namespace accu
{
    template<typename type>
    class cow_ptr
    {
    public:
        ...
              type * operator->();
        const type * operator->() const;
        ...
    private:

        type * ptr;
        size_t * count;
    };
}
If our string class uses cow_ptr instead of counted_ptr then the semantics of the string methods change. Let's look at string::length() and string::assign_at() again:
namespace accu
{
    size_t string::length() const
    {
        return ptr->cached_length;
    }
}
This is a query. That forces the compiler to resolve ptr-> as the const version of cow_ptr::operator->(), which simply returns a read-only raw pointer. Recall the (sole) version of counted_ptr::operator->() returns a read-write raw pointer. In other words this:
namespace accu
{
    size_t string::length() const
    {
        ptr->cached_length = 42;
        ... 
    }
}
will compile quite happily if ptr is a counted_ptr<body>, which is not a Good Thing. But it will generate a compiler error if ptr is a cow_ptr<body>, which is a Good Thing. The cow_ptr<> version is type safer. When you split a class into a handle and a body it's easy to forget this subtlety: that the const-ness of a raw pointer does not traverse the pointer.
namespace accu
{
    void string::assign_at(size_t index, char new_value)
    {
        ptr->assign_at(index, new_value);
    }
}
This is a modifier. That forces the compiler to resolve ptr-> as the non-const version of cow_ptr::operator->(). This can do the unsharing and then return a read-write raw pointer. The missing call to unshare the state is no longer an error, as it has been refactored inside operator->() itself. There is a single point of definition, and no duplication. And the code in the string methods has a common look and feel too.
namespace accu
{
    template<typename type>
    class cow_ptr
    {
    public:

        cow_ptr(type * other);
        cow_ptr(const cow_ptr & other);
        cow_ptr & operator=(const cow_ptr & rhs);
        ~cow_ptr();

    public:
   
              type * operator->();
        const type * operator->() const;

              type & operator*();
        const type & operator*() const;
        ...
    private:

        cow_ptr(type * ptr, size_t * count);
        void copy();
        type * checked_ptr() const;

    private:
    
        type * ptr;
        size_t * count;
    };
}
namespace accu
{
    template<typename type>
    cow_ptr<type>::cow_ptr(type * other)
      : ptr(other)
      , count(new size_t(1)) 
    {
        // all done
    }

    template<typename type>
    cow_ptr<type>::cow_ptr(const cow_ptr & other)
      : ptr(other.ptr)
      , count(other.count)
    {
        ++*count;
    }

    template<typename type>
    cow_ptr<type> & 
    cow_ptr<type>::operator=(const cow_ptr & rhs)
    {
        cow_ptr<type> old_self(ptr, count); // [3]
        ptr = rhs.ptr;
        count = rhs.count;
        ++*count;
        return *this;
    }

    template<typename type>
    cow_ptr<type>::~cow_ptr()
    {
        if (--*count == 0)
        {
            delete count;
            delete ptr;
        }
    }
}

namespace accu
{
    template<typename type>
    type * 
    cow_ptr<type>::operator->()
    {
        copy();
        return checked_ptr();
    }

    template<typename type>
    const type * 
    cow_ptr<type>::operator->() const
    {
        return checked_ptr();
    }

    template<typename type>
    type & 
    cow_ptr<type>::operator*()
    {
        copy();
        return *checked_ptr();
    }
	
    template<typename type>
    const type & 
    cow_ptr<type>::operator*() const
    {
        return *checked_ptr();
    }
}

namespace accu
{
    template<typename type>
    cow_ptr<type>::cow_ptr(type * p, size_t * c)
      : ptr(p)
      , count(c)
    {
        // all done
    }

    template<typename type>
    void 
    cow_ptr<type>::copy()
    {
        cow_ptr<type> unshared(ptr->copy());
        *this = unshared;
    }

    template<typename type>
    type * 
    cow_ptr<type>::checked_ptr() const
    {
        if (!ptr)
        {
            throw logic_error("cow_ptr<>: null");
        }
        return ptr;
    }
}
And the only change we need to make to the body class is to add a single method called copy(). Notice that copy() is not a virtual function.
// source.cpp
namespace accu
{
    class string::body
    {
    public:
        body(const char * other);
        ...
        body * copy() const;
        ...
    private:
        char * text;
    };

    string::body * string::body::copy() const  
    {
        return new body(text);
    }
}

Rethinking public vs private. Again

You may recall that in one of my articles I um'ed and ah'ed about the virtues of making the string::assign_at method public or private. Making it private was attractive because it meant there was a single syntax for assignment. It also reduces the public interface. And a smaller public interface is a good thing. A mistake on a private method is going to affect user code a lot less than a mistake on a public method. See footnote [1]. However, making it private raises access issues, which are usually solved by friendship. And maybe public isn't so bad really. In my Write to Learn article I opted for making it public. And I stick by that. However, I've since realised that just occasionally there may be another option. One that I handn't even considered before: remove the method altogether! The insight is that methods of string::char_reference don't have to delegate to methods of string (which in turn delegate to methods of string::body). They could delegate directly to methods of string::body.
// string.hpp
#include "cow_ptr.hpp"
namespace accu
{
    class string
    {
    public:

        class body;
        class char_reference;
        ...
        char_reference operator[](size_t index);
                  char operator[](size_t index) const;

        char_reference at(size_t index);
                  char at(size_t index) const;
        ...

    private: // only this

        cow_ptr ptr;
    };

    class string::char_reference
    {
    public:
        char_reference(cow_ptr & ptr, size_t index);
        char_reference operator=(char new_value);
        operator char() const;

    private:

        cow_ptr<string::body> & ptr;
        size_t const index;
    };
}
// string.cpp
namespace accu
{
    struct string::body
    {
        size_t cached_length; // strlen(text)
        char * text;
        ...
        void assign_at(size_t index, char_new_value)
        {
            text[index] = new_value;
            if (new_value == '\0')
            {
                cached_length = index;
            }
        }
        void bounds_check(index) const;
    };
}

namespace accu
{
    string::char_reference string::operator[](size_t index)
    {
        return char_reference(ptr, index);
    }

    char string::operator[](size_t index) const
    {
        return ptr->text[index];
    }

    string::char_reference string::at(size_t index)
    {
        ptr->bounds_check(index);
        return char_reference(ptr, index);
    }

    char string::at(size_t index) const
    {
        ptr->bounds_check(index);
        return ptr->text[index];
    }
    ...
};

namespace accu
{
    ...
    string::char_reference
    string::char_reference::operator=(char new_value)
    {
        ptr->assign_at(index, new_value);
        return *this;
    }
	
    string::char_reference::operator char() const
    {
        return ptr->text[index];
    }
}

namespace accu
{
    size_t string::length() const
    {
        return ptr->cached_length;
    }
    ...
}
Notice that

That's all for now.
Cheers
Jon Jagger
jon@jaggersoft.com

[1] The previous version used a method called assign rather than assign_at. That was a bad choice because there's already a method called assign in std::string with different semantics, but exactly the same prototype.

[2] I'd argue that no class should have lots of methods. By which I mean of course, that a class should be cohesive: it's hard for anything large to be cohesive.

[3] This is an alternative way to implement reference counting assignment. I've never seen it in print, but it has some nice features. Consider for example exceptions.