Publications
of Jon Jagger
jon@jaggersoft.com
Appeared in CVu 10.2, Jan 1998

{ yourself }
Rock and Unroll

Many of you will have come across Duff's device. It's a devastatingly deviously unrolled byte-copying loop devised by Tom Duff while he was working at Lucasfilm. In it's “classic” form, it looks like this:

/* count > 0 assumed */
register int n = (count + 7) / 8;
switch (n % 8)
{
case 0: do {	*dst = *src++;
case 7: 	*dst = *src++;
case 6: 	*dst = *src++;
case 5: 	*dst = *src++;
case 4: 	*dst = *src++;
case 3: 	*dst = *src++;
case 2: 	*dst = *src++;
case 1: 	*dst = *src++;
	    } while (--n > 0);
}

In the loop, count bytes are copied from the array pointed to by src to the memory location pointed to by dst (which is a memory-mapped device output register, which is why dst isn't incremented). It solves the problem of handling the leftover bytes (when count isn't a multiple of 8) by interleaving a switch statement with the loop. Yes it is legal to have case labels buried within blocks nested in a switch statement like this. Duff noted that:

“the fall through" behaviour of C's switch statement has long been controversial. This code forms some sort of argument in that debate, but I'm not sure whether it's for or against.”

Loop unrolling is a tool you should have in your programming toolbox. As with any optimization technique it should only be used on working code that has been profiled. Most examples of loop unrolling are much simpler than Duff's Device. For example...

for (i = 0; i < 8; ++i)
{
    src[i] = func(i, delta);
}

And the unrolled version is simply:

src[0] = func(0, delta);
src[1] = func(1, delta);
src[2] = func(2, delta);
src[3] = func(3, delta);
src[4] = func(4, delta);
src[5] = func(5, delta);
src[6] = func(6, delta);
src[7] = func(7, delta);

Sometimes it can be difficult to spot code that is amenable to unrolling. Most of the time we are implicitly trying to avoid duplication. That's why we write functions. Here's another less well known example:

enum { width = 128 };
int array[width];

/* array sorted: array[i] <= array[i+1] , i = [0..126] */

int i = 0;
if (array[    64] <= value) i +=  64;
if (array[i + 32] <= value) i +=  32;
if (array[i + 16] <= value) i +=  16;
if (array[i +  8] <= value) i +=   8;
if (array[i +  4] <= value) i +=   4;
if (array[i +  2] <= value) i +=   2;
if (array[i +  1] <= value) i +=   1;

Believe it or not, this is an unrolled binary search. It's worth spending some time figuring out how this elegant piece of code works.

That's all for now.
Cheers
Jon Jagger
jon@jaggersoft.com