hudebnik

I'm teaching a class on design and analysis of algorithms, and trying to have as much of the class as possible led by the students. All this week the students have been presenting homework solutions, and today one of the sharper students presented a comparison between a naive implementation and a cleverer, "optimized" implementation of the heapsort algorithm. He walked us through his code (both heapsorts and the timing code wrapped around them) and traced a toy-sized demonstration of each on the board. Then I asked, "So, do you have results?"

"Yes, I do." He ran the program: the naive implementation took 1.5 milliseconds, and the optimized implementation took about 30 microseconds. Beautiful!

He ran it again, with similar (though not identical) results. And again, and again. Pretty convincing... except that he was sorting an array of 8 numbers, on which even the stupidest heapsort shouldn't take 1.5 milliseconds. Furthermore, looking at the algorithms, I concluded that the speedup should be at best a factor of 2, not a factor of 50. So I asked him, on a whim, to swap the code segments to run the optimized implementation first, and the naive implementation second.

The optimized implementation took 1.5 milliseconds, and the naive one 30-some microseconds. Consistently. The presenter looked like he'd been hit with a poleaxe. The rest of the class were amused, but equally puzzled. We discussed possible explanations and eventually concluded that it probably had to do with loading a Java class file from disk on the first demand for it. (1.5 ms actually sounds low for that, unless he has a solid-state drive.) After suggesting a couple of ways to correct for this, the students agreed to sort another array first, without timing it, and only then do the timed runs. They also suggested moving the timing code in closer, so that it measured only the parts of the code that differed from one algorithm to the other.
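For anyone who wants to try the fix the class settled on, here is a minimal Java sketch under some assumptions. Each sorter is run once on a separate, untimed array before anything is measured, and `System.nanoTime()` brackets only the sort call, not the array copy or the correctness check. The student's two heapsorts aren't shown in the post, so `heapsort` below is a plain textbook heapsort standing in for either one; the class name `WarmupTiming` and the helpers `warmUp` and `timeOnce` are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Consumer;

public class WarmupTiming {
    // Plain textbook heapsort, a hypothetical stand-in for either of the student's versions.
    static void heapsort(int[] a) {
        int n = a.length;
        // Build a max-heap bottom-up.
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, i, n);
        // Repeatedly move the current maximum to the end of the array.
        for (int end = n - 1; end > 0; end--) {
            int t = a[0]; a[0] = a[end]; a[end] = t;
            siftDown(a, 0, end);
        }
    }

    // Restore the max-heap property for the subtree rooted at i, within a[0..n).
    static void siftDown(int[] a, int i, int n) {
        while (true) {
            int child = 2 * i + 1;
            if (child >= n) return;
            if (child + 1 < n && a[child + 1] > a[child]) child++;
            if (a[i] >= a[child]) return;
            int t = a[i]; a[i] = a[child]; a[child] = t;
            i = child;
        }
    }

    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) if (a[i - 1] > a[i]) return false;
        return true;
    }

    // Run the sorter once, untimed, so one-time first-use costs (such as loading
    // the class a sorter lives in, when it is a separate class) are paid up front.
    static void warmUp(Consumer<int[]> sorter, int[] input) {
        sorter.accept(Arrays.copyOf(input, input.length));
    }

    // Time a single sort of a fresh copy; only the sort call is inside the timed region.
    static long timeOnce(Consumer<int[]> sorter, int[] input) {
        int[] copy = Arrays.copyOf(input, input.length);
        long start = System.nanoTime();
        sorter.accept(copy);
        long elapsed = System.nanoTime() - start;
        if (!isSorted(copy)) throw new IllegalStateException("sorter produced unsorted output");
        return elapsed;
    }

    public static void main(String[] args) {
        int[] warmupData = new Random(7).ints(8, 0, 100).toArray();
        int[] data = new Random(42).ints(8, 0, 100).toArray();
        Consumer<int[]> sorter = WarmupTiming::heapsort;
        warmUp(sorter, warmupData);
        for (int run = 0; run < 5; run++) {
            System.out.println("run " + run + ": " + timeOnce(sorter, data) + " ns");
        }
    }
}
```

At eight elements the timings will still be noisy. For serious Java measurements, a dedicated harness such as JMH also handles JIT warm-up iterations and running in forked JVMs.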

The optimized implementation took 30-some microseconds, and the naive one 20-some microseconds. Same again, and again, and again. This wasn't such a shock, really, as the presenter's own toy-sized demonstration had actually done more work under the "optimized" algorithm than under the naive one; the optimized algorithm was predicted to work better, in the average case, on reasonably large heaps. So we tried an array of 8,000 elements, and the naive algorithm still won. Then 10,000,000, and it was pretty close: the optimized algorithm won one match out of five and lost only narrowly the rest of the time. Then 20,000,000, and the optimized algorithm won fairly consistently.
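Here is a sketch of the kind of head-to-head the class ran. Each match uses fresh random data, the running order alternates so neither sorter always goes first, and the optimized sorter's wins are counted out of five for each size. The post doesn't say what the student's optimization was, so the pair here is a hypothetical stand-in: a heapsort that builds its heap by repeated sift-up insertion versus one using Floyd's bottom-up heap construction. The class name `HeadToHead` and the helper `winsForB` are likewise hypothetical. Sizes stop at 1,000,000 so the sketch finishes quickly; the post's 10,000,000 and 20,000,000 can be substituted. No particular outcome is implied, since results depend on the JVM and the machine.

```java
import java.util.Random;
import java.util.function.Consumer;

public class HeadToHead {
    // Hypothetical "naive" variant: build the heap by inserting elements one at a time (sift-up).
    static void heapsortSiftUpBuild(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int c = i;
            while (c > 0 && a[(c - 1) / 2] < a[c]) {
                swap(a, c, (c - 1) / 2);
                c = (c - 1) / 2;
            }
        }
        extractAll(a);
    }

    // Hypothetical "optimized" variant: Floyd's bottom-up heap construction.
    static void heapsortFloydBuild(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) siftDown(a, i, a.length);
        extractAll(a);
    }

    // Repeatedly move the max to the end and restore the heap on the shrinking prefix.
    static void extractAll(int[] a) {
        for (int end = a.length - 1; end > 0; end--) {
            swap(a, 0, end);
            siftDown(a, 0, end);
        }
    }

    static void siftDown(int[] a, int i, int n) {
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;
            if (child + 1 < n && a[child + 1] > a[child]) child++;
            if (a[i] >= a[child]) return;
            swap(a, i, child);
            i = child;
        }
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    // Time one sort of a fresh copy; only the sort call is timed.
    static long time(Consumer<int[]> sorter, int[] input) {
        int[] copy = input.clone();
        long start = System.nanoTime();
        sorter.accept(copy);
        return System.nanoTime() - start;
    }

    // Play `matches` rounds on fresh random data, alternating which sorter runs first;
    // return how many rounds sorter b was faster.
    static int winsForB(Consumer<int[]> a, Consumer<int[]> b, int size, int matches, long seed) {
        Random rng = new Random(seed);
        int wins = 0;
        for (int m = 0; m < matches; m++) {
            int[] data = rng.ints(size).toArray();
            long ta, tb;
            if (m % 2 == 0) { ta = time(a, data); tb = time(b, data); }
            else { tb = time(b, data); ta = time(a, data); }
            if (tb < ta) wins++;
        }
        return wins;
    }

    public static void main(String[] args) {
        Consumer<int[]> naive = HeadToHead::heapsortSiftUpBuild;
        Consumer<int[]> optimized = HeadToHead::heapsortFloydBuild;
        // Untimed warm-up of both sorters before any measurement.
        int[] warm = new Random(0).ints(1000).toArray();
        naive.accept(warm.clone());
        optimized.accept(warm.clone());
        for (int size : new int[] {8, 8000, 1_000_000}) {
            int wins = winsForB(naive, optimized, size, 5, size);
            System.out.println("n=" + size + ": optimized won " + wins + " of 5");
        }
    }
}
```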

So some valuable lessons were learned about empirical efficiency-testing. Not what we set out to do, but you take teachable moments where you find them.