In my last post I talked about PL/I programming, which I’ve fallen in love with again, and about the speed of my small engineering programs on my Z80 single-board computer compared to a modern PC.

I was actually incorrect in my last post: the program is not 4000 times slower than a PC. In fact, the PC version was so quick one could not really time it by hand. There are tools on Unix boxes to time program execution, but I didn’t use them. However, a more realistic estimate of the PC run time for the larger data set would be about 1/2 second, not 1 second. That means my little Z80 program was 4000 x 2, or 8000, times slower (at least).

But something else about that program has been bothering me for the past few days. I mentioned in my last post how the ‘exp’ (or ‘**’) operation had a bug that prevented the double precision version from working with negative numbers. Normally, there’s no problem: for example, square -2 and you get +4; cube -2 and you get -8. It’s all just serial multiplication, so it bothered me that it was not working. My fix was to add ‘abs’ before the term to remove any negative value; that worked, but I wasn’t happy with the fix as it wasn’t ‘totally correct’ from an aesthetic standpoint.

The solution was to return to my programming roots. Back in the early 1980s I worked writing Reservoir Simulation programs. These huge FORTRAN programs could consume every calculation cycle of very, very large computers. Optimization beyond what a compiler was capable of became essential. There were many tricks we used to squeeze every bit of performance out of such programs.

One technique involved removing subroutine calls when possible by ‘inlining’ code. It’s ugly, and counter to all ‘structured programming’ rules, but it does work. Another technique analyzed operations; division is much more ‘expensive’ (in CPU cycles) than multiplication, with addition being about the cheapest. Loops have set-up time, so ‘unrolling’ loops might also be done in simple cases.

My analysis of this particular PL/I program showed the line with the error in it had multiplication, division, and that exponentiation. On top of that, I had now added a call to the ‘abs’ function. I knew there was a way to fix both problems and make it run faster: convert ‘exp’ (or ‘**’) into serial multiplication and remove the ‘abs’ call. It would not only fix the negative value error (multiplication always works) but also remove two function calls and their related overhead (push stack, pop stack).

I decided to write a new function ‘doexp()’ which used a loop to perform the serial multiplication. I knew I was adding the overhead of a new function, but I was removing two function calls and a (supposedly) complex general-purpose exponentiation routine. It won’t work for fractional exponents, but this program was always limited to whole number exponents anyway.

With the new function coded, I tried the smaller data sets. All reported a speed improvement of roughly 1.5 times. I then ran the big data set, and early timing of the time steps shows it’s running 2 times faster than the original program. The entire job, which would have taken 2.8 days, will now run in 1.4 days. For program tuning, this is a huge improvement.

As the post title says, it matters to me. It was also great fun to see it work.

Did you use an O(ln(n)) algorithm (squaring ln(n) times before the regular multiplies) or an O(n) algorithm for doing the serial multiplication?

It’s just a simple loop: sum=1.0; for i = 1 to exp { sum = sum * term};
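Spelled out as something runnable, that loop might look like the sketch below. It is written in Python rather than PL/I purely for illustration, and the name `serial_power` and the check for negative exponents are assumptions of this sketch, not the code in the actual program.

```python
def serial_power(term: float, exp: int) -> float:
    """Raise term to a whole-number power by repeated multiplication.

    Hypothetical stand-in for the loop described above; it assumes
    the exponent is a non-negative whole number.
    """
    if exp < 0:
        raise ValueError("this sketch only handles exponents >= 0")
    result = 1.0
    for _ in range(exp):      # one multiply per unit of the exponent: O(n)
        result = result * term
    return result


# Negative terms are no problem, since it is all just multiplication.
print(serial_power(-2.0, 2))
print(serial_power(-2.0, 3))
```

Squaring -2 this way gives +4 and cubing it gives -8, the same behaviour the post describes for plain multiplication.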

I know multiplication is faster than log and I suspect the log is at the heart of this problem in the library code, so I just went with the simplest alternative.

I’m quite surprised at what you’re finding: if ** uses a log and exp I’d expect it to be much slower than multiplication, not just half speed. But I suppose it depends on the exponents. It certainly explains the unwillingness to raise negative numbers to integer powers.
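To illustrate the suspicion in this thread: if a power routine is built as exp(y * ln(x)), a negative x fails at the logarithm before any exponent is considered. The sketch below is in Python and only assumes that structure; `pow_via_logs` is a hypothetical name, and nothing here is confirmed about how the PL/I library actually implements ‘**’.

```python
import math


def pow_via_logs(term: float, exp: float) -> float:
    """Compute term**exp as exp(exp * ln(term)).

    Assumed structure only, for illustration. It handles fractional
    exponents, but ln() is undefined for negative real numbers.
    """
    return math.exp(exp * math.log(term))


# Works (to within rounding) for a positive term.
print(pow_via_logs(2.0, 3))

# A negative term fails inside the logarithm, even for a whole-number exponent.
try:
    pow_via_logs(-2.0, 2)
except ValueError as err:
    print("log-based power failed:", err)
```

A general-purpose routine like this pays for one log and one exp on every call, which is one plausible reason it would lose to a handful of multiplies for small whole-number exponents.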

But as anon says, you can do a bit better than just multiplying – search for “exp_by_squaring_iterative”
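For anyone following along, the iterative squaring approach could be sketched roughly as follows. This is Python for illustration only, the name `power_by_squaring` is hypothetical, and it assumes non-negative whole-number exponents, as in the program being discussed.

```python
def power_by_squaring(term: float, exp: int) -> float:
    """Raise term to a whole-number power using repeated squaring.

    Illustrative sketch: the loop runs once per binary digit of exp,
    so the number of multiplies grows with log2(exp) rather than exp.
    """
    if exp < 0:
        raise ValueError("this sketch only handles exponents >= 0")
    result = 1.0
    base = term
    while exp > 0:
        if exp & 1:           # lowest bit set: fold the current power in
            result *= base
        base *= base          # term, term^2, term^4, term^8, ...
        exp >>= 1             # move on to the next binary digit
    return result


print(power_by_squaring(-2.0, 3))
print(power_by_squaring(-2.0, 10))
```

For small exponents the saving over a plain loop is modest, so whether it is worth the extra complexity probably depends on how large the exponents in the data get.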

True Ed, there are faster methods. The beauty of creating a function ‘do_exp()’ is that you can replace the ‘it’s working but brute force’ code with something more robust and elegant (and faster) without disturbing anything else in the program. Once I have the program running, I’ll probably do that.