It matters to me (or the lost art of program tuning)

In my last post I talked about PL/I programming, which I’ve again fallen in love with, and how slow my small engineering programs run on my Z80 single board computer compared to a modern PC.

I was actually incorrect in my last post – and the real ratio is even worse than I said. The PC version was so quick one could not really time it by hand; there are tools on Unix boxes to time program execution, but I didn’t use them. A more realistic estimate for the larger data set would be about 1/2 second, not 1 second. Against the Z80’s roughly 242,400 seconds (2.8 days) for the same job, that makes my little Z80 program nearly 500,000 times slower (at least).

But something else about that program has been bothering me for the past few days. I mentioned in my last post how the ‘exp’ (or ‘**’) operation had a bug that prevented the double precision version from working with negative numbers. Normally there’s no problem – square -2 and you get +4; cube -2 and you get -8. It’s all just serial multiplication, so it bothered me that it was not working. My fix was to add ‘abs’ before the term to remove any negative value; that worked, but I wasn’t happy with the fix, as it wasn’t ‘totally correct’ from an aesthetic standpoint.

The solution was to return to my programming roots. Back in the early 1980’s I worked writing Reservoir Simulation programs. These huge FORTRAN programs could consume every calculation cycle of very, very large computers. Optimization beyond what a compiler was capable of became essential. There were many tricks we used to squeeze every bit of performance out of such programs.

One technique involved removing subroutine calls where possible by ‘inlining’ code. It’s ugly, and counter to all ‘structured programming’ rules, but it works. Another technique analyzed operations: division is much more ‘expensive’ (in CPU cycles) than multiplication, with addition being about the cheapest. Loops have set-up time, so ‘unrolling’ loops might also be done in simple cases.

My analysis of this particular PL/I program showed the line with the error in it had multiplication, division, and that exponentiation – and now I had added a call to the ‘abs’ function on top. I knew there was a way to fix both problems and make it run faster: convert ‘exp’ (or ‘**’) into serial multiplication and remove the ‘abs’ call. That would not only fix the negative value error (multiplication always works) but also remove two function calls and their related overhead (push stack, pop stack).

I decided to write a new function, ‘doexp()’, which used a loop to perform the serial multiplication. I knew I was adding the overhead of one new function call, but removing two function calls and a (presumably) complex general-purpose exponentiation routine. It won’t work for fractional exponents, but this program was always limited to whole-number exponents anyway.

The new function coded, I tried the smaller data sets. All showed roughly a 1.5-times speed improvement. I then ran the big data set, and early timing of the time steps shows it running 2 times faster than the original program. The entire job, which would have taken 2.8 days, will now run in 1.4 days. For program tuning, this is a huge improvement.

As the post title says, it matters to me. It was also great fun to see it work.

Just How Slow is my Z80 single board computer?

Officially, the Z80 single board computer from CPUville has a 1.8432 MHz crystal clock. Compare that to a modern PC running multi-gigahertz clocks and it’s pretty slow indeed. But sometimes the real test is in running ‘real programs’.

As stated in prior posts, I’ve been playing with PL/I on my Z80 single board computer under CP/M 2.2, and I love it. PL/I is turning out to be the amazing and fun language I learned back in 1980, and it’s a blast. One thing I really love about PL/I is the total control over output, done easily with output formatting capabilities, plus the really great error handling.

I had an obscure bug in my latest program, a multi-phase concentration-with-time program from my Engineering graduate days. It would not run on one of the numerous test data sets, instead crashing immediately with ‘Error 3’. Looking up the error: ‘A transcendental function argument is out of range.’ Some help. However, transcendental functions include exponentiation, and the program had just such a line. Typical debugging means putting in a bunch of print statements and then wading through reams of output to try and trace the problem.

In PL/I, you can use the ‘on’ construct to trap errors. By adding an ‘on error(3) begin… end;’ block I was able to immediately isolate the subroutine where the error occurred – and yes, it was the exponentiation line. Adding some print statements now made sense, and I quickly found that, in double precision, squaring a negative number was causing the problem. Squaring a negative number is perfectly legal (the result is positive), so I had to find a fix. Fortunately, using the ‘abs’ function solved the problem for this case, as no real data (i.e. fluid concentration) should be negative in these runs. With the problem fixed, it was time to run all the example cases.

All but one ran in decent time. But problem 10, which has 600 time steps, was still going after two days. With no console output, I wasn’t sure if the program had crashed or was just taking a long time.

A few more print statements (with a debug flag to turn them off later), and a stopwatch, and I found this particular run was taking 6 minutes 44 seconds (404 seconds) per time step. With 600 time steps in the run, that’s 242,400 seconds – 4040 minutes, or 67.3 hours (2.8 days!). No wonder I thought it was taking a while.

So about slowness… when I was wondering if the run was ‘broken’ or not, I ran the test case on my Win7 PC (in FORTRAN, but close enough). It was so fast you could not really time it by hand. Let’s say for argument’s sake it took 1 second. Against the Z80’s 242,400 seconds (4040 minutes), that makes the Z80 roughly 240,000 TIMES SLOWER!

Oh well, this is all about fun, so the full run on the Z80 will be something to look forward to. Until next time…

Sin(x) Taylor Series, finally

In my last post I mentioned that I would turn on debugging to see where the program failed before resorting to writing a double precision version.

As it turns out, the program wasn’t really failing: the precision I set on the output format caused the CONVERSION error, and the OVERFLOW error was simply a division whose result was beyond the range of single precision. The first was fixed with a simple format change, f(11.7) from f(7.3); the second… well, that needed double precision.

It is very easy to convert the program to double precision: a global replace of ‘float binary’ with ‘float binary(53)’ and it’s done. Since PL/I requires ALL variables to be declared, that one replace catches them all. Recompile & link, finished.

Except… Digital Research (DR) PL/I for Z80 up to V1.3 explicitly does NOT support double precision; it’s stated quite clearly in the manual. I needed V1.4 for double precision support.

After searching the ‘net and coming up blank, I posted on ‘retrocomputingforum.com’, and immediately EdS came to the rescue with a link to DR PL/I 1.4. I downloaded it, installed it, and the program compiled and linked.

Running the program with 11 terms is now not a problem. Using my higher precision formats, I could run the program from 0 to 4Pi with 11 terms. Below I show the special cases (exact multiples of Pi), and you can see how the result differs from actual at 4Pi:

Special cases - sin( Pi):
sin x( 3.14) = 0.0000002 (calculated) 0.0000002 (actual) -0.0000000 (diff)
Special cases - sin( 2Pi):
sin x( 6.28) = -0.0000058 (calculated) -0.0000003 (actual) -0.0000055 (diff)
Special cases - sin( 3Pi):
sin x( 9.42) = -0.1299004 (calculated) -0.0000000 (actual) -0.1299003 (diff)
Special cases - sin( 4Pi):
sin x( 12.57) = -158.2498658 (calculated) -0.0000006 (actual) -158.2498652 (diff)

I now know that if I run 13 terms, it should be accurate to 4Pi, but I think I’ve effectively run the course on this program.

Next up: More conversions from FORTRAN to PL/I. Last time it was an Analytical Well Model program, which gave me a scare in PL/I V1.4 (compared to V1.3) and a couple of programs that “solve the continuity equation for composition variation with time and distance for a one dimensional, two phase, three component system”. Lots of fun. 🙂

Sin(x) Taylor Series – Revisited

I received a message on an excellent discussion group that I follow (retrocomputingforum.com) regarding my last post on the divergence of my calculated Sin(x) results vs. ‘actual’ using PL/I and my sin(x) program.

The gist of the post was that the Taylor series *should* converge to the correct answer for all values of x, so long as there were sufficient terms in the Taylor series. The poster (EdS) went on to describe some tests he had done showing practical limits for x based on the number of terms in the series. Sure enough, 5 terms started to diverge above Pi, while 9 terms was good to over 2Pi.

In response, I rewrote my Sin(x) program to ask for user input: first the upper range (calculating from 0 to nPi where n is input) and the number of Taylor series terms (from 5 to 13). Using the new program, I calculated Sin(x) from 0 to 2Pi with 9 terms, and the results were accurate until close to 2Pi, instead of diverging almost immediately above Pi.

I tried 11 terms to 4Pi, but the program terminated with an OVERFLOW error. I then tried going to 4Pi using 9 terms, but received a CONVERSION error. Both indicate the program is at the limit of the PL/I compiler’s floating-point precision, which is 24 bits for versions 1.0 and 1.3 (float binary(24)). I read in the docs that version 1.4 allows double precision (float binary(53)), so I found a copy and have now installed it.

Before I create a double-precision version of the program, I’ve turned on all debugging just to see where in the Taylor series calculations the program is failing.

More to come…

Z80, Taylor Series for sin(x) and why limits matter

I have been thinking about why the results of my calculations of sin(x), using the widely published Taylor series, have been inaccurate for values of ‘x’ between PI and 2PI.

I think, reading between the lines of many, many posts on the subject, I have an answer.

The series appears to be accurate between -PI and PI. I cannot find a definitive statement to this effect, but it certainly appears to be the case based on a lot of discussion, questions and answers, and ‘chatter’.

By now I’ve tested the algorithm using single precision (on my Ubuntu Linux box), double precision, and even swapping out my hand-coded ‘power’ method for a library method. In all cases the results were the same: between PI and 2PI, results diverged from ‘actual’.

Note: PL/I on the Z80 does not have a ‘pow’ method, which is why I had to write one. Having written it for the PL/I program, I kept it in my C programs (single and double precision) for comparative equality between the C and PL/I versions.

I recoded the program to calculate between -PI and PI, and also added a difference output (calc – actual) to see the actual divergence. Using -PI to PI, the results are much closer across the entire range.

My conclusion, barring new information, is that the published Taylor series for calculating sin(x) (‘x’ in radians), truncated to the handful of terms usually shown, is only accurate in the range of ‘x’ between -PI and PI.

The Continuing Saga of the Z80 Singleboard Computer

I’ve already posted recently about the fun I’m having with the Z80 single board computer (a kit from CPUville).

In addition to a FORTRAN compiler (Microsoft F80), I added the High Tech C compiler. I’ve written programs in FORTRAN, C and 8080 Assembler, using both the CP/M 2.2 ASM assembler and the M80 assembler that came with the F80 compiler. Except for one instance where my port-reading assembly program wouldn’t actually read the port, it’s been fun and games.

I’ve even created assembler programs that can be called from FORTRAN (the aforementioned port reading routine).

Last week, while exploring the various archives of CP/M software, especially compilers, I spied the Digital Research PL/I compiler. That looked really promising.

Back at my first job after my B.Sc., I worked at an IBM shop that sent me on a PL/I course. Afterward I spent the next year writing software in PL/I for a pair of IBM 3033 mainframes. It was all great fun.

Finding a working PL/I compiler was too good to pass up, so I grabbed the archive and beamed it over to the Z80. After a bit of digging, I found my 1980 PL/I reference book, “PL/I Structured Programming” by Joan K. Hughes (2nd edition, Wiley). After reading through it to refresh my memory, I started building a few PL/I programs, following the examples in the book and then the chapter problems.

Some features of the IBM compiler were not available in the Digital Research (CP/M) version, but I had the DR PL/I documentation to help me with the transition. Eventually I had written several working PL/I programs.

The past few days I’ve been playing with a Taylor series program for calculating Sin(x) (x in radians). I have the program working, but the answers diverge from ‘actual’ values in the range PI to 2PI. I had full debugging in the code, but could not really see the reason.

I decided to try converting the PL/I program to C, and then running it on a modern C compiler on one of my Ubuntu 18.04 servers.

SURPRISE!!! The C program has the exact same divergence! Even switching from ‘float’ to ‘double’ didn’t remove the divergence in the C program on a modern machine. I’ll definitely have to investigate further.

Just for fun I then took the working C program and beamed the code over to the Z80. The High Tech C compiler is sound, so it compiled the program easily. The run on the Z80 gave the exact same answers (with a small nod to precision differences between platforms) as both the C/Linux version and the PL/I CP/M version. It’s either a really difficult-to-find coding mistake in my work, or a real phenomenon. As I said, I’ll have to investigate.

Where it all gets cute is timings. On the Linux box (big AMD 6-core server with loads of memory) the C program runs so fast it would be timed in milliseconds were I to try. Certainly faster than one could manually time it. The PL/I program on the Z80 takes 3min, 40.15 seconds to run. What was a surprise is the C program on the Z80 took 5min, 34.40 seconds! I never expected a C program to be that much slower than the PL/I program.

Now that I have FORTRAN, PL/I, C and Assembler all working, time to continue playing.

One last thing: I found a printing “bug” in the PL/I textbook. The formula for the Sin(x) Taylor series has two major errors. First, the terms have denominators of (2n+1)! (factorial) – 3!, 5!, 7!, 9! in the expanded formula. But some typesetter must have thought the ‘!’ an error, as the book replaced it with a 1, giving denominators of 31, 51, 71 and 91. Not a small error when you are coding! The other error: the terms alternate in sign (a factor of (-1)^n), so x - a + b - c + d and so on. The book had all + signs.
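Written out, the corrected series (factorial denominators, alternating signs) is:

```latex
\sin x \;=\; \sum_{n=0}^{\infty} \frac{(-1)^n\, x^{2n+1}}{(2n+1)!}
\;=\; x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \cdots
```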

When debugging the massively incorrect results, I simply did a Google search on ‘series solution of sin(x)’, found the correct formula, and coded that. It is that corrected formula that still diverges from actual results for values of ‘x’ greater than PI.

When old beats modern (a computer story)

If you asked me to comment on whether an old computer technology could beat modern technology, I’d give the obvious answer: no.

Except my recent explorations with the two have proven that in some cases the exact opposite is true. In my case, I’ve been playing with a Z80 single board computer based on a design from the 1970s. It’s a solid design, and the implementation by “CPUville” is awesome.

The only bit of “new” in the system is a wonderful little device known as an IDE to SD card interface. The CPUville Z80 single board has an IDE interface and connector which accepts this $10 board, which in turn accepts an SD card to act as a hard disk for the system. It all works, and works exceptionally well. I end up with 4 large hard drives for the Z80, which is running CP/M 2.2.

And that is where the magic, and indeed the beating, takes place. No, the Z80 is not faster than a modern computer; it’s much, much slower. But CP/M was an almost fully realized operating system with a rich user community and a lot of available software, and that is where the difference lies for what I do.

I love writing programs, and especially in the older languages – FORTRAN, C and now PL/I. I used PL/I in my very first job in 1980 and 1981 in an IBM mainframe shop, and quite enjoyed it.

On other systems, getting compilers has been difficult, but for CP/M 2.2 there are extremely good compilers for FORTRAN (Microsoft’s F80), C (High Tech C) and now PL/I (Digital Research’s). What is even more wonderful, they all work very well and compile my old programs nicely.

Of course there are quirks and things one must learn (or re-learn) but it’s all fun.

Which brings me to the “beats modern” part of this post. You see, at this moment I can’t compile C programs on my Windows 7 PC. I try, but the compiler I’m using (MinGW) is 32 bit and my version of Win7 is pure 64 bit, and the two currently hate each other. I know that I will eventually fix the problem, but I’m not in a hurry because I have several Linux boxes plus a Macbook, so I have C compilers available.

Getting FORTRAN was a bit tougher, but eventually I found a nice set of FORTRAN compilers for the Windows machine that work well. But with PL/I I’ve currently hit a wall.

I found a Linux PL/I compiler, but the archive is broken (unreadable). I cannot find a PL/I compiler for Win7.

But equally interesting is the fact that getting the compilers running on the Z80 CP/M system was just … easier. The three compilers not only “just work”, they all tend to work in very similar ways. I have a good manual for FORTRAN, plus one for C, so I was able to get going quickly. For PL/I I have a programming manual, but nothing on ‘how to compile & link’. What is really cool is that, knowing how to compile & link with FORTRAN and C, I just typed the same commands at PL/I and it worked. I’m sure there are options I don’t know about, but basic operations are working fine.

And I’m loving it. Now if only I could get all these tools working as well on my other platforms.

More Fun with the CPUville Z80 Single board

During the Christmas break I built the CPUville Z80 single board, plus the ‘slow board’, which is a really nice ‘blinkenlights’ display board for the Z80 single board. That was a fun build.

I also built the CPUville 8-bit computer (3 boards) plus register display board and added that to a separately built Z80 single board (with Z80 replaced by the 8-bit boards). That was also a fun build.

But the real fun began with the Z80 single board once I added an IDE 40-pin to CF (compact flash) controller board with a 4+ GB CF card. Following the CPUville instructions, I was able to modify/compile/install CP/M 2.2 on the CF card, giving me 4 large “hard disk” CP/M partitions.

THEN… I started playing. After reading a few CP/M manuals, I began to learn my way around the system. ED was perhaps the hardest to learn, only because the first manual neglected to mention that the edit buffer is NOT filled ‘on entry’ – one has to type ‘#A’ to load it with the file contents before you can see or edit anything.

I started with a few Z80 (or 8080) assembler programs, then found and loaded FORTRAN (F80). I then spent days playing with the FORTRAN programs I wrote in the 1980’s during my Engineering degree and post-grad courses. Interestingly, they compiled and linked more easily than when I tried them on the PiDP-8 replica I built several years ago; the FORTRAN in F80 was just a bit more modern than the FORTRAN IV on the PiDP-8, making things much easier and more fun.

Last week I found and loaded the High Tech C compiler on the Z80. I compiled a few C programs from my earlier C programming days, as well as a few versions of the “calculate PI to N digits” programs. Again, tons of fun.

The interesting bits came trying to install the C compiler. It’s a lot of files, and when I tried loading them individually via “PCGET”, they crashed the terminal program. Seeking a better solution, I tried LZH unpack programs (didn’t work on modern LZH files), and eventually found that using modern WINZIP and an old CP/M UNZIP18.COM program, I was able to load whole groups of files to the Z80 and then unzip them in place. The only condition is that the CP/M unzip does not understand ‘modern’ zip methods, so you must zip them on the PC (Windows 7 in my case) with NO COMPRESSION.

The other ‘gotcha’ I discovered tonight is that you must be sure the ZIP files are named in CAPITAL letters. If you unzip lowercase named files on the Z80, they remain lowercase and kind of ‘disappear’ to CP/M. I could not even delete them until I asked on the ‘comp.os.cpm’ google group and was told about NSWEEP (or NSWP.COM). That program was able to delete them easily. I then rebuilt the zip with uppercase file names and it was fine.

So onward and upward with this wonderful true Z80 computer running CP/M 2.2, with FORTRAN, C and 8080/Z80 assembler.

The Wonderful World of Old

I’ve written before about the fun I’ve been having building and running old hardware systems, such as PDP-8/I and PDP-11 replicas. These both use faithful scale recreations of the machine’s front panel, complete with ‘blinkenlights’ and switches. Both use a Raspberry Pi (3B+) running a program called SIMH to faithfully recreate the hardware. The PiDP-8 runs the DEC OS, while my PiDP-11 runs 2.11BSD on top of SIMH.

Both have been much fun. The PiDP-8 I used to run some of my ’80s FORTRAN programs, while the PiDP-11 ran C and a simple web server.

I also built an Altair 8800 replica called the Altairduino that gets its power from an Arduino board, again running simulation software to mimic the Altair. I confess I haven’t done much with this system, even though it came with an SD card full of software, including CP/M.

But this winter I spent some time building two kits that really brought back my enthusiasm for the Z80, and actually have me learning and running CP/M for the first time. (I was a TRSDOS, then NewDOS, TRS-80 user in the ’80s.)

Both kits come from ‘CPUville’, a fellow who designed, built and now sells kits for the Z80 single board computer of Byte fame. The first kit was a single board Z80 system, complete with IDE interface and true RS-232 serial port. It runs a true Zilog Z80 CPU at 1.8432 MHz, which was fast ‘in the day’. It supports a second ‘display’ board that offers all the LEDs and switches to see and interact with the Z80 in real time.

The second kit starts with the Z80 single board, but then replaces the Z80 with a set of 3 8-bit CPU boards that use discrete logic chips instead of a single processor. I topped it off with its own display board (LEDs and switches) and it’s a fully functional 8-bit computer.

But the real fun came when I started playing with the Z80 single board and installed CP/M on an IDE-to-SD-card ‘hard disk’. With 2+ GB to play with, it’s like a world of CP/M disks all in one. I started by installing CP/M 2.2, then Microsoft FORTRAN-80, and today High Tech’s C compiler.

There have been frustrations, such as learning to use the ED editor and other OS programs (PIP, anyone?). But the ‘proof of the pudding’, as they say, has been in how well it runs my ’80s vintage FORTRAN programs. Even though it’s only an 8-bit computer, and thus suffers from a lack of precision – the default integer is only 2 bytes (16 bits) long – it has successfully run most of my programs. There are a couple that are simply too big for the 64K RAM, but otherwise it’s been a blast.

Today I played with compression/archive programs (to get the C compiler installed) and now it’s happily calculating the first 1000 digits of PI.

It’s slow, but what I love the most is that it is NOT a replica, nor a simulation – it’s a real, honest-to-goodness Zilog Z80 on a single board, talking to an older laptop (with a real RS-232 port in the back) via a non-simulated SERIAL interface (actual 9600 baud), and I couldn’t be happier.