Vendors who should know better (a server nightmare)

Gigabyte and AMD, I’M TALKING TO YOU!!!

I need a new server for JupyterHub, and since I do like building servers and such things, I decided to do some research and buy a decent lower-cost “server-as-parts”.

I found from many reviews that the Gigabyte B450M DS3G motherboard, paired with the AMD Ryzen 5 2600 CPU, was a killer low-cost solution. I added appropriately fast (3000MHz) Corsair DDR4 memory (16GB to start) and an M.2 250GB SSD, all to go into a 2U rackmount case with an EVGA 500W power supply.

After all the bits came, I carefully assembled it and tried the first “smoke test”. It ran, but immediately gave a set of BIOS “error beeps”. Specifically “long-short-short” which means NO VIDEO for this BIOS.

Sure enough, plugging either a known-good HDMI or DVI cable into a working monitor gave nothing.

Searching the internet proved this to be a VERY common problem, known since at least Nov 2018. Essentially, the motherboard ships with a BIOS version that’s too old – it predates the new CPU and doesn’t know about its on-board video.

The solution is to flash a new BIOS… but how? With no video, you can’t see what’s going on in order to flash a BIOS. More expensive motherboards have “Q-Flash Plus”, which lets you put the BIOS on a USB key in a special USB slot and it “just flashes” – no video needed. My motherboard, the less expensive one, doesn’t have that feature. It can update from a USB key (Q-Flash), but not “the Plus”.

AMD’s solution is to have you request a “boot kit”. They send you a lesser (older) CPU “on loan” to fire up the motherboard so you can flash the BIOS, then you send it back. However, it was instantly obvious they have zero intention of doing this – you must “prove” you own the chip by taking a photo of the CPU clearly showing the serial number and model. PROBLEM: those markings are now covered with opaque white thermal compound if you’ve installed the supplied CPU cooling fan, as any intelligent builder would do. So AMD wants you to scrape off the thermal compound, take the photo, then use ??? (what???) when you finally put it all back together. Well, I’m not stupid, so I’m not running a CPU “dry”. Which means I can’t take the obligatory photo, so I can’t have the “boot kit”. What a bunch of idiots (and I told them so by reply email and in an on-line review).

Next idea: put a PCIe graphics card into one of the PCIe slots and get boot video that way. I was able to score a very old PCIe VGA/DVI card from a local computer company’s scrap bin, and sure enough, it WORKED!!!

It sticks up an inch above the case, so it’s not a permanent solution, but it worked, and I had VGA to see the BIOS screen.

After reading the BIOS update procedures, I carefully updated the BIOS to the latest version. Everything works… EXCEPT THERE’S STILL NO ONBOARD VIDEO!!!

I’ve got a second trouble ticket in with Gigabyte, but who knows when they’ll answer.

Since this is a server, I could buy a shorter $50 PCIe graphics card and just use it to install Ubuntu, as the server will never actually be connected to a video monitor unless there’s a problem.

BUT WHAT WERE THESE IDIOTS THINKING – SELLING STUFF THAT DOESN’T WORK AND THEN HAVING ABYSMAL CUSTOMER SUPPORT (and the latest BIOS still doesn’t work).

If this was ‘bleeding edge’ like a game machine, I could see this as a typical issue, but this isn’t bleeding edge stuff – or shouldn’t be.

WORST CASE, BOTH CHIP AND MOTHERBOARD GET RETURNED IN MARCH.

Well, this is unexpected (a server story)

As the title says, I’ve been having a most weird server experience, culminating in a rather fascinating and unexpected discovery.

As posted recently, I’ve been experimenting with Jupyter Notebooks using JupyterHub on Ubuntu 18.04 Server.

I started with a server built on Oracle’s VirtualBox 5.x running on my development machine, which is an Intel quad-core i7 with 16GB of memory and a couple of smaller SSDs. I gave the virtual Ubuntu 8GB of memory and 2 cores, plus 64GB of disk space. This is where I cut my teeth on installing Jupyter, first locally, then JupyterLab, then JupyterHub (again locally), before finally installing JupyterHub globally. Along the way I learned quite a lot, and took those lessons to all the other platforms via some detailed documents I wrote.

The first Ubuntu was a desktop install, complete with lots of X-type stuff. It was fast, it was good, but I wanted a more “dedicated” server.

My second server was Ubuntu Server running on my Windows Server 2008 R2 file server. It’s a backup file server, so it wasn’t doing much. I installed Oracle VirtualBox, this time V6.0, and Ubuntu 18.04 Server, as I don’t need the X stuff and wanted a lean, fast server rather than a desktop (the ISO install images differ quite dramatically in size). This machine is a quad-core Xeon of recent vintage.

The server only had 8GB of physical memory, so I could only give the virtual server 4GB. As a result, it was very slow.

About that time I resurrected a ‘pizza box’ 1U quad-core Xeon server that also had 8GB of memory (it was the max for that vintage machine). As this was a dedicated box, I could install Ubuntu 18.04 server as the native OS and give it all the memory. After installing JupyterHub, it seemed… VERY sluggish. Opening notebooks took a very long time (minutes) and sometimes they would not open at all. I experienced problems connecting to the kernel, and it was just very frustrating.

I’d deleted both virtual machines, so I decided to try another on the development i7 box. Giving it 8GB, 2 cores and 64GB of disk as before, I installed JupyterHub.
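For anyone repeating this setup, the VM can also be created from the command line rather than the VirtualBox GUI. A minimal sketch using VBoxManage, with a hypothetical VM name and disk path (not my actual setup):

    # create and register the VM (name and OS type are illustrative)
    VBoxManage createvm --name jhub --ostype Ubuntu_64 --register
    # 8GB of memory and 2 cores, as described above
    VBoxManage modifyvm jhub --memory 8192 --cpus 2
    # a 64GB virtual disk attached to a SATA controller
    VBoxManage createmedium disk --filename jhub.vdi --size 65536
    VBoxManage storagectl jhub --name SATA --add sata
    VBoxManage storageattach jhub --storagectl SATA --port 0 --device 0 --type hdd --medium jhub.vdi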

At this point, I have two almost identical servers with the same memory. The quad-Xeon has 4 cores, the virtual i7 has only 2, but otherwise things are very close.

And here came the unexpected surprise. The i7 virtual machine is easily 10x faster to my perception than the Xeon. It’s truly a night-and-day difference. Where the Xeon is sluggish to open notebooks and connect the kernel (if it even succeeds), the virtual i7 is quick and responsive. Editing notebooks is a joy instead of a grind. Things are quick, kernels don’t die, and it’s just a totally different environment.

Yet aside from the hardware, everything about the installs is identical. Even the notebooks come from the same GitHub repo, so they are identical too.

Today I ran some benchmark tests on both machines, and every test shows the i7 virtual machine (with 2 cores) is double the speed of the quad-core Xeon with 4 cores. It’s astounding.
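The exact numbers aren’t important here, but for anyone wanting to reproduce the comparison, a quick CPU test along the lines of sysbench works (sysbench is my example here, not necessarily the suite that produced my numbers):

    # quick-and-dirty CPU benchmark, run once per machine
    sudo apt install sysbench
    sysbench cpu --threads=4 run    # use --threads=2 on the 2-core VM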

Rebuttal: Why I DO Like the Linux Model

OK, this does seem like a bit of a back-pedal, doesn’t it?

Well, that’s the thing about the “Linux Model” – the very things that are so irritating can also be the reason it works.

Let me explain by returning to one example I mentioned in my prior post: using C++ in JupyterHub.

To recap: C++ used to work in JupyterHub, then suddenly stopped after an update of some packages in Conda. Conda is the package and environment manager that JupyterHub lives under, and works a lot like apt-get on Linux. After one update, all things C++ failed, to the point that the kernel would not load at all. An examination of the logs revealed <features.h> was missing, along with many other library errors.

A simple Google search revealed many people with similar (but NOT the same) problems, and many complicated workarounds.

This is one of the problems with the Linux Model. The many “solutions” can often make the problem infinitely worse – worse to the point you throw up your hands and just rebuild from scratch, which I most certainly did NOT want to do. Part of the problem is that “solutions” can come from anyone in the community: seasoned pros or first-time amateurs. Most don’t document what they are doing very well, so you make assumptions… and get into worse trouble.

The Linux Model solution is to try to find an authoritative source. Usually this means contacting the team that developed the “thing” that’s broken. Often (and again a failing of the open Linux Model) the team has moved on to other things and really doesn’t care about, or maintain, the broken thing. In such cases you are pretty much hooped, unless you can get the code and love delving into ancient artifacts.

It also requires a LOT of digging in many cases to find the team, or else… EXPERIENCE knowing where to look.

Fortunately, I was beginning to acquire that experience (and NO, it’s most definitely NOT from the Stack Overflow group of websites, but that opinion is for another day). After starting with JupyterHub, I began noticing that a lot of the projects were hosted on github.com. I’ve used GitHub before, but only to download/install things. With Jupyter, I began noticing a lot of activity happening on the “Issues” tab. Here I discovered the magic: if the project was active, the developers READ the issues and would comment/reply.

Knowing this, I returned to my C++ problem. I found the package on GitHub and used Issues to contact the team with my problem… “it’s busted”, but stated more “unix like” 😀

Within an hour, one of the developers contacted me to say they’d changed the way they distributed the package for the very reason I mentioned (C++ library problems). They had rewritten the distribution and moved the package from a custom channel to the standard Conda channel, “conda-forge”. However, the old code was still “out there”. I was told to grab the new code and it should work.

I did this, and it didn’t work. However, having chatted with a developer, I simply updated my “Issue”. The next day I received a reply: remove EVERYTHING from the old distribution channel. Using “conda list” I could clearly see that MANY packages (not just the base C++ package) came from the now-bad channel. After removing all of them and reinstalling the main package from the proper channel (conda-forge), I tried my C++ example and it worked perfectly.
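For anyone facing the same cleanup, the sequence went roughly like this. The package names below are placeholders – I’m assuming the C++ kernel package here, and the real list comes from your own “conda list” output:

    # show every installed package and the channel it came from
    conda list
    # remove everything that came from the old custom channel
    # (these names are placeholders -- take the real list from 'conda list')
    conda remove xeus-cling cling xtl
    # reinstall the C++ kernel package from the proper channel, conda-forge
    conda install -c conda-forge xeus-cling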

So the Linux Model does work, but you have to do a lot more homework and find the place where the developers hang out with the current code.

For Tomcat, that’s the tomcat-users or tomcat-dev mailing list. For my 8-bit computer replicas, that place is some specific Google Groups. For most things involving JupyterHub, that place is the appropriate github.com repo (and its Issues tab).

My final thought for now on the Linux Model is that it does work for almost anything current. The big bonus is that there is often a HUGE community of active developers who really want their work to be appreciated and used. Find them, ask properly worded, respectful questions, and you can see the Model work beautifully.

It’s Nice When Things Work

This is about LetsEncrypt, JupyterHub and Tomcat.

I built my JupyterHub server on a spare quad-core Xeon 1U “pizza box” server. It’s short on memory, because this generation of HP ProLiant server maxed out at 8GB, so that’s all I can put in it. Still, it works and is a good demo platform for JupyterHub and my Java course revision project.

JupyterHub really wants to be running secure HTTP (HTTPS) with a proper certificate. I put the server on a different port (not 443), but I can still reach it from my domain using packet-filter redirection in my firewall.
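On a Linux-based firewall, that kind of redirection is a one-line NAT rule. A sketch with made-up addresses and ports (my firewall’s actual rule syntax differs):

    # forward the external port to the JupyterHub box
    # (port and address are examples only)
    iptables -t nat -A PREROUTING -p tcp --dport 8443 \
        -j DNAT --to-destination 192.168.1.50:8443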

But – it wants that proper certificate. Typically one would just create a ‘self-signed’ cert using Java’s keytool and use that for Tomcat, but Jupyter wanted something else.
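For reference, the usual Tomcat self-signed route is a single keytool command, something like this (alias, password and validity are illustrative):

    # generate a self-signed certificate in a new Java keystore
    keytool -genkeypair -alias tomcat -keyalg RSA -keysize 2048 \
        -validity 365 -keystore keystore.jks -storepass changeit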

Fortunately, I found enough documentation and tutorials to install and generate a free LetsEncrypt certificate that worked perfectly with JupyterHub. There were issues, mostly involving the need to create the certificate manually, but once those were resolved it all worked.
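The manual creation step looked something like the following – a sketch assuming certbot’s standalone mode and a placeholder domain (there are other challenge modes, and yours may differ):

    # request a certificate using certbot's built-in web server
    # (domain is a placeholder; port 80 must be reachable from outside)
    sudo certbot certonly --standalone -d jupyter.example.com
    # the resulting fullchain.pem and privkey.pem land under
    # /etc/letsencrypt/live/jupyter.example.com/

JupyterHub then just needs its ssl_cert and ssl_key settings in jupyterhub_config.py pointed at fullchain.pem and privkey.pem.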

This past week I wondered “could I use the LetsEncrypt certificate with my Tomcat application?”. I searched the web, and found several rather conflicting accounts of how to do it. I tried a few, and all failed.

Eventually I found one that started with “forget all the difficult stuff you’ve read. Installing a LetsEncrypt ‘pem’ file into a Tomcat keystore is easy. Here’s how…”. I followed that two-command process and was immediately rewarded with full certificate security for my Tomcat application, WITHOUT having to create a browser exception for the certificate.
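I won’t reproduce his whole guide, but the two-command process is likely along these lines: bundle the LetsEncrypt PEM files into a PKCS12 archive, then import that into a Java keystore (paths and passwords below are placeholders):

    # 1. bundle the LetsEncrypt PEM files into a PKCS12 archive
    openssl pkcs12 -export -in fullchain.pem -inkey privkey.pem \
        -out bundle.p12 -name tomcat -password pass:changeit
    # 2. convert the PKCS12 archive into a JKS keystore for Tomcat
    keytool -importkeystore -srckeystore bundle.p12 -srcstoretype PKCS12 \
        -srcstorepass changeit -destkeystore keystore.jks -deststorepass changeit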

It is so very nice when something “just works” the way it’s supposed to work. It’s even nicer when you find simple, unambiguous instructions as a guide. Thanks to Maximilian Böhm and his guide here: https://maximilian-boehm.com/en-gb/blog/create-a-java-keystore-jks-from-lets-encrypt-certificates-1884000/

Why I don’t like the ‘Linux model’

There’s a thing I’m going to call the “Linux model”. Not because it pertains ONLY to Linux, but because most of what’s wrong with this model starts with Linux and the stuff that runs (best) on Linux.

In a way, this is really a story about all the stuff that’s broken in JupyterHub, but it goes beyond that… it’s the general model that’s broken, and the model really owes its roots to Linux.

Basically, when you install something on a Linux box – even the OS for the Linux box itself – it’s probably broken. That is, *something* won’t work after installing it, and there is no way of ever fixing it short of digging into some code somewhere.

Worse, the breakage is often super complex and intricate – buried in a log somewhere is a message like “package X failed due to expecting library Y to be x.x.x but was z.z.z”, or some similarly obscure thing that takes days to figure out, if you ever do.

You can paste the error into Google, and what you get most of the time is a dozen hits – all questions on StackOverflow asking the same thing and getting precious little of value in response.

Worse, you are expected to manually update packages on an almost continuous basis, and (of course) such updates often break things that were working fine before the update. Yet if you don’t update, something ELSE will break.

The entire model is broken.

What triggered this particular rant today is that I had spent ages figuring out how to (finally) install C++ into JupyterHub so I could run C++ notebooks. Yesterday, I found it broken. The log complains about a library *supplied by the maintainer of this C++ package* having the wrong date compared to what’s expected. It doesn’t matter. C++ in JupyterHub is now broken, and good luck finding anyone to respond with anything useful. Even less likely is that the C++ supplier will fix it anytime soon.

That’s the other problem with the Linux model. Everything is well documented and often supplied with tutorials. BUT… THEY ARE ALL YEARS OUT OF DATE. Worse, the stuff they describe has changed so much in the years since that you cannot follow the tutorial without ending up worse off than if you’d just thrown mud at a wall.

The biggest problem with the Linux model is that no one really cares. “I did this really cool thing in 2012, but now I’m bored and… who cares” seems to be the mantra of every developer. Nothing is maintained for long. It’s becoming obvious that nothing is really being used, either – otherwise the failures would be noticed and (hopefully) fixed.

Overall, it’s a really depressing time to be trying to actually do anything on a Linux box.