JupyterHub Maintenance Nightmare

I've finally finished what turned into a week-long nightmare involving JupyterHub.

It started simply enough. I have always kept my own documentation on "how to do" various things. In this case I had recorded the steps to update all the packages in JupyterHub on an ongoing basis. JupyterHub lives inside something called Anaconda, whose package manager is called conda. Upgrading is supposed to be as simple as "conda upgrade xxx", where xxx is the package to upgrade.
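
To give a sense of the routine, each update is a single command, roughly like this (the package names here are examples, not my exact list):

    # routine refresh of JupyterHub-related packages (names illustrative)
    conda update conda
    conda update jupyterhub notebook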

I have been using this process since first installing Jupyterhub on a server back in January 2019.

Last week I started working through the 8 or so packages that I normally update, and one of them failed with a blizzard of messages and warnings. But that wasn't the worst part. The worst part was that it also killed conda dead. Even typing "conda -h", which should print a simple help page, instead produced a page of error messages warning of missing libraries before failing. Nothing I tried would work: conda was dead.

I grabbed my notes from January and started to reinstall anaconda/conda. It's really quite simple: delete the anaconda3 directory and reinstall from the installer script.
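
In shell terms, the whole "reinstall" is roughly the following, assuming the default install location under your home directory and the 2019.03 Linux installer script:

    # wipe the old install and start fresh (assumes the default ~/anaconda3 location)
    rm -rf ~/anaconda3
    bash Anaconda3-2019.03-Linux-x86_64.sh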

It all worked until I came to the last package: xeus-cling, which provides the C++ support I require. That installation failed with (again) a set of rather bizarre "missing stuff" messages.

I left it at that. After all, everything else worked, so I left a message on the xeus-cling GitHub page and waited.

This week I got a reply: conda 4.7.9 was broken, and conda 4.7.10 worked with xeus-cling. I checked, and sure enough I was on version 4.7.9. I updated ("conda upgrade --all") and tried xeus-cling again. It did not work. Instead, it just hung trying to resolve the 'environment'.

Eventually, after several trial-and-error sessions, I resolved the situation and once again have a perfectly good, working JupyterHub. I'm very glad that reinstalling anaconda3 is so simple, as I ended up doing it several times before I got the order of things correct.

In a nutshell, you must install anaconda3 (from the 2019.03 script, which ships conda 4.6.11), then install xeus-cling before anything else. Only in this way can you avoid the timeout problem with xeus-cling and conda 4.7.10. The reason: every install done via conda first updates conda itself to the latest version.
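
As a recipe, the order that finally worked looks like this (xeus-cling comes from the conda-forge channel; the jupyterhub line stands in for "everything else"):

    # 1. fresh Anaconda from the 2019.03 installer (ships conda 4.6.11)
    bash Anaconda3-2019.03-Linux-x86_64.sh
    # 2. xeus-cling FIRST -- this install also pulls conda up to the current release
    conda install -c conda-forge xeus-cling
    # 3. only then install everything else
    conda install -c conda-forge jupyterhub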

Once xeus-cling is installed, everything else installs just fine, and the system is back up and running. Still, it was a nightmare figuring out what was wrong, and I can't say I'm very impressed with conda for breaking things that were working; that is exactly what a pre-production environment is supposed to catch.

Server – A Nightmare No More

My last post about the new server indicated the Gigabyte motherboard was returned and an ASUS motherboard ordered as a replacement. I also ordered the cheapest video card I could find as there is/was no on-board video with the basic AMD Ryzen CPU.

The parts arrived in the first week of March, and I promptly put everything together. This time I installed the CPU and heat sink/fan on a sturdy table (with static protection), as well as the memory and M.2 SSD. The new motherboard is a bit longer than the first one; it nicely picks up some mounting standoffs on the end, leaving nothing unsupported.

The motherboard went into the case, and the power supply connected easily. There was an initial problem with the front panel connectors, but ASUS had a QR code linking to a very complete ‘motherboard connector and header’ manual that helped immeasurably.

With everything connected, I started the machine and was immediately rewarded by a good boot sequence and the video BIOS screen. After verifying the BIOS settings, I started to install the OS.

Here I had a problem. The 16GB USB data key was not recognized about 9 times out of 10. Finally I grabbed a different brand of data key (same size), re-flashed the Ubuntu 18.04 server image onto it, and was able to install the OS in very short order.
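
(For reference, on a Linux box re-flashing a key is the usual dd dance, something like the lines below. The device name is a placeholder you must verify first, since dd will cheerfully overwrite the wrong disk.)

    # write the server ISO to the USB key -- check the device with lsblk before running!
    sudo dd if=ubuntu-18.04-live-server-amd64.iso of=/dev/sdX bs=4M status=progress
    sync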

Once I was sure all was well, I buttoned up the case and installed it in my server rack. It's been running without issues ever since, with JupyterHub installed as the primary application. It's also blazingly fast compared to all my other Ubuntu boxes.

Overall I am now quite happy with the 6-core AMD Ryzen chip, though I still wish it had come with at least minimal VGA graphics, as that’s really all a server requires.

Server Nightmare, continued

When last we left our tale of woe, the server was running but without video. Messages to AMD and Gigabyte were unanswered, and the internet was not much help other than to suggest a BIOS update was needed.

Since then much has happened. I did finally hear from both vendors; more on that later.

In the meantime, I decided to try one internet suggestion: adding a separate graphics card so I could flash the BIOS. The motherboard has three PCI slots, but I didn't have a PCI video card. I called a local computer shop to ask whether they might have something in a "junk bin", and they did. I went to town (literally) and picked it up… FREE.

Back home I plugged it in, and it worked. I now had VGA video, which is all I really need for a server, and certainly sufficient to flash BIOS.

With video, I managed to flash the BIOS from the older version (F2), first to F3 and finally to F4 (the latest). There were a few issues along the way, but by then I had downloaded the full manual and used it as a guide. I also noticed a few things about the motherboard and USB data keys that bothered me at the time, such as the fact that booting with some keys plugged in locked up the USB keyboard, but I set that aside for the moment.

With the BIOS updated to “latest”, I removed the PCI graphics card, powered on, and… STILL no video. I was flummoxed.

However, help came from a surprising source. FINALLY I heard back from both vendors, and both said the same thing: the AMD Ryzen 5 2600 CPU does not have on-board graphics. This was a surprise to me, as there was NOTHING on the manufacturer sites or in the manufacturer materials supplied to Amazon.ca to suggest this was the case when I chose these components. However, given what I was seeing, it made sense. As it turns out, AMD sells two "things": a CPU with no on-board GPU, and a thing called an APU which has the GPU on-board. Who knew?

I decided I could live with this and sourced a cheap low-profile PCI graphics card, as the free one was full height and wouldn't fit in the 2U rack case.

I also decided to go ahead and install my server OS, Ubuntu 18.04.01 (server), since the video card wouldn't be an issue; I always use VGA on servers.

Here is where the USB issue finally bit me. More than half of the time, the install failed with a USB error. Sometimes it locked up the USB keyboard as well. Only once in perhaps 12 attempts did it start to load the OS, and then it failed when I plugged in the network cable, scrambling the video (what???).

Ultimately I decided the USB was flakey and initiated a return through Amazon.ca for the Gigabyte motherboard (reason: defective… 'flakey USB'). It's already boxed and mailed back as I write this.

I decided to keep everything else, as I do like the other components and am willing to give the AMD Ryzen a chance. I would have kept the Gigabyte motherboard as well were it not defective. However, given the several reports of similar USB flakey-ness from other reviewers, I decided to buy an ASUS motherboard designed for this CPU.

One last annoying tidbit: the ASUS site actually states that the AMD chip does not have on-board graphics and that you'll need to buy a video card. I wish I'd gone with ASUS from the start; at least I would not have been surprised, nor wasted 3 days chasing phantom video problems.

Vendors who should know better (a server nightmare)

Gigabyte and AMD, I'M TALKING TO YOU!!!

I need a new server for JupyterHub, and since I do like building servers and such things, I decided to do some research and buy a decent lower-cost “server-as-parts”.

I found from many reviews that the Gigabyte B450M DS3H motherboard, paired with the AMD Ryzen 5 2600 CPU, was a killer low-cost solution. I added appropriately fast (3000MHz) Corsair DDR4 memory (16GB to start) and an M.2 250GB SSD, all to go into a 2-rack-space case with an EVGA 500W power supply.

After all the bits arrived, I carefully assembled everything and tried the first "smoke test". It ran, but immediately gave a set of BIOS "error beeps": specifically "long-short-short", which means NO VIDEO for this BIOS.

Sure enough, plugging either known-good HDMI or DVI cables into a working monitor gave nothing.

Searching the internet proved this to be a VERY common problem, known since at least Nov 2018. Essentially, the motherboard is shipped with the wrong BIOS version: it's too early and doesn't know about the new CPU with on-board video.

The solution is to flash a new BIOS… but how? With no video, you can't see what's going on in order to flash anything. Very expensive motherboards have "Q-Flash Plus", which lets you put the BIOS on a data key in a special USB slot and it "just flashes". My motherboard, the less expensive one, doesn't have that feature. It can update from a USB key (Q-Flash), but not "the plus".

AMD's solution is to have you request "a boot kit": they send you a lesser (older) CPU "on loan" to fire up the motherboard so you can flash the BIOS, then you send it back. However, it was instantly obvious they have zero intention of doing this. You must "prove" you own the chip by taking a photo of the CPU clearly showing the serial number and model. PROBLEM: those markings are now covered with opaque white thermal compound, because you installed the supplied CPU cooling fan as any intelligent builder would. So AMD wants you to scrape off the thermal compound, take the photo, and then use ??? (what???) when you finally put it all back together. Well, I'm not stupid, so I'm not running a CPU "dry". Which means I can't take the obligatory photo, so I can't have the "boot kit". What a bunch of idiots (and I told them so by reply email and in an on-line review).

Next idea: put a PCIe graphics card into one of the PCIe slots and get graphics that way. I was able to score a very old PCIe VGA/DVI card from a local computer company's scrap bin, and sure enough, it WORKED!!!

It sits an inch proud of the case, so it's not a permanent solution, but it worked and I had VGA to see the BIOS screen.

After reading the BIOS update procedures, I carefully updated the BIOS to the latest version. Everything works… EXCEPT STILL NO VIDEO!!!

I’ve got a second trouble ticket in with Gigabyte, but who knows when they’ll answer.

Since this is a server, I could buy a $50 shorter PCI graphics card and just use it to install Ubuntu, as the server will never actually be connected to a video monitor unless there's a problem.

BUT WHAT WERE THESE IDIOTS THINKING – SELLING STUFF THAT DOESN’T WORK AND THEN HAVING ABYSMAL CUSTOMER SUPPORT (and the latest BIOS still doesn’t work).

If this was ‘bleeding edge’ like a game machine, I could see this as a typical issue, but this isn’t bleeding edge stuff – or shouldn’t be.

WORST CASE, BOTH CHIP AND MOTHERBOARD GET RETURNED IN MARCH.

Well, this is unexpected (a server story)

As the title says, I’ve been having a most weird server experience, culminating in a rather fascinating and unexpected discovery.

As posted recently, I’ve been experimenting with Jupyter Notebooks using JupyterHub on Ubuntu 18.04 Server.

I started with a server built on Oracle's VirtualBox 5.x running on my development machine, an Intel quad-core I7 with 16GB of memory and a couple of smaller SSDs. I gave the virtual Ubuntu 8GB of memory and 2 cores, plus 64GB of disk space. This is where I cut my teeth on installing Jupyter: first locally, then JupyterLab, then JupyterHub (again locally), before finally installing JupyterHub globally. Along the way I learned quite a lot, and took those lessons to all the other platforms via some detailed documents I wrote.

The first Ubuntu was the Desktop edition, complete with lots of X-type stuff. It was fast, it was good, but I wanted a more "dedicated" server.

My second server was Ubuntu Server running on my Windows Server 2008 R2 file server. It's a backup file server, so it wasn't doing much. I installed Oracle VirtualBox, this time V6.0, and Ubuntu 18.04 server, as I don't need the X-stuff and wanted a lean, fast server rather than a desktop (the ISO install images are quite dramatically different in size). This machine is a quad-core Xeon of recent vintage.

The server only had 8GB of physical memory, so I could only give the virtual server 4GB. As a result, it was very slow.

About that time I resurrected a ‘pizza box’ 1U quad-core Xeon server that also had 8GB of memory (it was the max for that vintage machine). As this was a dedicated box, I could install Ubuntu 18.04 server as the native OS and give it all the memory. After installing JupyterHub, it seemed… VERY sluggish. Opening notebooks took a very long time (minutes) and sometimes they would not open at all. I experienced problems connecting to the kernel, and it was just very frustrating.

I'd deleted both virtual machines, so I decided to try another on the development I7 box. Giving it 8GB, 2 cores and 64GB of disk as before, I installed JupyterHub.
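
(For the curious, creating such a VM from the command line looks roughly like this. I set mine up through the VirtualBox GUI, and the VM name here is purely illustrative.)

    # an Ubuntu 64-bit VM with 8GB of memory, 2 cores and a 64GB disk
    VBoxManage createvm --name jupyter-vm --ostype Ubuntu_64 --register
    VBoxManage modifyvm jupyter-vm --memory 8192 --cpus 2
    VBoxManage createmedium disk --filename jupyter-vm.vdi --size 65536
    VBoxManage storagectl jupyter-vm --name SATA --add sata
    VBoxManage storageattach jupyter-vm --storagectl SATA --port 0 --device 0 --type hdd --medium jupyter-vm.vdi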

At this point I had two almost identical servers, with the same memory. The quad-Xeon has 4 cores, the virtual I7 has only 2, but otherwise things are very close.

And here came the unexpected surprise. The I7 virtual machine is easily 10x faster to my perception than the Xeon. It's truly a night-and-day difference. Where the Xeon is sluggish to open notebooks and connect the kernel (if it even succeeds), the I7-virtual is quick and responsive. Editing notebooks is a joy instead of a grind. Things are quick, kernels don't die, and it's just a totally different environment.

Yet aside from hardware, everything about the installs is identical. Even the notebooks come from the same GitHub repo, so they are identical.

Today I ran some benchmark tests on both machines, and every test shows the I7 virtual machine (with 2 cores) is double the speed of the quad-core Xeon with 4 cores. It's astounding.
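
(Anyone wanting to reproduce this kind of comparison can get a quick read with a simple CPU benchmark along these lines; sysbench here is an illustration, not necessarily the exact suite I ran. Run it on each box and compare the events-per-second figures.)

    # quick-and-dirty CPU comparison
    sudo apt install sysbench
    sysbench cpu --threads=2 run    # on the 2-core I7 VM
    sysbench cpu --threads=4 run    # on the 4-core Xeon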

Why I don’t like the ‘Linux model’

There’s a thing I’m going to call the ‘Linux model’. Not because it pertains ONLY to Linux, but because most of what’s wrong with this model often starts with Linux and stuff that runs (best) on Linux.

In a way, this is really a story about all the stuff that's broken in JupyterHub, but it goes beyond that… it's the general model that's broken, and the model really owes its roots to Linux.

Basically, when you install something on a Linux box, and even the OS for the Linux box itself, it's probably broken. That is, *something* won't work after installing it, and there is no way of ever fixing it short of digging into some code somewhere.

Worse, such breakage is often super complex and intricate: buried in a log somewhere is a message like "package X failed due to expecting library Y to be x.x.x but was z.z.z", or some similarly obscure "thing" that takes days to figure out, if you ever do.

You can paste the error into Google, and most of the time you get a dozen hits: all questions on StackOverflow asking the same thing and getting precious little of value in response.

Worse, you are expected to manually update packages on an almost continuous basis, and (of course) such updates often break things that were working fine before the update. Yet if you don’t update, something ELSE will break.

The entire model is broken.

What triggered this particular rant today is that I spent ages figuring out how to (finally) install C++ into JupyterHub so I could run C++ notebooks. Yesterday, I found it broken. The log complains about a library *supplied by the supporter of this C++ package* having the wrong date compared to what's expected. It doesn't matter. C++ in JupyterHub is now broken, and good luck finding anyone to respond with anything useful. Even less likely is that the C++ supplier will fix it anytime soon.

That's the other problem with the Linux model. Everything is well documented and often supplied with tutorials. BUT… THEY ARE ALL YEARS OUT OF DATE. Worse, the stuff they describe has changed so much in the years since that you cannot follow the tutorial without ending up worse off than if you'd just thrown mud at a wall.

The biggest problem with the Linux model is that no one really cares. "I did this really cool thing in 2012 but now I'm bored and… who cares" seems to be the mantra of every developer. Nothing is maintained for long. It's becoming obvious that nothing is really being used, either; otherwise the failures would be noticed and (hopefully) fixed.

Overall, it’s a really depressing time to be trying to actually do anything on a Linux box.

JupyterHub Chronicles

I’ve continued to work with JupyterHub since my last post, and have made significant progress towards my overall goal of creating a real system for developing a programming course.

The first development was to recreate my work to date on a new server: Ubuntu 18.04 Server, as opposed to the Desktop edition I had been using. I also moved this server to VirtualBox (now V6) on a different machine. The new machine acts as a file server and has capacity to spare, plus it stays on "as a server" all the time.

Installing Ubuntu 18.04 Server on the machine was not difficult, and following my scripts I was able to create JupyterHub on the new server, with full encryption and networked through “huntrods.com”. I also recreated the various demo logins to allow me to share this work with other colleagues.

I finished developing "Unit 0" for my Java programming course, and explored other uses as well, such as my Network Java Programming course. There were some issues, but most of the programs work.

I also found some significant shortcomings in SciJava, so I contacted the developers for more documentation. Their response was "move to BeakerX, as it has a full Java implementation". They also informed me that SciJava might be End-Of-Life soon, which would be unfortunate.

However, I installed BeakerX on my single-user Ubuntu Desktop, following guidelines from a developer. It worked, so I then tried installing it on the Ubuntu Server. After one set of instructions failed, I reverted to the method that had worked for many of the other packages, and it worked.
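
(That fallback method was the same conda-forge pattern as everything else; assuming BeakerX's conda-forge packaging, it's roughly this:)

    # install BeakerX from conda-forge, as with the other packages
    conda install -c conda-forge ipywidgets beakerx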

I now have a full-featured Java running on JupyterHub under BeakerX. There is one outstanding issue that affects both BeakerX-Java and SciJava: neither will accept user input from the keyboard.

Another limitation of BeakerX-Java is that it won't run fragments of code that aren't real Java. For example, SciJava will evaluate "10+23" and output "33", while BeakerX-Java gives an error, just as "real" Java would (which is what BeakerX provides).

It turns out (from the developer) that SciJava is really a Java+Groovy hybrid, which is great for what I’d been doing, but isn’t really “real” Java.

So either I modify my Unit 0, or I go with SciJava in some notebooks and BeakerX-Java in others.

However, it’s great to have full-blown Java available in my notebooks.

JupyterHub – it’s been a long journey (and it’s not over yet…)

I started working with Jupyter Notebooks in late November (2018), and was rewarded fairly quickly with the ability to create notebooks for Java (SciJava), Chemistry (rdkit), Engineering (scipy), graphics (matplotlib) and Geography (Basemap).

However, the real sticking point was that these were all pages executing in Jupyter under a single local user account, running on a VirtualBox Ubuntu Linux server (18.10) that I'd created.

The real goal was to create a Jupyter system that would work for multiple users, so that I could use it for my new revision of “Introduction to Computing – Java” for Athabasca University. This meant running JupyterHub.

Along the way I moved to Ubuntu 18.04 LTS (the long-term support version) and spent hours on Google, YouTube and the plethora of Jupyter (and JupyterHub) pages. There were many frustrations along the way, from a complete communications breakdown in forums while trying to get a site certificate (letsencrypt), to documentation and tutorials written in 2015 and never updated, even though everything (and I do mean everything) has changed in the time since.
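
(For anyone fighting the same certificate battle: the generally recommended flow is certbot's standalone mode, roughly as below. I won't claim these were my exact steps, but it's the shape of the thing. certbot needs to bind port 80, so stop anything already listening there first.)

    # obtain a letsencrypt certificate for the hub's domain
    sudo certbot certonly --standalone -d huntrods.com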

By December 5, I was able to create a functioning JupyterHub on huntrods.com with the proper certificate. The only kernel running was Python3, but it offered either PAM (local user) authentication or OAuth (GitHub login) authentication, so I was pretty happy.
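
(The choice between the two lives in jupyterhub_config.py. A minimal sketch of the GitHub variant using the oauthenticator package, with placeholder credentials; PAM local-user authentication is simply JupyterHub's default if you configure nothing:)

    # append GitHub OAuth settings to the hub config (client id/secret are placeholders)
    cat >> jupyterhub_config.py <<'EOF'
    c.JupyterHub.authenticator_class = 'oauthenticator.GitHubOAuthenticator'
    c.GitHubOAuthenticator.oauth_callback_url = 'https://huntrods.com/hub/oauth_callback'
    c.GitHubOAuthenticator.client_id = 'YOUR_GITHUB_CLIENT_ID'
    c.GitHubOAuthenticator.client_secret = 'YOUR_GITHUB_CLIENT_SECRET'
    EOF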

BUT… (and this is huge) I really needed SciJava, or writing a Java course would be a bust.

The breakthrough came this week – yesterday, in fact. After repeated ‘banging head against the wall’ attempts, I was able to install SciJava for all users. With that success, it was relatively simple to install the other libraries (noted above) so that all my single-user demonstration notebooks ran in the new JupyterHub.

I was off and running, and quickly wrote my first notebook for the Java course. It’s everything I wanted, and more. It’s really a new way of “doing programming”, a mix of documentation and program code that works almost seamlessly together. Instead of a book with dead code examples, the code is ‘alive’ – press run and it executes. Better still, the student can change the ‘book code’ and immediately see the change take effect. It’s brilliant!

Today I worked on getting the Hub automated with supervisor. My next project is to store the notebook pages in a Git repository, either on GitHub or local to the server, and refresh them whenever users log in to the Hub.
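
(Supervisor just needs a small program stanza. A minimal sketch, with paths that are placeholders for wherever anaconda and the hub config actually live:)

    # register JupyterHub with supervisor -- paths are placeholders for my install
    sudo tee /etc/supervisor/conf.d/jupyterhub.conf >/dev/null <<'EOF'
    [program:jupyterhub]
    command=/opt/anaconda3/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py
    autostart=true
    autorestart=true
    user=root
    EOF
    sudo supervisorctl reread && sudo supervisorctl update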

Eventually I’ll use Git for notebook version management for all users, but one step at a time.


Email Oops

I woke up one morning, and checked my email as usual. All was good. A little later, I wanted to send a reply to one message.

It would not send. I kept getting "timeout on mail server" errors. I tried several things, and nothing worked. Finally, I called my email provider to ask if the mail server was in some way affected.

Nope. But then I was asked a series of questions about my config. Apparently "everything" was wrong with it. I made the changes they recommended, but I hate them, as the password is now sent in plain text. Yuk. But… at least I could send email again.

Later in the day I was doing some other work and had reason to open the taskbar tray (Win 7). I noticed something odd: the Pulse Connect icon showed it was active. I have to use Pulse to create a secure tunnel to AU in order to view the exams that I mark. Usually I activate the tunnel, mark the exam and then disconnect. This day, however, I saw that I was still connected.

Acting on a hunch, I disconnected the Pulse tunnel. Then I opened my email and reset the configurations to what I had before the morning phone call. Lo and behold, I could send email again with a secure password.

SO: the tunnel to AU was interfering with access to my email provider's SMTP (send) server. Interesting. Something to note in case I do that again.

Notes from all over for Dec 22

Just some notes on stuff that’s happening as of Dec 22.

Linda's Windows 10 computer, after a few configuration teething pains, is running quite well. Getting rid of the lock screen took 3 attempts, as Microsoft is determined to foist this crap on users, even to the point of disabling workarounds with each new update. It remains to be seen whether my efforts will hold for the longer term, as MS is so very determined.

We did blow 'edge' away. It's easily the worst browser I've ever seen. Basically, it has almost zero configuration options, and the few it does have, it ignores. Gone forever, and gladly back to Firefox. Likewise, the default 'mail' app is gone and Thunderbird again rules the emails. Like edge, 'mail' is another MS app that can't play nice, not even with other MS things like Outlook. What a damaged, untested, unprofessional piece of crap.

I did install Office 2016 this week thanks to a “Home User Program” deal from MS. Because Athabasca U bought into the whole MS lock-in, we get to buy home versions for really cheap (like $13 for Office 2016 pro!). It’s OK. I personally prefer Office 2013 because that was the last version without “THE RIBBON”. Yet another unwanted MS user interface “update”.

As for my AU work, I can't hear people on the phone very well, and certainly not upset persons who may talk fast and in a higher register. After consultation with other AU academics, I bought a "MagicJack" from the main website, as it was on sale. It does come from the USA and took a while to arrive, and the free phone number is USA-only, but it does indeed do what it claims. I paid the extra $10 to get a CDN number (Edmonton exchange) and then had AU tie it to my academic 1-888 number. By yesterday it was all working tickety-boo. Better yet, any voicemail message gets emailed to me as an audio file so I can keep track. I can use a headset when calling anywhere in North America (free), so it's awesome. Eventually I plan to see if it could replace most of the land line features, but not yet. First I want to see it in action.

I bought a leak detector for my underwater camera, and it came after almost a month in the postal system. Still, not bad coming from Slovenia. It’s really well built and should provide extra protection against flooding for the big underwater camera system.

Speaking of which, the replacement Kraken ring light/strobe came a few weeks ago and worked correctly out of the box. Nice to know it wasn't simply user error, but rather some issue with the optical strobe sensor.

That’s all for now. Time for a Christmas break.

Merry Christmas to all, and a very Happy New Year!