Server upgrades - portage hell and beyond

Posted by Dominic Cronin at May 31, 2009 08:35 PM | Permalink

Oh what fun you can have once you start to upgrade a previously working server.

So it's now a couple of months ago that I decided to blow the dust off some unfinished business on my Gentoo server. I'd always thought it would be nice to get a desktop running, but it had never been a priority. After all - the machine is used as a server. It sits under the telly and serves up the occasional web page. A desktop is definitly a "nice-to-have" feature rather than an essential one. As usual with such things, it expanded out into a mega-project.

I'd previously tried to get KDE running, and now I'd be installing Gnome, I'd be adding a lot of packages and uninstallling quite some others, so the first order of the day was some general housekeeping and clean-up. Gentoo linux is characterised by the portage package management and build system. You specify that you want to use a particular package by using the emerge utility, and the relevant code is pulled from the servers, compiled and installed automatically. If the module you've requested needs some other module, that will be built too, and so on - from there it's turtles all the way down.

Portage keeps track of the packages you've specified under the heading of "world". You can sync your system with the latest releases, and then rebuild world. Lots of people do this very regularly, and thereby ensure that they have the latest-greatest versions of everything running on their computers. (Obviously, you can also specify that you want specific versions for some things.) Anyway, I started off in the standard way by running emerge --sync and then emerge --update --deep --newuse. Then I watched the breakage roll in. It had been a couple of years since I'd done this, and I'd definitely never done it after my previous abortive attempts at a desktop, so my system was, frankly, a mess. The main problem is that you get one package that isn't compatible with another. Then you have to fiddle with versions, or just uninstall things just to get it working. The longer you leave it, the bigger a puzzle it is. To put it in perspective, people who use Gentoo for their desktop machine often get it in sync every day. On a server you don't update much, you might let it go six months. A couple of years is just crazy.

Quite some time ago, I had realised that using portage to keep my Zope/Plone server up to date would just be a world of pain. There are just too many dependencies within the Python/Zope/Plone stack. Specifically, the problem is that to upgrade, you pretty much need to run the old version side-by-side with the new one while you upgrade your data. Portage generally lets you have only one version of a package at any given time. There are some rare exceptions, in which case separate versions can co-exist, in what are known as "slots". For Zope and Plone, I chose instead to use Buildout, which does something similar to portage, but for python-based systems. This worked fine, but it meant that portage didn't know about these systems. (You can install them using portage, but not at the versions I needed to satisfy some of the plone add-ons I was using.)

So I wrestled with portage through the wee-small-hours night after night - got all sleep-deprived and cranky, and otherwise wasn't much further forward. Quite a long time later, I emerged (pun unintentional, but acknowledged) with a system where all the dependencies lined up beautifully. Everything was clean, and all desktop-ish things were consigned to oblivion.

Great stuff, I thought - now we're cooking. So I went ahead and started to add Gnome to my system, only to discover that I needed to juggle with yet more dependencies in order to get Gnome installed. (Gnome requires quite a few packages, and most of them have their own requirements too!) Never mind - a few days later I was good to go. Super!

Except... aaargh... along the way, I'd broken my web site. You see, to get Gnome running (at least at the latest version, which is what I wanted), I'd installed Python 2.5. Why not, eh? After all, 2.4 is pretty old hat; they even have 2.6 these days. But - it's Zope, you see. I'm running Plone 3.1.7, which requires Zope 2.10.6, which requires Python 2.4.4. So when I went to start up my web site, it didn't - Zope pretty comprehensively didn't want to run on Python 2.5. Ouch!

But no problemo! After all, Python is one of the very few things in Gentoo portage that has slots. You can have Python 2.4 and Python 2.5 side-by-side. So then you can have either one as the default. You could fix up the symbolic links to /usr/bin/python by hand, but there's a utility that helps you to do this quickly and accurately (as well as supporting several other system configuration options), so I "emerged" eselect, and was able to switch between the two. After switching to 2.4, Zope should just work again, eh. After all, it had worked on 2.4 before....wrong!

It turned out that I could get the result I wanted by doing a full emerge of Python 2.4, but simply switching between the two wasn't enough. The full emerge, would also take care of recompiling all the libraries that linked to Python. What I needed was the "python-updater" utility. This is a script that rebuilds all the stuff that's known to work differently on the two versions of Python. If I switched to Python 2.4 and ran python-updater, everything was fine.

But the story goes on. Running Python 2.4 as my system default Python was enough to get the web site back in the air, but presumably this would be a big spanner in the works of my project to get a desktop running. (Actually, not just a desktop - I'd added some other packages such as mono-develop, which represent my motivation for wanting a desktop in the first place, but some of these also had Python dependencies).

After a bit of research, I realised that if I could specify the Python executable I wanted in my buildout configuration. (using the "executable" directive in the [buildout] section). This meant I could have the default Python be 2.5 and still have Zope/Plone use 2.4. I tried this, and it worked, and the web site was back up and I breathed a sigh of relief. Then I thought - let's get back to a clean system, so I ran python-updater.... and huge skip-loads of fail came raining down. Zope seemed to start OK, but my Plone sites were hosed, and had some bizarre message that didn't point anywhere useful whatsoever. Grrr... so I set the system Python back to 2.4, ran python-updater, and all was well with the world. At this point I left the system alone for a couple of days just to recover a little tenacity - sometimes you have to.

When I came back to the problem, I flipped everything back into the broken configuration, and started methodical debugging. After quite a few false starts, someone suggested that I start Zope in debug mode. I did this, and the problem was immediately apparent. While starting up, Zope was reporting that it couldn't find PIL, the Python Imaging library. Well PIL is pretty clearly a dependency for Plone, so that was the problem. Re-visiting python-updater a couple of times showed that one of the things it does is to remove Python's site-packages from whichever Python you're switching away from. (I don't understand why it does this. Is that a bug?)

This new development led me to the conclusion that if, instead of having the buildout rely on the PIL installation in Python's site packages, it had its own local copy, I could just leave it wired up to Python2.4, and it wouldn't matter that switching to 2.5 and running python-updater would make the 2.4 site packages disappear. But how to do this? I spent a little time investigating the possibility of using buildout to download and build a local PIL. I'd seen some hints on the web that you could use a buildout recipe to do this, so I tried using a buildout recipe that was supposed to do this, however it was so poorly documented that I was reduced to guesswork and soon fell back to plan B. (Fortunately, I can't remember the name of the recipe, and Googling for it a second time didn't turn it up. What it did turn up was this, which might also work.)

Plan B was to attempt something using virtualenv which will create a virtual python environment locally for you, into which you can deploy whatever packages you want. I was just about to go along this road, but it looked quite tricky to figure it all out and integrate with my existing buildout, so I asked on the #plone channel on freenode IRC and someone suggested I look at PILwoTk. At first I thought I'd rather look straight into the sun, but I simply added the egg to buildout.cfg and it just worked. I don't know where PILwoTk is to be found or why it has such a bizarre name, but it's got my system working with a local PIL in my buildout. Great.

All I need to do now is finish getting my desktop working. Of course, it won't be a real desktop, because there's no desk, and definitely no screen or keyboard. Watch this space.

Dominic Cronin's web site

Server upgrades - portage hell and beyond