Dominic Cronin's weblog

Showing blog entries tagged as: Infrastructure

Revisiting validateXml

Some time back in 2009 I blogged about validating Tridion's content delivery configuration files. It was a good idea then, and it's remained a good idea ever since. These days, we're dealing with SDL Web 8, and with the new micro-services architecture you've got a lot of configuration files to get right. (On my fairly unambitious test system, running staging and live together, I just counted almost 80 configuration files.) Fortunately these seem to be reliably accompanied by schema files, which sit right in each of the microservice folders that you copy during an installation.
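
If you want to count yours, a quick bit of powershell does the job. This is just a rough and ready sketch using the same *conf.xml filter I use further down, so it will miss anything named differently:

(gci -r -file -include *conf.xml).Count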

Back when I first wrote the ValidateXmlFile powershell function, I'd left it rather unfinished. It was good enough to let me do some validations and detect problems, but it had a significant flaw: if a schema file was not present at the location indicated by the noNamespaceSchemaLocation attribute, it would simply not bother with validation. Considering that we're using an XmlReader to do the validation, this is a pretty reasonable design decision - after all, the main purpose is to read in the XML, and validation is perhaps a bit of a side-effect. Fair enough, but it's a nasty hole in our defences, so now that I'm revisiting the technique, I've beefed up the script a bit so that it checks both that the location attribute is present and that there's actually a file at that location.

I've also made the script do a pushd/popd so that everything is nicely lined up when the schema location is relative to the file (which it generally is).

Here's the updated script:

function ValidateXmlFile {
    param ([string]$xmlFile = $(read-host "Please specify the path to the Xml file"))
    $xmlFile = resolve-path $xmlFile
    "==============================================================="
    "Validating $xmlFile using the schema locations specified in it"
    "==============================================================="
    # The validating reader silently fails to catch any problems if the schema locations aren't set up properly
    # So attempt to get to the right place....
    pushd (Split-Path $xmlFile)

    try {
        $ns = @{xsi='http://www.w3.org/2001/XMLSchema-instance'}
        # Of course, if it's not well formed, it will barf here. Then we've also found a problem.
        # Use * in the XPath because not all files begin with Configuration any more. We'll still
        # assume the location is on the root element.
        $locationAttr = Select-Xml -Path $xmlFile -Namespace $ns -XPath "*/@xsi:noNamespaceSchemaLocation"
        if ($locationAttr -eq $null) {throw "Can't find schema location attribute. This ain't gonna work"}

        # The attribute value is on the selected node; resolve it relative to the file's own directory
        $schemaLocation = resolve-path $locationAttr.Node.Value -ErrorAction SilentlyContinue
        if ($schemaLocation -eq $null)
        {
            throw "Can't find schema at location specified in Xml file. Bailing"
        }

        $settings = new-object System.Xml.XmlReaderSettings
        $settings.ValidationType = [System.Xml.ValidationType]::Schema
        $settings.ValidationFlags = $settings.ValidationFlags `
                -bor [System.Xml.Schema.XmlSchemaValidationFlags]::ProcessSchemaLocation `
                -bor [System.Xml.Schema.XmlSchemaValidationFlags]::ReportValidationWarnings
        $handler = [System.Xml.Schema.ValidationEventHandler] {
            $validationArgs = $_ # entering a new block, so copy $_ (and don't clobber the automatic $args)
            switch ($validationArgs.Severity) {
                Error {
                    # Exception is an XmlSchemaException
                    Write-Host "ERROR: line $($validationArgs.Exception.LineNumber)" -nonewline
                    Write-Host " position $($validationArgs.Exception.LinePosition)"
                    Write-Host $validationArgs.Message
                    break
                }
                Warning {
                    # Warnings are only reported because we set ReportValidationWarnings above
                    Write-Host "Warning: $($validationArgs.Message)"
                    break
                }
            }
        }
        $settings.add_ValidationEventHandler($handler)
        $reader = [System.Xml.XmlReader]::Create($xmlFile, $settings)
        while($reader.Read()){}
        $reader.Close()
    }
    catch {
        throw
    }
    finally {
        popd
    }
}

Of course, what you really want is to be able to verify all your configurations in one go. Once the script is in your powershell $profile, you can put together some fairly simple command-line-fu to take care of that. I have all my microservices in one directory, which I guess is a pretty common pattern, so all I had to do was CD over there and execute the following: 

gci -r -file -include *conf.xml | % {ValidateXmlFile $_.FullName}

By running this, I've also picked up a couple of things that might be false positives. That aside, this is a real time saver if you're trying to solve issues. There's nothing like being able to eliminate a lot of the stupid typos from consideration all in one go.

System refresh: new architecture for www.dominic.cronin.nl

It's taken a while, and the odd skinned knuckle and a bit of cursing, but I can finally announce that this site is running on...erm.. the other server. Tada! Ta-ta-ta-diddly.... daaahhhh!!!!

Um yeah - I get it. It's not so exciting, is it really? The blog's still here, and it's got more or less the same content. It doesn't look any different. Maybe it's a tiny smidgen faster, but even that's more likely to do with the fact that we switched over to an ISP that actually makes use of the glass that runs into our meter cupboard.

But I'm excited. Just a bit, anyway. Partly because it's taken me months. It needn't have, but it's the usual question of squeezing it into the cracks between all the other things that need to get done in life. That and the fact that I'm an utter cheapskate and I don't want to pay for anything. There's also plenty not to be excited about. As I said, the functionality is exactly as it was. The benefits I get from it are mostly about the ability to do things better going forward. 

So what have I done? Well it all started an incredibly long time ago when I started tinkering with docker. I figured that the whole containerisation technology thing had such a lot of potential that I ought at least to run docker on my own server. After all, over the years, I'd always struggled with Plone needing to have a different version of Python than the one available in the current Gentoo ebuilds. I'd attempted a couple of things, including I think an early version of what became LXC, but then along came virtualenv, which made the whole thing moot. 

Yeah, well - until I wanted to play with docker for itself. At this point, I just thought I'd install it on my server and get going, but I immediately discovered that the old box I was running was 32-bit, and docker is just far too hip to run on anything so old-fashioned. So I needed a new server, and once I'd realised that, that's when the whole thing started. If I was going to have a new server, why didn't I just containerise everything? It's at this point that someone inevitably chips in with a suggestion that if I weren't such a dinosaur, I'd run it on the cloud, wouldn't I? Well yes - sure! But I told you - I'm a cheapskate, and apart from that, I don't want anyone's soul-less reliability messing with my carefully constructed one-nine availability commitment.

Actually I like cloud tech, but frankly, when you look at the micro-budget that supports this site, I'd have spent all my time searching out a super-cheap host, and even then I'd have begrudged it. So my compromise with myself was that I'd build it all very cloudy, and then the world's various public clouds would be my disaster recovery plan. And so it is. If this server dies, I can get it all up in the cloud with a fairly meagre effort. Still not going to two-nines though.

So I went down to my local high street where there's a shop run by these Indian guys. They always have a good choice of "hardly used" ex-business computers. I think I shelled out a couple of hundred Euros, and then I had something with an i5 and enough memory, and a couple of stupidly big disks to make a raid. Anyway - more than enough for a web server - which is just as well, because pretty soon it ends up just being "the server", and it'll get used for all sorts of other things. All the more reason to containerise everything. 

I got the thing home, and instead of doing what I've done many times before, and installing Gentoo linux, I poked around a bit on the Internet and found CoreOS. Gentoo is a masochist's delight. I mean - it runs like a sports car, but you have to own a set of spanners. CoreOS, on the other hand, is more or less maintenance free. It's built on Gentoo's build system, so it inherits the sports car mentality of only installing things you are going to use, but the CoreOS folks have done that choosing for you, and their idea of "things you are going to use" is basically everything that it takes to get containers up and keep them running, plus exactly nothing else. For the rest, it's designed for cloud use, so you can install it from bare metal to fully working just by writing a configuration file, and it knows how to update itself while running. (It has a separate partition for the new version, and it just switches over.)
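
To give a flavour of that configuration file: at the time of writing, CoreOS reads a cloud-config file at install time. Something along these lines - a minimal sketch, with a made-up hostname and a placeholder key, so don't treat it as a working example:

#cloud-config
hostname: websrv
ssh_authorized_keys:
  - ssh-rsa AAAA... dominic@example
coreos:
  units:
    - name: docker.service
      command: start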

So with CoreOS up and running, the next thing was to convert all the moving parts over to Docker containers. As it stands now, I didn't want to change too much of the basics, so I'm running Plone on a Gentoo container. That's way too much masochism though. I'd already been thinking I'd do a fresh one with a more generic out-of-the-box OS, and I've just realised I can pull a pre-built Plone image based on Debian (or Alpine). This gets better and better. And I can run it all up side-by-side in separate containers until I'm ready to flip the switch. Just great! Hmm... maybe my grand master plan was just to get to Plone 5! 
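
For what it's worth, trying out the pre-built image really is as simple as it sounds. Something like this does it (the container name here is just for illustration; the official plone image on the Docker Hub listens on port 8080):

docker pull plone
docker run -d --name plone5-trial -p 8080:8080 plone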

The Gentoo container I'm using is based on one created by the Gentoo community, which you can pull from the Docker Hub. Once I found this, I thought I was home and dry, but it's not really well-suited to being built automatically from a Dockerfile. What they've done is to separate out the portage tree into a separate container. This is smart, because you are unlikely to want the whole of portage in your container for any given purpose that makes you want to run Gentoo. What you do instead is mount the portage data using docker's --volumes-from argument. With it mounted, you can run emerge and install whatever packages you need, and then at runtime you get to run a much slimmer system. Which is great, but it means you have to create and store your own image manually rather than using a Dockerfile. (At least, that's how it ended up for a noob like me, once I realised that a Dockerfile doesn't have an equivalent of --volumes-from.)
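
In practice, that boils down to something like the following. This is just a sketch from memory: the image names are the ones the Gentoo community publishes on the Docker Hub (they may have changed since), and the container and image names on my side are made up:

# get the stage3 and portage images; the portage one acts as a data container
docker pull gentoo/stage3-amd64
docker pull gentoo/portage
docker create --name portage gentoo/portage

# run a stage3 container with the portage tree mounted from the data container
docker run -it --name plone-build --volumes-from portage gentoo/stage3-amd64 /bin/bash
# ... inside the container: emerge whatever you need, configure it, then exit ...

# save the result as your own image - the manual step a Dockerfile can't do for you
docker commit plone-build dominic/plone-gentoo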

My goal was to set up CoreOS to automatically pull the docker images it needed, and run some setup commands. This meant that I'd need to have my personalised Gentoo image available somewhere. Some of the data in there was sensitive, so I went looking for a private Docker registry that I could upload it to. There are plenty of private registries, but most of them aren't free. (If you don't mind the whole world pulling your containers, then free registries abound.) I eventually found https://canister.io/, which suited my needs. That said, my needs aren't much. If I ever need an alternative to canister, I'll probably look at Google Cloud Platform, which isn't free but has a private container registry where you only pay for storage and data egress, at pretty reasonable rates. Or I could just host it myself, but that's maybe too many eggs in the same basket.
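
Once you have an account somewhere, getting the image up there is the standard tag-and-push routine. The registry host and repository name here are placeholders rather than the real ones:

docker login registry.example.com
docker tag dominic/plone-gentoo registry.example.com/dominic/plone-gentoo
docker push registry.example.com/dominic/plone-gentoo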

Meanwhile, my very next step should probably be to get backups sorted out. The "Dockerish" way to do this is to run up yet another dedicated container to deal with just this concern. Then if I want to host it separately, or my backup approach changes, nothing else needs to change. Once I have the backups sorted out, it will definitely be worth my while to tidy things up so that I really can just push to the cloud if needs be. The way it's set up now, I could be up and running again very quickly, but we're probably talking hours rather than seconds.
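
The usual pattern, as I understand it, is a throwaway container that mounts the volumes of the container you care about and writes an archive to a directory on the host. Something like this - the container name and paths are made up for illustration, and it assumes the Plone container keeps its data in a volume at /data:

docker run --rm --volumes-from plone-site -v /var/backups:/backup debian \
    tar czf /backup/plone-data-$(date +%Y%m%d).tar.gz /data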

I'm really enjoying the flexibility that containerisation gives me, although it's definitely important to get into the right mindset. Being able to build containers that will run on a really generic platform is quite liberating.

Hotfix rollups are the new Service Pack

I was recently surprised to learn that a Hotfix Rollup shipped from SDL Tridion is something quite different to what you'd expect from the title. For at least the last 10 years, and probably longer, the distinction between a hotfix and a service pack was very simple:

Service Pack

A collection of product improvements shipped between full version releases. The improvements would include bug fixes, and possibly new features, but never "breaking" changes. The intention was that customers should install the latest service pack for their current version. The service pack would have been thoroughly tested by R&D and would be the basis for on-going support until the next release.

Hotfix

If an issue was found in software in the field, a hotfix could be created to address this issue. There wouldn't be an installer - just some files and some instructions. Often a hotfix would be seen as suitable for any customer to install, but other hotfixes were riskier, and if you didn't have the problem, installing the hotfix would be a bad idea. Hotfixes were tested by customer support. The next service pack or full release would supersede any hotfix. In a reasonably thorough risk-management strategy, the standard play was to avoid taking hotfixes until you needed them. The official advice from Tridion as of 2011 was this:

IMPORTANT NOTE: Hotfixes are released at the discretion of SDL Tridion based on technical complexity, customer business requirements and schedules. Hotfixes are made and tested only for the described problem on a particular environment/configuration and therefore should only be installed if approved by SDL Tridion Customer Support. Hotfixes should be replaced as soon as possible by the subsequent service pack where the problem is fixed.

And then along came Hotfix Rollups...

Hotfix rollups

You might be forgiven for thinking that a hotfix rollup was, well, a sort of, erm... roll-up of hotfixes. A collection of hotfixes. A gathering together of a handy bunch of hotfixes to make life easier for the less risk-averse who like to install everything. (Like me, when I'm installing my own dev image. Love the handiness of it.) That's what the name means in any normal interpretation of the English language. The point here is that this is not what SDL Tridion mean when they say Hotfix Rollup. From discussions with various SDL people, it seems that they see a hotfix rollup as having the following characteristics:

  • It is not expected to cause any problems on your system and can safely be installed.
  • To this end, it has been tested by the relevant specialists in R&D.
  • In the same way that you are expected to install a service pack, you are expected to install a hotfix rollup. Should further hotfixes become necessary, they will have the hotfix rollup as a dependency, not specific hotfixes. (This means that if you need that hotfix, you'll end up installing the hotfix rollup too, probably at a moment that you'd prefer to have chosen yourself.)

 

This is my best understanding at the moment, but I am not aware of any formal communication from SDL that makes this clear, or otherwise updates the advice from 2011. Obviously, feel free to get formal confirmation via the usual channels.

And as for you, SDL: your customers' risks are not your risks. You owe it to your customers to communicate correctly and in a timely way about this kind of thing. If anyone thought this would engender trust and confidence, that person was not thinking clearly. I wouldn't bother saying this, except that people out in the field often spend significant effort trying to balance exactly these risks, and it's in all our interests to make sure that goes well.

One-nine availability

Posted by Dominic Cronin at Aug 16, 2014 09:43 AM |

A couple of weeks ago, this site went down. That happens from time to time. It went down just as I left the country to go on holiday, and it could only be fixed via physical access, so it was down for a week. At least one person has commented that maybe I should stop with this silliness of running my own server on an old Gentoo box in the meter cupboard, and get some proper hosting.

The thing is, that when I started this blog, some years ago now, I went through a detailed requirements analysis, and a full MoSCoWMeh matrix. If you aren't familiar with MoSCoWMeh, this is an enhanced variant of the well-known MoSCoW technique, which also accommodates the needs of private and hobby-run systems.

The requirement for reliable hosting and 5-nines up-time was classified as Meh, and has remained so since. So now you know.

Keeping your feet dry

Posted by Dominic Cronin at Feb 14, 2014 11:10 PM |

When designing and implementing web-content-managed web sites with Tridion, the usual arrangement is to have at least four distinct environments, designated for specific purposes: Development, Test, Acceptance, Production. Often we refer to this as the DTAP street. Each environment has its own peculiarities.

The production environment serves web pages to the visiting public, so will have at least some servers in the "demilitarized zone". There will probably be multiple web servers behind a load balancer, and particular attention will be paid to defences against the ne'er-do-wells of the Internet.

The acceptance environment will be used for the final testing of software releases before they are allowed on to the production system. If the production system is load-balanced, so will the acceptance system be, and lots of attention will be paid to ensuring that the A-environment is a truly representative copy of P. The hardware will be close to identical, and all software will be patched to exactly the same levels as on the production system (unless, of course, a patch is being rolled out, in which case this will take place first in A).

The other two environments belong to the development team. The Test environment is used for testing during a development project, to ensure that the necessary quality levels are achieved before moving on to acceptance testing. New versions of the software may be frequently deployed - perhaps daily or several times a day. In general, the environment will be maintained to be a good representation of the Production environment, but not to quite the same levels of obsession as for the Acceptance environment. The Development environment will be quite similar to the Test environment, but is likely to have extra software installed for use in development. Programming software, automated build and test software - that kind of thing. Typically the programmers will have more access privileges in the Development environment than in the other environments. Depending on the organisation, they may well be system administrators in D, and have significant privileges in T. This makes sense, because often they will need to try new approaches, and set up new configurations, or perhaps they might need to attach a debugger to the running software to analyse its processing.

All of this adds up to a significant investment. There will be an entire team of people busy for quite a while to get this set up and to maintain it. Hardware (although often virtualised these days - still complex enough), software - operating systems, databases, security, and so on. Then you have to add in all the work of simply managing the whole thing. It's not cheap, and then on top of that, the licenses usually aren't free. So there's a temptation to cut corners. This can mean missing out an environment entirely, or even two - although even the most miserly will usually draw the line at doing development work on the same system that serves the public. It can also mean cutting corners on configuration work. Maybe you can't afford to have your system administrator spend his time making special configurations for the development environment. The thing is, making sure everything is done right can be unpalatably expensive. So, of course, the first thing to do is ensure that such an expensive set-up is a good fit for your needs. For the vast majority of web sites, you definitely don't need a high-end enterprise web content management system like Tridion. If you do, however, then it's probably a pretty good sign that the expense of running a proper DTAP street is also worth it.

But what if you want all that goodness without having to pay for it? Well, in that case, the responsible technicians need to make it clear what the trade-offs are. You can save money, but it's a gamble. The problem is that getting these things working doesn't just cost money. In an emergency, you can usually get more of that. The trouble is that it also costs time, and the definition of an emergency is that you don't have any of that spare. So - imagine a situation where you would like to be able to debug a problematic piece of software, but your security requirements are pretty heavy, and cast in procedural concrete. So you attempt to set up the necessary tools in your development system, but it doesn't work. To get it working, you estimate you'll need to spend a couple of days of research (say - a day each for a developer and a sysadmin). Maybe it's twice that, maybe it's half, and maybe you need to write a report on all the possible approaches, and have it approved by a committee of architects. Whatever - it's more expensive than you'd like... so you choose not to do it. This is the point at which clear communication is essential.

Living, as I do, in Amsterdam, I can't help feeling just a bit smug as I listen to the news on the radio. In England, the Somerset Levels have been flooded for a long time, and it's still raining. The amount of rain landing on the South of England, and on Wales just now is more than normal - that is to say, it only comes down like that a few times in a century. The people of the Somerset Levels are complaining vociferously that maybe they'd have stood for being flooded for a week, but the water won't go away so they've had it for weeks and weeks. Now some people near London are getting their feet wet too, so suddenly it's important. :-) On my way home, I felt some of that weather. The rain was lashing down, with a pretty solid wind behind it. Was I worried? Not in the slightest, even though I live at least as far below sea level as the people of the levels. You see, here, if the weather gets like that, the only real effect you'll see, is perhaps some smoke coming out of the chimney at your local friendly pumping station. The whole landscape is littered with places for water to go. Every little canal or pool has big sloping sides that will accommodate several times the normal amount of water.

So - when it rains in Somerset (nothing personal, folks), they get their feet very wet indeed, and have some very uncomplimentary things to say about the government's Environment Agency, whose job it is to build and maintain the DTAP street. Various government agencies turn up to provide sandbags. When it rains here, the pumps kick in, and we're good. Sometimes I find it hard to articulate to budget holders exactly why I'd like them to spend money on the odd pumping station that's never really going to get used, is it? I mean come on, what are the chances? Did I say pumping station?

Seriously - if you're going to cut corners on your infrastructure, make sure all your stakeholders know the difference between Somerset and Amsterdam.