
Dominic Cronin's weblog

Showing blog entries tagged as: Infrastructure

Encrypting passwords for Tridion content delivery... revisited

A while ago I posted a "note to self" explaining that in order to use the Encrypt utility from the Tridion content delivery library, you needed to put an extra jar on your classpath. That was in SDL Web 8.5. This post is to explain that in SDL Tridion Sites 9, this advice still stands, but the names have changed. 

But first, why would you want to do this? Basically it's a measure to prevent your passwords being shoulder-surfed. Imagine you have a configuration file with a password in it like this: 

<Account Id="cduser" Password="${cduserpassword:-CDUserP@ssw0rd}">
<Metadata>
<Param Name="FirstName" Value="CD"/>
<Param Name="LastName" Value="User"/>
<Param Name="Role" Value="cd"/>
<Param Name="AllowedCookieForwarding" Value="true"/>
</Metadata>
</Account>

You might not want everyone who passes by to see that your password is "CDUserP@ssw0rd". Much better to have something like encrypted:o/cgCBwmULeOyUZghFaKJA==

<Account Id="cduser" Password="${cduserpassword:-encrypted:o/cgCBwmULeOyUZghFaKJA==}">
<Metadata>
<Param Name="FirstName" Value="CD"/>
<Param Name="LastName" Value="User"/>
<Param Name="Role" Value="cd"/>
<Param Name="AllowedCookieForwarding" Value="true"/>
</Metadata>
</Account>

Actually - with the possibility to do token replacement, I do wonder why you need a password in your config files at all, but that's not what this post's about. 

The thing is that the jar file that used to be called cd_core.jar is now called udp-core.jar, and cd_common_util.jar has become udp-common-util.jar. Actually this is a total lie, because in recent versions of Tridion all the jars have versioned names, as you'll see in the example I'm about to show you. One of these jars is to be found in the lib folder of your service, and the other in the services folder, so you might find it easier just to copy them both to the same directory, but this is what it looks like when you run it directly from the standalone folder of discovery: 

PS D:\Tridion Sites 9.0.0.609 GA\Tridion\Content Delivery\roles\discovery\standalone> java -cp services\discovery-service\udp-core-11.0.0-1020.jar`;lib\udp-common-util-11.0.0-1022.jar com.tridion.crypto.Encrypt foo
Configuration value = encrypted:6oR074TGuXmBdXM289+iDQ==

Note that here I've escaped the semicolon from the powershell with a backtick, but you can just as easily wrap the whole cp argument in quotes. Please note that I do not recommend the use of foo as a password. Equally, please don't use this encryption as your only means of safeguarding your secrets. It raises the bar a bit for the required memory skills of shoulder surfers, and that's about it. It's a good thing, but don't let it make you complacent. You also need to follow standard industry practices to control access to your servers and the data they hold. Of course, this is equally true of any external provisioning systems you have. 
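
For the record, the quoted variant looks like this - the same invocation as above, just with the whole classpath in a string so the semicolon doesn't need escaping (the jar version numbers are the ones from my example and will differ per release):

# The same Encrypt invocation, with the -cp argument quoted instead of escaping the semicolon
java -cp "services\discovery-service\udp-core-11.0.0-1020.jar;lib\udp-common-util-11.0.0-1022.jar" com.tridion.crypto.Encrypt foo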

 

 

Discovery service in Tridion Sites nine has two storage configs

I just got bitten by a little "gotcha" in SDL Tridion Sites 9. When you unpack the installation zip, you'll find that in the Content Delivery/roles/Discovery folder, there's a separate folder for registration, with the registration tool and its own copy of cd_storage_conf.xml. The idea seems to be that running the service and registering capabilities are two separate activities. I kind of get that. When I first saw the ConfigRepository element at the bottom of Discovery's configuration, I felt like it had been shoehorned into a somewhat awkward place. Yet now, it seems even more awkward. Sure, both the service and the registration tool need access to the storage settings for the discovery database, while only the registration tool needs the configuration repository.

The main difference seems to be one of security. The version of the config that goes with the registration tool has the ClientId and ClientSecret attributes while the other doesn't. This, in fact, is the gotcha that caught me out; I'd copied the storage config from the service, and ended up being unable to perform an update. The error output did mention being unable to get an OAuth token, but I didn't immediately realise that the missing ClientId and ClientSecret were the reason. Kudos to Damien Jewett for his answer on Stack Exchange, which saved me some hair-pulling.
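
If you want a quick way of seeing which storage configs in your own installation carry those attributes, a little PowerShell sketch like this will show you (this is mine, not something from the product - run it from the Content Delivery roles folder):

# List every cd_storage_conf.xml below the current folder and report whether it
# contains an element with a ClientId attribute. Purely illustrative.
gci -r -include cd_storage_conf.xml | % {
    [xml]$conf = Get-Content $_.FullName -Raw
    [pscustomobject]@{
        Path        = $_.FullName
        HasClientId = [bool]$conf.SelectSingleNode('//*[@ClientId]')
    }
}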

I'm left wondering if this is the end game, or whether a future version will see some further tidying up or separation of concerns.

EDIT: On looking at this again, I realised that even in 8.5 we had two storage configs. The difference is that in 8.5 both had the ClientId and ClientSecret attributes.

Enumerating the Tridion config replacement tokens

OK - I get it. It's starting to look like I've got some kind of monomania regarding the replacement tokens in Tridion config files, but bear with me. In my last blog post, I'd hacked out a regex that could be used for replacing them with their default values, but had thought better of actually doing so. But still, the idea of being able to grab all the tokens has some appeal. I can't bear to waste that regex, so now I'm looking for a reasonable use for it.

It occurred to me that at some point in an installation, it might be handy to have a comprehensive list of all the things you can pass in as environment variables. Based on what I'd done yesterday, this was quite straightforward:

gci -r -include *.xml -exclude logback.xml | sls '\$\{.*?\}' `
| select {$_.RelativePath((pwd))},LineNumber,{$_.Matches.value} `
| Export-Csv SitesNineTokens.csv

By going to my unzipped Tridion zip and running this in the "Content Delivery/roles" folder, I had myself a spreadsheet with a list of all the tokens in Sites 9. Similarly, I created a spreadsheet for Web 8.5. (As you can see, I've excluded the logback files just to keep the volume down a bit, but in real life, you might also want to see those listed.)

The first thing you see when comparing Sites 9 with Web 8.5 is that there are a lot more of the things. More than twice as many. (At this point I should probably confess to some possible inaccuracy, as I haven't gone to the trouble of stripping out XML comments, so there could be some duplicates.)
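
If the possible duplicates bother you, you can also skip the CSV step and just count distinct token values - a rough sketch along the same lines as the command above (XML comments still aren't stripped out, so treat the numbers as indicative):

# Count distinct replacement tokens rather than every occurrence
gci -r -include *.xml -exclude logback.xml |
    sls '\$\{.*?\}' |
    % { $_.Matches.Value } |
    sort -Unique |
    measure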

65 of these come from the addition of XO and another 42 from IQ, but in general, there are just more of them. The bottom line is that to get a Tridion system up and running these days, you are dealing with hundreds of settings. To be fair, that's simply what's necessary in order to implement the various capabilities of such an enterprise system.

One curious thing I noticed is that the ambient configs all have a token to allow you to disable oauth security, yet no tokens for the security settings for the various roles. I wonder if this reflects the way people actually use Tridion.

Of course, you aren't necessarily limited by the tokens in the example configs of the shipped product. Are customers defining their own as they need them?

That's probably enough about this subject, though, isn't it?

Unboxing SDL Tridion Sites 9

Posted by Dominic Cronin at Dec 26, 2018 10:40 PM

It's Boxing Day, so I thought I'd treat myself by unboxing Tridion Sites 9, or more to the point, installing the Content Manager. Just to give a bit of context, this is not a production installation, but rather a "fifth environment". (Chris Summers once suggested this term for a developer's own setup, as distinct from the Development, Test, Acceptance and Production environments of a traditional delivery street. The name seems to have stuck.)

I'm installing on the Google Cloud Platform (GCP) so I selected a "SQL Server 2017 Standard on Windows Server 2016 Datacenter" with a standard 50GB boot disk. By default I got a system with 3.75GB of memory, but during installation I got a notification from GCP that this might not be enough, so I accepted the suggested upgrade to 4.75GB. I'm not sure if the installation process is sufficiently typical use to determine memory sizing, but well... 4 gigs is pretty small these days, eh? I'm trying to run this on a tight budget, but an extra gig of memory won't be what breaks the bank. (If anything turns out to be expensive, it will be the Windows license, but you pay by the second and I plan to be very disciplined about shutting things down when I don't need them.) The shoestring budget is why I chose a version that already has MSSQL installed. The last time I did this, I ended up running up two separate Windows Servers, but this time I'm just starting with the version with MSSQL, which might help to keep the costs down. Of course, in production, a separate database server or two would still make sense, but this is a research rig. As for where to run the database, I'm also looking at the dockerised version of MSSQL, which has some attractions, but to get going quickly, a Windows image with it installed will be fine.

For SDL Web 8.5 I already had some scripts that took care of most of the content manager installation. I'm pleased to say that these ran with only very minor modifications for Tridion 9. So, for example, the layout of the installer directory is relatively predictable, but the installer executable is now called SDLTridionSites9.exe. But let's start at the beginning. After the usual fuss trying to get the files I needed up to the cloud and available from my image, I was able to do the following:

  • Run my script that kicks off all the Tridion database install scripts. Nothing very exciting here, just a rinse-and-repeat operation for most of the time. Having a script for this saves you typing the same input parameters a dozen times, and gives you an audit trail. Before I could do this, I had to install Microsoft SQL Server Management Studio and use it to set up the sa account as I wanted it. For content delivery I may well choose to put the databases somewhere else, but I'll definitely remember to read through this note to self I wrote a while ago.
  • As it's an all-in-one install, it's necessary to disable the loopback check. If you're happy with a quick-and-dirty, this will get you there:
New-ItemProperty HKLM:\System\CurrentControlSet\Control\Lsa -Name DisableLoopbackCheck -Value 1 -PropertyType dword
  • After that I ran my installation script. Mostly this is a question of passing some pre-determined parameters to the installer, mainly so it's repeatable. I do, however, do a couple of things first. One is to create the content manager system account (tridionsys, or mtsuser if you're old-fashioned) - there's a sketch of that just after this list. I also set up a couple of optional Windows features, like this:
$desiredFeatures = "IIS-ApplicationInit", `
            "IIS-ASPNET",`
            "IIS-HttpCompressionDynamic", `
            "IIS-ManagementService", `
            "Web-Mgmt-Service", `
            "WAS-NetFxEnvironment", `
            "Windows-Identity-Foundation"

Get-WindowsOptionalFeature -Online `
| ?{$desiredFeatures -contains $_.FeatureName -and -not ($_.State -eq 'Enabled')} `
| Enable-WindowsOptionalFeature -Online
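
As promised above, here's a rough sketch of creating that system account. This isn't lifted from my installation script - it's just an illustration using the LocalAccounts cmdlets in Windows PowerShell 5.1, so adapt the password handling and group membership to your own standards:

# Illustrative only: create a local "tridionsys" account to use as the content manager system account.
# Whether it belongs in the local Administrators group depends on how you set things up.
$password = Read-Host -AsSecureString "Password for tridionsys"
New-LocalUser -Name tridionsys -Password $password -PasswordNeverExpires `
    -Description "Tridion content manager system account"
Add-LocalGroupMember -Group Administrators -Member tridionsys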

The old installer had problems with installing the content manager and topology manager on the same port with distinct host headers. My installation approach involves letting the installer put them on separate ports and then running a separate script to fix things up how I like it. I haven't tested whether the new installer makes this unnecessary, as for now my priority is just getting a working system. Perhaps it's also interesting to test whether my pre-install fixup of Windows features is still necessary. UPDATE: I have now tested whether this problem still exists in the Sites 9 installer, and it does.

Anyway, I now have the content manager and topology manager running, and can move on to content delivery. My overall assessment so far is that it's a pretty straightforward setup. I'm looking forward to my further adventures with Tridion Sites 9.

 

 

Using environment variables to configure the Tridion microservices

Within a day of posting this, Peter Kjaer informed me that the microservices already support environment variables, so this entire blog post is pointless. So my life just got simpler, but it cost me a blog post to find out. Oh well. I'm currently trying to decide whether to delete the post entirely or work it into something useful. In the meantime at least be aware that it's pointless! :-) Anyway - thanks Peter.

When setting up a Tridion content delivery infrastructure, one of the most important considerations is how you are going to manage all the configuration values. The microservices have configuration files that look very similar to those we're familiar with from versions of Tridion going back to R5. Fairly recently (in 8.5, I think) they acquired a "new trick", which is that you can put replacement tokens in the files, and these will be filled in with values that you can pass as JVM parameters when starting up your java process. Here's an example taken from cd_discovery_conf.xml:

<ConfigRepository ServiceUri="${discoveryurl:-http://localhost:8082/discovery.svc}"
    ConnectionTimeout="10000"
    CacheEnabled="true"
    CacheExpirationDuration="600"
    ServiceMonitorPollDuration="10"
    ClientId="registration"
    ClientSecret="encrypted:HzfQh9wYwAKShDxCm4DnnBnysAz9PtbDMFXMbPszSVY="
    TokenServiceUrl="${tokenurl:-http://localhost:8082/token.svc}">

Here you can see the tokens "discoveryurl" and "tokenurl" delimited from the surrounding text with ${} and followed by default values after the :- symbol.

This is really handy if you are doing any kind of managed provisioning where the settings have to come from some external source. One word of warning, though. If you are setting up your system by hand and intending to maintain it that way, it's most likely a really bad idea to use this technique. In particular, if you are going to install the services under Windows, you'll find that the JVM parameters are stored in a deeply obscure part of the registry. More to the point, you really don't want two versions of the truth, and if you have to look every time to figure out whether tokenurl is coming from the default in your config or from deep underground, I don't hold out much hope for your continued sanity if you ever have to troubleshoot the thing.

That said, if you do want to provision these values externally, this is the way to go. Or at least, in general, it's what you want, but personally I'm not really too happy with the fact that you have to use JVM parameters for this. I've recently been setting up a dockerised system, and I found myself wishing that I could use environment variables instead. That's partly because this is a natural idiom with docker. Docker doesn't care what you run in a container, and has absolutely no notion of a JVM parameter. On the other hand, Docker knows all about environment variables, and provides full support for passing them in when you start the container. On the command line, you can do this with something like:

> docker run -it -e dbtype=MSSQL -e dbclass=com.microsoft.sqlserver.jdbc.SQLServerDataSource -e dbhost=mssql \
    -e dbport=1433 -e dbname=Tridion_Discovery -e discoveryurl=http://localhost:8082/discovery.svc \
    -e tokenurl=http://localhost:8082/token.svc discovery bash

I'm just illustrating how you'd pass command-line environment arguments, so don't pay too much attention to anything else here, and of course, even if you had a container that could run your service, this wouldn't work. It's not very much less ugly than constructing a huge set of command parameters for your start.sh and passing them as a command array. But bear with me; I still don't want to construct that command array, and there are nicer ways of passing in the environment variables. For example, here's how they might look in a docker-compose.yaml file. (Please just assume that any YAML I post is accompanied by a ritual hawk and spit. A curse be on YAML and its benighted followers.)

   environment: 
      - dbtype=MSSQL
      - dbclass=com.microsoft.sqlserver.jdbc.SQLServerDataSource
      - dbhost=mssql
      - dbport=1433
      - dbname=Tridion_Discovery
      - dbuser=TridionBrokerUser
      - dbpassword=Tridion1
      - discoveryurl=http://localhost:8082/discovery.svc
      - tokenurl=http://localhost:8082/token.svc

This is much more readable and manageable. In practice, rather than docker-compose, it's quite likely that you'll be using some more advanced orchestration tools, perhaps wrapped up in some nice cloudy management system. In any of these environments, you'll find good support for passing in some neatly arranged environment variables. (OK - it will probably degenerate to YAML at some point, but let's leave that aside for now.)

Out of the box, the Tridion services are started with a bash script "start.sh" that's to be found in the bin directory of your service. I didn't want to mess with this: any future updates would then be a cause for much fiddling and cursing. On top of that, I wanted something I could generically apply to all the services. My approach looks like this:

#!/bin/bash
# vim: set fileformat=unix

scriptArgs=""
tcdenvMatcher='^tcdconf_([^=]*)=(.*)'
for tcdenv in $(printenv); do
    if [[ $tcdenv =~ $tcdenvMatcher ]]; then
        scriptArgs="$scriptArgs -D${BASH_REMATCH[1]}=${BASH_REMATCH[2]}"
    fi
done

script_path="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )"
$script_path/start.sh $scriptArgs

(I'm sticking with the docker-compose example to illustrate this. In fact, with docker-compose, you'd also need to script some dependency-management between the various services, which is why you'd probably prefer to use a proper orchestration framework.)

The script is called "startFromEnv.sh". When I create my docker containers, I drop this into the bin folder right next to start.sh. When I start the container, the command becomes something like this (but YMMV depending on how you build your images):

command: "/Discovery/bin/startFromEnv.sh"

instead of:

command: "/Discovery/bin/start.sh"

And the environment variables get some prefixes, so the relevant section of the setup looks like this:

    environment: 
      - tcdconf_dbtype=MSSQL
      - tcdconf_dbclass=com.microsoft.sqlserver.jdbc.SQLServerDataSource 
      - tcdconf_dbhost=mssql
      - tcdconf_dbport=1433
      - tcdconf_dbname=Tridion_Discovery
      - tcdconf_dbuser=TridionBrokerUser
      - tcdconf_dbpassword=Tridion1
      - tcdconf_discoveryurl=http://localhost:8082/discovery.svc
      - tcdconf_tokenurl=http://localhost:8082/token.svc

The script is written in bash, as evidenced by the hashbang line at the top. (Immediately after is a vim modeline that you can ignore or delete unless you happen to be using an editor that respects such things and you are working on a Windows system. I've left it as a reminder that the line endings in the file do need to be unix-style.)

The rest of the script simply(!) loops through the environment variables that are prefixed with "tcdconf_" and converts them to -D arguments, which it then passes on to start.sh (which it looks for in the same directory as itself).

I'm still experimenting, but for now I'm assuming that this approach has improved my life. Please do let me know if it improves yours. :-)

If you think the script is ugly, apparently this is a design goal of bash, so don't worry about it. At least it's not YAML (hack, spit!)

Websphere and Xalan fun for SDL Web 8

Of the small number of people who follow this blog, an unreasonably large proportion will be familiar with SDL Web 8, and the promise it holds for freedom from classpath hell. The new service-based architecture is a huge step forward, but we aren't out of the woods yet. I'm currently busy with an upgrade project where we're taking an interesting mix of web applications from SDL Tridion 2011 to SDL Web 8.

Web 8's much-vaunted REST-ful microservice approach was initially communicated as pretty much a drop-in replacement for the existing Content Delivery APIs. In practice, it turned out that the focus on backwards compatibility wasn't as clear as it might have been, and if you use JSPPage when invoking dynamic component presentations from a JSP page, you are out of luck, because this class doesn't have an implementation in the REST-ful facade. This is annoying, as I can't see any reason why it couldn't or shouldn't be made to work. The missing support is a "known issue"; however, I'm told there's not much appetite for fixing it. After all, goes the argument, we can use the in-process API, which does have JSPPage, so that's a workaround, isn't it? Except that then we don't get the benefit of the dependency-free service architecture, and that, as I shall explain, is no small thing. 

With the in-process API, of course, the idea is that all the necessary jars to do Tridion content delivery things have to be on your classpath. The general idea is simple enough, but in practice, we have to deal with the fact that there are several class-loaders arranged in a hierarchy, and each of these has their own classpath, although it's not always called that. At the top you've got the class loaders that belong to java itself. This means the boot classloader that loads the nuts and bolts of java itself, and also the one that works from the java CLASSPATH variable plus one for java extensions. And then lower down you have Websphere's own extensions classloader, some magic called the OSGi class loader gateway, and then the application's class loader and one for the module. Yes I know - it sounds pretty insane, but I didn't make it up. Have a look over here if you don't believe me! 

So what kind of trouble did we get into, and how did we get out of it? Well we had all the Web 8 jars in a directory, and we'd deployed our application and set things up so the jars would be on the classpath. Keeping the jars outside the application has been the customer's preferred way of doing things for some years, and it's worked well, so our initial expectation was that things should "just work", but once we started testing, we started to see exceptions like: 

java.lang.ClassCastException: org.apache.xml.dtm.ref.DTMManagerDefault incompatible with org.apache.xml.dtm.DTMManager

This is a bit of a weird one, because if you look up the sources, org.apache.xml.dtm.ref.DTMManagerDefault and org.apache.xml.dtm.DTMManager are actually defined in the same jar. How could they possibly be incompatible? Well as it turns out, it's possible for java to load two incompatible copies of the same class simultaneously, from different jars. 

If you look it up, this problem is about the Xerces library, and its associated serialiser jar. I think this comes about because Websphere uses Xerces itself (as do several other application servers), and because Tridion's own Content Delivery installation has these as third-party jars, any difference in the required versions will be problematic. Of course, it could happen with other libraries, but in practice, it's Xerces. (OK - so we also had similar issues with another application that uses JSTL.)

But let's start with "parent first" and "parent last". When working with the hierarchy of classloaders, the default method of loading a class is parent first. What this means is that when the module classloader needs to load a class, it first checks to see if its parent (the application classloader) can load the class. The application classloader then asks its parent, and so on all the way up. I've visualised this in the left hand diagram with the arrows going down, because in practice, what this means is that classes are made available from the top down. If the java classloaders have the class, that's what will be used throughout. 

Parent last is the opposite arrangement. If the module classloader can find the class in its classpath, it loads it itself and doesn't trouble the parents with it. This effectively means that the lower the classloader, the higher priority its jars have, and hence the direction of the arrows in my right-hand diagram.  

[Diagrams: "Classloader Parent last" and "Classloader Parent first"]

So to get rid of the ClassCastException, we flipped the configuration from Parent First to Parent Last. This works. It's what SDL recommend that you should do if you encounter these exceptions in your environment. But...

Well it turned out that our problems weren't over. Instead of a ClassCastException, now we had a ClassNotFoundException. I can't post it here, because all this happened a while ago and I'm writing this up later, but as I said earlier, it's all about Xerces. The problem in this case is that if a class is loaded by a classloader, it can't call a class that's loaded by a classloader that's lower in the hierarchy. Parent last leaves you rather more sensitive to this kind of problem, because you're deliberately loading classes lower down that might also be available further up the tree, and also might be expected by classes further up. In any case, even though the class is available, it can't be loaded, and you get a ClassNotFoundException. 

In our case, we were able to solve the problem by moving the xalan and serialiser jars to Websphere's ws.ext directory, where they would be loaded by the Websphere Extensions classloader. 

All this is a bit like the "dll hell" that we used to waste days on, back on Windows systems before the .NET framework came along. Sooner or later, the answer always ended up being that you needed to know far more than you wanted to about the nuts and bolts of how it all worked, and the various possible locations that Windows would look for a dll. "Classloader hell" is not much different. I've been able to avoid it for a long time just by breezily saying - "Oh yes - that's a Java problem". These days, I seem to be more engaged with Java than I used to be, so having to figure out classloader hell is probably fair game. 

It's been a while since I linked to Joel Spolsky's classic: "The law of leaky abstractions", but this seems like a reasonable moment to do so. Joel's piece probably makes for far more entertaining reading than either this, or this, (both of which are pretty good) or any of the other detailed coverage that Google will turn up for you on the complexities of classloaders. My own description here has been deliberately only a sketch to give the big picture. I've skated over many details, missed others out entirely, and probably got a few things wrong (in which case, comments are welcome). My point is that in any given environment, there's a good chance you'll have to solve this kind of thing. It's frustrating, and it costs time that you probably feel like you don't have, but once you engage with the detail, you will find a solution. I'm not saying the solution outlined here is the best one. There may be other ways to get it working, and some of them may well be better. 

To finish on a rather more upbeat note, we should all be happy to be moving slowly but surely towards the new architecture. Having to deal with these issues is actually a very welcome reminder of why we're investing in new architectures in the first place. The difficulty lies in the fact that you can't necessarily have a rebuild of all your legacy systems in the scope of an upgrade project, so we live with some things that aren't perfect, but we are moving in the right direction. Next time will be better! 

Revisiting validateXml

Some time back in 2009 I blogged about validating Tridion's content delivery configuration files. It was a good idea then, and it's remained a good idea ever since. These days we're dealing with SDL Web 8, and with the new micro-services architecture you've got a lot of configuration files to get right. (On my fairly unambitious test system, running staging and live together, I just counted almost 80 configuration files.) Fortunately these seem to be reliably supported with schema files that are simply in each of the microservice folders that you copy during an installation. 
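
In case you're wondering, the counting was nothing more sophisticated than a quick one-liner along these lines, run from the folder where the microservices live (adjust the filter if your naming differs):

# Quick and dirty: count the *conf.xml files across all the microservice folders
(gci -r -file -include *conf.xml).Count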

Back when I first wrote the ValidateXmlFile powershell function, I'd left it rather unfinished. It was good enough to let me do some validations and detect problems, but it had a significant flaw, in that if a schema file was not present at the location indicated by the noNamespaceSchemaLocation attribute, it would simply not bother with validation. Considering that we're using an XmlReader to do the validation, this is a pretty reasonable design decision - after all the main purpose is to read in the XML, and validation is perhaps a bit of a side-effect. Fair enough, but it's a nasty hole in our defences, so now that I'm revisiting the technique, I've beefed up the script a bit so that it checks that the location is present and that there's a file in the location. 

I've also made sure that the script does some pushd/popd to make sure that everything is nicely lined up when the location is relative to the file (which it generally is).

Here's the updated script:

function ValidateXmlFile {
    param ([string]$xmlFile       = $(read-host "Please specify the path to the Xml file"))
	$xmlFile = resolve-path $xmlFile
    "==============================================================="
    "Validating $xmlFile using the schemas locations specified in it"
    "==============================================================="
    # The validating reader silently fails to catch any problems if the schema locations aren't set up properly
    # So attempt to get to the right place....
    pushd (Split-Path $xmlFile)

    try {
        $ns = @{xsi='http://www.w3.org/2001/XMLSchema-instance'}
	# of course, if it's not well formed, it will barf here. Then we've also found a problem
        # use * in the XPath because not all files begin with Configuration any more. We'll still 
        # assume the location is on the root element 
        $locationAttr = Select-Xml -Path $xmlFile -Namespace $ns -XPath */@xsi:noNamespaceSchemaLocation
        if ($locationAttr -eq $null) {throw "Can't find schema location attribute. This ain't gonna work"}

        $schemaLocation = resolve-path $locationAttr.Node.Value -ErrorAction SilentlyContinue
        if ($schemaLocation -eq $null) 
        {
            throw "Can't find schema at location specified in Xml file. Bailing" 
        }

        $settings = new-object System.Xml.XmlReaderSettings
        $settings.ValidationType = [System.Xml.ValidationType]::Schema
        $settings.ValidationFlags = $settings.ValidationFlags `
                -bor [System.Xml.Schema.XmlSchemaValidationFlags]::ProcessSchemaLocation
        $handler = [System.Xml.Schema.ValidationEventHandler] {
            $args = $_ # entering new block so copy $_
            switch ($args.Severity) {
                Error {
                    # Exception is an XmlSchemaException
                    Write-Host "ERROR: line $($args.Exception.LineNumber)" -nonewline
                    Write-Host " position $($args.Exception.LinePosition)"
                    Write-Host $args.Message
                    break
                }
                Warning {
                    # So far, everything that has caused the handler to fire, has caused an Error...
                    # So this /might/ be unreachable
                    Write-Host "Warning:: " + $args.Message
                    break
                }
            }
        }
        $settings.add_ValidationEventHandler($handler)
        $reader = [System.Xml.XmlReader]::Create($xmlfile, $settings)
        while($reader.Read()){}
        $reader.Close()

    }
    catch {
        throw
    }
    finally {
        popd         
    }
}

Of course, what you really want is to be able to verify all your configurations in one go. Once the script is in your powershell $profile, you can put together some fairly simple command-line-fu to take care of that. I have all my microservices in one directory, which I guess is a pretty common pattern, so all I had to do was CD over there and execute the following: 

gci -r -file -include *conf.xml | % {ValidateXmlFile $_}

By running this, I've also picked up a couple of things that might be false positives. That aside, this is a real time saver if you're trying to solve issues. There's nothing like being able to eliminate a lot of the stupid typos from consideration all in one go. 

System refresh: new architecture for www.dominic.cronin.nl

It's taken a while, and the odd skinned knuckle and a bit of cursing, but I can finally announce that this site is running on...erm.. the other server. Tada! Ta-ta-ta-diddly.... daaahhhh!!!!

Um yeah - I get it. It's not so exciting, is it really? The blog's still here, and it's got more or less the same content. It doesn't look any different. Maybe it's a tiny smidgin faster, but even that's more likely to do with the fact that we switched over to an ISP that actually makes use of the glass that runs in to our meter cupboard. 

But I'm excited. Just a bit, anyway. Partly because it's taken me months. It needn't have, but it's the usual question of squeezing it into the cracks between all the other things that need to get done in life. That and the fact that I'm an utter cheapskate and I don't want to pay for anything. There's also plenty not to be excited about. As I said, the functionality is exactly as it was. The benefits I get from it are mostly about the ability to do things better going forward. 

So what have I done? Well it all started an incredibly long time ago when I started tinkering with docker. I figured that the whole containerisation technology thing had such a lot of potential that I ought at least to run docker on my own server. After all, over the years, I'd always struggled with Plone needing to have a different version of Python than the one available in the current Gentoo ebuilds. I'd attempted a couple of things, including I think an early version of what became LXC, but then along came virtualenv, which made the whole thing moot. 

Yeah, well - until I wanted to play with docker for itself. At this point, I just thought I'd install it on my server, and get going, but I immediately discovered that the old box I was running was 32-bit, and docker is just far too hip to run on anything so old-fashioned. So I needed a new server, and once I'd realised that, that's when the whole thing started. If I was going to have a new server, why didn't I just containerise everything? It's at this point that someone inevitably chips in with a suggestion that if I weren't such a dinosaur, I'd run it on the cloud, wouldn't I? Well yes - sure! But I told you - I'm a cheapskate, and apart from that, I don't want anyone's soul-less reliability messing with my carefully constructed one-nine availability commitment. 

Actually I like cloud tech, but frankly, when you look at the micro-budget that supports this site, I'd have spent all my time searching out a super-cheap host, and even then I'd have begrudged it. So my compromise with myself was that I'd build it all very cloudy, and then the world's various public clouds would be my disaster recovery plan. And so it is. If this server dies, I can get it all up in the cloud with a fairly meagre effort. Still not going to two-nines though.

So I went down to my local high street where there's a shop run by these Indian guys. They always have a good choice of "hardly used" ex-business computers. I think I shelled out a couple of hundred Euros, and then I had something with an i5 and enough memory, and a couple of stupidly big disks to make a raid. Anyway - more than enough for a web server - which is just as well, because pretty soon it ends up just being "the server", and it'll get used for all sorts of other things. All the more reason to containerise everything. 

I got the thing home, and instead of doing what I've done many times before, and installing Gentoo linux, I poked around a bit on the Internet and found CoreOS. Gentoo is a masochist's delight. I mean - it runs like a sports car, but you have to own a set of spanners. CoreOS, on the other hand, is more or less maintenance free. It's built on Gentoo's build system, so it inherits the sports car mentality of only installing things you are going to use, but then the guys at CoreOS do that, and their idea of "things you are going to use" is basically everything that it takes to get containers up and keep them running, plus exactly nothing else. For the rest, it's designed for cloud use, so you can install it from bare metal to fully working just by writing a configuration file, and it knows how to update itself while running. (It has a separate partition for the new version, and it just switches over.) 

So with CoreOS up and running, the next thing was to convert all the moving parts over to Docker containers. As it stands now, I didn't want to change too much of the basics, so I'm running Plone on a Gentoo container. That's way too much masochism though. I'd already been thinking I'd do a fresh one with a more generic out-of-the-box OS, and I've just realised I can pull a pre-built Plone image based on Debian (or Alpine). This gets better and better. And I can run it all up side-by-side in separate containers until I'm ready to flip the switch. Just great! Hmm... maybe my grand master plan was just to get to Plone 5! 

The Gentoo container I'm using is based on one created by the Gentoo community, which you can pull from the Docker hub. Once I found this, I thought I was home and dry, but it's not really well-suited to just pulling automatically from a docker file. What they've done is to separate out the portage tree into a separate container. This is smart, because you are unlikely to want the whole of portage in your container for any given purpose that makes you want to run Gentoo. What you do instead is mount the portage data using docker's --volumes-from argument. With it mounted, you can run emerge and install whatever packages you need, and then at runtime you get to run a much slimmer system. Which is great, but it means you have to create and store your own image manually rather than using a dockerfile. (At least, that's how it ended up for a noob like me, once I realised that dockerfile doesn't have an equivalent of --volumes-from.) 

My goal was to set up CoreOs to automatically pull the docker images it needed, and run some setup commands. This meant that I'd need to have my personalised Gentoo image available somewhere. Some of the data in there was sensitive, so I went looking for a private Docker registry that I could upload it to. There are plenty of private registries, but most of them aren't free. (If you don't mind the whole world pulling your containers, then free registries abound.) I eventually found https://canister.io/, which suited my needs. That said, my needs aren't much. If I ever need an alternative to canister, I'll probably look at Google Cloud Platform, which isn't free but has a private container registry where you only pay for storage and data egress, at pretty reasonable rates. Or I could just host it myself, but that's maybe too many eggs in the same basket. 

Meanwhile, my very next step ought most probably be to get backups sorted out. The "Dockerish" way to do this is to run up yet another dedicated container to deal with just this concern. Then if I want to host it separately, and my backup approach changes, nothing else needs to. Once I have the backups sorted out, it will definitely be worth the while to tidy things up so that I really can just push to the cloud if needs be. The way it's set up now, I could be up and running again very quickly but we're probably talking hours rather than seconds. 

I'm really enjoying the flexibility that containerisation gives me, although it's definitely important to get into the right mindset. Being able to build containers that will run on a really generic platform is quite liberating.

Hotfix rollups are the new Service Pack

I was recently surprised to learn that a Hotfix Rollup shipped from SDL Tridion is something quite different to what you'd expect from the title. For at least the last 10 years, and probably longer, the distinction between a hotfix and a service pack was very simple:

Service Pack

A collection of product improvements shipped between full version releases. The improvements would include bug fixes, and possibly new features, but never "breaking" changes. The intention was that customers should install the latest service pack for their current version. The service pack would have been thoroughly tested by R&D and would be the basis for on-going support until the next release.

Hotfix

If an issue was found in software in the field, a hotfix could be created to address this issue. There wouldn't be an installer - just some files and some instructions. Often a hotfix would be seen as suitable for any customer to install, but other hotfixes were riskier, and if you didn't have the problem, installing the hotfix would be a bad idea. Hotfixes were tested by customer support. The next service pack or full release would supersede any hotfix. In a reasonably thorough risk-management strategy, the standard play was to avoid taking hotfixes until you needed them. The official advice from Tridion as of 2011 was this:

IMPORTANT NOTE: Hotfixes are released at the discretion of SDL Tridion based on technical complexity, customer business requirements and schedules. Hotfixes are made and tested only for the described problem on a particular environment/configuration and therefore should only be installed if approved by SDL Tridion Customer Support. Hotfixes should be replaced as soon as possible by the subsequent service pack where the problem is fixed.

And then along came Hotfix Rollups...

Hotfix rollups

You might be forgiven for thinking that a hotfix rollup was, well a sort of erm... roll-up of hotfixes. A collection of hotfixes. A gathering together of a handy bunch of hotfixes to make life easier for the less risk-averse who like to install everything. (Like me, when I'm installing my own dev image. Love the handiness of it.) That's what the name means in any normal interpretation of the English language. The point here is that this is not what SDL Tridion mean when they say Hotfix Rollup. From discussions with various SDL people, it seems that they see a hotfix rollup as having the following characteristics:

  • It is not expected to cause any problems on your system and can safely be installed.
  • To this end, it has been tested by the relevant specialists in R&D.
  • In the same way that you are expected to install a service pack, you are expected to install a hotfix rollup. Should further hotfixes become necessary, they will have the hotfix rollup as a dependency, not specific hotfixes. (This means that if you need that hotfix, you'll end up installing the hotfix rollup too, probably at a moment that you'd prefer to have chosen yourself.)

 

This is my best understanding at the current moment, but I am not aware of any formal communication from SDL that makes this clear, or otherwise updates the advice from 2011. Obviously, feel free to get formal confirmation via the usual channels.

And as for you, SDL: your customers' risks are not your risks. You owe it to your customers to communicate correctly and in a timely way about this kind of thing. If anyone thought this would engender trust and confidence, that person was not thinking clearly. I wouldn't be saying this, but people out in the field often spend significant effort trying to balance risks like this, and it's in all our interests to make sure it goes well.

One-nine availability

Posted by Dominic Cronin at Aug 16, 2014 09:43 AM

A couple of weeks ago, this site went down. That happens from time to time. It went down just as I left the country to go on holiday, and it could only be fixed via physical access, so it was down for a week. At least one person has commented that maybe I should stop with this silliness of running my own server on an old Gentoo box in the meter cupboard, and get some proper hosting.

The thing is, that when I started this blog, some years ago now, I went through a detailed requirements analysis, and a full MoSCoWMeh matrix. If you aren't familiar with MoSCoWMeh, this is an enhanced variant of the well-known MoSCoW technique, which also accommodates the needs of private and hobby-run systems.

The requirement for reliable hosting and 5-nines up-time was classified as Meh, and has remained so since. So now you know.