Dominic Cronin's weblog

Showing blog entries tagged as: SDL Web

Out with the old, in with the new. How will 2019 look for Tridion specialists?

Posted by Dominic Cronin at Dec 31, 2018 05:12 PM

It's New Year's Eve: a traditional time to look backwards and forwards. I've spent a little time contemplating these grand themes in the context of my life as a Tridion specialist. Where are we now and what will the new year bring?

Let's start with 2018. The major event this year, of course, was the release of SDL Tridion DX, incorporating SDL Tridion Sites 9. So the first thing you see is that we got the Tridion name back. Hurrah! OK - enough of that: getting the name back is good, but there are other things to get excited about.

Firstly - Tridion DX brought SDL's two content management systems together: the web content management system "formerly known as Tridion", and the SDL Knowledge Center. The new branding names the first "SDL Tridion Sites" and the second "SDL Tridion Docs". So now we have both the web content management and structured content management features in the same product. To be honest, I suspect at first the number of customers who want to combine the two will be small, but for companies that do need to straddle these two worlds, the integration provided by DX will be a killer feature. As time goes on, we'll probably find that having both approaches available helps to prevent the need to knock a round peg into a square hole in some implementations. It's also clear that this represents a significant engineering effort at SDL. They haven't just put everything in the same shrink-wrap: the content delivery architecture, for example, has had a major revamp to get the two systems to play nicely together. Even this got a new branding: the Unified Delivery Platform!

I suspect, though, that in 2019 I'll be mostly busy with pure WCM work. Sites 9 brings a raft of enhancements that help to keep it current in the fast-moving world of modern web development. The most interesting is perhaps the new model service. We've seen a model service as part of the DXA framework, but Sites 9 has a "Public Content API", which boils down to a GraphQL endpoint. Tridion's architecture has always had great separation of concerns, so in many ways, it can take the current trend for headless sites in its stride, but a GraphQL service will make it easier to consume content directly, without having to build server-side support as part of your implementation. GraphQL allows you to specify exactly which data you want to get back, and will enable developers to ensure the data traffic between server and client is clean and lean.

There are also other interesting new features. A good example is regions within pages. In practice, the build-up of a web page is done this way - we have different areas of the page showing different kinds of content, and it's great to see that this kind of structure can now be modeled directly in the content manager. I'll stop there; there are far too many new features for a short blog post.

The Sites 9 release has meant a matching update (2.1) to the DXA framework, which is now using the new public content API and of course has support for the new page regions.

So going into 2019, things are looking really great for anyone beginning a greenfield project on SDL Tridion. That's not the whole story, though. At the other end of the spectrum, there are always customers who are waiting for the right moment to upgrade from an older version. This might be the year when we finally say goodbye to our old friend VBScript. As I understand it, from the Sites 9 release onwards, the legacy support won't even install any more, so organisations that still have VBScript will be planning how to migrate before the next "major" release puts them out of support. To be fair to SDL, by my calculation it's 16 years since compound templating was introduced. That ought to be ample time, you'd have thought. Putting that a bit more positively, we now have very much better ways of doing things, and a Tridion 9/DXA 2.1 approach is a very much better place to be.

I suspect the other main themes for 2019 will be cloud computing and devops. As organisations move forward to the new product versions, they are also looking at their architectures and working practices. Fortunately, Tridion as a product is already highly cloud-capable, and the move away from templating on the content manager has definitely had an impact on how easy it is to implement continuous integration and delivery/deployment.

It will be a year of transition for many of our customers: not only the technical transitions that I've mentioned, to new architectures and techniques, but also for the business people who are looking to take the next step towards a unified on-line experience for their customers and visitors.

I'm looking forward to it. Bring it on!

A happy new year to you all.

Using environment variables to configure the Tridion microservices

Within a day of posting this, Peter Kjaer informed me that the microservices already support environment variables, so this entire blog post is pointless. So my life just got simpler, but it cost me a blog post to find out. Oh well. I'm currently trying to decide whether to delete the post entirely or work it into something useful. In the meantime at least be aware that it's pointless! :-) Anyway - thanks Peter.

When setting up a Tridion content delivery infrastructure, one of the most important considerations is how you are going to manage all the configuration values. The microservices have configuration files that look very similar to those we're familiar with from versions of Tridion going back to R5. Fairly recently (in 8.5, I think) they acquired a "new trick", which is that you can put replacement tokens in the files, and these will be filled in with values that you can pass as JVM parameters when starting up your Java process. Here's an example taken from cd_discovery_conf.xml:

<ConfigRepository ServiceUri="${discoveryurl:-http://localhost:8082/discovery.svc}"
    ConnectionTimeout="10000"
    CacheEnabled="true"
    CacheExpirationDuration="600"
    ServiceMonitorPollDuration="10"
    ClientId="registration"
    ClientSecret="encrypted:HzfQh9wYwAKShDxCm4DnnBnysAz9PtbDMFXMbPszSVY="
    TokenServiceUrl="${tokenurl:-http://localhost:8082/token.svc}">

Here you can see the tokens "discoveryurl" and "tokenurl" delimited from the surrounding text with ${} and followed by default values after the :- symbol.
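
The token syntax looks the same as the shell's own ${name:-default} parameter expansion, which makes the fallback behaviour easy to try out. What follows is only an analogy in plain shell, not a description of how the Tridion services resolve their tokens; in particular, whether the services treat an empty value like a missing one is an assumption I haven't checked. The function name and the example.com URL are made up for the illustration.

```shell
#!/bin/sh
# Analogy only: the shell's ${name:-default} expansion, which the
# token syntax in the config files resembles.

resolve_discovery_url() {
    # Falls back to the default when discoveryurl is unset or empty.
    printf '%s\n' "${discoveryurl:-http://localhost:8082/discovery.svc}"
}

unset discoveryurl
resolve_discovery_url    # prints http://localhost:8082/discovery.svc

discoveryurl="http://discovery.example.com:8082/discovery.svc"
resolve_discovery_url    # prints http://discovery.example.com:8082/discovery.svc
```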

This is really handy if you are doing any kind of managed provisioning where the settings have to come from some external source. One word of warning, though. If you are setting up your system by hand and intending to maintain it that way, it's most likely a really bad idea to use this technique. In particular, if you are going to install the services under Windows, you'll find that the JVM parameters are stored in a deeply obscure part of the registry. More to the point, you really don't want two versions of the truth, and if you have to look every time to figure out whether tokenurl is coming from the default in your config or from deep underground, I don't hold out much hope for your continued sanity if you ever have to troubleshoot the thing.

That said, if you do want to provision these values externally, this is the way to go. Or at least, in general, it's what you want, but personally I'm not really too happy with the fact that you have to use JVM parameters for this. I've recently been setting up a dockerised system, and I found myself wishing that I could use environment variables instead. That's partly because this is a natural idiom with Docker. Docker doesn't care what you run in a container, and has absolutely no notion of a JVM parameter. On the other hand, Docker knows all about environment variables, and provides full support for passing them in when you start the container. On the command line, you can do this with something like:

> docker run -it -e dbtype=MSSQL -e dbclass=com.microsoft.sqlserver.jdbc.SQLServerDataSource -e dbhost=mssql -e dbport=1433 -e dbname=Tridion_Disc
-e discoveryurl=http://localhost:8082/discovery.svc -e tokenurl=http://localhost:8082/token.svc discovery bash

I'm just illustrating how you'd pass command-line environment arguments, so don't pay too much attention to anything else here, and of course, even if you had a container that could run your service, this wouldn't work. It's not very much less ugly than constructing a huge set of command parameters for your start.sh and passing them as a command array. But bear with me; I still don't want to construct that command array, and there are nicer ways of passing in the environment variables. For example, here's how they might look in a docker-compose.yaml file. (Please just assume that any YAML I post is accompanied by a ritual hawk and spit. A curse be on YAML and its benighted followers.)

    environment:
      - dbtype=MSSQL
      - dbclass=com.microsoft.sqlserver.jdbc.SQLServerDataSource
      - dbhost=mssql
      - dbport=1433
      - dbname=Tridion_Discovery
      - dbuser=TridionBrokerUser
      - dbpassword=Tridion1
      - discoveryurl=http://localhost:8082/discovery.svc
      - tokenurl=http://localhost:8082/token.svc

This is much more readable and manageable. In practice, rather than docker-compose, it's quite likely that you'll be using some more advanced orchestration tools, perhaps wrapped up in some nice cloudy management system. In any of these environments, you'll find good support for passing in some neatly arranged environment variables. (OK - it will probably degenerate to YAML at some point, but let's leave that aside for now.)

Out of the box, the Tridion services are started with a bash script "start.sh" that's to be found in the bin directory of your service. I didn't want to mess with this: any future updates would then be a cause for much fiddling and cursing. On top of that, I wanted something I could generically apply to all the services. My approach looks like this:

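#!/bin/bash
# vim: set fileformat=unix

# Turn every environment variable named tcdconf_<name> into a -D<name>=<value>
# argument. Note: $(printenv) is split on whitespace, so values must not
# contain spaces.
scriptArgs=""
tcdenvMatcher='^tcdconf_([^=]*)=(.*)'
for tcdenv in $(printenv); do
    if [[ $tcdenv =~ $tcdenvMatcher ]]; then
        scriptArgs="$scriptArgs -D${BASH_REMATCH[1]}=${BASH_REMATCH[2]}"
    fi
done

# Run start.sh from the same directory as this script, passing the -D arguments.
# $scriptArgs is deliberately unquoted so that each -D option becomes its own argument.
script_path="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )"
"$script_path/start.sh" $scriptArgs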

(I'm sticking with the docker-compose example to illustrate this. In fact, with docker-compose, you'd also need to script some dependency-management between the various services, which is why you'd probably prefer to use a proper orchestration framework.)

The script is called "startFromEnv.sh". When I create my docker containers, I drop this into the bin folder right next to start.sh. When I start the container, the command becomes something like this, (but YMMV depending on how you build your images).

command: "/Discovery/bin/startFromEnv.sh"

instead of:

command: "/Discovery/bin/start.sh"

And the environment variables get some prefixes, so the relevant section of the setup looks like this:

    environment: 
      - tcdconf_dbtype=MSSQL
      - tcdconf_dbclass=com.microsoft.sqlserver.jdbc.SQLServerDataSource 
      - tcdconf_dbhost=mssql
      - tcdconf_dbport=1433
      - tcdconf_dbname=Tridion_Discovery
      - tcdconf_dbuser=TridionBrokerUser
      - tcdconf_dbpassword=Tridion1
      - tcdconf_discoveryurl=http://localhost:8082/discovery.svc
      - tcdconf_tokenurl=http://localhost:8082/token.svc

The script is written in bash, as evidenced by the hashbang line at the top. (Immediately after is a vim modeline that you can ignore or delete unless you happen to be using an editor that respects such things and you are working on a Windows system. I've left it as a reminder that the line endings in the file do need to be unix-style.)

The rest of the script simply(!) loops through the environment variables that are prefixed with "tcdconf_" and converts them to -D arguments which it then passes on to start.sh (which it looks for in the same directory as itself).
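
One thing to watch: looping over $(printenv) splits on whitespace, so a value containing a space would be broken into pieces. Below is a minimal sketch of a line-based variant of the same conversion, written as a function that reads NAME=value lines from standard input so that it can be tried without touching the real environment. The function name is hypothetical, and I haven't checked how start.sh itself copes with arguments that contain spaces.

```shell
#!/bin/sh
# Sketch: turn tcdconf_<name>=<value> lines into -D<name>=<value> arguments,
# one per output line, without splitting values on whitespace.

tcdconf_to_jvm_args() {
    # Reads "NAME=value" lines (as produced by printenv) from stdin.
    while IFS= read -r entry; do
        case "$entry" in
            tcdconf_*=*)
                # Drop the prefix; what is left is already name=value.
                printf '%s\n' "-D${entry#tcdconf_}"
                ;;
        esac
    done
}

# Example with fixed input rather than the real environment:
printf '%s\n' 'PATH=/usr/bin' 'tcdconf_dbhost=mssql' 'tcdconf_dbname=Tridion Discovery' \
    | tcdconf_to_jvm_args
# prints:
#   -Ddbhost=mssql
#   -Ddbname=Tridion Discovery
```

Against the live environment you would pipe printenv into the function instead. Values that themselves contain newlines would still need separate handling.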

I'm still experimenting, but for now I'm assuming that this approach has improved my life. Please do let me know if it improves yours. :-)

If you think the script is ugly, apparently this is a design goal of bash, so don't worry about it. At least it's not YAML (hack, spit!)

Tridion Core service PowerShell settings for SSO-enabled CMS

Posted by Dominic Cronin at Nov 18, 2018 07:30 PM

In a Single-Sign-On (SSO) configuration, it's necessary to use Basic Authentication for web requests to the Tridion Content Manager from the browser. This is probably the oldest way of authenticating a web request, and involves sending the password in plain over the wire. This allows the SSO system to make use of the password, which would be impossible if you used, for example, Windows Authentication. The down side of this is that you'd be sending the password in plain over the wire... can't have that, so we encrypt the connection with HTTPS.

What I'm describing here is the relatively simple use case of using the PowerShell module to log in to an SSO-enabled site using a domain account. Do please note that this won't work if you're expecting to authenticate using SSO. Then you'll need to mess around with federated security tokens and such things. My use case is a little simpler as I have a domain account I can log in with. As the site is set up to support most of the users coming in via SSO, these are the settings I needed, and hence this "note to self" post. If anyone has gone the extra mile to get SSO working, I'd be interested to hear about it.

So this is how it ends up looking:

Import-Module Tridion-CoreService
Set-TridionCoreServiceSettings -HostName 'contentmanager.company.com'
Set-TridionCoreServiceSettings -Version 'Web-8.5'
Set-TridionCoreServiceSettings -CredentialType 'Basic'
Set-TridionCoreServiceSettings -ConnectionType 'Basic-SSL'
$ServiceAccountPassword = ConvertTo-SecureString 'secret' -AsPlainText -Force
$ServiceAccountCredential = New-Object System.Management.Automation.PSCredential ('DOMAIN\login', $ServiceAccountPassword)
Set-TridionCoreServiceSettings -Credential $ServiceAccountCredential

$core = Get-TridionCoreServiceClient
$core.GetApiVersion() # The simplest test

This is just an example, so I've stored my password in the script. The password is 'secret'. It's a secret. Don't tell anyone. Still - even though I'm a bit lacking in security rigour, the PowerShell isn't. It only wants to work with secure strings and so should you. In fact, it's not much more fuss to work with ConvertTo-SecureString and friends to keep everything ship shape and Bristol fashion.

Using the Tridion PowerShell module in a restricted environment

At some point, pretty much every Tridion specialist is going to want to make use of Peter Kjaer's Tridion Core Service PowerShell modules. The modules come with batteries included, and if you look at the latest version, you'll see that the modules are available from the PowerShell gallery, and therefore a simple install via Install-Module should "just work".

Most of us spend a lot of our time on computers that are behind a corporate firewall, and on which the operating system is managed for us by people whose main focus is on not allowing us to break anything. I recently found myself trying to install the modules on a system with an older version of PowerShell where Install-Module wasn't available. The solution for this is usually to install the PowerShellGet module which makes Install-Module available to you. In this particular environment, I knew that various other difficulties existed, notably with the way the PowerShell module path is managed. Installing a module would first require a solution to the problem of installing modules. In the past, I'd made a custom version of the Tridion module as a workaround, but now I was trying to get back to a clean copy of the latest, greatest version. Hacking things by hand would defeat my purpose.

It turned out that I was able to clone the Git repository, so I had the folder structure on disk. (Failing that I could have tried downloading a Zip file from GitHub.)

Normally, you install your modules in a location on the Module Path of your PowerShell, and the commonest of these locations is the WindowsPowerShell folder in your Documents folder. (There are other locations, and you can check these with "gc Env:\PSModulePath".) As I've mentioned, in this case, using the normal Module Path mechanism was problematic, so I looked a little further. It turned out the solution was much simpler than I had feared. You can simply load a module by specifying its location when you call Import-Module. I made sure that the tridion-powershell-modules folder I'd got from Git was in a known location relative to the script file from which I wanted to invoke it, and then called Import-Module using the location of Tridion-CoreService.psd1:

$scriptLocation = Split-Path ((Get-Variable MyInvocation -Scope 0).Value).MyCommand.Path 
import-module $scriptLocation\..\tridion-powershell-modules\CoreService\Tridion-CoreService.psd1

Getting the script location from the built-in MyInvocation variable is ugly, but pretty much standard PowerShell. Anyway - this works, and I now have a strategy for setting up my scripts to use the latest version of the core service module. Obviously, if you want the Alchemy or Content Delivery module, a similar technique ought to work.

Tridion Sites 9.... and beyond!!!

Posted by Dominic Cronin at Oct 09, 2018 06:07 PM

A month or so ago, Amsterdam was again host to the Tridion Developer Summit. This is a great event for anyone involved with Tridion, and each year it goes from strength to strength. This year, a lot of the focus, understandably, was on the forthcoming release of Tridion Sites 9, which will be part of Tridion DX. We heard speakers from SDL and from the wider community talking on a variety of topics. In one sense, I suppose, the usual mixture, but there's always a certain excitement when a new major release is coming out. (Yes, I know we don't call them majors any more, but still, we're looking at brand new APIs that none of us have used yet: that's a major in my book!)

The talks covered everything from the new user interface, to the combined play with structured content that the DX platform will offer, to new services based on GraphQL (which is probably becoming the "must study" topic). Other speakers covered integrations and extension points and JavaScript and, well, you name it. If you spoke, and I haven't mentioned your bit, please don't take offence!

It was a great conference, which I thoroughly enjoyed; not least because of the chance to catch up with everyone. But a month later, I just want to share the thing that really blew me away and stuck with me. The new product release isn't finished just yet, but the scope is more or less fixed. If a feature isn't already in, then it probably won't be in Tridion Sites 9. That said, the guys in R&D are not standing still, and they are already looking forward to the next thing. Which brings me to the buzz moment of this year's summit. I'm not sure if Likhan Siddiquee was even meant to be presenting in the main theatre at that moment, but well... Likhan's an enthusiast. If this guy's got some amazing new tech to show, try and stop him! (Good luck with that!) So he comes in and just kind of tags along after a couple of the other SDL presenters. He's showman enough that it could have all been staged, but he managed to make it seem as though... well... he just had this cool stuff on his laptop and.... did we maybe have five or ten minutes?

So he walks on stage carrying his kid - a babe in arms! Start 'em young, I suppose. Anyway, child-care and work-life balance obviously hold no fears for Likhan. He hands off the baby to his able assistant, and proceeds to unveil the geeky goodies. What did he have? Nothing less than the Tridion kernel running on .NET Core! Sure - this was a pre-preview. Hot off the press. No user interface, and only a bare-bones system, but sure enough he got it going from the command prompt with the "dotnet" command and proceeded to start hitting service endpoints with a web browser. Wow!

It was a moment in time. You had to be there. I'm sure we'll be waiting a while to see a production version. For sure it won't make it into the 9 release, but who cares? Hey, a lot of people won't even notice. Nothing wrong with running Windows Server, is there? Still, this will open up lots of possibilities for different kinds of hosting options, and for those of us who like to run a "fifth environment" it's going to be awesome. Everything on Linux containers. What's not to like?

Thanks to all those who took part in the Summit. You were all great, but especially thanks Likhan for that inspiring moment!

Preparing HTML data for use in a Tridion Rich Text Format area

Posted by Dominic Cronin at Aug 19, 2018 11:25 AM

I recently had to create some Tridion components from code via the core service. The incoming data was in the form of HTML, and not XML in the XHTML namespace, which is what is required for a Tridion RTF area. I'd also had to do some preparatory clean-up of the data, and by the time I wanted to fix up the namespaces, I already had the input data in an XLinq XElement.

These days, if I'm processing XML in .NET, I'm quite likely to use XLinq. It's taken me a while to get comfortable with some of its idioms. The technique I ended up using is similar to the classic approach we typically adopt in XSLT, starting with an identity transform and making a couple of minor tweaks to the data as it goes through. 

So, mostly by way of a "note to self", here's how it looks in XLinq. All you need to do is pass in your XElement containing your XHTML, and it will rip through all the elements and put them in the XHTML namespace, leaving all the attributes and other nodes untouched. 

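// Requires: using System.Linq; using System.Xml.Linq;
public XNode PutHtmlElementsInXhtmlNamespace(XNode input)
{
    XNamespace xhtmlNs = "http://www.w3.org/1999/xhtml";
    var element = input as XElement;
    if (element != null)
    {
        // Rebuild the element under the same local name in the XHTML namespace,
        // copying its attributes and recursing into its child nodes.
        XName name = xhtmlNs + element.Name.LocalName;
        return new XElement(name,
            element.Attributes(),
            element.Nodes().Select(n => PutHtmlElementsInXhtmlNamespace(n)));
    }

    // Text, comments and other non-element nodes pass through unchanged.
    return input;
}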

In this way you can easily create data that's suitable for use in an RTF. Piecing the rest of a Content element together with XElement is pretty easy too, or of course, you can use the venerable Fields class for the rest. 

Building a DXA module in Java

Posted by Dominic Cronin at Jun 15, 2018 01:37 PM

I'm currently trying to get a bit of practice in working with DXA 2.0 in Java. Some months ago I did SDL's DXA course, which gets you in the quickest possible way to a working DXA implementation. You have to follow up by filling in the details, and today was the first time I'd tried to create a module by actually following the instructions in the documentation. 

I was looking at the documentation page for Building a Java module with Maven POM, and my first attempt was simply to copy the POM from the documentation. Although it seemed like a good idea at the time, pretty soon I was staring at a nasty-looking error:

Project build error: Non-resolvable parent POM for dxa-modules:module-one:[unknown-version]: 
Could not find artifact com.sdl.dxa.modules:dxa-modules:pom:1.2-SNAPSHOT and 
'parent.relativePath' 
points at wrong local POM pom.xml /module-one line 3 Maven pom Loading Problem

When I say it seemed like a good idea at the time, to tell the truth, I'd already had my doubts when I saw SNAPSHOT, and that indeed turned out to be the problem. When using Maven, a snapshot build is a work-in-progress build, typically one that a developer creates locally or that lives in a dedicated snapshot repository; you wouldn't expect a snapshot build to be available from a release repository. Statistically speaking, Java developers spend 17.3% of their working hours Googling for the correct versions of external dependencies that they need to get out of various external repositories. That's the great thing about Maven; once you get the versions right, everything works by magic and you can go and have a cup of tea.

So - like thousands before me, I duly Googled, and ended up on a page that told me I could use version 1.3.0.

So I fixed up the POM so that this: 

<parent>
    <groupId>com.sdl.dxa.modules</groupId>
    <artifactId>dxa-modules</artifactId>
    <version>1.2-SNAPSHOT</version>
</parent>

looked like this: 

<parent>
    <groupId>com.sdl.dxa.modules</groupId>
    <artifactId>dxa-modules</artifactId>
    <version>1.3.0</version>
</parent>

That's fixed it, so now I can get on with the rest of the job. And sure, this is blindingly obvious if you do a lot of Java, but these little things can slow you down a fair bit. In this case, I'd spent some time obsessing about Maven a while back, so I got there reasonably quickly, but we're not always that lucky! 

Encrypting passwords for Tridion content delivery

Posted by Dominic Cronin at May 10, 2018 05:08 PM

This is just a quick note to self, because I just spent a few minutes figuring out something fairly trivial and I don't want to forget it.

Previously, to encrypt a password for Tridion content delivery, you would do something like:

java -cp cd_core.jar com.tridion.crypto.Encrypt foobar

It's been a while since I did this, and I hadn't realised that in Web 8.5 it doesn't work any more. They've factored the Crypto class out into a utility jar, so now the equivalent command has become something like:

java -cp cd_core.jar;cd_common_util.jar com.tridion.crypto.Encrypt foobar

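Of course, these days the jars also have build numbers in the name, so it's a bit uglier. The point is that you have to have cd_core and cd_common_util on your classpath. (The semicolon is the classpath separator on Windows; on Linux or macOS you'd use a colon instead.)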
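
Because the jar names carry build numbers, a small helper can pick them up by pattern instead of typing the exact names. This is a minimal sketch for a Unix-like shell (so it joins the entries with a colon), assuming the two jars sit together in one directory. The function name and the directory you pass in are hypothetical, and the patterns may match more jars than you want if your lib folder has others with similar names.

```shell
#!/bin/sh
# Sketch: build a classpath from whatever cd_core*.jar and cd_common_util*.jar
# files are present in a given directory.

build_crypto_classpath() {
    lib_dir="$1"
    classpath=""
    for jar in "$lib_dir"/cd_core*.jar "$lib_dir"/cd_common_util*.jar; do
        # Skip patterns that matched nothing.
        [ -e "$jar" ] || continue
        # Join entries with ':' (the separator on Unix-like systems).
        classpath="${classpath:+$classpath:}$jar"
    done
    printf '%s\n' "$classpath"
}

# Usage (not run here; the path is a placeholder):
#   java -cp "$(build_crypto_classpath /path/to/lib)" com.tridion.crypto.Encrypt foobar
```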

Getting started with Insomnia as a Tridion content delivery client

Posted by Dominic Cronin at Dec 17, 2017 08:15 PM

Today I ran across Insomnia, which is a generic development/test client for RESTful HTTP services much along the same lines as Postman. The latter is pretty well established, but it's a paid product, and Insomnia seems at first sight to be more or less a clone, but open source and free. (That said, Postman is free to most people, and Insomnia has paid-for plugins. Everyone's got to eat, right?)

It will hardly be a surprise to the reader that my interest in this is in the context of Tridion's content delivery APIs. To be honest I haven't really spent much time getting to know Postman, preferring to make use of simple Powershell scripts for purposes such as validating that the services are running and that authentication is working. While there's much to be said for a scripted approach, I've always had niggling doubts that perhaps I'd find my way around the data a bit more easily with a GUI client. Coming across Insomnia today is my opportunity to find out whether this is so.

I started by downloading and installing the Windows version (like Postman, it's also available for Linux and Mac). So far, I've got as far as making a simple query against my content service. To do this, you have to figure your way through the somewhat arcane details of getting an OAuth token. The services on my Tridion research server are not secured in any meaningful way, but OAuth is still "switched on". That is to say, I have the out-of-the-box user accounts configured in my discovery service's cd_ambient_conf.xml along with the out-of-the-box passwords. So obviously, don't do this at home, children, but hey - it's my research rig, not a production server. This being the case, I'm not giving much away by sharing the following:

What you can see here is that my Tridion image is running at "sdlweb", so I'm issuing a GET against http://sdlweb:8081/client/v2/content.svc. Insomnia has support for variables, so I imagine you could use one for the hostname if you want to keep your tests generic.

You can also see that I've got the authentication tab open and have selected OAuth2. The first thing you need to do is select Client Credentials for the grant type. With this choice, you only need to fill in the client id and secret. (Obviously these need to match your actual security settings, and of course, you haven't left these at their defaults... right!?) 

The only thing that made me scratch my head for a short moment was that when I tried with just those details, it didn't work, and I got a 400 status back. That's HTTP for "Bad request", so I went into the Advanced settings to see if there was anything I could change to make the server happier about my manners. It turns out that switching Credentials to "In Request Body" is all you need and as you can see, there's a nice green 200 status displaying, and some data from the service.
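
For comparison with a scripted approach, here is a rough sketch of the same token request made with curl: a client-credentials grant with the credentials sent in the form-encoded request body. The token URL defaults to the out-of-the-box value from the discovery configuration; the function names are hypothetical and the client id and secret are placeholders for whatever your own settings are. I haven't run this against a live service.

```shell
#!/bin/sh
# Sketch: request an OAuth2 token using the client-credentials grant, with the
# credentials in the request body rather than in a Basic authentication header.

# Placeholder default; override TOKEN_URL to point at your own token service.
TOKEN_URL="${TOKEN_URL:-http://localhost:8082/token.svc}"

token_request_body() {
    # $1 = client id, $2 = client secret.
    # Values are not URL-encoded here, so this only suits simple ids and secrets.
    printf 'grant_type=client_credentials&client_id=%s&client_secret=%s' "$1" "$2"
}

request_token() {
    curl --silent --request POST \
         --header 'Content-Type: application/x-www-form-urlencoded' \
         --data "$(token_request_body "$1" "$2")" \
         "$TOKEN_URL"
}

# Usage (needs a running token service):
#   request_token "$CLIENT_ID" "$CLIENT_SECRET"
```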

Well, that's enough to get me started. Please do let me know about your experiences with Insomnia. Especially if you're a Postman maven, let me know how the two stack up against each other.

Stripping namespace declarations from XML

Posted by Dominic Cronin at Nov 19, 2017 12:30 PM

I've recently been working on an application that will allow members of our content management teams to search within a chosen folder in Tridion for specific content. You might think that's well enough covered by the built-in search functionality, but we're heading towards a search and replace feature, so we pretty much have to process the content ourselves. In the end users' view of the world, a Rich Text field in a component has... well... a rich text view, and, for the power-users, a Source tab where you can see the underlying HTML. That's all fine, but once you get to the technical implementation, it's a bit more complicated, and we'll end up replicating some of Tridion's own smoke and mirrors to present a view to the users that's consistent with what they are used to. This means not only that we need to be able to translate from text to HTML, but also from "XML in the XHTML namespace" to HTML. One of the building blocks we need to do this is the ability to take XML with namespace declarations, and get rid of them so that the result isn't in a namespace.

A purist (such as myself) might say that the only correct way to parse XML is with an XML parser, and just in case you've never ended up there, I heartily recommend that you read this answer on Stack Exchange before proceeding further. Still - in this case, what I want to do is amenable to RegExes, and yes, I know: now I have two problems. Anyway - FWIW - I started this at the office, thinking I'd just quickly Google for a namespace-stripping regex and I'd be on my way. Suffice it to say that the Internet is rubbish at this. I ended up with a page of links to rubbish regexes that just weren't going to float my boat. So I mailed the problem to myself at home, and today, in the quiet of a Sunday morning, it didn't seem quite so daunting. Actually, I'm still considering whether an XML-parser approach, or an XSLT might not be better, and I may end up there if my needs turn out to be more complex, but for now, here's the namespace stripper.

static Regex namespaceRegex = new Regex(@"
    xmlns               # literal
    (:[^\s=]+)?         # : followed by one or more non-whitespace, non-equals chars
    \s*                 # optional whitespace
    =                   # literal
    \s*                 # optional whitespace
    (?<quote>['""])     # Either a single or double quote - giving it the name 'quote' for back-reference
    .+?                 # Non-greedily match anything
    \k<quote>           # The end-quote to match the one we found earlier
    ", RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

public static string RemoveNamespacesFromDocument(string xml)
{
    return namespaceRegex.Replace(xml, string.Empty);
}

Of course, this is written in C#, and I'm taking advantage of the IgnorePatternWhitespace feature in .NET regexes, which allows for the copious comments that might well be necessary if I ever have to actually read this code instead of just writing it. 

But just in case you are hardcore, and all that named matches and commenting fuss is for wusses, here's the TL;DR...

@"(?is)xmlns(:[^\s=]+)?\s*=\s*(['""]).+?\2"

What's not to like? :-)
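
For a quick check outside .NET, roughly the same idea can be sketched with sed. This is an approximation rather than a port: POSIX extended regexes have no non-greedy match or portable back-reference, so the quoted value is matched as "anything up to the next quote of the same kind"; it is case-sensitive, it also accepts an empty value, and because sed works line by line it won't catch a declaration that is split across lines. The function name is made up.

```shell
#!/bin/sh
# Sketch: strip xmlns and xmlns:prefix declarations from XML on stdin.

strip_xmlns() {
    # xmlns, an optional :prefix, '=', then a double- or single-quoted value.
    sed -E "s/xmlns(:[^[:space:]=]+)?[[:space:]]*=[[:space:]]*(\"[^\"]*\"|'[^']*')//g"
}

# Example:
printf '%s\n' "<p xmlns=\"http://www.w3.org/1999/xhtml\">Hello</p>" | strip_xmlns
```

As with the regex above, the whitespace that preceded the declaration is left behind, so the example comes out as an opening p tag with a stray space before the closing angle bracket.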