Skip to content. | Skip to navigation

Personal tools

Navigation

You are here: Home / weblog

Dominic Cronin's weblog

Showing blog entries tagged as: Regex

Enumerating the Tridion config replacement tokens

OK - I get it. It's starting to look like I've got some kind of monomania regarding the replacement tokens in Tridion config files, but bear with me. In my last blog post, I'd hacked out a regex that could be used for replacing them with their default values, but had thought better of actually doing so. But still, the idea of being able to grab all the tokens has some appeal. I can't bear to waste that regex, so now I'm looking for a reasonable use for it.

It occurred to me that at some point in an installation, it might be handy to have a comprehensive list of all the things you can pass in as environment variables. Based on what I'd done yesterday this was quite straightforward

gci -r -include *.xml -exclude logback.xml| sls '\$\{.*?\}' `
| select {$_.RelativePath((pwd))},LineNumber,{$_.Matches.value} `
| Export-Csv SitesNineTokens.csv

By going to my unzipped Tridion zip and running  this in the "Content Delivery/roles" folder, I had myself a spreadsheet with a list of all the tokens in Sites 9. Similarly, I created a spreadsheet for Web 8.5. (As you can see, I've excluded the logback files just to keep the volume down a bit, but in real life, you might also want to see those listed.)

The first thing you see when comparing Sites 9 with Web 8.5 is that there are a lot more of the things. More than twice as many. (At this point I should probably confess to some possible inaccuracy, as I haven't gone to the trouble of stripping out XML comments, so there could be some duplicates.)

65 of these come from the addition of XO and another 42 from IQ, but in general, there are just more of them. The bottom line is that to get a Tridion system up and running these days, you are dealing with hundreds of settings. To be fair, that's simply what's necessary in order to implement the various capabilities of such an enterprise system.

One curious thing I noticed is that the ambient configs all have a token to allow you to disable oauth security, yet no tokens for the security settings for the various roles. I wonder if this reflects the way people actually use Tridion.

Of course, you aren't necessarily limited by the tokens in the example configs of the shipped product. Are customers defining their own as they need them?

That's probably enough about this subject, though, isn't it?

Removing the replacement tokens from Tridion configuration files, and choosing not to

In SDL Web 8.5 we saw the introduction of replacement tokens in the content delivery configuration files. Whereas previously we'd simply had XML files with attributes and elements that we filled in with the relevant values, the replacement tokens allowed for the values to be provided externally when the configuration file is used. The commonest way to do this is probably by using environment variables, but you can also pass them as arguments to the java runtime. (A while ago, I wasted a bunch of time writing a script to pass environment variables in via java arguments. You don't need to do this.)

So anyway - taking the deployer config as my example, we started to see this kind of thing:

<Queues>
 <Queue Id="ContentQueue" Adapter="FileSystem" Verbs="Content" Default="true">
  <Property Value="${queuePath}" Name="Destination"/>
 </Queue>
 <Queue Id="CommitQueue" Adapter="FileSystem" Verbs="Commit,Rollback">
  <Property Value="${queuePath}/FinalTX" Name="Destination"/>
 </Queue>
 <Queue Id="PrepareQueue" Adapter="FileSystem" Verbs="Prepare">
  <Property Value="${queuePath}/Prepare" Name="Destination"/>
 </Queue>

Or from the storage conf of the disco service:

<Role Name="TokenServiceCapability" Url="${tokenurl:-http://localhost:8082/token.svc}">

So if you have an environment variable called queuePath, it will be used instead of ${queuePath}. In the second example, you can see that there's also a syntax for providing a default value, so if there's a tokenurl environment variable, that will be used, and if not, you'll get http://localhost:8082/token.svc.

This kind of replacement token is very common in the *nix world, where it's taken to even further extremes. Most of this is based on the venerable Shell Parameter Expansion syntax.

All this is great for automated deployments and I'm sure the team running SDL's cloud services makes full use of this technique. For now, I'm still using my own scripts to replace values in the config files, so a recent addition turned out to be a bit inconvenient. In Tridion Sites 9, the queue Ids in the deployer config have also been tokenised. So now we have this kind of thing:

<Queue Default="true" Verbs="Content" Adapter="FileSystem" Id="${contentqueuename:-ContentQueue}">
  <Property Name="Destination" Value="${queuePath}"/>
</Queue>

Seeing as I had an XPath that locates the Queue elements by ID, this wasn't too helpful. (Yes, yes, in the general case it's great, but I'm thinking purely selfishly!) Shooting from the hip I quickly updated my script with an awesome regex :-) , so instead of

$config = [xml](gc $deployerConfig)

I had

$config = [xml]((gc $deployerConfig) -replace '\$\{(?:.*?):-(.*?)\}','$1')

About ten seconds after finishing this, I realised that what I should be doing instead is fixing my XPath to glom onto the Verbs attribute instead, but you can't just throw away a good regex. So - I present to you, this beautiful regex for converting shell parameter expansions (or whatever they are called) into their default values while using the PowerShell. In other words, ${contentqueuename:-ContentQueue} becomes ContentQueue.

How does it work? Here it comes, one piece at a time:

'            single quote. Otherwise Powershell will interpret characters like $ and {, which you don't want
\            a slash to escape the dollar from the regex
$            the opening dollar of the expansion espression
\{           match the {, also escaped from the regex
(?:.*?)      match zero or more of anything, non-greedily, and without capturing
:- match the :-
(.*?) match zero or more of anything non-greedily. No ?: this time so it's captured for use later as $1
\} match the }
' single quote

 The second parameter of -replace is '$1', which translates to "the first capture". Note the single quotes, for the same reason as before

So there you have it. Now if ever you need to rip through a bunch of config files and blindly accept the defaults for everything, you know how. But meh... you could also just not provide any values in the environment. I refuse to accept that this hack is useless. A reason will emerge. The universe abhors a scripting hack with no purpose.

Stripping namespace declarations from XML

Posted by Dominic Cronin at Nov 19, 2017 12:30 PM |

I've recently been working on an application that will allow members of our content management teams to search within a chosen folder in Tridion for specific content. You might think that's well enough covered by the built-in search functionality, but we're heading towards a search and replace feature, so we pretty much have to process the content ourselves. In the end users' view of the world, a Rich Text field in a component has... well...  a rich text view, and, for the power-users, a Source tab where you can see the underlying HTML. That's all fine, but once you get to the technical implementation, it's a bit more complicated, and we'll end up replicating some of Tridion's own smoke and mirrors to present a view to the users that's consistent with what they are used to. This means not only that we need to be able to translate from text to HTML, but also from "XML in the XHTML namespace" to HTML. One of the bulding blocks we need to do this is the ability to take XML with namespace declarations, and get rid of them so that the result isn't in a namespace. 

A purist (such as myself) might say that the only correct way to parse XML is with an XML parser, and just in case you've never ended up there, I heartily recommend that you read this answer on Stack Exchange before proceding further. Still - in this case, what I want to do is amenable to RegExes, and yes, I know: now I have two problems. Anyway - FWIW - I started this at the office, thinking I'd just quickly Google for a namespace-stripping regex and I'd be on my way. Suffice it to say that the Internet is rubbish at this. I ended up with a page of links to rubbish regexes that just weren't going to float my boat. So I mailed the problem to myself at home, and today, in the quiet of a Sunday morning, it didn't seem quite so daunting. Actually, I'm still considering whether an XML-parser approach, or an XSLT might not be better, and I may end up there if my needs turn out to be more complex, but for now, here's the namespace stripper. 

static Regex namespaceRegex = new Regex(@"    
xmlns # literal (:[^\s=]+)? # : followed by one or more non-whitespace, non-equals chars \s* # optional whitespace = # literal \s* # optional whitespace (?<quote>['""]) # Either a single or double quote - giving it the name 'quote' for back-reference .+? # Non-greedily match anything \k<quote> # The end-quote to match the one we found earlier ", RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
public static string RemoveNamespacesFromDocument(string xml) { return namespaceRegex.Replace(xml, string.Empty); }

Of course, this is written in C#, and I'm taking advantage of the IgnorePatternWhitespace feature in .NET regexes, which allows for the copious comments that might well be necessary if I ever have to actually read this code instead of just writing it. 

But just in case you are hardcore, and all that named matches and commenting fuss is for wusses, here's the TL;DR...

@"(?is)xmlns(:[^\s=]+)?\s*=\s*(['""]).+?\2"

What's not to like? :-)