Dominic Cronin's weblog

Building a DXA module in Java

Posted by Dominic Cronin at Jun 15, 2018 01:37 PM |

I'm currently trying to get a bit of practice in working with DXA 2.0 in Java. Some months ago I did SDL's DXA course, which gets you in the quickest possible way to a working DXA implementation. You have to follow up by filling in the details, and today was the first time I'd tried to create a module by actually following the instructions in the documentation. 

I was looking at the documentation page for Building a Java module with Maven POM, and my first attempt was simply to copy the POM from the documentation. Although it seemed like a good idea at the time, pretty soon I was staring at a  nasty-looking error: 

Project build error: Non-resolvable parent POM for dxa-modules:module-one:[unknown-version]: 
Could not find artifact com.sdl.dxa.modules:dxa-modules:pom:1.2-SNAPSHOT and 
'parent.relativePath' 
points at wrong local POM pom.xml /module-one line 3 Maven pom Loading Problem

When I say it seemed like a good idea at the time, to tell the truth, I'd already had my doubts when I saw SNAPSHOT, and that indeed turned out to be the problem. In Maven, a SNAPSHOT version is a work-in-progress build, typically only published to the team's own snapshot repository; you wouldn't expect to find it in a public release repository. Java developers seem to spend a good chunk of their working hours Googling for the correct versions of the external dependencies they need from various repositories. That's the great thing about Maven; once you get the versions right, everything works by magic and you can go and have a cup of tea. 

So - like thousands before me, I duly Googled, and ended up on a page that told me I could use version 1.3.0.

So I fixed up the POM so that this: 

<parent>
    <groupId>com.sdl.dxa.modules</groupId>
    <artifactId>dxa-modules</artifactId>
    <version>1.2-SNAPSHOT</version>
</parent>

looked like this: 

<parent>
    <groupId>com.sdl.dxa.modules</groupId>
    <artifactId>dxa-modules</artifactId>
    <version>1.3.0</version>
</parent>

That's fixed it, so now I can get on with the rest of the job. And sure, this is blindingly obvious if you do a lot of Java, but these little things can slow you down a fair bit. In this case, I'd spent some time obsessing about Maven a while back, so I got there reasonably quickly, but we're not always that lucky! 
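
Incidentally, if you'd rather save yourself the Googling next time, Maven can do the lookup for you. This is just a suggestion, assuming your POM can reach the relevant repositories: the versions-maven-plugin will report newer versions of the parent POM.

mvn versions:display-parent-updates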

Encrypting passwords for Tridion content delivery

Posted by Dominic Cronin at May 10, 2018 05:08 PM |

This is just a quick note to self, because I just spent a few minutes figuring out something fairly trivial and I don't want to forget it.

Previously, to encrypt a password for Tridion content delivery, you would do something like:

java -cp cd_core.jar com.tridion.crypto.Encrypt foobar

It's been a while since I did this, and I hadn't realised that in Web 8.5 it doesn't work any more. They've factored the Crypto class out into a utility jar, so now the equivalent command has become something like:

java -cp cd_core.jar;cd_common_util.jar com.tridion.crypto.Encrypt foobar

Of course, these days the jars also have build numbers in the name, so it's a bit uglier. The point is that you have to have cd_core and cd_common_util on your classpath.
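
If you don't feel like typing out the full versioned jar names, a classpath wildcard will do the job too. Just a sketch, assuming you run the command from the folder where the jars live, so that the wildcard picks up both cd_core and cd_common_util:

java -cp "*" com.tridion.crypto.Encrypt foobar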

 

Getting started with Insomnia as a Tridion content delivery client

Posted by Dominic Cronin at Dec 17, 2017 08:15 PM |

Today I ran across Insomnia, which is a generic development/test client for RESTful HTTP services much along the same lines as Postman. The latter is pretty well established, but it's a paid product, and Insomnia seems at first sight to be more or less a clone, but open source and free. (That said, Postman is free to most people, and Insomnia has paid-for plugins. Everyone's got to eat, right?)

It will hardly be a surprise to the reader that my interest in this is in the context of Tridion's content delivery APIs. To be honest I haven't really spent much time getting to know Postman, preferring to make use of simple Powershell scripts for purposes such as validating that the services are running and that authentication is working. While there's much to be said for a scripted approach, I've always had niggling doubts that perhaps I'd find my way around the data a bit more easily with a GUI client. Coming across Insomnia today is my opportunity to find out whether this is so.

I started by downloading and installing the Windows version (like Postman, it's also available for Linux and Mac). So far, I've got as far as making a simple query against my content service. To do this, you have to figure your way through the somewhat arcane details of getting an OAuth token. The services on my Tridion research server are not secured in any meaningful way, but OAuth is still "switched on". That is to say, I have the out-of-the-box user accounts configured in my discovery service's cd_ambient_conf.xml along with the out-of-the-box passwords. So obviously, don't do this at home, children, but hey - it's my research rig, not a production server. This being the case, I'm not giving much away by sharing the following:

What you can see here is that my Tridion image is running at "sdlweb", so I'm issuing a GET against http://sdlweb:8081/client/v2/content.svc. Insomnia has support for variables, so I imagine you could use one for the hostname if you want to keep your tests generic.

You can also see that I've got the authentication tab open and have selected OAuth2. The first thing you need to do is select Client Credentials for the grant type. With this choice, you only need to fill in the client id and secret. (Obviously these need to match your actual security settings, and of course, you haven't left these at their defaults... right!?) 

The only thing that made me scratch my head for a short moment was that when I tried with just those details, it didn't work, and I got a 400 status back. That's HTTP for "Bad request", so I went into the Advanced settings to see if there was anything I could change to make the server happier about my manners. It turns out that switching Credentials to "In Request Body" is all you need and as you can see, there's a nice green 200 status displaying, and some data from the service.
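
For what it's worth, the same exchange is easy enough to reproduce from PowerShell if you prefer a script. This is only a sketch: I'm assuming the token service is the discovery service's /token.svc on port 8082, and the client id and secret below are placeholders for whatever is configured in your cd_ambient_conf.xml:

# Get a token using the client credentials grant (credentials go in the request body)
$tokenResponse = Invoke-RestMethod -Method Post -Uri 'http://sdlweb:8082/token.svc' -Body @{
    grant_type    = 'client_credentials'
    client_id     = 'your-client-id'
    client_secret = 'your-client-secret'
}
# Use the token as a Bearer header against the content service
Invoke-RestMethod -Uri 'http://sdlweb:8081/client/v2/content.svc' -Headers @{ Authorization = "Bearer $($tokenResponse.access_token)" }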

Well that's enough to get me started. Please do let me know about your experiences with Insomnia. Especially if you're a Postman maven, let me know how the two stack up against each other.

 

Stripping namespace declarations from XML

Posted by Dominic Cronin at Nov 19, 2017 12:30 PM |

I've recently been working on an application that will allow members of our content management teams to search within a chosen folder in Tridion for specific content. You might think that's well enough covered by the built-in search functionality, but we're heading towards a search and replace feature, so we pretty much have to process the content ourselves. In the end users' view of the world, a Rich Text field in a component has... well... a rich text view, and, for the power-users, a Source tab where you can see the underlying HTML. That's all fine, but once you get to the technical implementation, it's a bit more complicated, and we'll end up replicating some of Tridion's own smoke and mirrors to present a view to the users that's consistent with what they are used to. This means not only that we need to be able to translate from text to HTML, but also from "XML in the XHTML namespace" to HTML. One of the building blocks we need to do this is the ability to take XML with namespace declarations, and get rid of them so that the result isn't in a namespace. 

A purist (such as myself) might say that the only correct way to parse XML is with an XML parser, and just in case you've never ended up there, I heartily recommend that you read this answer on Stack Exchange before proceeding further. Still - in this case, what I want to do is amenable to RegExes, and yes, I know: now I have two problems. Anyway - FWIW - I started this at the office, thinking I'd just quickly Google for a namespace-stripping regex and I'd be on my way. Suffice it to say that the Internet is rubbish at this. I ended up with a page of links to rubbish regexes that just weren't going to float my boat. So I mailed the problem to myself at home, and today, in the quiet of a Sunday morning, it didn't seem quite so daunting. Actually, I'm still considering whether an XML-parser approach, or an XSLT might not be better, and I may end up there if my needs turn out to be more complex, but for now, here's the namespace stripper. 

static Regex namespaceRegex = new Regex(@"
    xmlns              # literal
    (:[^\s=]+)?        # : followed by one or more non-whitespace, non-equals chars
    \s*                # optional whitespace
    =                  # literal
    \s*                # optional whitespace
    (?<quote>['""])    # Either a single or double quote - giving it the name 'quote' for back-reference
    .+?                # Non-greedily match anything
    \k<quote>          # The end-quote to match the one we found earlier
    ", RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

public static string RemoveNamespacesFromDocument(string xml)
{
    return namespaceRegex.Replace(xml, string.Empty);
}

Of course, this is written in C#, and I'm taking advantage of the IgnorePatternWhitespace feature in .NET regexes, which allows for the copious comments that might well be necessary if I ever have to actually read this code instead of just writing it. 

But just in case you are hardcore, and all that named matches and commenting fuss is for wusses, here's the TL;DR...

@"(?is)xmlns(:[^\s=]+)?\s*=\s*(['""]).+?\2"

What's not to like? :-) 
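
And if you want to give the one-liner a quick sanity check from PowerShell, something like this will do. It's a throwaway sketch with a made-up input document, nothing more:

$xml = '<root xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink"><p>Hello</p></root>'
$xml -replace '(?is)xmlns(:[^\s=]+)?\s*=\s*([''"]).+?\2', ''
# <root  ><p>Hello</p></root>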

Tridion MVP retreat 2017

Posted by Dominic Cronin at Oct 22, 2017 12:06 PM |

It's become a regular feature of my year: the Tridion MVP retreat. This year I was fortunate enough to be invited again, and as usual it lived up to my expectations. So let me start by saying thank you to SDL for the invitation and hospitality throughout, and particularly to Carla and her team in Portugal for making it all a reality. Thanks also to the Tridion community: the award is firmly rooted there, and none of us would be there but for the inspiration that comes from helping each other and being helped the whole year through. 

Others have blogged about the technical wonders we produced at the retreat: web frameworks, diagnostic tools, scripting libraries, Tridion extensions and other kinds of voodoo. It always amazes me how much technical goodness comes out of the retreat, and this year was no exception. OK - so often enough, things don't get finished while we're still in Portugal, but they usually get finished. The great thing is getting all these initiatives started. I worked in a team with Jonathan Williams, Rick Pannekoek, and Siawash Sibani, trying to demystify some of the magic underlying the Experience Manager. We tried to figure out what the challenging questions are for implementers, and to get some solid answers for those. (Speaking of demystifying - special thanks to Rick for the extra time he spent helping me to get a much better understanding of DXA.) 

So what's so great about getting to be an MVP and going to the retreat? To be honest, it's hard to put your finger on any one thing. I could mention the great hospitality, and the fact that somehow I managed to put on two and a half kilograms in the four days of the retreat. What can you do? They keep taking you to great restaurants. It's become our tradition that every night, not only do we talk into the wee small hours, but we also make music. I could talk about the cultural visits (like to the catholic shrine at Fatima) or the spectacular wonders of nature (like the boat trip at Nazaré - famous for the highest wave ever surfed). 

Somehow, all of these things are great, and I enjoyed them all to the full, but still none of them are the defining feature of the retreat. Someone once said that if you're the smartest person in the room, you're in the wrong room. One thing is certain about the MVP retreat, and that is that you aren't going to be the smartest person in the room. Don't get me wrong, MVPs aren't selected for being smart, but somehow, they manage to be an inspiring group. The funny thing is, that talking to the guys - every single one of us felt that we were privileged to be surrounded by a bunch of people that would challenge us and bring us new insights. OK - maybe we all suffer from the impostor syndrome, but it's also true that each of us brings something different to the party. 

One thing I've noticed at previous retreats, and this time it was no different, is the way that the conversation can run from general chat about the state of the universe, to stupid jokes, to shared experiences from our working lives, and then without dropping a beat, you'll suddenly see bizarrely deep technical discussions break out like wildfire. In this company, all these things have equal value, and that is a special thing. 

For this reason, the image I've chosen to accompany this blog post is not of the surf at Nazaré or the castle at Ourém but of a moment late at night, when the subject turned to JavaScript, and I suddenly realised that our resident web guru Frank Taylor had embarked on enlightening a small group about the joys of type coercion in that language. Don't ask me why, but this kind of thing breaks out spontaneously. If it wasn't JavaScript it would have been content deployment architecture or something else. You can't predict what's going to come up. I hope I'm there to see what it will be next time. 

New recipes at Tridion Practice

Posted by Dominic Cronin at Oct 17, 2017 08:01 PM |

It's been a while since we had a new recipe at Tridion Practice. Just to shake things up a bit, two in the same day!

Firstly a quick "throwaway" script, which might help you if you're trying to squeeze too many microservices into too small a memory footprint. 

And if you're in the market for something a bit more substantial, how about some provisioning scripts to help you get your Content Delivery microservices up and running?

Don't forget: Tridion Practice is a community site. If you have anything to contribute, please get in touch. 

deployer-conf.xml barfs on the BOM

Today I was working on some scripts to provision, among other things, the SDL Web deployer service. It should have been straightforward enough, I thought. Just copy the relevant directory and fix up a couple of configuration files. Well I got that far, at least, but my deployer service wouldn't start. I looked in the logs and found this:  

2017-09-16 19:20:21,907 ERROR NonLegacyConfigConditional - The operation could not be performed.
com.sdl.delivery.configuration.ConfigurationException: Could not load legacy configuration
at com.sdl.delivery.deployer.configuration.DeployerConfigurationLoader.configure(DeployerConfigurationLoader.java:136)
at com.sdl.delivery.deployer.configuration.folder.NonLegacyConfigConditional.matches(NonLegacyConfigConditional.java:25)

I thought it was going to be a right head-scratcher. Fortunately, a little further down there was something a little more clue-bestowing: 

Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at com.tridion.configuration.XMLConfigurationReader.readConfiguration(XMLConfigurationReader.java:124)

So it was about the XML. It seems that Xerces thought I had content in my prolog. Great! At least, despite its protestations about a legacy configuration, there was a good clear message pointing to my "deployer-conf.xml". So I opened it up, thinking maybe my script had mangled something, but it all looked great. Then some subliminal, ancestral memory made me think of the Byte Order Mark. (OK, OK, it was Google, but honestly... the ancestors were there talking to me.) 

I opened up the deployer-conf.xml again, this time in a byte editor, and there  it was, as large as life: 

Three extra bytes (EF BB BF, as it happens) that Xerces thought had no business being there: the Byte Order Mark, or BOM. (I had to check that. I'm more used to a two-byte BOM, but for UTF-8 it's three. And yes - do follow this link for a more in-depth read, especially if you don't know what a BOM is for. All will be revealed.)

What you'll also find if you follow that link is that Xerces is perfectly entitled to take that view, as the BOM is a "non-normative" part of the standard. Great, eh?

Anyway - so how did the BOM get there, and what was the solution? 

My provisioning scripts are written in Windows PowerShell, and I'd chosen to use PowerShell's "native" XML processing, which amounts to System.Xml.XmlDocument. In previous versions of these scripts, I'd used XLinq, but it's not really a good fit with PowerShell as you can't really use XPath without extension methods. So I gave up XLinq's ease of parsing fragments for a return to XmlDocument. To be honest, I wouldn't be surprised if the BOM problem also happens with XLinq: after all, it's Xerces that's being fussy - you could argue Microsoft is playing "by the book".

So what I was doing was this. 

$config = [xml](gc $deployerConfig)

Obviously, $deployerConfig refers to the configuration file, and I'm using Powershell's Get-Content cmdlet to read the file from disk. The [xml] cast automatically loads it into an XmlDocument, represented by the $config variable. I then do various manipulations in the XmlDocument, and eventually I want to write it back to disk. The obvious thing to do is just use the Save() method to write it back to the same location, like this: 

$config.Save($deployerConfig)

 Unfortunately, this gives us the unwanted BOM, so instead we have to explicitly control the encoding, like this: 

$encoding = new-object System.Text.UTF8Encoding $false
$writer = new-object System.IO.StreamWriter($deployerConfig,$false,$encoding)
$config.Save($writer)
$writer.Close()

As you can see, we're still using Save(), but this time with the overload that writes to a stream, and also allows us to pass in an encoding. This seems to work fine, and Xerces doesn't cough its lunch up when you try to start the deployer. 
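
Since I'll doubtless need this again, I've wrapped the same trick in a little function. The name is my own invention and this is just a sketch, but it keeps the intent obvious in the provisioning scripts:

function Save-XmlWithoutBom([xml]$document, [string]$path) {
    # Passing $false to the UTF8Encoding constructor means "no BOM"
    $encoding = New-Object System.Text.UTF8Encoding $false
    $writer = New-Object System.IO.StreamWriter($path, $false, $encoding)
    try { $document.Save($writer) }
    finally { $writer.Close() }
}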

I think it will be increasingly common for people to script their setups. SDL's own "quickinstall" doesn't use an XML parser at all, but simply does string replacements based on its own, presumably hand-made, copies of the configuration files. Still - one of the obvious benefits of having XML configuration files is that you can use XML processing tools to manipulate them, so I hope future versions of the content delivery microservices will be more robust in this respect. Until then, here's the workaround. As usual - any feedback or alternative approaches are welcome. 

Finding the powershell profiles you actually have

Posted by Dominic Cronin at Sep 09, 2017 08:28 AM |

Many of you Powershell aficionados out there will be familiar with the fact that there are four separate locations where you can place a profile script: the four combinations of CurrentUser/AllUsers and CurrentHost/AllHosts. These scripts will run when you start the shell, and that allows you to get some default stuff set up.

Today I got irritated with the fact that I can never find which profile I've put something in. It starts with a vague recollection of "didn't I have something in my profile for that?". Then I start by opening a shell and typing:

notepad $profile

... and thereby opening up my $profile.CurrentUserCurrentHost - which to be fair is where I put most stuff. Not there eh? Ok, let's go looking for the other profiles. So I type:

notepad $profile.<TAB><TAB>

and end up at

notepad $profile.AllUsersAllHosts

Then notepad tells me that this one doesn't exist, so I end up going through the same steps for the other two profiles. Especially on a system where they aren't there, it's just irritating. So I put this in my profile (yes, the CurrentUserCurrentHost one, but actually AllHosts would be better, eh?):

function get-profiles {
    $profile.PSobject.Properties | ? {Test-Path $_.Value} | select Name, Value
}
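
Now a quick call tells me exactly which profiles exist on the current box. On my machine the output looks something like this - the path shown is just the usual Windows PowerShell default, so yours will differ:

get-profiles

Name                   Value
----                   -----
CurrentUserCurrentHost C:\Users\Dominic\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1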

Now all I have to do is remember that I put it there.

Connecting to Microsoft SQL Server Developer from Tridion Content Delivery

I've recently been setting up a development image for SDL Web 8.5, and as it's only for use on my development rig, it's fair game to use Microsoft SQL Server Developer edition. It's not supported by SDL, but it's close enough to make it a reasonable risk for my purposes. I got the databases set up and the content manager installed OK, so I moved on to the content delivery stack. 

First I hacked together a database test script to make sure I had all the logins correct etc. I've done it this way for years, and you may have seen my blog about it quite a long time ago.  Everything seemed fine. 

I'd started with the Discovery service, and I'd configured the cd_storage_conf.xml with the relevant database settings I'd just tested. How hard could it be? Except that it didn't work. I got messages in the logs telling me to check my firewall. Doh! Off I went and opened up the firewall ports for my microservices (which I'd forgotten to do) and also 1433 for MSSQL. Still no joy. 

Somewhere along the way I'd also disabled loopback checking and double-checked a bunch of other things that can cause trouble. No joy. 

I went back to my database test script a few times. It uses a System.Data.SqlClient.SqlConnection to execute a simple command. The connection string specifies '(local)' as the server. I'd had trouble with using '(local)' in the cd_storage_conf.xml in a previous version of Tridion, so I had specified 'localhost' instead, and then when that didn't work, a different name that mapped to the same interface. Still nothing. 
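
For reference, the test script boils down to something like the following. It's only a sketch: the database name, user and password are placeholders, not the real ones.

$connectionString = "Server=(local);Database=Tridion_Broker;User Id=TridionBrokerUser;Password=secret;"
$connection = New-Object System.Data.SqlClient.SqlConnection $connectionString
$connection.Open()
$command = $connection.CreateCommand()
# Any trivial command will do - the point is just to prove the login works
$command.CommandText = "SELECT @@VERSION"
$command.ExecuteScalar()
$connection.Close()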

The troubling thing was that the test script worked fine. Why was that, when Tridion's java stack had trouble doing the same thing? I should have cottoned on to this way earlier, but eventually I started checking to see if there was actually anything listening on 1433. No there wasn't. Well that helped. And then I started poking around in the network configuration of SQL Server. Sure enough: TCP/IP wasn't enabled. I'm still not sure if this is a Developer edition thing. I seem to recall having come across it before. I'm not the only one. Now that I know the answer, finding a suitable Stack Overflow answer is easy! Maybe I'd had trouble with SQLEXPRESS. 
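
If you'd rather fix that from a script than click through SQL Server Configuration Manager, the SMO WMI classes can do it. This is a sketch that assumes the SQL Server management assemblies are installed and that you're running the default instance; don't forget the restart afterwards:

[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SqlWmiManagement') | Out-Null
$wmi = New-Object Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer
$tcp = $wmi.ServerInstances['MSSQLSERVER'].ServerProtocols['Tcp']
$tcp.IsEnabled = $true
$tcp.Alter()
Restart-Service -Name 'MSSQLSERVER'    # TCP/IP is only picked up after a restart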

Anyway, at least that explained why my test script worked OK. The SqlConnection client sees '(local)' and is then able to attempt a named pipes or shared memory connection as well as TCP/IP. The Java client, on the other hand, doesn't have this repertoire of options, and if TCP/IP fails, it's over.

Anyway - now it's fixed. Just time for a quick Note To Self, and on with the rest of my system. 

Character encodings and the SDL Web 8 deployer - a journey through double-encoded UTF-8

Posted by Dominic Cronin at Aug 23, 2017 09:45 PM |

I spent some time yesterday and today working with a colleague to resolve an encoding issue in our new SDL Web 8.5 publishing systems. It's a migration from an older Tridion implementation that manages several portals, including a very old one in which the default encoding is ISO-8859-1. 

For various historical reasons, even for the portals which use UTF-8, the code page has always been set explicitly in the template, using something like setCodePage(1252) or setCodePage(65001) in the vbScript of the page template. (The pedantic among you may have noted that code page 1252 is not the same as ISO-8859-1, and even though some of the characters we were having trouble with were, indeed, quotation marks in the control codes range, I'm going to let that particular distinction slide for the purpose of this blog post. An exercise for the student, as they used to say... ) 

So most of the sites are in UTF-8, and had setCodePage(65001) in the templates. These worked fine with the out-of-the-box installation of the deployer service. Even the gnarliest of funky characters were transmitted faithfully from end to end. The trouble was with the old site that had code page 1252. On this site, any vaguely interesting characters were incorrectly displayed. OK - this might not have been too much of a surprise. 

In SDL Web 8, publication targets have been replaced as part of the move to the new "Topology Manager"-based architecture. So where we'd previously had the option to specify a default encoding on a publication target, now the matching configuration had moved to the deployer. (Or at least to the CD environment - strictly it's a Deployer Capability which is exposed by the Discovery service.) The general assumption seems to be that all sites sharing a deployer will also share an encoding. It's not actually so daft an idea. Most sites these days just use UTF-8 and have done with it. Even if you really, really, really want to have sites with different encodings, well you could always run up another environment, couldn't you? Microservices FTW!

By the time we'd come to this understanding, my colleague had already spent quite some time experimenting with different settings. We'd ended up being able to show that we could get one or the other working, but not both at the same time. We didn't want to set up extra CD environments throughout the DTAP, so the obvious approach was to fix up the old site to use UTF-8. What's not to like? In the beginning I hadn't realised that the old site also used setCodePage(1252) - it was buried pretty deep. So my first approach was simply to get into the templating and fix up the JSP page directive so that we were sending the right contentType header, and specifying pageEncoding="UTF-8". However... no joy.. we still had bad characters, so I then dug deep enough to find the relevant routine. I duly changed it to setCodePage(65001) and smugly headed off to get a cup of coffee while it all published. 

By the time we had some published output to look at, we realised, that the "interesting" characters were now double-encoded UTF-8. (You can usually tell this just by looking. You tend to see pairs of characters, the first of which is often an accented A, like å or Ã.) So what was happening?

TL;DR

  1. It turns out that even in Web 8, the renderer is capable of creating transport packages in a variety of encodings. If you specify 1252 programmatically in the template, the page in the zip file will be encoded with that encoding. Likewise for 65001/UTF-8. Not only will the renderer use the specified encoding, but it will tell the truth about this when it writes the <codepage> element in the pages.xml file. 
  2. With neither a publication target nor a programmatically specified code page, UTF-8 will be used in the transport package. 
  3. No further encoding will take place until the package reaches the deployer and is unzipped. 
  4. When reading the newly received page, the deployer will use the current default encoding of its JVM. If you don't specify this, the default will be the default encoding of your operating system. On Windows, usually code page 1252, and on Linux usually UTF-8. (Obviously, this means it's ignoring the information about encoding that's embedded in the deployment package. You could argue that this might be a bug.)
  5. The installation scripts for the deployer configure the service to pass various arguments to the JVM on startup, including "-Dfile.encoding=UTF-8". This matches the assumption that you have no publication target and the incoming encoding is therefore UTF-8. 
  6. In our case, we left the Deployer Capability setting at UTF-8. 

 

The reason we had seen double-encoded UTF-8 was that after the various experimentation, we no longer had the -Dfile.encoding=UTF-8 parameter controlling the JVM startup. Without this, when we were successfully sending UTF-8 in the deployment package, it was being read in as cp1252, and then dutifully re-encoded to the encoding specified in the Deployer Capability registration: UTF-8. 
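
If you want to see that read-as-1252-then-re-encode effect for yourself, a couple of lines of PowerShell will reproduce it. This is just an illustration of the mechanism, nothing Tridion-specific:

# Take a UTF-8 encoded string and read the bytes back as though they were code page 1252
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes("café")
[System.Text.Encoding]::GetEncoding(1252).GetString($utf8Bytes)
# cafÃ©  <- and it's this mangled string that then gets faithfully re-encoded as UTF-8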

Without this setting, at one point we had also successfully used cp1252, with the output rendered correctly as UTF-8. 

Once we'd figured it all out, we got the whole thing working with all sites running UTF-8. This is almost certainly better than having to worry about a variety of different settings in your infrastructure.

As with any investigation of encodings, a byte-editor is your friend, and plenty of patience to look carefully at what you're seeing. In the end, you'll get there!