Dominic Cronin's weblog — Dominic Cronin's web site

XML Namespaces aren't mandatory, and tools shouldn't assume that they are.

Posted by Dominic Cronin at Jan 02, 2010 10:40 PM | Permalink

Filed under: XML, SDL Tridion, Tips

In his recent blog posting on XML Namespaces, James Clark questions the universal goodness of namespaces. Of course, there is plenty of goodness there, but he's right to question it. He says the following:

For XML, what is done is done. As far as I can tell, there is zero interest amongst major vendors in cleaning up or simplifying XML. I have only two small suggestions, one for XML language designers and one for XML tool vendors:

For XML language designers, think whether it is really necessary to use XML Namespaces. Don’t just mindlessly stick everything in a namespace because everybody else does. Using namespaces is not without cost. There is no inherent virtue in forcing users to stick xmlns=”…” on the document element.
For XML vendors, make sure your tool has good support for documents that don’t use namespaces. For example, don’t make the namespace URI be the only way to automatically find a schema for a document

It's the second point that interests me. During a recent Tridion project, there was a requirement to accept data from an external source as an XML document. I wanted to use a Tridion component to store this data, as this would give me the benefits of XML Schema validation, and controlled publishing. The document didn't have a namespace, although it had a schema. In order to get this to work with Tridion, I had to go to the provider of the document, and get them to add a namespace. Tridion wouldn't allow me to create a schema whose target namespace was empty. It seemed a shame that even when hand-editing the schema (so presumably asserting that I knew what I was about) the system wouldn't let me make this choice.

At the time, I just got the other party to make the change, and went back to more important things. Maybe there's some internal constraint in the way Tridion works that prevents them from supporting this, or maybe it's such an edge case that no-one was ever bothered by it. If the former, then I can't think what the problem would be; there's no reason to abuse the namespace to locate the schema. Tridion's quite happy enough to allow several schemas targeting the same namespace, so what's so special about the "no" namespace? In Tridion components, XML attributes (quite correctly) are in no namespace, but as long as the correct schema gets used for validation, so what?

I suspect it's more likely that this just comes under the "edge case" heading, in which case, perhaps they can improve it in a future release.
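The remark about attributes being in no namespace can be checked with a few lines of PowerShell. This is only a sketch: the element names and the namespace URI are made up for illustration and are not taken from a real Tridion component.

```powershell
# Hypothetical component-like XML; the namespace URI and names are invented.
[xml]$doc = '<Content xmlns="urn:example:content" Title="Hello"><Body>text</Body></Content>'
$root = $doc.DocumentElement

# The element picks up the default namespace declared on it...
$elementNs = $root.NamespaceURI
# ...but an unprefixed attribute is in no namespace at all.
$attributeNs = $root.GetAttributeNode('Title').NamespaceURI

"element namespace:   '$elementNs'"     # 'urn:example:content'
"attribute namespace: '$attributeNs'"   # ''
```

So the parser itself is perfectly comfortable with names that live in no namespace, even in a document that declares one.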

Dilbert finder

Posted by Dominic Cronin at Jan 01, 2010 04:45 PM | Permalink

A long time ago, I worked in a job where the boss was "pointy-haired", and we habitually referred to some of the manifestations of his pointy-hairedness as "mauve". Why was this? Well, it all came about as a reference to a specific Dilbert cartoon. Today I followed a link from Raymond Chen and ended up at the Dilbert strip finder. I typed in database (why not mauve?), and the second hit was the cartoon that explains it all. What's not to like?

Down for everyone, or just me?

Posted by Dominic Cronin at Dec 12, 2009 11:15 PM | Permalink

I saw this mentioned in passing tonight in #gentoo. It's a simple enough idea, but it makes one particular task very easy. When you're stuck behind layers of networking devices, sometimes it's hard to know whether your site is working for people outside your firewalls etc.

http://downforeveryoneorjustme.com/www.dominic.cronin.nl

XML Schema validation from Powershell - and how to keep your Tridion content delivery system neat and tidy

Posted by Dominic Cronin at Dec 12, 2009 10:55 PM | Permalink

Filed under: XML, SDL Tridion, Powershell, Tridion

I don't know exactly when it was that Tridion started shipping the XML Schema files for the content delivery configuration files. For what it's worth, I only really became aware of it within the last few months. In that short time, schema validation has saved my ass at least twice when configuring a Tridion Content Delivery system. What's not to like? Never mind "What's not to like?" - I'll go further. Now that the guys over at Tridion have gone to the trouble of including these files as release assets - it is positively rude of you not to validate your config files.

Being a well-mannered kind of guy, I figured that I'd like to validate my configuration files not just once, but repeatedly. All the time, in fact. Whenever I make a change. The trouble is that the typical server where you find these things isn't loaded down with tools like XML Spy. The last time I validated a config file, it involved copying the offending article over to a file share, and then emailing it to myself on another machine. Not good. Not easy. Not very repeatable.

But enter our new hero, Windows 2008 Server - these days the deployment platform of choice if you want to run Tridion Content Delivery on a Windows box. And fully loaded for bear. At least the kind of bears you can hunt using powershell. Now that I can just reach out with powershell and grab useful bits of the .NET framework, I don't have any excuse any more, or anywhere to hide, so this afternoon, I set to work hacking up something to validate my configuration files. Well - of course, it could be any XML file. Maybe other people will find it useful too.

So to start with - I thought - just do the simplest thing. I needed to associate the xml files with their relevant schemas, and of course, I could have simply done that in the script, but then what if people move things around etc., so I decided that I would put the schemas in a directory on the server, and use XMLSchema-instance attributes to identify which schema belongs with each file.

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema.xsd"

OK - so I'd have to edit each of the half-dozen or so configuration files, but that's a one-off job, so not much trouble. The .NET framework's XmlReader can detect this, and use it to locate the correct schema. (Although if it isn't correctly specified, you won't see any validation errors even if the file is incorrect. I hope to fix that in a later version of my script.)

I created a function in powershell, like this:

# So far this silently fails to catch any problems if the schema locations aren't set up properly
# needs more work I suppose. Until then it can still deliver value if set up correctly
function ValidateXmlFile {
    param ([string]$xmlFile       = $(read-host "Please specify the path to the Xml file"))
    "==============================================================="
    "Validating $xmlFile using the schema locations specified in it"
    "==============================================================="
    $settings = new-object System.Xml.XmlReaderSettings
    $settings.ValidationType = [System.Xml.ValidationType]::Schema
    # Follow the xsi:schemaLocation / xsi:noNamespaceSchemaLocation hints in the document
    $settings.ValidationFlags = $settings.ValidationFlags `
            -bor [System.Xml.Schema.XmlSchemaValidationFlags]::ProcessSchemaLocation
    $handler = [System.Xml.Schema.ValidationEventHandler] {
        # switch rebinds $_, so keep a copy of the ValidationEventArgs.
        # (Use $e rather than overwriting the automatic variable $args.)
        $e = $_
        switch ($e.Severity) {
            Error {
                # Exception is an XmlSchemaException
                Write-Host "ERROR: line $($e.Exception.LineNumber)" -nonewline
                Write-Host " position $($e.Exception.LinePosition)"
                Write-Host $e.Message
                break
            }
            Warning {
                # So far, everything that has caused the handler to fire, has caused an Error...
                Write-Host "WARNING: Check that the schema location references are joined up properly."
                break
            }
        }
    }
    $settings.add_ValidationEventHandler($handler)
    $reader = [System.Xml.XmlReader]::Create($xmlFile, $settings)
    # Reading the whole document drives the validation; the handler reports any problems
    while($reader.Read()){}
    $reader.Close()
}

With this function in place, all I have to do is have a list of lines like the following:

ValidateXmlFile "C:\Program Files\Tridion\config\cd_instances_conf.xml"
ValidateXmlFile "C:\Program Files\Tridion\config\live\cd_broker_conf.xml"

If I've made typos or whatever, I'll pretty soon find them, and this can easily save hours. My favourite mistake is typing the attributes in lower case. Typically in these config files, attributes begin with a capital letter. Once you've made a mistake like that, trust me, no amount of staring at the code will make it obvious. You can stare straight at it and not see it.

So there you have it - as always - comments or improvements are always welcome, particularly if anyone knows how to get the warnings to show up!
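One possible answer to the warnings question, offered as a sketch rather than something tested against a Tridion setup: XmlReaderSettings only reports warnings when the ReportValidationWarnings flag is set, and a schema that can't be found counts as a warning rather than an error. The function name below is hypothetical; the rest uses the same .NET calls as the function above.

```powershell
# Hypothetical variant of ValidateXmlFile that also surfaces warnings.
function ValidateXmlFileWithWarnings {
    param ([string]$xmlFile = $(read-host "Please specify the path to the Xml file"))

    $settings = new-object System.Xml.XmlReaderSettings
    $settings.ValidationType = [System.Xml.ValidationType]::Schema
    # ProcessSchemaLocation: follow the xsi schema location hints in the file.
    # ReportValidationWarnings: raise warnings too, e.g. when no schema is found.
    $settings.ValidationFlags = $settings.ValidationFlags `
            -bor [System.Xml.Schema.XmlSchemaValidationFlags]::ProcessSchemaLocation `
            -bor [System.Xml.Schema.XmlSchemaValidationFlags]::ReportValidationWarnings

    $handler = [System.Xml.Schema.ValidationEventHandler] {
        # $_ is the ValidationEventArgs; Severity is either Error or Warning
        Write-Host "$($_.Severity): line $($_.Exception.LineNumber) position $($_.Exception.LinePosition)"
        Write-Host $_.Message
    }
    $settings.add_ValidationEventHandler($handler)

    $reader = [System.Xml.XmlReader]::Create($xmlFile, $settings)
    # Reading to the end of the document triggers the validation events
    while ($reader.Read()) {}
    $reader.Close()
}
```

With the extra flag, a file whose schema location is missing or mistyped should produce warnings along the lines of "Could not find schema information for the element ..." instead of passing silently.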

SDL Tridion's YouTube marketing

Posted by Dominic Cronin at Dec 10, 2009 10:10 PM | Permalink

Filed under: SDL Tridion, Tridion

In recent times, SDL Tridion have been putting out marketing videos on their very own YouTube channel. Each one takes a fictitious (or perhaps anonymised) use-case for their products and presents a fairly non-technical view of it by means of animated graphics, with voice-overs in disconcertingly generic varieties of British English. Never mind British, it's English English, of a sort of carefully not-too-posh-but-not-too-common-either, home-counties kind.

It's a bit strange for me. Obviously, I'm not the target audience. The videos are aimed at non-technical people who are not familiar with Tridion. Over the years I've got to know Tridion pretty well, both as a company and as a product suite. From so close up, the videos have an almost surreal aura. The multitudinous mismatches between the marketeers' in-world reality, and my own, leap out at me.

Why British English, when it's a Dutch company, whose biggest growth market is the US? Why this particularly strange variant of British English? (OK - full disclosure here; I'm a Geordie!) Is this the SDL influence, or does this come from US marketeers?

And are the examples real? I'm curious.

And so on....

As for the content, at least the most recent one (SDL Tridion WCM Platform for Syncronizing Online Content) had me jumping up and down in a couple of places. Like this: "... I can always revert back to previous versions of any content. It is important for our compliance department that we have proof of how a web site looked at any date in the past". Talk about a confusing message. Firstly, normal versioning in Tridion is not intended for that, and won't meet that need. They have a separate product called Archive Manager, which could be used to implement a solution for this problem.

Of course, if their marketing department were to put out a message aimed at someone like me, they would almost certainly deserve to be fired on the spot. It just hits my weirdness button. Know what I mean?

9 Troubleshooting questions

Posted by Dominic Cronin at Sep 09, 2009 09:05 PM | Permalink

Some questions from Tess Ferrandez to help when troubleshooting complex problems.

Tess Ferrandez' blog (If broken it is, fix it you should) usually contains very technical content. Today she has posted a troubleshooting guide that will be of use to anyone who has to deal with complex problems. It comes in the form of 9 questions, with full details and examples to explain each question. For the impatient, here's the list, but I heartily recommend reading the entire post.

1. What is happening?
2. What did you expect to happen?
3. When is it happening?
4. When did it start happening?
5. How does this problem affect you?
6. What do you think the problem is? and what data are you basing this on?
7. What have you tried so far?
8. What is the expected resolution?
9. Is there anything that would prohibit certain troubleshooting steps or solutions?

The wrong level of abstraction, YAGNI, LOLA, and the fallacy of re-use.

Posted by Dominic Cronin at Jul 13, 2009 05:55 PM | Permalink

Almost as an aside, in his recent post "And get rid of those pesky programmers", Phil Haack spins out a link to Udi Dahan's "The fallacy of ReUse". Coupled with the fact that I'd just read Jeff Atwood's "The Wrong Level of Abstraction" and that this subject is a hobby horse of mine, well I have to write something, eh?

It's all about judgement, isn't it? I mean - the kind of judgement that you expect people to acquire as they gain experience in software development. When do you re-use something, and when do you just re-do the job? As a web content management (WCM) specialist, my own work often involves creating templates that generate web pages based on content entered into other parts of the system by authors and editors. My weapon of choice in this endeavour is Tridion. As with other WCM tools, the templating facilities provide you with the ability to write out the HTML of the page design, and inject specific items of data where you want. In my (not so?) humble opinion, this is exactly the right level of abstraction for dealing with the specified problem, viz. that of creating a web page.

Some people, given the same task and tools, will immediately factor every single thing into a reusable library of functions. I don't do this, and sometimes a new colleague will be surprised at this, or even say they think it's wrong. My first response is usually to tell them that as and when there is a second occasion when I need the same functionality, I will either factor it out to something more abstract, or maybe just wait for the third occasion. As the agilists will have it, You ain't gonna need it (YAGNI). The problem goes further, however. Often I am faced with a choice between writing a few lines of code locally, very closely focussed on the specific output I need, or using/creating some generic function that will give me the output I need, if only I can figure out exactly how to invoke it. Maybe it will almost do what I need. <blink>No problem, I can always tweak it up a bit and add a parameter or two. Right?</blink>

I recently came across a function whose purpose was to create hyperlinks. It would accept about ten parameters, and could produce perhaps 20 variations on the theme of a hyperlink. In order to use this function, you have to open it up and read it, otherwise you don't know how it's going to behave. (I'd defy anyone to thoroughly document the consequences of combining the parameters in different ways.) So a clear and present danger in creating an abstraction is that you create a leaky abstraction. (With more than a nod in the direction of Joel Spolsky's splendid article on the Law of Leaky Abstractions, or LOLA to her friends!)

So wrapping up this trivial stuff into an abstraction just made it non-trivial to work with. Result! Actually - all the function really did was dispatch to (hopefully) the correct code snippet based on the parameters. Having the code snippets locally in the templates would have been much easier to read and less error prone. The API provided by the vendor was already at the right level of abstraction, and further abstracting it was probably a mistake.
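To make the point concrete, here is a deliberately cut-down and entirely hypothetical sketch of that kind of helper, written in PowerShell. The real function, its parameters and its language are not reproduced here; New-Hyperlink and everything it accepts are invented for illustration. It is shown next to the local snippet it would replace.

```powershell
# Hypothetical over-general helper: the output depends on how the flags combine.
function New-Hyperlink {
    param(
        [string]$Url,
        [string]$Text,
        [string]$CssClass,
        [switch]$NewWindow,
        [switch]$TextOnlyIfNoUrl
    )
    if (-not $Url) {
        # With no URL, the result depends on yet another flag.
        if ($TextOnlyIfNoUrl) { return $Text }
        return ''
    }
    $attributes = "href=`"$Url`""
    if ($CssClass)  { $attributes += " class=`"$CssClass`"" }
    if ($NewWindow) { $attributes += ' target="_blank"' }
    return "<a $attributes>$Text</a>"
}

# Via the helper: you have to read the function body to predict the output.
$viaHelper = New-Hyperlink -Url '/news/' -Text 'News' -CssClass 'nav'

# The local alternative states exactly what it emits.
$url = '/news/'
$text = 'News'
$local = "<a href=`"$url`" class=`"nav`">$text</a>"
```

Both produce the same markup here, but only the second one can be understood without leaving the template. Every extra parameter on the helper multiplies the combinations a caller has to keep in their head.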

Of course, it's not as simple as that - otherwise we would never have started down this road in the first place. There is benefit in creating library functions, but mostly it's not about saving you from writing the same three lines of code in a bunch of places. It's often the case that what you want is for any change in the behaviour of your function to be propagated automatically to all the places it's called from. The only trouble is, you can't tell after the fact whether all the people who called your code knew that this was your intention. Udi Dahan notes that the characteristic property of code which you want to reuse is that it is generic. He also notes that re-use introduces dependencies, and that dependencies are what cause all the pain and grief of software maintenance.

Jeff Atwood's problem was the other way round. His choice had been to programme at a lower level of abstraction, when perhaps choosing to use a higher level library written by domain experts would have saved him from mistakes caused by him not understanding the low-level implementation details well enough.

So when you think about moving some code into a library, think about why you're doing it. If your purpose is to capture expertise that your client programmers don't have, you are probably doing a good thing. In this case, your library will probably have a very clear, tightly focussed purpose. There must be no possible ambiguity (or leaks, if you prefer) in the API you expose. When I invoke YAGNI against prematurely factoring code out into libraries, I can hope that this introduces a shakedown period between on the one hand, needing some code, and on the other hand, needing to introduce a layer of abstraction. Without this shakedown period, it's quite unlikely that my decision to abstract will be conscious enough to encompass those questions:

Will my library be generic?
Is it a genuine (watertight?) abstraction, in that a naive programmer can simply call it with the assurance that the domain experts have done their job within?
Am I happy that whatever updates and bugfixes the library gets will be appropriate for all its clients?
Will I have problems updating the library because I don't dare break its clients?

Of course, the "domain expert" might be the programmer who writes the client. Even so, getting the interface right is just as important for your own peace of mind. Then you can afford to forget some of the detail of the inner workings.

Coming back to re-use: we are often faced with managers who assume that re-use is always good. It is up to us as technicians to make the right judgements about what should be re-used, and then if necessary inform and explain. As Udi implies, building for re-use is often synonymous with lock-in to maintenance hell. Sometimes, the right decision is to build disposable code on top of cleanly factored abstractions. The HTML templating I began with is a good example. If the site design changes, it probably makes sense just to throw the old templates away and start again. Because the underlying API gives me clean access to the various fields in the content, I can put together another page template very quickly - probably quicker than figuring out all the dependencies under the old design. Jeff's example of jQuery is another good one. Because the library is so powerful and tightly engineered, it's not going to take you long to simply re-implement your desired behaviour when the design changes.

Re-use is unlikely to be successful unless the code being re-used was explicitly designed with that in mind. Simply factoring some code out to keep it tidy isn't the same thing as building a library. Be careful that you don't accidentally give the impression that your code is intended, designed and implemented for re-use if it isn't. It's OK to write disposable code too, especially if you know the difference between the two.

Server upgrades - portage hell and beyond

Posted by Dominic Cronin at May 31, 2009 08:35 PM | Permalink

Oh what fun you can have once you start to upgrade a previously working server.

So it's now a couple of months ago that I decided to blow the dust off some unfinished business on my Gentoo server. I'd always thought it would be nice to get a desktop running, but it had never been a priority. After all - the machine is used as a server. It sits under the telly and serves up the occasional web page. A desktop is definitely a "nice-to-have" feature rather than an essential one. As usual with such things, it expanded out into a mega-project.

I'd previously tried to get KDE running, and now I'd be installing Gnome, I'd be adding a lot of packages and uninstalling quite some others, so the first order of the day was some general housekeeping and clean-up. Gentoo Linux is characterised by the portage package management and build system. You specify that you want to use a particular package by using the emerge utility, and the relevant code is pulled from the servers, compiled and installed automatically. If the module you've requested needs some other module, that will be built too, and so on - from there it's turtles all the way down.

Portage keeps track of the packages you've specified under the heading of "world". You can sync your system with the latest releases, and then rebuild world. Lots of people do this very regularly, and thereby ensure that they have the latest-greatest versions of everything running on their computers. (Obviously, you can also specify that you want specific versions for some things.) Anyway, I started off in the standard way by running emerge --sync and then emerge --update --deep --newuse. Then I watched the breakage roll in. It had been a couple of years since I'd done this, and I'd definitely never done it after my previous abortive attempts at a desktop, so my system was, frankly, a mess. The main problem is that you get one package that isn't compatible with another. Then you have to fiddle with versions, or just uninstall things just to get it working. The longer you leave it, the bigger a puzzle it is. To put it in perspective, people who use Gentoo for their desktop machine often get it in sync every day. On a server you don't update much, you might let it go six months. A couple of years is just crazy.

Quite some time ago, I had realised that using portage to keep my Zope/Plone server up to date would just be a world of pain. There are just too many dependencies within the Python/Zope/Plone stack. Specifically, the problem is that to upgrade, you pretty much need to run the old version side-by-side with the new one while you upgrade your data. Portage generally lets you have only one version of a package at any given time. There are some rare exceptions, in which case separate versions can co-exist, in what are known as "slots". For Zope and Plone, I chose instead to use Buildout, which does something similar to portage, but for Python-based systems. This worked fine, but it meant that portage didn't know about these systems. (You can install them using portage, but not at the versions I needed to satisfy some of the Plone add-ons I was using.)

So I wrestled with portage through the wee-small-hours night after night - got all sleep-deprived and cranky, and otherwise wasn't much further forward. Quite a long time later, I emerged (pun unintentional, but acknowledged) with a system where all the dependencies lined up beautifully. Everything was clean, and all desktop-ish things were consigned to oblivion.

Great stuff, I thought - now we're cooking. So I went ahead and started to add Gnome to my system, only to discover that I needed to juggle with yet more dependencies in order to get Gnome installed. (Gnome requires quite a few packages, and most of them have their own requirements too!) Never mind - a few days later I was good to go. Super!

Except... aaargh... along the way, I'd broken my web site. You see, to get Gnome running (at least at the latest version, which is what I wanted), I'd installed Python 2.5. Why not, eh? After all, 2.4 is pretty old hat; they even have 2.6 these days. But - it's Zope, you see. I'm running Plone 3.1.7, which requires Zope 2.10.6, which requires Python 2.4.4. So when I went to start up my web site, it didn't - Zope pretty comprehensively didn't want to run on Python 2.5. Ouch!

But no problemo! After all, Python is one of the very few things in Gentoo portage that has slots. You can have Python 2.4 and Python 2.5 side-by-side. So then you can have either one as the default. You could fix up the symbolic links to /usr/bin/python by hand, but there's a utility that helps you to do this quickly and accurately (as well as supporting several other system configuration options), so I "emerged" eselect, and was able to switch between the two. After switching to 2.4, Zope should just work again, eh. After all, it had worked on 2.4 before....wrong!

It turned out that I could get the result I wanted by doing a full emerge of Python 2.4, but simply switching between the two wasn't enough. The full emerge would also take care of recompiling all the libraries that linked to Python. What I needed was the "python-updater" utility. This is a script that rebuilds all the stuff that's known to work differently on the two versions of Python. If I switched to Python 2.4 and ran python-updater, everything was fine.

But the story goes on. Running Python 2.4 as my system default Python was enough to get the web site back in the air, but presumably this would be a big spanner in the works of my project to get a desktop running. (Actually, not just a desktop - I'd added some other packages such as mono-develop, which represent my motivation for wanting a desktop in the first place, but some of these also had Python dependencies).

After a bit of research, I realised that I could specify the Python executable I wanted in my buildout configuration (using the "executable" directive in the [buildout] section). This meant I could have the default Python be 2.5 and still have Zope/Plone use 2.4. I tried this, and it worked, and the web site was back up and I breathed a sigh of relief. Then I thought - let's get back to a clean system, so I ran python-updater.... and huge skip-loads of fail came raining down. Zope seemed to start OK, but my Plone sites were hosed, and had some bizarre message that didn't point anywhere useful whatsoever. Grrr... so I set the system Python back to 2.4, ran python-updater, and all was well with the world. At this point I left the system alone for a couple of days just to recover a little tenacity - sometimes you have to.
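For anyone wanting to try the same trick, the buildout setting mentioned above would look something like the fragment below. The path is an assumption; it should point at wherever Python 2.4 actually lives on the system in question.

```ini
[buildout]
# Assumed location of the Python 2.4 interpreter; adjust to suit your system.
executable = /usr/bin/python2.4
```

The rest of buildout.cfg stays as it was; only the interpreter that buildout runs under changes.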

When I came back to the problem, I flipped everything back into the broken configuration, and started methodical debugging. After quite a few false starts, someone suggested that I start Zope in debug mode. I did this, and the problem was immediately apparent. While starting up, Zope was reporting that it couldn't find PIL, the Python Imaging library. Well PIL is pretty clearly a dependency for Plone, so that was the problem. Re-visiting python-updater a couple of times showed that one of the things it does is to remove Python's site-packages from whichever Python you're switching away from. (I don't understand why it does this. Is that a bug?)

This new development led me to the conclusion that if, instead of having the buildout rely on the PIL installation in Python's site-packages, it had its own local copy, I could just leave it wired up to Python 2.4, and it wouldn't matter that switching to 2.5 and running python-updater would make the 2.4 site-packages disappear. But how to do this? I spent a little time investigating the possibility of using buildout to download and build a local PIL. I'd seen some hints on the web that you could use a buildout recipe to do this, so I tried one; however, it was so poorly documented that I was reduced to guesswork and soon fell back to plan B. (Fortunately, I can't remember the name of the recipe, and Googling for it a second time didn't turn it up. What it did turn up was this, which might also work.)

Plan B was to attempt something using virtualenv which will create a virtual python environment locally for you, into which you can deploy whatever packages you want. I was just about to go along this road, but it looked quite tricky to figure it all out and integrate with my existing buildout, so I asked on the #plone channel on freenode IRC and someone suggested I look at PILwoTk. At first I thought I'd rather look straight into the sun, but I simply added the egg to buildout.cfg and it just worked. I don't know where PILwoTk is to be found or why it has such a bizarre name, but it's got my system working with a local PIL in my buildout. Great.

All I need to do now is finish getting my desktop working. Of course, it won't be a real desktop, because there's no desk, and definitely no screen or keyboard. Watch this space.

Amsterdam

Posted by Dominic Cronin at Apr 24, 2009 10:25 PM | Permalink

Ingrid's "Where I live" post highlights some of the bizarre perceptions people have of Amsterdam and why a person might choose to live there. When I first came here, I was struck by the number of my countryfolk whose immediate thought was "you'll be stoned all the time then".

For sure, the regulations concerning cannabis are relatively liberal here. For me that mostly means that I know that there may possibly be some in the house, but I have no idea where it might be. Maybe it was the victim of a spring-cleaning frenzy. In less liberal regimes, you damn well know where your stash is at.

Predicate

Posted by Dominic Cronin at Mar 25, 2009 01:15 AM | Permalink

It's a predicate. That's the name of the thing, a predicate. Why can't I remember that?

So there I was, trying to explain some XPath thing to one or two of the guys on the project, and I kept hitting this mental block. The things in the squaredy-brackets; what the heck is that called? You know - those things that denote some sort of truth value that indicates membership in a set. The flippin' WHERE-CLAUSE thingys in an XPath. Jeez - you know what I mean anyway - why is it this hard?

It's called a predicate. I knew that - really!
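For the record, here is what the squaredy-bracket part looks like in practice, as a small PowerShell sketch. The sample document is made up for illustration and has nothing to do with the project mentioned above.

```powershell
# Hypothetical sample document.
[xml]$doc = @'
<library>
  <book lang="en"><title>First</title></book>
  <book lang="nl"><title>Tweede</title></book>
  <book lang="en"><title>Third</title></book>
</library>
'@

# The predicate is the part in square brackets. It works like a WHERE clause:
# only the book elements for which the test is true stay in the node set.
$titles = $doc.SelectNodes("/library/book[@lang='en']/title") |
    ForEach-Object { $_.InnerText }
$titles    # First, Third

# A bare number in a predicate is a test on position (counting from 1).
$second = $doc.SelectSingleNode("/library/book[2]/title").InnerText
$second    # Tweede
```

Either way, the predicate filters the node set selected by the step it is attached to.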

Some words don't stick in my brain. I used to try to conquer this problem by solving crosswords, but then you end up with crossword compilers with warped minds. I recall being driven to distraction by one crossword clue which used gambol as a synonym for capriole. How can you cope with that?