Dominic Cronin's weblog

XPath and the dreaded distinction between default namespace and no namespace.

Posted by Dominic Cronin at Jun 12, 2006 10:00 PM |

I thought I was pretty much an old hand at XML by now, and that the standard gotchas wouldn't catch me any more. Not so!  The standardest gotcha of 'em all jumped up out of the slime today like something undead and there I was, gotcha'd again.


I suppose I was led astray by putting too much trust in the notion that the methods surfaced in an API would always be relevant. Stupid really, but perhaps if you follow along with the story you'll forgive me.  The API in question was that of the .NET framework; specifically the XmlNamespaceManager class.

Let's say you have some XML like this:


<?xml version='1.0'?>
<a:one xmlns:a='aaa'>
	<two xmlns='bbb'/>
</a:one>


If you have this in an XmlDocument and you want to XPath to the <two/> element, the first thing you'd usually do is create an XmlNamespaceManager and add the namespaces that you want to use in your XPath expression. This allows you to create a mapping between the namespace prefixes in your expression and the namespaces they represent. In the document itself, the same thing is achieved by the namespace declarations you can see in the sample above, but these don't exist in the XPath expression; it needs its own namespace context, or it won't be able to address anything that's in a namespace. But more of this just now...


So there I was today trying to demonstrate some techniques to a colleague, and I wrote some code something like this:

            NameTable nt = new NameTable();
            XmlNamespaceManager nm = new XmlNamespaceManager(nt);
            nm.AddNamespace("a", "aaa");
            nm.AddNamespace(String.Empty, "bbb");

            if (null == doc.SelectSingleNode("a:one/two", nm))
                Console.WriteLine ("Couldn't match default namespace");
            else
                Console.WriteLine ("Matched default namespace");

... and to my horror the match failed. After a few quick changes I had something like this:

            nm.AddNamespace("b", "bbb");
            if (null == doc.SelectSingleNode("a:one/b:two", nm))
                Console.WriteLine("Couldn't match b: namespace");
            else
                Console.WriteLine("Matched b: namespace");

... and on this occasion the match succeeded.


What was going on?  Could I possibly be the first human to set foot on a previously undiscovered bug in the framework? Suffice it to say that my moment of glory will have to wait. A quick hunt around on Microsoft's web site showed that half a dozen other people had reported this as a bug, and that Microsoft's response was "Won't fix."


What was going on?

Five or so years ago, Martin Gudgin and the other luminaries teaching Developmentor's "Guerilla XML" course went to extraordinary lengths to teach me and my fellow victims the difference between a node that is in a namespace, and one that isn't. Sorry Martin, I failed you.

It turns out that the XPath standard says the following:

A QName in the node test is expanded into an expanded-name using the namespace
declarations from the expression context.  This is the same way
expansion is done for element type names in start and end-tags except
that the default namespace declared with xmlns is not used: if the QName does not have
a prefix, then the namespace URI is null (this is the same way attribute names are 
expanded).  It is an error if the QName has a prefix for which there is
no namespace declaration in the expression context.

So - in XPath, no prefix means not in any namespace at all, just like for attributes. Microsoft's implementation is correct. Otherwise, if you had XML like this:

<?xml version='1.0'?>
	<two xmlns='bbb'>
		<c:three xmlns:c="po"/>
	</two>

... you'd be unable to write an XPath from the root element down to <c:three/>.
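The rule is easy to check with a small self-contained program. This is only a minimal sketch, not the code from the session described above: the class name XPathNamespaceDemo and the helper Matches are hypothetical, and the sample document has an extra <three/> child added so that there is something in no namespace to find.

```csharp
using System;
using System.Xml;

// Hypothetical demo class; only the System.Xml calls are framework API.
static class XPathNamespaceDemo
{
    // Returns true if the XPath expression selects a node in the sample document.
    public static bool Matches(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<a:one xmlns:a='aaa'><two xmlns='bbb'/><three/></a:one>");

        XmlNamespaceManager nm = new XmlNamespaceManager(doc.NameTable);
        nm.AddNamespace("a", "aaa");
        nm.AddNamespace("b", "bbb");
        // A default namespace can be registered, but XPath does not apply it
        // to unprefixed names in the expression.
        nm.AddNamespace(String.Empty, "bbb");

        return doc.SelectSingleNode(xpath, nm) != null;
    }

    static void Main()
    {
        // Unprefixed: looks for a <two> that is in no namespace.
        Console.WriteLine(Matches("a:one/two"));
        // Prefixed: looks for a <two> in the 'bbb' namespace.
        Console.WriteLine(Matches("a:one/b:two"));
        // Unprefixed: <three> really is in no namespace.
        Console.WriteLine(Matches("a:one/three"));
    }
}
```

Run as written, the first expression should find nothing and the other two should match, which is the behaviour the standard describes.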


Now for the part where you forgive me for my stupidity:

If you look at the documentation for XmlNamespaceManager, it states very clearly that you can use the empty string to set the default namespace. (To be fair, the documentation for SelectSingleNode has a note which attempts to clarify matters.)


If you look at the rest of the API of XmlNamespaceManager, the reason for the confusion becomes clear. XmlNamespaceManager also supports methods like PushScope, GetNamespacesInScope, etc. which plainly aren't intended for use with XPath at all. It looks rather as though you could use an XmlNamespaceManager for managing your namespaces as you navigate through an XmlDocument, perhaps with a streaming library. In that context, setting the default namespace makes perfect sense. If you're using it with XPath though, it's completely irrelevant.
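To make the scoping side of the class concrete, here is a minimal sketch of the non-XPath usage, assuming you are tracking namespace declarations yourself as you walk into and out of elements. The class name NamespaceScopeDemo, the helper DefaultNamespaceByScope and the namespace "ccc" are hypothetical; PushScope, PopScope, AddNamespace and DefaultNamespace are the framework members.

```csharp
using System;
using System.Xml;

// Hypothetical demo class showing scoped default namespaces.
static class NamespaceScopeDemo
{
    // Returns the default namespace seen in the outer scope,
    // inside a nested scope, and again after leaving the nested scope.
    public static string[] DefaultNamespaceByScope()
    {
        XmlNamespaceManager nm = new XmlNamespaceManager(new NameTable());

        // As if we had just read <two xmlns='bbb'>.
        nm.AddNamespace(String.Empty, "bbb");
        string outer = nm.DefaultNamespace;

        // Entering a child element that redeclares the default namespace.
        nm.PushScope();
        nm.AddNamespace(String.Empty, "ccc");
        string inner = nm.DefaultNamespace;

        // Leaving the child element discards its declarations.
        nm.PopScope();
        string restored = nm.DefaultNamespace;

        return new string[] { outer, inner, restored };
    }

    static void Main()
    {
        Console.WriteLine(String.Join(", ", DefaultNamespaceByScope()));
    }
}
```

Here the default namespace does real work, because it says which namespace an unprefixed element name belongs to at each depth of the document.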


So repeat after me: A default namespace isn't the same thing as the empty namespace (aka null namespace, no namespace, not in a namespace, etc.)

Pulling website information out of the IIS metabase

Posted by Dominic Cronin at May 24, 2006 10:00 PM |

Someone recently asked me how to find the URL where the Tridion user interface is running. The idea was to automate the set-up for Tridion SiteEdit. The following snippet of code doesn't solve their problem, but for me it was a bit of fun exploring how to pull this data out of the IIS metabase using C#. Although this blog entry is categorised as "Tridion", this code isn't particularly Tridion-specific. It's sufficient to show that it's not particularly painful to work with IIS programmatically. You can also write to the metabase using similar techniques.


using System;
using System.Collections.Generic;
using System.Text;
using System.DirectoryServices;

namespace Hinttech.Dotnet.Samples.WebSiteDumper
{
    class Program
    {
        static void Main( string[] args )
        {
            using (DirectoryEntry webServers = new DirectoryEntry("IIS://localhost/W3SVC"))
            {
                foreach( DirectoryEntry server in webServers.Children)
                {
                    PropertyValueCollection serverComment = server.Properties["ServerComment"];
                    if ( serverComment.Value != null && serverComment[0].ToString() == "Tridion Content Manager")
                    {
                        Console.WriteLine("The Tridion web site is running on: ");
                        // If you want the https sites too, you need to do the same thing for "SecureBindings"
                        foreach ( string serverBinding in server.Properties["ServerBindings"] )
                        {
                            string[] serverBindingParts = serverBinding.Split(':');
                            string ipAddress = serverBindingParts[0];
                            string port = serverBindingParts[1];
                            string hostHeader = serverBindingParts[2];
                            // Assumption: the statement that belonged here was lost; the label below is a stand-in for an empty IP address
                            if (string.IsNullOrEmpty(ipAddress))
                                ipAddress = "(All Unassigned)";
                            Console.WriteLine(
                                "\tIP Address = {0}\n\tPort= {1}\n\tHostHeader= {2}",
                                ipAddress, port, hostHeader );
                        }
                    }
                }
            }
        }
    }
}
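The binding-splitting step can be tried on its own, without a machine that has IIS installed. This is a minimal sketch that assumes each ServerBindings entry has the three-part "ip:port:hostheader" shape the code above relies on; the class name BindingParser, the helper Parse and the sample binding strings are hypothetical.

```csharp
using System;

// Hypothetical helper; it only splits strings and never touches the metabase.
static class BindingParser
{
    // Splits an "ip:port:hostheader" binding into its three parts.
    // Empty parts are kept, so ":80:" yields an empty IP and an empty host header.
    public static string[] Parse(string serverBinding)
    {
        string[] parts = serverBinding.Split(':');
        if (parts.Length != 3)
        {
            throw new FormatException("Expected ip:port:hostheader, got: " + serverBinding);
        }
        return parts;
    }

    static void Main()
    {
        string[] samples = { ":80:", "192.168.0.1:8080:www.example.com" };
        foreach (string binding in samples)
        {
            string[] parts = Parse(binding);
            Console.WriteLine("IP Address = '{0}', Port = '{1}', HostHeader = '{2}'",
                parts[0], parts[1], parts[2]);
        }
    }
}
```

The length check is a design choice rather than anything the metabase demands: it turns an unexpected binding format into a clear error instead of an index-out-of-range exception.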


Xml namespace weirdness in MSXML4

Posted by Dominic Cronin at Apr 28, 2006 10:00 PM |

This post has been a while brewing. It began a few months ago when I was working on a project with my colleague Rob Wittenbols. He was building an XML document using the MSXML4 DOM API, and he wanted to have the same degree of control over where namespace prefixes would be declared as if he was deserializing the document from a text file. We discussed the subject briefly and then went off to do other things. Later in the day, we'd obviously both been pondering the issue, and had come up with different answers.

I had looked at the API and discovered that it was impossible to do what he wanted. The API didn't support it. You could create a node, and put it in a namespace, and even give that node a prefix, but you couldn't declare another namespace prefix at the same time. That made perfect sense, I said, because namespace prefixes were purely to do with the serialized view of the document, and where they were declared was generally irrelevant when working in the DOM. As long as you could get each element into the correct namespace, you'd be fine.

Rob had taken an almost opposite view, and come up with a bit of practical hackery that blew my theoretical argument out of the water. He'd just escaped the final quote of his namespace value and carried on to add another one. Something like this:

dom.createNode(1, "xxx:child", "xxx"" xmlns:y=""yyy")

(Surprisingly to me) this worked, and allowed him to achieve what he wanted, and I couldn't argue with him because his program was intended to output serialized XML and he wanted it in a particular way. To be fair, a good DOM implementation would provide a clean way to do this. (OK - this is an edge case, and in many other respects MSXML4 is an excellent implementation.)

Getting back to theoretical niceties, I went and looked up the recommendations for XML namespaces. A namespace name must be a URI as defined in RFC 2396, which excludes the double-quote from use in a URI:

  The angle-bracket "<" and ">" and double-quote (") characters are excluded because they are often used as the delimiters around URI in text documents and protocol fields.


So strictly, Rob's hack is cheating :-) , but it got the job done for him. (Note that in this context we don't have to worry about the single-quote character (') because when MSXML4 serializes the DOM, it uses double-quotes to delimit the namespace name.)

The current recommendation, XML Namespaces 1.1, specifies that a namespace name should be an Internationalized Resource Identifier, and the standard for those (currently a draft) doesn't mention double-quotes at all.


Not long after this discussion, I almost got as far as writing about it when Don Box made a posting on a related topic, asking how namespaces ought to be compared for equivalence, as the Microsoft implementations do not perform encoding or unencoding when comparing. The responses were fairly clear. The Microsoft implementation is correct in this respect: Namespaces are compared as strings.

To quote from XML Namespaces 1.1:

IRI references identifying namespaces are compared when determining whether a name belongs to a given namespace, 
and whether two names belong to the same namespace. [Definition: The two IRIs are treated as strings, and they are
identical if and only if the strings are identical, that is, if they are the same sequence of characters.] 
The comparison is case-sensitive, and no %-escaping is done or undone.
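The same rule can be seen from the .NET side with a short experiment. This is a minimal sketch using System.Xml rather than MSXML4; the class name NamespaceComparisonDemo, the helper Matches and the example.org namespace names are all hypothetical.

```csharp
using System;
using System.Xml;

// Hypothetical demo class; the namespace names are made up for illustration.
static class NamespaceComparisonDemo
{
    // Returns true if an XPath using the given namespace name finds the root
    // element, whose namespace name is written with a %-escape.
    public static bool Matches(string namespaceInExpression)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root xmlns='http://example.org/%7Euser'/>");

        XmlNamespaceManager nm = new XmlNamespaceManager(doc.NameTable);
        nm.AddNamespace("p", namespaceInExpression);

        return doc.SelectSingleNode("/p:root", nm) != null;
    }

    static void Main()
    {
        // Identical sequence of characters.
        Console.WriteLine(Matches("http://example.org/%7Euser"));
        // Same resource as a URI, but the %-escape has been undone.
        Console.WriteLine(Matches("http://example.org/~user"));
        // Differs only in the case of the escape's hex digit.
        Console.WriteLine(Matches("http://example.org/%7euser"));
    }
}
```

Only the first call should find the element; the other two name a different namespace as far as the comparison is concerned.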

So that's pretty straightforward. When comparing namespaces for equivalence, you don't do anything with the string. Having said that, I think the implementation of MSXML4 is broken. It should check for the double-quote character. Here's a snippet of vbscript that shows why:

Set dom = CreateObject("Msxml2.DOMDocument.4.0")
dom.async = false
dom.loadXML "<?xml version='1.0'?><root/>"
Dim child
' VBScript escapes double-quote as double double-quote
Set child = dom.createNode(1, "xxx:child", "xxx"" xmlns:y=""yyy")
dom.documentElement.appendChild child
dom.setProperty "SelectionNamespaces", "xmlns:a='xxx' xmlns:b='yyy'"

' This XPath will fail, as 'xxx' is not the same as 'xxx"" xmlns:y=""yyy"'
If dom.selectSingleNode("//a:child") is nothing then
    WScript.Echo "Can't xpath to xxx:child"
Else
    WScript.Echo "Can xpath to xxx:child"
End if

dom.save "c:\temp\tempdom.xml"
dom.load "c:\temp\tempdom.xml"

' The rather bizarre namespace name doesn't survive the round-trip to disk, and this XPath succeeds
If dom.selectSingleNode("//a:child") is nothing then
    WScript.Echo "Can't xpath to xxx:child"
Else
    WScript.Echo "Can xpath to xxx:child"
End if

It's this inconsistency that makes me say it's broken. The document represented by your DOM should always survive a round-trip via the serialization mechanism. Either the serialization mechanism has to find some way to represent the namespace name with the double-quote in it (is there any reason why you couldn't say <xxx:child xmlns:xxx="xxx&quot; xmlns:y=&quot;yyy"/>?), or you have to regard the double-quote as illegal in a namespace name.


That's just the purist in me speaking. On the other hand, I can't think of a good use case where a mended implementation would be better than the currently broken one. At least the broken version lets Rob serialize how he wants.

Benefits of continuous integration

Posted by Dominic Cronin at Apr 05, 2006 10:00 PM |

Johanna Rothman notes the following benefits of continuous integration:


  • Continuous integration provides early feedback to developers.
  • The PM can see every day if the build is broken. (I use this as a predictive metric.)
  • There's less to check in every day, so it's easier to see where the problems that broke the build occurred.
  • It's much more obvious earlier whether we have enough tests to know if the build is any good.



and asks us if we have more to add.


For myself I'd note the following:


  • Developer morale benefits from knowing that there isn't an invisible mountain looming up in front of you
  • Those who mess things up are forced immediately to switch their efforts to undoing their evil, which (hopefully) means they can't be busy digging a deeper hole
  • Developer morale benefits from knowing that if you are the one to screw up, the scope of your crime will be small enough that no-one will actually hate you for it (unless you screw up a lot)
  • If you screw up a lot - there's nowhere to hide. Developer morale benefits again.

Total quality

Posted by Dominic Cronin at Mar 25, 2006 11:00 AM |

I used to work in manufacturing industry. I made cookers and fridges, and mechanical excavators, and bits of trains. I was a manufacturing engineer, and I spent my time trying to make it better. Much like now, when I'm a computer programmer (hey, that shit was 15 years ago or something), and every day I'm trying to make it better.

Back then, it was a given that it was impossible to ship a 100% defect-free car (or substitute for car any manufactured item of equivalent complexity). Nowadays, it's a given that you can't ship defect-free software (or at least that it would be prohibitively expensive to do so).

So when you buy a car nowadays, how likely do you think it is that you'll be able to drive it for at least six months without taking it to the garage? Pretty likely eh?

And now to the crux of it - we've come to accept that it's impossible (or commercially infeasible) to produce defect-free software. Fifteen or so years ago, the Japanese motor companies started shipping defect-free cars as a matter of routine. The rest of the world's car manufacturers couldn't hide any more. They had to start doing the same thing.

Software development techniques are at a similar crossroads right now, with test-driven development etc. Who will be the first to start routinely delivering 100% defect-free software products?

Who said it couldn't be done?

Component linking and magical parameters in Tridion's XSLT component templates

Posted by Dominic Cronin at Jan 28, 2006 11:00 PM |

I was recently asked a question about XSLT templating in Tridion. The question was about how you create component linking code from within your XSLT component template. The tricky part is that you need a publication ID and a component template ID. My solution involved getting these in the page template and passing them in to the XSLT as parameters. OK - job done - it's just coding from there eh? Well yes - but it didn't stop being interesting at that point, because my questioner in the meantime had come up with his own solution based on some sample code that he'd found on the Tridion forum. He hadn't needed to add the parameters himself, because Tridion adds some parameters for you as if by magic.


It turns out that if you include any of the following three global parameters at the top of your XSLT, Tridion will add the corresponding DOMs as parameters as if by magic.

<xsl:param name="tcm:Page"/>
<xsl:param name="tcm:Publication"/>
<xsl:param name="tcm:ComponentTemplate"/>

This is very cool, and saves you the effort of coding it, but I don't know whether it's a supported feature. I suppose it should be, because it's really just the standard behaviour for a component template translated to XSLT. I just can't find it anywhere in the documentation.


There is a gotcha, mind you! Although the parameters are declared in the "tcm:" namespace, which resolves to "", don't be tempted to use a prefix other than "tcm:" for this namespace. If you do, you'll find that there's something broken; the parameters don't seem to be added automatically. (Tested on R51SP4) Still - as it's an undocumented, and perhaps unsupported feature, we can't really complain.


It would make a good enhancement request though. Perhaps Tridion can be persuaded to document this and support it. In the meantime, perhaps it's better to be on the safe side and explicitly add the parameters you need to the Component Presentation from the Page Template.

Tridion behaves badly when publishing ASPX

Posted by Dominic Cronin at Jan 12, 2006 11:00 AM |

When you publish ASP.NET code from Tridion using the TCDL mechanism, it adds extra output that you didn't ask for. Specifically, each component presentation is emitted wrapped in a span tag. The span tag has an ID composed of the URIs of the component and component template.

These span tags aren't there because you asked for them in your content, or in your templates, so why are they there?

I reported this bug to Tridion customer support, and they told me that it's not a bug. It's designed that way, they say. Fair enough, I say, then it's a design bug. Apparently, the purpose is "to make it easier for SiteEdit and template designers to manipulate existing component presentations." Well that won't wash, because it's no earthly use for either purpose.

Is it a bug? Walks like a duck, quacks like a duck....

What harm can it do?

  • Well in my case the main issue was that in XHTML a span tag isn't allowed to contain block-level elements. That means the XHTML produced this way is invalid. (I'm the kind of guy that likes to see a clean compile, and I like to see my web pages validate correctly too. This isn't just bravado, it's the most efficient way to work.)
  • On top of that, you might not always be in charge of the CSS on your site. What if someone specifies a child selector instead of a descendant? That's just an example: you can imagine a hundred other ways this detritus could ruin your style.
  • Oh heck! It's just broken, that's all. A web content management system should allow you to manage your web content. That means not emitting things you didn't ask for.

Anyway - what can you do about it?

  • You can ignore it. This might be OK if you don't care about validation, and you have complete control of the styles.
  • Of the people I've spoken to, it seems most are simply avoiding the problem by publishing to a Publishing Target with Target Language set to None. Of course, if you do this, you have to write out the various functions for linking etc. yourself. If everyone does this, then Tridion might as well never have invented TCDL in the first place.
  • You can create a customised version of the AspDotNETTransformer. This is probably what I'll do. Tridion customer support have provided me with some Java source code (which is just as well as it's an undocumented API). It looks straightforward enough and it's install once - fixed forever.

Quack, Quack!!!