XPath and the dreaded distinction between default namespace and no namespace.

Posted by Dominic Cronin at Jun 12, 2006 10:00 PM | Permalink

Filed under: XML

I thought I was pretty much an old hand at XML by now, and that the standard gotchas wouldn't catch me any more. Not so! The standardest gotcha of 'em all jumped up out of the slime today like something undead and there I was, gotcha'd again.

I suppose I was led astray by putting too much trust in the notion that the methods surfaced in an API would always be relevant. Stupid really, but perhaps if you follow along with the story you'll forgive me. The API in question was that of the .NET framework; specifically the XmlNamespaceManager class.

Let's say you have some XML like this:

<?xml version='1.0'?>
<a:one xmlns:a='aaa'>
	<two xmlns='bbb'/>
</a:one>

If you have this in an XmlDocument and you want to XPath to the < two/> element, the first thing you'd usually do is create an XmlNamespaceManager and add the namespaces that you want to use in your XPath expression. This allows you to create a mapping between the namepace prefixes in your expression and the namespaces they represent. In the document itself, the same thing is achieved by the namespace declarations you can see in the sample above, but these don't exist in the XPath expression; it needs its own namespace context, or it won't be able to address anything that's in a namespace. But more of this just now...

So there I was today trying to demonstrate some techniques to a colleague, and I wrote some code something like this:

            NameTable nt = new NameTable();
            XmlNamespaceManager nm = new XmlNamespaceManager(nt);
            nm.AddNamespace("a", "aaa");
            nm.AddNamespace(String.Empty, "bbb");

            if (null == doc.SelectSingleNode("a:one/two", nm))
                Console.WriteLine ("Couldn't match default namespace");
            else
                Console.WriteLine ("Matched default namespace");

... and to my horror the match failed. After a few quick changes I had something like this:

            nm.AddNamespace("b", "bbb");
            if (null == doc.SelectSingleNode("a:one/b:two", nm))
                Console.WriteLine("Couldn't match b: namespace");
            else
                Console.WriteLine("Matched b: namespace");

... and on this occasion the match succeeded.

What was going on? Could I possiibly be the first human to set foot on a previously undiscovered bug in the framework? Suffice it to say that my moment of glory will have to wait. A quick hunt around on Microsoft's web site showed that half a dozen other people had reported this as a bug, and that Microsoft's response was "Won't fix."

What was going on?

Five or so years ago Martin Gudgin and the other luminaries teaching Developmentor's "Guerilla XML" course, had gone to extraordinary lengths to teach me and my fellow victims the difference between a node that is in a namespace, and one that isn't. Sorry Martin, I failed you.

It turns out that in the XPath standard it says the following

A QName in the node test is expanded into an expanded-name using the namespace
declarations from the expression context.  This is the same way
expansion is done for element type names in start and end-tags except
that the default namespace declared with xmlns is not used: if the QName does not have
a prefix, then the namespace URI is null (this is the same way attribute names are 
expanded).  It is an error if the QName has a prefix for which there is
no namespace declaration in the expression context.

So - in XPath, no prefix means not in any namespace at all, just like for attributes. Microsoft's implementation is correct. Otherwise, if you had XML like this:

<?xml version='1.0'?>
<one>
	<two xmlns='bbb'>
		<c:three xmlns:c="po"/>
	</two>
<one>

... you'd be unable to write an XPath from the root element down to <c:three/>.

Now for the part where you forgive me for my stupidity:

If you look at the documentation for XmlNamespaceManager, it states very clearly that you can use the empty string to set the Default namespace. (To be fair, the documentation for selectSingleNode has a note which attempts to clarify matters.)

If you look at the rest of the API of XmlNamespaceManager, the reason for the confusion becomes clear. XmlNamespaceManager also supports methods like PushScope, GetNamespacesInScope, etc. which plainly aren't intended for use with XPath at all. It looks rather as though you could use an XmlNamespaceManager for managing your namespaces as you navigate through an XmlDocument, perhaps with a streaming library. In that context, setting the default namespace makes perfect sense. If you're using it with XPath though, it's completely irrelevant.

So repeat after me: A default namespace isn't the same thing as the empty namespace (aka null namespace, no namespace, not in a namespace, etc.)

Dominic Cronin's web site

XPath and the dreaded distinction between default namespace and no namespace.