TDOM

From AOLserver Wiki

Documentation is available at http://www.tdom.org/doc-index.html

A very helpful tutorial is available at http://wiki.tcl.tk/8984


Dossy: Assigning a prefix to the default namespace when using XPath in tDOM

I don't know how many folks are using tDOM but I'm sure more and more are. I keep stumbling into this problem of trying to access child nodes which are in the default namespace while the parent is in a named namespace ... and the XPath expressions start to look really dog-awful, like:

   $root selectNodes {/*[local-name()='foo']/*[local-name()='bar']/...}

So, I sat down for a few minutes and thought about the problem, and after some head-scratching, it hit me: name the default namespace. Duh. So, here's some sample code:

   ### IMPORTANT: This code lets us get at nodes with a null prefix.
   if {[$root hasAttribute xmlns]} {
       $root setAttributeNS "" xmlns:default [$root getAttribute xmlns]
   }

So, now, I can rewrite that XPath expression above:

   $root selectNodes /default:foo/default:bar/...

Man. Life just couldn't get any better.

If any of you are wondering why this matters, here's a sample XML doc and some sample code illustrating the beauty:

   set xml {
       <aaa:duh xmlns="booboo" xmlns:aaa="blort">
           <foo>
               <bar>some text</bar>
           </foo>
       </aaa:duh>
   }
   
   set doc [dom parse $xml]
   set root [$doc documentElement]

Suppose you want to get the text node inside <bar> with an absolute XPath expression. You might think you could write:

   [$root selectNodes /aaa:duh/foo/bar] text

Ah, but XPath *SPANKS* us and tells us we're naughty, that <foo> and <bar> have a null prefix and live in the unnamed namespace. The selectNodes call matches nothing and returns an empty list, so we never get "some text" as we might have expected (with nothing to call "text" on, Tcl complains about an invalid command name ""). So, we tell XPath "well, how about you do me a favor and pretend like any node in the unnamed namespace really has a name of my choosing ..." by doing this:

   $root setAttributeNS "" xmlns:default [$root getAttribute xmlns]

We're telling the processor to name all nodes in the default namespace as though they were in the namespace named "default". N.B.: If your XML document really *HAS* a namespace named "default", this code will likely CLOBBER it and give you really weird behavior. Change "xmlns:default" in the above line to a namespace that won't ever exist in your document, then.

Now that we've done this, let's rewrite that XPath expression and try again:

   [$root selectNodes /aaa:duh/default:foo/default:bar] text

Ahh! It returned the string "some text" -- exactly what we wanted. Notice the injection of "default:" above before "foo" and "bar" -- since we just told the processor that nodes in the unnamed namespace (i.e., <foo> and <bar>) now live in the namespace named "default", in order to specify them in an XPath expression, we need to say "default:foo" and "default:bar". But this is a hell of a lot easier to read than "*[local-name()='foo']" and "*[local-name()='bar']", at least on my eyes.
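
For anyone who wants to try this end to end, here is a minimal sketch that strings the snippets above into one script. It assumes the tdom package is installed; the variable names before, after and text are only for illustration.

```tcl
package require tdom

set xml {
    <aaa:duh xmlns="booboo" xmlns:aaa="blort">
        <foo>
            <bar>some text</bar>
        </foo>
    </aaa:duh>
}
set doc  [dom parse $xml]
set root [$doc documentElement]

# Before: the unprefixed steps do not match <foo> and <bar>,
# so selectNodes hands back an empty list.
set before [$root selectNodes /aaa:duh/foo/bar]

# Bind the prefix "default" to whatever URI the root's xmlns attribute holds.
if {[$root hasAttribute xmlns]} {
    $root setAttributeNS "" xmlns:default [$root getAttribute xmlns]
}

# After: the prefixed steps find <bar>.
set after [$root selectNodes /aaa:duh/default:foo/default:bar]
set text  [[lindex $after 0] text]

puts "before: [llength $before] node(s), after: [llength $after] node(s), text: $text"
```

Checking the length of the result list before calling "text" on it avoids the invalid command name error when an expression matches nothing.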

If this is all old news, feel free to ignore this email. I hadn't seen it documented anywhere and I know I've puzzled over this for at least an hour or two over the past few months, so maybe this write-up might save a few people the same agony.


While the above code generally works, the explanation is slightly incorrect. It is not naming the default namespace; it is binding a known prefix to the URI of the default namespace, which it looks up. An XML namespace is not the (possibly null) prefix; it is the URI to which the prefix is bound. In particular, to match nodes, the prefixes need not match, but the namespace URIs must. Plain XML obscures this to a certain extent, but XPath makes it painfully clear.
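
The distinction between prefix and URI can be seen by asking the nodes directly. This is a small sketch against the sample document from the email, assuming the tdom package is available; the variable names are illustrative only.

```tcl
package require tdom

set xml {<aaa:duh xmlns="booboo" xmlns:aaa="blort"><foo><bar>some text</bar></foo></aaa:duh>}
set doc  [dom parse $xml]
set root [$doc documentElement]
set foo  [$root firstChild]

# <foo> carries no prefix, yet it belongs to the namespace URI "booboo".
set fooPrefix [$foo prefix]
set fooUri    [$foo namespaceURI]

# The root uses the prefix "aaa", which is bound to the URI "blort".
set rootPrefix [$root prefix]
set rootUri    [$root namespaceURI]

puts "foo:  prefix='$fooPrefix' uri='$fooUri'"
puts "root: prefix='$rootPrefix' uri='$rootUri'"
```

XPath compares the URI, not the prefix, which is why an unprefixed step such as "foo" (no namespace at all) does not match an element whose URI is "booboo".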

The right thing to do is to always know what namespace URIs are being used in the document, and assign known prefixes to all of them rather than rely on the context node having all the needed (and convenient) bindings.
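
One way to do that in tDOM is sketched below: pass the prefix-to-URI bindings yourself, either per query with the -namespaces option of selectNodes or once per document with selectNodesNamespaces. The prefixes "a" and "b" are arbitrary choices for this example, and the script assumes a tDOM version that provides both features.

```tcl
package require tdom

set xml {
    <aaa:duh xmlns="booboo" xmlns:aaa="blort">
        <foo>
            <bar>some text</bar>
        </foo>
    </aaa:duh>
}
set doc  [dom parse $xml]
set root [$doc documentElement]

# Per-query bindings. The prefixes differ from those in the document;
# only the URIs have to agree.
set node      [$root selectNodes -namespaces {a blort b booboo} /a:duh/b:foo/b:bar]
set viaOption [$node text]

# Document-wide bindings, used by later selectNodes calls on this document.
$doc selectNodesNamespaces {a blort b booboo}
set viaDoc [[$root selectNodes /a:duh/b:foo/b:bar] text]

puts "$viaOption / $viaDoc"
```

Neither form modifies the document, so there is no risk of clobbering a prefix the document already uses.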

Note too that the above code will be of no help if the null prefix is assigned or changed in a child of the context node.
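
A short sketch of that failure case, assuming the tdom package: here <foo> redeclares the default namespace with the made-up URI "other", so the prefix copied from the root points at the wrong URI. Explicit bindings (the prefixes "a" and "o" are arbitrary) still work.

```tcl
package require tdom

set xml {<aaa:duh xmlns="booboo" xmlns:aaa="blort"><foo xmlns="other"><bar>some text</bar></foo></aaa:duh>}
set doc  [dom parse $xml]
set root [$doc documentElement]

# The trick from the email: "default" now stands for the root's URI, "booboo".
$root setAttributeNS "" xmlns:default [$root getAttribute xmlns]

# <foo> and <bar> live in "other", not "booboo", so nothing matches.
set missed [$root selectNodes /aaa:duh/default:foo/default:bar]

# Binding the URIs explicitly finds the node regardless of where they were declared.
set found [$root selectNodes -namespaces {a blort o other} /a:duh/o:foo/o:bar]
set text  [[lindex $found 0] text]

puts "missed: [llength $missed] node(s), found: [llength $found] node(s), text: $text"
```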