10x More Productive Blog!

XQuery for fun

By FB2

Thursday, January 22, 2004

Why XQuery

It seems that we're about to join a project at work, where there'll be lots of XQuery development.
As an architect, I always feel uneasy about talking about something without hands-on experience, so I got myself a couple of books, tools, and articles, and started to play around with the language.

I did not feel much enthusiasm about the language itself: its syntax reminded me too much of good old BASIC, and I simply did not see where it could be useful for enterprise development or regular "home" use. There are other mechanisms to get XML out of documents and transform it into something else, using various languages and frameworks. I did not see what benefit it could provide. It seemed to be a marginally different alternative to XSLT.

Let me tell you right away: I was wrong. I'm having a blast writing simple XQuery scripts for all the things I wanted to do to my XML documents but was too lazy to do with XSLT or Java. I'm still at the learning stage, busy reading obscure mailing lists and browsing the specification to find solutions for even the simple things - but I'm already productive with it. So don't take this article as an expert's guide: this is just the result of my learning and tinkering process.

I'm not going to replicate the specification or the relevant books here, but let's now define what XQuery is. Despite the simplicity of the language itself, it's rather hard to nail down a concise description, mostly because it has different uses for different people.
The http://xquery.com page describes it the following way:

XQuery is a technology under development by the World Wide Web Consortium (W3C) that's designed to query collections of XML data -- not just XML files, but anything that can appear as XML, including relational databases. XQuery has broad support from IBM, Microsoft, and Oracle as well as application server vendors such as BEA and Software AG.

It sounds a bit like a marketing blurb, with serious name dropping. Let's see how Jason Hunter defines it:

XQuery provides the mechanism to efficiently and easily extract information from Native XML Databases (NXD) and relational data as well. With XQuery, you can view RDBMS tables as just another XML data source. XQuery makes possible the exciting possibility of a single query that combines an incoming purchase order in native XML format, an archive of catalog data also in native XML format, and an inventory system held in a relational database.

Ok, so to put it simply, it enables you to get XML from various sources, including relational DBs, XML databases, the file system, etc., and then return it in a format that is suitable for consumption by your application. The getting-the-XML part is enabled by XQuery hooks in the various databases and other sources. The formatting and manipulation part is enabled by a set of specifications related to XQuery, including XPath and the function library it shares with XSLT.
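To make the "get XML, return it in the shape you need" idea concrete, here is a minimal sketch. The order data and all the element and attribute names in it are made up for illustration, and the data is written inline so that the query runs on its own; in real use it would more likely come from doc("...") or from a database hook.

```xquery
(: Hypothetical sample data, inlined so the query is self-contained. :)
let $orders :=
  <orders>
    <order id="1" customer="Alice" total="120.50"/>
    <order id="2" customer="Bob" total="35.00"/>
    <order id="3" customer="Alice" total="80.00"/>
  </orders>
(: Pick the orders above 50 and reshape them into the XML the application wants. :)
return
  <big-orders>{
    for $o in $orders/order
    where xs:decimal($o/@total) gt 50
    order by xs:decimal($o/@total) descending
    return <order customer="{ $o/@customer }">{ string($o/@total) }</order>
  }</big-orders>
```

With this sample data the result contains the two orders above 50, largest first. The syntax follows the final XQuery 1.0 Recommendation, so an engine based on an early draft may need small adjustments.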

Why is it different from XSLT?
There are long articles and chapters in books about it, but the primary differences for me are:

The language is more suitable for simple scripts - it trades in the template-driven XSLT mechanism for simple flow control implemented by your XQuery language code.
It allows you to easily retrieve XML nodes from multiple sources, and combine/aggregate them.
If so far you did not grok XSLT, you're never going to - now you get a new shot at XML manipulation with XQuery :-)

I'd say it is much closer to templating languages like Velocity, JSP, PHP, so it will be more natural to programmers who grew up on something other than LISP.
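As a rough illustration of the first two points, the sketch below combines two sources with plain "for", "let", and "if" instead of templates. Both documents and every name in them are hypothetical, and they are inlined here only so that the query is self-contained.

```xquery
(: Two hypothetical sources; in practice each could be a doc("...") call. :)
let $catalog :=
  <catalog>
    <product sku="A1"><name>Widget</name></product>
    <product sku="B2"><name>Gadget</name></product>
  </catalog>
let $inventory :=
  <inventory>
    <stock sku="A1" quantity="12"/>
    <stock sku="B2" quantity="0"/>
  </inventory>
return
  <availability>{
    for $product in $catalog/product
    (: Look up the matching stock record in the second source. :)
    let $stock := $inventory/stock[@sku = $product/@sku]
    return
      <product name="{ $product/name }">{
        if (xs:integer($stock/@quantity) gt 0)
        then "in stock"
        else "sold out"
      }</product>
  }</availability>
```

The flow of control is written out explicitly, which is the part that feels closer to a templating language than to XSLT's template matching.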

Why is it different from existing templating languages which allow me to work with XML anyway?
I think the most important aspect is that it was built on XML and XPath from the start: it thinks in XML. You don't have to map your XML to constructs that are closer to the other language; you access, manipulate, and construct everything in XML, without losing anything in the process of mapping from XML to the specific language.

Okay, this is enough for the Boring Stuff, let's see XQuery In Action! (This is probably trademarked by Manning, need to check :-)).

Getting the docs

To get started with XQuery development, you'll probably need the specifications.
The specs can be found on the W3C XQuery pages.

It's not fun to read a W3C specification, everyone knows that. You can find a couple of tutorials on the web - it's a good idea to get through the basics with them. Here's a good one:
http://www.brics.dk/~amoeller/XML/querying/

To my surprise, there are already books available on the subject. I've checked out these:

I generally like the Kick Start series (I'm planning an article about JSTL, and the JSTL Kick Start book is good), but out of these two, the "from the Experts" book was much better. The Kick Start book wastes an incredible number of pages on explaining what XML is, and all that basic stuff. The "Experts" book is much better structured, and it has a very good overview chapter that covers the basics of the language.

Getting tooled up

Learning programming should not be just reading and thinking about the concepts - you should do the examples as you're walking through the tutorial and the books. To execute some XQueries, all you need is an XQuery engine, but some of the more visual environments will help you to get through the first steps. The first environment I tried out was Stylus Studio from Sonic Software. It's an excellent IDE to get started quickly. You can test your queries instantly, without leaving the editor. It highlights issues for you, and it has general support for editing XML documents. There's a trial version available, and it's a good idea to use it for trying out the tutorial examples.
Unfortunately the built-in XQuery engine does not support everything in the specification, so you'll probably end up not being able to run some of the examples or try out some features. After a while it's time to get a simple command-line based, but full-featured, XQuery engine.

I found out that one of the best XSLT engines, Saxon, can now act as an XQuery processor as well. To my disappointment, just like Stylus Studio, this engine was not able to handle the "declare function" feature.

After some googling around, the best implementation seems to be Qexo (http://www.gnu.org/software/qexo/). It has features that allow the invocation of Java code from the queries, which sounds like fun. The only problem I have with it currently is that it does not implement the "replace" function. If anyone knows a more complete implementation, please let me know!

As the specification is just about to be finalized, I expect the appearance of fully compliant processors Real Soon Now.

Real Life Examples

Now that we know the specification, and we have the tools to use it, suddenly we start to realize how much we could get done with it.

In my case, the very first thing I tried to implement was generating my DVD list page. I had wanted to do that for a long time, but the prospect of sitting down and doing it with XSLT just wasn't too exciting for me. I use a Java Swing application called DVDAttache to maintain my DVD collection (mostly obscure B movies, don't ask why). I wanted to publish a web page from it, to share this information with my friends. DVDAttache has a Web Export feature with a simple template language. Unfortunately I did not manage to export the data exactly the way I wanted. Luckily all the data is maintained in an XML document.

There's no XML Schema associated with the document, but here's an entry:

I omitted several names from the cast list, so that you won't have to scroll through a big document. Now it would be trivial even with XSLT just to generate a web page out of this. But this is not what I want to achieve: I want to filter, restructure, and format the document into an HTML page that is consistent with the layout of my homepage. For my web page, I'm only interested in the small image associated with the movie, the title, the director, the three main members of the cast, and the plot. I just want to produce one page with all these details, so that my friends can scroll through the list instead of navigating back and forth between multiple pages.

Here's the result of a 30 minute tinkering session with XQuery.

Notice how, with very few lines of XQuery, we processed the original XML document and achieved the intended output. Most of it is just the HTML itself. Isn't this fun?
Here are a couple of points that I encountered:

as with XPath, you need to address the attributes of the element with the @ character (for example $dvd/director returns the <director> element, while $dvd/@title returns the "title" attribute)
in a "for" loop, you can get access to the index of the loop with the "at $variablename" construct. I use this to display the index of the DVDs from within the loop
The title data is gathered from Amazon.com automatically by DVDAttache, so the cast list tends to be very long, and I did not want to lose all that space on my list page. Luckily, the "for" loop iterates over a sequence, which can be the result of an XPath expression, but I can also specify a number of nodes myself, creating a sequence by hand that contains just the first 3 cast members.
I noticed that if there was no cast member specified for a movie, the script above sent out an empty <table/> node instead of the <table></table> I expected. This confused the browser, and the table structure fell apart. I guess this is one more reason to use XHTML instead of good old HTML. The quick hack I implemented with the "let emptyTag" assignment basically includes a new XML node, a TR and TD tag with some text, so the browsers won't get confused.
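The points above can be sketched roughly as follows. This is not the original script: only the "title" attribute, the <director> element, and the "emptyTag" variable are taken from the text, while the <dvds>, <cast>, <actor>, and <plot> names, the sample movies, and the HTML layout are assumptions made for illustration.

```xquery
(: Hypothetical sample data; apart from @title and <director>,
   every name here is an assumption. :)
let $collection :=
  <dvds>
    <dvd title="Example Movie One">
      <director>Jane Doe</director>
      <cast>
        <actor>Actor A</actor>
        <actor>Actor B</actor>
        <actor>Actor C</actor>
        <actor>Actor D</actor>
      </cast>
      <plot>Placeholder plot text.</plot>
    </dvd>
    <dvd title="Example Movie Two">
      <director>John Roe</director>
      <cast/>
      <plot>Another placeholder.</plot>
    </dvd>
  </dvds>
(: Fallback row, so a movie without cast never produces an empty <table/>. :)
let $emptyTag := <tr><td>No cast listed</td></tr>
return
  <html>
    <body>{
      (: "at $index" exposes the position of the current item in the loop. :)
      for $dvd at $index in $collection/dvd
      (: A sequence built by hand: just the first three cast members. :)
      let $mainCast := ($dvd/cast/actor[1], $dvd/cast/actor[2], $dvd/cast/actor[3])
      return
        <div>
          <h2>{ $index }. { string($dvd/@title) }</h2>
          <p>Directed by { string($dvd/director) }</p>
          <table>{
            if (empty($mainCast))
            then $emptyTag
            else
              for $actor in $mainCast
              return <tr><td>{ string($actor) }</td></tr>
          }</table>
          <p>{ string($dvd/plot) }</p>
        </div>
    }</body>
  </html>
```

For the second sample movie the hand-built sequence is empty, so the fallback row is emitted and the table always has at least one row.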

This is the empty tag I include when the table would otherwise be empty:

More documents, more fun

The next problem I wanted to solve relates to the way I read RSS feeds. I have RSS Bandit installed on my laptop, and on my home PC. When I read the news items, or I add a new feed to my list, there's a feature in there, which allows me to upload the whole feed list to my FTP site. Of course I always forget to do that, and I add interesting new feeds frequently, so I have a couple of new ones on my laptop, and a couple of new ones on my home PC. I wanted a way to see what the new feeds on the two instances are, and then add them to RSS Bandit.
The program uses an XML document to store the list of the feeds - how convenient! Here's a feed from the list:

For the "feed diff" I wanted to implement, I was only interested in the title and the URL of each feed. I also wanted a "feed://" link I could just click on, as RSS Bandit is associated with that URL scheme. Just a warning though: I was not able to find an XQuery implementation which was able to fully run this query:

Here are the new things that I had not used in the previous script:

Since the feed items are specified in a namespace ("http://www.25hoursaday.com/2003/RSSBandit/feeds/"), I needed to declare that namespace to get access to the elements and attributes. After I declared the namespace, I addressed the elements and attributes with the "rssd:" prefix. This is when I first started to realize that Stylus Studio did not fully implement the spec yet: I had to put a ";" at the end of the declare namespace line to be able to run it in Stylus Studio.
I implemented a function, so that I could get the new items in both versions of the feed list, without copy-paste code.
The next problem was a showstopper: Stylus Studio simply was not able to understand the "define function" constructs... After some research with Google and pinging my more XML-savvy friends, I found Qexo, which is implemented in Java. The only problem I've found so far is that Qexo does not implement the "replace" function, which I wanted to use to replace "http://" with "feed://" for the Subscribe link.
If anyone knows about an XQuery implementation, which fulfills all these requirements, let me know! Meanwhile, the "feed:" replace stuff has to go...
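For reference, here is a sketch of how such a "feed diff" could look in the syntax of the final XQuery 1.0 specification. It is an illustration and not the original query: the namespace URI and the "rssd:" prefix come from the text above, but the <feeds>, <feed>, <title>, and <link> element names, the sample feeds, and the local:new-feeds function are assumptions. The final syntax also differs from the drafts in small ways: the semicolon after each declaration is required, and functions are introduced with "declare function".

```xquery
declare namespace rssd = "http://www.25hoursaday.com/2003/RSSBandit/feeds/";

(: Hypothetical helper: the feeds of $mine whose link is missing from $theirs,
   each rendered as a list item with a feed:// subscribe link. :)
declare function local:new-feeds($mine as element(), $theirs as element())
  as element()*
{
  for $feed in $mine/rssd:feed
  where not($feed/rssd:link = $theirs/rssd:feed/rssd:link)
  return
    <li>
      <a href="{ replace($feed/rssd:link, '^http://', 'feed://') }">{
        string($feed/rssd:title)
      }</a>
    </li>
};

(: Hypothetical feed lists, inlined so the query is self-contained;
   the element names are assumptions, not the real RSS Bandit format. :)
let $laptop :=
  <rssd:feeds>
    <rssd:feed>
      <rssd:title>Feed A</rssd:title>
      <rssd:link>http://example.org/a.xml</rssd:link>
    </rssd:feed>
    <rssd:feed>
      <rssd:title>Feed B</rssd:title>
      <rssd:link>http://example.org/b.xml</rssd:link>
    </rssd:feed>
  </rssd:feeds>
let $homePc :=
  <rssd:feeds>
    <rssd:feed>
      <rssd:title>Feed B</rssd:title>
      <rssd:link>http://example.org/b.xml</rssd:link>
    </rssd:feed>
    <rssd:feed>
      <rssd:title>Feed C</rssd:title>
      <rssd:link>http://example.org/c.xml</rssd:link>
    </rssd:feed>
  </rssd:feeds>
return
  <div>
    <h2>Only on the laptop</h2>
    <ul>{ local:new-feeds($laptop, $homePc) }</ul>
    <h2>Only on the home PC</h2>
    <ul>{ local:new-feeds($homePc, $laptop) }</ul>
  </div>
```

Putting the comparison into a function means the same logic runs in both directions without copy-paste. A processor that implements the final specification, including the "replace" function, should be able to run it; an engine that lacks "replace" would need the href built from the plain link instead.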

I hope this little article was enough to get you started with XQuery! I'm in the process of completing the next one, which is about writing XQuery with WebLogic Workshop. I'd like to close with a piece of advice: try to avoid the "if all you have is a hammer..." syndrome! Sometimes it simply does not make sense to use XQuery. XSLT template matching can be much more powerful in many situations, and sometimes you just have to write proper Java/C#/Python code: XQuery is not for programming, it's for small queries. Don't go overboard with it...

Searching for the Holy Grail of software development...