My XSLT Toolbox – Recursive XSLT templates

2008-12-28 4 min read Programming Xslt Eddie

Recursion is one of the core concepts in programming. It’s valuable not only as a technique for writing programs, but as a general concept for solving problems. XSLT provides many useful elements such as for-each (and apply-templates), but occasionally you will run into a problem which must be solved with recursion. Let’s take a look at a real-world (no Fibonacci!!) example, where we have to operate on a simple string of numbers separated by commas. We’ll take a step-by-step approach to writing a recursive template.

Let’s say we have the following source document, short and sweet. We want to take each number, and wrap it with an element.

<?xml version="1.0" encoding="UTF-8"?>
<comma>1,2,3,4,5,6,7,88,99,100</comma>

The easy way to do this is to use the EXSLT str:tokenize function, which takes a string and some delimiters and splits the string based on those delimiters. All we do is add the xmlns:str and extension-element-prefixes attributes to our xsl:stylesheet declaration, and then call the str:tokenize function.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
version="1.0" xmlns:str="http://exslt.org/strings" 
extension-element-prefixes="str">
 
    <xsl:template match="/">
        <xsl:for-each select="str:tokenize( comma, ',')">
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </xsl:template>
 
</xsl:stylesheet>

The result is (newlines added for readability):

<?xml version="1.0"?>
<token>1</token>
<token>2</token>
<token>3</token>
<token>4</token>
<token>5</token>
<token>6</token>
<token>7</token>
<token>88</token>
<token>99</token>
<token>100</token>

Excellent. But let’s say that we don’t have access to the EXSLT functions, and we have to write a template to perform the same thing.

So now we think up a recursive algorithm. Let’s look at a simplified list with three numbers, such as “1,2,3”. First, we print the “1”, the value before the first comma, and then we discard that value along with the first comma. At that point, our list will be “2,3” and we repeat, printing the new first value, and discarding it and the new first comma. Finally, the list becomes only “3”. There is no comma, so we simply print out the rest of the list, “3”. So we will be recursing over the string, printing the first number, and then popping off the first number and first comma. This technique will work with a three-number list, or a million-number list (though your processor will probably run out of memory before that).

XPath’s “substring-before”, “substring-after”, and “contains” functions are all of the tools that we’ll need to implement our algorithm. “substring-before” lets us obtain the number before the first comma. “substring-after” lets us discard the first number and first comma, and “contains” allows us to detect the last, comma-less case.

Our function starts in the same manner as all recursive functions, dealing with the last case, and then all of the cases before it. The last case will be the comma-less case from our algorithm. So here’s our template skeleton.
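One way the finished template might look, as a sketch under assumptions rather than the post's own code: the template name `tokenize`, the parameter name `list`, and the `token` wrapper element (borrowed from the str:tokenize output above) are all chosen here for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Start the recursion with the text of the comma element. -->
    <xsl:template match="/">
        <xsl:call-template name="tokenize">
            <xsl:with-param name="list" select="string(comma)"/>
        </xsl:call-template>
    </xsl:template>

    <!-- Hypothetical named template; "tokenize" and "list" are illustrative names. -->
    <xsl:template name="tokenize">
        <xsl:param name="list"/>
        <xsl:choose>
            <!-- Recursive case: print the value before the first comma,
                 then recurse on everything after that comma. -->
            <xsl:when test="contains($list, ',')">
                <token><xsl:value-of select="substring-before($list, ',')"/></token>
                <xsl:call-template name="tokenize">
                    <xsl:with-param name="list" select="substring-after($list, ',')"/>
                </xsl:call-template>
            </xsl:when>
            <!-- Last case: no comma left, so print the rest of the list. -->
            <xsl:otherwise>
                <token><xsl:value-of select="$list"/></token>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>
```

Applied to the sample document above, this should wrap each number in its own token element, just like the str:tokenize version, using only XSLT 1.0 and the three XPath functions.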

Continue reading

My XSLT Toolbox – copy and copy-of

2008-12-27 4 min read Programming Xslt Eddie

Using XSLT to copy elements is extremely common when you’re transforming a source document of a certain type (XML, HTML, etc.) to the same type. Often, you need an exact copy of an element verbatim, but other times you need to selectively choose certain elements to copy and others to discard. XSLT makes this process quite elegant using its xsl:copy-of and xsl:copy elements. The following is a step-by-step tutorial on how these elements are used.

When you need an exact copy of an element and its children, you use the xsl:copy-of element, which makes an exact copy of the selected element, its attributes, and its children. Given the following XML data, which represents a (trivial) inventory of a store, let’s say you want an exact copy of any items with the name “XSLT”.

<?xml version="1.0" encoding="UTF-8"?>
<inventory>
    <item id="1">
        <name>The Little Schemer</name>
        <type>book</type>
        <author>Friedman</author>
        <author>Felleisen</author>
        <list-price>29.95</list-price>
        <sell-price>26.99</sell-price>
        <cost>17.92</cost>
    </item>
    <item id="2">
        <name>XSLT</name>
        <type>book</type>
        <author>Tidwell</author>
        <list-price>49.95</list-price>
        <sell-price>34.99</sell-price>
        <cost>22.92</cost>
    </item>
    <item id="3">
        <name>Romeo and Juliet</name>
        <type>compact disc</type>
        <conductor>Rostropovich</conductor>
        <list-price>18.98</list-price>
        <sell-price>13.99</sell-price>
        <cost>9.92</cost>
    </item>
</inventory>

You simply apply the following XSLT stylesheet to your source document:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <xsl:copy-of select="inventory/item[name = 'XSLT']"/>
    </xsl:template>
</xsl:stylesheet>

Which gives you exactly what you were looking for, the “item” with the name “XSLT”.

<?xml version="1.0" encoding="utf-8"?>
<item id="2">
    <name>XSLT</name>
    <type>book</type>
    <author>Tidwell</author>
    <list-price>49.95</list-price>
    <sell-price>34.99</sell-price>
    <cost>22.92</cost>
</item>

That was easy, so now let’s say you want to do a little more with your inventory document. Your boss wants a copy of it to look at the numbers and do some accounting. She doesn’t care about the authors or conductors, so she’d like that information left out. Also, she would like an additional piece of information for each item, the amount of profit off each item sold, the difference between the sell-price and the cost.

Because we are adding a piece of information and getting rid of elements that don’t affect the accounting, we can’t use xsl:copy-of, because that would output an exact copy of the item element, its attribute nodes, and its child nodes. This exact copy is called a deep copy, because it not only copies the element, but all of its children as well. The solution is to use xsl:copy, which performs a shallow copy: it copies only the current node, and ignores any child or attribute nodes.

Since xsl:copy only copies one node at a time, you need to explicitly specify that you want to continue copying attribute nodes and child nodes. xsl:apply-templates gives us the leverage to write a template that accomplishes that. The following template starts by matching attribute and child nodes, then copies the node, and recursively applies itself to any attribute or child nodes found in the source tree.
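As a sketch of what such a stylesheet could look like (not necessarily the code from the full post): the first template is the shallow-copy-and-recurse pattern just described, and the other two handle the boss's requests. The `profit` element name and the two-decimal number format are assumptions made here for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Match any attribute or child node, shallow-copy it,
         then recurse into its own attributes and children. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- An empty template produces no output, so authors and conductors are dropped. -->
    <xsl:template match="author|conductor"/>

    <!-- Copy each item as usual, then append the extra element.
         "profit" is a hypothetical name chosen for this sketch. -->
    <xsl:template match="item">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
            <profit>
                <xsl:value-of select="format-number(sell-price - cost, '0.00')"/>
            </profit>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

The more specific `item` and `author|conductor` patterns take precedence over the generic `@*|node()` pattern because of XSLT's default priority rules, so no explicit priority attributes are needed. Note the spaces around the minus sign in `sell-price - cost`; without them the hyphen would be read as part of an element name.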

Continue reading

Affecting your situation

2008-12-18 1 min read Cello Classical Music Eddie

I don’t typically link to other blogs/articles, nor do I mention classical music particularly often, but I found this article and blog entry so interesting and thought-provoking that they deserve a re-post.

First, a moving blog entry from David Finlayson, trombonist in the New York Philharmonic, and second, the New York Times article describing the background, as well as referencing the blog post.

While I’ve never had a specifically parallel experience, I can relate to the concept of “fakes” in a particular industry. I find Mr. Finlayson’s reaction (that all musicians must take responsibility and blame for the situation) to be bold, yet… well, correct. It takes a strong person to identify a stormy situation clearly and react in an appropriate fashion. I only hope that I would react the same way given the circumstances.

Advantages of push-style XSLT over pull-style

2008-11-25 3 min read Programming Xslt Eddie

Working with more than a few new-hires over the last few weeks, I’ve noticed that new XSLT developers often write pull-style XSLTs by default. However, this tends to defy XSLT’s functional heritage, and is not as useful as the opposite form, push-style XSLTs.

Pull-style XSLTs reach into the source document and pull out the data they need to transform. The pull-style is similar to template systems like those found in Rails or Django, or inserting PHP commands between HTML elements. For example, given the trivial input:

<?xml version="1.0" encoding="UTF-8"?>
<books>
    <book>
        <title>The Scheme Programming Language</title>
        <author>R. Kent Dybvig</author>
    </book>
    <book>
        <title>Essentials of Programming Languages</title>
        <author>Daniel P. Friedman</author>
    </book>
    <book>
        <title>An Introduction to Information Theory</title>
        <author>John R. Pierce</author>
    </book>
</books>

an XSLT novice will produce a stylesheet like the following (note lines 11 and 12 which reach into the source and grab the data):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="/">
        <html>
            <head>
                <title>books</title>
            </head>
            <body>
                <dl>
                    <xsl:for-each select="books/book">
                        <dt><xsl:value-of select="title"/></dt>
                        <dd><xsl:value-of select="author"/></dd>
                    </xsl:for-each>
                </dl>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

which transforms into:

<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <title>books</title>
   </head>
   <body>
      <dl>
         <dt>The Scheme Programming Language</dt>
         <dd>R. Kent Dybvig</dd>
         <dt>Essentials of Programming Languages</dt>
         <dd>Daniel P. Friedman</dd>
         <dt>An Introduction to Information Theory</dt>
         <dd>John R. Pierce</dd>
      </dl>
   </body>
</html>

The real power of XSLT, however, is defining templates for the elements found within the source document. These are push-style XSLTs. They have two main advantages. First, push-style gracefully handles complex source structures, including recursively nested elements. It would be nearly impossible to handle the following source document using pull-style,

<div><div><div>a</div></div></div>

if you didn’t know how deep the recursive divs would go. A push-style solution, though, is incredibly simple.

<xsl:template match="div">
     * <xsl:apply-templates/> *
</xsl:template>

This will transform the previous source into the following.

* * * a * * *
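For comparison, the pull-style books stylesheet from earlier could be rewritten in push-style roughly as follows. This is a sketch, not code from the original post; the way the work is split across templates is just one reasonable choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- The root template only builds the page frame and pushes
         the source nodes out to whichever templates match them. -->
    <xsl:template match="/">
        <html>
            <head>
                <title>books</title>
            </head>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>

    <!-- Each source element gets its own small template. -->
    <xsl:template match="books">
        <dl>
            <xsl:apply-templates select="book"/>
        </dl>
    </xsl:template>

    <xsl:template match="book">
        <xsl:apply-templates select="title"/>
        <xsl:apply-templates select="author"/>
    </xsl:template>

    <xsl:template match="title">
        <dt><xsl:value-of select="."/></dt>
    </xsl:template>

    <xsl:template match="author">
        <dd><xsl:value-of select="."/></dd>
    </xsl:template>

</xsl:stylesheet>
```

It is longer than the pull-style version for such a small input, but each template knows about only one element, so any of them can be overridden or reused on its own.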

In addition to handling complex source structures, push-style allows code reuse, which is of course an ideal in any programming language. Push-style XSLTs are easier to reuse because the individual templates can be reused. When you only have one template, it is quite difficult to make it general without resorting to numerous choose-when statements. Here is an example of code reuse, where we extend a previously written template with the xsl:apply-imports element.

Given the input,

<images>
    <image>
        <url>http://www.filmjunkie.com/drinks/blixa/blixa.jpg</url>
        <alt>Blixa!</alt>
    </image>
</images>

and the XSLTs,

    <xsl:import href="imageformat.xsl"/>
 
    <xsl:template match="image">
        <div class="wrapper">
            <xsl:apply-imports/>
        </div>
    </xsl:template>

and the rule in “imageformat.xsl” (the template being extended in this case),

Continue reading

Horror Movies, Final installment

2008-11-11 2 min read Movies Eddie

The final installment… a little late.

  • Amityville Horror – (The OLD one, not the new one.) I’d watched this before, but it was so much better than I remembered. In fact, I felt it was the scariest movie that I ended up watching. The scene at the beginning where the priest goes into the house was totally blood-chilling.
  • House of 1000 Corpses – I waited for years to see this movie, only to end up kinda disappointed. I figured Rob Zombie had to know what he was doing with a horror movie, but the plot was so off-the-wall that I couldn’t believe anything, resulting in my rather ambivalent feeling towards it. Too many characters killing too many other characters. The creepy clown was the only interesting/memorable character.
  • The Devil’s Rejects – I also waited a number of years to see this movie, but was prevented under the reasoning that I had to see House of 1000 Corpses first. With that finally out of the way, I was now allowed, but I was wary… especially after the last film. For The Devil’s Rejects, Rob Zombie took the large number of characters and unrealistic plot from the first movie, and turned it on its head… keeping the cast numbers low and keeping the entire movie realistic. Zombie made the 3 bad guys out to be quasi-good guys, and the cop into a revenge-obsessed bad guy. The result was a really good character study of some truly bizarre people. This movie wasn’t particularly scary, but it was quite good.
  • Carrie – One of the best scary movies ever… my favorite part is at the end where Carrie’s hand… well, you know.
  • The Exorcist – I LOVE this movie. I saved this pick until Halloween night. I love Max von Sydow, the early scenes in the desert, Father Damien, the fact that the movie is based on something that happened in a part of Maryland I frequent, that the movie’s steps are right down the street from where I used to work in Georgetown. My favorite.