Chapter 9. Filters

To this point, we have been building stand-alone applications that transform external files, whether XML or plain text, into OpenOffice.org documents. OpenOffice.org also lets you integrate an XSLT transformation directly into the application as a filter.

An XSLT-based filter associates three things: an XML file type (which we will call the “foreign” file), XSLT transformation files for import and/or export, and an OpenOffice.org template file. XML elements in the foreign file are associated with styles in the template file. The import transformation takes the foreign file’s content and inserts it into the template, assigning styles as appropriate. The export transformation reads the OpenOffice.org document and uses the style information to create a foreign file.

The remainder of this chapter will be a case study that shows how to construct and install XSLT-based filters.

The XML that we will import is a database of amateur wrestling clubs in California (yes, this is an actual database; the phone numbers and emails have been changed). The state is divided into several areas or associations; for example, SCVWA is the Santa Clara Valley Wrestling Association. Each association consists of a series of clubs. Example 9.1, “Sample Club Database” shows an abbreviated file. A club can have multiple email addresses, and the <info> element is optional. The only element that isn’t self-explanatory is <age-groups>. Its type attribute uses one letter for each age group the club serves: K for Kids, C for Cadets, J for Juniors, O for Open (competitors out of high school), and W for Women. The <info> element may contain a hypertext link to a club’s website, represented by the HTML <a> element, which has been borrowed into this custom language without a namespace.

Figure 9.1, “Imported Club Database” shows the OpenOffice.org Writer file that we want as a result.

We will now create the template file in OpenOffice.org. This is just an empty document with styles that will be associated with XML elements. Figure 9.2, “Styles in Writer Template” shows the names of the paragraph and character styles in the template.

With the template in place, we create the stylesheet, shown in Example 9.2, “Stylesheet for Transforming Club List to Writer Document”. The stylesheet doesn’t have to generate any <style:style> elements; those are already defined in the template.

Example 9.2. Stylesheet for Transforming Club List to Writer Document

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:style="http://openoffice.org/2000/style"
    xmlns:text="http://openoffice.org/2000/text"
    xmlns:table="http://openoffice.org/2000/table"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:number="http://openoffice.org/2000/datastyle"
    xmlns:script="http://openoffice.org/2000/script"
    xmlns:config="http://openoffice.org/2001/config">

<!-- office:class and office:version belong on the generated
     document element; on xsl:stylesheet they are ignored -->
<xsl:template match="/">
    <office:document office:class="text" office:version="1.0">
        <office:body>
            <xsl:apply-templates select="club-database/association"/>
        </office:body>
    </office:document>
</xsl:template>

<xsl:template match="association">
    <text:h text:level="1" text:style-name="Association"> <!-- 1 -->
        <xsl:value-of select="@id"/>
    </text:h>
    <xsl:apply-templates select="club"/>
</xsl:template>

<xsl:template match="club">
    <text:h text:level="2" text:style-name="Club Name">
        <xsl:value-of select="name" />
        <xsl:text> </xsl:text>
        <text:span text:style-name="Club Code"><xsl:value-of
            select="@id" /></text:span>
    </text:h>
    <text:p text:style-name="Default">
        <xsl:text>Chartered: </xsl:text>
        <text:span text:style-name="Charter"> <!-- 2 -->
            <xsl:value-of select="@charter"/>
        </text:span>
    </text:p>
    <text:p text:style-name="Default">
        <xsl:text>Contact: </xsl:text>
        <text:span text:style-name="Contact">
            <xsl:value-of select="contact"/>
        </text:span>
    </text:p>
    <text:p text:style-name="Default">
        <xsl:text>Location: </xsl:text>
        <text:span text:style-name="Location">
            <xsl:value-of select="location"/>
        </text:span>
    </text:p>
    <text:p text:style-name="Default">
        <xsl:text>Phone: </xsl:text>
        <text:span text:style-name="Phone">
            <xsl:value-of select="phone"/>
        </text:span>
    </text:p>

    <xsl:choose> <!-- 3 -->
        <xsl:when test="count(email) = 1">
            <text:p text:style-name="Default">
                <xsl:text>Email: </xsl:text>
                <text:span text:style-name="Email">
                    <xsl:value-of select="email"/>
                </text:span>
            </text:p>
        </xsl:when>
        <xsl:when test="count(email) &gt; 1">
            <text:p text:style-name="Default">Email:</text:p>
            <text:unordered-list text:style-name="UnorderedList">
                <xsl:for-each select="email">
                    <text:list-item>
                        <text:p text:style-name="Default">
                            <text:span text:style-name="Email">
                                <xsl:value-of select="."/>
                            </text:span>
                        </text:p>
                    </text:list-item>
                </xsl:for-each>
            </text:unordered-list>
        </xsl:when>
    </xsl:choose>

    <xsl:apply-templates select="age-groups"/>
    
    <xsl:apply-templates select="info"/>
</xsl:template>

<xsl:template match="age-groups"> <!-- 4 -->
    <text:p text:style-name="Default">
        <xsl:text>Age Groups: </xsl:text>
        <text:span text:style-name="Age Groups">
            <xsl:if test="contains(@type,'K')">
                <xsl:text>Kids </xsl:text>
            </xsl:if>
            <xsl:if test="contains(@type,'C')">
                <xsl:text>Cadets </xsl:text>
            </xsl:if>
            <xsl:if test="contains(@type,'J')">
                <xsl:text>Juniors </xsl:text>
            </xsl:if>
            <xsl:if test="contains(@type,'O')">
                <xsl:text>Open </xsl:text>
            </xsl:if>
            <xsl:if test="contains(@type,'W')">
                <xsl:text>Women </xsl:text>
            </xsl:if>
        </text:span>
    </text:p>
</xsl:template>

<xsl:template match="info">
    <xsl:if test="normalize-space(.) != ''"> <!-- 5 -->
        <text:p text:style-name="Club Info">
            <xsl:apply-templates/>
        </text:p>
    </xsl:if>
</xsl:template>

<xsl:template match="a"> <!-- 6 -->
    <text:a xlink:type="simple" xlink:href="{@href}"><xsl:value-of select="."/></text:a>
</xsl:template>

</xsl:stylesheet>
1 This is the first occurrence of connecting the foreign file’s content with a custom style in the template.
2 Notice that we attach the style only to the actual content, not to the entire paragraph. This means we don’t have to parse the paragraph content upon export.
3 If there’s only one email address, it is placed on the same line as the label; otherwise, the transformation creates an unordered list of all the email addresses.
4 Go through the age group symbols one at a time. Note that we will have to parse this in the export transformation.
5 This code makes sure we don’t issue a paragraph if there’s nothing in the <info> element.
6 This is how you add a hypertext link to an OpenOffice.org Writer document; it also borrows the <a> element from HTML, but does it the right way—with a namespace.
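The two data-dependent decisions in Example 9.2, expanding the age-group letters (callout 4) and choosing between a single email line and a list (callout 3), can be sketched in Python for readers who want to check the logic outside an XSLT processor. This is a minimal illustration, not part of the filter: the function names and the (element, text) tuples standing in for generated Writer XML are hypothetical.

```python
# Hypothetical Python mirror of two decisions made by the import stylesheet.
# The (element, text) tuples are a stand-in for the generated Writer XML.

# Same letters, in the same order, as the xsl:if chain in the age-groups template.
AGE_GROUPS = [("K", "Kids"), ("C", "Cadets"), ("J", "Juniors"),
              ("O", "Open"), ("W", "Women")]


def expand_age_groups(type_attr):
    """Return the span text for an age-groups type attribute such as 'KCO'."""
    # Like contains(@type, 'K'): test each letter; each name keeps a trailing blank.
    return "".join(name + " " for code, name in AGE_GROUPS if code in type_attr)


def email_layout(emails):
    """Return (element, text) pairs for a club's email addresses."""
    if len(emails) == 1:
        # One address goes on the same line as its label.
        return [("text:p", "Email: " + emails[0])]
    if len(emails) > 1:
        # Several addresses: a bare label paragraph, then one list item each.
        return [("text:p", "Email:")] + [("text:list-item", e) for e in emails]
    # No <email> elements: the xsl:choose produces nothing.
    return []
```

For example, expand_age_groups("KCO") returns "Kids Cadets Open ", including the trailing blank that the stylesheet's <xsl:text> elements also produce.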

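Creating the export filter is a much more difficult task. When we imported a file, the hierarchical structure of the foreign file, in which each <association> contains <club> elements and each <club> contains its <name>, <contact>, <phone>, and other details, was “flattened” into a sequence of sibling elements in the Writer document: a <text:h> for the association, then, for each club, a <text:h> for the club name followed by its <text:p> paragraphs (plus a <text:unordered-list> when the club has several email addresses).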

The export filter will have to take this flattened structure and re-create the nesting. The algorithm for this is not particularly difficult:

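For each <text:h> element with a text:style-name of Association, create an <association> element whose id is the heading’s text, then construct a <club> element for each Club Name heading that follows it, stopping at the next Association heading or the end of the document.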

To construct a <club> element:

  1. Create an opening <club> element.
  2. While the next sibling of this element is a <text:p> element:
    1. If there is a child <text:span> element, create an appropriate child element based on the span’s text:style-name.
    2. Otherwise, if there is a neighboring <text:unordered-list>, then you have a list of emails.[17] Extract the email addresses and create the appropriate <email> elements in the target document.
    3. Otherwise, if this is a club info paragraph, insert an <info> element.
  3. You have encountered a <text:h> element or the end of the file. Close the <club> element.
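The steps above are easy to express with an ordinary loop and a little state, which is exactly what XSLT 1.0 lacks. As a point of comparison, here is a minimal Python sketch that assumes the Writer body has already been reduced to a flat list of (tag, style, text) tuples, where style is the span's style name (or the paragraph style, for club information). That representation, the SPAN_TO_ELEMENT table, and the function name are hypothetical, and converting age-group names back to letters is left out.

```python
# Hypothetical mapping from Writer style names to foreign-file element names.
SPAN_TO_ELEMENT = {
    "Charter": "charter", "Contact": "contact", "Location": "location",
    "Phone": "phone", "Email": "email", "Age Groups": "age-groups",
    "Club Info": "info",
}


def unflatten(entries):
    """Rebuild associations and clubs from (tag, style, text) tuples.

    Entries are in document order; assumes the first heading is an Association.
    """
    associations = []
    club = None  # the club currently being filled, if any
    for tag, style, text in entries:
        if tag == "text:h" and style == "Association":
            associations.append({"id": text, "clubs": []})
            club = None
        elif tag == "text:h" and style == "Club Name":
            club = {"name": text, "children": []}
            associations[-1]["clubs"].append(club)
        elif club is not None and style in SPAN_TO_ELEMENT:
            # Paragraph spans and email list items become child elements.
            club["children"].append((SPAN_TO_ELEMENT[style], text))
        # Anything else, such as the bare "Email:" label, carries no data.
    return associations
```

The club variable does the work that, in XSLT, has to be passed along as a template parameter.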

This is not exactly rocket surgery, but the job is complicated by the fact that XSLT almost exclusively uses recursion, not iteration.[18] This makes the transformation ugly, so we will present it in parts.

The first part is the opening <xsl:stylesheet> element, which declares the namespaces that could be used in the OpenOffice.org document. The transformation won’t work without these declarations, but we do not want the namespaces to appear in the resulting output file, so we use the exclude-result-prefixes attribute to eliminate those namespace declarations from our output.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:office="http://openoffice.org/2000/office"
    xmlns:style="http://openoffice.org/2000/style" xmlns:text="http://openoffice.org/2000/text"
    xmlns:table="http://openoffice.org/2000/table" xmlns:draw="http://openoffice.org/2000/drawing"
    xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:number="http://openoffice.org/2000/datastyle"
    xmlns:form="http://openoffice.org/2000/form" xmlns:script="http://openoffice.org/2000/script" xmlns:config="http://openoffice.org/2001/config"
    exclude-result-prefixes="text xsl fo office style table draw xlink form script config number">
<xsl:output method="xml" indent="yes"/>

Almost the only place we can use XSLT’s natural processing style is to grab all the <text:h> elements for the associations. Processing an association creates the <association> element with its ID, and then starts the process of making entries for the constituent clubs. Implicit in this code is the presumption that there is at least one club in an association.

<xsl:template match="/">
    <xsl:apply-templates select="office:document/office:body/
    text:h[@text:style-name='Association']"/>
</xsl:template>

<xsl:template match="text:h[@text:style-name='Association']">
    <association id="{.}">
        <xsl:call-template name="make-club">
            <xsl:with-param name="clubNode"
             select="following-sibling::text:h[1]"/>
        </xsl:call-template>
    </association>
</xsl:template>
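Python's xml.etree.ElementTree has no following-sibling axis, so a rough equivalent of these two templates works on the list of office:body children by index. This minimal sketch assumes the flat office:document format with the OpenOffice.org 1.x namespace URIs declared in the stylesheet above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the export stylesheet.
OFFICE = "http://openoffice.org/2000/office"
TEXT = "http://openoffice.org/2000/text"
H = "{%s}h" % TEXT
STYLE_NAME = "{%s}style-name" % TEXT


def association_headings(doc_xml):
    """Return (id, index of first following text:h) for each Association heading."""
    body = ET.fromstring(doc_xml).find("{%s}body" % OFFICE)
    children = list(body)
    result = []
    for i, el in enumerate(children):
        if el.tag == H and el.get(STYLE_NAME) == "Association":
            # Like following-sibling::text:h[1]; None when there is no such heading.
            nxt = next((j for j in range(i + 1, len(children))
                        if children[j].tag == H), None)
            # Like id="{.}": the heading's full string value.
            result.append(("".join(el.itertext()), nxt))
    return result
```

The second value plays the role of the clubNode parameter: the position of the heading that make-club starts from.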

We can now make the club(s) in the association.

<xsl:template name="make-club">
    <xsl:param name="clubNode"/>
    <xsl:if test="$clubNode/@text:style-name = 'Club Name'"> <!-- 1 -->
        <club>
            <xsl:attribute name="id">
                <xsl:value-of
                 select="$clubNode/
                 text:span[@text:style-name='Club Code']"/>
            </xsl:attribute>

            <!-- The heading holds the name, a blank, and the Club Code
                 span; take only the leading text node for the name -->
            <name><xsl:value-of
                select="normalize-space($clubNode/text()[1])"/></name>

            <xsl:call-template name="make-content">
                <!-- 2 -->
                <xsl:with-param name="contentNode"
                    select="$clubNode/following-sibling::*[1]"/>
            </xsl:call-template>
        </club>

        <xsl:if test="$clubNode/following-sibling::text:h[1]"> <!-- 3 -->
            <xsl:call-template name="make-club">
                <xsl:with-param name="clubNode"
                    select="$clubNode/following-sibling::text:h[1]"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:if>
</xsl:template>
1 The node that was passed on to the make-club template could either be a <text:h> for a club name or, if this was the last club, could be the next association. Hence, the <xsl:if> to make sure we have gotten a club name.
2 When we proceed to gather the club’s content, we have to blindly pass on the first following sibling element—it could be a <text:p> that is part of the club, a <text:h> that starts a new club, or a <text:h> that starts a new association.
3 After completing this club, check to see if this node has a following <text:h> node. If so, recursively call this template with that new node, which could be another club or the next association.
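Because the import stylesheet writes the club code as a <text:span> inside the same heading as the club name, the heading's full string value is the name, a blank, and the code run together. A minimal ElementTree sketch of separating the two parts, assuming the OpenOffice.org 1.x text namespace (the function name is hypothetical). Here h.text plays the role of XPath's text()[1] for a heading that begins with text.

```python
import xml.etree.ElementTree as ET

TEXT = "http://openoffice.org/2000/text"


def split_club_heading(h):
    """Return (name, code) for a Club Name heading element."""
    # h.text is only the text before the first child: the name and a blank.
    name = (h.text or "").strip()
    code = ""
    for span in h.iter("{%s}span" % TEXT):
        if span.get("{%s}style-name" % TEXT) == "Club Code":
            code = "".join(span.itertext())
    return name, code


h = ET.fromstring('<text:h xmlns:text="http://openoffice.org/2000/text" '
                  'text:style-name="Club Name">Alpha WC '
                  '<text:span text:style-name="Club Code">Q21</text:span></text:h>')
print("".join(h.itertext()))  # Alpha WC Q21
print(split_club_heading(h))  # ('Alpha WC', 'Q21')
```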

Assembling the content for a club works very much along the same lines.

<xsl:template name="make-content">
    <xsl:param name="contentNode"/>
    <xsl:if test="name($contentNode) = 'text:p'"> <!-- 1 -->
        <xsl:choose>
            <xsl:when test="$contentNode/text:span"> <!-- 2 -->
                <xsl:call-template name="add-item">
                    <xsl:with-param name="spanNode"
                        select="$contentNode/text:span"/>
                </xsl:call-template>
            </xsl:when>

            <xsl:when test="name($contentNode/
             following-sibling::*[1]) = 'text:unordered-list'"> <!-- 3 -->
                <xsl:call-template name="email-list">
                    <xsl:with-param name="emailList"
                     select="$contentNode/
                     following-sibling::text:unordered-list[1]"/>
                </xsl:call-template>
            </xsl:when>

            <xsl:when test="$contentNode/@text:style-name = 'Club Info'"> <!-- 4 -->
                <info>
                    <xsl:apply-templates select="$contentNode"/>
                </info>
            </xsl:when>
        </xsl:choose>

        <xsl:call-template name="make-content"> <!-- 5 -->
            <xsl:with-param name="contentNode"
                select="$contentNode/following-sibling::*[not(self::text:unordered-list)][1]"/>
        </xsl:call-template>
    </xsl:if>
</xsl:template>
1 If this isn’t a paragraph, it’s not part of the club content. (This stops recursion when we hit the end of the file or the next club/association.)
2 If this paragraph has a <text:span> child, then it’s a charter, location, contact, phone, single email, or age group specification. Hand it off to another template.
3 If there’s an unordered list following this paragraph, it must be a club with multiple emails. Again, hand the list off to another template.
4 Club information is just straight text with embedded links, so use <xsl:apply-templates> to handle the text (with the default template) and the links with a soon-to-be-described template.
5 In any case, keep gathering content by recursively calling this template with the next element in the OpenOffice.org document. The predicate skips over any <text:unordered-list>, since its addresses were already collected by the email-list call (callout 3); stopping at the list would lose the age groups and club information that follow it.
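The recursion in make-content can be mirrored directly in Python. In this minimal sketch the siblings after a Club Name heading are reduced to a hypothetical list of (tag, style, text) tuples, and the function returns the (style, text) pairs that belong to one club. It steps over a <text:unordered-list> rather than stopping at it, so paragraphs that follow a list of emails are still collected.

```python
def make_content(nodes, i=0):
    """Collect (style, text) pairs for one club, starting at position i.

    nodes stands in for the siblings after a Club Name heading; each entry
    is a (tag, style, text) tuple, with style None for a bare label paragraph.
    """
    if i >= len(nodes):
        return []  # end of the document: stop
    tag, style, text = nodes[i]
    if tag == "text:unordered-list":
        # The list's addresses are gathered separately; keep going past it.
        return make_content(nodes, i + 1)
    if tag != "text:p":
        return []  # next club or association heading: stop
    here = [(style, text)] if style is not None else []
    # Like calling make-content again with the next following sibling.
    return here + make_content(nodes, i + 1)
```

As in the XSLT, every paragraph adds one level of recursion, which is harmless for a club with a handful of entries.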

Here’s the template that adds individual elements as children of a club. The styleAttr variable is for convenience, to make the source easier to read. All the elements except <age-groups> are handled by adding the span’s contents. Age groups are special, and, rather than trying to split up a list of keywords and recursively handle them, we cheat. The call to the translate function eliminates all lowercase letters and blanks, leaving the uppercase abbreviations for the age groups. For example, Kids Cadets Open is instantly reduced to KCO.

<xsl:template name="add-item">
    <xsl:param name="spanNode"/>
    <xsl:variable name="styleAttr"
     select="$spanNode/@text:style-name"/>
    
    <xsl:choose>
        <xsl:when test="$styleAttr = 'Charter'">
            <charter><xsl:value-of select="$spanNode"/></charter>
        </xsl:when>
        <xsl:when test="$styleAttr = 'Contact'">
            <contact><xsl:value-of select="$spanNode"/></contact>
        </xsl:when>
        <xsl:when test="$styleAttr = 'Phone'">
            <phone><xsl:value-of select="$spanNode"/></phone>
        </xsl:when>
        <xsl:when test="$styleAttr = 'Location'">
            <location><xsl:value-of select="$spanNode"/></location>
        </xsl:when>
        <xsl:when test="$styleAttr = 'Email'">
            <email><xsl:value-of select="$spanNode"/></email>
        </xsl:when>
        <xsl:when test="$styleAttr = 'Age Groups'">
            <age-groups>
                <xsl:attribute name="type">
                    <xsl:value-of select="translate($spanNode,
                    ' abcdefghijklmnopqrstuvwxyz', '')"/>
                </xsl:attribute>
            </age-groups>
        </xsl:when>
    </xsl:choose>
</xsl:template>
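The translate trick has a direct counterpart in Python's str.translate, where a table built with str.maketrans deletes every character listed in its third argument. A minimal sketch (the function name is hypothetical):

```python
# Characters deleted by translate($spanNode, ' abcdefghijklmnopqrstuvwxyz', ''):
# the blank and every lowercase ASCII letter.
_DELETE = str.maketrans("", "", " abcdefghijklmnopqrstuvwxyz")


def age_group_codes(span_text):
    """Reduce an Age Groups span such as 'Kids Cadets Open ' to 'KCO'."""
    return span_text.translate(_DELETE)


print(age_group_codes("Kids Cadets Open "))  # KCO
```

Like the stylesheet, this works only because each age-group name (Kids, Cadets, Juniors, Open, Women) contains exactly one uppercase letter, and that letter is the group's code.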

Rounding out the XSLT stylesheet are the templates that handle a list of email addresses within a <text:unordered-list> and the <text:a> element inside the club information.

<xsl:template name="email-list">
    <xsl:param name="emailList"/>
    <xsl:for-each select="$emailList/
     descendant::text:span[@text:style-name='Email']">
        <email><xsl:value-of select="."/></email>
    </xsl:for-each>
</xsl:template>

<xsl:template match="text:a">
<a href="{@xlink:href}"><xsl:apply-templates/></a>
</xsl:template>

</xsl:stylesheet>
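For comparison, the same two jobs, collecting the Email spans anywhere inside a list and turning a namespaced <text:a> back into a plain <a>, look like this with Python's ElementTree. This is a minimal sketch assuming the OpenOffice.org 1.x text and XLink namespaces; the function names are hypothetical, and unlike <xsl:apply-templates>, link_to_html keeps only the link's text, not any markup nested inside it.

```python
import xml.etree.ElementTree as ET

TEXT = "http://openoffice.org/2000/text"
XLINK = "http://www.w3.org/1999/xlink"


def emails_from_list(ul):
    """Like the email-list template: every descendant span styled Email."""
    return ["".join(span.itertext())
            for span in ul.iter("{%s}span" % TEXT)
            if span.get("{%s}style-name" % TEXT) == "Email"]


def link_to_html(a):
    """Like the text:a template: copy xlink:href onto an un-namespaced <a>."""
    out = ET.Element("a", href=a.get("{%s}href" % XLINK, ""))
    out.text = "".join(a.itertext())  # text only; nested markup is dropped
    return out


link = ET.fromstring('<text:a xmlns:text="http://openoffice.org/2000/text" '
                     'xmlns:xlink="http://www.w3.org/1999/xlink" '
                     'xlink:type="simple" xlink:href="http://example.com/">Club site</text:a>')
print(ET.tostring(link_to_html(link), encoding="unicode"))
# <a href="http://example.com/">Club site</a>
```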


[17] This is where our cleverness of representing multiple emails as a list comes back to haunt us.

[18] When your only tool is a hammer, everything looks like a nail.


Content licensed under a Creative Commons License.
All content is copyright O’Reilly & Associates, Inc.
During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation’s GNU Free Documentation License.