Chapter 4. Text Documents—Advanced

A frame in an OpenOffice.org document is much like a section; it’s an independent area of text that may have multiple columns. The difference between a frame and a section is that a frame may “float” and have the main text wrap around it. Frames are also anchored to the page, a paragraph, or an individual character. They may also act as though they are just another character in the stream of the text.

Each frame will have a <style:style> element whose style:name begins with fr and whose style:family is graphics (yes, frames are actually considered to be graphic objects). Its style:parent-style-name will be Frame.

Within the <style:style> is a <style:properties> element with these relevant attributes:

style:vertical-rel

This attribute tells where the frame is anchored: page, paragraph-content, or char. If the frame is anchored as a character, then this attribute has the value baseline.

style:vertical-pos

This gives the vertical position with respect to the anchor: top, middle, or bottom. If you have manually adjusted a frame by moving it, then this value will be from-top, and the offset will be in the body of the document.

style:horizontal-rel

Depending upon how the frame is anchored, this attribute can have the following values: page and page-content (the entire page or just the text area), page-start-margin and page-end-margin, paragraph and paragraph-content, paragraph-start-margin and paragraph-end-margin, or char. If you have frames nested in frames, you may use the following values as well: frame and frame-content (the entire frame or just the occupied area), frame-start-margin and frame-end-margin.

style:horizontal-pos

The values used for this attribute are left, center, and right. If you have manually adjusted a frame by moving it, then this value will be from-left, and the offset will be in the body of the document.

style:wrap

This attribute specifies how text wraps around the frame: none, left (all text appears to the left of the frame), right (all text appears to the right of the frame), parallel (text appears on both sides of the frame), dynamic (what OpenOffice.org calls “optimal” page wrap), or run-through (the text appears behind the frame). [6]

style:number-wrapped-paragraphs

This attribute has the value no-limit unless you have checked the “First paragraph” option, in which case this attribute is not present.

A frame can have borders and columns. The borders are set as in the section called “Borders and Padding”, and the columns are set as described in the section called “Sections”. A frame’s background color is set with the fo:background-color attribute, whose value is a # followed by a six-digit hexadecimal number, such as #ffcc99. You may also set the style:background-transparency attribute to a value from 0% to 100%.
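The following minimal sketch builds the automatic style for a paragraph-anchored frame with Python's standard `xml.etree.ElementTree`. This makes the attribute list above concrete. The function name `frame_style`, its parameters, and the particular anchoring values are illustrative assumptions, not part of OpenOffice.org. The attribute names and namespace URIs are the OpenOffice.org 1.x ones used in the stylesheet later in this chapter.

```python
import xml.etree.ElementTree as ET

# OpenOffice.org 1.x namespace URIs, as declared in this chapter's stylesheet.
STYLE = "http://openoffice.org/2000/style"
FO = "http://www.w3.org/1999/XSL/Format"


def q(ns, local):
    """Return an ElementTree qualified name of the form {uri}local."""
    return "{%s}%s" % (ns, local)


def frame_style(name, wrap="parallel", background=None, transparency=None):
    """Build a <style:style> for a paragraph-anchored frame.

    The anchoring values are one illustrative choice among those listed in
    the text; OpenOffice.org itself gives frame styles names beginning with fr.
    """
    style = ET.Element(q(STYLE, "style"), {
        q(STYLE, "name"): name,
        q(STYLE, "family"): "graphics",          # frames count as graphic objects
        q(STYLE, "parent-style-name"): "Frame",
    })
    props = ET.SubElement(style, q(STYLE, "properties"), {
        q(STYLE, "vertical-rel"): "paragraph-content",
        q(STYLE, "vertical-pos"): "top",
        q(STYLE, "horizontal-rel"): "paragraph",
        q(STYLE, "horizontal-pos"): "center",
        q(STYLE, "wrap"): wrap,                  # none, left, right, parallel, dynamic, run-through
        q(STYLE, "number-wrapped-paragraphs"): "no-limit",
    })
    if background is not None:
        props.set(q(FO, "background-color"), background)   # e.g. "#ffcc99"
    if transparency is not None:
        props.set(q(STYLE, "background-transparency"), "%d%%" % transparency)
    return style
```

To serialize the result with the familiar prefixes, call `ET.register_namespace("style", STYLE)` and `ET.register_namespace("fo", FO)` before `ET.tostring(frame_style("fr1"), encoding="unicode")`.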

In the body of the document, each frame is represented by a <draw:text-box> element, with these attributes:

Example 4.1, “XML Representation of a Frame” shows the style and body information for a frame that has text that wraps on the left.

When you insert an image into an OpenOffice.org document, the application will store a copy of that file in the Pictures directory, and assign it an internal filename that looks something like this: 100000000000001800000018374E562F.png. The filename extension corresponds to the type of the original graphic.
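An OpenOffice.org document is a ZIP archive, so you can inspect these stored copies directly. This minimal sketch uses Python's `zipfile` module to list the internal filenames under Pictures/ together with their extensions. `list_pictures` is a hypothetical helper. The in-memory archive in the usage lines only mimics the layout of a real document.

```python
import io
import os
import zipfile


def list_pictures(archive_source):
    """Return (internal filename, extension) pairs for files under Pictures/.

    archive_source may be a filesystem path or a binary file object.
    """
    pictures = []
    with zipfile.ZipFile(archive_source) as archive:
        for member in archive.namelist():
            if member.startswith("Pictures/") and not member.endswith("/"):
                filename = member[len("Pictures/"):]
                # The extension reflects the type of the original graphic.
                extension = os.path.splitext(filename)[1].lstrip(".").lower()
                pictures.append((filename, extension))
    return pictures


# Usage with a tiny in-memory archive laid out like a document file.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("content.xml", "<office:document-content/>")
    archive.writestr("Pictures/100000000000001800000018374E562F.png", b"\x89PNG")
sample.seek(0)
print(list_pictures(sample))  # prints [('100000000000001800000018374E562F.png', 'png')]
```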

As with many other items in OpenOffice.org, the specification for an image is contained both within the <office:automatic-styles> and <office:body> elements.

Each <style:style> element for an image has a style:family of graphics and a style:parent-style-name of Graphics. There is also a style:name attribute, which gives the name of this image style.

The <style:style> element will contain a <style:properties> element that gives further information about the frame, exactly as described in the section called “Style Information for Frames”. Additionally, you will find the following attributes in the <style:properties> element which correspond to the settings in the graphics object toolbar shown in Figure 4.1, “Graphics Object Toolbar”.

Each image in the text is represented by a <draw:image> element, with these attributes:

draw:style-name

A reference to a <style:style> within the <office:automatic-styles> section. The name begins with the letters fr, since images are represented as frames, as discussed in the section called “Frames”.

draw:name

The name of the image. If you do not specify a name, OpenOffice.org assigns a value of the form Graphicsn, where n is an integer; within OpenOffice.org, this name appears in the Format...Graphics...Options dialog pane.

xlink:href

A reference to the image file; it begins with #Pictures/ and is followed by the internal file name.

svg:width, svg:height

This is the size of your image, with scaling factors taken into account; thus, if your original picture is one centimeter by one centimeter and you scale it to 75% horizontally and 125% vertically, the width will be 0.75cm and the height 1.25cm.

svg:rel-width, svg:rel-height

These are included only if the “Relative” checkboxes in the Graphics dialog have been selected.

xlink:type, xlink:show, xlink:actuate

These three items always have the values simple (for xlink:type), embed (for xlink:show), and onLoad (for xlink:actuate).
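The scaling arithmetic for svg:width and svg:height is easy to get backwards. This sketch therefore builds a <draw:image> element with ElementTree and applies the scaling factors before writing the size. The function `image_element` and its parameter names are hypothetical. The style name, graphic name, and internal filename used in the test are example values only.

```python
import xml.etree.ElementTree as ET

DRAW = "http://openoffice.org/2000/drawing"
XLINK = "http://www.w3.org/1999/xlink"
SVG = "http://www.w3.org/2000/svg"


def q(ns, local):
    """Return an ElementTree qualified name of the form {uri}local."""
    return "{%s}%s" % (ns, local)


def image_element(style_name, name, internal_file, width_cm, height_cm,
                  h_scale=100, v_scale=100):
    """Build a <draw:image>; svg:width and svg:height already include scaling."""
    width = width_cm * h_scale / 100
    height = height_cm * v_scale / 100
    return ET.Element(q(DRAW, "image"), {
        q(DRAW, "style-name"): style_name,        # an automatic style whose name begins with fr
        q(DRAW, "name"): name,
        q(XLINK, "href"): "#Pictures/" + internal_file,
        q(SVG, "width"): "%gcm" % width,
        q(SVG, "height"): "%gcm" % height,
        q(XLINK, "type"): "simple",               # these three values never vary
        q(XLINK, "show"): "embed",
        q(XLINK, "actuate"): "onLoad",
    })
```

Take the one-centimeter picture scaled to 75% horizontally and 125% vertically: `image_element("fr2", "Graphics1", "100000000000001800000018374E562F.png", 1, 1, 75, 125)` produces svg:width="0.75cm" and svg:height="1.25cm".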

A background image is entirely described in the style portion of your document. You need to put a <style:background-image> element within the <style:properties> element. This element will have xlink:href, xlink:actuate, and xlink:type attributes as described in the section called “Body Information for Images in Text” (even though the attributes are in the style section, not the body section), and draw:transparency, ranging from 0% to 100%.

The <style:background-image> element has the following additional attributes:

style:repeat

The background image can be tiled (repeat), stretched to fit the frame (stretch), or shown at its normal size (no-repeat).

style:position

If the background image is not repeated, then you have to tell where it should be placed within the frame. The value of this attribute consists of two whitespace-separated values giving the vertical position of the image (top, bottom, or center) and horizontal position of the image (left, right, or center).
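Here is a sketch that attaches a <style:background-image> to an existing <style:properties> element. It writes the xlink attributes and draw:transparency mentioned above. It adds style:position only when the image is neither tiled nor stretched. The helper name `add_background_image` and its defaults are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

STYLE = "http://openoffice.org/2000/style"
XLINK = "http://www.w3.org/1999/xlink"
DRAW = "http://openoffice.org/2000/drawing"

REPEAT_MODES = {"repeat", "stretch", "no-repeat"}
VERTICAL = {"top", "center", "bottom"}
HORIZONTAL = {"left", "center", "right"}


def q(ns, local):
    """Return an ElementTree qualified name of the form {uri}local."""
    return "{%s}%s" % (ns, local)


def add_background_image(properties, internal_file, repeat="no-repeat",
                         vertical="center", horizontal="center", transparency=0):
    """Append a <style:background-image> child to a <style:properties> element."""
    if repeat not in REPEAT_MODES:
        raise ValueError("unknown repeat mode: %s" % repeat)
    attrs = {
        q(XLINK, "href"): "#Pictures/" + internal_file,
        q(XLINK, "type"): "simple",
        q(XLINK, "actuate"): "onLoad",
        q(STYLE, "repeat"): repeat,
        q(DRAW, "transparency"): "%d%%" % transparency,
    }
    if repeat == "no-repeat":
        if vertical not in VERTICAL or horizontal not in HORIZONTAL:
            raise ValueError("invalid position: %s %s" % (vertical, horizontal))
        # Vertical position first, then horizontal, separated by whitespace.
        attrs[q(STYLE, "position")] = "%s %s" % (vertical, horizontal)
    return ET.SubElement(properties, q(STYLE, "background-image"), attrs)
```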

OpenOffice.org allows you to enter fields with dynamic values into a document. These include date, time, and document information.

In the section called “Specifying a Page Master” we discussed how you set up the page layout to include room for footnotes. Within the document body, each footnote is contained within a <text:footnote> element, which has a unique text:id attribute of the form ftnn, where n is an integer.

Within the <text:footnote> is a <text:footnote-citation> element that describes the footnote marker. If you choose automatic numbering for the footnote, then the element’s content is the footnote number. If you choose a character for the footnote marker, then the <text:footnote-citation> element contains the marker character, which is duplicated in the text:label attribute.

The <text:footnote-citation> is followed by the <text:footnote-body> element, which contains the text in your footnote.

If you are inserting an end note, the relevant elements are <text:endnote>, <text:endnote-citation>, and <text:endnote-body>. Figure 4.2, “Footnotes and Endnotes” shows a numbered footnote, a footnote marked with an asterisk, and a numbered endnote. The corresponding XML is in Example 4.2, “Footnote and Endnote XML”. The Footnote and Endnote styles come from the styles.xml file.
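To read footnotes and endnotes back out of a document, walk the parsed body in document order and pick up each citation and body. The sketch below does this. In both the numbered case and the character case, the citation's content is the marker, so the sketch reads the element text rather than text:label. The function name `collect_notes` is hypothetical, and the id values in the test data are only examples.

```python
import xml.etree.ElementTree as ET

TEXT = "http://openoffice.org/2000/text"
NOTE_KINDS = {"{%s}footnote" % TEXT: "footnote", "{%s}endnote" % TEXT: "endnote"}


def collect_notes(root):
    """Return (kind, id, marker, text) for each footnote and endnote, in document order."""
    notes = []
    for element in root.iter():
        kind = NOTE_KINDS.get(element.tag)
        if kind is None:
            continue
        citation = element.find("{%s}%s-citation" % (TEXT, kind))
        body = element.find("{%s}%s-body" % (TEXT, kind))
        # The citation holds either the number or the chosen marker character.
        marker = (citation.text or "").strip() if citation is not None else ""
        paragraphs = [] if body is None else ["".join(p.itertext()).strip() for p in body]
        notes.append((kind, element.get("{%s}id" % TEXT), marker, " ".join(paragraphs)))
    return notes
```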

OpenOffice.org tracks three types of changes: insertions, deletions, and format changes. These are all combined into a <text:tracked-changes> element at the beginning of the <office:body> element. Each change is contained in a <text:changed-region> element with a unique text:id attribute.

A <text:changed-region> contains one of three elements:

  • <text:insertion>
  • <text:deletion>
  • <text:format-change>

Each of these contains an <office:change-info> element, which has office:chg-author and office:chg-date-time attributes. In the case of a deletion, the <office:change-info> element is followed by the deleted material. (If it is only a single word, then it is enclosed in a copy of the parent <text:p> or <text:h> element from which it was deleted.) Figure 4.3, “Document with Changes Tracked” shows a section of a document with these three types of changes, and Example 4.3, “OpenOffice.org Change Tracking” shows the markup.

In the body of the text, we must be able to determine where these accumulated changes have occurred. For deletions, a <text:change> element is placed where the deletion occurred; its text:change-id attribute will refer to the corresponding text:id of the <text:changed-region>.

For insertions and format changes, the start of the change is marked with an empty <text:change-start> element and the end with an empty <text:change-end> element; both carry a text:change-id attribute that refers to the corresponding <text:changed-region>. Example 4.4, “XML Representation of Change Tracking” shows the markup for the changes described in the preceding example.
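The two halves of change tracking are linked through ids: the regions at the start of <office:body>, and the markers in the running text. The sketch below collects both from an <office:body> element parsed with ElementTree. The function names `tracked_changes` and `change_markers` are hypothetical, and the ids, authors, and dates in the test data are made up.

```python
import xml.etree.ElementTree as ET

OFFICE = "http://openoffice.org/2000/office"
TEXT = "http://openoffice.org/2000/text"
NS = {"office": OFFICE, "text": TEXT}
CHANGE_TYPES = ("insertion", "deletion", "format-change")
MARKERS = {"{%s}%s" % (TEXT, name): name
           for name in ("change", "change-start", "change-end")}


def tracked_changes(body):
    """Map each changed-region id to (change type, author, date-time)."""
    changes = {}
    for region in body.iterfind("text:tracked-changes/text:changed-region", NS):
        for kind in CHANGE_TYPES:
            change = region.find("text:" + kind, NS)
            if change is None:
                continue
            # office:change-info is a child of the insertion/deletion/format-change element.
            info = change.find("office:change-info", NS)
            author = info.get("{%s}chg-author" % OFFICE) if info is not None else None
            when = info.get("{%s}chg-date-time" % OFFICE) if info is not None else None
            changes[region.get("{%s}id" % TEXT)] = (kind, author, when)
            break
    return changes


def change_markers(body):
    """List (marker element name, change-id) pairs in document order."""
    return [(MARKERS[el.tag], el.get("{%s}change-id" % TEXT))
            for el in body.iter() if el.tag in MARKERS]
```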

Text tables in OpenOffice.org are, as with HTML tables, made up of rows, each of which contains cells. Again, the information for the table layout is in the <office:automatic-styles> section and the table data within the <office:body> section. In this section, when we refer to a length, we mean a number followed by a length unit; for example, 3.5cm.

Within the <office:automatic-styles> element, you will find the following for each table:

This information corresponds to the information that is set in the portion of the “Format Table” dialog box shown in Figure 4.4, “Table Width and Spacing”.

The “whole table’s” <style:style> element has a style:name attribute containing the table name (1). Its child <style:properties> element contains the remaining information:

The style:width attribute (2) is a length that gives the total width of the table. If you check the “Relative” checkbox, then the width of the table as a percentage of the page width is stored in the style:rel-width attribute.

The spacing (3) is represented by fo:margin-left, fo:margin-right, fo:margin-top, and fo:margin-bottom attributes, which all have length values. The left and right margins plus the width always add up to the distance between the page margins.
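The margin arithmetic can be checked in a few lines. This sketch assumes a centered table that splits the leftover space equally between the left and right margins. As the next paragraph explains, the margins OpenOffice.org writes depend on the alignment, so treat this as one possible layout, not a description of what the application always produces. The function name is illustrative.

```python
def centered_table_margins(text_area_cm, table_width_cm):
    """Split the leftover width equally between the left and right margins.

    Assumption: a centered table. The invariant from the text is
    margin-left + width + margin-right == distance between the page margins.
    """
    if table_width_cm > text_area_cm:
        raise ValueError("table is wider than the text area")
    side = (text_area_cm - table_width_cm) / 2
    return {
        "fo:margin-left": "%gcm" % side,
        "style:width": "%gcm" % table_width_cm,
        "fo:margin-right": "%gcm" % side,
    }
```

For example, a 21cm-wide page with 2cm page margins leaves 17cm between the margins. `centered_table_margins(17, 15)` then gives a 15cm table with 1cm on each side.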

In a document created by OpenOffice.org, the application sets some of the margins depending upon the setting of the alignment (4). The table:align attribute interacts with the margins in strange and wondrous ways when OpenOffice.org creates a document. If you are creating a document, set this attribute to:

The top and bottom margins are up to you, no matter what alignment you choose.

Figure 4.5, “Other Table Properties” is the dialog box that corresponds to other attributes in the table’s <style:properties> element.

Example 4.5, “Three by Two table without Repeating Headers” shows a three-by-two table without repeating headers.

1 A table’s content is contained within a <table:table> element, which has a table:style-name attribute that references the <style:style> with the same name. The table:name attribute has the same value as table:style-name in OpenOffice.org documents.
2 In this example, all three columns have the same style, so there is a single <table:table-column> element with a table:number-columns-repeated attribute. If all three columns had different styles, then the XML would contain three <table:table-column> elements, each with a different table:style-name reference.
3 If you have specified that your table does not repeat headings (item 3 in Figure 4.5, “Other Table Properties”), then the <table:table-row> elements follow. If your table has repeating headers, then the first row will be enclosed in a <table:table-header-rows> element.
4 Each <table:table-row> contains the <table:table-cell> elements for that row. Each <table:table-cell> has a table:style-name reference and a table:value-type, which has the value string. (We will encounter other values when we discuss spreadsheets.)
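The structure described in these callouts can also be generated programmatically. This sketch builds a table whose columns all share one style, so a single <table:table-column> with table:number-columns-repeated suffices. It can optionally wrap the first row in <table:table-header-rows>. The function name `build_table` and the cell style naming scheme are assumptions for illustration. The naming scheme is the table name, a dot, the column letter, and the row number, as in the ctable.A1 styles later in this chapter. The matching automatic styles are not generated here.

```python
import xml.etree.ElementTree as ET

TABLE = "http://openoffice.org/2000/table"
TEXT = "http://openoffice.org/2000/text"


def q(ns, local):
    """Return an ElementTree qualified name of the form {uri}local."""
    return "{%s}%s" % (ns, local)


def build_table(name, rows, repeat_header=False):
    """Build a <table:table> whose columns all share a single column style."""
    column_count = len(rows[0])
    if column_count > 26:
        raise ValueError("this sketch only names columns A through Z")
    table = ET.Element(q(TABLE, "table"), {
        q(TABLE, "name"): name,
        q(TABLE, "style-name"): name,        # same value, as OpenOffice.org writes it
    })
    # One column element stands in for all columns because they share a style.
    ET.SubElement(table, q(TABLE, "table-column"), {
        q(TABLE, "style-name"): name + ".A",
        q(TABLE, "number-columns-repeated"): str(column_count),
    })
    for row_index, values in enumerate(rows):
        parent = table
        if row_index == 0 and repeat_header:
            # A repeating header row is wrapped in <table:table-header-rows>.
            parent = ET.SubElement(table, q(TABLE, "table-header-rows"))
        row = ET.SubElement(parent, q(TABLE, "table-row"))
        for col_index, value in enumerate(values):
            cell = ET.SubElement(row, q(TABLE, "table-cell"), {
                q(TABLE, "style-name"): "%s.%s%d" % (name, chr(ord("A") + col_index), row_index + 1),
                q(TABLE, "value-type"): "string",
            })
            ET.SubElement(cell, q(TEXT, "p")).text = str(value)
    return table
```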

Horizontally merged cells are simple in OpenOffice.org. The first of the cells gets a table:number-columns-spanned attribute, whose value is the number of columns that have been merged. That cell is followed by n-1 <table:covered-table-cell> elements. Thus, a cell that spans three columns might look like Example 4.6, “Cells Spanning Columns”, with the text:style-name attributes removed for ease of reading.
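Here is the same idea in code. Each spanning cell carries table:number-columns-spanned and is followed by n-1 covered cells, so the row still accounts for every column position. `spanning_row` is a hypothetical helper, and style attributes are omitted, as in the example.

```python
import xml.etree.ElementTree as ET

TABLE = "http://openoffice.org/2000/table"
TEXT = "http://openoffice.org/2000/text"


def q(ns, local):
    """Return an ElementTree qualified name of the form {uri}local."""
    return "{%s}%s" % (ns, local)


def spanning_row(cells):
    """Build a <table:table-row> from (text, columns_spanned) pairs."""
    row = ET.Element(q(TABLE, "table-row"))
    for text, span in cells:
        cell = ET.SubElement(row, q(TABLE, "table-cell"), {q(TABLE, "value-type"): "string"})
        if span > 1:
            cell.set(q(TABLE, "number-columns-spanned"), str(span))
        ET.SubElement(cell, q(TEXT, "p")).text = text
        # The merged-away positions remain in the row as covered cells.
        for _ in range(span - 1):
            ET.SubElement(row, q(TABLE, "covered-table-cell"))
    return row
```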

Cells that span rows are an entirely different story. Rather than a simple table:number-rows-spanned attribute, OpenOffice.org represents the cells on either side of the large cell as sub-tables. Figure 4.6, “Cells Spanning Rows” shows a table with a cell that spans two rows. As far as OpenOffice.org is concerned, the table has only two rows. The second row consists of:

  • A cell that contains a two-by-one subtable
  • An ordinary cell (labelled main 2,2)
  • A cell that spans two columns and contains a two-by-two subtable.

Example 4.7, “XML for Cells Spanning Rows” shows the relevant XML for the second row, with the text:style-name and table:value-type attributes removed for ease of reading. We’ve also added a comment at the end of the listing.

Let’s put this information to use by creating a document that contains a table that summarizes the changes made in another OpenOffice.org document. We will use XSLT to do this transformation.

Figure 4.7, “Change Summary, Sorted by Time” shows sample output, reduced and cropped to save space. Our table will contain three columns: the time, author, and type of change. It can be sorted by any of the three columns, and the column that is used for the sort is highlighted in light green. The transformation accepts a parameter named sort with the value of time, author, or type to specify the sorting criterion.

The stylesheet begins with an <xsl:stylesheet> that provides all the relevant namespaces and an <xsl:output> element that sets the output DOCTYPE. These can be copied straight from Example C.6, “XSLT Framework for Transforming OpenOffice.org Documents”, and are not shown here.

Here’s the XSLT to set up the “outer structure” of the output document.

<xsl:template match="/">
    <office:document-content xmlns:office="http://openoffice.org/2000/office"
        xmlns:style="http://openoffice.org/2000/style"
        xmlns:text="http://openoffice.org/2000/text"
        xmlns:fo="http://www.w3.org/1999/XSL/Format"
        xmlns:table="http://openoffice.org/2000/table" 
        office:class="text">
    <office:script />

    <office:font-decls>
        <style:font-decl style:name="Lucidasans1"
        fo:font-family="Lucidasans" />
    </office:font-decls>

    <office:automatic-styles> <!-- 1 -->
        <style:style style:name="P1" style:family="paragraph">
            <style:properties style:font-name="Lucidasans1"
            fo:font-size="12pt" style:font-size-asian="12pt"
            style:font-size-complex="12pt"/>
        </style:style>
            
        <style:style style:name="P2" style:family="paragraph">
            <style:properties style:font-name="Lucidasans1"
            fo:font-size="12pt" style:font-size-asian="12pt"
            style:font-size-complex="12pt"
            fo:text-align="center"
            fo:font-style="italic"
            fo:font-weight="bold"/>
        </style:style>

        <style:style style:name="ctable" style:family="table"> <!-- 2 -->
            <style:properties
                style:width="15cm" table:align="center" />
        </style:style>
        
        <style:style style:name="ctable.A" style:family="table-column">
            <style:properties style:column-width="4.5cm" />
        </style:style>

        <style:style style:name="ctable.B" style:family="table-column">
            <style:properties style:column-width="7cm"/>
        </style:style>

        <style:style style:name="ctable.C" style:family="table-column">
            <style:properties style:column-width="3.5cm"/>
        </style:style>
        
        <style:style style:name="ctable.A1" style:family="table-cell"> <!-- 3 -->
            <style:properties
                fo:border-top="0.035cm solid #000000"
                fo:border-right="none"
                fo:border-bottom="0.035cm solid #000000"
                fo:border-left="0.035cm solid #000000"
                fo:padding="0.10cm">
                <xsl:call-template name="set-bg-color"> <!-- 4 -->
                    <xsl:with-param name="col-type">time</xsl:with-param>
                </xsl:call-template>
            </style:properties>       
        </style:style>

        <style:style style:name="ctable.B1" style:family="table-cell">
            <style:properties
                fo:border-top="0.035cm solid #000000"
                fo:border-right="none"
                fo:border-bottom="0.035cm solid #000000"
                fo:border-left="0.035cm solid #000000"
                fo:padding="0.10cm">
                <xsl:call-template name="set-bg-color">
                    <xsl:with-param name="col-type">author</xsl:with-param>
                </xsl:call-template>
            </style:properties>
        </style:style>
        
        <style:style style:name="ctable.C1" style:family="table-cell">
            <style:properties
                fo:border="0.035cm solid #000000"
                fo:padding="0.10cm">
                <xsl:call-template name="set-bg-color">
                    <xsl:with-param name="col-type">type</xsl:with-param>
                </xsl:call-template>
            </style:properties>
        </style:style>

        <style:style style:name="ctable.A2" style:family="table-cell"> <!-- 5 -->
            <style:properties
                fo:border-top="none"
                fo:border-right="none"
                fo:border-bottom="0.035cm solid #000000"
                fo:border-left="0.035cm solid #000000"
                fo:padding="0.10cm">         
                <xsl:call-template name="set-bg-color">
                    <xsl:with-param name="col-type">time</xsl:with-param>
                </xsl:call-template>
            </style:properties>           
        </style:style>

        <style:style style:name="ctable.B2" style:family="table-cell">
            <style:properties
                fo:border-top="none"
                fo:border-right="none"
                fo:border-bottom="0.035cm solid #000000"
                fo:border-left="0.035cm solid #000000"
                fo:padding="0.10cm">
                <xsl:call-template name="set-bg-color">
                    <xsl:with-param name="col-type">author</xsl:with-param>
                </xsl:call-template>
            </style:properties>           
        </style:style>
        
        <style:style style:name="ctable.C2" style:family="table-cell">
            <style:properties
                fo:border-top="none"
                fo:border-right="0.035cm solid #000000"
                fo:border-bottom="0.035cm solid #000000"
                fo:border-left="0.035cm solid #000000"
                fo:padding="0.10cm">         
                <xsl:call-template name="set-bg-color">
                    <xsl:with-param name="col-type">type</xsl:with-param>
                </xsl:call-template>
            </style:properties>           
        </style:style>

    </office:automatic-styles>

    <office:body>
    
    <table:table table:name="ctable" table:style-name="ctable"> <!-- 6 -->
        <table:table-column table:style-name="ctable.A" />
        <table:table-column table:style-name="ctable.B" />
        <table:table-column table:style-name="ctable.C" />
        <table:table-header-rows>
            <table:table-row>
                <table:table-cell table:style-name="ctable.A1"
                    table:value-type="string">
                    <text:h text:style-name="P2">Time</text:h>
                </table:table-cell>
                <table:table-cell table:style-name="ctable.B1"
                    table:value-type="string">
                    <text:h text:style-name="P2">Author</text:h>
                </table:table-cell>
                <table:table-cell table:style-name="ctable.C1"
                    table:value-type="string">
                    <text:h text:style-name="P2">Type</text:h>
                </table:table-cell>
            </table:table-row>
        </table:table-header-rows>
    
        <xsl:choose> <!-- 7 -->
            <xsl:when test="$sort = 'time' or $sort = 'author'">
                <xsl:apply-templates
                    select="office:document-content/office:body/
                    text:tracked-changes"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:apply-templates
                    select="office:document-content/office:body/
                    text:tracked-changes/text:changed-region[text:insertion]"/>
                <xsl:apply-templates
                    select="office:document-content/office:body/
                    text:tracked-changes/text:changed-region[text:deletion]"/>
                <xsl:apply-templates
                    select="office:document-content/office:body/
                    text:tracked-changes/text:changed-region[text:format-change]"/>
            </xsl:otherwise>
        </xsl:choose>
    </table:table>
    </office:body>
    </office:document-content>

</xsl:template>
1 Style P1 will be used for all text except the table headings, which use style P2, which makes headings bold, italic, and centered.
2 The table is 15 centimeters wide, with columns of 4.5, 7, and 3.5 centimeters.
3 Because we will be adding a background color to only one cell in each row, we have to create separate styles for each cell in a row. Styles ctable.A1, ctable.B1, and ctable.C1 are for the cells in the first row. Note that only ctable.C1 has a right border.
4 This template will add the background color to the style if the global sort parameter (specified by the user) matches the col-type, which is the “type of data this column contains.”
5 We have to create a similar set of styles for the second and subsequent rows; none of these has a top border (since the bottom border of the row above fills in that line), and, again, only ctable.C2 has a right border.
6 The table begins with the three <table:table-column> elements, followed by the first row of the table. The first row is enclosed in a <table:table-header-rows> element so that it will be repeated in case the table extends across a page boundary.
7 This logic is a bit tricky. Time and author information are stored as attributes in the <office:change-info> element inside each <text:changed-region>, so we hand that processing off to another template. Each type of change, however, is represented by a different element within the <text:changed-region> (<text:insertion>, <text:deletion>, or <text:format-change>), and we handle them right away.
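To make the three sort orders concrete, here is the same selection logic restated in Python. It works over plain (date-time, author, type) tuples, such as you might extract from each <office:change-info>. This is an illustrative restatement, not part of the stylesheet; `sort_changes` and the tuple layout are assumptions. Python's `sorted()` is stable, as XSLT 1.0 requires xsl:sort to be, so changes with equal keys stay in document order. The type ordering mirrors the three xsl:apply-templates in the xsl:otherwise branch.

```python
# Order of the three xsl:apply-templates in the xsl:otherwise branch.
TYPE_ORDER = {"insertion": 0, "deletion": 1, "format-change": 2}


def sort_changes(changes, sort="time"):
    """Order (date_time, author, change_type) tuples the way the stylesheet does.

    ISO 8601 date-time strings sort chronologically as plain strings,
    so no date parsing is needed for the 'time' ordering.
    """
    if sort == "time":
        return sorted(changes, key=lambda change: change[0])
    if sort == "author":
        return sorted(changes, key=lambda change: change[1])
    # Any other value groups insertions, then deletions, then format changes.
    return sorted(changes, key=lambda change: TYPE_ORDER[change[2]])
```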

Here is the template that sets background color:

<xsl:template name="set-bg-color">
<xsl:param name="col-type"/>
<xsl:if test="$sort = $col-type">
    <xsl:attribute name="fo:background-color">#ddffdd</xsl:attribute>
</xsl:if>
</xsl:template>

The following template handles sorting by time or author; notice that we must search the descendant:: axis, since the <office:change-info> element is a grandchild of the <text:changed-region>.

<xsl:template match="text:tracked-changes">
    <xsl:choose>
    <xsl:when test="$sort = 'time'">
        <xsl:apply-templates select="text:changed-region">
            <xsl:sort
                select="descendant::office:change-info/@office:chg-date-time"/>
        </xsl:apply-templates>
    </xsl:when>
    <xsl:when test="$sort = 'author'">
        <xsl:apply-templates select="text:changed-region">
            <xsl:sort
                select="descendant::office:change-info/@office:chg-author"/>
        </xsl:apply-templates>
    </xsl:when>
    </xsl:choose>
</xsl:template>

Each <text:changed-region> creates a new table row, with the appropriate data in each cell. We call a template named format-time to change the ISO 8601 format to something slightly less unpleasant.

<xsl:template match="text:changed-region">

<table:table-row>
<table:table-cell table:style-name="ctable.A2">
<text:p text:style-name="P1">
    <xsl:call-template name="format-time">
        <xsl:with-param name="time"
            select="descendant::office:change-info/@office:chg-date-time"/>
    </xsl:call-template>
</text:p>
</table:table-cell>

<table:table-cell table:style-name="ctable.B2">
<text:p text:style-name="P1">
    <xsl:value-of select="descendant::office:change-info/@office:chg-author"/>
</text:p>
</table:table-cell>

<table:table-cell table:style-name="ctable.C2">
<text:p text:style-name="P1">
<xsl:choose>
    <xsl:when test="text:insertion">
        <xsl:text>Insertion</xsl:text>
    </xsl:when>
    <xsl:when test="text:deletion">
        <xsl:text>Deletion</xsl:text>
    </xsl:when>
    <xsl:when test="text:format-change">
        <xsl:text>Format Change</xsl:text>
    </xsl:when>
</xsl:choose>
</text:p>
</table:table-cell>
</table:table-row>

</xsl:template>

Here is the time formatter; it simply replaces the T between the date and the time of day with a space, and drops the seconds.

<xsl:template name="format-time">
<xsl:param name="time"/>
<xsl:value-of select="substring-before($time, 'T')"/>
<xsl:text> </xsl:text>
<xsl:value-of select="substring(substring-after($time, 'T'),1,5)"/>
</xsl:template>
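For comparison, here is the same transformation as a small Python function, named after the template. When the value contains no T, substring-before and substring-after both return an empty string, leaving a lone space. The function copies that behavior.

```python
def format_time(timestamp):
    """Mirror the format-time template: 'YYYY-MM-DDThh:mm:ss' -> 'YYYY-MM-DD hh:mm'."""
    date, separator, clock = timestamp.partition("T")
    if not separator:
        # substring-before/substring-after yield "" when there is no T.
        date, clock = "", ""
    # substring(..., 1, 5) keeps the first five characters: hours and minutes.
    return "%s %s" % (date, clock[:5])
```

For example, `format_time("2003-01-15T14:32:10")` returns `"2003-01-15 14:32"`.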

The stylesheet ends with a template that will eliminate any stray text nodes from the output:

<xsl:template match="text()"/>


[6] If you want the frame in the background, then set the style:run-through attribute to background.


Content licensed under a Creative Commons License.
All content is copyright O’Reilly & Associates, Inc.
During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation’s GNU Free Documentation License.