Using Ruby on Rails and XSLT to Create a Word 2007 Document
Summary: Learn how to use Ruby on Rails to transform XML data into the Open XML Format to create a Word 2007 document. (18 Printed Pages)
Applies to: 2007 Microsoft Office system, Microsoft Office Word 2007, Ruby 1.8.7, Rails 2.3.2
Joel Krist, iSoftStone
January 2010
Code It
This Visual How To shows how to create a Ruby on Rails application that you can use to create an XML data file, an XSL Transform file, and a Word 2007 document template. The application uses XSLT to create a nicely formatted Word 2007 document that contains the data from the XML data file and then sends the new document back to the browser.
Figure 1. Ruby on Rails Application
This Visual How To is based on Use Ruby on Rails to modify an Open XML document by Hadley Pettigrew. It is not meant to be a primer on how to use the Ruby language or how to work with the Rails Web development framework. Although it provides the steps to create a Ruby on Rails application, it assumes that the reader is familiar with Ruby and Rails concepts such as gems, models, controllers, and views. For more information about the Ruby language, see About Ruby. For more information about Rails, see Getting Started with Rails.
Creating the solution described earlier involves the following steps:
Creating an XML data file.
Creating a Word 2007 template document that has the desired layout.
Extracting the contents of the main document part of the Word 2007 template document.
Creating an XSL Transform file that is based on the extracted content.
Adding Transforms to the XSL Transform file.
Creating a Ruby on Rails application.
Creating an XML Data File
The sample code for this Visual How To assumes that a personal movie library XML data file named MyMovies.xml exists in the C:\Temp directory.
To create the MyMovies.xml data source
Start Visual Studio 2008.
From the File menu, point to New and then click File.
In the New File dialog box, select XML File.
Copy the following XML example into the new file.
<?xml version="1.0" encoding="UTF-8"?>
<Movies>
  <Genre name="Action">
    <Movie>
      <Name>Crash</Name>
      <Released>2005</Released>
    </Movie>
  </Genre>
  <Genre name="Drama">
    <Movie>
      <Name>The Departed</Name>
      <Released>2006</Released>
    </Movie>
    <Movie>
      <Name>The Pursuit of Happyness</Name>
      <Released>2006</Released>
    </Movie>
  </Genre>
  <Genre name="Comedy">
    <Movie>
      <Name>The Bucket List</Name>
      <Released>2007</Released>
    </Movie>
  </Genre>
</Movies>
Save the document as C:\Temp\MyMovies.xml.
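Before moving on, it can help to confirm that the data file is well formed and has the Movies/Genre/Movie shape that the later steps rely on. The following is a minimal sketch, not part of the sample application: it uses REXML from the Ruby standard library instead of the Nokogiri gem that the application uses, and the movie_counts_by_genre helper and SAMPLE_MOVIES_XML constant are hypothetical names introduced only for this illustration.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of the sample application): returns a hash
# that maps each genre name to the number of Movie elements listed under it.
def movie_counts_by_genre(xml_text)
  doc = REXML::Document.new(xml_text)
  counts = {}
  doc.elements.each('Movies/Genre') do |genre|
    counts[genre.attributes['name']] = genre.get_elements('Movie').size
  end
  counts
end

# A shortened copy of the data file, inlined so the sketch runs on its own.
SAMPLE_MOVIES_XML = <<XML
<?xml version="1.0" encoding="UTF-8"?>
<Movies>
  <Genre name="Action">
    <Movie><Name>Crash</Name><Released>2005</Released></Movie>
  </Genre>
  <Genre name="Drama">
    <Movie><Name>The Departed</Name><Released>2006</Released></Movie>
    <Movie><Name>The Pursuit of Happyness</Name><Released>2006</Released></Movie>
  </Genre>
</Movies>
XML

movie_counts_by_genre(SAMPLE_MOVIES_XML).each do |genre, count|
  puts "#{genre}: #{count}"
end
```

To check the real file, pass File.read('C:/Temp/MyMovies.xml') to the helper instead of the inline sample.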
Creating a Word 2007 Template Document
The sample code uses an existing Word 2007 document to help simplify creating the XSL Transform file. The code assumes that a Word 2007 document named MyMoviesTemplate.docx is in the C:\Temp directory.
To create the MyMoviesTemplate.docx document
Start Word 2007.
Add text to the document and placeholders for the XML data following the layout in the example. The placeholders in the document shown in this section are Genre Name, Movie Title, and (year). When the code performs the XSL transform, data from the XML data file that you created earlier replaces the placeholders.
Figure 2. Word 2007 Template Document
Save the document as C:\Temp\MyMoviesTemplate.docx.
Extracting the Contents of the Main Document Part of the Template Document
To simplify creating the XSL Transform file, use the contents of the Word 2007 template document that you created in the previous step as a starting point. Word 2007 documents use the Open Packaging Conventions (OPC). This means that Word 2007 documents are actually .zip files that contain XML files, binary files, and other kinds of files. By adding the .zip extension to the end of the Word 2007 document name, you can then use tools such as WinZip or Windows Explorer to examine and extract the contents of the document.
To create the XSL Transform file, you temporarily add a .zip extension to the Word 2007 template document that you created earlier. You then extract the document.xml file from the template document to the C:\Temp directory, rename it MyMovies.xslt, and convert it from XML to XSLT.
To extract the contents of the main document part of the template document
Start Windows Explorer and navigate to the folder that contains the MyMoviesTemplate.docx file that you created earlier.
Rename MyMoviesTemplate.docx to MyMoviesTemplate.docx.zip to gain access to the underlying Open XML files.
Use Windows Explorer to explore the contents of MyMoviesTemplate.docx.zip. Navigate to the word folder in the .zip file, copy the document.xml file to the C:\Temp folder, and then rename the file MyMovies.xslt.
Figure 3. Copying the Document.XML File
Rename MyMoviesTemplate.docx.zip back to MyMoviesTemplate.docx.
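Because an Open XML package is a ZIP archive, a quick way to confirm that a .docx file was not damaged by the renaming steps is to look at its first four bytes. The sketch below assumes only the standard ZIP local file header signature ("PK" followed by bytes 0x03 and 0x04), which a non-empty archive starts with. The zip_package? helper is a hypothetical name, and the path is the one used in this article.

```ruby
# The four-byte signature that starts a ZIP local file header.
ZIP_SIGNATURE = "PK\x03\x04".b

# Hypothetical helper: true when the file at +path+ begins with the ZIP
# signature. An empty file yields nil from read and therefore false.
def zip_package?(path)
  File.open(path, 'rb') { |f| f.read(4) } == ZIP_SIGNATURE
end

# Path assumed from the steps above; the check is skipped if the file is absent.
template_path = 'C:/Temp/MyMoviesTemplate.docx'
puts zip_package?(template_path) if File.exist?(template_path)
```

This only checks the container format. It does not verify that the archive contains a word/document.xml part.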
Creating an XSL Transform File Based on the Extracted Content
The next step is to convert the MyMovies.xslt file to an XSL Transform file.
To create the MyMovies.xslt XSL transform file
In Visual Studio 2008, open the MyMovies.xslt that you created in the previous step.
Do the following to convert the document structure of MyMovies.xslt from XML to XSLT:
Remove the following line from the top of the document.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Add the following line to the top of the document.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Close the style sheet element by adding the following line to the very end of the document.
</xsl:stylesheet>
Add an <xsl:template> element around the existing <w:document> element.
<xsl:template match="/">
<w:document ...>
<w:body>
...
</w:body>
</w:document>
</xsl:template>
Save the changes to MyMovies.xslt.
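A mistake in the wrapper elements (for example, a missing closing tag) only shows up later as a transform error, so it can be worth checking the file structure right away. The following sketch is an illustration under stated assumptions: it uses REXML from the standard library rather than the gems installed later in this article, and stylesheet_wrapper_ok? and SAMPLE_XSLT are hypothetical names. It checks only the outer structure described in the preceding steps and does not run the transform.

```ruby
require 'rexml/document'

XSL_NS = 'http://www.w3.org/1999/XSL/Transform'

# Hypothetical helper: true when the text parses as XML, its root element is
# xsl:stylesheet, and the root has an xsl:template child with match="/".
def stylesheet_wrapper_ok?(xslt_text)
  root = REXML::Document.new(xslt_text).root
  return false unless root && root.name == 'stylesheet' && root.namespace == XSL_NS
  root.elements.to_a.any? do |child|
    child.name == 'template' && child.namespace == XSL_NS &&
      child.attributes['match'] == '/'
  end
rescue REXML::ParseException
  false
end

# A stripped-down stand-in for MyMovies.xslt, inlined so the sketch runs alone.
SAMPLE_XSLT = <<XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
      <w:body/>
    </w:document>
  </xsl:template>
</xsl:stylesheet>
XSLT

puts stylesheet_wrapper_ok?(SAMPLE_XSLT)
```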
Adding Transforms to the XSL Transform File
In the next step, you add XSL Transform elements to list the name of each genre and to add information about each movie in that genre that the XML data file contains. This transform uses the xsl:value-of and xsl:for-each elements.
Use the xsl:value-of elements to replace the Genre Name, Movie Title, and (year) placeholders in the Word 2007 template document.
Use two xsl:for-each elements to list each genre and the movies in that genre. When you place the xsl:for-each elements, make sure that you include all Open XML elements that relate to the genre or to the movie text that you repeat to ensure that the transformation outputs valid Open XML.
The following example is a fragment from the contents of the original document.xml. It shows the placeholder text from the template document.
...
<w:p w:rsidR="00EC137C" w:rsidRPr="00BF ...
<w:pPr>
<w:pStyle w:val="Heading2"/>
</w:pPr>
<w:r w:rsidRPr="00BF350E">
<w:t>Genre Name</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00EC137C" w:rsidRPr="00EC1 ...
<w:pPr>
<w:pStyle w:val="ListParagraph"/>
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="1"/>
</w:numPr>
</w:pPr>
<w:r w:rsidRPr="00BF350E">
<w:rPr>
<w:b/>
</w:rPr>
<w:t>Movie Title</w:t>
</w:r>
<w:r w:rsidR="00C46B60">
<w:t xml:space="preserve"> (year)</w:t>
</w:r>
</w:p>
...
The following example is an XSLT fragment of the modified MyMovies.xslt file. It shows the changes made to include the XSL Transform elements.
<!-- for-each loop added for Genre. This loop includes the Open XML elements for the paragraph the Genre placeholder is in and all paragraphs for the Movies. -->
<xsl:for-each select="Movies/Genre">
<w:p w:rsidR="00EC137C" w:rsidRPr="00BF ...
<w:pPr>
<w:pStyle w:val="Heading2"/>
</w:pPr>
<w:r w:rsidRPr="00BF350E">
<w:t>
<!-- Genre Name placeholder replaced by the Genre's Name attribute in the XML data file. -->
<xsl:value-of select="@name"/>
</w:t>
</w:r>
</w:p>
<!-- for-each loop added for Movie. This loop includes the Open XML elements that define the paragraph as a bulleted list. -->
<xsl:for-each select="Movie">
<w:p w:rsidR="00EC137C" w:rsidRPr="00EC1 ...
<w:pPr>
<w:pStyle w:val="ListParagraph"/>
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="1"/>
</w:numPr>
</w:pPr>
<w:r w:rsidRPr="00BF350E">
<w:rPr>
<w:b/>
</w:rPr>
<w:t>
<!-- Movie Title placeholder replaced by the Movie's Name element in the XML data file. -->
<xsl:value-of select="Name"/>
</w:t>
</w:r>
<w:r w:rsidR="00C46B60">
<!-- Year placeholder replaced by the Movie's Released element in the XML data file. -->
<w:t xml:space="preserve"> (<xsl:value-of select="Released"/>)</w:t>
</w:r>
</w:p>
</xsl:for-each>
</xsl:for-each>
...
Be sure to save the modified MyMovies.xslt file after you make any changes.
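The two nested xsl:for-each elements behave like two nested loops over the data file: the outer loop visits each Genre element and the inner loop visits each Movie element inside it. The sketch below mirrors that traversal in plain Ruby so you can predict what text the generated document should contain. It is an illustration only, using REXML from the standard library rather than XSLT, and expected_outline and SAMPLE_GENRE_XML are hypothetical names.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the heading and bullet text that the transform
# is expected to place in the document, one string per paragraph.
def expected_outline(xml_text)
  doc = REXML::Document.new(xml_text)
  lines = []
  doc.elements.each('Movies/Genre') do |genre|   # outer xsl:for-each
    lines << genre.attributes['name']            # xsl:value-of select="@name"
    genre.elements.each('Movie') do |movie|      # inner xsl:for-each
      name = movie.elements['Name'].text         # xsl:value-of select="Name"
      year = movie.elements['Released'].text     # xsl:value-of select="Released"
      lines << "  #{name} (#{year})"
    end
  end
  lines
end

SAMPLE_GENRE_XML = <<XML
<Movies>
  <Genre name="Comedy">
    <Movie><Name>The Bucket List</Name><Released>2007</Released></Movie>
  </Genre>
</Movies>
XML

puts expected_outline(SAMPLE_GENRE_XML)
```

Note that the select expressions in the inner loop are relative to the current Genre element, which is why the XSLT uses select="Movie" rather than the full path.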
Creating a Ruby on Rails Application
The Ruby on Rails application that you create in this Visual How To uses the following components, which you must install before you build the application:
Ruby version 1.8.6 language.
RubyGems version 1.3.5 Ruby package manager to enable installation of Ruby gems.
Rails version 2.3.2 Web development framework.
ZipRuby gem version 0.3.2 to provide Ruby language bindings for the libzip library, which reads, creates, and modifies ZIP archives. This enables working with Word 2007 documents as ZIP files.
Nokogiri gem version 1.3.3 to enable XML document support.
LibXml-Ruby gem version 1.1.3 to provide Ruby language bindings for the GNOME Libxml2 XML toolkit.
LibXSLT-Ruby gem version 0.9.2 to provide Ruby language bindings for the GNOME Libxslt toolkit.
To install the prerequisite components
Install Ruby:
Download the installer for Ruby version 1.8.6 and save it to a local folder.
Run the downloaded Ruby installer.
Install RubyGems:
Download the RubyGems version 1.3.5 archive and save it to a local folder.
Extract the files from the RubyGems archive to the local folder.
Open a Windows command prompt and navigate to the local folder that contains the RubyGems files.
Run the following command from the Windows command prompt to install RubyGems.
ruby setup.rb
Install Rails:
At a Windows command prompt, navigate to the local folder that contains the RubyGems files extracted previously.
Run the following command from the Windows command prompt to install Rails.
gem install -v=2.3.2 rails
Install ZipRuby:
Download the ZipRuby version 0.3.2 gem file and save it to a local folder.
At a Windows command prompt, navigate to the local folder that contains the ZipRuby gem file.
Run the following command from the Windows command prompt to install ZipRuby.
gem install zipruby
Install Nokogiri:
Download the Nokogiri version 1.3.3 archive and save it to a local folder.
At a Windows command prompt, navigate to the local folder that contains the downloaded Nokogiri archive.
Run the following command from the Windows command prompt to install Nokogiri.
gem install nokogiri
Install LibXml-Ruby:
At a Windows command prompt, navigate to the local folder that contains the RubyGems files extracted previously.
Run the following command from the Windows command prompt to install LibXml-Ruby.
gem install libxml-ruby
Ignore the many warnings displayed when the ri and RDoc documentation is installed.
Install LibXSLT-Ruby:
At a Windows command prompt, navigate to the local folder that contains the RubyGems files extracted previously.
Run the following command from the Windows command prompt to install LibXSLT-Ruby.
gem install libxslt-ruby
After you successfully install the prerequisite components, you can create the Ruby on Rails application. Use the following steps to create the application in a folder called RubyAndOpenXml, located in the C:\Temp folder. To place the application in a different folder, change all references to C:\Temp\RubyAndOpenXml to the desired folder path.
To create a Ruby on Rails application
Open a Windows command prompt.
Create a folder for the new application. Run the following command from the Windows command prompt.
md C:\Temp\RubyAndOpenXml
Create the Ruby on Rails application in the new folder. Run the following command from the Windows command prompt.
rails C:\Temp\RubyAndOpenXml
Make the new folder the current folder. Run the following command from the Windows command prompt.
cd C:\Temp\RubyAndOpenXml
Create models and controllers. Run the following commands from the Windows command prompt in the new folder.
ruby script/generate model DataFile
ruby script/generate model OfficeOpenXml
ruby script/generate controller Upload
Add code to the data_file.rb file.
Use your favorite editor to open the following file.
C:\Temp\RubyAndOpenXml\app\models\data_file.rb
Replace the contents of the data_file.rb file with the following Ruby code.
class DataFile
  def initialize
  end

  # Save the uploaded files to a temp folder and then perform
  # translation.
  def self.save(upload,upload1,upload2)
    name = sanitize_filename(upload['file'].original_filename).to_s
    name1 = sanitize_filename(upload1['file1'].original_filename).to_s
    name2 = sanitize_filename(upload2['file2'].original_filename).to_s
    directory = "public\\data\\"
    # Create the file path.
    path = File.join(directory, name).to_s
    path1 = File.join(directory,name1).to_s
    path2 = File.join(directory,name2).to_s
    # Save the files.
    upload_file(path,upload,'file')
    upload_file(path1,upload1,'file1')
    upload_file(path2,upload2,'file2')
    OfficeOpenXML.translate(path, path1, path2, "public\\resources\\newdoc.docx")
  end

  private

  def self.upload_file(path,uploadfile,file)
    File.open(path, "wb") do |f|
      f.write(uploadfile[file].read)
    end
  end

  private

  def self.sanitize_filename(file_name)
    # Get only the filename, not the whole path.
    just_filename = File.basename(file_name)
    # Replace all non-alphanumeric, underscore or period characters
    # with an underscore.
    just_filename.gsub(/[^\w\.\_]/,'_')
  end
end
Add code to the office_open_xml.rb file.
Use your favorite editor to open the following file.
C:\Temp\RubyAndOpenXml\app\models\office_open_xml.rb
Replace the contents of the office_open_xml.rb file with the following Ruby code.
require 'zipruby'
require 'nokogiri'
require 'fileutils'

class OfficeOpenXML
  def self.translate(xslt, template, xml, newdoc)
    new(xslt, template, xml, newdoc).translate
  end

  def initialize(xslt, template, xml, newdoc)
    # Store the instance variables.
    @xslt, @template, @xml, @newdoc = xslt, template, xml, newdoc
  end

  def translate
    wordprocessingml_schema =
      "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    # Get the contents of the main document part from the
    # template document.
    existing_xml = get_from_template("word/document.xml")
    # Get the w:body node.
    body_node = existing_xml.root.xpath(
      "w:body", {"w" => wordprocessingml_schema}).first
    # Clear the contents of the document by removing all child nodes
    # of the w:body element of the template document.
    body_node.children.unlink
    # Add each w:body child node from the new transformed XML to the
    # body of the template document.
    new_xml.xpath(
      "*/w:body", {"w" => wordprocessingml_schema}).first.children.each do |child|
      body_node.add_child(child)
    end
    # Save the template document as a new document.
    compress(existing_xml)
  end

  def get_from_template(filename)
    # Retrieve the contents of the main document part from the
    # template document.
    xml = Zip::Archive.open(@template) do |zipfile|
      zipfile.fopen(filename).read
    end
    # Parse the resulting XML into a Nokogiri XML document.
    Nokogiri::XML.parse(xml)
  end

  def new_xml
    # Transform the values from the XML data file.
    stylesheet_doc.transform(Nokogiri::XML.parse(File.open(@xml)))
  end

  def compress(newXML)
    # Copy the modified template document to a new document.
    FileUtils.copy(@template, @newdoc)
    # Open the new document as a ZIP archive.
    Zip::Archive.open(@newdoc, Zip::CREATE) do |zipfile|
      # Replace the contents of the main document part of the new
      # document with the new transformed XML.
      zipfile.add_or_replace_buffer('word/document.xml', newXML.to_s)
    end
  end

  def stylesheet_doc
    # Parse the XSLT file into a Nokogiri XSLT document.
    Nokogiri::XSLT.parse(File.open(@xslt))
  end
end
Add code to the upload_controller.rb file.
Use your favorite editor to open the following file.
C:\Temp\RubyAndOpenXml\app\controllers\upload_controller.rb
Replace the contents of the upload_controller.rb file with the following Ruby code.
require 'data_file'

class UploadController < ApplicationController
  def initialize
  end

  def index
    render :file=> 'app\\views\\uploadfile.rhtml'
  end

  def uploadfile
    mime_type =
      "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    DataFile.save(params[:upload],params[:upload1],params[:upload2])
    # Send the new file with the wordprocessingml document
    # content type.
    send_file("#{RAILS_ROOT}/public/resources/newdoc.docx",
      :filename=> "newdoc.docx",
      :type=>mime_type)
  end
end
Create a view for the application.
Use your favorite editor to create a new file named uploadfile.rhtml in the following folder.
C:\Temp\RubyAndOpenXml\app\views
Add the following markup to uploadfile.rhtml and save the changes to the file.
The following example is Ruby HTML.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html>
<head>
  <title>Using Ruby on Rails and XSLT to Create a Word 2007 Document</title>
  <style type="text/css">
    body {
      background-color:gray;
      font-family: tahoma, verdana, sans-serif;
      font-size:11px;
      margin-top:100px;
    }
    h1 {
      font-size:1.3em;
    }
    .upload {
      margin:0px auto;
      padding-top:15px;
      width:600px;
      height:160px;
      border:4px solid silver;
      background-color: #CCC;
      text-align: center;
    }
    input {
      font-size:11px;
      font-family: tahoma, verdana, sans-serif;
    }
    .submit {
      margin-top:10px;
    }
  </style>
</head>
<body>
  <div class="upload">
    <h1>Using Ruby on Rails and XSLT to Create a Word 2007 Document</h1>
    <%= form_tag({:action => "uploadfile"}, {:id=>"upload_form",:name=>"upload_form", :multipart=> true}) %>
    <table width="90%">
      <tr>
        <td align="left"> XSL Transform File (.xslt): </td>
        <td align="right"> <%= file_field 'upload', 'file', "size" => 60%> </td>
      </tr>
      <tr>
        <td align="left"> Template Document (.docx): </td>
        <td align="right"> <%= file_field 'upload1', 'file1', "size" => 60%> </td>
      </tr>
      <tr>
        <td align="left"> XML Data File (.xml): </td>
        <td align="right"> <%= file_field 'upload2', 'file2', "size" => 60%> </td>
      </tr>
      <tr>
        <td colspan="2" align="center"> <%= submit_tag "Create Document", :class => "submit" %> </td>
      </tr>
    </table>
    </form>
  </div>
</body>
</html>
Configure the Ruby on Rails application to use uploadfile.rhtml as the site's home page.
Run the following command from the Windows command prompt to delete the default home page.
del C:\Temp\RubyAndOpenXml\public\index.html
Use your favorite editor to open the following file.
C:\Temp\RubyAndOpenXml\config\routes.rb
Edit routes.rb and add a map.root :controller statement that points at the upload controller. The routes.rb file contains a map.root statement that is commented out. Uncomment the statement and change it to the following.
map.root :controller => "Upload"
Save the changes to the routes.rb file.
Run the following commands from the Windows command prompt to create the folders that the Ruby on Rails application uses when it processes uploaded files and generates the new document.
md C:\Temp\RubyAndOpenXml\public\data
md C:\Temp\RubyAndOpenXml\public\resources
Configure the Ruby on Rails application to run without a database.
Use your favorite editor to open the following file.
C:\Temp\RubyAndOpenXml\config\environment.rb
Edit environment.rb and remove the comment mark in the following line.
#config.frameworks -= [ :active_record, :active_resource, :action_mailer ]
Save the changes to the environment.rb file.
To run the Ruby on Rails application
At a Windows command prompt, navigate to the following folder.
C:\Temp\RubyAndOpenXml
Start the WEBrick Web server by using the following command.
ruby script/server
Open the browser and navigate to the following site to display the upload page of the Ruby on Rails application.
http://localhost:3000
Select the MyMovies.xml data file, the MyMoviesTemplate.docx Word 2007 template document, and the MyMovies.xslt XSL transform file that you created previously, and then click Create Document. The application uploads the files, applies the XSL transform to the XML data, generates a new Word 2007 document that has the XML data, and then sends the new document back to the browser.
Figure 4. The New Document
Read It
You can simplify creating a Word 2007 document by starting with a document that already contains the desired layout. After you create the document, you can extract its contents and add XSLT elements to replace values or to repeat information.
The Ruby on Rails application in this Visual How To transforms XML data into the Open XML Wordprocessing format by using an XSL Transform file that you create from a Word 2007 template document. The application then creates a Word 2007 document that contains the XML data and sends the new document to the browser. This section uses code examples from the sample code to describe the approach that the application uses.
When you click the Create Document button, the application uploads the selected XSL transform, the Word 2007 template document, and the XML data files to the public\data directory on the server. Next, the application calls the translate method of a Ruby class named OfficeOpenXML to translate the XML data. The translate method is passed the locations of the uploaded files and the location of the new document. The following code example uses Ruby to perform those tasks.
# Save the uploaded files to a temp folder and then perform
# translation.
def self.save(upload,upload1,upload2)
name =
sanitize_filename(upload['file'].original_filename).to_s
name1 =
sanitize_filename(upload1['file1'].original_filename).to_s
name2 =
sanitize_filename(upload2['file2'].original_filename).to_s
directory = "public\\data\\"
# Create the file path.
path = File.join(directory, name).to_s
path1 = File.join(directory,name1).to_s
path2 = File.join(directory,name2).to_s
# Save the files.
upload_file(path,upload,'file')
upload_file(path1,upload1,'file1')
upload_file(path2,upload2,'file2')
OfficeOpenXML.translate(path, path1, path2, "public\\resources\\newdoc.docx")
end
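The save method passes every uploaded file name through sanitize_filename before it builds a path, which keeps directory components and unusual characters supplied by the browser out of the server's file system path. The following stand-alone sketch repeats that logic outside the DataFile class so that it can be run directly. The sample file names are made up for illustration, and the character class is written as [^\w\.] because \w already matches the underscore.

```ruby
# Stand-alone copy of the sanitizing logic, for experimentation only.
def sanitize_filename(file_name)
  # Keep only the last path component, dropping any directory part.
  just_filename = File.basename(file_name)
  # Replace every character that is not a letter, digit, underscore,
  # or period with an underscore.
  just_filename.gsub(/[^\w\.]/, '_')
end

puts sanitize_filename('/tmp/uploads/My Movies (1).xml')  # => My_Movies__1_.xml
puts sanitize_filename('../../etc/passwd')                # => passwd
```

File.basename splits on the path separators of the platform that Ruby runs on, so a Windows-style path that uses backslashes is only reduced to its last component when the code runs on Windows.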
The OfficeOpenXML.translate method uses the ZipRuby library to extract the document.xml document part from the template document and then retrieves the part's XML as a Nokogiri XML document.
The following code example uses Ruby to get the contents of the main document part from the template document.
# Get the contents of the main document part from the
# template document.
existing_xml = get_from_template("word/document.xml")
…
def get_from_template(filename)
# Retrieve the contents of the main document part from the
# template document.
xml = Zip::Archive.open(@template) do |zipfile|
zipfile.fopen(filename).read
end
# Parse the resulting XML into a Nokogiri XML document.
Nokogiri::XML.parse(xml)
end
Using XPath, the body node of the template document is retrieved. Because there is only one body element in the document, the first item in the collection is referenced. All child nodes of the body node are then removed, effectively clearing all of the content in the template document.
The following code example uses Ruby to get the w:body node.
# Get the w:body node.
body_node = existing_xml.root.xpath(
"w:body", {"w" => wordprocessingml_schema}).first
# Clear the contents of the document by removing all child nodes
# of the w:body element of the template document.
body_node.children.unlink
The following code example uses Ruby to transform the XML data by using XSLT.
def new_xml
# Transform the values from the XML data file.
stylesheet_doc.transform(Nokogiri::XML.parse(File.open(@xml)))
end
…
def stylesheet_doc
# Parse the XSLT file into a Nokogiri XSLT document.
Nokogiri::XSLT.parse(File.open(@xslt))
end
Next, XPath selects the body element in the new, transformed XML, and then each body node child is added as a child node of the body element in the template document. The following code example uses Ruby to perform those tasks.
# Add each w:body child node from the new transformed XML to the
# body of the template document.
new_xml.xpath(
"*/w:body", {"w" => wordprocessingml_schema}).first.children.each do |child|
body_node.add_child(child)
end
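The clear-and-refill step can be tried in isolation on two small XML strings. The sketch below is an approximation under stated assumptions: it swaps in REXML from the Ruby standard library for the Nokogiri calls that the application uses, and the replace_body, body_of, and paragraph_texts helpers and the two sample constants are hypothetical names. It locates w:body with a namespace-aware XPath query, removes every child node, and then moves the children of the transformed body into the template body.

```ruby
require 'rexml/document'

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'

# Find the single w:body element by using a namespace-aware XPath query.
def body_of(doc)
  REXML::XPath.first(doc, '/w:document/w:body', 'w' => W_NS)
end

# Hypothetical helper: empties the template body, then moves every child
# node of the transformed body into it. Returns the modified template.
def replace_body(template_xml, transformed_xml)
  template = REXML::Document.new(template_xml)
  transformed = REXML::Document.new(transformed_xml)
  target = body_of(template)
  source = body_of(transformed)
  # to_a returns a copy of the child list, so the element can be changed
  # safely while iterating.
  target.to_a.each { |node| target.delete(node) }
  source.to_a.each { |node| target.add(node) }
  template
end

# Collect the text of every w:t element, in document order.
def paragraph_texts(doc)
  REXML::XPath.match(doc, '//w:t', 'w' => W_NS).map { |t| t.text }
end

TEMPLATE_XML = %(<w:document xmlns:w="#{W_NS}"><w:body>) +
  %(<w:p><w:r><w:t>Genre Name</w:t></w:r></w:p></w:body></w:document>)
TRANSFORMED_XML = %(<w:document xmlns:w="#{W_NS}"><w:body>) +
  %(<w:p><w:r><w:t>Action</w:t></w:r></w:p>) +
  %(<w:p><w:r><w:t>Drama</w:t></w:r></w:p></w:body></w:document>)

puts paragraph_texts(replace_body(TEMPLATE_XML, TRANSFORMED_XML)).inspect
```

Replacing only the children of w:body, rather than the whole part, keeps the namespace declarations on the template's w:document root element.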
After the body element of the template document is updated, the template document is copied to a new document and ZipRuby is used to replace the main document part of the new document by using the new transformed XML. The following code example uses Ruby to perform those tasks.
# Save the template document as a new document.
compress(existing_xml)
…
def compress(newXML)
# Copy the modified template document to a new document.
FileUtils.copy(@template, @newdoc)
# Open the new document as a ZIP archive.
Zip::Archive.open(@newdoc, Zip::CREATE) do |zipfile|
# Replace the contents of the main document part of the new
# document with the new transformed XML.
zipfile.add_or_replace_buffer('word/document.xml',
newXML.to_s)
end
end
Finally, the new document is sent back to the browser with the content type of application/vnd.openxmlformats-officedocument.wordprocessingml.document, which enables the browser to recognize the file as a Word 2007 document.
The following code example uses Ruby.
mime_type =
"application/vnd.openxmlformats-officedocument.wordprocessingml.document"
…
# Send the new file with the wordprocessingml document
# content type.
send_file("#{RAILS_ROOT}/public/resources/newdoc.docx",
:filename=> "newdoc.docx",
:type=>mime_type)