Basic Instincts
Server-Side Generation of Word 2007 Docs
Ted Pattison

Code download available at: BasicInstincts2006_11.exe (170 KB)
Until now, writing and deploying server-side applications that read, modify, and generate documents used by Microsoft® Office applications has presented a challenge. The older binary formats used by Microsoft Word, Excel®, and PowerPoint® were introduced in 1997 and remained the default file formats up through the Office 2003 release. However, these binary formats have proven too difficult to work with directly, so the vast majority of production applications that read and write Office documents do so by going through the object model of the hosting Office application.
Applications and components that use the object model of applications such as Word or Excel run much better on the desktop than they do in server-side scenarios. Anyone who has spent time writing the extra infrastructure code required to make an Office desktop application behave reliably on the server will tell you it's a less than desirable solution. That's because Office desktop applications such as Word and Excel were never designed to run on the server, and they require a custom utility program to terminate and restart them whenever they encounter a modal dialog that requires human intervention.
The ability to read and write Office documents without going through the object model of the hosting Office application becomes far more desirable in a server-side scenario. Office 2000 and Office 2003 introduced some modest capabilities for creating Excel workbooks and Word documents using XML. This advancement made it possible to write portions of an Office document using an XML library, such as the one the Microsoft .NET Framework provides in the System.Xml namespace.
In the 2007 Microsoft Office system, Microsoft has taken this idea much further by adopting the Office Open XML file formats for the documents used by Microsoft Office Word, Excel, and PowerPoint 2007. The Office Open XML file formats are exciting to ASP.NET and SharePoint® developers because they provide the ability to read, write, and generate a Word document, an Excel workbook, or a PowerPoint presentation on the server without requiring the system to run an Office desktop application on the server.
The Office Open XML file formats are on their way to becoming a published European Computer Manufacturers Association (ECMA) standard. You can read more about the ongoing standardization process, download the latest draft of the Office Open XML file formats specification, and find other great online resources at openxmldeveloper.com. My goal in this month's column is to introduce the programming required to create simple Word documents within an ASP.NET application and pass them back to users who will be able to open them directly with Word 2007.

Word 2007 Document Internals
Let's begin by examining the structure of a simple Word document based on the Office Open XML file formats. As you will see, the Office Open XML file formats are based on standard ZIP file technology. Each top-level file is saved as a ZIP archive, which means you will be able to open a Word document just as you would any other ZIP file and snoop around inside using the ZIP file support built into Windows® Explorer.
Note that the 2007 Microsoft Office system applications such as Word and Excel have introduced file extensions for documents that use the new format. For example, the .docx extension is used for Word documents stored in the Office Open XML file formats while the more familiar .doc extension continues to be used for Word docs stored in the binary format.
To see what I'm talking about, create a new document in Word 2007 and add "Hello Word" as the text. Save the document using the default format to a new file named Hello.docx and close Word. Next, locate Hello.docx using Windows Explorer and rename it Hello.zip. Open Hello.zip and see the structure of folders and files that Word has created inside (see Figure 1).
Figure 1 DOCX File is a ZIP Archive
The top-level file (Hello.docx) is known as a package. Since a package is implemented as a standard ZIP archive, it automatically provides compression and it makes its contents instantly accessible to many existing utilities and APIs on both Windows platforms and other platforms alike.
Inside a package there are two kinds of internal components: parts and items. In general, parts contain content and items contain metadata describing the parts. Items can be further subdivided into relationship items and content-type items. A part is an internal component containing content that is persisted inside the package. The majority of parts are simple text files that are serialized as XML with an associated XML schema. However, parts can also be serialized as binary data when necessary, for example when a Word document contains a graphic image or a media file.
A part is named using a Uniform Resource Identifier (URI) that contains its relative path within the package file combined with the part file name. For example, the main part within the package for a Word document is /word/document.xml. Here are a few more typical part names you will find inside the package for a simple Word document:
  • /[Content_Types].xml
  • /_rels/.rels
  • /docProps/app.xml
  • /docProps/core.xml
  • /word/_rels/document.xml.rels
  • /word/document.xml
  • /word/fontTable.xml
  • /word/settings.xml
  • /word/styles.xml
  • /word/theme/theme1.xml
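The part names listed above can also be discovered programmatically instead of through Windows Explorer. The following is a minimal sketch, assuming a reference to WindowsBase.dll and a Word 2007 file at the hypothetical path Hello.docx (or a path passed as the first command-line argument). It prints each part's URI together with its content type; the exact output depends on the file you open.

```vb
Imports System
Imports System.IO
Imports System.IO.Packaging

Module ListParts

    Sub Main(ByVal args As String())
        ' Hypothetical default path; pass a real .docx path as the first argument.
        Dim docxPath As String = "Hello.docx"
        If args.Length > 0 Then docxPath = args(0)

        ' Open the package read-only; End Using releases the file handle.
        Using pack As Package = Package.Open(docxPath, FileMode.Open, FileAccess.Read)
            For Each part As PackagePart In pack.GetParts()
                ' part.Uri is the part name, for example /word/document.xml
                Console.WriteLine("{0}  [{1}]", part.Uri, part.ContentType)
            Next
        End Using
    End Sub

End Module
```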
The Office Open XML file formats use relationships to define associations between a source and a target part. A package relationship defines an association between the top-level package and a part. A part relationship defines an association between a parent part and a child part. Relationships are important because they make these associations discoverable without having to examine the content within the parts in question. That means relationships are independent of content-specific schema and are, therefore, faster to resolve. There is an additional benefit in that you can establish a relationship between two parts without having to modify either one of them.
Relationships are defined in internal components known as relationship items. A relationship item is stored inside the package just like a part, although a relationship item is not actually considered a part. For consistency, relationship items are always created inside folders named _rels.
For example, a package contains exactly one package-relationship item named /_rels/.rels. The package-relationship item contains XML elements to define package relationships such as the one between the top-level package for a .docx file and the internal part /word/document.xml, as shown here:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="../package/2006/relationships ">
  <Relationship Id="rId1"
      Type="../officeDocument/2006/relationships/officeDocument "
      Target="word/document.xml"/>
</Relationships>

As you can see, a Relationship element defines an ID, a type, and a target part. You should also observe that the type name for a relationship is defined using the same conventions used to create XML namespaces.
In addition to a single package-relationship item, a package can also contain one or more part-relationship items. For example, you define relationships between /word/document.xml and its child parts inside a part-relationship item located at the URI /word/_rels/document.xml.rels. Note that the Target attribute for a relationship in a part-relationship item is a URI relative to the parent part and not the top-level package.
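To see how package relationships and part relationships fit together, the sketch below walks both levels of an existing document. It assumes a reference to WindowsBase.dll and a file at the hypothetical path Hello.docx; the module and variable names are illustrative only. Relationship targets are resolved against their source, which is the package root for package relationships.

```vb
Imports System
Imports System.IO
Imports System.IO.Packaging

Module ListRelationships

    Sub Main(ByVal args As String())
        ' Hypothetical default path; pass a real .docx path as the first argument.
        Dim docxPath As String = "Hello.docx"
        If args.Length > 0 Then docxPath = args(0)

        ' Package relationships are resolved relative to the package root.
        Dim packageRoot As New Uri("/", UriKind.Relative)

        Using pack As Package = Package.Open(docxPath, FileMode.Open, FileAccess.Read)
            For Each rel As PackageRelationship In pack.GetRelationships()
                Console.WriteLine("package -> {0} ({1})", rel.TargetUri, rel.Id)

                ' External targets (such as Web links) are not parts in the package.
                If rel.TargetMode <> TargetMode.Internal Then Continue For

                Dim partUri As Uri = PackUriHelper.ResolvePartUri(packageRoot, rel.TargetUri)
                If Not pack.PartExists(partUri) Then Continue For

                ' Part relationships: targets are relative to the parent part.
                Dim part As PackagePart = pack.GetPart(partUri)
                For Each childRel As PackageRelationship In part.GetRelationships()
                    Console.WriteLine("  {0} -> {1} ({2})", _
                        part.Uri, childRel.TargetUri, childRel.Id)
                Next
            Next
        End Using
    End Sub

End Module
```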
Every part inside a package is defined in terms of a specific content type. A content type is metadata that defines a part's media type, a subtype, and a set of optional parameters. Any content type used within a package must be explicitly defined inside a component known as a content type item. Each package has exactly one content type item named /[Content_Types].xml. An example of the content type definitions inside /[Content_Types].xml is shown in Figure 2. You should observe from this figure that relationship items are like parts in that they are also defined with an associated content type.
Content types are used by the consumer of a package to interpret how to read and render the content within its parts. As you can see from Figure 2, a default content type is typically associated with a file extension, such as .rels or .xml. Override content types are used to define a specific part in terms of a content type that is different from the default content type associated with its file extension. For example, Figure 2 shows how /word/document.xml is associated with a content type that is different from the default content type used for files with the .xml extension.

Generating Your First DOCX File
While there are several existing libraries you can use to read and write to ZIP files, you should use the new packaging API that is part of the WindowsBase.dll assembly that ships with the .NET Framework 3.0 whenever possible because the packaging API is aware of the Office Open XML file formats. For example, there are convenient methods that make it easy to add relationship elements to a relationship item and to add content type elements to a content type item. The packaging API makes things easier because you never have to touch these elements and items directly.
This month's code uses the version of WindowsBase.dll that is installed along with the WinFX® Runtime Components February community technology preview (CTP). Note that WinFX Runtime Components were renamed to the .NET Framework 3.0 with the June CTP release. You can use either of these two versions or any later release of the .NET Framework 3.0. To test the code we are going to write in this column, you will also need to install Word 2007 Beta 2 or later. (Changes to Office since Beta 2 may require some modifications to the code presented here.)
Once you have installed the .NET Framework 3.0, which includes WindowsBase.dll, you can start programming against the packaging API in a Visual Studio® 2005 project by adding a reference, as shown in Figure 3.
Figure 3 Add Reference to WindowsBase.dll
The classes that make up the packaging API are contained inside the namespace System.IO.Packaging. When working with packages, you will also be frequently programming against older and (hopefully) familiar classes in the System.IO namespace and the System.Xml namespace. Examine the code in Figure 4 that shows the skeleton for creating a new package.
The System.IO.Packaging namespace contains the Package class, which exposes a shared method named Open to be used to create new packages and open existing packages. As with other classes that deal with file I/O, a call to Open should always be complemented with a call to Close.
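The open/close pattern described above can be sketched as follows. This is a minimal illustration rather than the Figure 4 listing itself; the file name Hello.docx and the module name are assumptions. A Try/Finally block is used so that Close runs even if an exception is thrown while parts are being written.

```vb
Imports System.IO
Imports System.IO.Packaging

Module CreatePackageSkeleton

    Sub Main()
        ' Create (or overwrite) the package file. Hello.docx is a hypothetical name.
        Dim pack As Package = Package.Open("Hello.docx", _
                                           FileMode.Create, _
                                           FileAccess.ReadWrite)
        Try
            ' Parts and relationships are added here. Until a main document
            ' part and its relationship exist, the file is not a valid
            ' Word document.
        Finally
            ' Always pair Open with Close to release the file handle.
            pack.Close()
        End Try
    End Sub

End Module
```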
Once you have created the new package, the next step is to create one or more parts and to serialize content into them. In my example in this month's column, we will follow the guidelines for a "Hello World" application, which will require the creation of a single part named /word/document.xml. We can create a part by calling the CreatePart method on an open Package object and passing parameters for a URI and a string-based content type:
'*** Create main document part (document.xml) ...
Dim uri As Uri = New Uri("/word/document.xml", UriKind.Relative)
Dim partContentType As String = "application/vnd.openxmlformats" & _
    "-officedocument.wordprocessingml.document.main+xml"
Dim part As PackagePart = pack.CreatePart(uri, partContentType)

'*** Get stream for document.xml
Dim streamPart As New StreamWriter(part.GetStream(FileMode.Create, _
                                                  FileAccess.Write))
The call to CreatePart passes a URI based on the path /word/document.xml and the content type that's required by the Office Open XML file formats for the part in a word-processing document that contains the main story. Once we have created a part, we must serialize our content into it using standard stream-based programming techniques. The code you have just seen opens a stream on the part by calling the GetStream method and uses the resulting Stream object to initialize a StreamWriter object.
The StreamWriter object is going to be used to serialize the "Hello World" XML document into document.xml. Take a look at the following XML; it represents the simplest of XML documents that can be serialized in document.xml:
<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w=
    "http://schemas.openxmlformats.org/wordprocessingml/2006/3/main"> 
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello Word 2007</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
Note that all the elements in this XML document are defined in the http://schemas.openxmlformats.org/wordprocessingml/2006/3/main namespace, as required by the Beta 2 version of the Office Open XML file formats. (The released version of the formats uses http://schemas.openxmlformats.org/wordprocessingml/2006/main instead.) The XML document contains a top-level document element. Within the document element there is a body element that contains the main story.
Within the body element there is a <w:p> element for each paragraph. Within the <w:p> element is a <w:r> element, which defines a run. A run is a region of text that shares the same set of characteristics. Within the run is a <w:t> element, which defines a range of text, in this case "Hello Word 2007".
Now it's time to generate this XML document with code using the XmlDocument class from the System.Xml namespace. If you take a look at Figure 5 you'll see how the code creates these elements within the proper structure and using the appropriate namespace. You should also note how this code writes the XML document into the stream using the StreamWriter object and then closes the stream and flushes the serialized content into the Package object.
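A minimal sketch of building this XML with the XmlDocument class might look like the following. It is an illustration under stated assumptions, not the Figure 5 listing: the BuildDocument helper and its parameter names are hypothetical, and the namespace URI is passed in as a parameter because it differs between Office builds.

```vb
Imports System
Imports System.Xml

Module HelloDocumentXml

    ' Builds the document/body/paragraph/run/text structure in memory.
    ' wordNs is the WordprocessingML namespace URI expected by the target build of Word.
    Function BuildDocument(ByVal wordNs As String, _
                           ByVal message As String) As XmlDocument
        Dim doc As New XmlDocument()
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "utf-8", Nothing))

        ' Every element uses the "w" prefix bound to the supplied namespace.
        Dim root As XmlElement = doc.CreateElement("w", "document", wordNs)
        Dim body As XmlElement = doc.CreateElement("w", "body", wordNs)
        Dim para As XmlElement = doc.CreateElement("w", "p", wordNs)
        Dim run As XmlElement = doc.CreateElement("w", "r", wordNs)
        Dim runText As XmlElement = doc.CreateElement("w", "t", wordNs)

        ' Assemble the tree from the innermost element outward.
        runText.AppendChild(doc.CreateTextNode(message))
        run.AppendChild(runText)
        para.AppendChild(run)
        body.AppendChild(para)
        root.AppendChild(body)
        doc.AppendChild(root)
        Return doc
    End Function

    Sub Main()
        ' Namespace value taken from the XML shown in this column (Beta 2).
        Dim ns As String = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/3/main"
        Dim doc As XmlDocument = BuildDocument(ns, "Hello Word 2007")
        Console.WriteLine(doc.OuterXml)
    End Sub

End Module
```

In the column's flow, the resulting XmlDocument would be written into the part by passing the StreamWriter created earlier to the document's Save method and then closing the writer.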
Now we are finished writing the XML content into document.xml. The final step is to create a relationship between the package and document.xml by calling the CreateRelationship method of the Package object. This is easy as long as you know the correct string value for the relationship type and you can come up with a unique name (such as rId1) for the relationship being created:
'*** Create the package relationship to document.xml
Dim relationshipType As String = _
    "http://schemas.openxmlformats.org" & _
    "/officeDocument/2006/relationships/officeDocument"
pack.CreateRelationship(uri, TargetMode.Internal, _
    relationshipType, "rId1")
pack.Flush()

'*** Close package
pack.Close()
You should also observe the call to Flush after the call to CreateRelationship. This call forces the packaging API to update the package relationship item with the proper Relationship element. The final call to the Package object's Close method completes the package serialization and releases the file handle on Hello.docx.
Now you have seen all the steps to generate a simple .docx file from a console application written in Visual Basic® 2005, all of which is available in the code download for this column. Now let's write the code to generate a .docx file on the server side from within an ASP.NET app.

Generating DOCX Files on the Server
The first thing to consider when modifying the code from the console application to run on a Web server is that you probably want to avoid having to save the package out as a physical file within the file system of the host computer. Instead, it will usually be faster to simply create the package in a memory stream within the ASP.NET worker process. Then you can write it back to the client using the OutputStream property of the ASP.NET Response object.
Let's start by changing the code to use a MemoryStream object instead of a physical file. Examine the code in Figure 6, which has been added to a server-side event handler inside an ASP.NET 2.0 page. Note that the first parameter to Package.Open has changed from the string-based file path to a MemoryStream object. This approach will allow you to reuse the same code for creating the package and its parts as we did earlier. However, you don't have to worry about creating and naming an OS-level file. This approach may also provide faster response times and better throughput in high-traffic scenarios.
The code you have just seen creates a MemoryStream object and then serializes a .docx file into it just as you would serialize a .docx file into a physical file. The code inside the custom WriteContentToPackage method was taken directly from the code inside the console application shown earlier. However, now it's creating a package and serializing it into a buffer in memory instead of into a physical file.
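The in-memory approach can be sketched as a single function that returns the finished package as a byte array. This is a hedged illustration, not the Figure 6 listing: the CreateHelloDocx name is hypothetical, the XML is loaded from a string for brevity, and the namespace URI is passed in because it differs between Office builds.

```vb
Imports System
Imports System.IO
Imports System.IO.Packaging
Imports System.Xml

Module InMemoryDocx

    ' Builds a one-paragraph .docx entirely in memory and returns its bytes.
    Function CreateHelloDocx(ByVal wordNs As String) As Byte()
        Dim docxStream As New MemoryStream()
        Dim pack As Package = Package.Open(docxStream, _
                                           FileMode.Create, _
                                           FileAccess.ReadWrite)
        Try
            ' Main document part, using the content type shown earlier.
            Dim partUri As New Uri("/word/document.xml", UriKind.Relative)
            Dim part As PackagePart = pack.CreatePart(partUri, _
                "application/vnd.openxmlformats" & _
                "-officedocument.wordprocessingml.document.main+xml")

            Dim doc As New XmlDocument()
            doc.LoadXml("<w:document xmlns:w=""" & wordNs & """>" & _
                        "<w:body><w:p><w:r><w:t>Hello Word 2007</w:t>" & _
                        "</w:r></w:p></w:body></w:document>")

            Using partStream As Stream = _
                    part.GetStream(FileMode.Create, FileAccess.Write)
                doc.Save(partStream)
            End Using

            ' Package relationship that identifies the main document part.
            pack.CreateRelationship(partUri, TargetMode.Internal, _
                "http://schemas.openxmlformats.org" & _
                "/officeDocument/2006/relationships/officeDocument", "rId1")
        Finally
            ' Close completes the package before the bytes are read.
            pack.Close()
        End Try
        Return docxStream.ToArray()
    End Function

    Sub Main()
        ' Namespace value taken from the XML shown in this column (Beta 2).
        Dim bytes As Byte() = CreateHelloDocx( _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/3/main")
        Console.WriteLine("Package size in bytes: {0}", bytes.Length)
    End Sub

End Module
```

Returning a byte array keeps the packaging code independent of ASP.NET, so the same function can be called from a page event handler or from a console test harness.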
Once you have written the package into the MemoryStream object and called Close on the package object, your work with the packaging API is complete. All that's left to do is to set up the appropriate HTTP headers and write the package content into the body of the response that is being sent back to the client.
Let's start with the HTTP headers. You should call methods on the ASP.NET response object to clear any existing headers and to add a content-disposition header specifying that the response contains an attachment with a file name of Hello.docx:
Response.ClearHeaders()
Response.AddHeader("content-disposition", _
                   "attachment; filename=Hello.docx")
Next, you must set up the encoding and MIME content type for the HTTP response and then write the binary content for the package into the body of the HTTP response. This can be accomplished using the code in Figure 7. Note that the last two calls to Response.Flush and Response.Close are required to make sure the entire package is completely written into the HTTP response.
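The response-writing steps described above might be sketched as follows. This is an illustration under assumptions rather than the Figure 7 listing: the SendDocx helper is hypothetical, and the MIME type shown is the one registered for .docx files in the released Office Open XML formats, which may differ from the value a beta build of Word expects.

```vb
Imports System.Text
Imports System.Web

Public Module DocxResponse

    ' Assumption: MIME type registered for .docx in the released formats.
    Public Const DocxMimeType As String = _
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

    ' Writes an in-memory package to the HTTP response as a file attachment.
    Public Sub SendDocx(ByVal response As HttpResponse, _
                        ByVal docxBytes As Byte(), _
                        ByVal fileName As String)
        response.ClearHeaders()
        response.ClearContent()
        response.AddHeader("content-disposition", _
                           "attachment; filename=" & fileName)
        response.ContentEncoding = Encoding.UTF8
        response.ContentType = DocxMimeType

        ' Write the binary package into the body of the response.
        response.BinaryWrite(docxBytes)

        ' Flush and close so the entire package reaches the client.
        response.Flush()
        response.Close()
    End Sub

End Module
```

From a page event handler, a call such as DocxResponse.SendDocx(Response, docxBytes, "Hello.docx") would send the bytes produced by the packaging code.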
As long as the client's machine has been configured to understand the MIME content type associated with a .docx file, the user will be presented with the usual options: open the document within Word or save it to the local hard drive.
If the user clicks on the Open button, the .docx file is opened automatically in Word as shown in Figure 8. It is interesting to note that up to this point the package has never been saved as a physical file on the Web server; there it has existed only in memory. On the client, the browser may keep a temporary copy while Word opens the document, but if the user closes the document without saving it, nothing permanent is left behind. This makes this approach ideal for generating templates for letters, memos, customer lists, and any type of document you can think of.
Figure 8 Document Opens in Word 2007

Send your questions and comments for Ted to instinct@microsoft.com.


Ted Pattison is an author, trainer and SharePoint MVP who lives in Tampa, Florida. Ted is writing a book titled Inside Windows SharePoint Services 3.0 for Microsoft Press and he delivers advanced SharePoint 2007 Training to professional developers through his company, Ted Pattison Group (http://www.TedPattison.net).
