Automating Document Assembly with Keylogix ActiveDocs and WordprocessingML

Article
10/16/2013

Summary: Learn how the Keylogix ActiveDocs suite enhances the power of Microsoft Office Word 2003 and Word XML Markup Language (WordprocessingML) and turns your desktop publishing interface into a server-side, enterprise-class document automation engine. (16 printed pages)

Joshua Greenberg, Ph.D., Altra IT Corporation

June 2005

Applies to: Microsoft Office Word 2003

Contents

Scenario Overview
Defining Document Automation
Establishing Requirements
System Performance and Licensing Costs
Design Examples
Conclusion
Additional Resources

Scenario Overview

Nearly all corporations need to produce consistent, personalized documents. In some cases, the output may be no more complex than a one-page form letter with a few variable fields. In other situations, however, the documents may consist of hundreds of pages of complex content representing sales contracts, business agreements, legal documents, monthly statements, and government forms populated by thousands of variable elements. In this article, I define document automation, produce a reference list of important document automation features, discuss the importance of the features, comment on the availability and implementation of the features in different document automation products, and detail the specifics of a document automation solution implemented by using Keylogix ActiveDocs.

This article focuses on the template creation and compilation features of a document automation system. Note, however, that the Keylogix ActiveDocs suite can also generate a fully functional user interface based on field information from the underlying templates. This functionality, while very powerful, is not a common feature of other document automation systems, and for this reason, the implementation example in this article is based on an existing Java 2 Platform Enterprise Edition (J2EE) Web-based interface. The example not only helps to show the use of the ActiveDocs API, but it also highlights the simplicity of integration with legacy systems.

About Keylogix ActiveDocs

This article is the result of my one-year analysis of 22 different document automation solutions. This analysis culminated in my selection of Keylogix ActiveDocs as the product that most closely met the requirements I describe later in this article. Whereas, with other product types, vendors may compete based on esoteric features or performance metrics, the offerings in document automation are easily differentiated. The results of my analysis showed that ActiveDocs was not only the best product of those I tested, but the only product that met my full set of requirements.

Defining Document Automation

I begin my discussion of document automation with a few definitions. While many of the terms used may seem familiar, their repeated appearance in a wide variety of contexts causes ambiguity. Rather than invent a specialized and non-intuitive language for describing the details of document automation, I will use the following definitions in relation to the content in this article:

Content. Any visible element accessible by the document automation system. Elements meant to convey information, including (but not limited to) text, images, or drawings.
Field. Variable field marker. Within a template, a named region, possibly formatted by application of a set of styles, but devoid of content. That which is replaced by content when its containing template is passed through the compilation engine and a corresponding actual value is supplied.
Template. A collection of content, including variable field markers, formatted by application of a set of styles. One of the two inputs to the compilation process, the other being the actual field values that are substituted for the corresponding field markers.
Aggregate Template. A set of one or more templates concatenated but not necessarily ever stored in this fashion. A virtual concept denoting the larger template that results from concatenating a set of one or more smaller templates.
Document. A collection of content, formatted by applying a specified set of styles, and stored in a single file in the file system. The end result of the compilation process. The output of the compilation engine. An instantiated aggregate template. An aggregate template in which variable fields are replaced by actual values.

The two primary components of any document automation system are the template creation interface and the compilation engine. A template is created by using the template creation interface to specify both the static content and the variable fields. At compile time, an external application provides dynamic content to populate (instantiate) the fields in the template. In cases where a set of templates is required, the compilation engine behaves as if it first created and then instantiated an aggregate template. In either case, the result is a document identical in appearance to the template except for replacement of the fields by actual values. Figure 1 shows this process.

Figure 1. High-level document automation process
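
The process in Figure 1 can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the ActiveDocs engine: the `<Name>` marker syntax, the `aggregate` and `compile_document` functions, and the sample templates and values are all assumptions made for this sketch.

```python
import re

# Hypothetical field marker syntax for this sketch: <FieldName>
FIELD = re.compile(r"<(\w+)>")

def aggregate(templates):
    """Concatenate templates into one virtual (aggregate) template."""
    return "\n".join(templates)

def compile_document(templates, values):
    """Instantiate the aggregate template by replacing every field marker
    with its supplied value. A missing value raises KeyError."""
    return FIELD.sub(lambda m: values[m.group(1)], aggregate(templates))

# Two small templates and the actual values supplied at compile time.
cover = "Dear <BuyerName>,"
body = "Thank you for your purchase from <SellerName>."
document = compile_document(
    [cover, body],
    {"BuyerName": "Pat Lee", "SellerName": "Acme Homes"},
)
print(document)
```

The aggregate template is never stored; it exists only as an intermediate value inside `compile_document`, which matches the "virtual concept" in the definition.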


The primary stakeholders in a document automation project may generally be divided into three categories:

Information technology professionals responsible for development and maintenance of the solution
Template specialists responsible for creation and maintenance of the document content
Users responsible for entering the variable data values into the supplied interface and then submitting the document to the next stage of the workflow

You can also integrate a document automation system with a content management system, portal product, personalization suite, enterprise application integration architecture, or workflow engine to create an even more powerful solution. Such integrations lead naturally to the addition or specialization of roles including workflow developers, personalization experts, and the like.

Establishing Requirements

I collected and refined the requirements presented in this section throughout a one-year vendor analysis process. While I established many of the requirements prior to engaging any vendors, I discovered or tightened others as I gained familiarity with both the specifics of my project and the capabilities of the offerings. For discussion purposes, I broke down the complete set of requirements into seven major sections: templates, documents, printing, template specialist interface, technical, performance, and licensing. This article explains the importance of each requirement and indicates which requirements were most troublesome for a majority of the vendor products.

Template Requirements

Table 1 lists the requirements I developed for templates. These requirements address how templates handle such things as text and fonts, page sizes, conditional text, and tables.

Table 1. Template requirements

Number and Name	Description
Template 1: Text and Fonts	Templates can contain text of any style or layout supported by traditional desktop publishing tools. Must support multiple fonts including bold, italic, and strikethrough at arbitrary point sizes.
Template 2: Styles and Layout	Style and layout options that can be applied to boilerplate text within a template must be available for application to fields. Upon field replacement, actual values inherit the styles and layout applied to the fields they replace.
Template 3: Number of Pages	Templates can contain any number of pages.
Template 4: Multiple Page Sizes	Pages can be any combination of sizes, such as letter, legal, or A4.
Template 5: Independent Page Setup	Page setup details for each page must be independent. Margins, headers, footers, and other details must be specified per page, not per template.
Template 6: Template Availability	Templates must be created prior to compilation and placed in a location accessible to the compilation engine. Optional requirement: Documents must be created at run time.
Template 7: Conditional Content	Templates can contain conditional content, particularly a paragraph or table that appears only in certain circumstances (for example, when the buyer is from Florida).
Template 8: Nested Tables	Templates can contain tables, and tables can be nested within other tables to an arbitrary level of depth.
Template 9: Table Fields and Repeating Groups	Tables can contain fields to be populated with actual data values at run time. Fields within tables must allow multiple values per field. Template must support repeating groups so that the populated table contains exactly the number of rows implied by the repeated field values supplied at run time. Repeating groups can inherit their style from a designated group in the uninstantiated template. Optional requirement: Must support repeating groups outside the context of tables (for example, for numbered list items, bulleted items, and the like).
Template 10: Unicode Text	Templates must support Unicode (multibyte character sets) boilerplate text. Optional requirement: Must support Unicode field replacement values.
Template 11: Static Images	Templates can contain static images, such as .jpeg, .png, and .gif files. Optional requirement: Must support dynamic image inclusion; specifically, fields must support textual as well as image replacement.
Template 12: Text Reflow	On template instantiation, if sizes of values that replace variable fields exceed placeholder sizes, entire template must reflow its layout to accommodate the full value. In contrast, absolute positioning treats each text portion as an independent entity and clips any excess text to fit within the allocated space or else overwrites any following text. Document automation system should support both reflow and absolute (with clipping or overwriting) options.
Template 13: Embedded Logic in Fields	Fields within a template can store embedded logic; particularly, basic addition or string concatenation.

Templates form the core of any document automation system. As Figure 1 shows, a template is a combination of boilerplate text and fields. At run time, the fields are populated with actual data values, and the resulting document can be stored, printed, or both. Consider the following fragment taken from a hypothetical template.

<BuyerName> agrees to purchase <LotNum> from <SellerName>.
[<LatestClosingDate>]
Revisions to this document must be supplied in writing. . .

The four fields, <BuyerName>, <LotNum>, <SellerName>, and <LatestClosingDate>, are replaced with actual values at run time. I also specified that when the <LotNum> and <LatestClosingDate> fields are replaced, the values should appear in bold type. In addition, I invented some convenient syntax (the square brackets) to highlight the fact that the entire sentence (line) containing the <LatestClosingDate> field is optional. In this case, when the line is excluded, the following line ("Revisions. . .") is pulled up to sit directly beneath the previous line of text. This example shows the importance of several requirements in Table 1: text and font support (Template 1), styles applied to fields (Template 2), conditional content (Template 7), and text reflow (Template 12). As an additional example of text reflow, if the buyer and seller names required space greater than that available on the single line, the first line should wrap (using an applied paragraph style), and all following lines should be pushed down. The result might appear as follows:

William Sterling Montgomery agrees to purchase  from
     Ashwatthama Aniruddha Singh.
Revisions to this document must be supplied in writing. . .

In this example, the fields are replaced with actual values, and the text wraps itself in order to conform to the paragraph style applied to the first line (in this case, the first line was itself a paragraph, and the paragraph style mandated that wrapped text should appear indented on the following lines). The closing language is excluded, and the statement regarding revision reflows to appear just beneath the first line. Finally, the actual lot number appears in bold type.
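
The behavior described above can be approximated in plain text. The sketch below is an assumption-laden illustration, not how ActiveDocs or Word performs layout: the bracket convention for optional lines, the wording of the optional sentence, the 60-character line width, the five-space hanging indent, and the lot number value are all hypothetical. Bold formatting is omitted because plain text cannot carry it.

```python
import re
import textwrap

FIELD = re.compile(r"<(\w+)>")

def instantiate(template, values, width=60):
    """Replace fields, drop optional lines that lack a value, and wrap
    long lines with a hanging indent so following text is pushed down."""
    out = []
    for line in template.splitlines():
        optional = line.startswith("[") and line.endswith("]")
        if optional:
            line = line[1:-1]
        names = FIELD.findall(line)
        if optional and any(not values.get(n) for n in names):
            continue  # line excluded; the next line moves up
        filled = FIELD.sub(lambda m: values[m.group(1)], line)
        out.append(textwrap.fill(filled, width=width, subsequent_indent="     "))
    return "\n".join(out)

template = "\n".join([
    "<BuyerName> agrees to purchase <LotNum> from <SellerName>.",
    "[Closing must occur by <LatestClosingDate>.]",  # hypothetical wording
    "Revisions to this document must be supplied in writing.",
])
values = {
    "BuyerName": "William Sterling Montgomery",
    "LotNum": "Lot 12",  # hypothetical value
    "SellerName": "Ashwatthama Aniruddha Singh",
}
result = instantiate(template, values)
print(result)
```

Because no closing date is supplied, the optional line disappears and the revision sentence follows the wrapped first paragraph directly. Supplying a `LatestClosingDate` value restores the line without any change to the template.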

Relative vs. Absolute Positioning

Perhaps the most outwardly visible difference between the various template creation interfaces I reviewed is in the support of either relative or absolute positioning. Relative positioning is best understood as the familiar flow-based paradigm employed by applications such as Microsoft Office Word. For example, as I compose this article, I often want to insert or remove text at a given location. Because I am using Word, such actions produce the intuitive result of moving surrounding text up, down, left, or right, to accommodate my changes. When the surrounding text moves, it conforms to any styles I have specified for the containing paragraphs.

Alternatively, in a system based on absolute positioning, templates are created by repeated placement of text boxes. This process is similar to composing a letter using Microsoft Office PowerPoint 2003. In such a process, the author creates the following text boxes: one to hold the date in the upper-right corner, another to hold the address information, another to hold the greeting, another to hold the body (or possibly one per paragraph of the body), and one to hold the signature. While it may be possible to create documents of any level of complexity using this technique, it is certainly not the most effective or efficient approach. Instead, most documents lend themselves to the relative, flow-based layout found in applications such as Word. Note that any interactions between the text boxes, such as what should occur upon expansion, contraction, or intersection, must be explicitly defined for each individual element.

There may be rare circumstances in which a flow-based paradigm using styles and tables seems inappropriate. In these situations, it would be optimal if the template creation interface supported both options. Keylogix ActiveDocs gains this dual functionality by taking advantage of features available in the Word interface. While such features may not be used very often, Word does provide seamless support for the addition of absolutely positioned text boxes, hand drawings, and other content.

Page Setup

While all products that I tested supported at least one template creation interface of either flow-based or absolute positioning, the same cannot be said of the page setup requirements listed in Table 1 as Template 3 (Number of Pages), Template 4 (Multiple Page Sizes), and Template 5 (Independent Page Setup). Of the 22 products I analyzed, only two supported all three of these requirements. The most notable omission was support for multiple page sizes (Template 4). While it is possible to split a template with multiple page sizes into a series of templates, each containing pages of only a single size, this is not the preferred solution. Many vendors attempted to minimize the importance of this functionality by stating that the different page sizes could be supported by submitting multiple print jobs. This approach creates two major problems. First, reflow across different page sizes becomes impossible; second, users run the risk of finding interleaved document sets sitting in their printer tray (for more information, see Document Requirements). If interleaved documents do not seem like a major issue, try convincing a salesperson who just showed half of two different buyers' contracts to his current prospect.

Tables

Tables are an important element of any document automation system. While the traditional use of tables for the display of tabular information is obviously important, the additional layout functionality made possible through the judicious use of nested tables is equally valuable. In the world of Web programming, there was a time when tables were essentially the only way to give a page an attractive layout. More recently, the use of cascading style sheets has provided many more layout options. Tables still retain an important role in Web design, however, and they most certainly play an integral part in the layout of printed documents. Even if you completely disagree with the use of tables as a preferred layout mechanism, you must at least acknowledge their use when the gridlines between cells are to be displayed. In my case, Template 8 (Nested Tables) was mandatory because I needed a nested table layout, with gridlines, on a government-specified regulatory document.

I have implicitly identified two types of tables: those used for formatting, and those used for the display of tabular data. The latter use breaks down into two very important subtypes: static and dynamic. While a static table is populated with a fixed number of rows at run time, a dynamic table can grow to accommodate a variable number of rows. Dynamic tables are an important feature of any document automation system but are rarely implemented. Most products simply allow you to create fixed-size tables that are then populated only as much as required. For example, if a table is created with ten blank rows and the purchase order has only three line items, the resulting document contains seven blank rows. It is equally important that growth of a dynamic table results in a reflow of all following content in the document. Otherwise, a dynamic table would overwrite any content that appeared below the previously allocated space, and the resulting document would look like gibberish.
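
The difference between the two subtypes can be shown with a small sketch. The `render_table` function, the tab-separated output, and the sample line items are hypothetical and exist only to illustrate the row-count behavior; no product's table model is implied.

```python
def render_table(header, rows, fixed_rows=None):
    """Render a table as tab-separated text lines.

    fixed_rows=None models a dynamic table: one output row per data row.
    An integer models a static table that is padded with blank rows
    (or truncated) to exactly that many rows.
    """
    body = [list(r) for r in rows]
    if fixed_rows is not None:
        blank = [""] * len(header)
        body = (body + [blank] * fixed_rows)[:fixed_rows]
    return ["\t".join(str(c) for c in row) for row in [header] + body]

header = ["Item", "Qty", "Unit price"]
items = [("Widget", 2, "10.00"), ("Gasket", 1, "4.50"), ("Bolt", 12, "0.25")]

static = render_table(header, items, fixed_rows=10)   # seven blank rows remain
dynamic = render_table(header, items)                 # exactly three rows

# Content that follows a dynamic table is appended after its last row,
# so growth of the table pushes the footer down instead of overwriting it.
footer = "Total due on receipt."
document = "\n".join(dynamic + [footer])
print(document)
```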

Character Set and Image Support

The requirements listed in Table 1 as Template 10 (Unicode Text) and Template 11 (Static Images) were supported by nearly half of the candidate products. When Unicode text was restated as a requirement to support Japanese, however, the list of candidates dropped sharply. At this point, it is also interesting to note the various levels of support for fonts. Some rather expensive products, especially those using a proprietary template creation interface, had very poor font support. Without delving into font-specific terminology, I submit that the easiest way to state the problem is to call it a lack of selection. Many products had only a few fonts; did not support various combinations of bold, underline, or strikethrough; did not support different font sizes; and required a complex process to add fonts. Consider a seemingly simple requirement for strikethrough fonts imposed by some government documents. One might naturally expect all products to support this feature. In reality, they do not, and often their workaround is hand-drawn lines with no support for automatic text reflow (the text gets pushed down or over, but the strikethrough line does not).

Conditional Content and Embedded Logic

The requirements listed in Table 1 as Template 7 (Conditional Content) and Template 13 (Embedded Logic in Fields) are somewhat different, but they both imply the need for embedded logic within fields. Once again, the support for such functionality varies greatly across vendors.

Depending on the architecture of your complete system (of which document automation may only be one component), you may choose to use no embedded logic or to use it extensively. As a developer, my initial opinion of embedded logic was quite negative. From my perspective, such luxuries only promoted the diffusion of business logic out of the business-logic tier. You may find, however, that allowing users more control over certain functionality frees your developers to work on more complex tasks.

For example, rather than having the three separate fields called first name, last name, and full name, you can simply support first name and last name. Then, when your template creation specialists require full name, they can use a simple string concatenation operator. In a similar fashion, the costs of all of the line items on a purchase order might be summed up and displayed as a total row in a table. In both instances, the use of embedded logic avoids the need for redundant data and consequently helps maintain a more normalized (in the relational sense) data model.
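
Both examples can be sketched with a very small expression evaluator. The expression syntax below (operands joined by `+`, quoted literals, field names) is invented for this illustration and is not the embedded-logic syntax of ActiveDocs or Word; the field names and values are also hypothetical.

```python
from decimal import Decimal

def evaluate(expression, data):
    """Evaluate a tiny, hypothetical field expression.

    Operands are separated by '+'. Each operand is either a quoted
    literal or the name of a field in `data`. If every operand is a
    Decimal the result is their sum; otherwise the operands are
    concatenated as strings. Literals containing '+' are not supported.
    """
    operands = []
    for token in expression.split("+"):
        token = token.strip()
        if token.startswith('"') and token.endswith('"'):
            operands.append(token[1:-1])
        else:
            operands.append(data[token])
    if all(isinstance(v, Decimal) for v in operands):
        return sum(operands, Decimal("0"))
    return "".join(str(v) for v in operands)

data = {
    "FirstName": "William",
    "LastName": "Montgomery",
    "Line1": Decimal("20.00"),
    "Line2": Decimal("4.50"),
}
full_name = evaluate('FirstName + " " + LastName', data)
total = evaluate("Line1 + Line2", data)
print(full_name, total)
```

The data model holds only first name, last name, and the individual line amounts; the full name and the total are derived in the template and never stored.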

Document Requirements

Table 2 lists the requirements I developed for documents. These requirements address how documents handle such things as template selection and output file type.

Table 2. Document requirements

Number and Name	Description
Document 1: Template Selection	A document can be composed of an arbitrary set of templates determined at run time.
Document 2: Single Document	The final document must be a single document (not a series of documents as might happen if the templates were individually instantiated but not concatenated).
Document 3: Document File Type	The final document must be a document file type so that the text of the final document is searchable. Output must not be in any nonsearchable or image format.
Document 4: White Space	When templates are concatenated to form a new document, white space at the end of a template must be retained, implying the next template will begin on a new page. Optional requirement: The system must support an optional white space retention policy.

Output File Type

The document requirements primarily relate to the output file type of the final document. The requirement listed in Table 2 as Document 3 (Document File Type) is especially important when one considers the final document in the context of a larger overall workflow. My research found that a number of vendor products produce final documents in only image-based or printer-specific formats. Consider the difficulties that would arise if you attempted to send a Tagged Image File Format (TIFF), PostScript, Printer Command Language (PCL), Portable Document Format (PDF), or IBM Advanced Function Printing (AFP) document file type to the next stage of a workflow. While it may be possible to find moderately effective parsers for the latter four formats, it is essentially impossible to use a TIFF document file as input to any further steps in your process.

As an alternative, consider the power and flexibility ActiveDocs gains by producing final documents in Word XML Markup Language (WordprocessingML) file format. First, you can view and edit the WordprocessingML document directly in Word. Second, because the WordprocessingML format is based on the ubiquitous XML standard, you can pass the document through any number of transformations, thus enabling such possibilities as alternate presentation styles, data extraction, and full-text searches. In other words, rather than settling for an inaccessible output document, you gain a flexible new data source by using ActiveDocs to produce a fully instantiated WordprocessingML output document. As content management systems become more widely used, such ability to repurpose content becomes indispensable.
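
As a small illustration of data extraction from an XML-based output format, the sketch below pulls paragraph text out of a hand-written fragment that uses the Word 2003 WordprocessingML namespace and its paragraph, run, and text elements (`w:p`, `w:r`, `w:t`). The sample document is an assumption written for this example, not actual ActiveDocs output, and real files contain many more elements.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 WordprocessingML documents.
W = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hand-written sample for this sketch; real output is far richer.
sample = f"""<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:t>William Sterling Montgomery agrees to purchase</w:t></w:r></w:p>
    <w:p><w:r><w:t>Revisions must be </w:t></w:r><w:r><w:t>supplied in writing.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

def paragraphs(xml_text):
    """Return the plain text of each paragraph, joining its text runs."""
    root = ET.fromstring(xml_text)
    result = []
    for p in root.iter(f"{{{W}}}p"):
        result.append("".join(t.text or "" for t in p.iter(f"{{{W}}}t")))
    return result

text = paragraphs(sample)
print(text)
```

The resulting list of strings could feed a full-text index or a downstream workflow step, which is not practical with an image-only format such as TIFF.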

Template Concatenation

The requirements listed in Table 2 as Document 1 (Template Selection), Document 2 (Single Document), and Document 4 (White Space) refer to a system's ability to truly concatenate an arbitrary set of independent templates at run time. Such concatenation must retain the format of each template in the aggregate, and the concatenation must succeed, regardless of the page dimensions and layouts used within each template. The resulting document must be a single document, a single print stream, and a single job in the print queue. Compromising on these requirements results in interleaved documents in your printer tray and a versioning nightmare in your content management system.
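
One way to picture true concatenation is as a single ordered list of pages in which every page keeps its own setup. The sketch below is a hypothetical model: the `Page` class, the template names, and the `tray_runs` helper are assumptions for illustration and do not describe any product's internals.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass(frozen=True)
class Page:
    template: str  # name of the template the page came from
    size: str      # for example "letter", "legal", or "A4"

def concatenate(templates):
    """Return one ordered page list: a single document and a single job.
    Each template starts on a new page because pages are never merged."""
    return [page for pages in templates for page in pages]

def tray_runs(document):
    """Group consecutive pages by size, the information a printer would
    need to pull paper from the correct tray."""
    return [(size, len(list(run)))
            for size, run in groupby(document, key=lambda p: p.size)]

contract = [Page("contract", "letter")] * 3
addendum = [Page("addendum", "legal")] * 2
disclosure = [Page("disclosure", "letter")]

document = concatenate([contract, addendum, disclosure])
print(tray_runs(document))
```

Because the page sizes travel with the pages, the mixed-size set can be submitted as one job instead of three, which avoids interleaving with other users' output.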

Printing Requirements

Table 3 lists the requirements I developed for printing. These requirements address how printers handle such things as queuing, multibyte characters, and output stream.

Table 3. Printing requirements

Number and Name	Description
Printing 1: Single Print Stream	The print stream representation of the document must be a single stream.
Printing 2: Single Print Queue Entry	The print job associated with printing the document must exist as a single entry in the print queue to ensure there are no interleaved documents in the printer output tray.
Printing 3: Multibyte Character Support	The print stream must support multibyte characters.
Printing 4: Paper Tray Selection	The printer must be given sufficient information to pull paper from the correct paper tray when the document contains multiple page sizes.
Printing 5: Output Stream	The output stream should be in either PostScript or PCL format, to enable use of a wide variety of printers (including inexpensive models).

I discussed most of the printing-related requirements in Document Requirements, but two additional points are worth mentioning. First, some vendors may believe their products can print multibyte character sets, but they find that their products fail on non-European languages such as Japanese. Second, the hardware costs incurred by using a product that does not support smaller printers may greatly exceed the software costs for the entire document automation system. Consider your performance requirements carefully before you arbitrarily select the product that boasts the highest possible print volumes. Chances are you will never require this capacity, and the additional cost will be pure overhead.

Template Creation Interface Requirements

Table 4 lists the requirements I developed for the template creation interface. These requirements address the needs of template creation specialists for an easy-to-use interface for integrating their templates with the document automation system.

Table 4. Template creation interface requirements

Number and Name	Description
Template Creation 1: Template Creation Tool	Must allow template creation with a standard desktop publishing tool such as Microsoft Word.
Template Creation 2: Test Data	Must allow templates to be populated with test data, and must allow results to be viewed in real time on monitor and printer. Optional requirement: test support for complete contracts instead of for individual templates.
Template Creation 3: Null Values	Must allow creation and printing of null-instantiated (blank values for variable fields) templates.

The template creation interface requirements are important to making sure you can effectively use your document automation system and thus justify its cost. Most designers of document automation systems forget this fact when they design their template creation interface. Use of a proprietary template creation interface can cause a number of undesirable results: poor user acceptance, significant expense, limited features, and costly hiring and training. Template creation specialists usually are not programmers; they are domain experts who have learned the additional skill of integrating their template designs into the system. As such, the template creation interface should be as intuitive and familiar as possible. A proprietary interface may intimidate users and reduce the likelihood that they will independently investigate any advanced functionality that may be available. While a particular proprietary tool may offer specialized functionality, such functionality is essentially nonexistent if the user is unable or unwilling to learn how to use it.

The problems with a proprietary interface do not end with user acceptance and proficiency issues. Assuming your users would accept a non-standard interface, you would still incur greatly increased training costs, compounded by difficulties in hiring new staff and difficulty ensuring that current staff are backed up during emergencies. Vendors who choose to produce a proprietary interface instead of using an established product pass along to their clients the expense associated with training and maintenance. In my opinion, template creation tools such as Word, which are widely used and for which functionality is regularly updated, are a better choice in terms of cost and usability. With a proprietary system, the buyer ultimately is left with a more costly, less functional legacy interface.

User acceptance was the most important issue on my personal list of requirements for the template creation interface. As a result, nothing pleased me more than to find that ActiveDocs had chosen the highly functional and widely used Word as its template creation tool. Many template creation specialists likely already accept and have a degree of proficiency with the Word interface. Simply by choosing to use Word instead of a proprietary interface, the ActiveDocs product fulfilled the majority of my template-specific requirements. In my opinion, Word has outstanding font support, page layout support, image support, table support, and more. A common interface is imperative, and Word fulfills this requirement.

Technical Requirements

Table 5 lists the technical requirements I developed. These requirements address the extensibility and accessibility of the architecture.

Table 5. Technical requirements

Number and Name	Description
Technical 1: Server-side Execution	The complete document automation process must execute on the server side without the need for a logged-in user on the server computer.
Technical 2: API Calls	The document automation engine must expose a thread-safe API callable from architectures based on standard protocols (such as J2EE/EJB, DCOM, Microsoft .NET Remoting, SOAP). The preferred protocol is SOAP.
Technical 3: XML Data Format	The preferred format for sending data to the compilation engine is an XML document containing both the actual data values and an ordered list of the required templates (templates can be identified either with handles or with the template file name).
Technical 4: Compilation Service	The compilation engine should run as a service.
Technical 5: API Synchronicity	The compilation engine should support both synchronous and asynchronous APIs. The asynchronous API should accept the XML description of the job and should return the file name of the completed document. Asynchronous support could be handled by polling, but callback support is preferable.
Technical 6: Multithreading	The compilation engine should be multithreaded, with the ability to support multiple CPUs.
Technical 7: Message Queuing	The compilation engine (or an additional dispatching service) should handle message queuing to ensure robust and reliable job transfer from clients to service.
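
To make Technical 3 concrete, the following sketch builds a job description that carries both an ordered template list and the field values. The element and attribute names (`job`, `templates`, `template`, `file`, `fields`, `field`, `name`) and the file names are invented for this illustration; they are not the ActiveDocs schema.

```python
import xml.etree.ElementTree as ET

def build_job(template_files, values):
    """Build a hypothetical job description as an XML string.
    The order of template_files is preserved in the output."""
    job = ET.Element("job")
    templates = ET.SubElement(job, "templates")
    for name in template_files:
        ET.SubElement(templates, "template", {"file": name})
    fields = ET.SubElement(job, "fields")
    for name, value in values.items():
        field = ET.SubElement(fields, "field", {"name": name})
        field.text = value  # special characters are escaped on output
    return ET.tostring(job, encoding="unicode")

job_xml = build_job(
    ["cover.xml", "contract.xml"],
    {"BuyerName": "Pat Lee", "SellerName": "Smith & Sons"},
)
print(job_xml)
```

Building the message with an XML library instead of string concatenation means characters such as `&` in a field value are escaped correctly, regardless of which platform produces the job.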

With regard to extensibility and accessibility of the architecture, you should know that many document automation systems are not server-based products. While such products may add a substantial amount of functionality beyond a standard mail-merge, they are clearly not candidates for integration into a larger, enterprise-class system. After you establish the need for a server-based product, the next obvious requirement is scalability. As your printing demands grow, the product must allow you to add servers functioning in a clustered capacity. It is most cost-effective if the compilation service running on each of those servers can use multiple CPUs by taking advantage of an intelligently designed multithreaded architecture.

The requirement listed in Table 5 as Technical 2 (API Calls) relates to the communication protocols exposed by the automation system. Typical options might include Distributed COM (DCOM), Java 2 Platform, Enterprise Edition (J2EE) with Remote Method Invocation (RMI), Microsoft .NET Remoting, proprietary sockets, Microsoft Windows Message Queuing (also known as MSMQ), or SOAP. Given the wide variety of legacy systems that may ultimately make use of the document automation functionality, my preference ran towards SOAP. With the release of the Web-services technology (code named "Indigo") from Microsoft, many of the performance concerns surrounding the use of a service-oriented architecture will be minimized. In my particular application, I was able to communicate from a J2EE business-logic tier to the ActiveDocs Web service by using simple SOAP messages. The interface code took one day to write and worked flawlessly. My only remaining concern related to the reliability of the communication process. This process could be enhanced with the use of real queues, but I was satisfied that I could use ActiveDocs, without any modification, to queue pending jobs.

System Performance and Licensing Costs

You should consider system performance and licensing costs in addition to the other requirements listed in this article. Following are checklists for each of these areas.

Performance Checklist

System should support at least your required number of simultaneous users.
System should support production of no fewer than your required number of contracts per month, with each contract averaging your required number of pages.
Peak use of the system requires support for your required number of pages per minute.

Licensing Checklist

Cost per CPU (production and backup systems)
Cost per server (production and backup systems)
Cost based on print volume (production system)
Cost per connected client access license
Cost per disconnected (or intermittently connected) client access license
Cost per Web user (if broken out separately from disconnected users)
Cost per developer and test versions of the compilation engine
Cost per user (template specialist) for the template creation tool
Cost per developer for the template creation tool
Training costs per developer
Training costs per template specialist
Maintenance costs (yearly, repeated) for each component
Cost for development tools
Cost for additional components not supplied by the primary vendor, such as print queue managers, reliable messaging services, and the like
Minimum cost per printer (if system places restrictions on allowable printers)
Cost per end user (if viewing and printing of final contract cannot be done directly in a browser, PDF viewer, or other available free viewer)

The performance and licensing checklists are provided as a reminder of the importance of including such details in your requirements document. Each application will have different budgetary constraints and performance needs. If possible, validate your vendors' performance claims against your own documents. Use of different languages, extensive use of tables or images, or any of your other requirements may greatly affect the performance results for your particular application. My analysis configuration consisted of the ActiveDocs compilation engine run in clustered mode across two servers with four CPUs each. Vendors continuously improve their performance metrics, and ActiveDocs specifically has come out with a new high-performance print module. You can use the following numbers from my analysis configuration as a general guideline:

System should support at least 500 simultaneous users.
System should support production of no fewer than 6,000 contracts per month, with each contract averaging 25 pages.
Peak use of the system requires support for 1,000 pages per minute.
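
It can help to turn guideline figures like these into a quick capacity check before comparing vendors. The sketch below is only illustrative arithmetic on the three numbers above; the class and method names are my own and are not part of any product. It shows that 6,000 contracts of 25 pages each come to 150,000 pages per month, and that at the peak rate of 1,000 pages per minute the whole monthly volume would take 150 minutes, which suggests the peak figure is sized for bursts of demand rather than for average load.

```java
class CapacityCheck {
    /** Total pages produced per month. */
    static long pagesPerMonth(long contractsPerMonth, long avgPagesPerContract) {
        return contractsPerMonth * avgPagesPerContract;
    }

    /** Minutes needed to produce a month's pages if the engine ran at its peak rate. */
    static double minutesAtPeak(long pagesPerMonth, long peakPagesPerMinute) {
        return (double) pagesPerMonth / peakPagesPerMinute;
    }

    public static void main(String[] args) {
        long pages = pagesPerMonth(6_000, 25);         // 6,000 contracts x 25 pages
        double minutes = minutesAtPeak(pages, 1_000);  // at 1,000 pages per minute
        System.out.println(pages + " pages per month");        // prints "150000 pages per month"
        System.out.println(minutes + " minutes at peak rate"); // prints "150.0 minutes at peak rate"
    }
}
```

Substituting your own contract volume, page count, and peak rate gives a first estimate to test against a vendor's published numbers.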

In my situation, price was certainly a criterion, but it was not the deciding factor. The system I hoped to replace was already the most expensive available, and I could simply have continued to maintain it if no suitable alternative could be found. With financial concerns pushed aside, I was free to focus on the more important elements of all available systems: function points, architectures, usability, extensibility, APIs, performance, and so forth. Ultimately, the ActiveDocs solution was the only product that met all of my requirements, and it was also one of the most reasonably priced.

Design Examples

While the Keylogix Web site has many excellent examples of applications based on ActiveDocs, the design depicted in Figure 2 shows an architecture that may be more applicable to organizations with existing applications and user interfaces. From a requirements perspective, my implementation required all of the functionality detailed in this article, and ActiveDocs was the only product that filtered through my elimination process. I have little doubt that my situation is mirrored in many different departments and corporations around the globe. That is why I believe it is important to point out that the overall cost of replacing my existing system, retraining existing users, and developing the necessary interface routines for introduction of ActiveDocs still produced a cost savings when compared with maintenance of the existing architecture. The new ActiveDocs architecture, shown in Figure 2, also yielded a far more elegant and streamlined system.

Figure 2. High-level document automation architecture

The document automation solution shown in Figure 2 consists of four primary components: the ActiveDocs software with a document specialist interface based on Microsoft Word, the ActiveDocs compilation engine, a J2EE application server, and a J2EE browser-based interface. In this example, an international salesperson uses the browser interface to access the ActiveDocs application on the J2EE application server. The salesperson moves through a series of wizard-style screens to select optional product attributes, and can browse the collection of contract-specific data values. The J2EE application server uses the information collected from the salesperson to build a SOAP message containing an ordered list of template names, as well as an extensive list of name-value pairs. This SOAP message is passed asynchronously from the J2EE application server to the ActiveDocs compilation engine. When the compilation process finishes, the compilation engine places the final contract in a location accessible from the J2EE application, and the salesperson is given a screen with a link to the new contract. At this point, the salesperson opens the contract in Word, reviews the information, and then prints the document.
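
The message described above, an ordered list of template names plus a set of name-value pairs, can be sketched with the XML APIs that ship with Java. This is an illustration under stated assumptions, not the ActiveDocs message schema: the urn:example:document-job namespace and the CompileJob, Templates, Template, Fields, and Field names are hypothetical, as are the sample template and field names. Only the SOAP 1.1 envelope namespace is standard.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CompileRequestSketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    // Hypothetical payload namespace; a real service defines its own.
    static final String JOB_NS = "urn:example:document-job";

    /** Builds a SOAP 1.1 envelope holding ordered templates and name-value fields. */
    static String buildEnvelope(List<String> templates, Map<String, String> values) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element envelope = doc.createElementNS(SOAP_NS, "soap:Envelope");
            doc.appendChild(envelope);
            Element body = doc.createElementNS(SOAP_NS, "soap:Body");
            envelope.appendChild(body);
            Element job = doc.createElementNS(JOB_NS, "CompileJob");
            body.appendChild(job);

            // Templates are appended in list order, so document order is preserved.
            Element templateList = doc.createElementNS(JOB_NS, "Templates");
            job.appendChild(templateList);
            for (String name : templates) {
                Element template = doc.createElementNS(JOB_NS, "Template");
                template.setTextContent(name);
                templateList.appendChild(template);
            }

            // Each name-value pair becomes one Field element.
            Element fieldList = doc.createElementNS(JOB_NS, "Fields");
            job.appendChild(fieldList);
            for (Map.Entry<String, String> entry : values.entrySet()) {
                Element field = doc.createElementNS(JOB_NS, "Field");
                field.setAttribute("name", entry.getKey());
                field.setTextContent(entry.getValue()); // serializer escapes & and <
                fieldList.appendChild(field);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the SOAP envelope", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("CustomerName", "Contoso & Partners"); // sample data only
        values.put("ContractCurrency", "EUR");
        System.out.println(buildEnvelope(
            List.of("MasterAgreement", "PricingSchedule"), values));
    }
}
```

Building the message through the DOM rather than by string concatenation means that values collected from the salesperson are escaped automatically, which matters when contract data contains characters such as an ampersand.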

The architecture shown in Figure 2 is very straightforward, and it met my needs perfectly. In other circumstances, it may be beneficial to print directly from the application server to a network printer, and this can certainly be accomplished. If the print job must occur without any user interaction, it would also make sense to add some form of reliable messaging. Depending on your specific needs, there are many possible architectures. The important point is that the ActiveDocs suite supports many different alternatives and is also able to generate your user interfaces.

Conclusion

Document automation is an indispensable element of any corporate technology suite. Unfortunately, few technology professionals have experience with such products, and many companies simply decide to outsource their printing needs. With the introduction of ActiveDocs, Keylogix has simplified the document automation process and opened the door to a revolution in personalized document production. Whether your firm generates form letters, sales contracts, government documents, banking statements, or any other type of printed material, ActiveDocs is likely to save you both time and money. When Microsoft first introduced the WordprocessingML standard, many people questioned its purpose and utility. After reviewing the breadth and depth of document automation solutions, I am convinced that WordprocessingML has paved the way for many innovative technologies, and Keylogix ActiveDocs may be the first.

About the Author

Joshua Greenberg, Ph.D., is the president of Altra IT Corporation, a Florida-based consulting firm specializing in software and business development for the J2EE, .NET, and Oracle platforms. Joshua has over fifteen years of experience building mainstream information systems. In addition, Joshua has continued his research on consumer-to-business systems and semantically enriched content. He can be reached at jgreenberg@altrait.com.

Additional Resources

For more information about developing custom solutions with Word, see: