Information Architecture: Do Not Model the Staplers
Dr. David Loffredo
Summary: Keep your information architecture relevant by limiting your scope, knowing your deliverable, and not letting your tools control you.
Back in the early '90s, I was discussing information modeling with someone from a large company. I was young and enthusiastic, but this person quickly put me in my place:
"Corporate brought in some information-modeling people a while back, and, a week after they got started, they were trying to model the staplers on people's desks."
I never forgot that lesson. While information architecture is a central aspect of many projects, an architect who strays into irrelevancy will never succeed. There are three areas in which it is particularly easy to stray from the path.
The people with the fascination for staplers got there because they did not know where to stop. It is crucial to set bounds on any project, and even more so on an information model, because it is so easy to keep going. Someone always wants to do a bit more; the siren song of "completeness" has drawn many off into irrelevant waters.
In my particular experience, I have seen modeling as part of ISO industrial standards. The people who were involved were all bright, and there was never a shortage of ideas—so many ideas, in fact, that to accommodate all of them would mean that nothing would ever get finished. Fortunately, the ISO procedures called for an explicit, written scope statement before any project could even be started. The scope statement clearly spelled out the things that were to be included and excluded, what the intended deliverable was to be used for, what uses were not to be considered, and so forth.
In most cases, this scope statement was the only measure of control over projects. An architect needs the wisdom to document the scope of the project clearly, and the willingness to stand by it and fight. Not every good idea must be accommodated immediately; good ideas remain good ideas after six months. Remember the adage that a complex system that works began as a simple system that worked.
Information architecture is a broad term. Within it are several layers of models—just like the three-schema architecture that is taught in database courses—and each layer has a purpose.
At the highest level is the information model, which captures knowledge about the application area and documents all of the information that the project must handle. While the scope statement documents the breadth of the problem, the information model documents the depth. Some might think of this as just project specifications, instead of a formal model; but, no matter how it is documented, it is a key part of the information architecture.
Whether the application area is computer-aided design (CAD) data or sales reports, the source of this knowledge is the "domain experts," who are essentially the people who work in this area: quirky, imprecise, jargon-slinging people who have no idea how to describe formally the information on which they work every day. An architect's job is to tease that knowledge out of them, document it, and formalize it.
The quickest way to irrelevancy is to assemble these domain experts, declare to them what their needs are (I suggest: "A complete life-cycle model of staplers"), and walk out. To avoid such a fate, stop talking. Listen to what these domain experts say before you build a mental picture of how things work. Even if none of them has a global picture, you can get a very good idea of what is important to them. Work the knowledge that is gained from them into a more formal state. Strive for completeness, non-redundancy, and accuracy, but don't worry yet about programmatic access and other implementation details.
My friend of the stapler experience apparently had never been consulted by the modelers. Those modelers had some peculiar ideas about what was important that were not shared by domain experts. As far as I could gather, they did not make much of an effort either to communicate why they thought these staplers were so important or to get much actual input from the people who worked in the company. Do not repeat this mistake. Use the information model to build consensus with the customer, and make sure that it captures the important pieces of the problem area.
As soon as everyone agrees that the information model captures the bounds of the problem, it is time to move down a level and start building a data model. The data model is tied to implementation, so the choice of tools will be strongly influenced by the technologies that are in use.
Now is the time to consider the implementation details! If you do not, you will certainly misuse the underlying system. Anyone who has seen enough different implementations can tell their favorite horror stories of misuse of database technologies, such as using relational database tables to implement a linked list, creating new tables for each customer, and other hair-raising tales.
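To make one of those horror stories concrete, here is a minimal sketch of the alternative to creating a new table for each customer: one table for all customers and one for all orders, related by a key. It uses Python's standard sqlite3 module with an in-memory database; the table names, column names, and sample data are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# One table for all customers and one for all orders, related by a key.
# The anti-pattern would be a separate orders table created for every customer.
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT NOT NULL
    );
""")

def add_customer(name):
    """Insert a customer row and return its generated key."""
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
    return cur.lastrowid

def add_order(customer_id, item):
    """Record an order; a new customer never requires a schema change."""
    conn.execute(
        "INSERT INTO customer_order (customer_id, item) VALUES (?, ?)",
        (customer_id, item),
    )

def orders_for(name):
    """Return one customer's items in the order they were entered."""
    rows = conn.execute(
        """SELECT o.item
           FROM customer_order AS o
           JOIN customer AS c ON c.customer_id = o.customer_id
           WHERE c.name = ?
           ORDER BY o.order_id""",
        (name,),
    )
    return [item for (item,) in rows]

acme = add_customer("Acme")
globex = add_customer("Globex")
add_order(acme, "stapler")
add_order(acme, "paper")
add_order(globex, "toner")
```

Adding a third customer is an INSERT, not a CREATE TABLE, so the schema stays the same size no matter how many customers arrive.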
No matter what type of model you are creating, avoid overloading it with details that lie outside of the problem scope. I cannot count the number of times that I added details to something speculatively, only to find out that I did a partial job that constrained any future work. If you do not need a telephone number, do not add it to the model, no matter how easy it would be to put it in.
Whether information model or data model, the goal is to represent the model as faithfully as possible by using the tools at hand. For example, a project's information model might describe extensive inheritance relations between some things; but, for various reasons, a relational database is the right implementation technology. SQL is the clear choice for your data model, but it would probably not be the best choice for describing the inheritance in the information model. Modeling languages document the results of information modeling; and, if you cannot capture those results with one, you should switch to something else that can.
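As a rough illustration of that gap, the sketch below states a subtype relation directly in Python classes (standing in here for an information-modeling language) and then encodes the same relation in plain relational tables, where it has to be spelled out as a second table sharing the supertype's key. The Part and MachinedPart entities, their attributes, and the table layout are all hypothetical, and this is only one of several possible mappings.

```python
import sqlite3
from dataclasses import dataclass

# Information-model view: the subtype relation is stated directly.
@dataclass
class Part:
    part_id: int
    name: str

@dataclass
class MachinedPart(Part):
    material: str

# Data-model view: plain relational tables have no subtype construct,
# so the relation is encoded as a second table that shares the
# supertype's primary key.
SCHEMA = """
CREATE TABLE part (
    part_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE machined_part (
    part_id  INTEGER PRIMARY KEY REFERENCES part(part_id),
    material TEXT NOT NULL
);
"""

def save(conn, p):
    """Write the supertype row, plus the subtype row when there is one."""
    conn.execute("INSERT INTO part VALUES (?, ?)", (p.part_id, p.name))
    if isinstance(p, MachinedPart):
        conn.execute(
            "INSERT INTO machined_part VALUES (?, ?)", (p.part_id, p.material)
        )

def load(conn, part_id):
    """Rebuild the most specific type from the two tables."""
    row = conn.execute(
        """SELECT p.part_id, p.name, m.material
           FROM part AS p
           LEFT JOIN machined_part AS m ON m.part_id = p.part_id
           WHERE p.part_id = ?""",
        (part_id,),
    ).fetchone()
    if row is None:
        return None
    pid, name, material = row
    if material is None:
        return Part(pid, name)
    return MachinedPart(pid, name, material)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save(conn, Part(1, "washer"))
save(conn, MachinedPart(2, "bracket", "aluminum"))
```

The class definitions say "a machined part is a part" in one line; the tables only imply it through a shared key and a join, which is why the SQL version works as a data model but communicates the inheritance poorly.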
There are as many modeling languages as there are programming languages, and each has vigorous advocates. An architect might have a choice—or one might be imposed upon the project, by circumstance—but it is important to recognize that each has strengths and limits. I have picked a few and listed them here.
· A plain-text document is the most basic way to document a model. While this is not really a formal description, it can be quite readable. This can be useful for information models; but it is less so for data models, because there are no corresponding implementation tools to do anything with the description.
· IDEF0 (Integration Definition for Function Modeling) is an older diagram technique that is used for documenting inputs, outputs, and controls. It is good at showing the flow of data through processes at several layers of abstraction.
· IDEF1, IDEF1X, and Natural language Information Analysis Method (NIAM) are diagram techniques that are popular in some circles. As with most diagrams, they are reasonably good at documenting basic objects and relations, but they do not really show constraints. IDEF1 is designed for information models, while IDEF1X is more appropriate for data models (primarily, relational databases).
· The EXPRESS modeling language (ISO 10303-11) is near to my heart, although it is not widely used outside of the CAD standards community. It was developed in the 1980s and 1990s to describe engineering data constructs, and can be used for either information or data models. It can capture structures, relationships, and constraints. EXPRESS has an advanced inheritance-modeling paradigm, called AND/OR inheritance, that goes beyond multiple inheritance and allows types to be combined by using logical operators. A diagram version, called Graphical EXPRESS (EXPRESS-G), is also useful for information models, but it cannot depict the full range of constraints that are possible in the lexical version.
· The Unified Modeling Language (UML) and Interface Description Language (IDL) are widely used, mainly to model application interfaces but also for some data modeling. Excellent sets of tools are available to take a model and produce useful code. For information models, it can be difficult to describe constraints.
· SQL is used, of course, to define relational databases; has an incredible number of supporting tools; and is probably the most widely used language for describing data models. Entity relationship (ER) and extended ER diagrams also are used to document tables and the key relationships between them. Your mileage might vary when you use these for information models, because it is difficult to describe concepts such as inheritance, lists, bags, sets, arrays, or unions, as well as more complex constraints.
· eXtensible Markup Language (XML) document type definitions (DTDs) and XML Schema are more recent arrivals. XML DTDs were first; they describe basic object and property data models for XML documents. XML Schema can describe more sophisticated data models, but not inheritance, although it does have a form of refinement that can simulate single inheritance.
· Web Ontology Language (OWL) is descended from knowledge-representation/AI work and has a rich set of relations, cardinality, and properties. It should be able to represent any information model. However, most knowledge-representation languages are designed to allow automated reasoning about a particular area, and they might or might not be good tools for communicating with people.
These are just a few of the many possible model-description methods. Whatever the language or tool that you use, remember that each one has strong and weak points. The right tool for documenting and communicating an information model might not be the right tool for crafting and implementing a data model.
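The point about aggregates such as lists can be sketched the same way. In an information model, "a polyline has an ordered list of points" is a single statement; in a relational data model, the ordering has to be carried by an explicit sequence column. The following is a minimal sketch under that assumption, using Python's standard sqlite3 module; the polyline example, table names, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE polyline (
        polyline_id INTEGER PRIMARY KEY
    );
    -- Tables are unordered sets of rows, so list order is stored in seq.
    CREATE TABLE polyline_point (
        polyline_id INTEGER NOT NULL REFERENCES polyline(polyline_id),
        seq         INTEGER NOT NULL,
        x           REAL NOT NULL,
        y           REAL NOT NULL,
        PRIMARY KEY (polyline_id, seq)
    );
""")

def save_polyline(polyline_id, points):
    """Store a Python list of (x, y) pairs, numbering each by its index."""
    conn.execute("INSERT INTO polyline VALUES (?)", (polyline_id,))
    conn.executemany(
        "INSERT INTO polyline_point VALUES (?, ?, ?, ?)",
        [(polyline_id, i, x, y) for i, (x, y) in enumerate(points)],
    )

def load_polyline(polyline_id):
    """Rebuild the list; without ORDER BY the row order is not guaranteed."""
    rows = conn.execute(
        "SELECT x, y FROM polyline_point WHERE polyline_id = ? ORDER BY seq",
        (polyline_id,),
    )
    return [(x, y) for x, y in rows]

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
save_polyline(1, square)
```

The sequence column is a reasonable implementation choice, but it is an implementation detail: a reader of the schema has to infer "ordered list" from a key and a query convention, which is why a different notation may serve better for the information model.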
Three main points that every information architect should remember are:
· Keep control of the scope, or you will be talking about staplers before you know it.
· Remember what the resulting model is for: Data models are meant for implementation, but information models are for documentation and communication.
· Do not let tools control your view of the world: You see a lot of nails when you are holding a hammer, but it is awfully hard to clean windows with one. Likewise, it is hard to document extensive and subtle inheritance relationships by using only SQL or ER diagrams.
If you keep these points in mind, no one will ever accuse you of modeling staplers—not even the fancy red ones.
· What is in my scope? What is out of my scope?
· Am I developing an information model or a data model?
· Is this information necessary for the project, or is it just nice to have?
· Am I really listening to the domain experts, or am I trying to convince them of something?
· Am I taking advantage of the strengths of my modeling language, or am I spending most of my time trying to work around limitations in the language?
· Booch, Grady, James Rumbaugh, and Ivar Jacobson. The Unified Modeling Language User Guide. Reading, MA: Addison-Wesley, 1998.
· Bruce, Thomas A. Designing Quality Databases with IDEF1X Information Models. New York, NY: Dorset House Publishing, 1992.
· Date, C.J. A Guide to the SQL Standard: A User's Guide to the Standard Relational Language SQL. Second ed. Reading, MA: Addison-Wesley, 1989.
· Schenck, Douglas, and Peter R. Wilson. Information Modeling the EXPRESS Way. New York, NY: Oxford University Press, 1994.
· More on OWL, XML Schema, and XML DTDs can be found on the W3C site at http://www.w3c.org/.
About the author
Dr. David Loffredo is VP of Product Development at STEP Tools, Inc. He has worked on software and standards for CAD and CAM/CNC data exchange since 1987. Dave is a tinkerer at heart, drinks entirely too much coffee, remains relatively cheerful, and is a handy fellow to have around.
This article was published in Skyscrapr, an online resource provided by Microsoft. To learn more about architecture and the architectural perspective, please visit skyscrapr.net.