Transparent Connectivity: A Solution Proposal for Service Agents


Maarten Mullender and Jon Tobey

May 2006

Applies to:
   Architecture Development
   Solution Architecture

Summary: This article proposes a solution for designing reusable service agent components, built largely on existing components, and using metadata to significantly reduce the coding effort. (22 printed pages)


Service Agent Challenges
Service Agent Solution
Service Agent Components


Increasingly, people are working in environments that are disconnected from corporate networks, or only partially connected to them. Additionally, their platforms are becoming more diverse, expanding from traditional desktop computers to notebooks, personal digital assistants (PDAs), and Smartphones. Such variability in connectivity and platforms raises caching and concurrency issues that demand significant developer effort. However, by now we understand many of these issues, and we have sufficient technology to build components from which developers can build reusable solutions.


People want to work away from their desktop, elsewhere in the office, on the road, at home, or even during holidays. This can be done only if they have access to their information either online or offline. However, in many cases, it is too cumbersome or too time-consuming to take work offline.

Both customers and partners ask for guidance, primarily for a good, reusable solution, and for when and how to use that solution. Here, I suggest a solution for designing service agents, built largely on existing technology (the Patterns and Practices group Composite Application Block), and using metadata to significantly reduce the coding effort.

To frame the discussion, the following information is excerpted from Dealing with Concurrency: Designing Interaction Between Services and Their Agents and expanded upon.

The Service Agent Role

Service agents interact with services to facilitate the interaction and make connectivity transparent. This is particularly important in smart clients that span a range of connectivity, from always connected to rarely connected, and that may run on a variety of devices, from a Smartphone to a notebook. In all of these situations, service agents deal with a set of related challenges, including:

  • Facilitating interaction with a service.
  • Facilitating transparency of connectivity.

Interacting with a service, and keeping your own state synchronized with the service's state, is difficult. Not only is it difficult, it is a problem faced by every application that wants to interact with a service—whether the application presents a user interface on a notebook or is itself another service, and whether the local state is persistent or kept in memory. In all of these cases, the logic to interact with a service can be abstracted into a service agent.

There are no constraints on where to use such a service agent. For example, you may use multiple service agents to interact with the same service. In business-to-business (B2B) scenarios, and even in enterprise application integration (EAI), it makes sense to have service agents implement the interaction—for instance, two service agents implementing the two sides of a conversation, as in Business Protocols: Implementing A Customer Complaint System.

A Service Agent Facilitates Interaction with a Service

Service agents facilitate transparency by dealing with concurrency issues and by managing integrity between requests. They also deal with dependencies between requests in offline scenarios.

Services typically interact for many purposes. They have conversations with various other services around various topics. The human resources system may communicate with the financial system around payments, with the benefits system, and with the procurement system. The order management system will communicate with one set of systems to receive orders, and with various other systems to fulfill orders. These conversations can be modeled in contracts, and service agents can help in facilitating the interaction through such a contract, or possibly through multiple such contracts. That is, there may be various service agents, each of which facilitates a certain portion of the interaction—one implementation of a service agent may focus on discussing new orders with the order management system; another may focus on discussing product availability; and yet another may discuss business information for high-level decision making. Each of these deals with specific aspects of interacting with the order management service.

Service agents may provide an abstraction over the service contract. This may be purely to insulate the client from changes in the service interaction. Amazon offers all past interfaces to its Web services, so that clients don't have to change. Few others will do the same, and often it is not practical to do so. In all other cases, the clients will have to adapt, and they can do so by changing the service agent rather than their internal logic.

Another use of this abstraction can be to adapt to a different model. The client and service may use different models of the world—for instance, because they have been developed independently, or just because we're dealing with a legacy system. The service agent brings these together by providing the mapping between these models.

Service agents can help the client formulate correct requests by doing some local checking before sending the request to the service; they may make it easier to formulate requests by offering a better-suited interface (in terms of data freshness and importance, for example); and they may make it easier to encode and transmit the request, and to handle the response. The service agent may even make authentication and encryption transparent to the client.

A Service Agent Facilitates Transparency of Connectivity

Service agents may be used in scenarios where connections are slow or intermittent, such as applications running on notebooks and mobile devices that have to deal with every possible connection state, from a fast connection, through a shaky connection, to complete disconnection. One of the service agent's goals is to deal with such irregular connectivity and provide a consistent programming model to its client. It does this through caching, or even pre-caching, information.

I heard Clemens Vasters once coin the term qualified lying, and I have been using it ever since. The service agent does everything it can to pretend that the client is connected. It may just keep information cached, but it can also execute calculations and return results as if the service were online. Such a programming model should enable the application to execute the exact same business logic, regardless of the connection state. However, that does not mean the application should never know what the state is. In some cases, the application may offer additional functionality when connected. A solution may check out documents automatically when online, and allow editing when offline, but prohibit editing of documents that have not been checked out. In other cases, the application may choose to deal with incomplete information. For example, the schedule of some participants may not be available when scheduling a meeting; for others, it might be cached, with the risk of being stale.
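To make the idea concrete, the following Python sketch shows one way a service agent could implement qualified lying: serve cached views when the service is unreachable, and refresh the cache when it is. The class, its parameters, and the status values are illustrative assumptions, not part of any existing framework.

```python
import time

class ServiceAgent:
    """Sketch of 'qualified lying': the client sees the same programming
    model whether the service is reachable or not."""

    def __init__(self, fetch, is_connected, max_age_seconds=3600):
        self._fetch = fetch                # key -> fresh view from the service
        self._is_connected = is_connected  # () -> bool
        self._max_age = max_age_seconds
        self._cache = {}                   # key -> (timestamp, view)

    def get_view(self, key):
        """Return (view, status); the client runs the same business logic
        either way, but may inspect `status` to offer reduced functionality
        when the data is cached or possibly stale."""
        if self._is_connected():
            view = self._fetch(key)
            self._cache[key] = (time.time(), view)
            return view, "fresh"
        if key in self._cache:
            stamp, view = self._cache[key]
            age = time.time() - stamp
            return view, "stale" if age > self._max_age else "cached"
        raise LookupError(f"{key!r} is not cached and the service is unreachable")
```

The client calls `get_view` identically online and offline; only the second element of the result hints at the connection state, which the application may use, or ignore, as the scenarios above suggest.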

The service agent has to manage the views on the data it obtained from the service; it has to manage the data's validity, and it has to deal with the case in which the central data was changed during the time that the client application was using it. In both the online and offline cases, you get information and use data to formulate requests. In one case, it gets there quickly; in the other, it takes a little longer.

The application still has to deal with delayed exceptions: "Sorry, you cannot accept the meeting request; the meeting has been cancelled." The application has to deal with the resulting rejections and surface conflict resolutions to the user. The service agent merely deals with the timing of these.

Ideally, we would like to have a reusable solution that addresses these challenges. We could build such a solution, for instance, by extending the Patterns and Practices group Composite Application Block work. This paper will lay out what the framework for such a solution would look like, and how we can use existing components to build it.

Service Agent Challenges

Already, our current level of connectivity has significantly impacted how we work. Increasingly, people will be working in more flexible environments, and not tied to their desk: from home, in their car, in other peoples' offices, or anywhere else. In many office buildings, many workplaces are unoccupied at any given time. This means that people need different and more flexible ways to access their information. Sooner than we may think, nearly any device will have a general packet radio service (GPRS) connection or some newer variation, and thus the potential of connectedness.

This brings a whole new set of challenges to developing smart client applications. Managing offline information without dealing with updates is a challenge in itself, as is deferred updating without having to worry about the correctness of the local data; however, combined, these pose an even greater challenge. Solution designers struggle with this on a daily basis when they are designing smart client solutions. Independent of the particular application, this challenge has many similarities across the problem space, making it suitable to extract these similarities into a framework.

The proposed solution for designing service agents will focus on three problem areas:

  • Identifying Relevant Information to Cache—Taking too much information takes too much space or is too slow; taking too little information is untenable for the user.
  • Reducing Chattiness on the Wire—Optimize the download process by packaging this download and subsequent communication.
  • Dealing with Local Change—Propagate local work (pending changes, queuing, and so on), and synchronize with the service.

Identifying Relevant Information

Automating offline information is not a "simple" matter of caching. Connectivity is a spectrum, and therefore, so is the amount of data we need to cache in order to optimize our work. Even in situations where I am in the office and connected to the network, having some of this information available locally (cached) certainly improves performance. For instance, something as relatively simple as working on a document is much faster when I have a local copy. When I am on the road, I will need to take much more information with me in order to do my job.

The simple solution of taking everything offline does not work, or at least is unattractive: local storage is limited, and the more information there is, the longer and more inconvenient synchronization becomes. Therefore, we have to be selective in what we take offline. Laptops and, certainly, PDAs don't have enough storage to hold all the information to which I have access. For domains like computer-aided design (CAD) and product lifecycle management (PLM), they don't even have enough storage for the information for everyday work.

Also, when using the telephony network, data transfer may be expensive or slow, so we must optimize what information we cache and how we deal with updates. In the not-too-distant future, devices will be more consistently connected; however, connectivity is not always equally fast or efficient. When I use the wireless phone connection, speed is not as high as when connected in the office, or even from home. The price tag for connectivity may also be higher when connecting over the air.

So it is important to synchronize most of the important information while having a fast and inexpensive connection. Then, we can retrieve the rest when needed. This needs to be automated, and it needs to work for various levels of complexity. Furthermore, we will see that these levels correlate fairly directly to roles.

Scenario 1: Static Subscription: Sales Manager

Let's call the simplest level of complexity "subscription," where the data to cache is a predictable subset. For instance, a sales manager could subscribe to a customer relationship management (CRM) system that tracks information about clients, including both "static" information such as contact information, and more dynamic information such as ongoing project details. We could supply all of this information through a subscription based on the Customer entity.

The problem becomes more complex if we expand the idea of a subscription beyond that of an individual. Even a satellite office or service center may have problems if too much information needs to be synchronized: bandwidth and latency can make synchronization expensive and raise concurrency issues, especially if the information changes often. So, for them too, it may make sense to take offline only the information that is relevant in the upcoming few days. For instance, the sales executive needs only the clients for this week, and the repair center does not need every customer in the nation, or information on every model appliance sold. They need only the customers and appliances germane to their region—a Florida service center does not need repair information for snow blowers. Again, these problems are fairly predictable, and we could define subscriptions to cache the right information in each case—in other words, we could automate subscription creation and deletion.

In this class of scenarios, the set of information to be cached is defined by users, designers, or administrators of the system. The system need not be aware of the relationships between the pieces of information, but the people who defined the subscriptions must be.
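For illustration, such a people-defined subscription might be expressed declaratively, so the system can evaluate it without understanding the relationships itself. The entity names, filter, and include list below are hypothetical, not from any real CRM schema.

```python
# Hypothetical declarative subscription for the sales manager scenario;
# all names are illustrative assumptions.
sales_manager_subscription = {
    "root_entity": "Customer",
    "filter": {"assigned_to": "current_user"},
    "include": ["ContactInfo", "OpenOpportunities", "ActiveProjects"],
}

def matches(subscription, entity_type, attributes):
    """Decide whether an instance falls inside the subscription: root
    instances must satisfy the filter; related entities qualify when
    their type is on the include list."""
    if entity_type != subscription["root_entity"]:
        return entity_type in subscription["include"]
    return all(attributes.get(key) == value
               for key, value in subscription["filter"].items())
```

A synchronization pass would then cache every instance for which `matches` returns True, without the system needing any deeper knowledge of why those pieces belong together.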

Scenario 2: Dynamic Subscription: Repair Technician

A slightly more complex situation is where users need to automatically get different subsets of the information, or would have to change the subscription on a very regular basis. Consider a traveling repairman, for instance. This person doesn't need a complete customer list, nor would it be practical. Instead, the technician can take just the relevant information for that day's customers and appliances, plus the updates for the company's Quick Fix information database concerning tips that other repairmen have learned in the field.

Specifying only the Customer entity is typically not sufficient. The technician doesn't necessarily need all the customer's opportunities, activities, open orders, history, unpaid orders, and problem reports offline. But the technician does need some information related to the Customer entity—perhaps documents that relate to the customer's service level, and the information for the reported service call. Less predictable problems—those that depend on the information itself—are more difficult. For instance, if the customer has a service contract that includes regular maintenance on its machines, it might make sense to look up those machines as well, and to combine the service call with a scheduled maintenance call. In this situation, we see that we might need to expand the tree of information to give the technician all of the information for the visit. Even in this case, we could probably write something specific for the repair technician, because, given the customer and the problem, it is very predictable exactly what needs to be taken offline, and for how long. So we can follow a schema. In this situation, we could consider each instance a new subscription, so it is not that different from the first case.

In this class of scenarios, the root instance, or root instances, changes. The system must follow the relationships between the entities in order to identify the correct set of information.

Scenario 3: Arbitrary Subscription: Information Worker

The most complex situation is when the required set of information is based on the user's current or planned work. For instance, a traveling executive, a lawyer, or even an information worker, in the course of daily events, works on information that is much less predictable—information based on the context of what they are working on. What makes this complex is that the starting point is variable. Instead of starting with a specific set of customers, or a variable set of customers, as in the previous cases, an information worker may have to decide what data is important based on an e-mail message or a phone call, for instance. The starting point for the offline process may of course be a customer, but it may also be a document, an appointment, an event that needs to be organized, or a product that is being discontinued. Identifying the set of offline information is more complex than in the previous two cases.

I can even imagine that the system will look at the appointments in the calendar, at the task list, and at many other hints that the system offers, and then determine a set of potentially relevant information. The types and relationships that are followed are not fixed—they are created at run time. I call these runtime relationships "associations." Where a hard-coded recipe would work in the second scenario, it no longer works in the third. This kind of solution is beyond the scope of this document. For more discussion of this case, see Collaborative Productivity Through Context.

If we look at the three scenarios, we see that all we are talking about here is variable levels of complexity of the same thing—identifying relevant information to cache. Also, we can see that, often, the required information depends on the user's role (see Table 1).

  • Static Subscription—A predictable fixed subset or subscription
  • Dynamic Subscription—A less predictable subset or variable subscription
  • Arbitrary Subscription—Unpredictable subset or context-driven situation

For each of these situations, we need to specify the relevant subset in order to take exactly that information offline and automate the process. Selecting all relevant related information currently needs to be done manually, and therefore, it often is just not done.

Table 1. Summary of the various scenarios

Scenario                Role                Starting Point                     Information to Cache                        Solution
Static Subscription     Sales manager       Fixed entities, fixed queries      Defined by subscription                     Possible to hard code
                                            (customer list)
Dynamic Subscription    Repair technician   Fixed entities, variable subset    Defined by following relationships          More difficult to hard code
                                            (Customer ID 1234)                 from the starting point
Arbitrary Subscription  Information worker  Flexible                           Defined by relationships and associations   Impossible to hard code

Reducing Chattiness

The cost of interaction is determined not only by the amount of data and the speed of the connection, but also by the number of interactions needed in order to get the information to the other side. Using a racecar to stock a warehouse is much less efficient than using a truck. Similarly, if we can retrieve the information in fewer requests, and thus in larger chunks, it will be faster. If we can aggregate the information on the server, and then download it, we will have one large chunk instead of many smaller ones. As a side effect, we will have reduced the information somewhat, by removing duplicate information.

However, by aggregating, we have changed the format of the information. If we do not want to change the clients, the service agent should extract the desired information from the aggregated information—that is, it should decompose the information again. This is important, because the schema of aggregated information typically changes at a rapid pace, and you do not want to have all clients change at the same rate. The clients should not take a dependency on the aggregated information.

This leads to (at least) two mappings: The aggregation service on the middle tier aggregates the individual entities and exposes its own model. The service agent on the other side consumes this model, and exposes a more-or-less independent model to the clients. A change in a back-end service does not necessarily change the aggregation, but, more importantly, it does not have to alter the model consumed by the clients—or, at least, not of all clients at once.

Aggregation can work in multiple dimensions. We can aggregate multiple entities in order to create new ones. For instance, the information that the sales manager wants to see about a customer can be combined into one big Sales Manager Customer entity that includes the customer, the order information, opportunities, activities, complaints, contacts, and so on. We can also aggregate multiple instances with shared characteristics into a single piece of information. For example, we can provide the repair technician with a list of addresses arranged by time of visit, and a list of maintenance logs for those addresses.
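As a rough sketch of these two mappings, the middle tier composes several views into one chunk, and the service agent decomposes it again before handing views to the client. The payload shape and names here are invented for illustration only.

```python
def aggregate_customer(views):
    """Middle-tier side: combine several per-entity views into one
    hypothetical 'Sales Manager Customer' payload, so the data travels
    as a single chunk instead of many small requests."""
    return {
        "customer": views["customer"],
        "orders": views.get("orders", []),
        "opportunities": views.get("opportunities", []),
        "complaints": views.get("complaints", []),
    }

def decompose(aggregate):
    """Service-agent side: split the aggregate back into the individual
    views the client already understands, so no client ever takes a
    dependency on the aggregate schema."""
    return {name: aggregate[name]
            for name in ("customer", "orders", "opportunities", "complaints")}
```

Because clients only ever see the output of `decompose`, the aggregate schema can change at its own rapid pace without forcing every client to change with it.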

Another idea is to let the client express its interest, and then have a service collect the information and, possibly on the next connection, hand it to the client. The problem is not so much that the client may be interested in other information on reconnect, but that the client may have updates for the services, and the collected information has to be updated accordingly.

Dealing with Local Change

Of course, a major issue in working with disconnected and changing local data is reconciling that local data with the service, which makes the way we use service requests to update data crucial. Keeping the local information up-to-date is not easy, even when there are no local changes, but it becomes even harder when the user makes local changes and the effect of those pending changes should be visible to the user before synchronizing. What we are seeing is that the local changes and the cache are related. (See Figure 1.)


Figure 1. View data is not current

The first problem is, of course, that the local data is not necessarily up to date. Even when directly connected to a network, latency can be an issue. For example, the technician could do a search on a scarce part and find it in stock, but if another technician ordered it since the cache was refreshed, the search results may be invalid. As connectivity becomes more tenuous and latency increases, this becomes even more important: we cannot connect to the service, or we do not want to because the connection is slow and expensive.

Another, more straightforward, challenge is that we may want to send some requests with a higher priority. Sometimes, we even want requests to wait until we are back in the office and the connection is faster, more reliable, and cheaper. The technician will want to know as soon as possible whether a part needed at the customer's site is actually available, but the message that he is finished at that site and on his way to the next can easily be sent with a slight delay. Updating the maintenance logs may wait until the technician is back.
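One way to sketch such prioritized deferral is a queue that only releases requests whose priority qualifies for the current connection. The priority levels here are an assumption (0 = send on any link, 2 = wait for a fast, cheap link), not a prescribed scheme.

```python
import heapq
import itertools

class OutboundQueue:
    """Sketch of a priority-ordered request queue for an intermittently
    connected service agent."""

    def __init__(self):
        self._heap = []
        self._tiebreak = itertools.count()  # preserves FIFO within a priority

    def enqueue(self, priority, request):
        heapq.heappush(self._heap, (priority, next(self._tiebreak), request))

    def drain(self, max_priority):
        """Release the requests that qualify for the current link (priority
        <= max_priority); everything else stays queued for a better link."""
        sent, remaining = [], []
        while self._heap:
            prio, tie, request = heapq.heappop(self._heap)
            if prio <= max_priority:
                sent.append(request)
            else:
                remaining.append((prio, tie, request))
        for item in remaining:
            heapq.heappush(self._heap, item)
        return sent
```

On an expensive wireless link the agent might call `drain(0)` and send only the parts-availability check; back in the office, `drain(2)` flushes the status updates and maintenance logs.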

The real challenge, however, is dealing with change. Does the user have to see the effect of the changes on the local data while requests are pending? If the technician signs off on the maintenance and enters the service information, the device information probably doesn't have to be updated immediately to reflect the latest update. That can wait. However, when I update a telephone number in my address book, I want that information to be available immediately. Sometimes it suffices to show the original information, and to show that there are pending actions. However, in other cases, the local information must be updated.

Because requests cannot be executed while offline, or are simply held until an appropriate connection is available, many requests may be pending. Sometimes we want to allow the user to cancel pending actions. For example, the sales rep sends in an expense report for approval, and then remembers that he needs to add another item. As long as the request is pending, that should not be a problem. However, sometimes there are dependencies. When the insurance agent sells a home insurance policy plus a car insurance policy, the pricing of the second depends on the first. He cannot just cancel the home insurance without recalculating the car insurance. This is a situation that we will not face in every service agent, but it is good to keep in mind.

Service Agent Solution

Smart client developers need service agents that optimize the amount of information to take offline by allowing the user to configure or indicate relevancy. Unlike ActiveSync offline files, or Outlook offline folders, these would use a dynamic set of information that is taken offline, rather than a static, subscription-oriented, all-or-nothing approach. By effectively picking this information and designing our service calls, we can give workers most, if not all, of the information they need in order to complete a task, and deliver other information or return results with a minimum of additional communication.

We want a service agent that is useful without any code extensions, but that can be adapted to solve more specific needs. This service agent would use metadata as described in Razorbills: What and How of Service Consumption. On the one hand, this is a simplification of IBF metadata and the SharePoint Business Data Catalog (SharePoint BDC); on the other hand, it is an extension to it. Our service agent metadata does not include any user interface–specific information, but it does require information about the relevance of relationships, cacheability of views, and the composition and decomposition of those views.

A generic service agent would probably either not solve all our needs or become too complicated to build and configure. Therefore, such a metadata-driven service agent would need to be extensible and customizable by code. This extensibility needs to be built in from the outset.

Pruning and Grafting Algorithm

So, to restate the issues of identifying relevant information to cache, the following are a few problems, in increasing order of complexity, that we would like to address:

  • There is no infrastructural support yet for taking a predictable set offline. Solution builders are on their own, and it is hard to do. For example, given a customer, we take the customer information offline, plus all of that customer's contracts, and so on.
  • There is no support for a generic and flexible approach that allows administrators, or even users, of systems to configure their caching and offlining support.
  • There is no support for a less-predictable, flexible approach, where we take documents plus everything that is referred to by those documents.

What does it mean to take related information offline? The repair technician selects a customer. Once this is determined, the service agent must then take the related service contracts offline. For each service contract, it may navigate to the devices, from there to the maintenance manuals for those devices, and to the specific instances of the devices, and then from these instances to the maintenance records, and so forth.

If we imagine a tablecloth made of lace, the offline process is like lifting up the lace a little bit at a specific point and then taking everything offline that no longer touches the table. The knots would represent the instances that we want to take offline, and the threads would represent the relationships between these instances.

Technically, a relationship means that, given the information of a source instance, such as customer Contoso, the algorithm can retrieve the related information, such as the service contracts for Contoso. For example, customer Contoso has CustomerID = 4711, and the algorithm can retrieve all service contracts where CustomerID = 4711. From there, the algorithm can follow the relationships that need the contract information. The algorithm creates the tree of entities by grafting the additional entities following the relationships.

Now, a customer may have many relationships: to orders, to invoices, to contacts, and so on. Not all of these relationships are equally important to the repair technician. The algorithm follows some of these relationships, but not others, prioritizing them based on the user's role.

The algorithm needs to stop grafting at some point. It needs to prune the virtual tree of relationships and entity instances. The algorithm grafts the tree based on the role-specific priorities, and then walks the tree of relationships up to a certain perimeter, pruning the tree when it has gone far enough. To determine what "far enough" means, we use a distance metaphor. A relationship has a certain distance. The shorter the distance, the more important the relationship. Note that distances are one way. The distance from customer to sales representative is probably shorter than the distance from sales representative to customer. Otherwise, when starting with the customer, the sales representative would be included, and there would be a good chance that all customers for that sales representative would be included as well. By increasing that distance from sales representative to customer, the chance of including all customers for that representative would be reduced; those customers would fall outside of the perimeter.

A specific solution for the second scenario, that of the repair technician, could be implemented without any metadata. The service agent would take all of the items offline through hard-coded navigation. If this is the only solution that needs to be built, this is probably the most efficient way of doing it. However, when the solution needs to be modified later, because more artifacts need to be downloaded, or when similar solutions need to be built for different roles, the number of relationships and the variation of subsets would grow quickly. Then, a hard-coded solution would become impractical.

If we look at the third scenario, the starting points of the algorithm become much more diverse. The information worker can start with an appointment with a customer, and take customer-related information offline, or he or she can start with a marketing plan for a new product, and take that information offline. In the first scenario, we have a fixed set of information that we take offline. We don't ever have to follow relationships. In the second scenario, we always use the same entity, but we vary the instances. In the third scenario, we choose different instances out of a variety of entities each time. In the second scenario, a not-so-very-flexible solution might work. In the third scenario, it wouldn't.

The major difference is that, in the first and second scenarios, we know the starting points in advance; in the third scenario, we must first determine the starting point. Once we have it, the algorithm is the same. With the metadata described in Razorbills and provided by IBF as a basis, we can make this algorithm generic. The metadata describes how to create the service requests to obtain the information, and how to use the obtained information to follow relationships and retrieve related information.

There are some difficulties here. For the third scenario to work, we would basically have to have all pertinent relationships predefined, so that from any given starting point we could walk the tree. This is not trivial, and often not sufficient. I propose introducing what I call "associations." Where a relationship is a designed recipe to get from one entity to another—for example, from a particular customer to the related service contracts—an association is a relation that is created at run time. For instance, if an information worker is working on a project proposal, he may look at some other proposals that are similar, and he may look at a number of products that he is considering including. The project proposal, the example proposals, and the products are now correlated. If the system can store this information, as described in Collaborative Productivity Through Context, the algorithm can treat these associations the same way it treats relationships, and we can automate the context scenario. Another example of an association is an IBF document. A document that contains references to specific instances—for example, a proposal document that references the customer, products, and services—exposes associations (from the document to that specific customer and those products and services). These associations can, and should, be considered by the algorithm.

It is important to note that, although the algorithm uses the concepts of references and relationships, this does not mean that the services that are accessed must also use these. The Razorbills metadata provides the mapping of a reference to a service request.
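To make the metadata role above concrete, here is a minimal sketch of how metadata might describe both the mapping of a reference to a service request and the extraction of related references from a retrieved view. The entity names, operations, and dictionary shapes are illustrative assumptions, not the actual Razorbills or IBF format.

```python
# Illustrative metadata: for each entity type, how to build a service
# request from a reference, and how to follow its relationships.
METADATA = {
    "Customer": {
        "request": lambda ref: {"operation": "GetCustomer", "id": ref[1]},
        "relationships": {
            # Given a retrieved customer view, produce references to orders.
            "orders": lambda view: [("Order", oid) for oid in view["order_ids"]],
        },
    },
    "Order": {
        "request": lambda ref: {"operation": "GetOrder", "id": ref[1]},
        "relationships": {},
    },
}

def requests_for_related(reference, view):
    """Given a retrieved view, produce the follow-up service requests
    for every relationship the metadata defines for this entity type."""
    entity_meta = METADATA[reference[0]]
    related = []
    for extract in entity_meta["relationships"].values():
        for target in extract(view):
            related.append(METADATA[target[0]]["request"](target))
    return related
```

Because both the request construction and the relationship traversal are driven by the metadata table, the algorithm itself stays generic; only the table changes per solution.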

Reducing Chattiness on the Wire

The algorithm of the previous paragraphs may lead to additional chattiness. If we just graft the tree by navigating from instance to instance, we may arrive at the same instance multiple times. For example, we may go from the customer, to the list of machines up for service, and then to the manuals. Multiple customers having the same machine would lead to multiple requests for the same manual.

This is exacerbated by the fact that, while references uniquely identify instances, the references themselves are not unique. This means that two different references can refer to the same entity (for example, I can identify the customer by CustomerID or, perhaps, by telephone number). To resolve this ambiguity, the caching mechanism must be able to deal with these different references. This should be possible after retrieving the view once. This leaves the problem that we may want to gather a batch of references before submitting a request.
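A sketch of the caching mechanism described above: once a view has been retrieved, every reference known to identify that instance becomes an alias for one canonical key, so that later lookups by a different reference form still hit the cache. The reference shapes (customer ID, telephone number) are assumptions for illustration.

```python
class ReferenceCache:
    """Cache that tolerates multiple non-unique references to one instance."""

    def __init__(self):
        self._instances = {}   # canonical key -> cached view
        self._aliases = {}     # any known reference -> canonical key

    def store(self, view, *references):
        """Store a retrieved view; all given references alias the same key."""
        canonical = references[0]
        self._instances[canonical] = view
        for ref in references:
            self._aliases[ref] = canonical

    def lookup(self, reference):
        canonical = self._aliases.get(reference)
        return None if canonical is None else self._instances[canonical]

cache = ReferenceCache()
contoso = {"name": "Contoso", "phone": "555-0100"}
cache.store(contoso, ("CustomerID", "C1"), ("Phone", "555-0100"))
# After retrieving the view once, both reference forms resolve to it.
assert cache.lookup(("Phone", "555-0100")) is cache.lookup(("CustomerID", "C1"))
```

This handles the per-instance ambiguity; batching a set of references before submitting a request remains a separate concern, as noted above.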

A minor thing to note is that references refer to, and services provide, views on entities. That means that a service request may not provide the complete entity information. This does not alter the basic working of the grafting and pruning algorithm. We can just ignore this, follow the relationships from view to view, and ignore the fact that multiple views may be related to the same entity. Alternatively, whenever entity information needs to be retrieved, the complete (for the role) information can be retrieved.

Knowing about entities that are encapsulated in the services and their concrete views allows us to request aggregated information from the service, and it allows us to construct the individual views in the service agent. The aggregated information causes less chattiness on the wire, and being able to construct the individual views and provide these to the consumer makes the service agent transparent. This aggregation can be a combination of views on a single instance, or it can be a combination of multiple individual instances into a list of instances. For instance, a given work order would consist of all of the relevant customer information, and all of the relevant information for each machine, as well as other possible machines on-site that are due for maintenance. The day's work might consist of several such orders all downloaded at once, rather than as separate actions.

The service agent will have to untangle or decompose these aggregations, and be able to provide the original views. I see the Decomposer as a part of the caching mechanism. The cache may store the aggregated views on instances, and that is probably most efficient. The cache stores the aggregated entity or view information, and it should be able to return not only lists and instances, but also all views that can be created from the local information. This implies that the cache, with its Decomposer, must understand the references and the relationships between the schemas. The Decomposer must also be able to resolve all references to instances, and it must be able to make a best guess at lists. That is, it's impossible to be certain that a list can be retrieved correctly from cache, unless the query has been executed and cached as such. For instance, you might be able to return "the customer's orders" from cache, but you would not necessarily be able to return "this week's top ten orders," unless you had cached that list or a superset of it. This is a generic caching problem that was not introduced by this solution proposal, but neither is it addressed by it.
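The decomposition step can be sketched as a projection: the cache stores one aggregated view per instance, and the Decomposer reconstructs each narrower view by selecting its subschema. The view names and field lists below are illustrative assumptions, not a real schema.

```python
# Illustrative view schemas: each named view is a subset of the fields
# carried by the aggregated ("uber") view stored in the cache.
VIEW_SCHEMAS = {
    "CustomerAddress": ["customer_id", "street", "city"],
    "CustomerContact": ["customer_id", "name", "phone"],
}

class Decomposer:
    def __init__(self, aggregated_views):
        self._store = aggregated_views  # reference -> aggregated view

    def get_view(self, reference, view_name):
        """Project the cached aggregated view onto one view schema."""
        aggregate = self._store[reference]
        return {field: aggregate[field] for field in VIEW_SCHEMAS[view_name]}
```

One aggregated retrieval thus serves every view the consumer is entitled to, which is what makes the coarse-grained request transparent to the client.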

Propagating Local Work

While, with current technology, create, read, update, delete (CRUD) is often still the best approach, ideally we want to provide a less limiting alternative to CRUD. The idea is not new, and descriptions can be found, for example, in "Using a Task-Based Approach," in Smart Client Developer Center: Chapter 4 - Occasionally Connected Smart Clients, and in "Journal" in Dealing with Concurrency: Designing Interaction Between Services and Their Agents.

The idea is that the information from the service is cached locally, and that requests for updates are queued. The queue is processed whenever there is a connection, and the cache is synchronized with the service only after the queue processing has finished. We often refer to this mechanism as "caching and queuing." Caching and queuing avoids having to deal with merge replication.

I want to add a few mechanisms on top of this idea, to make it more user-friendly. First, I want to introduce what I call a "journal." The word is taken from financial accounting, where journals have been used as structured request queues for general ledger posting. A journal is a structured request queue. A business request may technically consist of several more-primitive requests. The user may or may not be aware of these. The business request, or business action, can be shown to the user. We can tell the user that an expense report was submitted, without having to say that the expense report was sent to a document store, and that a task was sent to a workflow system. The journal could contain the overarching information, as well as the detailed information that the system can execute. A business action is always treated as a unit of work.
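The journal idea can be sketched as follows: one business action carries the user-friendly description plus the primitive requests that implement it, and the journal queues and sends each action as a unit of work. The class shapes and request payloads are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class BusinessAction:
    """One user-visible action, e.g. 'Submit expense report', composed of
    the primitive service requests that actually implement it."""
    description: str
    requests: list = field(default_factory=list)

class Journal:
    """Structured request queue: business actions are queued and sent
    as units of work, in order."""

    def __init__(self):
        self._entries = []

    def record(self, action):
        self._entries.append(action)

    def process(self, send):
        """On connection, relay each business action as a whole."""
        while self._entries:
            action = self._entries.pop(0)
            for request in action.requests:
                send(request)
```

The user sees only the description ("an expense report was submitted"); the primitive requests (store the document, create the workflow task) stay inside the action.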

Updating the Cache

An important question that one has to ask is whether it is necessary to update the cached information if the requests are being queued. As always, I believe the answer is, "It depends." In some cases, it will be sufficient to show that there are pending actions, and in others, the changed information needs to be available. For instance, a changed telephone number in the address book should be available immediately, especially if it is on the Smartphone. Both the local update and the showing of pending requests require some work.

The service agent should be transparent to the client. The client sends a request to the service, which the service agent intercepts. This means that the service agent itself must make any changes to the local cache. This, in turn, implies that the service agent must understand how to interpret the journal entry in order to apply those local changes. It may be possible to define some of these in metadata, but it's certainly possible to use the metadata to configure custom components. The service agent will then call these components to make these changes. This gives the service agent additional control over the cached information.

An application that displays an entity and wants to show the pending actions should be able to take the reference to the entity and ask the service agent for those pending actions. The service agent inspects the journal and provides the information, including a user-friendly description of the action.

For the synchronization mechanism to work, it is technically not necessary that there be a connection between the journal and the cache. The cache is updated after the journal is processed, and the synchronization will look at all server-side changes, regardless of where they originated. However, it may be interesting to show the user that the information displayed has not yet been processed, and that it therefore is not confirmed yet. This requires that the service agent understand the relation between the queue and the cache, and that the application understand the service agent.
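The pending-actions query described above can be sketched with a simple lookup: the application hands the service agent an entity reference and gets back the user-friendly descriptions of any queued business actions that touch it. The journal entry layout is an illustrative assumption.

```python
def pending_actions(journal_entries, reference):
    """Return user-friendly descriptions of queued business actions
    that target the given entity reference."""
    return [entry["description"]
            for entry in journal_entries
            if reference in entry["targets"]]

journal = [
    {"description": "Submit expense report", "targets": [("Report", "R7")]},
    {"description": "Change phone number",   "targets": [("Customer", "C1")]},
]
# An application displaying customer C1 can show its pending action.
assert pending_actions(journal, ("Customer", "C1")) == ["Change phone number"]
```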

Transactional Behavior

Since the journal structures the requests into business actions, the question of transactional behavior immediately comes to mind. The service agent does not require a transactional store, even though having one may simplify some of the coding. Remember, though, that every request to the service agent is a request that could have been sent to the service. The latter would not have been transactional either, in almost all cases. And, since the journal is sent either as a whole, or in chunks, to the service, processing of the journal does not require transactional behavior on the client side. The service agent does not add the transactional requirement; in fact, the journal mechanism reduces the need for a transactional store. The journal buffers the requests, and when something goes wrong on the client, the journal request is either completely added, or it may be ignored. That being said, the availability of both a transactional store and reliable messaging makes the design much simpler.


The journal structure of the queue allows us to keep track of dependencies between requests. This is very important if we want to offer the following two features:

  • Prioritization of requests
  • Cancellation of pending requests

It is relatively straightforward to add priorities to the processing of a journal. Some actions may be synchronized with high priority, others may be synchronized when enough bandwidth and processing power is available, and yet others may have to wait until a specific time, or until a specific type of connection is available. However, not all actions are completely independent. A business action may depend on a previous action that has not yet been synchronized—as a withdrawal may require a deposit to clear. Reversing the sequence may lead to errors. The journal can help track these dependencies by tracking the target entities. The second request to a specific instance depends on the first request to that same instance. A business action, through its primitive actions, may affect multiple instances, and therefore it is possible to catch side effects as well. Note, however, that the dependency is set on the business action as a whole, not on the individual primitive actions.

The journal can maintain the correct sequence of execution for interdependent actions, while offering prioritization between independent actions.

Cancellation of pending actions poses a very similar problem. Can we always offer cancellation of requests? Removing the action's journal entries cancels the request. However, there are two problems to consider:

  • Are there any actions that depend on this one?
  • Did the action make any local changes?

With the information in the journal, we can determine whether the action to be undone has other actions that depend on it. This is the same mechanism used for prioritization of requests. If there is such a dependency, undo should not be allowed.

The second question can be handled only in the context of the service agent. If the action involved local activity outside of it, such as saving a file in a specific location or updating any other local state, the service agent will not be aware of it. So, some actions cannot be cancelled by the service agent; they can be cancelled only through the client application.

However, if there is no such restriction, the caching mechanism keeps track of local changes and provides a link to the journal. This is possible because, as stated before, all changes to the cache are done within the service agent. Undoing those changes depends on the sophistication of the cache storage mechanism. If it provides complete versioning, it is certainly possible to undo the last change. However, if, for instance, it keeps only before and after images, this would be possible only if there is a single pending action against the instance. This may not even be a severe limitation, but it may be irritating enough to forgo offering undo functionality that works most of the time, but not always.
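The shared dependency-tracking mechanism behind both prioritization and cancellation can be sketched as follows: an action depends on any earlier queued action that touches one of the same instances, processing picks the highest-priority action whose dependencies have cleared, and cancellation is refused while other actions depend on the entry. All structures here are illustrative assumptions.

```python
class DependencyJournal:
    """Journal that tracks dependencies between business actions via
    their target instances, for prioritization and cancellation."""

    def __init__(self):
        self._entries = []  # dicts with id, priority, targets, deps

    def record(self, entry_id, priority, targets):
        # The new action depends on every earlier action that touches
        # one of the same instances (catching side effects as well).
        deps = {e["id"] for e in self._entries
                if set(e["targets"]) & set(targets)}
        self._entries.append({"id": entry_id, "priority": priority,
                              "targets": targets, "deps": deps})

    def cancel(self, entry_id):
        """Refuse cancellation while another action depends on this one."""
        if any(entry_id in e["deps"] for e in self._entries):
            return False
        self._entries = [e for e in self._entries if e["id"] != entry_id]
        return True

    def next_action(self):
        """Highest-priority action whose dependencies have all cleared."""
        ready = [e for e in self._entries if not e["deps"]]
        return max(ready, key=lambda e: e["priority"])["id"] if ready else None
```

Note that, as the text says, the dependency is set on the business action as a whole, not on its individual primitive actions.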

Composing Intelligent Metadata

Anything that can be defined using metadata can also be provided as code, sometimes more efficiently. It is important to have escapes in your metadata to call code, because some things cannot be defined in metadata. The service agent mechanisms call type-specific logic. Most of this logic is available in the form of metadata, but in many cases, we cannot describe the desired logic in metadata; instead, we call a method that has been configured in metadata (prescribed interfaces, auto download, services, and so on).

Say you want to describe a relationship between two entities: you have a view on one, and you want to get a reference to the other. Often you can do this with XSLT, and put that transform in the metadata. But sometimes you just need code to do it. For instance, one field may hold a composite key of ID plus date, while the reference field needs only the ID or only the date.

So a general rule would be: Do not include business logic in metadata; instead, call it from the metadata. This streamlines your schema and provides much greater flexibility. For example, you may call a service from your metadata to ensure compliance; if the regulations to which you report change (say, going from local to interstate), you can change your service without impacting the metadata.
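The "escape to code" rule above can be sketched as follows: a metadata entry either carries a declarative field mapping or names a registered component to call. The registry, entry shapes, and component names are illustrative assumptions.

```python
# Custom components registered under names the metadata can refer to.
# This one handles the composite-key case that pure metadata cannot:
# the stored key is "ID|date", but the reference needs only the ID.
COMPONENTS = {
    "split_composite_key": lambda view: view["key"].split("|")[0],
}

def build_reference(metadata_entry, view):
    """Apply a declarative mapping when the metadata provides one;
    otherwise escape to the configured custom component."""
    if "field" in metadata_entry:                          # declarative case
        return view[metadata_entry["field"]]
    return COMPONENTS[metadata_entry["component"]](view)   # escape to code

# Declarative mapping defined entirely in metadata:
assert build_reference({"field": "customer_id"}, {"customer_id": "C1"}) == "C1"
# Logic that metadata cannot express, called from the metadata:
assert build_reference({"component": "split_composite_key"},
                       {"key": "C1|2006-05-01"}) == "C1"
```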

Using this approach, we can build smart client solutions with the following characteristics:

  • They are metadata-driven
  • They allow manual override, however:
    • Defined in metadata wherever possible
    • Defined using interfaces between the infrastructure (the service agent) and the custom components that execute the specific logic
  • They support the original views on instances and lists. That is, they are transparent to the client, other than that the client needs to change its endpoint.

For more on metadata, see Razorbills: What and How of Service Consumption.

Service Agent Components


Figure 2. Service agent components

Once we have the pruning and grafting algorithm and metadata established, we can build service agents that work as follows (see also Figure 2):

  1. Based on a triggering event (for instance a subscription, or a query), the SAF identifies the data to take offline, using algorithms in the Pruner and Grafter component. This component looks at metadata entities and relationships, to decide what to take offline.
  2. The SAF then builds and executes requests to the line-of-business services to get the appropriate data. This is retrieved in one "chunk," or in an optimized number of chunks to reduce chattiness on the wire.
  3. This data is stored in the cache.
  4. The Decomposer retrieves the views that the user needs from the cache, decomposing the original call into more-granular calls as needed.
  5. Data is displayed in the user interface, where the user can change the data.
  6. Changes are stored in the journal as business actions. These changes may or may not be reflected in the cache.
  7. On connection, the business actions are relayed to the service, where business logic handles any conflicts.

Identifying Relevant Information

Taking related information offline means that, given a trigger—for instance, a service call—the customer information is taken offline, as well as all the information related to that customer. Not only do we have to get the right information, but we also have to assemble and disassemble it so that we can retrieve it in coarse-grained calls (today's service calls for our technician), and then retrieve the actual fine-grained information back from these calls (service history for machine serial number 123654). Not only does this reduce chattiness on the wire, but it also prevents dependencies that we do not want. That is, if we swap out the piece of equipment, we do not want to have to go all the way back to the customer view in order to do it. This could make any changes to that view pending, even if they are not interrelated.

To do this, we need to do three things:

  1. Define a subscription model.
  2. Find the relevant information.
  3. Automate the subscription.

Note that these build on top of each other.

Once we define the subscription model, we need a way to automatically graft and prune the tree of information, so that we can automate the subscription.

Grafter and Pruner

This component identifies the relevant information based on the length of the relationship paths from the starting point; in other words, it grafts and prunes the tree of relationships. Without grafting and pruning, we would get every instance from every system when we went to take information offline, making the process unworkable.

Let's assume, for simplicity's sake, that each entity has only a single view defined for it. The service agent will then have to retrieve the view, and then, for all relevant relationships, build the reference to the related instances. Then, each of these instances must be retrieved, and the procedure repeated, until there are no relationships within the perimeter. However, following this procedure, the same instance may be retrieved multiple times (for instance, the same customer may own multiple products of the same model, and from there on, there will be duplications). So, the algorithm has to try to avoid retrieving the same information twice.
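The procedure above can be sketched as a breadth-first walk with a perimeter: relationships are followed only up to a configured distance from the starting instance, and instances already retrieved are not fetched again. The relationship table is an illustrative assumption standing in for the metadata-driven reference building.

```python
# Illustrative relationship table: instance -> directly related instances.
RELATED = {
    "customer":  ["product-a", "product-b"],
    "product-a": ["manual-1"],
    "product-b": ["manual-1"],   # shared manual: must be retrieved only once
    "manual-1":  ["supplier"],
}

def graft(start, perimeter):
    """Collect all instances within `perimeter` relationship hops of the
    starting instance, avoiding duplicate retrieval."""
    retrieved, frontier = {start}, [start]
    for _ in range(perimeter):
        next_frontier = []
        for instance in frontier:
            for related in RELATED.get(instance, []):
                if related not in retrieved:   # prune duplicates
                    retrieved.add(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return retrieved
```

With a perimeter of 2, the shared manual is retrieved once and the supplier (three hops away) is pruned; widening the perimeter grafts it back in.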

Reducing Chattiness on the Wire

Basically, the Grafter and Pruner mixes and matches schemas based on roles, and then aggregates this into a single request. Once the request is built, we retrieve the result in a coarse-grained call and store it in the cache.

Build and Execute Requests

This component interfaces with the metadata and line-of-business (LOB) applications:

  • It is a generic proxy to the services.
  • The service access part of this may leverage the work done in IBF (metadata-driven proxy).
  • It supports combining requests, in both of the following ways:
    • Combining multiple views on a single instance into one request
    • Combining multiple instance requests into a list request

The problem does not become much more complicated if we remove the restriction of a single view for each entity. We can regard the combination of all views on a specific entity that are accessible by the consumer as one single view. We can then retrieve that aggregated view in a single operation, and thus reduce it to the original problem by adding an extra indirection. That is, there is no real difference between getting five views and getting one view; five views on a customer is the same as an aggregated single view. Combining these views would be the responsibility of the Build and Execute Requests module.

That same module would also be responsible for combining multiple requests that can be handled in a single roundtrip, in order to reduce chattiness.

A request typically looks like GetType(parameters)—for example, GetCustomer(CustomerID). This can be replaced by Get(Reference), where the reference contains an envelope that declares the type to be retrieved, and the typeinfo for the parameters, plus the reference itself. This type of request makes building infrastructure much easier, because if you work with references, you really only need one method to fetch them.
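The move from GetType(parameters) to Get(Reference) can be sketched with a small dispatch table: the reference envelope names the type and carries the typed parameters, and one generic fetch method routes the call. The handler table and parameter names are illustrative assumptions.

```python
# Per-type handlers, standing in for the real typed service operations
# (GetCustomer, GetOrder, and so on).
HANDLERS = {
    "Customer": lambda params: {"type": "Customer", "id": params["CustomerID"]},
    "Order":    lambda params: {"type": "Order", "id": params["OrderID"]},
}

def get(reference):
    """One generic fetch method: the reference envelope declares the type
    to be retrieved, and the handler interprets the typed parameters."""
    return HANDLERS[reference["type"]](reference["parameters"])

# GetCustomer(CustomerID) becomes Get(Reference):
assert get({"type": "Customer",
            "parameters": {"CustomerID": "C1"}})["id"] == "C1"
```

Infrastructure code that works only with references never needs to know the per-type signatures, which is what makes the single method so convenient.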

Dealing with Local Change

The journal and cache are interrelated. The journal is a list of actions for cached data.


Cache

This component:

  • Stores the cached information.
  • Provides the individual views.
  • Decomposes the retrieved information into those views.
  • Provides information about the reliability of the information (based on retrieval time, local changes, and, potentially, SLAs with the service).

The cache component stores the entity information (the aggregated view), and it understands the schema of the information. All requests to cache are either requests for retrieval of information based on a reference, or they are update requests that use a reference to uniquely identify the instance involved. The reference is either an instance reference, where the system may assume that such a reference uniquely identifies the instance, or it is a list reference. For example, for a company dealing with consumers, the instance reference to Customer may specify a customer ID, a telephone number, or a name and address, in order to uniquely identify the customer. Or, the list may be specified by a query or a relationship, which is then mapped to a query. That is, all customers living in Texas, all customers that spent more than $1000, or all orders for a given customer.

The cache must be able to take the reference, deduce the type of information requested (Customer), and then use the reference to find it. This may be done using a standard (pre-canned and parameterized through metadata) method (such as a search for customers where attribute x adheres to condition y), or using a non-standard method provided as code (implementing a specific interface to return a list).

This means that the cached information may be stored in a database, with the information stored as XML, and the key attributes derived from the references. In this case, the customer could be stored using the customer number, the telephone number, the name, and the address. A search for a customer can then be done by crawling the entity information and matching it against the XML. For a call center, that would mean that the customer may be identified by the phone he or she is using, or by providing the customer name and, if this leads to multiple hits, the address.


Decomposer

The Decomposer is, in some ways, the inverse of the Build and Execute Requests component, allowing fine-grained views on the cached data. Originally, I requested "get customer and everything related" from the service, but now I may want to "get customer address" from the cache. This is just using a subschema from the "über schema" that we used to compose our original request.

If you think about it, this is just entity aggregation. In entity aggregation, we get a whole heap of data for one logical entity—either from multiple sources or a single source. For instance, an ERP system will provide you with an order header and order items, and none of these have any human-readable information; therefore, we have to retrieve that too. This information is then assembled into a single schema. When accessing the EA service, we don't always want the whole entity, but just a portion of it—either because that is what we use for the activity, or because of access rights restrictions. So, we store the whole schema and make it searchable. Together, that is aggregating the view on the way in, and decomposing the view on the way out.

So, entity aggregation "only" adds access rights to the service agent pattern. In our application, the service agent has defined roles to give us our subschema. With our role-based approach to the data, we know who is retrieving the data. But if we were to open this up so that anybody could access it, we would have the same pattern as entity aggregation (EA). Local use allows a couple of simplifications, and it allows you to do related stuff (as opposed to predictable stuff). Take the unpredictability out of this, and it is an entity aggregation service. For more on entity aggregation, see SOA Challenges: Entity Aggregation.


Journal

This component:

  • Manages information about offline activities.
  • Queues and executes all requests.
  • Manages the relationships between instances and pending actions.

The journal is where you organize offline work and synchronize the journal changes back to the data. Everything in the journal is a pending action. In Dealing with Concurrency: Designing Interaction Between Services and Their Agents, I recommended structuring journal entries as business actions. Thus, journal requests combine into logical transactions, rather than unrelated individual steps, and that logical action then forms the basis for undo. The journal links these transactions to instances in the cache, and vice versa.

In the UI, you can display data that comes from a service (a few milliseconds old), from the cache, or from the modified cache. The journal may also contain actions that are not yet reflected in the cache, which you want to show as pending. The UI should distinguish these cases, so that the user knows whether the data is service data or his or her own (a pending action), because, as we have already discussed, actions can be dependent on previous actions (both local and remote). The user needs to know the status of actions, and if we modify the cache, we want to see that in the UI. Therefore, every journal action is also linked to an entity that is shown in a UI form.

Update Strategy

The cache and Decomposer components are obviously affected by the update strategy. A major decision here is whether or not to normalize the stored information. This is only important if local activities need to be reflected in the locally cached information. The advantage of normalization is obviously that the updates will automatically affect all copies of information. Imagine an order view that embeds a customer's delivery address. Any change to the delivery address would have to be reflected in this order view. When the data is normalized and stored, this would automatically happen, but it would require knowledge about the schemas and their relations. It would require a mapping to a database column for each view attribute, leading to a complete offline database design that must be maintained. This design should then also reflect the business logic (not necessarily the database schema) of the service.

The alternative would be to just store the views without trying to normalize them. Then, the business logic on the client would have to execute those multiple updates, if it were required to maintain that consistency for the user. I believe that this approach is preferable.

Once we have information stored locally, we do not have to fetch everything again the next time. Versions and timestamps have been used to solve this problem, but how would that work in the more generic case? One of the challenges is that we may have to deal with multiple views on an entity, and that the entity has a timestamp that is meaningful for some, but not for all, views.

We can propose a service interface for querying timestamps without having to retrieve all information. For example, we could build a query that includes timestamps, and we could have the service decide whether or not to return updated views. This requires more specification.
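One way to sketch the proposed interface: the client sends the timestamps of its cached views along with the query, and the service returns only the views that have changed since then. The server-side state and reference shapes are entirely illustrative assumptions; as the text says, the real interface requires more specification.

```python
# Illustrative server-side state: reference -> (timestamp, view).
SERVER_STATE = {
    ("Customer", "C1"): {"timestamp": 20, "view": {"name": "Contoso"}},
    ("Customer", "C2"): {"timestamp": 5,  "view": {"name": "Fabrikam"}},
}

def query_with_timestamps(cached):
    """cached: reference -> timestamp of the client's cached copy.
    Return only the views that are newer than the cached copies."""
    return {ref: entry["view"]
            for ref, entry in SERVER_STATE.items()
            if cached.get(ref, -1) < entry["timestamp"]}

# C1 changed since the client's timestamp 10; C2 has not changed.
updated = query_with_timestamps({("Customer", "C1"): 10,
                                 ("Customer", "C2"): 5})
assert list(updated) == [("Customer", "C1")]
```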

When working through the journal, we eventually need to synchronize our actions back up. Depending on the nature of the actions, some actions (for instance, rush-ordering a part) will be processed immediately through slow/expensive connections, whereas the rest (for instance, an address change) may wait until a cheap/fast connection becomes available.

Users will need to know the "freshness" of the data—for instance, of error messages for a client's machine. We can provide this with a timestamp. They will also need to know that data has changed offline, or that it has pending changes; we could use color coding to indicate this. This may be offered as part of the service interface and data schema, without involving the service agent.

It will also be important for the users to know whether or not a connection is available—for example, by graying out some actions or navigations, or just by showing availability in the status bar. This could be done by the service agent if the client adapts its interface to the service agent (rather than to the original service).

Based on this discussion, I recommend several strategies for working offline:

  • We can follow a CRUD approach, which I oppose for service agents. See Dealing with Concurrency: Designing Interaction Between Services and Their Agents, and CRUD, Only When You Can Afford It.
  • The simplest one would be to queue requests and not change any of the data in the cache. The consumer may see a list of pending actions for the instances on which an action was requested, and these could be color-coded.
  • The next approach is very similar to the previous one, but changes are made to the cached data. This data is then marked as being updated, and a link to the journal entry is kept.

The latter two approaches are obviously not mutually exclusive. That is, "release order" may very well be a pending action that is not reflected in the order attributes, whereas a change to a contact's telephone number should be immediately visible when the contact information is displayed.

Service Agent Façade

A service agent exposes itself just as any other service would: services are described in metadata, and similar metadata is available from the service agent (see Razorbills: What and How of Service Consumption). To the client, only the endpoint is different. That is, it is transparent to the client, other than that the client needs to change its endpoint from the service to the service agent. This means that service agents also support the original views on instances and lists. Unfortunately, the change of endpoint means that, for existing clients, it is not always possible to inject a service agent by a simple configuration change. The code change is simple, because none of the logic changes, but it is often a code change nevertheless.

The service agent does not offer local business logic, but it may be used to route requests to local business logic, depending on the connection state. This local logic may then, in turn, use the service agent to queue the requests to the central service. This would remove the burden on the client to be aware of connection state.

Using the Information Bridge Framework

Because a service agent exposes operations that are, in turn, Web methods, you could also expose the service agent to its consumers as an IBF service, and put IBF on top of it. By using IBF as the CAB, IBF becomes the data provider and nothing else. With IBF, you can also use Office as your UI, making every view a form in the UI.


Building service agents presents several challenges for automating information download and synchronization, and for doing so in a way that is transparent (and painless) to the user. However, through our work in IBF, we have laid a solid metadata foundation to build these tools. This, combined with good design practices as outlined in Dealing with Concurrency: Designing Interaction Between Services and Their Agents, CRUD, Only When You Can Afford It, and the work being done in the Composite Application Block, gives us all the tools we need to build reusable service agent components.