Skip to main content
Mobile Data Architecture

The Architecture Journal

by Rodney Guzman

Summary: Applications that are occasionally connectedhave a reputation for being difficult to implement. The challenges are layered,and the heart of the problem is how data is managed. Each occasionally connectedclient consumes and produces data at random intervals. Islands of data areproduced when disconnected, and must be reconciled later when connectivity isrestored. Technologies exist to move data to and from the clients, however,only the simplest of data reconciliation tasks are handled without custom codeintervening. For example, SQL Replication is a fantastic technology for movingnew data from one system to another. But when a record is updated by more thanone client, should the last one in always win? Also, what happens when new dataentities are created on multiple clients, but they each represent the sameunique entity instance? Unfortunately, data models are usually too complex to behandled by generic conflict resolution processes.

This article will discuss how to build your data model tosupport an occasionally connected application. The data model supports theintroduction of a complex conflict resolution process built in .NET. A point-of-servicesample application is introduced with steep requirements, including: Web andsmart client entry points; smart clients will be occasionally connected; dataentities that span multiple tables; and a complex conflict resolution processwhich includes support for new records that can be created on the Web applicationand/or smart client application instances.

Contents

Scenario
When You Are Connected
Complexity of Never Knowing When You AreConnected
Relying On Key Connected Moments
Conflict Resolution
Logical Records
Table History
Logical Transactions
Updated Records
Inserted Records
Conflict Resolution with .NET
Conclusion
About the Author

Scenario

For the purposes of this article, a sample event managementapplication is introduced. This application is a hosted Web site with a robustdata model. The business demands that this application be accessible in remote,disconnected areas where the events are held. At the events, the company’sproducts can be purchased and attendee information updated just as they can onthe Web site. Each event lasts for potentially multiple days with thousands ofattendees. During the event, there is no regularity at which anyone can assumeto have network connectivity.

Shortly after the event is completed, the data on eachclient much be reconciled. The simplified database diagram shown in Figure 1 isreferenced as a sample subset from this application.

Bb985497.jour14_MobileDataArch_image001(en-us,MSDN.10).jpg
Figure 1. Extremely scoped down version of the SQL tables

Re-use of existing components to minimize the cost ofbuilding a smart client application has been demanded by the business. Thebusiness does not want to rebuild their business objects already abstractedaway correctly from their ASP.NET application. They want the same businessobjects to be deployed with the smart client Windows Presentation Foundation(WPF) application. SQL Express will be in use on the desktop, and the dataschema will re-used from the Web application.

When You Are Connected

You can never rely on when occasionally connectedapplications will be connected, which imposes a tremendous burden on yourapplication. An application must be connected at certain times to perform basefunctions. For our example scenario, to optimize the event experience forattendees, pre-registration information should be accessible on thedisconnected client during the event, so the device is synchronized with thecentral data warehouse prior to the event. Regardless of how the information isinterchanged, the device needs to be connected before the event to obtain thisinformation.

The initial requirements for the application included amodel that supported an application that never required the user, at any time,to be connected to perform a function. Ironically, the user has to be connectedat some point in the life cycle of using the application in order for data tobe synchronized to and from the server. The lack of control over when theapplication might be connected poses complexities that cannot be ignored. Inthe case of this application, requirements were modified to force theapplication to be connected at key moments in time.

Complexity of Never Knowing When You AreConnected

SQL Replication supports very well the concept of datainterchange with occasionally connected clients. However, if you are managing amultistep process in which each step relies on some form of connectivity,knowing where you are in the process can be challenging. In the eventmanagement application, to rely exclusively on SQL Replication, the processflow would have to resemble the following:

  1. The list of available events would be replicated to each client device. If a new event is added, or an existing one modified, it would need to be synchronized to each device.
  2. Authorization for who can see which event would be replicated to each client device. If authorization changed, it would need to be replicated.
  3. On the local device, the user selects the event to synchronize, placing a record in the EventDownload table in the local database which includes the host name of the client machine and the event to replicate.
  4. The EventDownload table is replicated back to the server.
  5. Filtered SQL Replication brings down the requested events to the user’s device.

This process flow seems simple enough but requiresconnectivity throughout the process. Imagine having a conversation with anontechnical user about why he could not see the event on his device becausethe authorization granting him access changed after he synchronized hismachine. Relying on a model where you cannot ever control connectivity can becomplicated to debug and maintain.

Relying On Key Connected Moments

Ultimately, connectivity at some point is required. Incontrast to the pure SQL replication model, a more direct approach can be takento simplify the overall processing. Enforcing connectivity at key points avoidsseveral of the round trips described in the previous section. For example,rather than synchronizing the list of events to the local database, a Webservice can be invoked to retrieve a list of authorized events the user cansynchronize with, changing the flow to:

  1. The user launches the application and retrieves a list of events via a Web service call. Only authorized events are shown to the user.
  2. The user picks an event and calls the Web service again. On the server a record is set in the EventDownload table with the host name of the user device.
  3. Filtered SQL Replication rules associate events to client host names, specifying which events will be replicated to which client device (Figure 2).

Bb985497.jour14_MobileDataArch_image002(en-us,MSDN.10).jpg
Figure 2. EventDownload table specifies which events willdownload to which client hosts

A whole host of additional information can be obtainedthrough the Web services, from the number of records to be synchronized toprogress updates on whether a replication has occurred or is complete.

Conflict Resolution

SQL Replication and ADO.NET provide no silver bullet forconflict resolution. Most of our applications are simply too complicated tohandled by out-of-the-box solutions. We cannot always rely on the “last one inwins”—typically, we need to wait for data to be accumulated from multiplesources before making a decision on what “the best” record is.

This outlook seems abysmal, but there are techniques to cutthrough this complexity. In the event management application, we rely on thestrengths of SQL Replication as a cornerstone to the solution. A perceivedstrength of SQL

Replication is that it is fantastic for managing SQLinserts. For updates, however, there were too many scenarios where we neededfine control over the conflict resolution process at scheduled intervals.

Logical Records

In the context of the event management application, alogical record for an attendee is not only the information within the Attendeetable, but all related information. Selected sessions, product purchases, andfamily members all make up the attendee logical record. If any one of thesepieces of information is changed, regardless of what table the change was madein, then the attendee logical record has been changed.

A base requirement for this application is that if anattendee’s information is touched at the smart client, a validation of theentire attendee logical record has occurred. Therefore, even if only a singletable was modified, each table’s data that represents an attendee logicalrecord has been validated as being correct at that moment in time. This posesan interest dilemma in that standard SQL replication processing cannotautomatically synchronize all table data related to a logical record if thedata has not been touched. We managed this problem by forcing a change to occurin the database to trigger the appropriate set of events—what we call “logicaltransactions”—for SQL Replication.

Table History

The existing Web application’s database has a history tableassociated to every user modifiable table in the database. The purpose of thesehistory tables is to keep a history of all changes, when the changes were made,and who made the changes. A requirement of the smart client application is thatall of the local history changes are recorded and then replicated back to thecentral data warehouse. All of the history tables are populated by a trigger onthe respective parent table. Any time an insert or update is performed on theparent table, a new history record is produced (Figure 3).

Bb985497.jour14_MobileDataArch_image003(en-us,MSDN.10).jpg
Figure 3. History tables on key parent tables allow “transactions” to bemaintained

The history tables constitute a transaction history ofchange to each table, a fact which became important to us when considering howupdates to a record can be replicated back to the central data warehouse.Instead of replicating a change to a record back to the central data warehouse,such as an update to the parent table, we could replicate the new insertedhistory records and reconcile all the information centrally in an intelligentmanner.

In our design, we would always have at least one record in ahistory table. If a record is created on the smart client, then the triggerwould fire and produce a history record. If a record was replicated centrallyto the smart client, the trigger would fire, also ensuring that a historyrecord would be present.

In order to take advantage of history records astransactions, and to support the concept of logical records where a record isrepresented across multiple tables, we needed to bind history records acrosshistory tables. This requirement resulted in the concept of logical transactions.

Logical Transactions

A logical transaction is a representation of an attendeeacross all tables at a moment in time. What was missing from the history tableswas a way to associate them together. When a change occurred in a parent table,the trigger will fire to populate the respective history table. This triggerwould now need to be modified to perform the following functions (Figure 4):

  • Make a copy of the parent table and insert it into the respective history table (as before)
  • Obtain a new transaction ID that was date and time stamped
  • For every history table that represents the logical record, update the last history record inserted with the new transaction ID

Bb985497.jour14_MobileDataArch_image004(en-us,MSDN.10).jpg
Figure 4. Logical transactions (AttendeeTransaction) keep history recordsrelated

The third point is important in that even though we made achange to a single table, and only one new history record was produced, in thatmoment of time all of the other tables that represent the logical record havebeen validated. The functionality of obtaining a new transaction ID andupdating all the latest respective history records is encapsulated in an SQLuser-defined function and inserted into each trigger.

Logical transactions solve a key problem we had when dealingwith information being updated on multiple devices. This problem stems from thecomplexity of synchronization multiple devices at different times for the sameinformation.

Consider this scenario: Device A updates Attendee X onMonday, and device B updates the exact same information for Attendee X onTuesday. In this application, because the same information was updated, theinformation updated on device B is deemed more relevant than A because it wasupdated at a later time. However, device B is synchronized on Thursday anddevice A on Friday. In the Thursday evening conflict resolution processing, noupdates from device A are known, so device B’s updates are taken. In Fridayevening’s conflict resolution processing, device A’s information is reconciled.Without any date/time stamping of the logical transaction, it is difficult tounderstand when the change occurred and whether or not device A’s informationis more or less relevant than device B’s. Logical transactions provided a cleanrepresentation of the information for all data entities in the system.

Updated Records

When an SQL table record is updated, SQL Replication can beused to move those changes from one database to another. Sounds simple enoughwhen stated this way, however, knowing if you should overwrite the changes youhave with an update from another system is not so trivial. The last-one-in-winsrule does not always apply. There may be parts of the record you wish to keeprather than just overwriting it.

Refining how one requires updated records to flow withinyour application is important in simplifying as much as possible the overalldesign. In the case of disconnected mobile applications, the flow ofinformation flows from a central server out and lives and grows temporarily oneach disconnected device, and eventually flows back to the central server. Whenexamining this flow we made some assumptions and requirements on how the eventmanagement application would operate.

Information from the central server only flowed once to themobile device. For example, when an event was requested to be synchronized, andall the attendees’ information was replicated down to the device, any updatesmade to the attendee information in the online Web application would not bereplicated to the device. Even if we intended updates to be replicated, wecould not rely on the users to re-synchronize their devices throughout theevent. However, during the events, the attendees may update their informationonline. Our design allows multiple devices and the online system to update thesame records and for reconciliation later by our .NET conflict resolutionprocess. This is enabled by only replicating what has been inserted and notupdated. Only SQL inserts (no updates) are communicated between systems. We canthen pick a time of our choosing to reconcile all the information centrally.

In the event management application, after an event has beenreplicated to a device, a replicated attendee can be updated by the WPFapplication. When the attendee record is updated, a history record is produced.With the history record, a new logical transaction is created. Both of theseare SQL inserts. What has been inserted is replicated back to the centralserver. The parent record itself is not replicated as this would overwrite therecord on the server. The .NET conflict resolution process examines the logicaltransaction queue and determines new changes are present. It examines thehistory records that have been produced and determines the best approach tointegrate those changes into the parent records.

Inserted Records

SQL inserts are much simpler to manage with SQL replicationas there is, on the surface, no conflict resolution to handle. However, whathappens when a new attendee is added to the disconnected device that alreadyexists elsewhere? In our application, an attendee could have registered in theonline application after the event information was replicated; at the event,the attendee could be added to the mobile application; and later, the attendeecould be working with another user with their own instance of the application.Multiple instances of the same attendee record can easily be created.

We began to think of records that have been inserted on theclient as not having the same rank as records inserted on the central server.When the inserted record on the client was replicated back to the server, theinformation contained in the record must be validated before becomingaccessible to the application. To prevent these replicated records from beingseen by the ASP.NET application, they would need to be tagged and excluded fromSQL queries. The .NET conflict resolution process would examine these records,determine how to integrate the information, and decide whether or not to copythe newly inserted information into another record, retire the record, or makea new attendee available in the ASP.NET application (in our example).

For each SQL table that has records that can be replicatedfrom the device to the central server, a new column was added to maintain thestate of the record. Each record could have one of the following states:created on the server, not reconciled, merged, retired, or reconciled. If therecord is created with the ASP.NET application, it would be tagged as “createdon the server.” This state tag assists with filtered replication of recordsfrom the server to the client. If a record is created on the client, then it is“not reconciled.” When this record is replicated back to the server, then it isin the same “not reconciled” state, identifying the record to the .NET conflictresolution process as a record that needs to be processed. The ASP.NET Webapplication will ignore any records in the “not reconciled” state. Onceprocessed, depending on the action taken by the conflict resolution process,the record’s state would be changed to “merged,” “retired,” “reconciled,” or“unknown.”

Conflict Resolution with .NET

The requirements of the event management mobile applicationto manage conflict resolution go beyond what can be handled with out-of-the-boxtechnologies. With logical transactions and state associated to insertedrecords, our goal was to get the data all in one spot in order to be able tomake intelligent decisions about how to handle the information. Part of thepuzzle is to determine when to actually make these intelligent decisions.Standard out-of-the-box technologies make decisions at the time the records aremoved. In contrast, we needed to have a complete picture of all the insertedrecords (with parent tables and history tables) before determining how toreconcile the information. A business process was put in place to run the .NETconflict resolution process at 1 A.M. server time to reconcile the previousday’s replicated information.

When the process begins, it obtains all logical transactionsfrom the day that process last successfully executed to the day previous to thecurrent date. An attendee may be represented by multiple logical transactionsas the information could have been updated multiple times on one or manymachines. For multiple updates for the same attendee on the same machine, onlythe last one entered is processed. From the remaining records, logicaltransactions are grouped by attendee and sorted by date and time entered.

As each logical transaction is processed, it is examined forwhether or not the attendee record was originally replicated from the centralserver or entered on the client machine. If originally replicated, then thehistory records have already been replicated back and associated correctly. Weare now free to implement as much detail as necessary in determine whatbusiness rules should be executed on the information. Each logical recordreferenced by the logical transaction has one or more tables associated to it.We can decide to simply overwrite more recent information across all tables, orone or more of the tables, or specific columns within any of the tables. Thisall depends on the business rules and can be as complex as necessary.

If a logical transaction is associated to an attendee recordthat was created on the client, then a determination must be made as to whetherthe attendee already exists in the system. At this point in time, this recordis not viewable by the ASP.NET application as it is in a “not reconciled”state. A composite key is constructed from the attendee records to uniquelyidentify them. The existing attendees are examined to see if there is a matchto the new record. One of three pathways is taken at this point: the newattendee record already exists, is brand new, or this is not enough informationto make a decision. If the attendee already exists, then all of the historyrecords associated to the attendee record in the “not reconciled” state areupdated to point to the pre-existing attendee record. The same rules then applyas if the pre-existing record had been updated. If the attendee recorddefinitely is new, then its state is changed to “reconciled” and it is madeavailable to the ASP.NET application. If no determination can be made, then therecord’s state is changed to “unknown” and an exception management Web applicationis run to determine what to do with the record.

This process is repeated for each logical transaction. Allthe work that has been done was in preparation for this process to simplify itas much as possible and to centralize all complexity into a single spot.

Bb985497.jour14_MobileDataArch_image005(en-us,MSDN.10).jpg
Figure 5. RecordState column added to remember where therecord came from and how it was processed

Conclusion

Mobile applications are challenging to build. There is nomagic framework we can all use to enable disconnected scenarios. We have powerfulplumbing, but without customization, it only solves the simplest scenarios. Theapproach described in this article is generic enough to be applied to acrossdifferent disconnected applications. Relying on the power of SQL Replication toperform only inserts allowed us to focus conflict resolution in the .NETprocess. Although much data work had to be done to augment the database schemato support it, we believe it simplified, as much as possible, the conflictresolution process.

This approach is not new, and it has limitations that makeit appropriate for only a subset of mobile applications. Many mobile scenarios,for example, require real-time resolution of data as soon as it has beenreplicated to the central server compared to the many hours of delay proposedin this solution. Unidirectional replication of only inserted records may notbe feasible, as our choice here was to centralize when and where we performconflict resolution. Data loss due to conflict resolution has also beenrelegated to manual intervention in this solution. We took advantage of therequirement to retain and replicate all changes to records centrally, whichbecame the transactions we based our conflict resolution process on. If youhave no such requirement, then a burden is placed on replicating more data thanyou may prefer or be able to do. Ultimately, the business process driving yoursolution will dictate how cavalier you can be with your choices.

About the Author

Rodney Guzman is the CTO and cofounder ofInterKnowlogy. He got his start in software systems working on submarine sonarsduring college. For seven years at SAIC, Rodney was the lead developer andarchitect on such projects as a large Java SOA HTTP/XML based Web portal on militaryhospitals throughout the country. In 1998 Rodney moved to Stellcom to work onmore Microsoft projects, including Site Server implementations and anenterprise security framework for Pacific Life, that allowed custom policiesderived from AD groups and attributes to drive personalization and security onASP Web sites. At InterKnowlogy, Rodney steers the technology direction, actingas lead architect on its largest projects, such as a large SOA implementationwith a smart client framework, creating large Microsoft Web properties (CommNetand Partner Campaign Builder), and large MOSS implementations. Rodney architectedthe WPF/MOSS Scripps Cancer application. Rodney has spoken in numerousMicrosoft events and has written numerous articles. He has sat on the CommercePAC and Microsoft Architectural Advisory Board, and is a Solution ArchitectMVP.

 

This article was published in the Architecture Journal, a printand online publication produced by Microsoft. For more articles from thispublication, please visit the Architecture Journal Web site.