.gif)
by Peter Hammond
Summary: The successful exchange of data
is the most integral function of applications today, and yesterday, for that
matter. This article highlights some experiences from the viewpoints of an end
user, novice developer, professional consultant, and the development team lead
over the course of the past 10 years. I've had some fantastic successes and
disastrous failures, and gained many insights along the way. I detail how the
architecture of the past combined with the technology of today has led to
successful exchange of data over the Web. (10 printed pages)
Contents
WORMS, ARROW, and SQL Server
Web Services and CyberSavvy
The Enterprise Product Roadmap
The Emerging Business Team Tracker
Data Confidence
References
In the fall of 1994, I was transferred to Fairchild Air Force
Base in Washington State, after a four-year tour at the beautiful Aviano Air Base in
Italy. One of my responsibilities as a security police desk sergeant—a
cross between a police dispatch and 911 operator—was
to document the activities during my shift. In Aviano, this consisted of a
typed 12+ page report with 10 carbon copies, each one distributed to the
command for daily review. If any part of the report had to be fixed, the entire
report had to be retyped and carbon copies redistributed. We killed a lot of
trees. You can imagine my excitement at Fairchild when I was introduced to the
new Windows-based Security Police Automation System (SPAS) for police reports.
I would later learn that acronyms can predict trouble down the line. This
networked client/server application built on FoxPro was a wondrous thing, as
you could type out all your reports, correct any issues found, and distribute
to command digitally. No more trees would die in vain. So it seemed.
Unfortunately, as time passed and the amount of data grew, SPAS
freaked out. After spending hours typing out the detailed notes on an incident
and clicking Save, a cryptic "Index Corrupted" error would flash.
Even though your work still appeared on the screen, in reality it was gone,
although you might not know it until command pulled you back in after you had
gone home because your report was missing.
The consequences were swift. Everyone started doubting the SPAS.
Superstitions arose about what would cause the system to fail. Was it hitting
Save more than once, having a large amount of text, or standing on one leg
while you typed with one eye closed? No one knew for sure. In the end we
started printing out the entire report again, just in case. Thus began my
understanding of the importance of having confidence in data systems. Without
confidence in solid data communications, the users (and trees) pay a terrible
price.
I started to create my reports in Microsoft Word and then
cut-and-paste them into SPAS. I was getting my work done faster since I didn't
have to retype all of my entries, even if there was an "Index Corrupted"
error, and I quickly realized I had free time. The value of having technology
work for me rather than the reverse wasn't lost on me. I spent my newfound time
learning more about Microsoft's Office products.
As luck would have it, I got the opportunity to start doing light
development building computer-based training programs in Microsoft PowerPoint
for the Air Base Operations course (ABO). I created presentations with embedded
graphics, sound, and video which made the training a big hit. When we needed to
track who had completed the training, instead of using a paper signature sheet,
I wrote a testing program that would track who had attended the training.
Microsoft Excel was my first choice because that seemed to be where everyone
tracked data, from flu shots to training records to test scores, and so on.
Using Excel, I was able to define the questions as one sheet and the answers in
another, quickly build a testing form, and successfully demonstrate it to my supervisor.
All seemed well until some obvious issues arose. We posted the
Excel test to a network share to enable multiple people to take the test only
to find out it could only be opened by one person at a time for edit and
read-only mode for everyone else. I used hidden sheets to work around the
security issue of storing the questions and answers in the same file. These
measures failed as people found it faster to use the paper test forms rather
than to wait for the single file to become available and some users discovered
how to unhide the answer sheet. Yes, more trees suffered.
Little did I know that my baptism by fire into the difficulties
of using network access for multiuser tenancy and ensuring real data security
was just the tip of the iceberg.
I found Microsoft Access which promised to solve these problems
and more, with its support for multiple simultaneous users, user-based
security, plus the ease of Excel in defining tables, the ability to have
complex data schema, and the advanced form designer.
The new Access testing database was a great success and I was
granted server space on the base's CD tower server in the network center so the
entire base could complete the mandatory training and tests remotely. The
successful results quickly poured in and the training appeared to be going very
well—unfortunately, a little too well.
I happened to notice that everyone was getting a 100 percent on
all the tests the first time. During the design, I had included a creation and
modification timestamp on each user's answer. I knew something was amiss when I
ran a report using the difference between the created and modified timestamps
on the answers. Most users were finishing the 30 question test in less than 27
seconds. You can imagine my surprise, as I figured out they were cheating! As
the test questions were the same sequence, someone had created a cheat template
(on paper, I'm sure). Serving the questions randomly helped ensure the tests
were being taken legitimately. However, now that people weren't cheating, I had
a new problem: The tests were taking longer to complete, so more people were
accessing the tests at the same time. The Access database brought me to the
harsh realization of what (lack of) scalability meant.
Apparently, the CD server started throwing a "Record Lock
Timeout Error." Each one resulted in a notification bell that got pretty
annoying to the network administrators, who weren't very happy that I (a
security policemen) was playing developer. So they shut off the bell. As the
number of users grew and the number of record-locking timeout errors increased,
the inevitable occurred: My little Access database dropped the CD tower server
and lost multiple in-process tests, including those of several very unforgiving
officers. This was particularly bad on its own, however, the impact was
far-reaching as the base was in the middle of a huge deployment of both Office
95 and Windows 95 from the CD server. I solved the issue by limiting the number
of simultaneous users; however, the lessons of multiuser concurrency, data
integrity, network resources and scale were not wasted on me.
In the end, the ABO course was a big success and the training was
well received. I went on to build several more database programs with Access,
enjoying its rapid development features but always wary of its limitations.
WORMS, ARROW, and SQL Server
After I completed my second enlistment, I decided to try the
civilian sector as a professional consultant. I found a lot of work fixing and
upgrading a myriad of Excel- and Access-based systems. It seemed the world ran
on Excel, and not very well. I eventually migrated into Visual Basic which
enabled me to use Access as a back end and code around most of its issues.
I was afforded the opportunity to work on the Work Order Remote
Management System (WORMS). WORMS
was based on Microsoft SQL Server 6.5 and was an incredible system. Its scale
was huge as it serviced hundreds of users, processing thousands of records. But
the system had gone to the worms.
Apparently, WORMS was losing records at random. The loss occurred somewhere between the local SQL
database and an Oracle system of the parent company. As a stopgap measure, all
the work orders were printed out on paper in the morning, then hand-entered
into a separate Access database that was copied to corporate each day to
reconcile lost records. You guessed it, no data confidence and more trees
killed in a single day then I did in my entire military career.
My job was to baby-sit the system while its replacement, the
Automatic Remote Routing Of Workorders (ARROW), was developed. As I tracked
down the loss of the records, I was very impressed with the diagnostic
capabilities in SQL—from tracing to event
logs, I could track all the data from end to end. With this advanced level of
diagnostics, I was able to determine the cause and fix it, subsequently finding
the records and losing my job all in one fell swoop.
The fix worked so well that it drew into question whether the
customer needed the replacement ARROW system and I was quickly pulled from the
project to ensure I didn't "fix" anything else. Regardless of the
outcome, I was hooked. SQL Server was the answer I had been looking for, and
then the real questions started.
As I advanced with SQL Server, the successes followed.
Client/server-based applications were solid, easy to build, fast to
troubleshoot, and users loved them because when they hit Save, it saved.
Confidence in data equaled successful applications.
Years later, I was contracted to work on a project for tracking
training that Microsoft conducted as an intranet academy. Using Microsoft .NET
and SQL Server 2000, we quickly built a client/server application that met the
users' needs extremely well. Over time, however, the number of support issues
increased markedly as people tried to use the application remotely. Since it
had to load all the records from the server each time it was launched, the
performance over a VPN connection was dreadful. Nevertheless, users were happy
because they had confidence that their changes were being persisted
immediately, and with all the records loaded in memory, the UI was very
responsive. The application was used for years.
Eventually, the resource requirements, network connectivity
issues, and the explosion of the Internet caused the popularity of
client/server architecture to fade. N-Tier implementations, with Web services,
were hailed to be the holy grail of networked applications for online data exchange.
Web Services and CyberSavvy
We jumped on board without looking back. I say "we" as
I had started CyberSavvy, a vending company providing high-end consulting
services. I hired the best of the best, some that I had had the privilege of
working with over the years or by referral only. We focused on creating
best-of-breed solutions for Microsoft's toughest internal business problems.
Web applications were the buzz, but Web interfaces that lost all
your data when the page errored or connection went down did not build data
confidence. Instead, we focused on building smart client applications using
Microsoft .NET. They provided the rich features needed for applications that
addressed complex business problems, involving complex relational data from
many sources, requiring instant results and the ability to work both on- and
offline.
Over the last five years, we built 15 smart client applications
for Microsoft and built a strong working relationship with Microsoft's
ClickOnce, .NET, ADO,
and SQL teams. Most of these applications focused on solving issues facing
Microsoft's business and sales force.
The Enterprise Product Roadmap
One of the most interesting solutions we built for Microsoft
helps keep the field updated with new product release information.
Since the sales force often works in disconnected environments,
either in transit or at a customer's site, the necessity for offline ability
prohibited a Web application or typical client/server solution.
The requirements to quickly search, sort, and filter through all
Microsoft products for custom presentations necessitated having all the data
locally, yet kept up-to-date. The added need for intense security as certain
product information had to be filtered by granular roles required row-based
permissions.
By using SQL Server, Web services, and a smart client distributed
with ClickOnce, we achieved fantastic results in all sectors. (See Figure 1.)
Figure 1. Original architecture for the Enterprise Product Roadmap, as of .NET 2.0 Beta 1 (Click on the picture for a larger
image)
The Enterprise Product Roadmap (EPR, not to be confused with ERP)
was rapidly adopted by the field, scaled incredibly well, was secure, worked
online/offline, and was highlighted as a Microsoft showcase.
One disadvantage was trying to create and maintain a Web service design
that used multiple Web methods to select and post data to each individual table
(EPR had 20+ tables). We standardized on automated Select, Insert, Update, and
Delete stored procedures that were used to create a dynamic ADO.NET Dataset
DiffGram and serialized to XML a Web service that had just two methods.
The select Web method queried all of the data from all of the EPR
tables a user had access to and returned a server datetime that annotated when
the data was queried. Subsequent use by the same user would result in a
DiffGram dataset, containing XML that represented new or updated data, as well
as deletes.
The post Web method took a DiffGram with pending changes. These
changes were then sorted by their change type and foreign key relationships, so
that they may be committed to the SQL database in the correct sequence. That
way, we had Web methods that did not require changing if we modified our
schema. Nevertheless, as the schema was revised, we still had to revise the data
layer that these two Web methods relied upon. For each new column or table,
stored procedures, dataSet schema, and the Web service's data-layer code needed
to be revised to match; a process that we automated over time. This model
allowed for rapid updates to the smart client's data cache by using ADO.NET to
do all the hard work.
We had extraordinary success with this on many applications—that is, until we tried it with the Emerging
Business Team Tracker (EBT).
The Emerging Business Team Tracker
With many successful implementations of this architecture behind
us, we tackled EBT with great confidence. We had created automation processes
to dynamically build out the data and Web service layers from our standardized
stored procedures. Our security fully integrated with Active Directory with the
ability to filter user data at the row level. We built robust client
applications using WinForms with all the bells and whistles of highly polished
applications that users raved about. Our implementations were considered Microsoft
showcases and used by the field to help sell the power of Microsoft
technologies. Regardless of our previous successes, EBT would soon make an
important point painfully apparent. Looking back, it's clear that the majority
of our projects were report and query applications, basically read-only. EBT
was a horse of a different color, and it was about to remind me of an old
lesson.
EBT was both a reporting and a data capture program, comprising
three separate systems that we combined into a single solution. The promise of
a single application providing offline access and advanced search was greatly
anticipated by the users. We released the application to great fanfare,
confident of the success of the solution. We were completely unaware of the
horrors to come.
At first everything went very well. Most users were connected to
the corporate infrastructure while using EBT and making data changes. The
amount of changes in a single DiffGram posted to the Web service was very
small. As things progressed, more users worked offline for longer periods of
time, and soon the support issues started.
As with all of our other applications, EBT stored its data using
in-memory ADO.NET datasets. This worked relatively well for small amounts of
data. However, EBT's dataset of about 60+mb, when serialized to disk, caused a
slow launch time, as we had to deserialize all the data into memory on load. The
problems really started when we serialized it back out to disk on exit.
We utilized DiffGrams on the client in the same way as the server,
to communicate pending changes back and forth. Since pending change DiffGrams
were persisted on exit, if the operating system or applications caused a fault
before it was able to finish, the saved changes were lost. Sometimes the crash
happened after the UI closed, so the user wouldn't know there was even an issue
until the next time they launched and noticed their changes missing. We tried
to compensate for this by doing incremental saves while the application was
running but it caused various other issues, such as lags in the performance.
Regardless of how well we did it, it wouldn't handle 100 percent of the cases.
Multiuser Concurrency
The next big issue was multiuser concurrency. As the DiffGrams
were basically sequential Insert/Update/Delete actions, if any part of the
steps failed when applied to the server, the entire DiffGram failed. This
manifested itself when a user working offline for several days making huge
amounts of changes would synchronize and one of their first changes would be
rejected due to a concurrency violation because the row had been updated by
someone else while they were offline.
That caused the subsequent actions to all fail back to the
client. We tried to resolve the issue automatically but the client side was
usually so out of sync with the server, the user's changes compounded in a series
of conflicts which corrupted the local DiffGram. The user would then force a "Full
Data Reset" option, deleting their cache and pulling down a new copy. Of
course, this caused all their pending changes to be lost as well. We built more
and more diagnostics into the calls and more and more failover retry logic, but
users lost confidence that their work would really be saved.
The Worst Was Yet to Come
The applications usage and the amount of changes being applied by
both the server and the client rose exponentially. The amount of
Insert/Update/Delete changes that needed to be reconciled at the same time by
ADO.NET started to have some all too familiar issues. You can imagine my horror
when a user sent in a support issue with screen shot of a huge error dialog on
EBT with the accursed "Index Corrupted". I couldn't believe that I
was responsible for recreating SPAS.
For almost a year, I funded an effort to fix the flaws and make
EBT work— from creating asynchronous dataset
backups, to identifying several obtuse ADO.NET bugs with large scale DiffGrams,
to building advanced logging diagnostics into almost every step of the Web
service.
We started exploring memory maps and binary datasets. We wrote an
administrative utility that could reload the various individual DiffGrams from
their XML files and recreate the entire change set in memory. Then we would try
and work through the dataset changes row by row sending them one by one to the
Web service to find out which one actually violated a constraint, errored on a
security issue, or simply did not work for some indeterminate reason. Once
identified, the change(s) could be removed and then an attempt was made to
resubmit the rest of the DiffGram. This would often fail as the changes were
interdependent due to the complex structure of the data and security.
In the end, most attempts were futile; the hierarchical nature of
the errors combined with the offline aspect of the changes made it next to
impossible to diagnose the original cause of the issues without capturing the
state of both the client and the server at the same time.
We later found the same issues were actually occurring with our
previous applications as well, such as EPR. But since they used read-only data,
when a concurrency issue occurred, a new dataset would automatically be applied
over the one in error and the user was unaware that anything had happened. With
EBT, the user experience was catastrophic as the new dataset would erase any
pending changes they may have had. Support incidents were through the roof, and
most of the time the issue was too far gone to even try to fix. Users were
literally cursing us, and, worst of all, they started saving screen shots of
their data edits and then using these to reenter their changes when they were
lost. Not quite as bad as printing it to paper, but just as damaging to the
success of the system.
The more things we fixed and workarounds we applied, the more
obvious it was that we were creating custom processes to handle data to Web
service synchronization, which we had chosen not to do in favor of using the
out-of-the-box ADO.NET DiffGrams. We also couldn't focus on building the
remaining application features required by the client, which hampered user
adoption even further.
SQL Merge Replication
Searching for answers, we engaged the ADO.NET team and did
extensive reviews of the architecture. By all rights we had done exactly as
prescribed. The basic elements of the architecture worked for smaller
implementations or those that were read-only. However, using ADO.NET with
DiffGrams to communicate these changes was determined to be inadvisable due to
the combination of the size and complexity of the data, the row level security
requirements, and the offline model with data entry.
As we searched for new ways to resolve the issues encountered, it
became apparent that we needed a client-side database engine to handle the data
locally and would need to create a solid middle tier to ensure reliable
communication.
We looked into SQL replication seeing that it was essentially
what we were doing. Replication had always been represented to me as a great
idea, but a terrible reality. We engaged with the SQL team and joined the Yukon
TAP program. SQL Merge Replication to synchronize from the SQL Server to a
client SQLExpress instance seemed optimal, but would it really work?
Over the course of two months, we completely removed the various
tiers of data layer, data transformations, and Web services from the server and
client application. We then rebuilt the EBT client to work directly against the
local SQLExpress instance and it depended solely on SQL Server Replication for
the data transport.
Client/Server Simplicity, Multiuser Tenancy, True Scale and Offline Ability
Success was immediate. The user's changes were saved directly to
a local SQLExpress database bringing the same reliability to the client that
the full version of SQL brings to the server. The reduction in multiple tiers
of custom logic, and the addition of SQL's robust diagnostics and event
logging, enabled us to provide quick resolution to almost every data support
issue we received. As our users' confidence in the data's integrity increased
so did their demands on our architecture. To our great relief, SQL has stepped
up to meet every new challenge without compromise.
From EBT user on 12/23/2005:
"Sorry, but this app has little trust
worthiness at this point, and wondering if I should dump everything I do to
Word documents."
From same EBT user, five months later, on 5/1/2006, after
the new release using SQL replication:
"So far, so good. I've done almost every
combination of infrastructure usage that would typically create problems for
replication or stability, and have not had one problem. Congratulations!"
Data Confidence
We continue to find advantages with the new design. For example,
replacing ADO.NET in-memory datasets with a true query engine allowed us to
strip almost all the extremely complicated data processing code that was
necessary to represent the various "views" of the data the customer
wanted with standard SQL. Thus, we were able to leverage the DBA for client
functionality and to remove a lot of hard to maintain custom code. In some
cases, code size dropped by more than one-third.
As we progressed with SQL replication, securely transferring data
over the intranet and Internet was a concern. One of the promises of Web
services was the idea that the design could be used to communicate data over
the Web. Again SQL replication provided the answer, as it came with an
out-of-the box implementation of the Web replication component. By using the
REPLISAPI component, schema, data and updates to both can be securely and
reliably relayed over https.
We have had such great success with SQL replication that we are
now providing a Software as a Service (SaaS) product line called DataPlace (see
Figure 2). DataPlace exposes the tremendous power of SQL Server and replication
for the end users, novice developers, professional consultants, and a
development teams, without the complexities of having to set it up for
themselves.
Figure 2. DataPlace architecture (Click on the picture for a
larger image)
It provides a database factory to design, develop, deploy, and
use the power of SQL databases dynamically in a fully hosted environment. It is
the solution we now use to build all of our latest smart client applications.
This article recounts our experiences over the course of several
years and how we've spent a considerable amount of time and effort trying to
solve the complex problems relating to working with data over the web in a
multiuser, secure, scalable, confident and repeatable way. These experiences
are not uncommon among developers, and they illustrate the necessity for
dependable products and the knowledge to use them. Microsoft SQL Server 2005
Replication has provided a solid answer to these complex technical problems.
Now, my team and I can finally focus on what we started out to solve, our
customers' business problems!
References
Aviano Air Base, Italy
Fairchild Air Force Base
"Configuring and Maintaining
Replication," MSDN Library
Microsoft .NET Framework 3.0
Smart Client Community
Microsoft Case Study: Smart Client Sales Application Saves Time and Cuts Deployment Costs
CyberSavvy.NET DataPlace—Information and free trial
About the author
Peter Hammond is president of CyberSavvy,
a Microsoft Preferred Vendor. He served two enlistments in the U.S. Air Force,
during which he became interested in programming. After duty, he gained
real-world experience working in various roles from network support to development
consulting at several companies including Microsoft. In 2002, Peter recruited a
team of exceptional professionals focused on building showcase solutions with
Microsoft's latest technologies. CyberSavvy has built more than a dozen
enterprise applications, notably the Enterprise Product Roadmap, used by more
than 10,000 Microsoft employees for more than five years. Peter can be
contacted at peter@cybersavvy.biz
This article was published in the Architecture Journal, a print
and online publication produced by Microsoft. For more articles from this
publication, please visit the Architecture Journal Web site.