Anthony Bloesch
Microsoft
Last Updated April 3, 2009
Summary: This
white paper is an overview of the "Oslo" repository's design for
supporting schema and data versioning.
Introduction
For the codename "Oslo" repository to serve the
needs of its customers, it must address several versioning issues. These issues
fall into the following categories:
- Application lifecycle management (ALM):
Facilitating team development of software in a continuous cycle of steps (that
is, definition, design, development, testing, deployment, and management).
Traditionally, Source Code Control Systems (SCCS) and build systems, like
Visual Studio Team Foundation Server, automate ALM.
- Schema evolution: Changing schemas over time in
such a way that applications continue to run unchanged or with minimal changes.
- "Oslo" product versioning: Upgrading a
customer's "Oslo" repository, from version n to version n+1, in such
a way that the data is preserved.
- Application versioning: The ability to upgrade
applications, stored in the "Oslo" repository, to new versions. The
upgrade process has the option to retain the data from older versions of the
application.
- Operational versioning: The ability to
understand operational/management changes to a deployed application.
This white paper describes current plans for how the "Oslo"
repository's design addresses these issues.
Application lifecycle management (ALM)
Modern software is developed using source code control
systems and automated build. The repository is built using codename "M",
a textual language for data that is well suited to both of these environments. "M"
allows all aspects of data definition to be captured in SCCS-friendly text
files that can be version-controlled by any SCCS. Additionally, support for
extracting information from running repositories as "M" source text
(forthcoming) will allow traditional diff-ing tools to determine how data has
evolved from its original definition.
The repository works with automated build systems. Repository
models that are defined in "M" can be statically type checked and
validated as part of a normal build process without requiring a running
database. This includes the ability to statically compile new models against
the predefined models that ship with Oslo.
Schema evolution
As technologies evolve, so must the corresponding "Oslo"
repository schemas. The "Oslo" repository must support the easy and
robust migration of "Oslo" repository-based applications to new
schema versions. Applications that use the "Oslo" repository must
continue to work with little or no change. However, evolving arbitrary database
schemas and associated applications' queries and data access layers requires
careful attention from schema developers. While the repository does not eliminate
the need for developer attention, it at least provides proven mechanisms for
dealing with the problem of schema versioning.
The strategy used to solve the schema evolution problem is
to expose data through updatable views. As schemas evolve, updatable views
remain in place but map to new structures. Existing code that accesses the
repository based on previous schemas can continue to work as schemas evolve.
If these legacy views remained in place forever, they would
make innovation difficult since schemas could evolve only in limited ways.
Thus, legacy views will be deprecated according to a defined schedule. Figure 1
illustrates how this works for a simple case.
.jpg)
Figure 1. A simple
example of how a schema change works in practice. In version x+1, support
for international addresses is added (shown in green) and support for ZIP codes
is deprecated (shown in red); then, in version x+3, support for ZIP codes is
removed. In versions x+1 and x+2, ZipCode now is mapped to PostCode and old
applications continue to work. In particular, a program developed against
version X can continue to work as before.
Developers who use an object-relational mapping framework
(ORM) that includes schema mapping facilities (such as the ADO.NET Entity
Framework) may also use their ORM's mapping layer to alter how their static
class structure projects down to the target database schema.
This approach has the following advantages:
- It mitigates most schema evolution issues.
- It allows schemas to evolve.
The principal disadvantages are:
- The implementation is more complex.
- Some SQL Server features do not work as well
with views as they do with base tables.
"Oslo" product versioning
There are essentially two approaches to upgrading the "Oslo"
repository database as a whole from one version to another: in-place update and
data transfer (that is, Extract, Transform, and Load, or ETL). In-place update
is suitable for patches and other updates where changes are minor. Because of
the risk of service and data loss, an "Oslo" repository database
should not be in-place upgraded to a new version. Rather, a new repository
should be created and the data from the original repository should be migrated
using ETL and applications migrated as appropriate.
"Oslo" repository upgrades will come with an SQL
Server Integration Services (SSIS) package to transfer the data from the
previous version of the "Oslo" repository to a new version of the "Oslo"
repository (see Figure 2).
This approach has several advantages:
- It matches how IT organizations usually upgrade
databases and other servers.
- Applications can be migrated from one version of
the "Oslo" repository to the new version in an incremental and
disciplined way.
- The risk of data and service loss is minimized
because an old version of the repository remains running until applications can
be migrated to the new version.
- By customizing the SSIS package, an organization
can also transfer its custom data along with the other "Oslo"
repository data. Since most organizations that have custom "Oslo"
repository schemas will have significant in-house expertise in developing SSIS
packages, this should minimize the customer's development costs.
.jpg)
Figure 2. An example
of transferring data from version x of the "Oslo" repository to
version x+1 of the "Oslo" repository. An SSIS package is used to
transfer the data in an old version of the "Oslo" repository to the
new version. Application1 has not yet been migrated to the new version of the "Oslo"
repository. Application2 has been migrated to the new version of the "Oslo"
repository.
Domain versioning
The data in the "Oslo" repository is often
logically grouped together based on version information. For example, the
repository models for storing information about .NET Assemblies needs to record
information about assemblies from multiple versions of the framework.
Performing fine-grained versioning of individual rows would significantly
reduce performance and complicate the data model. Moreover, the need to
logically group data is so common an idiom that the repository provides a
common mechanism for organizing data. That mechanism is called a repository
folder.
Repository
folders balance between the competing demands of ease of use, performance, and
functionality. Oslo content is organized in terms of a hierarchy of folders,
which themselves are modeled in the database. Essentially, folders segment the "Oslo"
repository's data horizontally with each row belonging to a single leaf folder.
Typically, folders will be structured analogously to how an organization thinks
about its assets (for example, /Finance/Applications/Pay Roll/2.5). After
deployment, the data in a folder can be updated in place. This is analogous to
the way a patch can be applied to an application without conceptually updating
the version number.
The relationship between different versions of items in
separate folders (for example, the .NET Framework 1.0's System.Object and the
.NET Framework 2.0's System.Object) is computed based on domain-specific
resolution rules (for example, CLR linking rules). This is natural in that it
matches whatever mechanism the domain uses to resolve references across
versions and it minimizes the performance impact of supporting the relatively
rare operation of comparing "Oslo" repository items across major
versions. Schemas will usually supply SQL functions to make this name
resolution easier. For example, the CLR schema could provide a function that
takes a method's ID and finds the best matching method in another folder.
Since the relationship between folders is not prescribed,
users are free to structure folders in arbitrary ways. However, folders will
usually have a tree structure that follows the enterprise's taxonomy of IT
assets.
Structuring the "Oslo" repository in this way has
a number of advantages over previous repository approaches:
- It is easily understood and is a good match for
how software artifacts are thought about in enterprises.
- It can be efficiently implemented.
- It supports efficient and secure scale-out and
scale-up strategies.
- It has a clean relational mapping that does not
hamper the ability to query the repository.
- It provides a natural granularity for securing "Oslo"
repository contents (that is, users are granted or denied access to an entire
folder).
- It provides a natural basis for data
partitioning.
Operational versioning
The "Oslo" repository supports storing multiple
versions of an application in different repository folders (for example,
.NET/2.0 and .NET/3.0). However, once an application is deployed, it might need
to be changed for operational reasons (for example, replacing soon-to-expire
certificates and installing security patches). In the process of maintaining
the application, operations staff or automated systems like SMS might make
incorrect changes that might not be discovered for some time (for example,
installing a certificate with too short a lifetime).
When a system fails, operations staff need to quickly
diagnose and repair the issue. To do this, operations staff need to understand
which changes have been made and which components the changes affect. Once
incorrect changes are identified, operations staff need the ability to perform
corrective compensating actions. Rolling back all changes since an incorrect
change would often remove many correct changes; thus, operations staff need the
ability to perform individual compensating actions, not just roll back all
changes after a point in time.
To help with the operational scenarios described above, the "Oslo"
repository can maintain a historical log of changes. A historical log enables
operations staff to explore and search the history of operational changes and,
for "Oslo" repository-based runtimes, issue appropriate compensating
changes (for example, restore an old Access Control List or ACL).
There are several options for collecting "Oslo"
repository change events. Enterprise versions of the "Oslo"
repository use SQL Server 2008's Change Data Capture (CDC) to track changes to "Oslo"
repository tables. "Oslo" repository provides table-valued functions
(named with the pattern [Schema].[ViewNameChanges]) over CDC to abstract away
how views map to base tables. The approach has the advantages of aligning with
the Microsoft Sync Framework and providing a simple mechanism for "Oslo"
repository clients to query for changes. The principal disadvantage is that the
approach reduces throughput since changes to the "Oslo" repository
are recorded twice. However, because CDC is based on a very mature SQL log
reader implementation, CDC is also very efficient and robust.
All versions of the "Oslo" repository also support
tracking changes with SQL Server 2008's Change Tracking support. The
disadvantage of this Change Tracking over CDC is that it records minimal
information about the change (for example, it would record that an assembly
with ID 10 was deleted, but would not record other values like the name of the
assembly that was deleted).
The "Oslo" repository implementation wraps some
CDC and change-tracking functions so that security policies can be enforced.
For example, if a user is not allowed to see the contents of a folder, then the
user should also not be able to see changes related to the folder.
Summary
The term "versioning" encompasses many aspects of
the evolution of a repository, ranging from tracking changes to the data itself
through to deploying new versions of the "Oslo" repository. Each
aspect imposes its own demands and needs careful treatment if the "Oslo"
repository is to remain useful. This document details the pragmatic strategies
that will allow the "Oslo" repository to meet these competing demands
and significantly improve the software lifecycle.
About the Author
Anthony is a principal software architect working on the
"Oslo" repository. Anthony was previously an architect and
development manager on Visual Studio Team System, the director of Visio's
software and database group and a research fellow.