Anthony Bloesch
Microsoft
Last Updated November 9,
2009
Summary: This white paper is an overview of the design
of the SQL Server Modeling services repository database for supporting schema
and data versioning.
Introduction
For the SQL Server
Modeling Services repository to serve the needs of its customers, it must
address several versioning issues. These issues fall into the following
categories:
- Application lifecycle management
(ALM): Facilitating team development of software in a continuous cycle of
steps (that is, definition, design, development, testing, deployment, and
management). Traditionally, Source Code Control Systems (SCCS) and build
systems, like Visual Studio Team Foundation Server, automate ALM.
- Schema evolution: Changing schemas
over time in such a way that applications continue to run unchanged or
with minimal changes.
- Product versioning: Upgrading a
customer's repository, from version n to version n+1, in such a way that
the data is preserved.
- Application versioning: The
ability to upgrade applications, stored in the repository, to new
versions. The upgrade process has the option to retain the data from older
versions of the application.
- Operational versioning: The
ability to understand operational/management changes to a deployed
application.
This white paper
describes current plans for how the repository's design addresses these issues.
Application
lifecycle management (ALM)
Modern software is
developed using source code control systems and automated build. The repository
is built using codename "M", a textual language for data that is well
suited to both of these environments. "M" allows all aspects of data
definition to be captured in SCCS-friendly text files that can be
version-controlled by any SCCS. Additionally, support for extracting
information from running repositories as "M" source text
(forthcoming) will allow traditional diff-ing tools to determine how data has
evolved from its original definition.
The repository works
with automated build systems. Repository models that are defined in
"M" can be statically type checked and validated as part of a normal
build process without requiring a running database. This includes the ability
to statically compile new models against the predefined models that ship with the
SQL Server Modeling CTP.
Schema
evolution
As technologies evolve,
so must the corresponding repository schemas. The repository must support the
easy and robust migration of SQL Server Modeling Services applications to new
schema versions. Applications that use the repository must continue to work
with little or no change. However, evolving arbitrary database schemas and
associated applications' queries and data access layers requires careful
attention from schema developers. While the repository does not eliminate the
need for developer attention, it at least provides proven mechanisms for
dealing with the problem of schema versioning.
The strategy used to
solve the schema evolution problem is to expose data through updatable views.
As schemas evolve, updatable views remain in place but map to new structures.
Existing code that accesses the repository based on previous schemas can
continue to work as schemas evolve.
If these legacy views
remained in place forever, they would make innovation difficult since schemas
could evolve only in limited ways. Thus, legacy views will be deprecated
according to a defined schedule. Figure 1 illustrates how this works for a
simple case.
.jpg)
Figure 1. A simple
example of how a schema change works in practice. In version x+1, support for international
addresses is added (shown in green) and support for ZIP codes is deprecated
(shown in red); then, in version x+3, support for ZIP codes is removed. In
versions x+1 and x+2, ZipCode now is mapped to PostCode and old applications
continue to work. In particular, a program developed against version X can
continue to work as before.
Developers who use an
object-relational mapping framework (ORM) that includes schema mapping
facilities (such as the ADO.NET Entity Framework) may also use their ORM's
mapping layer to alter how their static class structure projects down to the
target database schema.
This approach has the
following advantages:
- It mitigates most schema evolution
issues.
- It allows schemas to evolve.
The principal disadvantages
are:
- The implementation is more
complex.
- Some SQL Server features do not
work as well with views as they do with base tables.
Product
versioning
There are essentially
two approaches to upgrading the repository database as a whole from one version
to another: in-place update and data transfer (that is, Extract, Transform, and
Load, or ETL). In-place update is suitable for patches and other updates where
changes are minor. Because of the risk of service and data loss, a repository
database should not be in-place upgraded to a new version. Rather, a new
repository should be created and the data from the original repository should
be migrated using ETL and applications migrated as appropriate.
Repository upgrades will
come with an SQL Server Integration Services (SSIS) package to transfer the
data from the previous version of the repository to a new version of the
repository (see Figure 2).
This approach has
several advantages:
- It matches how IT organizations
usually upgrade databases and other servers.
- Applications can be migrated from
one version of the repository to the new version in an incremental and
disciplined way.
- The risk of data and service loss
is minimized because an old version of the repository remains running
until applications can be migrated to the new version.
- By customizing the SSIS package,
an organization can also transfer its custom data along with the other
repository data. Since most organizations that have custom repository
schemas will have significant in-house expertise in developing SSIS
packages, this should minimize the customer's development costs.
.jpg)
Figure 2. An example of
transferring data from version x of the repository to version x+1 of the
repository. An SSIS package is
used to transfer the data in an old version of the repository to the new
version. Application1 has not yet been migrated to the new version of the
repository. Application2 has been migrated to the new version of the
repository.
Domain
versioning
The data in the
repository is often logically grouped together based on version information.
For example, the repository models for storing information about .NET
Assemblies needs to record information about assemblies from multiple versions
of the framework. Performing fine-grained versioning of individual rows would
significantly reduce performance and complicate the data model. Moreover, the
need to logically group data is so common an idiom that the repository provides
a common mechanism for organizing data. That mechanism is called a repository
folder.
Repository folders balance between the competing demands of ease of use,
performance, and functionality. SQL Server Modeling Services content is
organized in terms of a hierarchy of folders, which themselves are modeled in
the database. Essentially, folders segment the repository's data horizontally
with each row belonging to a single leaf folder. Typically, folders will be
structured analogously to how an organization thinks about its assets (for
example, /Finance/Applications/Pay Roll/2.5). After deployment, the data in a
folder can be updated in place. This is analogous to the way a patch can be
applied to an application without conceptually updating the version number.
The relationship between
different versions of items in separate folders (for example, the .NET
Framework 1.0's System.Object and the .NET Framework 2.0's System.Object) is
computed based on domain-specific resolution rules (for example, CLR linking
rules). This is natural in that it matches whatever mechanism the domain uses
to resolve references across versions and it minimizes the performance impact
of supporting the relatively rare operation of comparing repository items
across major versions. Schemas will usually supply SQL functions to make this
name resolution easier. For example, the CLR schema could provide a function
that takes a method's ID and finds the best matching method in another folder.
Since the relationship
between folders is not prescribed, users are free to structure folders in
arbitrary ways. However, folders will usually have a tree structure that
follows the enterprise's taxonomy of IT assets.
Structuring the
repository in this way has a number of advantages over previous repository
approaches:
- It is easily understood and is a
good match for how software artifacts are thought about in enterprises.
- It can be efficiently implemented.
- It supports efficient and secure
scale-out and scale-up strategies.
- It has a clean relational mapping
that does not hamper the ability to query the repository.
- It provides a natural granularity
for securing repository contents (that is, users are granted or denied
access to an entire folder).
- It provides a natural basis for
data partitioning.
Operational
versioning
The repository supports
storing multiple versions of an application in different repository folders
(for example, .NET/2.0 and .NET/3.0). However, once an application is deployed,
it might need to be changed for operational reasons (for example, replacing soon-to-expire
certificates and installing security patches). In the process of maintaining
the application, operations staff or automated systems like SMS might make
incorrect changes that might not be discovered for some time (for example,
installing a certificate with too short a lifetime).
When a system fails,
operations staff need to quickly diagnose and repair the issue. To do this,
operations staff need to understand which changes have been made and which
components the changes affect. Once incorrect changes are identified,
operations staff need the ability to perform corrective compensating actions.
Rolling back all changes since an incorrect change would often remove many
correct changes; thus, operations staff need the ability to perform individual
compensating actions, not just roll back all changes after a point in time.
To help with the
operational scenarios described above, the repository can maintain a historical
log of changes. A historical log enables operations staff to explore and search
the history of operational changes and, for " repository-based runtimes,
issue appropriate compensating changes (for example, restore an old Access
Control List or ACL).
There are several
options for collecting repository change events. Enterprise versions of the
repository use SQL Server 2008's Change Data Capture (CDC) to track changes to
repository tables. The repository provides table-valued functions (named with
the pattern [Schema].[ViewNameChanges]) over CDC to abstract away how views map
to base tables. The approach has the advantages of aligning with the Microsoft
Sync Framework and providing a simple mechanism for repository clients to query
for changes. The principal disadvantage is that the approach reduces throughput
since changes to the repository are recorded twice. However, because CDC is
based on a very mature SQL log reader implementation, CDC is also very
efficient and robust.
All versions of the SQL
Server Modeling Services repository also support tracking changes with SQL
Server 2008's Change Tracking support. The disadvantage of this Change Tracking
over CDC is that it records minimal information about the change (for example,
it would record that an assembly with ID 10 was deleted, but would not record
other values like the name of the assembly that was deleted).
The repository
implementation wraps some CDC and change-tracking functions so that security
policies can be enforced. For example, if a user is not allowed to see the
contents of a folder, then the user should also not be able to see changes
related to the folder.
Summary
The term
"versioning" encompasses many aspects of the evolution of a
repository, ranging from tracking changes to the data itself through to
deploying new versions of the repository. Each aspect imposes its own demands
and needs careful treatment if the repository is to remain useful. This
document details the pragmatic strategies that will allow the repository to
meet these competing demands and significantly improve the software lifecycle.
About the
Author
Anthony is a principal
software architect working on the SQL Server Modeling Services repository.
Anthony was previously an architect and development manager on Visual Studio
Team System, the director of Visio's software and database group and a research
fellow.