.gif)
SQL Server
Technical Article
(11 printed pages)
Writer: Bob Beauchemin, SQLskills
Published: November 2009
Applies to: SQL Server 2008
Summary: This whitepaper covers all facets of
the .NET data development platform. This includes not only client-side and
service-based APIs but also .NET APIs for programming at a server level inside
the SQL Server 2008 database and for developing and testing a SQL Server
database application. It also includes information on future directions of the
.NET and SQL Server development platform.
Introduction
The large
majority of applications use a database to store, query, and maintain their
data. Almost all of them use a relational database that is queried using the
SQL query language. Application programmers then need to tie user activities,
such as ordering products and browsing through lists of items for sale, to
database activities like SELECT statements and stored procedure execution in a
visually pleasing and responsive graphical user interface. They need easy-to-use
components that employ the same object-oriented concepts that are used in the
rest of the application to allow them to access data and bind it to the user
interface. To develop responsive, robust applications on-time and on-budget,
programmers need to reduce the time from model to implementation, as well as to
easily write automated test cases to ensure the application is robust and
scalable.
But data
usage doesn’t end there. One of the key challenges today is to make sense of
the data in timely manner. This means applications often add Business Intelligence
to the OLTP (online transaction processing system) data that is managed in a
relational database for timely decision making within OLTP applications. The
OLTP data can be imported, exported, combined with other data and transformed
using ETL (extract, transform, and load) technologies, and analyzed using an
OLAP (online analytical processing) system. Patterns and insights can be teased
from the data using smart data mining algorithms. User interactions can be
real-time, but reports serve an important role in data presentation and can be
embellished graphically using charts, gauges, maps and other user-interface
controls. Programmers enhancing applications with Business Intelligence should
be able to leverage existing experience by using the same basic programming
APIs and methodology as they use in application development.
Since the
.NET framework 1.0 was released in 2002, it has become programmers’ preferred
framework for developing software, including data-driven applications, on
Microsoft platforms. .NET 1.0 includes a set of classes known as ADO.NET to
provide a substrate for database development. ADO.NET is based on a provider
model. Database products (like SQL Server, Oracle, and DB2) hook into the model
by delivering their own data provider. .NET 1.0 shipped with three data
providers; bridge providers to existing ODBC drivers and OLE DB providers, and
the SqlClient provider for the SQL Server database. Over time, not only has the
SqlClient provider been improved to keep pace with innovations in the SQL
Server database, but SQL Server has expanded the number of integration points
between the database and its related features and the .NET framework. Today
.NET APIs are an integral part of the SQL Server product. In this whitepaper,
I’ll describe the current state of the Microsoft data development platform and
illustrate the deep integration between .NET and SQL Server to show why SQL
Server is the preferred database product for .NET developers. The .NET Framework
has been enhanced since its advent to include functionality like
object-relational mapping and a programming language integrated query language.
I’ll also discuss the direction that platforms and integration will take in the
future so you can plan your overall database development strategy.
Flexible Data Access Frameworks for Productive .NET Application Development
Data Access
Frameworks are what developers normally consider to be “database programming”.
.NET developers have a choice of using the low-level ADO.NET object model or
the higher-level conceptual model as defined by the ADO.NET Entity Framework.
You can also program you client-level database access through a well-known
REST-based services model, exposing the database as a service. Each of the
client-level data access frameworks integrate seamlessly with multi-tier
applications and with visual controls in the .NET Framework.
ADO.NET – The Substrate
ADO.NET is
the base database API and the basis for all of today’s .NET-based data access
frameworks. All existing .NET applications in releases prior to .NET 3.5 use
this API. Programmers should use ADO.NET when they want direct control of the
SQL statements used to communicate with the underlying database and also to
access specific database features not supported by the ADO.NET Entity
Framework.
Microsoft
ships three SQL Server-related ADO.NET data providers. System.Data.SqlClient is
the provider for the SQL Server database engine, mentioned previously.
Microsoft.Data.SqlCe is a provider for SQL Server Compact Edition that works on
the desktop or on compact devices. There is also a specialized data provider
for SQL Server Analysis Services and Data Mining functionality (the ADOMD.NET
provider), which will be mentioned later.
ADO.NET uses
the “connection-command-resultset” paradigm. Programmers open a Connection to
the database, and issue Commands consisting of stored procedures or SQL
statements that can either perform insert, update, and delete operations, or
select statements that return a DataReader (resultset). Additional classes
encapsulate transactions, procedure parameters, and error handling. ADO.NET 1.0
and 1.1 used an interface-based design; that is, programmers used
database-specific classes that implemented the well-known IDbConnection,
IDbCommand, and IDataReader interfaces. ADO.NET 2.0 added base classes so
database-specific classes derived from the generic DbConnection, DbCommand, and
DbDataReader classes. The base classes or interfaces provide the generic
functionality; database-specific functionality is encapsulated in
provider-specific classes (e.g. SqlConnection, SqlCommand, and SqlDataReader).
For more information reference “Generic Coding with ADO.NET 2.0 Base Classes
and Factories” at http://msdn.microsoft.com/en-us/library/ms379620(VS.80).aspx.
Database
transactions are accommodated in the model in two ways. Local transactions can
be represented by a DbTransaction object which is specified as a property of
the connection object. Alternatively, .NET developers can use the
System.Transactions library. System.Transactions supports local and distributed
transactions and passes and tracks transactions through an instance of the
TransactionScope class.
The specific
server, database, and other connection parameters are specified in a connection
string. The preferred method for specifying a connection string is to include it
in the application’s .NET configuration (.config) file. There is a standard
configuration file location used for ADO.NET connection strings.
Programmers
use the .NET type system in their applications, but relational data types in
the database, so there needs to be a low-level .NET type system-to-database
type system correspondence. Database types mostly follow the ISO-ANSI SQL
standard type system, but database vendors can include database-specific types.
In ADO.NET, common database types are represented using an enumeration for the
common types, System.Data.DbTypes. Programmers rely on documented mapping of
their database’s data type to DbTypes, but providers can also add
database-specific types. SqlClient accommodates SQL Server-specific types by
using a SqlDbTypes enumeration. A type correspondence mismatch occurs because
most types in the .NET type system have nothing that distinguishes database
NULL from empty instance of a type. A set of SQL Server-specific types in the
System.Data.SqlTypes namespace not only encapsulate the concept of database
NULL by implementing an INullable interface, but are isomorphic with the data
types in SQL Server. .NET 2.0 adds a generic nullable type that works with .NET
value types (structures). Nullable types can be used as an alternative to
System.Data.SqlTypes.
The SqlClient
provider is the lowest-level .NET data access APIs and also the closest to SQL
Server. It is the only client database API that supports all of the latest
enhancements in SQL Server. Programmers must use native ADO.NET calls when they
need access to these features. For example, only ADO.NET and SqlClient directly
support the Filestream storage with streaming I/O and the table-valued
parameters features of SQL Server 2008. SqlClient is also the only database API
to directly support SQL Server data types implemented in .NET, such as SQL
Server 2008’s spatial data types and hierarchyid type, as well as SQL Server
UDTs (user-defined types) implemented in .NET in the server and on the client.
The SQL Server and .NET teams work closely together; an example of this is the
introduction a new .NET primitive type, the System.DateTimeOffset data type in
.NET 3.5 that corresponds to SQL Server 2008’s datetimeoffset data type. This
makes it straightforward for programmers to take advantage of the very latest
SQL Server features.
ADO.NET also
provides for a “disconnected update” scenario using a class called the DataSet
that contains of collection of DataTable instances with collections of DataRow
and DataColumn objects. The DataSet interacts with the database through a
DataAdapter class, and includes relational database-like features such as
indexing, primary keys, defaults, identity columns, and constraints, including
referential constraints between DataTables. Programmers can use the DataSet as
an in-memory object model for database data, although with the advent of the
ADO.NET Entity Framework a model based on entities is preferred because you
program against business objects rather than a relational-style object model.
LINQ – Native Data Querying integrated into .NET Languages
LINQ is a
SQL-like, strongly-typed, query language that is implemented using built-in
programming language constructs in .NET. This provides the ability to query any
implementation of a .NET IEnumerable<T> class using SQL syntax. LINQ
defines a standard set of query operators that allow both set-based (e.g.
projection, selection) and cursor-based (e.g. traversal) queries using
programming language syntax. LINQ uses a provider model to allow access to
domain-specific stores such as relational databases, XML, Windows Active
Directory, collections of objects, and ADO.NET DataSets. In most cases, the
providers work by translating LINQ queries to queries against the
domain-specific store.
Using LINQ
queries that produce SQL query statements gives programmers compile-time syntax
checking and IntelliSence, reduces data access coding time compared to coding
with raw SQL strings, and also provides protection from SQL injection. This is
an improvement for programmers who don’t use stored procedures and code SQL
strings directly in application code, because these SQL strings are not
syntax-checked by the underlying programming language compiler.
Two
implementations of LINQ over ADO.NET were introduced in .NET 3.5. LINQ to SQL
is a SQL Server-specific implementation over a collection of tables represented
as objects. It includes a mapping tool that maps tables to objects, stored
procedure support, and implements almost all LINQ operators by translating them
into T-SQL. Programmers can insert, update, and delete rows from database
tables through objects using a DataContext class, which extends the basic LINQ
functionality. A similar LINQ provider also exists for the SQL Server Compact
Edition.
Programmers
use the ADO.NET DataSet object because it represents its data as a familiar
collection of tables that contain columns and rows. There is also a LINQ
provider over the DataSet class that provides functionality not included in the
original DataSet classes using standard SQL syntax. This implementation allows
selection, projection, joins, and other LINQ operators against the DataSet’s
collection of DataTables. Programmers that use the DataSet in existing
applications can use the LINQ provider to extend the base functionality. Using
LINQ to DataSet enhancements does not change the paradigm that filling the
DataSet with database data and updating the database when changes are made to
the DataSet is controlled by the DataAdapter class.
Entity Framework and the Entity Data Model Overcome the Object Relational
Impedance Mismatch
Object-oriented
programming concepts have taken hold in the development community, enjoying the
same widespread popularity as the relational normalization-based design and
set-based programming has in the database community. Therefore, there is a need
to bridge the gap between relational sets to collections of objects and map
operations on object instances to changes in the database. Programmers can
bridge this gap, known as the object-relational impedance mismatch, by using
the ADO.NET Entity Framework.
The ADO.NET
Entity Framework should be considered the development API of choice for .NET
SQL Server programmers going forward. The Entity Framework raises the
abstraction level of data access from logical relational database-based access
to conceptual model-based access. For more information on this, reference the
whitepaper “Next-Generation Data Access: Making
the Conceptual Level Real”.
The Entity
Framework consists of five main parts. These are the Entity Data Model (EDM),
the EntityClient provider and Entity SQL language, the ObjectServices
libraries, and the LINQ to Entities provider.
The Entity
Data Model (EDM) defines the conceptual model that you program against and is
specified using a Visual Studio designer. With the Entity Framework,
programmers specify the classes and relationships that will be used in the
object model (EDM or conceptual schema), how tuples and relations are
represented in the database (physical schema), and how to map between the two
(mapping schema). The Entity Framework directly supports modeling inheritance using
table-per-type mapping as well as table-per-hierarchy or
table-per-concrete-type mapping, mapping entities to stored procedure
resultsets and projections as a ComplexTypes, database views, and on-the-fly
projection mapping as anonymous types.
The Entity
Framework is layered over ADO.NET and most ADO.NET data providers have been
enhanced to work with it. Programmers can work directly with an Entity Data
Model using ADO.NET by using the EntityClient provider. To use EntityClient,
you specify both the Entity Framework Artifacts that represent the EDM, storage
schema and mapping and the underlying data provider (e.g. SqlClient) in the
ADO.NET connection string.
The Entity Framework
defines a language called Entity SQL that extends traditional SQL with some
additional object-oriented query constructs.
Entity SQL is the low-level query language of the Entity Framework that
is used to query against the Entity Data Model.
Finally, the Entity Framework includes an implementation of the LINQ
query language, known as LINQ to Entities. You can program against the Entity
Framework using three different programming models: LINQ to Entities,
ObjectServices, or “native” ADO.NET using the EntityClient provider. Both
EntityClient and ObjectServices use Entity SQL strings directly. Although some
lower-level query functionality can only be expressed using EntitySQL, using
LINQ to Entities is preferred for most development because using LINQ query
language instead of EntitySQL gives programmers compile-time syntax checking
and IntelliSence, reducing coding time.
The
programming model supports insert, update, and delete operations against the
entities by tracking changes with an ObjectContext. The model can generate the
SQL required for these operations or they can be mapped to a set of
user-written stored procedures. Both lazy and eager loading of related entities
is supported. Transactions are accommodated by using System.Transactions.
Entity
Framework 4.0, which will ship with .NET 4.0, adds a variety of new features
that support a model-first methodology, test-driven development (make it easier
to create a mocking layer for testing), and persistence ignorance. In addition,
there are improvements in the SQL generation layer, both in the translation of
LINQ to Entities queries and in the SQL Server provider specifically that make
the generated SQL perform better. EF 4.0 also directly supports a
representation of foreign keys between entities which is useful for data
binding.
WCF Data Services (ADO.NET Data Services) – RESTful Data Access
All of the
database client APIs that I’ve mentioned so far require a direct connection to
the database from either the client or middle-tier. But popular programming
models such as Silverlight and AJAX (Asynchronous Javascript and XML) interact
with all data sources directly from browser code, using either XML or JSON (JavaScript
Object Notation) payloads. This pattern requires programmers to use a
middle-tier service that’s exposed through either SOAP or REST-based APIs. Data
Services is the database API that exposes the data as a REST-based Web Service.
The service is a WCF-based (Windows Communication Foundation) service that
exposes database operations through the standard HTTP operations against a REST
endpoint like GET, POST, PUT, and DELETE.
Data Services
can use an Entity Framework model as an endpoint directly. It’s also able to
work with other LINQ-based data sources (including LINQ to SQL) for reading. If
you would like your non- EF data source to be available over Data Services for
updating, you must provide an implementation of the IUpdateable interface. SQL clauses like the WHERE, GROUP BY, and
ORDER BY clause are implemented as parameters on the URL.
Data Services
exposes and Entity Data Model (entities or any sets of data, such as SharePoint
lists in SharePoint 2010) using the Open Data Protocol as either ATOMPub feeds
(ATOM is a standard XML format used for data feeds) or collections of JSON
objects. JSON objects are especially useful with AJAX clients. Because security
needs to be considered in any service available on the Internet, granular data
access security is enabled by using a set of data access expressions coded in
the InitializeService method in combination with the authentication and
authorization scheme of your choice. By default, no resources or associations
are available unless you override the default InitializeService implementation.
To build a full-featured application, Data Services clients may also require
metadata; that is, information about the data available through a specific
service endpoint. A REST resource endpoint that supplies metadata for discovery
purposes is also part of the model.
In addition
to the capability to serve data, Data Services includes a client consumer API.
In this API, the client “connects” to the endpoint by using the REST URL.
Programmers query the data with a special LINQ provider that fetches updateable
EDM object instances. This pattern encapsulates Data Services access so that it
looks more like traditional data access code rather than raw HTTP and XML
calls.
Programmers
can use rich application frameworks that start with Data Services as a
REST-based substrate for interacting with a database. One such framework is
ASP.NET Dynamic Data, a template-based framework that allows you to build a
data driven application quickly. It can use Data Services and LINQ to SQL or the
Entity Framework to access the relational database.
Harness your .NET skills within the Database for Scalable, Reliable and
Manageable Applications
Data access
APIs are normally considered as programming outside the database server.
Beginning with version 2005, SQL Server permitted in-server programming of
database objects such as user-defined functions using .NET APIs as addition to
traditional programming in native database languages such as SQL, MDX, and DMX.
In addition, SQL Server provides mechanisms to extend and customize the product
itself using .NET APIs, and .NET support extends deep into all facets of the
server product. With SQL Server as the database, programmers can leverage their
existing knowledge of .NET programming to add value to existing applications
and also leverage knowledge of .NET APIs such as ADO.NET, which can be used
both “outside” and “inside” the server. .NET integration with the database engine
includes support for both application programmers and database administrators.
Programming SQL Server with .NET
Since SQL
Server 2005, programmers have had the ability to incorporate .NET in database
object code as an adjunct and alternative to using T-SQL. Hosting .NET code
inside the SQL Server engine (that is, using SQL Server as a .NET runtime host)
is commonly referred to as “SQLCLR”. One use of this .NET support is as a means
to access external resources such as the Registry or the file system directly
from database code without resorting to unsafe mechanisms such as enabling
xp_cmdshell or using undocumented system extended stored procedures. In
addition, the programmer has a choice of .NET or T-SQL when writing
user-defined functions, stored procedures and triggers. This allows a choice of
best-of-breed mechanisms, that is, the ability to use .NET code where it
performs better (such as regular expressions and complex calculations) and
T-SQL when it performs better (like database access and complex joins).
The SQL
server team designed its .NET integration with security and reliability in
mind, as well as performance. Safe SQLCLR code is limited to a subset of system
assemblies that have been hardened and tested with SQL Server. SQLCLR code runs
in process, as part of the SQL Server service process (sqlservr.exe) for best
performance, not in a separate service process.
Data access
in SQLCLR database objects follows the ADO.NET object model with optimizations
for in-server processing. SQLCLR code can directly execute T-SQL code, rather
than requiring a separate ODBC or OLE DB connection (as extended stored
procedures require) that use more memory and must be enlisted in the current
transaction manually. The SQLCLR code is part of the current connection. This
internal connection is requested by a special connection string parameter
“Context Connection=true”.
Server-side
programmers use the SqlClient provider and the traditional
Connection-Command-DataReader paradigm with a few extensions to accommodate server-specific
constructs. This not only lets programmers familiar with ADO.NET leverage their
existing expertise, but also allows the production of data access code that’s
relatively portable from in-server to middle-tier and even to client should the
need arise, with few changes to the code.
Server-side
programmers can also write SQLCLR user-defined aggregates and user-defined
types (UDTs). User-defined aggregates extend the reach of aggregate functions
to allow functions that are not built into T-SQL, such as covariant of a
population. Programmers can use UDTs to create new scalar data types that
extend the SQL Server type system. UDTs and user-defined aggregates must be
coded in SQLCLR, not T-SQL. The execution engine recognizes the new aggregate
functions and includes them in the query plan, even allowing the aggregation
function to be parallelized as with other query plan iterators.
Extended .NET Data Types and Code in SQL Server
In SQL Server
2008 some new features of the SQL Server engine itself were written in .NET
code. The new Policy Based Management and Change Data Capture features use .NET
internally indicating that the SQL Server team has made a commitment to use
SQLCLR code as an adjunct to unmanaged code where it is useful. Although there
is a database option to disallow user-written SQLCLR code, this does not
prevent the execution of system CLR code. Thus, in SQL Server 2008, .NET code
is used as part of the engine’s standard operating environment.
Three new
data types were introduced in SQL Server 2008 that are implemented using the
SQLCLR user-defined type architecture; these are the spatial data types,
geometry and geography, and the hierarchyid data type. The code that implements
these data types in contained in a .NET assembly, Microsoft.SqlTypes.Types.dll.
Implementing built-in data types in .NET code also provides benefits for
programmers that use the types. Firstly, the programming model for these types
is identical whether you’re writing a SQL query or client-side object
manipulation code. This allows server programmers writing SQL with the
instance.method syntax to leverage their expertise when writing either client
code or SQLCLR stored procedures.
Programmatic Management with SQL Server Management Objects
.NET is used
in programming administrative functions in the SQL Server Engine as well
through the SQL Server Management Objects (SMO) libraries. SMO is a set of
libraries that are used to manipulate and manage database objects (e.g. create
a new database, configure network protocols, backup and restore the database).
This provides an additional choice to manage the database, serving as an
alternative to SQL DDL statements and system stored procedures.
PowerShell Support in SQL Server
SQL Server
2008 introduced support for PowerShell, a new Windows shell and successor to
the command shell. This product is part of the Microsoft Common Engineering
Criteria and is supported as an administrative interface by almost every
product in the Windows platform.
Because
systems administrators or programmers often assume database administrator
responsibilities in smaller installations and PowerShell is a common
administration tool it is thought that “multi-product administrators” will
leverage their PowerShell expertise to manage SQL Server, while specialized SQL
Server-only DBAs will continue to use SQL scripts and system stored procedures.
Experience with .NET object models is a useful skill when working in the
PowerShell environment.
PowerShell is
based on programming and scripting objects, rather than passing text between
commands in scripts as traditional shells (e.g. CMD, csh) do. PowerShell uses
cmdlets to manipulate the system; these are programmed in .NET code. In
addition, products can expose administrative functionality through a provider
model. PowerShell providers assist in generic knowledge transfer by
representing system configuration using the file system paradigm combined with
a standard set of operations such create-alter-delete on product nodes. SQL
Server 2008 includes both cmdlets and a provider. SQL Server ships a custom
shell (named SQLPS.exe) in which the cmdlets and the provider are pre-installed
to prevent naming conflicts. You can also manually install the cmdlets and
provider for use in the main PowerShell shell.
Enrich your Applications with Embedded Business Intelligence
Business
Intelligence is a set of tools and technologies designed to improve business
decision making and planning. It generally includes reporting, quantitative
analysis, and predictive modeling. Programmers can use their .NET skills along
with tools in SQL Server in the programming of decision support systems.
.NET APIs are
not only integrated with the database engine, but are pervasive in every part
of the SQL Server product. In addition to programming business intelligence
client applications, you can also use .NET to do administrative and
configuration programming and write product extensions, but this is beyond the
scope of this whitepaper.
In this
section, I’ll briefly describe the .NET facilities used by SQL Server Reporting
Services (SSRS), SQL Server Analysis Services (SSAS), and SQL Server
Integration Services (SSIS).
Integrate Reports & Data Visualization within your application with SQL
Server Reporting Services
SQL Server
Reporting Services is the tool of choice for programming reports to display in
custom Web or Windows applications using the ReportViewer control or
publication in a central repository for convenient, secure, user access.
Programmers can use SQL Server or Microsoft Office SharePoint Server as a
repository or include report definitions directory in applications. All of the
entry points into the SSRS product allow programmers to leverage their .NET
coding experience. Reports can be rendered in a variety of formats and will
even be exposed as data (AtomPub feeds) in SQL Server 2008 R2.
Reports can
use custom code that extends the functionality provided by the SSRS built-in
expression language. This custom code must be provided in .NET assemblies,
which are registered with Report Manager and in the development environment.
Once registered, custom .NET classes are accessed just as the built-in
expressions are.
ADOMD.NET – Client side Analysis Services
Programmers
that expose Business Intelligence functionality, such as information from OLAP
cubes in custom .NET applications must use an API to extract the information
from the OLAP server just as they use an API to extract information from the
relational engine. Although many Business Intelligence users use packaged
client applications such as Excel, writing custom web applications that extract
analysis and data mining information, such as the familiar “people who brought
product X are also interested in product Y” is becoming more and more
commonplace. ADOMD.NET is to OLAP applications as ADO.NET is to relational
database applications.
ADOMD.NET is
a specialized ADO.NET data provider that not only exposes the traditional
Connection-Command-DataReader paradigm, but contains additions for
multidimensional data and data mining. With OLAP cubes, metadata is not only
more complex (it can also be hierarchical in the case of hierarchical
dimensions) but also more important, because OLAP client applications often
rely on discovery through metadata at runtime. ADOMD.NET has rich support for
this metadata. Programmers use ADOMD.NET on the client by adding a program
reference to the Microsoft.AnalysisServices.AdomdClient.dll.
ADOMD.NET
ships with SQL Server or is available separately as part of the SQL Server
2005-2008 Feature Pack.
ADOMD.NET – Server side Analysis Services
In SSAS you
can write stored procedures or user-defined functions in .NET using the
ADOMD.NET library. The server library has almost the exact same object model as
the client library but slightly different functionality. The reason for using
ADOMD.NET with in-server objects is to decrease network latency (one single
network call as opposed to possibly many network calls from the client) and to
encapsulate data logic for reuse. To write in-server .NET code, you reference
Microsoft.AnalysisServices.AdomdServer.dll.
Integrate Your Data with SQL Server Integration Services
SSIS is an
extract-transform-load (ETL) tool that is used to build database workflows
called packages. SSIS packages are designed for fast data transformation and
loading using an in-memory “data pipeline”, often using the bulk extract-bulk
insert functionality of databases directly. SSIS allows programmers to
concentrate on writing workflow logic rather than building custom code for each
ETL scenario. Programmers often use SSIS packages to populate Analysis Services
cubes and train Data Mining models, as well as import and export of relational
data.
SSIS ships
with a rich set of ETL components. It is interesting to note that each SSIS
component that is part of the product is implemented in a separate assembly,
that is, they are implemented in .NET. Therefore programmers with .NET
expertise can use SSIS object models to create custom components that extend the
base functionality of implement SSIS integration with ETL items that are not
supported by SSIS “in the box”.
SSIS ships
with script components that allow custom logic in control flows or data flows.
This allows script-based lightweight access for customizations without having
to build an entire custom component. SSIS includes both a Script Task for task
flow and a Script Component for the data flow. In SQL Server 2008, these
scripts use Visual Studio Tools for Applications (VSTA) and the script code can
be written in either VB.NET or C# (only VB.NET is supported in SQL Server
2005). Once again, the .NET programmer should be right at home with SSIS script
coding and component customizations.
Sync Services for ADO.NET enables collaboration and offline scenarios
As compact
devices get more prolific, a common pattern in data access emerges in which
data is stored in a corporate database, with many individual devices needing
replication/synchronization of their-specific portion of the database tables.
An example would be salespeople that need a local copy of their customers’ data
for offline use while on the road. The salesperson also needs to be able to
take orders offline then upload changes to the corporate database. For this
type of scenario, ADO.NET Sync Services fills the bill nicely, allowing
client-directed synchronization between databases such as SQL Server and SQL
Server Compact Edition. Sync Services for ADO.NET is part of the Microsoft Sync
Framework, which also supports file system sync and feed sync (for RSS and ATOM
feeds).
Sync Services
also allows synchronization between data stores. It will be a key component in
scenarios that include synchronization between SQL Azure cloud-based databases
and either compact devices or traditional data repositories in the data center.
Sync Services
works with a provider model and ships with a SyncClient provider for SQL Server
Compact Edition. The SyncServer provider works with any ADO.NET data provider.
Sync Services support unidirectional synchronization, bidirectional
synchronization, or complete refresh with each synchronization. Version 1.0
supports hub-and-spoke style synchronization, Version 2.0 also supports
peer-to-peer style synchronization.
The Server
provider works with the set of synchronization commands, but also allows for
programmable conflict resolution if the data on both the client and the server
changes. SQL Server 2008 includes a Change Tracking feature that allows
distinguishing which changes are made on client or server, and allows selective
synchronization on a per-client basis. Sync Services for ADO.NET is available
for SQL Server Compact Edition on the desktop and on compact devices.
The Future Frontier
So far, we’ve
covered how you can use .NET as a substrate to develop with databases outside
the server, and with SQL Server inside the server product as well. In future
products, the vast amount of functionality available to .NET data access
programmers as well the synergy between SQL Server and .NET continues to
increase. Here’s a survey of some of the features for .NET programmers coming
in the near future.
SQL Azure Database – Database in the Cloud
SQL Azure
Database is a cloud-based database built on SQL Server. Because it uses SQL
Server underneath, any database API, including Entity Framework, ADO.NET, and
others, can be used with SQL Azure Database simply by changing the connection
string. Version 1.0 of SQL Azure Database supports almost all of the database
objects (e.g. tables, views, and stored procedures) in the SQL Server database
engine, but Version 1.0 does not yet support in-database .NET programming
(SQLCLR).
Self-Service Business Intelligence
SQL Server
PowerPivot For Excel is an Excel 2010 add-in that facilitates the creation of
analysis models in Excel using multiple data sources, and also allows the user
to define relationships between tables in disparate data sources. This enables
an entire population of non-programmers to write Excel apps against their
corporate data in a way they never could before, enabling new scenarios in
Business Intelligence. The data in the models can be set to automatically
update on a schedule and the Excel-based model can be published to a SharePoint
server. Using SQL Server PowerPivot for SharePoint 2010, the SQL Server
PowerPivot Excel-based data can be exposed as an Analysis Services 2010 cube
data source. SQL Server PowerPivot For Excel will support ATOMPub data feeds,
meaning that any data source that’s exposed through ADO.NET Data Services will
be available. This allows SQL Server PowerPivot For Excel to import data
exposes through EDM object models, as well as SharePoint 2010 List data. In
addition, SSRS 2008 R2 will expose any report that’s published to the SSRS
Report Manager as an ATOMPub feed, permitting the creator of a SQL Server
PowerPivot For Excel data model to import report data for further analysis.
SQL Server Master Data Services– Manage Your Most Important Data Assets
SQL Server
Master Data Services is a feature of SQL Server 2008 R2 that allows a business
to manage its most vital and precious data such as customer and product lists.
Often multiple copies of this data are stored in different databases and it’s
easy to end up with duplicate but slightly different versions of information
such as contacts or addresses. MDS allows you to resolve such data
inconsistencies. For more information about Master Data Management concepts,
reference the whitepaper “The What,
Why, and How of Master Data Management”.
SQL Server
MDS includes a SQL Server-based repository, a Web-based front end that allows
domain experts and user analysts to manage the data, and a Web Service-based
interface for programmability. The feature implements fuzzy matching and
domain-specific indexes. The fuzzy indexing algorithms were developed by
Microsoft Research.
The
programming functionality is included as a set of SQLCLR system functions (e.g.
similarity and regular expression style functions) and index building and
lookup procedures for multicolumn domain-specific indexes, implemented in the
MDM repository database. Because they are written in SQLCLR these functions
could be leveraged in future in other parts of the SQL Server product such as
SSIS (for bulk matching and data cleansing), SSRS (to produce fuzzy match
reports) and IFTS (Integrated Fulltext Search) as an adjunct and alternative to
word-stemming based text matching.
Real-time Applications with Data in Flight (StreamInsight)
In most data
use cases, data is first stored in a database and analyzed after it’s stored.
However, with some use cases (e.g. power grid or heart monitor readings, stock
ticker prices) the act of storing the data introduces too much latency for the
analysis to be useful. StreamInsight is a Complex Event Processing (CEP) system
that allows data to be analyzed and events to be summarized “in flight”.
The
StreamInsight produce is built on a service process and input and output
providers. Input providers (written in .NET using the StreamInsight APIs)
translate input data into a standard set of input stream messages. The
streaming applications themselves are also written using a .NET API. Each
message type can be registered with one or more LINQ queries (StreamInsight
supplies the LINQ provider) for in-flight querying, message aggregation and
output through message output providers. Input and output providers for SQL
Server data are among those included with the current samples.
Modeling with M, EDM, Quadrant and the Entity Framework
Programmers
can be more productive when the transition from model to application is part of
the development process. Future versions of SQL Server will support several
features to facilitate this process, including a programming language to create
models (“M” language) and SQL Server Modeling Services, a repository for
storing, querying, and programming with models. The current CTP includes “M” to
T-SQL generation for defining and populating models and the Quadrant tool for creating and
compiling “M”-based models and DSLs (domain-specific languages), and browsing
and editing the repository.
One feature
of this product will be using the “M” language as a representation of the
Entity Data Model and the ability to use M with the Entity Framework. This will
allow greater model scalability compared to using XML to represent the EDM.
Programmers can either use the Entity Designer in Visual Studio or code their
Entity Data Model in “M”, and the model will be used by the Entity Framework
directly at runtime.
In addition
to the Entity Framework 4.0 improvements, such as supporting the majority of
object programming patterns used today, the Entity Framework and the Entity Data
Model will also be more closely integrated with the SQL Server product in the
future. A most likely first step is to allow programmers to design reports in
SQL Server Reporting Services directly against the Entity Data Model. The EDM
could be integrated with other models in the framework as well.
SQL Server Modeling Services Repository
The SQL
Server Modeling Services repository resides in a SQL Server database, so it can
be populated and queried with either the “M” or T-SQL languages. SQL Server-specific
facilities such as Change Data Capture can be used to provide versioning of the
repository. By using the “M” language, programmers can model not only the data
and object models, but also applications, workflows and business processes,
services, configuration information, and the operating system infrastructure
such as SMS (Microsoft Systems Management Service) configuration and
installation information. By storing the information is a single repository,
it’s possible to define relationships among domain-specific models for a
complete picture of business and computer-based processes. SQL Server Modeling
Services ships with a Base Domain Library (BDL) to provide a service layer to
model-driven applications, including pre-built domains for UML and CLR and
tools that support them.
The storage
of data and application models in the SQL Server Modeling Services Repository,
along with code generation to make repository information synergistic with data
applications and a rich visual tool for manipulating SQL Server data, will
assist in recording and using models to make SQL Server Modeling Services the
provider of a single, integrated, auditable, version of truth for the
organization’s data models, application service models, configuration
information, software management, and business processes.
Conclusion
As you can
see programmers use .NET as the underlying substrate for extending and
customizing all parts of the SQL Server family including deep support in SQL
Server Analysis Services, Integration Services, and Reporting Services. In
future SQL Server will evolve into a database that not only integrates with the
relational model of data but provides direct support of the conceptual model as
well.
The current
.NET data access APIs are always based on a provider model that defines a base
functionality with extensibility points that allow database-specific
functionality to be fit into the model. A SQL Server implementation of each
client API is able to serve as a reference implementation and also as a model
to illustrate provider extensibility. SQL Server carries the data models into
the database itself, enabling in-database .NET programming and extensibility.
For more information:
Data
Developer Center: http://msdn.microsoft.com/data
SQL Server
Web site:http://www.microsoft.com/sqlserver/
SQL Server
Tech Center: http://technet.microsoft.com/en-us/sqlserver/
SQL Server
Developer Center: http://msdn.microsoft.com/en-us/sqlserver/