Guide to the Data Development Platform for .NET Developers
SQL Server Technical Article
Writer: Bob Beauchemin, SQLskills
Technical Reviewers: James Baker, Kraig Brockschmidt, Grant Dickinson, Donald Farmer, Elisa Flasko, Stacia Misner, Syed Rasheed, Chris Sells
Published: November 2009
Applies to: SQL Server 2008
Summary: This whitepaper covers all facets of the .NET data development platform. This includes both client-side and service-based APIs but also .NET APIs for programming at a server level inside the SQL Server 2008 database and for developing and testing a SQL Server database application. It also includes information on future directions of the .NET and SQL Server development platform.
The large majority of applications use a database to store, query, and maintain their data. Almost all of them use a relational database that is designed using the principals of data normalization and queried with set-based queries using the SQL query language. Application programmers need to tie user activities, such as ordering products and browsing through lists of items for sale, to database activities like SELECT statements and stored procedure execution in a visually pleasing and responsive graphical user interface. They need data access and data binding to the user interface that is encapsulated in easy-to-use components that employ the same object-oriented concepts that are used in the rest of the application. To develop responsive, robust applications on-time and on-budget, programmers need to reduce the time from model to implementation, as well as to easily write automated test cases to ensure the application is robust and scalable.
But data usage doesn’t end there. One of the key challenges today is to make sense of the data in timely manner. This means applications often add Business Intelligence to the OLTP (online transaction processing system) data that is managed in a relational database for timely decision making within OLTP applications. The OLTP data can be imported, exported, combined with other data and transformed using ETL (extract, transform, and load) technologies, and analyzed using an OLAP (online analytical processing) system. Patterns and insights can be teased from the data using smart data mining algorithms. User interactions can be real-time, but reports serve an important role in data presentation and can be embellished graphically using charts, gauges, maps and other user-interface controls. Programmers enhancing applications with Business Intelligence should be able to leverage existing experience by using the same basic programming APIs and methodology as they use in application development.
Since the .NET framework 1.0 was released in 2002, it has become programmers’ preferred framework for developing software, including data-driven applications, on Microsoft platforms. .NET 1.0 includes a set of classes known as ADO.NET to provide a substrate for database development. ADO.NET is based on a provider model. Databases products (like SQL Server, Oracle, and DB2) hook into the model by delivering their own data provider. .NET 1.0 shipped with three data providers; bridge providers to existing ODBC drivers and OLE DB providers, and the SqlClient provider for the SQL Server database. Over time, not only has the SqlClient provider been improved to keep pace with innovations in the SQL Server database, but SQL Server has expanded the number of integration points between the database and its related features and the .NET framework. Today .NET APIs are an integral part of the SQL Server product. In this whitepaper, I’ll describe the current state of the Microsoft Data Development Platform and illustrate the deep integration between .NET and SQL Server to show why SQL Server is the preferred database product for .NET developers. The .NET database platform has been enhanced since the advent of .NET to include functionality like object-relational mapping and a programming language integrated query language. I’ll also discuss the direction that platforms and integration will take in the future so you can plan your overall database development strategy.
Flexible Data Access Frameworks for Productive .NET Application Development
Data Access Frameworks are what developers normally consider to be “database programming”. .NET developers have a choice of using the low-level ADO.NET object model or the higher-level conceptual model as defined by the ADO.NET Entity Framework. You can also program you client-level database access using through a well-known REST-based services model, exposing the database as a service in the world of Software As A Service. Each of the client-level data access frameworks integrate seamlessly with multi-tier applications and with visual controls in the .NET platform.
ADO.NET – The Substrate
ADO.NET is the base database API and the basis for all of today’s .NET-based data access frameworks. All existing .NET applications in releases prior to .NET 3.5 use this API. Programmers should use ADO.NET when they want direct control of the SQL statements used to communicate with the underlying database and also to access specific database features not supported by the ADO.NET Entity Framework.
Microsoft ships three SQL Server-related ADO.NET data providers. System.Data.SqlClient is the provider for the SQL Server database engine, mentioned previously. Microsoft.Data.SqlCe is a provider for SQL Server Compact Edition that works on the desktop or on compact devices. There is also a specialized data provider for SQL Server Analysis Services and Data Mining functionality (the ADOMD.NET provider), which will be mentioned later.
ADO.NET uses the “connection-command-resultset” paradigm. Programmers open a Connection to the database, and issue Commands consisting of stored procedures or SQL statements that can either perform insert, update, and delete operations, or select statements that return a DataReader (resultset). Additional classes encapsulate transactions, procedure parameters, and error handling. ADO.NET 1.0 and 1.1 used an interface-based design; that is, programmers used database-specific classes that implemented the well-known IDbConnection, IDbCommand, and IDataReader interfaces. ADO.NET 2.0 added base classes so database-specific classes derived from the generic DbConnection, DbCommand, and DbDataReader classes. The base classes or interfaces provide the generic functionality; database-specific functionality is encapsulated in provider-specific classes (e.g. SqlConnection, SqlCommand, and SqlDataReader). For more information reference “Generic Coding with ADO.NET 2.0 Base Classes and Factories” at http://msdn.microsoft.com/en-us/library/ms379620(VS.80).aspx.
Database transactions are accommodated in the model in two ways. Local transactions can be represented by a DbTransaction object which is specified as a property of the connection object. Alternatively, .NET developers can use the System.Transactions library. System.Transactions supports local and distributed transactions and passes and tracks transactions through an instance of the TransactionScope class.
The specific server, database, and other connection parameters are specified in a connection string. The preferred method for specifying a connection string is to include it in the application’s .NET configuration (.config) file. There is a standard configuration file location used for ADO.NET connection strings.
Programmers use the .NET type system in their applications, but relational data types in the database, so there needs to be a low-level .NET type system-to-database type system correspondence. Database types mostly follow the ISO-ANSI SQL standard type system, but database vendors can include database-specific types. In ADO.NET, common database types are represented using an enumeration for the common types, System.Data.DbTypes. Programmers rely on documented mapping of their database’s data type to DbTypes, but providers can also add database-specific types. SqlClient accommodates SQL Server-specific types by using a SqlDbTypes enumeration. A type correspondence mismatch occurs because most types in the .NET type system have nothing that distinguishes database NULL from empty instance of a type. A set of SQL Server-specific types in the System.Data.SqlTypes namespace not only encapsulate the concept of database NULL by implementing an INullable interface, but are isomorphic with the data types in SQL Server. .NET 2.0 adds a generic nullable type that works with .NET value types (structures). Nullable types can be used as an alternative to System.Data.SqlTypes.
The SqlClient provider is the lowest-level .NET data access APIs and also the closest to SQL Server. It is the only client database API that supports all of the latest enhancements in SQL Server. Programmers must use native ADO.NET calls when they need access to these features. For example, only ADO.NET and SqlClient directly support the Filestream storage with streaming I/O and the table-valued parameters features of SQL Server 2008. SqlClient is also the only database API to directly support SQL Server data types implemented in .NET, such as SQL Server 2008’s spatial data types and hierarchyid type, as well as SQL Server UDTs (user-defined types) implemented in .NET in the server and on the client. The SQL Server and .NET teams work closely together; an example of this is the introduction a new .NET primitive type, the System.DateTimeOffset data type in .NET 3.5 that corresponds to SQL Server 2008’s datetimeoffset data type. This makes it straightforward for programmers to take advantage of the very latest SQL Server features.
ADO.NET also provides for a “disconnected update” scenario using a class called the DataSet that contains of collection of DataTable instances with collections of DataRow and DataColumn objects. The DataSet interacts with the database through a DataAdapter class, and includes relational database-like features such as indexing, primary keys, defaults, identity columns, and constraints, including referential constraints between DataTables. Programmers can use the DataSet as an in-memory object model for database data, although with the advent of the ADO.NET Entity Framework a model based on entities is preferred because you program against business objects rather than a relational-style object model.
LINQ – Native Data Querying integrated into .Net Languages
LINQ is a SQL-like, strongly-typed, query language that is implemented using built-in programming language constructs in .NET. This provides the ability to query any implementation of a .NET IEnumerable<T> class using SQL syntax. LINQ defines a standard set of query operators that allow both set-based (e.g. projection, selection) and cursor-based (e.g. traversal) queries using programming language syntax. LINQ uses a provider model to allow access to domain-specific stores such as relational database, XML, Windows Active Directory, collections of objects, and ADO.NET DataSets. In most cases, the providers work by translating LINQ queries to queries against the domain-specific store.
These Using LINQ queries that produce SQL query statements gives programmers compile-time syntax checking and IntelliSence, reduces data access coding time compared to coding with raw SQL strings, and also provides protection from SQL injection. This is an improvement for programmers who don’t use stored procedures and code SQL strings directly in application code, because these SQL strings are not syntax-checked by the underlying programming language compiler.
Two implementations of LINQ over ADO.NET were introduced in .NET 3.5. LINQ to SQL is a SQL Server-specific implementation over a collection of tables represented as objects. It includes a mapping tool that maps tables to objects, stored procedure support, and implements almost all LINQ operators by translating them into T-SQL. Programmers can insert, update, and delete rows from database tables through objects using a DataContext class, which extends the basic LINQ functionality. A similar LINQ provider also exists for the SQL Server Compact Edition.
Programmers use the ADO.NET DataSet object because it represents its data as a familiar collection of tables that contain columns and rows. There is also a LINQ provider over the DataSet class that provides functionality not included in the original DataSet classes using standard SQL syntax. This implementation allows selection, projection, joins, and other LINQ operators against the DataSet’s collection of DataTables. Programmers that use the DataSet in existing applications can use the LINQ provider to extend the base functionality. Using LINQ to DataSet enhancements does not change the paradigm that filling the DataSet with database data and updating the database when changes are made to the DataSet is controlled by the DataAdapter class.
ADO.Net Entity Framework and the Entity Data Model Overcome the Object Relational Impedance Mismatch
Object-oriented programming concepts have taken hold in the development community, enjoying the same widespread popularity as the relational normalization-based design and set-based programming has in the database community. Therefore, there is a need to bridge the gap between relational sets to collections of objects and map operations on object instances to changes in the database. Programmers can bridge this gap, known as the object-relational impedance mismatch, by using the ADO.NET Entity Framework.
The ADO.NET Entity Framework should be considered the development API of choice for .NET SQL Server programmers going forward. The Entity Framework raises the abstraction level of data access from logical relational database-based access to conceptual model-based access. For more information on this, reference the whitepaper “Next-Generation Data Access: Making the Conceptual Level Real” at http://msdn.microsoft.com/en-us/library/aa730866(VS.80).aspx.
The Entity Framework consists of five main parts. These are the Entity Data Model (EDM), the EntityClient provider and Entity SQL language, the ObjectServices libraries, and the LINQ to Entities provider.
The Entity Data Model (EDM) specifies the conceptual model that you program against and it’s mapping to underlying database primitives. With EDM, programmers specify the classes and relationships that will be used in the object model (conceptual schema), how tuples and relations are represented in the database (physical schema), and how to map between the two (mapping schema). EDM directly supports table-per-type mapping as well as modeling inheritance using either table-per-hierarchy or table-per-concrete-type. But mapping supports more than just database tables. Mapping entities to stored procedure resultsets and projections as a ComplexType, database views, and on-the-fly projection mapping as anonymous types are also supported. The EDM is specified by using a Visual Studio designer against a set of XML schemas. This metadata can be represented as an XML file (edmx) which can compiled into a program resource.
Entity Framework is layered over ADO.NET and most ADO.NET data providers have been enhanced to work with it. Programmers can work directly with an Entity Data Model using ADO.NET by using the EntityClient provider. To use EntityClient, you specify both the Entity Data Model and the underlying data provider (e.g. SqlClient) in the ADO.NET connection string.
The entity framework defines a language called Entity SQL that extends traditional SQL with some additional object-oriented query constructs. Entity SQL is the low-level query language of the Entity Framework that is used to query against the conceptual model. Finally, the Entity Framework includes an implementation of the LINQ query language, known as LINQ to Entities. LINQ to Entities implements almost all of the constructs in Entity SQL.
You can program the Entity Framework using three different programming models: LINQ to Entities, ObjectServices, or “native” ADO.NET using the EntityClient provider. Both EntityClient and ObjectServices use Entity SQL strings directly. Although some lower-level query functionality can only be expressed using EntitySQL, using LINQ to Entities is the overwhelming favorite among programmers that use the Entity Framework. LINQ to Entities is preferred for most development because using LINQ query language instead of EntitySQL gives programmers compile-time syntax checking and IntelliSence, reducing coding time.
The Entity Framework includes a type system that supports Entity Types, Simple Types, Complex Types, and Row Types. Entity Types contain a reference to an instance of the type called an EntityKey. The set of supported Simple Types is a subset of data types found in most relational databases. A complex type is a set of multiple simple types, e.g. a resultset from a stored procedures or a projection from a SQL query. Entities and Associations between entities live in containers known as EntitySets and AssociationSets. EntityContainer is the top level object that usually maps to a database or application’s scope.
The programming model supports insert, update, and delete operations against the entities by tracking changes with an ObjectContext. The model can generate the SQL required for these operations or they can be mapped to a set of user-written stored procedures. Both lazy and eager loading of related entities is supported. Transactions are accommodated by using System.Transactions.
Entity Framework 4.0, which will ship with .NET 4.0, adds a variety of new features that support a model-first methodology, test-driven development (make it easier to create a mocking layer for testing ), and persistence ignorance. In addition, there are improvements in the SQL generation layer, both in the translation of LINQ to Entities queries and in the SQL Server provider specifically that make the generated SQL more performant. EF 4.0 also directly supports a representation of foreign keys between entities which is useful for data binding.
ADO.NET Data Services – RESTful Data Access
ADO.NET Data Services can use an Entity Framework model as an endpoint directly. It’s also able to work with other LINQ-based data sources (including LINQ to SQL) for reading. If you would like your non- EF data source to be available over ADO.NET data services for updating, you must provide an implementation of the IUpdateable interface. SQL clauses like the WHERE, GROUP BY, and ORDER BY clause are implemented as parameters on the URL.
ADO.NET Data Services exposes entities (or any sets of data, such as SharePoint lists in SharePoint 2010) as either ATOMPub feeds (ATOM is a standard XML format used for data feeds) or collections of JSON objects. JSON objects are especially useful with AJAX clients. Because security needs to be considered in any service available on the Internet, granular data access security is enabled by using a set of data access expressions coded in the InitializeService method in combination with the authentication and authorization scheme of your choice. By default, no resources or associations are available unless you override the default InitializeService implementation. To build a robust application, ADO.NET Data Services clients may also require metadata; that is, information about the data available through a specific service endpoint. A REST resource endpoint that supplies metadata for discovery purposes is also part of the model.
In addition to the capability to serve data, ADO.NET Data Services includes a client consumer API. In this API, the client “connects” to the endpoint by using the REST URL. Programmers query the data with a special LINQ provider that fetches updateable EDM object instances. This pattern encapsulates ADO.NET Data Services access so that it looks more like traditional data access code rather than raw HTTP and XML calls.
Programmers can use rich application frameworks that start with ADO.NET Data Services as a REST-based substrate for interacting with a database. One such framework is ASP.NET Dynamic Data, a template-based framework that allows you to build a data driven application quickly. It can use ADO.NET Data Services and LINQ to SQL or Entity Framework to access the relational database. Another framework is .NET RIA Services, which provides a set of components that allow building rich Internet applications (hence the acronym) using common patterns in ASP.NET, service-oriented architecture, and Silverlight 3.0 as a graphic user interface. It includes a pair of service models that hook directly into the ADO.NET Entity Framework, LINQ to SQL, or ADO.NET Data Services. This allows for data to be fetched though a service (a la ASP.NET AJAX) and bound to Silverlight data controls. These are mapped by .NET RIA Services to database maintenance functionality in the service models. .NET RIA Services is currently part of Visual Studio 2010 Beta2.
Harness your .Net skills within the Database for Scalable, Reliable and Manageable Applications
Data access APIs are normally considered as programming outside the database server. A specific database product, such as SQL Server, exposes programming models for in-server programming as well. Beginning with SQL Server 2005, SQL Server permitted in-server programming of database objects such as user-defined functions using .NET APIs as addition to traditional programming in native database languages such as SQL, MDX , and DMX. In addition, SQL Server provides mechanisms to extend and customize the product itself using .NET APIs, and .NET support extends deep into all facets of the server product. With SQL Server as the database, programmers can leverage their existing knowledge of .NET programming to add value to existing applications and also leverage knowledge of .NET APIs such as ADO.NET, which can be used both “outside” and “inside” the server. .NET integration with the database engine includes support for both application programmers and database administrators.
Programming SQL Server with .NET
Since SQL Server 2005, programmers have had the ability to incorporate .NET in database object code as an adjunct and alternative to using T-SQL. Hosting .NET code inside the SQL Server engine (that is, using SQL Server as a .NET runtime host) is commonly referred to as “SQLCLR”. One use of this .NET support is as a means to access external resources such as the registry or file system directly from database code, without resorting to unsafe mechanisms such as enabling xp_cmdshell or using undocumented system extended stored procedures. In addition, the programmer has a choice of .NET or T-SQL when writing user-defined functions, stored procedures and triggers. This allows a choice of best-of-breed mechanisms, that is, the ability to use .NET code where it’s more performant (such as regular expressions and complex calculations) and T-SQL when it’s more performant (database access and complex joins).
The SQL server team designed its .NET integration with security and reliability in mind, as well as performance. Safe SQLCLR code is limited to a subset of system assemblies that have been hardened and tested with SQL Server. These assemblies are guaranteed not to leak memory or other external resources, such as file handles, by the use of such coding primitives as SafeHandle and ConstrainedExecutionRegion introduced in .NET 2.0. Changes to the .NET 2.0 hosting APIs allow .NET resources such as memory allocation, thread pooling and scheduling, and exception handling, to be controlled by and integrated with the rest of the SQL Server engine. SQLCLR code runs in process, as part of the SQL Server service process (sqlservr.exe) for best performance, not in a separate service process.
Data access in SQLCLR database objects follows the ADO.NET object model with optimizations for in-server processing. SQLCLR code can directly execute T-SQL code, rather than requiring a separate ODBC or OLE DB connection (as extended stored procedures require) that use more memory and must be enlisted in the current transaction manually. This is accomplished by using a mechanism to give the appearance of an ordinary SqlConnection instance, but hook up memory pointers for direct access to database buffers, parameters, transaction space, and lock space. The SQLCLR code is part of the current connection. This internal connection is requested by a special connection string parameter “Context Connection=true”.
Server-side programmers use the SqlClient provider and the traditional Connection-Command-DataReader paradigm with a few extensions to accommodate server-specific constructs. This not only lets programmers familiar with ADO.NET leverage their existing expertise, but also allows the production of data access code that’s relatively portable from in-server to middle-tier and even to client should the need arise, with few changes to the code. The additional constructs for server-side code include a SqlContext and DbDataRecord. The SqlContext provides access to the code executor’s identity for access to external resources and a reference to a SqlPipe object. The SqlPipe is used to directly output resultsets or messages. The DbDataRecord allows the programmer to synthesize resultsets from external resources. You can use this to represent Web Service results or files as a resultset from a stored procedure. In ADO.NET 3.5 and SQL Server 2008 a collection of DbDataRecord objects can also be used to construct table-valued parameters in client code. Server-side programmers can also write .NET-based table-valued user-defined functions that expose any computed or refactored data as a set, in keeping with the T-SQL set-based paradigm.
There is a well-defined mapping between SQL data types and .NET data types and SQLCLR code must adhere to the mapping to be interoperable with T-SQL code. Although this mapping is mostly the same as client-side data type mapping, some data types (e.g. varchar, the non-Unicode string data type) do not have direct .NET equivalents and these cannot be used in SQLCLR result sets or parameters. In addition, .NET data types that do not have SQL Server equivalents (e.g. System.Collections.Hashtable) cannot be returned to T-SQL code, unless they are transformed into types (such as a collection of DbDataRecord objects returned from a stored procedure as a resultset) that T-SQL can work with.
Server-side programmers can also write SQLCLR user-defined aggregates and user-defined types (UDTs). User-defined aggregates extend the reach of aggregate functions to allow functions that are not built into T-SQL, such as covariant of a population. Programmers can use UDTs to create new scalar data types that extend the SQL Server type system. UDTs and user-defined aggregates must be coded in SQLCLR, not T-SQL. The execution engine recognizes the new aggregate functions and includes them in the query plan, even allowing the aggregation function to be parallelized as with other query plan iterators.
Extended .Net Data Types and Code in SQL Server
In SQL Server 2008 some new features of the SQL Server engine itself were written in .NET code. The new Policy Based Management and Change Data Capture features use .NET internally. This indicates that the SQL Server team has made a commitment to use SQLCLR code as an adjunct to unmanaged code where it is useful. Although there is a database option to disallow user-written SQLCLR code, this does not prevent the execution of system CLR code which is always enabled. Thus, in SQL Server 2008, .NET code is used as part of the engine’s standard operating environment.
Three new data types were introduced in SQL Server 2008 that are implemented using the SQLCLR user-defined type architecture; these are the spatial data types, geometry and geography, and the hierarchyid data type. The code that implements these data types in contained in a .NET assembly, Microsoft.SqlTypes.Types.dll. This assembly is required for normal server operation. One reason for the implementation is that many different operations can be performed on these data types, especially the spatial types and the natural way to expose these is through type-specific methods. The geometry data type, for example, has over 60 domain-specific methods and many properties that can be manipulated with the SQL-1999 standard “instance.method” syntax in T-SQL, instead of including a library of system functions that apply only to a single data type. This serves to better encapsulate the data type’s functionality.
Implementing built-in data types in .NET code also provides benefits for programmers that use the types. Firstly, the programming model for these types is identical whether you’re writing a SQL query or client-side object manipulation code. This allows server programmers writing SQL with the instance.method syntax to leverage their expertise when writing either client code or SQLCLR stored procedures. In addition, Microsoft.SqlTypes.Types is distributable as part of the SQL Server Feature Pack, and useable directly on the client-side (in map-drawing applications for example) without a need to connect to the server at all. The server is used for data storage, SQL queries that may include spatial functions, and spatial indexing.
Programmatic Management with SQL Server Management Objects
.NET is used in programming administrative functions in the SQL Server Engine as well through the SQL Server Management Objects (SMO) libraries. SMO is a set of libraries that are used to manipulate and manage database objects (e.g. create a new database, configure network protocols, backup and restore the database). These libraries are used by SQL Server Management Studio (SSMS) Object Explorer and are available to programmers and administrators as well. This provides an additional choice to manage the database, serving as an alternative to SQL DDL statements and system stored procedures. Because they are used to write the SQL Server Management Studio utility, internal Policy Based Management code, and SQL Server Configuration Manager, the SMO libraries are quite comprehensive. These libraries also contain a special Scripter class that can be used to produce a SQL DDL script for any supported database object. This functionality is used extensively in SSMS. The two main functions of the library are DDL type operations and configuration operations in which the library is actually a .NET wrapper around WMI (Windows Management Instrumentation) operations. In addition, there is a set of classes that enable creating and controlling trace objects and a set of classes for creating and managing SQL Server Agent jobs. SQL Server also ships with a .NET library for configuring replication. The RMO (Replicate Management Objects) library can be used as an alternative to the system stored procedures for configuring SQL Server replication.
PowerShell Support in SQL Server
SQL Server 2008 introduced support for PowerShell, a new Windows shell and successor to the command shell. This product is part of the Microsoft Common Engineering Criteria and is supported as an administrative interface by almost every product in the Windows platform. IIS, Exchange, and Active Directory are parts of the platform that currently ship with PowerShell support. Third-party products such as VMWare are beginning to provide PowerShell support as well.
Because systems administrators or programmers often assume database administrator responsibilities in smaller installations and PowerShell is a common administration tool it is thought that “multi-product administrators” will leverage their PowerShell expertise to manage SQL Server, while specialized SQL Server-only DBAs will continue to use SQL scripts and system stored procedures. Experience with .NET object models is a useful skill when working in the PowerShell environment.
PowerShell is based on programming and scripting objects, rather than passing text between commands in scripts as traditional shells (e.g. CMD, csh) do. PowerShell uses cmdlets to manipulate the system; these are programmed in .NET code. In addition, products can expose administrative functionality through a provider model. PowerShell providers assist in generic knowledge transfer by representing system configuration using the file system paradigm combined with a standard set of operations such create-alter-delete on product nodes. SQL Server 2008 includes both cmdlets and a provider. SQL Server ships a custom shell (named SQLPS.exe) in which the cmdlets and the provider are pre-installed to prevent naming conflicts. You can also manually install the cmdlets and provider for use in the main PowerShell shell.
The provider exposes a few directories under the SQLSERVER: virtual directory.
- SQL - The SQL database engine objects (servers, databases, tables)
- SQLPolicy – SQL Server policy-based management
- SQLRegistration - Server registrations in SSMS
- DataCollection – Data collection for the Management Data Warehouse
In SQL Server 2008 R2, two additional subdirectories are added
- SQLUtility –The SQL Server Utility (Enterprise Multi-Server Management)
- SQLDAC –DAC (Data-tier application databases)
There are only a few cmdlets in SQL Server 2008, including cmdlets for mapping between SQL Server database object names and PowerShell object names, a cmdlet with similar functionality to SQLCMD and cmdlets for executing policy-based management functions. Because the provider uses SMO objects to represent database objects internally, programming in PowerShell with SQL Server consists of quite a bit of navigation to the appropriate place in the provider virtual directory and then using SMO objects to perform operations.
The ability to use the SSMS Object Explorer context menu to bring up a PowerShell window at the appropriate position in the database object provider hierarchy as well as the ability to invoke PowerShell script as a SQL Agent job step round out the SQL Server 2008 PowerShell support.
Enrich your Applications with Embedded Business Intelligence
Business Intelligence is a set of tools and technologies designed to improve business decision making and planning. It generally includes reporting, quantitative analysis, and predictive modeling. Programmers can use their .NET skills along with tools in SQL Server in the programming of decision support systems.
.NET APIs are not only integrated with the database engine, but are pervasive in every part of the SQL Server product. .NET is used for four main Business Intelligence functionality groups:
- Programming Business Intelligence client applications
- Application programming at the server product level
- Administration and Configuration programming
- Writing product extensions
In this section, I’ll briefly describe the .NET facilities used by SQL Server Reporting Services (SSRS), SQL Server Analysis Services (SSAS), and SQL Server Integration Services (SSIS).
Integrate Reports & Data Visualization within your application with SQL Server Reporting Services
SQL Server Reporting Services is the tool of choice for programming reports to display in custom Web or Windows applications using the ReportViewer control , or publication in a central repository for convenient, secure, user access. Programmers can use SQL Server or Microsoft Office SharePoint Server as a repository or include report definitions directory in applications. All of the entry points into the SSRS product allow programmers to leverage their .NET coding experience.
The report definition language (RDL) is a XML document format based on a published XML schema. SSRS reports are usually designed and published using Business Intelligence Development Studio (BIDS) or the graphic Report Builder tool. Reports can use Data Sources as input (SQL Server and a variety of other data sources are supported) and work with tables, views, and stored procedure data directly, or Report Models that expose metadata and relationships in a more user-friendly form. Report Models are designed and programmed in Visual Studio and can be generated against a SQL Server or Oracle database or an Analysis Services database.
The Report Manager is an ASP.NET-based application that allows administrators to manage reports, common data source defintions, and subscriptions, and also provides the ability for users to view reports in HTML format. Reports can be rendered in a variety of formats and will even be exposed as data (AtomPub feeds) in SQL Server 2008 R2.
There is not a custom object model for building RDL, but because RDL is XML-based the .NET XML APIs can be used to built raw RDL with built-in schema validation. Programmers can integrate reports into their own applications or manage reports using a Web Service interface, as the functionality is exposed using Web Services. The Report Viewer control can use URL access or Web Service calls to access published reports, or use reports in local RDL files.
Reports can use custom code that extends the functionality provided by the SSRS built-in expression language. This custom code must be provided in .NET assemblies, which are registered with Report Manager and in the development environment. Once registered, custom .NET classes are accessed just as the built-in expressions are.
Administrators can administer the server programmatically by either using the Report Manager Web Service APIs or by writing scripts that are executed using the rs.exe utility. Administrative scripts for the rs.exe utility are programmed using Visual Basic .NET scripting language.
Reporting Services uses a componentized architecture and, as such, allows user-written extensions. These extensions use SSRS-specific .NET objects models. Some of the components that can be extended with custom .NET code are: data processing extensions, delivery extensions, rendering extensions, and security extensions. The object models are provided in two .NET libraries, Microsoft.ReportingServices.DataProcessing and Microsoft.ReportingServices.Interfaces that are shipped with SQL Server.
ADOMD.NET – Client side Analysis Services
Programmers that expose Business Intelligence functionality, such as information from OLAP cubes in custom .NET applications must use an API to extract the information from the OLAP server just as they use an API to extract information from the relational engine. Although many Business Intelligence users use packaged client applications such as Excel, writing custom web applications that extract analysis and data mining information, such as the familiar “people who brought product X are also interested in product Y” is becoming more and more commonplace. ADOMD.NET is to OLAP applications as ADO.NET is to relational database applications.
ADOMD.NET is a specialized ADO.NET data provider that not only exposes the traditional Connection-Command-DataReader paradigm, but contains additions for multidimensional data and data mining. With OLAP cubes, metadata is not only more complex (it can also be hierarchical in the case of hierarchical dimensions) but also more important, because OLAP client applications often rely on discovery through metadata at runtime. ADOMD.NET exposes this metadata in three different ways: through provider-specific properties (e.g. Cubes and MiningModelCollection properties) on the Connection object, through an extensive set of SchemaRowsets (accessed through the GetSchemaRowset method as with traditional ADO.NET providers), and through an ADOMD.NET-specific object model devoted to metadata, rooted at the CubeDef class. Because analysis queries can be long-running, the ADOMD.NET provider supports multiplexing Connection instances through a Session class. Programmers use ADOMD.NET on the client by adding a program reference to the Microsoft.AnalysisServices.AdomdClient.dll.
Multidemensional resultsets are also different than rectangular relational resultsets. These resultsets can have multiple axes as opposed to the two-axis column and row relational rowsets. The ADOMD.NET provider supports using the AdomdCommand class to execute MDX (MultiDimensionalExtensions) or DMX (DataMiningExtensions) language commands. These commands can return a two-dimensional AdomdDataReader or a multidimensional CellSet. The CellSet is shaped like a cube rather than a table, and programmers access data consisting of CellCollections and retrieve axis information using the Axes property. You can also use a DataAdapter to get cells in the form of a DataSet. The provider is currently read-only but supports composing multiple operations in a read-committed level transaction.
ADOMD.NET ships with SQL Server or is available separately as part of the SQL Server 2005-2008 Feature Pack.
ADOMD.NET – Server side Analysis Services
In SSAS you can write stored procedures or user-defined functions in .NET using the ADOMD.NET library. The server library has almost the exact same object model as the client library but slightly different functionality. To write in-server .NET code, you reference Microsoft.AnalysisServices.AdomdServer.dll.
User-defined functions are used in the context of either an MDX or DMX statement, as part of the statement itself. Stored procedures are executed stand-alone with the MDX or DMX CALL statement. Both UDFs and stored procedures can take parameters, but UDFs can return a variety of data types including instances of the ADOMD.NET Set object. Stored procedures can only return sets, as IDataReader or DataSet. A stored procedure can also have no return value. The reason for using ADOMD.NET with in-server objects is to decrease network latency (one single network call as opposed to possibly many network calls from the client) and to encapsulate data logic for reuse.
In server-side ADOMD.NET, you do not get a Connection object in order to access the data. Instead, you start by obtaining a Context object that represents your current connection. The Context object does not expose any schema rowsets, you must use the object model to access metadata. A UDF is created for either MDX or DMX and cannot mix functionality in a single module. Therefore, depending on the calling context, either a CurrentCube or CurrentMiningModel property is available from the context object. ADOMD.NET functions and stored procedures are read-only. They do not support writeback and therefore do not support transactions.
Analysis Management Objects
AMO is an object model for creating, altering, and managing Analysis Services cubes and Data Mining models. It is analogous to SMO in the database engine. In addition to a comprehensive set of classes to manipulate metadata, AMO contains classes that allows the .NET administrator to automate backup and restore of Analysis Services databases, a Trace class for monitoring, replay, and management of SSAS profiler traces, and CaptureLog and CaptureXML classes for scripting capturing and scripting operations performed by SSAS SMO statements. Once again, administrators with the ability the write .NET code can leverage their expertise with AMO. SSAS also supports an XML-based scripting language for management known as ASSL.
Microsoft does not ship a formal PowerShell provider or cmdlets for SSAS. However, you can use AMO classes with PowerShell in an analogous way as the SMO classes are used. In addition, a PowerShell snapin including a provider and cmdlets called PowerSSAS is available on the CodePlex website at http://www.codeplex.com/powerSSAS .
Integrated Your Data with Ease with SQL Server Integration Services
SSIS is an extract-transform-load (ETL) tool that is used to build database workflows called packages. SSIS packages are designed for fast data transformation and loading using an in-memory “data pipeline”, often using the bulk extract-bulk insert functionality of databases directly. SSIS allows programmers to concentrate on writing workflow logic rather than building custom code for each ETL scenario. Programmers often use SSIS packages to populate Analysis Services cubes and train Data Mining models, as well as import and export of relational data.
The SSIS data flow engine is not written in .NET code, but an ADO.NET provider can be used as either as source or destination of data. SSIS packages are usually created in the Business Intelligence Development Studio (BIDS) environment, but you can also create them programmatically through a .NET object model. The entire SSIS package model is available in .NET, rooted in a class named Package. You can also manage and execute packages programmatically using the classes in Microsoft.SqlServer.Dts.Runtime.dll. Management classes allow you to run packages locally or on a remote machine, load the output of a local package, load, store, and organize packages into folders, and enumerate existing packages stored in the SQL Server database or file system. SQL Server Agent is usually used to run the packages on a schedule or an ad-hoc basis.
SSIS packages consist of control flows and data flows. Control flows contain tasks, with special enumerator components to allow the execution of a task over a collection of items (e.g. executing an FTP task over a collection of files). Data flows can contain source providers, destination providers, and transformation components. Data source connection information for the providers (e.g. connection strings to databases) is managed by Connection Manager components. Components log information to an informational and error log.
SSIS ships with a rich set of ETL components. It is interesting to note that each SSIS component that is part of the product is implemented in a separate assembly, that is, they are implemented in .NET. Therefore programmers with .NET expertise can use SSIS object models to create custom components that extend the base functionality of implement SSIS integration with ETL items that are not supported by SSIS “in the box”. These components include not only an execution portion but also a designer portion so that the custom component can be used to design packages in BIDS. There’s a .NET object model to enable programmers to build:
· Custom tasks
· Custom connection managers
· Custom log providers
· Custom enumerators
· Custom data flow components – source, destination, or transformation components
Note that you cannot write custom components that derive from system components.
Finally, SSIS ships with script components that allow custom logic in control flows or data flows. This allows script-based lightweight access for customizations without having to build an entire custom component. SSIS includes both a Script Task for task flow and a Script Component for the data flow. In SQL Server 2008, these scripts use Visual Studio Tools for Applications (VSTA) and the script code can be written in either VB.NET or C# (only VB.NET is supported in SQL Server 2005). Once again, the .NET programmer or administrator should be right at home with SSIS script coding and component customizations.
Sync Services for ADO.NET enables collaboration and offline scenarios for applications, services and devices
As compact devices get more prolific, a common pattern in data access emerges in which data is stored on in a corporate database, with many individual devices needing replication/synchronization of their-specific portion of the database tables. An example would be salespeople that need a local copy of their customers’ data for offline use while on the road. The salesperson also needs to be able to take orders offline then upload changes to the corporate database. For this type of scenario, ADO.NET Sync Services fills the bill nicely, allowing client-directed synchronization between databases such as SQL Server and SQL Server Compact Edition. Sync Services for ADO.NET is part of the Microsoft Sync Framework, which also supports file system sync and feed sync (for RSS and ATOM feeds).
Sync Services works with a provider model and ships with a SyncClient provider for SQL Server Compact Edition. The SyncServer provider works with any ADO.NET data provider. Sync Services support unidirectional synchronization, bidirectional synchronization, or complete refresh with each synchronization. Version 1.0 supports hub-and-spoke style synchronization, Version 2.0 also supports peer-to-peer style synchronization.
The Server provider works with the set of synchronization commands, but also allowed for programmable conflict resolution if the data in the server database changes but there is also a change on the client. SQL Server 2008 includes a Change Tracking feature that allows distinguishing which changes are made on client or server, and allows selective synchronization on a per-client basis. Sync Services for ADO.NET is available for SQL Server Compact Edition on the desktop and on compact devices.
Microsoft Visual Studio Team System 2008 Database Edition
Regardless of which features that programmers use to interact with the database, they need a strong development environment that includes testing, profiling, version control, and software workflow capabilities. Visual Studio Team System (VSTS) is Microsoft’s flagship software development lifecycle product. In addition to application and service development, database programmers and administrators can also use to develop, test, and deploy database objects and schemas using the same software development lifecycle they are already familiar with. VSTS Database Edition includes SQL Server 2000-2008 support and provides a provider mechanism for other database support as well. In addition to the built-in functionality there are customization points for .NET programmers who wish to extend the product.
VSTS Database Edition unit tests are encapsulated in .NET modules and a .NET extension API is available to customize the tests. You’d do this when you need to test for special conditions that aren’t covered by the graphic user interface. Programmers also use VSTS Database Edition to generate test data. A standard set of test data generators are provided, but you can extend this with custom generators, using the classes in Microsoft.VisualStudio.TeamSystem.Data.Generators.dll.
The latest version of the product (VSTS 2008 Database Edition GDR) includes database object refactoring and static code analysis. You can extend the facilities of both of these features by writing custom .NET code. Custom database refactoring code (e.g. SQL code casing) and custom refactoring targets (e.g. text files) are supported by using a framework of classes in the VSTS Database Edition interop DLLs. Static code analysis rules can be plugged in to the existing rule infrastructure by using Microsoft.Data.Schema.ScriptDom.dll.
Finally, VSTS Database Edition ships with a SQL parser that developers can use to parse and validate T-SQL and generate T-SQL scripts. The parser is not only used by the product but available as a component for programmer use. For further information and an example of the parser’s use, reference http://blogs.msdn.com/gertd/archive/2008/08/21/getting-to-the-crown-jewels.aspx .
The Future Frontier
So far, we’ve covered how you can use .NET as a substrate to develop with databases outside the server, and with SQL Server inside the server product as well. In future products, the vast amount of functionality available to .NET data access programmers as well the synergy between SQL Server and .NET continues to increase. Here’s a survey of some of the features for .NET programmers coming in the near future.
SQL Azure Database – Database in the Cloud
SQL Azure Database is a cloud-based database built on SQL Server. Because it uses SQL Server underneath, any database API, including Entity Framework, ADO.NET, and others, can be used with SQL Azure Database simply by changing the connection string. Version 1.0 of SQL Azure Database supports almost all of the database objects (e.g. tables, views, and stored procedures) in the SQL Server database engine, but Version 1.0 does not yet support in-database .NET programming (SQLCLR).
Self-Service Business Intelligence
SQL Server PowerPivot For Excel is an Excel 2010 add-in that facilitates the creation of analysis models in Excel using multiple data sources, and also allows the user to define relationships between tables in disparate data sources. This enables an entire population of non-programmers to write Excel apps against their corporate data in a way they never could before, enabling new scenarios in Business Intelligence. The data in the models can be set to automatically update on a schedule and the Excel-based model can be published to a SharePoint server farm. Using SQL Server PowerPivot for SharePoint 2010, the SQL Server PowerPivot Excel-based data can be exposed as an Analysis Services 2010 cube data source. SQL Server PowerPivot For Excel will support ATOMPub data feeds, meaning that any data source that’s exposed through ADO.NET Data Services will be available. This allows SQL Server PowerPivot For Excel to import data exposes through EDM object models, as well as SharePoint 2010 List data. In addition, SSRS 2008 R2 will expose any report that’s published to the SSRS Report Manager as an ATOMPub feed, permitting the creator of a SQL Server PowerPivot For Excel data model to import report data for further analysis.
SQL Server Master Data Services– Manage Your Most Important Data Assets
SQL Server Master Data Services is a feature of SQL Server 2008 R2 that allows a business to manage its most vital and precious data such as customer and product lists. Often multiple copies of this data are stored in different databases and it’s easy to end up with duplicate but slightly different versions of information such as addresses. MDS allows you to resolve such data inconsistencies. For more information about Master Data Management concepts, reference the whitepaper “The What, Why, and How of Master Data Management” at http://msdn.microsoft.com/en-us/library/bb190163.aspx.
SQL Server MDS includes a SQL Server-based repository, a Web-based front end that allows domain experts and user analysts to manage the data, and a Web Service-based interface for programmability. The feature implements fuzzy matching and domain-specific indexes. The fuzzy indexing algorithms were developed by Microsoft Research.
The programming functionality is included as a set of SQLCLR system functions (e.g. similarity and regular expression style functions) and index building and lookup procedures for multicolumn domain-specific indexes, implemented in the MDM repository database. Because they are written in SQLCLR these functions could be leveraged in future in other parts of the SQL Server product such as SSIS (for bulk matching and data cleansing), SSRS (to produce fuzzy match reports) and IFTS (Integrated Fulltext Search) as an adjunct and alternative to word-stemming based text matching.
Real-time Applications with Data in Flight (StreamInsight)
In most data use cases, data is first stored in a database and analyzed after it’s stored. However, with some use cases (e.g. power grid or heart monitor readings, stock ticker prices) the act of storing the data introduces too much latency for the analysis to be useful. StreamInsight is a Complex Event Processing (CEP) system that allows data to be analyzed and events to be summarized “in flight”.
The StreamInsight produce in built on a service process and input and output providers. Input providers (written in .NET using the StreamInsight APIs) translate input data into a standard set of input stream messages. The streaming applications themselves are also written using a .NET API. Each message type can be registered with one or more LINQ queries (StreamInsight supplies the LINQ provider) for in-flight querying, message aggregation and output through message output providers. Input and output providers for SQL Server data are among those included with the current samples.
Simplify Data Development with Modeling
Modeling with M, EDM, and the Entity Framework
Programmers can be more productive when the transition from model to application is part of the development process. Future versions of SQL Server will support several features to facilitate this process, including a programming language to create models (“M” language) and SQL Server Modeling Services, a repository for storing, querying, and programming with models. The current CTP includes “M” to T-SQL generation for defining and populating models and the Quadrant tool for creating and compiling “M”-based models and DSLs (domain-specific languages), and browsing and editing the repository.
One near-term usage of this product will be using the “M” language as an eventual replacement of XML-based EDMX model files as part of the generation of the Entity Framework mapping model and code. This will allow greater model scalability compared to using EDMX. Programmers can either use the EDM graphic designer in Visual Studio or code their Entity Data Model in “M”, and the model will be used by the Entity Framework directly at runtime.
In addition to Entity Framework 4.0, which supports the major of object programming patterns used today, Entity Framework will also be more closely integrated with the SQL Server product in the future. A most likely first step is to allow programmers to design reports in SQL Server Reporting Services directly against the conceptual model. The conceptual model could be integrated with other models in the framework as well.
SQL Server Modeling Services Repository
The SQL Server Modeling Services repository resides in a SQL Server database, so it can be populated and queried with either the “M” or T-SQL languages. SQL Server-specific facilities such as Change Data Capture can be used to provide versioning of the repository. By using the “M” language, programmers can model not only the data and object models, but also applications, workflows and business processes, services, configuration information, and the operating system infrastructure such as SMS (Microsoft Systems Management Service) configuration and installation information. By storing the information is a single repository, it’s possible to define relationships among domain-specific models for a complete picture of business and computer-based processes. SQL Server Modeling Services ships with a Base Domain Library (BDL) to provide a service layer to model-driven applications, including pre-built domains for UML and CLR and tools that support them.
The storage of data and application models in the SQL Server Modeling Services Repository, along with code generation to make repository information synergistic with data applications, will assist in recording and using models to make SQL Server Modeling Services the provider of a single, integrated, auditable, version of truth for the organization’s data models, application service models, configuration information, software management, and business processes.
As you can see programmers use .NET as the underlying substrate for extending and customizing all parts of the SQL Server family including deep support in SQL Server Analysis Services, Integration Services, and Reporting Services. In future SQL Server will evolve into a database that not only integrates with the relational model of data but provides direct support of the conceptual model as well.
The current .NET data access APIs are always based on a provider model that defines a base functionality with extensibility points that allow database-specific functionality to be fit into the model. A SQL Server implementation of each client API is able to serve as a reference implementation and also as a model to illustrate provider extensibility. SQL Server carries the data models into the database itself, enabling in-database .NET programming and extensibility.
For more information:
http://www.microsoft.com/sqlserver/ : SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/ : SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/ : SQL Server DevCenter
Did this paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to 5 (excellent), how would you rate this paper and why have you given it this rating? For example:
- Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason?
- Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback will help us improve the quality of white papers we release.