Introduction to Data Access with ADO.NET
As you develop applications using ADO.NET, you will have different requirements for working with data. In some cases, you might simply want to display data on a form. In other cases, you might need to devise a way to share information with another company.
No matter what you do with data, there are certain fundamental concepts that you should understand about the data approach in ADO.NET. You might never need to know some of the details of data handling — for example, you might never need to directly edit an XML file containing data — but it is very useful to understand the data architecture in ADO.NET, what the major data components are, and how the pieces fit together.
This introduction presents a high-level overview of these most important concepts. The topic deliberately skips over many details — for example, there is much more to datasets than what is mentioned here — in favor of simply introducing you to ideas behind data integration in ADO.NET.
Note When you deploy an application that includes Visual Studio data-access components, you must make sure that the user installing the application has version 2.7 or later of the Microsoft Data Access Components (MDAC). For more information, see Adding a Launch Condition for Microsoft Data Access Components.
ADO.NET Does Not Depend On Continuously Live Connections
In traditional client/server applications, components establish a connection to a database and keep it open while the application is running. For a variety of reasons, this approach is impractical in many applications:
- Open database connections take up valuable system resources. In most cases, databases can maintain only a small number of concurrent connections. The overhead of maintaining these connections detracts from overall application performance.
- Similarly, applications that require an open database connection are extremely difficult to scale up. An application that does not scale up well might perform acceptably with four users but will likely not do so with hundreds. ASP.NET Web applications in particular need to be easily scalable, because traffic to a Web site can go up by orders of magnitude in a very short period.
- In ASP.NET Web applications, the components are inherently disconnected from each other. The browser requests a page from the server; when the server has finished processing and sending the page, it has no further connection with the browser until the next request. Under these circumstances, maintaining open connections to a database is not viable, because there is no way to know whether the data consumer (the client) requires further data access.
- A model based on always-connected data can make it difficult and impractical to exchange data across application and organizational boundaries using a connected architecture. If two components need to share the same data, both have to be connected, or a way must be devised for the components to pass data back and forth.
For all these reasons, data access with ADO.NET is designed around an architecture that uses connections sparingly. Applications are connected to the database only long enough to fetch or update the data. Because the database is not holding on to connections that are largely idle, it can service many more users.
Database Interactions Are Performed Using Data Commands
To perform operations in a database, you execute SQL statements or stored procedures (which include SQL statements). You use SQL statements or stored procedures to read and write rows and perform aggregate functions, such as adding or averaging. You also use SQL statements or stored procedures to create or modify tables or columns, to perform transactions, and so on.
In ADO.NET you use data commands to package a SQL statement or stored procedure. For example, if you want to read a set of rows from the database, you create a data command and configure it with the text of a SQL Select statement or the name of a stored procedure that fetches records.
When you want to get the rows, you do the following:
- Open a connection.
- Call an execute method of the command, which in turn:
- Executes the SQL statement or stored procedure referenced by the command.
- Then closes the connection.
The connection stays open only long enough to execute the statement or stored procedure.
When you call a command's execute method, it returns a value. Commands that update the database return the number of rows affected; other types of commands return an error code. If the command queries the database with a SELECT statement, the command can return a set of rows. You can fetch these rows using a data reader, which acts as a very fast read-only, forward-only cursor. For more information, see Retrieving Data Using the DataReader.
Security Note When using data commands with a CommandType property set to Text, carefully check information that is sent from a client before passing it to your database. Malicious users might try to send (inject) modified or additional SQL statements in an effort to gain unauthorized access or damage the database. Before you transfer user input to a database, you should always verify that the information is valid; it is a best practice to always use parameterized queries or stored procedures when possible. For more information, see Scripting Exploits.
If you need to perform more than one operation — for example, read some rows and then update them — you use multiple data commands, one for each operation. Each operation is performed separately. For example, to read the rows, you open the connection, read the rows, and then close the connection. When you want to update data, you again open the connection, perform the update, and then close the connection again.
Data commands can include parameters (specifically, a collection of parameter objects) that allow you to create parameterized queries such as the following:
Select * From customers Where (customer_id = @customerid)
You can then set the parameters at run time and execute the command to return or update the data you want.
Data Can Be Cached in Datasets
The most common data task is to retrieve data from the database and do something with it: display it, process it, or send it to another component. Very frequently, the application needs to process not just one record, but a set of them: a list of customers or today's orders, for example. Often the set of records that the application requires comes from more than one table: my customers and all their orders; all authors named "Smith" and the books they have written; and other, similar, sets of related records.
Once these records are fetched, the application typically works with them as a group. For example, the application might allow the user to browse through all the authors named "Smith" and examine the books for one Smith, then move to the next Smith, and so on.
In many cases, it is impractical to go back to the database each time the application needs to process the next record. (Doing so can undo much of the advantage of minimizing the need for open connections.) A solution, therefore, is to temporarily store the records retrieved from the database and work with this temporary set.
This is what a dataset is. A dataset is a cache of records retrieved from a data source. It works like a virtual data store: A dataset includes one or more tables based on the tables in the actual database, and it can include information about the relationships between those tables and constraints on what data the tables can contain.
The data in the dataset is usually a much-reduced version of what is in the database. However, you can work with it in much the same way you do the real data. While you are doing so, you remain disconnected from the database, which frees it to perform other tasks.
Of course, you often need to update data in the database (although not nearly as often as you retrieve data from it). You can perform update operations on the dataset, and these can be written through to the underlying database.
An important point is that the dataset is a passive container for the data. To actually fetch data from the database and (optionally) write it back, you use data adapters. A data adapter contains one or more data commands used to populate a single table in the dataset and update the corresponding table in the database. (A data adapter typically contains four commands, one each to select, insert, update, and delete rows in the database.) Therefore, a data adapter's Fill method might execute a SQL statement such as
SELECT au_id, au_lname, au_fname FROM authors whenever the method is called.
Because a dataset is effectively a private copy of the database data, it does not necessarily reflect the current state of the database. If you want to see the latest changes made by other users, you can refresh the dataset by calling the appropriate Fill method.
One of the advantages of using datasets is that components can exchange them as required. For example, a business object in the middle tier might create and populate a dataset, then send it to another component elsewhere in the application for processing. This facility means that components do not have to individually query the database.
Datasets Are Independent of Data Sources
Although a dataset acts as a cache for data drawn from a database, the dataset has no actual relationship with the database. The dataset is a container; it is filled by SQL commands or stored procedures executed from a data adapter.
Because a dataset is not directly tied to a data source, it is a good integration point for data coming from multiple sources. For example, some of the data in a dataset might come from your database, whereas other parts of it might come from a different database or a non-database source such as a spreadsheet. Some of the data in a dataset might arrive in a stream sent by another component. Once the data is in a dataset, you can work with it using a consistent object model, regardless of its original source.
Data Is Persisted as XML
Data needs to be moved from the data store to the dataset, and from there to various components. In ADO.NET the format for transferring data is XML. Similarly, if data needs to be persisted (for example, into a file), it is stored as XML. If you have an XML file, you can use it like any data source and create a dataset out of it.
In fact, in ADO.NET, XML is a fundamental format for data. The ADO.NET data APIs automatically create XML files or streams out of information in the dataset and send them to another component. The second component can invoke similar APIs to read the XML back into a dataset. (The data is not stored in the dataset as XML — for example, you cannot parse data in a dataset using an XML parser — but instead in a more efficient format.)
Basing data protocols around XML offers a number of advantages:
- XML is an industry-standard format. This means that your application's data components can exchange data with any other component in any other application, as long as that component understands XML. Many applications are being written to understand XML, which provides an unprecedented level of exchange between disparate applications.
- XML is text-based. The XML representation of data uses no binary information, which allows it to be sent via any protocol, such as HTTP. Most firewalls block binary information, but by formatting information in XML, components can still easily exchange the information.
For most scenarios, you do not need to know XML in order to use data in ADO.NET. ADO.NET automatically converts data into and out of XML as needed; you interact with the data using ordinary programming methods.
Schemas Define Data Structures
Although you do not need to know anything about XML to read and write to the database and work with datasets, there are situations in which working with XML is precisely the goal you are after. These are situations in which you are not accessing data, but instead, working with the design of data. To put it another way, in ADO.NET you use XML directly when you are working with metadata.
Datasets are represented as XML. The structure of the dataset — the definition of what tables, columns, data types, constraints, and so on are in the dataset — is defined using an XML Schema based on the XML Schema definition language (XSD). Just as data contained by a dataset can loaded from and serialized as XML, the structure of the dataset can be loaded from and serialized as XML Schema.
For most of the work you do with data in ADO.NET, you do not have to delve deeply into schemas. Typically, the Visual Studio .NET tools will generate and update schemas as needed, based on what you do in visual designers. For example, when you use the tools to create a dataset representing tables in your database, Visual Studio .NET generates an XML Schema describing the structure of the dataset. The XML Schema is then used to generate a typed dataset, in which data elements (tables, columns, and so on) are available as first-class members. For more information about typed datasets, see Introduction to Datasets.
However, there are times when you want to create or edit schemas yourself. A typical example is developing a schema in conjunction with a partner or client. In that case, the schema serves as a contract between you and the partner regarding the shape of the XML-based data you will exchange. In that situation, you will often have to map the schema elements to the structure of your own database. For more information about designing schemas, see XML Schemas and Data.
Components of ADO.NET
The following illustration shows the major components of an ADO.NET application.
ADO.NET Data Components
The following table summarizes the ADO.NET data components illustrated above and provides links to more information.
|Component or object||For more information see|
|Dataset||Introduction to Datasets|
|Data adapter||Introduction to Data Adapters|
|Data connection||Introduction to ADO.NET Connection Design Tools|
|Windows Form||Introduction to Windows Forms|
|Web Forms page||Introduction to Web Forms Pages|
|BizTalk||BizTalk Web page (http://www.microsoft.com/biztalk)|