Database Architecture: The Storage Engine
Microsoft Consulting Services
Summary: This article provides insight into the inner workings of SQL Server architecture. It covers database engine enhancements, providing usage tips and pointers to more information. Understanding depth information about the SQL Server engines can help a Database Administrator (Database System Engineer) take greater advantage of SQL Server when designing, building or enhancing database systems. Although targeted at database professionals, this material may also be useful from an educational or marketing perspective. (21 printed pages)
This article describes the new storage engine features in Microsoft® SQL Server™ 2000 and offers tips for using them, along with some insight into how the storage engine works. Developing a basic understanding of the inner workings of the storage engine can help you get the most out of SQL Server.
In a world that is focused on highly scalable applications, databases are now designed and implemented on short schedules, and remain in a constant state of evolution as development requirements change and production usage grows. Scalability, availability, and ease-of-use requirements demand a responsive and flexible data storage engine.
Different editions of SQL Server 2000 support a wide variety of systems, ranging in size from a tiny mobile system for a Pocket PC all the way up to a high availability multiple-terabyte transaction processing or decision support system running on clustered Windows® 2000 Datacenter Servers. All of these systems maintain the flexibility, security, and reliability mission-critical business systems demand.
You can deploy SQL Server 2000 applications for projects of varying purposes and sizes, as a result of intelligent, automated storage engine operations. A highly sophisticated architecture improves performance, availability, and scalability.
Reliability and concurrency are enhanced with new algorithms for physical file interaction. These algorithms eliminate the need to run database console commands (DBCCs) as part of regular maintenance. However, DBCC is still available, and the new DBCC CHECK commands can be run without inhibiting online processing.
The storage subsystem, which consists of the physical database files and their layout on disk, supports scaling from very small to very large databases. SQL Server can now support up to 64 GB of physical memory (RAM) and up to 32 processors.
Ease of use
Enhanced administration capabilities help the Database Administrator (DBA) to automate and centralize server management. This also allows easy maintenance of remote servers and applications without the necessity of having a DBA visit every site. Server configuration, managed by a sophisticated algorithm, is dynamically responsive to server usage patterns, freeing the DBA to concentrate on database management and optimization tasks.
The relational database server of SQL Server 2000 has two main parts: the relational engine and the storage engine. The two engines work independently, interacting with each other through native data access components such as OLE DB. The relational engine provides an interface into the storage engine, which is composed of services to interact with the underlying database storage components and features.
The primary responsibilities of the storage engine include:
- Providing features to improve ease of use for managing storage components
- Managing the data buffers and all I/O to the physical files
- Controlling concurrency, managing transactions, locking, and logging
- Managing the files and physical pages used to store data
- Recovering from system faults
The SQL Server 2000 storage engine offers new features that add conceptual simplicity and physical flexibility, while reducing the need for meticulous capacity planning and performance tuning. SQL Server 2000 reacts to its own environment and dynamically adapts to changes in database usage accurately and quickly. This breakthrough in technology has elevated the focus of database administration to the facilitation of data as a service. SQL Server 2000 DBAs can focus on designing a system that is responsive to data flow and usage, rather than spending time tuning individual parameters.
The changes in SQL Server 2000 are built on architectural enhancements introduced in SQL Server 7.0 to provide a foundation for ongoing improvement and innovation. A key goal for the storage engine team was to reduce the amount of time and effort spent tuning the server on a regular basis. Because most tuning parameter settings can be based on database usage, the engine now dynamically adjusts to situations in the database environment according to an adaptive algorithm. This automated flexibility has been implemented for tuning parameters that required constant adjustment and experimentation in earlier versions. You can still manually adjust tuning features, but SQL Server 2000 does more of the work for you. Only a small percentage of SQL Server customers should require any adjustment to the tuning parameters; this type of adjustment should only be performed with careful testing and under the supervision of fully qualified Database Administrators.
The following table summarizes the key enhancements made to the SQL Server 2000 storage engine. They are described in greater detail later in this article.
|Feature||Description and benefits|
|Application lock manager||If you need to control concurrent access to application-defined resources, such as forms, new stored procedures now allow you to lock these resources using the SQL Server application lock manager.|
|Database console commands (DBCCs)||The DBCC CHECK commands can now run during online processing, without blocking updates. New enhancements allow verifying consistency of physical pages to detect hardware-induced errors. In SQL Server 2000 Enterprise Edition, DBCC now runs in parallel across multiple processors.|
|Database options||All database options can now be modified using ALTER DATABASE. This ability simplifies administration.|
|Differential backups||Differential backups are quicker in SQL Server 2000, due to an enhancement that tracks database changes at the extent level.|
|Dynamic tuning||Using dynamic adaptive algorithms, the server automatically adjusts previously static configuration settings. Administrative control is still available to manage system-wide resources, but you will not usually need to use it. Manually set parameters adapt dynamically within their constrained boundaries.|
|In-row text||In tables that include a small, frequently used text column, smaller text values can be stored in the same page with the standard data row, rather than on a page of text values. In tables where this text data is accessed frequently, this feature can eliminate a large amount of disk I/O.|
|Index builds in parallel||In Enterprise Edition, index builds automatically make use of all processors configured for parallel processing, reducing the time it takes to build an index by as much as a factor of six on an eight-processor server. Index builds also take advantage of available resources in memory and tempdb.|
|Index read ahead||Index reads have been enhanced to increase performance on index scans.|
|Index reorganization||Improvements made to DBCC SHOWCONTIG provide more detailed information regarding index fragmentation. A new DBCC command, INDEXDEFRAG, reorganizes index pages online without disrupting database service or incurring any risk to database consistency or recovery.|
|Descending order key columns on indexes||Individual keys columns in an index can be specified as ascending or descending order.|
|KILL command||This command now reports completion progress. If this command is waiting on another process, such as a rollback, you can view how much of the command has been executed. This command has been enhanced to allow you to stop Microsoft Distributed Transaction Coordinator (MS DTC) transactions, which are not associated with a specific session.|
|Large memory support||Windows 2000 technology improves the performance of Enterprise Edition systems that use a large amount of memory. Using the AWE extensions of Windows 2000, SQL Server 2000 can support up to 64 GB of physical memory (RAM).|
|Locking||The lock manager has been enhanced to detect deadlocks across additional resources such as threads and memory. Concurrency improvements reduce deadlocks. This further enhances scalability in SQL Server 2000.|
|Logical log marks||Transact-SQL commands can create a bookmark in the log to permit restoration of the database to the point in time indicated by the bookmark. This feature also synchronizes restoration of multiple databases used for the same application.|
|Online index reorganization||Improvements made to DBCC SHOWCONTIG provide more detailed information regarding index fragmentation. A new DBCC command, INDEXDEFRAG, reorganizes index pages online without disrupting database service or incurring any risk to database consistency or recovery.|
|Optimized I/O read-ahead||SQL Server 2000 issues multiple serial read-ahead reads at once for each file involved in the scan. The query optimizer uses serial, read-ahead I/O when scanning tables and indexes for improved performance.|
|Passwords on backups||Backup media and individual backup can be password-protected. This prevents an unauthorized user restoring a backup and gaining access to a database.|
|Recovery models||Using recovery models, you can select the level of logging in the database. This allows greater flexibility of transaction log management. The recovery model can be altered online to complement varying database use throughout the day.|
|Shared table scans||In Enterprise Edition, multiple scans of a table can now take advantage of other ongoing scans of that table, reducing physical I/O to the disk.|
|Shrinking the log||The command to shrink the log runs immediately in more situations. When the log cannot be shrunk immediately, SQL Server will provide constructive feedback on what must be done before continuing with or completing the shrink operation.|
|Snapshot backups||Support for snapshot backups by third-party vendors has been enhanced. Snapshot backups take advantage of storage technologies to backup or restore an entire database in seconds. These backups can now be combined with a conventional transaction log and differential backups to provide complete protection for OLTP databases. This is especially beneficial for moderate to very large databases in which availability is extremely important.|
|Space-efficient empty tables and indexes||No disk pages are allocated for empty tables and indexes in SQL Server 2000. SQL Server 7.0 allocated as many as three pages for empty tables and indexes.|
|Top n sort||This new feature optimizes retrieval of top n values (for example, |
|Xlock||SQL Server 2000 provides this new Transact-SQL locking hint. It can be used to explicitly invoke an exclusive transaction-level page or table lock.|
SQL Server 2000 has been enriched with features that allow more efficient data interaction and more administrative flexibility. The following sections give more details on these enhancements plus some tips on how to use them.
In SQL Server 2000, the storage engine has been enhanced to provide even more scalability and performance when interacting with the data. Understanding these enhancements can help you use SQL Server more effectively.
The exchange of data begins with a query, whether it originates from a user interface or from an automated task. The data request is passed into the relational engine, which interacts with the storage engine to get the data and pass it back to the user. From the perspective of the user, and even the DBA, the functioning of the storage and relational engines are indistinguishable.
Reading Data More Effectively
Data flows between the server and the user through a series of transactions. The application or user initiates the work, and the database passes it to the query processor for completion and then returns the end results. The query processor does the work by accepting, interpreting, and executing SQL statements.
For example, when a user session issues a SELECT statement, the following steps occur:
- The relational engine compiles and optimizes the statement into an execution plan (which is a series of steps required to get the data). The relational engine then runs the execution plan. The execution steps involve accessing tables and indexes through the storage engine.
- The relational engine interprets the execution plan, making calls into the storage engine to gather the necessary data.
- The relational engine combines all the data returned by the storage engine into the final result set and then sends it back to the user.
A couple of improvements have been made to boost performance in this process. In SQL Server 2000, the relational engine relays qualifying query predicates to the storage engine so they can be applied earlier in the process, resulting in more efficient exchange between the storage and relational engine. This can provide a significant performance gain for qualifying queries.
Top n enhanced
Another improvement is in the way the storage engine handles selection of the top n records from a result set. In SQL Server 2000, a new top n engine analyzes the best path of operation for statements like this one:
SELECT top 5 * from orders order by date_ordered desc
For this example, if the whole table must be searched, the engine analyzes the data and tracks only the top n values in the cache. This is a tremendous performance boost for this type of SELECT statement, because only the values in the top n will be sorted, rather than the whole table.
In SQL Server 2000 Enterprise Edition, two or more queries can share ongoing table scans, which can improve performance in very large SQL Server 2000 databases. For example, when a query searches a very large table using an unordered scan, the pages flow through the cache to make room for the data flowing in. If another query were started, a second scan of the same table would incur disk I/O to retrieve those pages again. In an environment where there are frequent table scans, this can cause disk thrashing as both queries search the same data pages.
Figure 1. Shared scans efficiencies
An optimizing process reduces the amount of disk I/O produced by this type of data access pattern. The first unordered scan of a table will read the data from the disk; instead of having to read the disk again, subsequent unordered scans of the same table can build on the information already in memory. See Figure 1. During multiple simultaneous scans of the same table, this synchronization process may boost performance as much as eightfold. This improvement is even more noticeable in large decision support queries, where the total table size is much larger than the size of the cache.
Shared scans are a feature invoked by the storage engine to assist with queries that have no better available execution plan. The intent of this feature is to assist in frequent reads of very large tables. When the query processor determines that the best execution plan includes a table scan, this feature is invoked. However, while it is possible to use query or index tuning to force shared scans, no performance gain is achieved by forcing a table scan where a well-maintained index would do the job as well or better.
In order to maintain transactional consistency while many users are interacting with the data, the storage engine locks resources to manage dependencies on rows, pages, keys, key ranges, indexes, tables, and databases. By locking resources while they are being altered, the engine prevents more than one user from altering the same data at the same time. SQL Server locks are dynamically applied at various levels of granularity, in order to select the least restrictive lock required for the transaction.
In SQL Server 2000, concurrency improvements further reduce deadlocks and avoidable locking of resources. For example, the lock manager has been enhanced to be aware of other resources that might be in contention, such as threads and memory. This new ability can help a Database Administrator identify a wider variety of design or hardware limitations.
A new Transact-SQL interface into the lock manager has been introduced to support customized locking logic within programming code. Locks necessary for business logic can be initiated by invoking sp_getapplock within your Transact-SQL batch, which allows you to specify an application-defined resource to be locked (for example, a lock on an application resource like a form, instead of a lock on a data row), the mode of locking to use, the timeout value, and whether the scope of the lock should be the transaction or the session. After locks have been initiated with the new application lock manager, they participate in the normal lock management of SQL Server, just as if the storage engine had initiated them, so you do not have to worry that your application-initiated lock will remain open if the calling transaction is terminated.
The process by which locks are acquired in SQL Server 2000 takes into account whether or not all the data on the page is committed. For example, if you run a SELECT statement against a table whose data has not changed recently, such as a table in the pubs database, the process does not produce any locks because no active transactions have recently updated the table. The storage engine accomplishes this by comparing the log sequence number on the data page to the current active transactions. In databases where most of the data is older than the oldest active transaction, this can reduce locking significantly, enhancing performance.
While locks protect data during transactions, another process, latching, controls access to physical pages. Latches are very lightweight, short-term synchronization objects protecting actions that do not need to be locked for the life of a transaction. When the engine scans a page, it latches the page, reads the row, gives it back to the relational engine, and then unlatches the page again so another process can reach the same data. Through a process called lazy latching, the storage engine optimizes access to the data pages by releasing latches only when a page is also requested by another ongoing process. If no ongoing process requests the same data page, a single latch remains valid for the entire operation on that page.
For improving concurrency in your system, you should focus on the design of the database system and the code objects that touch it. SQL Server 2000 is designed to support multiple terabytes of data and virtually unlimited linear scalability. The role of the DBA is to manage the database life cycle, a cycle of design and optimization of all database components from code to data storage on disk, to ensure that the design continues to meet the service level agreement.
Enhancements have also been made to the physical data structures, to allow more flexibility of design and maintenance.
As a table or index grows, SQL Server allocates new data pages in sets of eight; these are called extents. A row of data cannot cross pages, so it can hold only 8 KB of data, although associated text, ntext, or image columns can be stored on different pages. Tables that have clustered indexes are physically stored in key order on disk. Heaps are tables that do not have clustered indexes and are not sorted. The records are stored in the order in which they were inserted.
SQL Server 2000 supports indexed views, often called materialized views in other database products. When a clustered index is created on a view, the view ceases to be a derived object and becomes a base object stored in the database with the same structure as a table with a clustered index. An indexed view is useful for storing precalculated values, or the result of a complex join, in cases where the maintenance cost does not outweigh the performance gain. In SQL Server 2000 Enterprise Edition, the query processor automatically uses an indexed view whenever this would optimize a query plan. Indexed views can improve query speed on data that is rarely changed but is frequently part of a complex join or calculation query.
In-row text allows you to store small text data in the primary page. For example, if you have a table that has a text column, but the text values are frequently small enough to fit on a normal page with the rest of the row, you can set a threshold on the text column. The threshold determines the size below which data is stored on the primary page rather than a separate text page. This results in much faster performance if the majority of the data will fit on the page, and only a small percentage of the data is actually large enough to justify the creation of a text page.
To determine when to use this new feature, balance the storage density or how many rows are stored on each data page versus the I/O improvement. For example, you have a text column for comments. In the table, you observe that 20 percent of the text values are large, but the other 80 percent are less than 100 bytes. This may seem like a logical candidate for the in-row text solution; however, you should only use in-row text if the data in that column is accessed frequently. If your users access this table frequently, but they do not look at the comments column unless they are doing special research, using in-row text might not be the best answer. The storage density is reduced because fewer rows per page are stored; and because the table contains more pages, table scan response times would be increased. Therefore, the best case for implementing in-row text is when you have a frequently accessed text column that also happens to have many values smaller than 8 K that could be stored in the row.
New Data Types
SQL Server 2000 introduces three new data types. bigint is an 8-byte integer type. sql_variant allows the storage of data values of different data types. The third data type, table, is useful for optimizing performance. Table variables make more efficient use of tempdb, and are faster than temporary tables. Like other variables, they are scoped to the batch in which they are declared. With functionality nearly identical to temporary tables, table variables perform faster than temporary tables or cursors and make better use of server resources. As a rule, always consider the best way to utilize the resources available on your servers when you create code to interact with a database.
Access to data is optimized through the use of indexes. Because indexing requirements are based on usage, incorrect indexing is one of the most common causes of slowness in a database. Standard index maintenance should include periodically verifying the current indexing schema and adjusting it to current system usage by dropping or adding indexes as appropriate.
Several new features in SQL Server 2000 make index maintenance more efficient and easier for administration. These enhancements decrease disk I/O, increasing the performance of index scans. This is especially useful where there is a secondary index available for a range scan.
When you build an index, the storage engine samples the rows and calculates the most efficient way to utilize server resources to build the index. Options allow you to control how indexes are built, so you can choose to control how system resources are allocated. You can use these options to balance resources in a process that is important to performance of the system as a whole, in accordance with your knowledge of the particular database system, so the index build will have the lowest possible impact on transaction processing.
|Memory||Specifies the amount of memory used by any index build.|
|TempDB||Causes disk space used for sorting during the index build to be allocated from tempdb. This can result in more I/O bandwidth if tempdb is on separate disks and can result in a more physically contiguous layout of index pages if the database is low on contiguous space.|
|CPU||Limits number of processors (CPU) used in parallel operations (server-wide).|
For more information about these options, see SQL Server 2000 Books Online.
Another scalability feature for large systems is the parallel index build, which is available in SQL Server 2000 Enterprise Edition. This process is invoked automatically when you issue a single CREATE INDEX statement. The storage engine calculates the requirements for the data and then creates separate threads, each of which builds a section of the index.
Figure 2. Parallel index optimization
An index build can also make use of a shared table scan, further optimizing the process.
SQL Server 2000 supports online reorganization of indexes, a tremendous advancement from earlier versions. Online index reorganization has minimal impact on transaction throughput and can be stopped and restarted at any time without loss of work. The reorganization is accomplished in small increments and is fully recoverable.
As information is inserted, deleted, and updated in a table, the clustered and nonclustered index pages can eventually become fragmented, decreasing the efficiency of range queries against that data. Therefore, it can be beneficial to defragment your indexes periodically. You can use DBCC SHOWCONTIG, which has been improved in SQL Server 2000, to analyze and report fragmentation. For more information, see SQL Server 2000 Books Online.
If you determine that an index is fragmented, use DBCC INDEXDEFRAG to reorganize it. It reorders the pages in logical key order, compacting free space and moving rows within the established extents to conform to the fill factor setting. This enhances read performance by densely populating the pages so less of them must be read during a scan of the data. Running DBCC INDEXDEFRAG has far less impact on online performance than rebuilding the index does, provided the index has been regularly maintained and is not completely fragmented.
DBCC INDEXDEFRAG is one of a number of long-running online administrative operations that use small transactions internally. These small transactions maximize concurrency within the server, allow the operation to be stopped without loss of work, and are fully logged to prevent having to redo them in case of a failure.
The transaction log is a stream of records that records changes to the database from the point the database was created until the current point in time. Every logged operation creates a log record. The log records generated by a transaction are written to disk when the transaction commits. In contrast, the data pages modified by the transaction are not immediately written to disk, but are retained in the SQL Server buffer cache and written to disk some time later. Delaying writes of the data to disk maximizes the efficiency of multiple accesses to the data pages and avoids disrupting scans. Forcing the log to disk on commit guarantees that no committed work is lost if the server goes down.
Recovery ensures that a database is transactionally consistent prior to bringing it online. If a database is transactionally consistent, all committed work is present and any uncommitted work has been undone. The log always defines the correct view of the database. Simply put, recovery is the process of making the data consistent with the transaction log at a given point in time.
Recovery is performed automatically when SQL Server starts, when a database is attached, or as the final step in restoring a database from backups. Recovery performed by SQL Server when it starts is called restartstartup recovery. Recovery from backups is normally due to disk failure. This type of recovery is called media recovery.
Restart recovery is automatic and always recovers to the most recent point in time. In the case of recovery from backups, the DBA may choose to recover to an earlier point in time. This is subject to restrictions. For more information, see SQL Server 2000 Books Online.
Startup recovery occurs automatically each time an instance of SQL Server is started and consists of rolling back any transactions that were incomplete when the instance was last shut down. In the case of recovery from backups, the DBA may choose to recover to an earlier point in time. This is subject to restrictions. For more information, see SQL Server 2000 Books Online. In both cases, recovery operates based on this target point in time.
Recovery consists of two phases:
- Redo all changes until the target point in time is encountered in the transaction log.
- Undo all work performed by transactions that were active at the point where redo stopped.
SQL Server uses checkpoints to speed restart recovery. A checkpoint forces all modified data pages currently in the buffer cache to disk. This creates a starting point for redo portion of recovery. Because checkpoints can be expensive, SQL Server automatically manages checkpoints to maximize performance while minimizing the time it takes to restart.
In SQL Server 2000, writes that complete successfully must be stored durably on disk. If you use write-caching disk storage, work with your storage vendor to ensure that the cache is fault-tolerant. Fault tolerance means that the cache is immune to power failures or operator actions. If your cache is not fault-tolerant, it should be disabled.
Logical Log Marks
In SQL Server 7.0, it was possible to recover to any specified point in time. In the case of hardware failure, the restore process was fairly straightforward. However, another threat to a database is the possibility that invalid data may be entered or that valid data may be destroyed by a user's action. In this case, you need to determine when the problem transaction began. In SQL Server 7.0, the only way to do this was to restore logs to a copy of the database until the problem recurred; then you could run your restore to the production image up to a point in time just prior to the discovered time of error.
In SQL Server 2000, you can mark transactions in the log. Later, if you need to restore, you can reference the mark that was used at the time of execution, rather than using wall-clock time. To do this, use a named BEGIN TRANSACTION statement and the WITH MARK [description] clause. The marks are stored in msdb. Recovery can include or stop right before a transaction that contains the mark. For example, if you have a process that runs in batch and changes many records, you can use this feature to ensure that if the process is run under the wrong circumstances, you can roll the data back to the point in time that the command was executed.
Mark names do not need to be unique. To indicate which transaction you need, specify a datetime value. The syntax for this is:
RESTORE LOG WITH [ STOPBEFOREMARK|STOPAFTERMARK ] = @TaggedTransaction AFTER @datetime
You can also use marks in a distributed transaction, known as distributed marks, to support recovery of multiple related databases to a transactionally consistent state. These related databases might reside on the same or different instances of SQL Server. You can set distributed marks across a set of databases periodically (for example, once every five minutes). If the transaction log of one of the databases is damaged, you must recover the set of databases to an earlier point in time. The distributed mark provides this point. Using distributed marks negates the worry of coordinating precise timing of backups for multiple related databases. For more information, see "Recovering to a Named Transaction" in SQL Server 2000 Books Online.
Shrinking the Transaction Log
Log shrink operations were not executed immediately in SQL Server 7.0. They were deferred until the transaction log was next backed up or truncated. This confused many SQL Server 7.0 customers. SQL Server 2000 shrinks the log as much as possible and then indicates if further shrinking will be possible after a log backup. In this case, run the shrink command again after the log backup has completed. For more information, see "Shrinking the Transaction Log" in SQL Server 2000 Books Online.
The size of the log will be based on your current recovery model and your application design. If you find that you need to shrink to log periodically, look beyond the symptom to the cause. You should further investigate what is causing the log to fill up, rather than focus on constant maintenance with the shrink command.
Recovery models were added to SQL Server 2000 to facilitate data protection planning. They clarify tradeoffs between performance, log space requirements, and protection from media (disk) failure. There are three models: Simple Recovery, Full Recovery, and Bulk-Logged.
The choice of recovery model is based on database usage and availability requirements and helps determine appropriate backup and restore procedures. Recovery models only apply to media recovery, that is, recovery from backups. Restart recovery recovers all committed work. For more information, see "Selecting a Recovery Model" in SQL Server 2000 Books Online.
You can easily transition between recovery models. For example, on a very large database, you can use full or bulk logged, or both. You can use full during the day and bulk_logged at night, during a data load process that consists of bulk insert and rebuilding indexes. You can also switch to bulk logging while you run a data load and switch back to full mode, run a transaction log backup, and be able to restore to that point in time without having to run a full database backup. This feature allows you to do the bulk processing more efficiently; all you need to do is make a transaction log backup afterwards.
To change recovery models, use the following syntax:
ALTER DATABASE SET RECOVERY RecoveryModel
For more information, see "Switching Recovery Models" in SQL Server 2000 Books Online.
Simple recovery model
The Simple Recovery model typically requires less log space, but it incurs the greatest potential work loss if data or log files are damaged. Only events needed for basic recovery are logged. Using the Simple Recovery Model, only full database and differential database backups are available. In the event of a failure, all committed work since the last backup must be redone. This model is the simplest to administer, but it is not a good choice for a mission-critical application where loss of committed work cannot be tolerated.
This model is similar to the truncate log on checkpoint option in SQL Server 7.0 and earlier versions.
Full recovery model
In the Full Recovery model, everything is logged. Full Recovery model provides complete protection against work loss from a damaged data file. If the transaction log is damaged, work committed since the most recent log backup is lost and must be redone manually.
Even when you use the Full Recovery model, it is important to use fault-tolerant disks for the transaction log to prevent data loss. The Full Recovery model also allows recovery to any specific point in time.
Bulk-logged recovery model
The Bulk-Logged Recovery model provides the highest performance for bulk operations. These operations also consume less log space than they do under the Full Recovery model. For example, the allocation of a new page is logged, but the data inserted onto the page is not. In SQL Server 2000, bulk operations consist of bulk load (BCP and BULK INSERT, including when they run within a DTS package), SELECT INTO, CREATE INDEX, WRITETEXT, and UPDATETEXT.
Compared with the Full Recovery model, the Bulk-Logged Recovery model minimizes logging for bulk operations. Keep in mind that in the event that recovery becomes necessary, if the log is damaged or if bulk operations have occurred since the most recent log backup, changes made in the database since the last log backup are lost.
This model does not support recovery to a specific point in time, but it will allow recovery to the end of a transaction log backup containing bulk changes. Transaction log backups made using the Bulk-Logged Recovery model contain the extents modified by bulk operations. This feature improves support for log shipping, because you no longer need to worry that a bulk operation will invalidate your backups. SQL Server maintains a bitmap to track the data extents modified, which optimizes the process by which SQL Server identifies changes.
Improved backup functionality
In addition to the introduction of recovery models to simplify data protection in general, SQL Server 2000 has improved manageability: snapshot technology, differential backups, and security have been enhanced.
- The transaction log backup chain is never broken. In SQL Server 7.0, certain operations, such as adding a file to a database, broke the log chain and required a subsequent full database backup.
- Backup operations do not conflict with applications or other administrative actions. For example, backups can occur concurrently with bulk operations such as create index and bulk load.
- Log and file backups can occur concurrently.
Unattended backup operations, regardless of system activity, are also well supported in SQL Server 2000.
SQL Server supports snapshot backup and restore technologies in conjunction with independent hardware and software vendors. Snapshot backups minimize or eliminate the use of server resources to accomplish the backup. This is especially beneficial for moderate to very large databases in which availability is extremely important. The primary benefits of this technology are:
- A backup can be created in a very short time, usually measured in seconds, with little or no impact on the server.
- A disk backup can be used to restore a database just as quickly.
- Another host can create a backup with no impact on the production system.
- A copy of a production database can be created instantly for reporting or testing.
Snapshot backups and restores are accomplished in cooperation with third-party hardware and/or software vendors who use features of SQL Server 2000 designed for this purpose. The backup technology creates an instantaneous copy of the data being backed up, usually by splitting a mirrored set of disks. At restore time, the original is immediately available. The underlying disks are synchronized in the background, resulting in almost instantaneous restores.
Differential database backups can be completed in a time that is proportional to the amount of data changed since the last full backup. The less your data has changed, the quicker the backup. SQL Server 2000 uses a bitmap to track data extents modified since the most recent database or file backup to enable them to be located efficiently. In addition, SQL Server 2000 supports file differential backups.
Backups still accumulate changes made to the database since the most recent full backup, functioning the same way in the event of recovery. They are significantly faster, however, because they only record the small amount of information that has changed, especially for very large databases that contain only a small amount of changed data.
For added security, you can implement password protection for your backup media and backup sets. This helps prevent unauthorized users from adding to your backups or restoring to your database.
Several administrative features of the storage engine have been enhanced in SQL Server 2000.
The DBCCs provide a variety of administrative capabilities, including the CHECK commands for verifying database consistency.
Experience with SQL Server 7.0 and SQL Server 2000 has shown that database inconsistency is caused by hardware problems that may or may not be detected by the database engine or applications during normal operation. This is particularly applicable to data that is accessed infrequently. In response to this need, SQL Server 2000 introduces a checking mode, Physical_Only, which is designed to detect most hardware-caused problems. It is very fast, approximately disk scan speed, and is not resource intensive.
Due to fundamental architectural improvements in the SQL Server storage engine, which started with SQL Server 7.0, it is not necessary to run database verification as part of normal maintenance. However, Microsoft remains committed to database verification tools as an important part of managing mission critical data. Microsoft recommends that you:
- Run the Physical_Only check occasionally, depending on your confidence in underlying hardware, particularly the disk subsystems.
- Run a complete database check at critical times, such as a hardware or software upgrade, or whenever a problem is suspected regardless of cause.
Microsoft does not recommend running a complete check as part of regular maintenance.
SQL Server 2000 also includes important enhancements to database verification:
- By default, checking is fully online. Online checking has low impact on the transaction workload. This impact will vary depending on the system load, hardware configuration, and speed of tempdb. Microsoft has measured this impact at 15 to 20 percent with a medium OLTP workload (50 percent CPU). The TABLOCK option is provided to force the check to take shared table locks, which enables it to run faster but will prevent updates.
- Checking is done in parallel on symmetric multiprocessing (SMP) computers, limited by the maximum degree of parallelism you have set for the instance of SQL Server.
SQL Server 2000 check commands continue to support the repair functionality introduced in SQL Server 7.0. Offline repair can provide an alternative to a restore from backups in some situations.
Database State Control
SQL Server 2000 includes enhancements to the ALTER DATABASE statement that allow more control of database states through Transact-SQL. All database options can now be modified with greater control through the ALTER DATABASE command; sp_dboption and databaseproperty() will no longer be updated in future releases. The Transact-SQL commands sp_helpdb and DatabasePropertyEx() provide information about the state of your database.
The following table lists database state options.
|Option type||Available settings|
SQL Server also sets the following states in reaction to conditions within the database: restoring, recovering, and suspect. The database options can be set by using the SET clause of the ALTER DATABASE statement, the sp_dboption system stored procedure, or, in some cases, SQL Server Enterprise Manager.
When the database state is changed, the session making changes to the database state remains connected, while sessions inconsistent with the new state can be terminated and their transactions rolled back. Session termination options include the following:
- Terminate immediately
- Terminate after a specified time
- Allow the ongoing processes to complete normally
- Check for activity and disregard the state change if active user sessions are found
Here are two examples of the syntax:
alter database accting set read_only with rollback immediate alter database accting set single_user with rollback after 60 seconds
For more information, see "Setting Database Options" in SQL Server 2000 Books Online.
System Process IDs and Units of Work
One additional administrative enhancement that helps when you need to stop a process is the KILL command. The KILL command has been enhanced with status feedback. So, if you want to learn the status of an outstanding KILL command, run the following:
KILL SPID WITH STATUSONLY
If you try to stop a system process ID (SPID) that is being stopped by another KILL command, the system returns the same status information.
In SQL Server 2000, MS DTC transactions can exist without an associated connection or SPID. Therefore, a connection can be used for other processes while waiting for a transaction or unit of work to complete. When the MS DTC transaction manager sends a message that it has completed the task, you can either commit or roll back the transaction. This is referred to as a unit of work (UOW), which is the transaction identifier used by MS DTC for the transaction. A UOW does not have a SPID.
For more information, see SQL Server 2000 Books Online.
In SQL Server 2000, usage-based performance tuning is managed dynamically, without required or recommended manual adjustments. The static parameters have been eliminated, but administrative control has been retained for certain resources (for example, setting an upper limit on the amount of memory SQL Server can use). This method is far more accurate and responsive than a manually calculated system based on averages and estimates. This allows you to concentrate on the design aspects of database management. Traditional database systems require a great deal of manual management and tuning. For example, to tune the system in response to usage, the DBA would be required to monitor the system, recording a vast amount of statistics over time, in order to select a static setting that seems to provide the optimal advantage for the system. Then the DBA would re-evaluate the system to judge what effect the new setting has, and the tuning process would begin again.
SQL Server 2000 introduces a dynamic algorithm into the storage engine, which actively monitors usage of the server and adjusts the settings internally. Dynamic feedback and analysis in SQL Server 2000 keeps the setting within 10 percent of the absolute optimal value (see Figure 3), resulting in a better-tuned and highly adaptive system.
Figure 3. Adaptive algorithm tuning
SQL Server 2000 balances processing across all available CPUs in coordination with the Windows 2000 operating system. If you are running a dedicated instance of SQL Server, and no other applications produce a load on the same resources, leave the processor-related settings at their default to make full use of all the processors. SQL Server can take advantage of parallel processing across multiple processors for queries, index builds, DBCCs and other operations. For more information on parallelism, see "Degree of Parallelism" in SQL Server 2000 Books Online.
SQL Server 2000 Standard Edition can support up to four processors and 2 GB of physical memory (RAM). Enterprise Edition can scale upwards to new levels, to support up to 32 processors and 64 GB of physical memory (RAM).
The main source of memory for an instance of SQL Server is called its memory pool. Almost all data structures that use memory in an instance of SQL Server are allocated from the memory pool. Examples of objects allocated from the memory pool include the buffer cache, where recently read data pages are stored, and the procedure cache, which holds recent execution plans.
The assignments within the memory pool are highly dynamic. To optimize performance, SQL Server constantly adjusts the amounts of the memory pool assigned to the various areas. For example, when the number of stored execution plans is lower, the memory pool is adjusted to make optimal use of the resources by making more memory available for the data cache.
SQL Server 2000 is designed to use memory to minimize disk I/O as much as possible. To accomplish this, SQL Server uses the buffer cache to hold recently referenced data, in physical memory (RAM), where it can be reused. One potential way to reduce disk I/O and speed up your database system would be to add to the physical memory (RAM) available to SQL Server.
Normally, memory settings do not require any adjustment. However, they can be controlled in certain situations. For example, memory requires special attention where you are running multiple instances of SQL Server on the same server, especially if you use failover clustering. You also need to monitor memory usage if you are running applications in addition to SQL Server on the same server.
Figure 4: Memory usage with greater than 4GB RAM
SQL Server 2000 takes advantage of new capabilities in Windows 2000 to address physical memory (RAM) beyond 3GB. See Figure 4. SQL Server 2000 Enterprise Edition can use as much memory as Windows 2000 Advanced Server or Windows 2000 Datacenter Server allows.
For more information on large memory support in SQL Server 2000, see "Managing AWE Memory" in SQL Server 2000 Books Online.
Files, Filegroups, and Disks
SQL Server stores data and the log in disk files. In a basic installation, and as a default, data and log files are created in the default location specified in the server configuration. However, to maximize performance and manageability, you can apply a few basic principles:
- Spread data over as many disks, channels, and controllers as possible.
In general, the more disks (spindles) you have (regardless of their individual size) and the faster your access to them (controllers and channels), the faster the storage engine can read and write data. The larger your system usage becomes, the more important it is to separate the data files from log files by storing them on different sets of physical drives. Also, because the use of tempdb has changed, you should now store tempdb on a large set of disks, for example, with the data files or on a set of disks.
- Use filegroups to make your enterprise database more manageable.
Every database begins with one default filegroup. Because SQL Server 2000 can work effectively without additional filegroups, many systems will not need to add user-defined filegroups. However, as a system grows, the use of additional filegroups can provide more manageability, when implemented and maintained by a qualified DBA.
In SQL Server 2000, if you set a particular filegroup within a database to read-only, the data on that filegroup cannot be altered, but catalog information such as permissions can still be managed.
Note In SQL Server 2000, the number of asynchronous I/Os is now managed dynamically inside the database engine, and is not influenced by the number of files or filegroups used, as was the case in SQL Server 7.0.
When implementing or optimizing a database design, the Database Administrator (Database System Engineer) needs to consider the configuration of the database storage components, particularly the layout of physical and logical disks and the arrangement of the database files across disks.
For DBAs, increased flexibility and control over performance provides the freedom to focus their database technology skills and experience on managing the database code, design, and storage components as a unified approach to database system management. The SQL Server 2000 database engine provides general extensibility and flexibility for a wide variety of database implementations.
For information about operating a successful database system with professional people and processes, read about the Microsoft Certified Database Administrator program, and Microsoft Operations Framework.
SQL Server 2000 Books Online is included with SQL Server 2000. It is also available online at http://www.microsoft.com/sql/productdoc/
For more information about hardware and SQL Server, see SQL Server 2000 on Large Servers in SQL Server 2000 Books Online.
For more information on capacity planning techniques, see the Microsoft SQL Server 2000 Administrator's Companion.
Technical information about SQL Server for developers is located in the MSDN SQL Server home page.