Distributed Partitioned View Recommendations
With the implementation of distributed partitioned views, Microsoft SQL Server 2000 Enterprise Edition allows for high-end users to scale their servers to meet the requirements of large Web sites and enterprise environments. Before choosing to create a federated server implementation, and consequently partition your tables, you need to decide whether distributed partitioned views are best for your overall environment. The implementation of distributed partitioned views can bring about a lot of complexity to the management and operation of the overall environment. Currently, the percentage of companies that may need to implement this scale-out behavior to improve their environment is very small.
Distributed partitioned views add a scale-out capability to the database backend by transparently partitioning the data across a group of servers. This implementation is designed for high-end OLTP and Web sites with individual SQL statements retrieving minimal data as compared to the decision support, Analysis Services (formerly OLAP Services). The following topics include high-level considerations for any environment that may use distributed partitioned views as part of the database implementation.
Scaling to the Limits of a Single Database Server
Have you already scaled to the limits of a single database server? Scaling refers to the process of adding resources to a tier so that it can handle increased workloads. Scaling can be done in one of these ways:
- Scale up. Increases the processing power of a server by using a more powerful computer.
- Scale out. Increases the processing power of a system designed in a modular fashion, such as becoming a cluster of computers, by adding one or more additional computers (also called nodes) to the system.
While a federation of servers implementing distributed partitioned views allow for a scale-out environment model, scaling up on a single server should be considered first. Distributed partitioned views should be considered a solution only for database systems which have already fully scaled up and at this point are looking to scale out their data services tier. Many perceived problems might well be resolved more efficiently through implementing a scale-up philosophy and adding additional resources for the single database server to use. By scaling up instead of scaling out, the change to the environment should be less intrusive, minimizing such things as application code changes and database design issues, while allowing for efficient memory usage on the single server.
Database and Table Design for a Distributed Environment
Does your database and table design lend itself to a distributed environment?
Consider the current database and table design of your environment. To partition a table successfully, the design must lend itself to this type of an implementation. The goal is to design partitions so that most of the queries are run locally, not remotely. Considerations such as which tables to partition are crucial to the overall performance of the system.
For example, an environment may have Orders, Customers, and Items tables, with Orders and Customers consistently changing and the Items table remaining fairly constant. As a practice, if the Items table, for example, has very few INSERT, UPDATE, or DELETE statements executed against it, you may want to clone this data across all partitions to keep the clusters of tables and data retrieved together.
The partitioning for an updatable view is done on a unique primary key constraint (horizontal partitioning). Therefore, consider not only how the table might be divided, but also the layout of your data today and in the future. Consider your partitioning column and the type of queries that will be run against this column.
In designing a partitioning scheme, it must be clear what data belongs to each partition. The partitioning column cannot be an identity, default, or timestamp column. You may want to use a hashed value for your key to get a relatively even distribution of each one of your partitions when new keys are inserted or deleted. If, in the future, tables need to be re-partitioned, an overhead will be incurred in modifying the view and underlying table schema, and possibly in changing the data-routing rules of the business tier.
Consider why you may need to implement a federated database design. If you have an extremely high-hit database, it may not be that database or table that you want to distribute. Rather, if you could minimize the load associated with other table queries on that server by distributing them, this could be an efficient implementation as well. It is recommended that most of the SQL statements be routed directly to the member server with a large percent of the necessary data, therefore minimizing the distributed nature of the design. For more information, see Designing Partitions.
Overall Query Performance
Have you considered overall query performance? Performance should be a large consideration in any database design and implementation. When distributing data across multiple servers, there is a performance hit for querying a remote server. Analysis should be done on the type of queries implemented in the OLTP environment to get a baseline on what data is being touched by specific queries. Because there is a certain degree of overhead in running distributed queries, this overhead may in some cases outweigh the benefits of distributing your tables. It is important to analyze queries and their generated plans to gather information as to how you want to distribute your data, whether you need to distribute your data, and to modify long-running queries. Removing redundant trips to a remote server and being able to cache compiled plans and execution plans can increase overall performance; however, the more remote trips (some are inherent in a distributed environment), the more performance will decrease.
Application and Business-Tier Design
What about your application and business-tier design? Consider that you can gain considerable performance in your federated database tier throughout the environment by connecting to the correct server (the server with the data you need or a server that can most efficiently process the query) the majority of the time. Within a business-tier logic that can route queries to the appropriate servers, the system-wide efficiency can be increased significantly. This routing logic can be implemented in the business tier or even within a table at the data tier, with the advent of data-routing rules. Data-routing rules are a set of rules that can be used to send a query to the server that contains the appropriate data. Implementing data-routing rules can require the initial overhead of making changes to your application or other tiers within your environment. For more information, see Designing Applications to Use Federated Database Servers.
Backup And Recovery Planning for Multiple Servers
How will you maintain your backup and recovery plan across multiple servers? Overall, manageability of your environment can become more challenging when more servers are added to the enterprise. The backup and restore features that are included with SQL Server 2000 become more complex when working with a distributed environment. Consider the need to backup and restore databases across partitions at the same logical point in time. SQL Server 2000 has made these restores easier with the implementation of marked transactions. For more information, see Backing Up and Restoring Federated Database Servers and Recovering to a Named Transaction.