Federated Database Servers Planning Guidelines
Building a federation of database servers involves designing a set of distributed partitioned views that spreads data across servers. Partitioning works well if the tables in the database are naturally divisible into similar partitions where most of the rows accessed by any SQL statement can be put on the same member server. Tables are clustered in related units. For example, suppose the entry of an order references the Orders, Customers, and Parts tables, together with all tables that record the relationships between customers, orders, and parts. Partitions work best if all the rows in a logical cluster can be put on the same member server.
Partitioning is most effective if all the tables in a database can be partitioned symmetrically in the following ways:
Related data is put on the same member server, so that most SQL statements routed to the correct member server will have minimal, if any, requirements for data on other member servers. A distributed partitioned view design goal can be stated as an 80/20 rule: Design partitions so that most SQL statements can be routed to a member server, where at least 80 percent of the data is on that server, and distributed queries are required for 20 percent or less of the data. A good test of whether this can be achieved is to see whether the partition allows for all rows to be put on the same member server as all their referencing foreign key rows. Database designs that support this goal are good candidates for partitioning.
The data is partitioned uniformly across the member servers.
For example, suppose a company has divided North America into regions. Each employee works in one region, and customers make most of their purchases in the state or province where they live. The region and employee tables are partitioned along regions. Customers are partitioned between regions by their state or province. Although some queries require data from multiple regions, the data needed for most queries is on the server for one region. Applications route SQL statements to the member server that contains the region inferred from the context of the user input.
Although symmetric partitions are the ideal goal, most applications have complex data access patterns that prevent symmetrical partitioning. Asymmetric partitions cause some member servers to assume larger roles than others. For example, only some of the tables in a database may be partitioned, with the tables that have not been partitioned remaining on the original server. Asymmetric partitions can provide much of the performance of a symmetric partition, with the following important benefits:
Dramatically improving the performance of a database that cannot be symmetrically partitioned by asymmetrically partitioning some of its tables.
Successfully partitioning a large existing system by making a series of iterative, asymmetric improvements. The tables chosen for partitioning in each step are typically those that will give the highest performance gain at that time.
In an asymmetric approach, the original server generally retains some tables that did not fit the partitioning scheme. The performance of these remaining tables is typically faster than in the original system because the member tables move to member servers, reducing the load on the original server.
Many databases can be partitioned in more than one way. The specific partitions chosen for implementation must be those that best meet the requirements of the typical range of SQL statements executed by the business services tier.