Scalability Wars

Only a year ago, I thought SQL Server might take several more years to reach the high level of scalability that Oracle and other database vendors had demonstrated with their TPC-C benchmarks, but SQL Server 2000's ability to scale out made short work of that lofty goal. Although recently disallowed for a minor technicality (for more information, see "TPC Overturns SQL Server Benchmark Record," page 15), SQL Server's recent TPC-C scores with 12 clustered systems decisively answered the question of scaling up versus scaling out.

Database products implement one of two competing database-clustering technologies: the shared-disk architecture that the Oracle Parallel Server uses or the shared-nothing technology that SQL Server 2000 and IBM's DB2 use. The shared-disk approach lets multiple servers access a shared set of disk devices. Shared-disk clusters give each cluster node a view of the entire database, which makes implementing applications on a shared-disk cluster similar to implementing applications on one server. Shared-disk clusters have an availability advantage: If a node in a shared-disk cluster fails, the cluster continues to function because all the cluster nodes always have access to the entire database. The challenge is making sure that each cluster node always has the same view of the data. To meet this challenge, the database vendor can use a shared lock manager, which governs disk writes and keeps each node's cache data up-to-date. The lock manager must communicate with all the nodes on every transaction. As you might expect, the lock manager can quickly become the limiting factor in shared-disk scalability.

In contrast to the shared-disk approach, each server in a shared-nothing approach essentially operates as an independent entity. Each server owns and manages its disk subsystem. The challenge in a shared-nothing cluster lies in partitioning the data and enabling applications to take advantage of multiple nodes. Shared-nothing clusters don't use a shared lock manager because each node manages that node's locks and data cache. To share data, the nodes send remote procedure calls between the clustered servers. Because the nodes exchange only relevant data, messaging overhead is much lower than in the shared-disk approach. In addition, the absence of a shared lock manager removes the shared-disk method's primary scalability limitation. However, node failure is an important consideration in shared-nothing database clusters: If a node fails, the rest of the cluster can't access the data that the failed node owns. (Microsoft currently recommends that you implement each node as a failover cluster to address this potential problem.)

SQL Server isn't the first database to use a shared-nothing approach to clustering. Most major database products (e.g., IBM's DB2 Enterprise Edition, Informix's Extended Parallel Server, and NCR's Teradata) implement shared-nothing clusters. Oracle's shared-disk database-clustering technology currently offers better manageability and availability, but Microsoft's upcoming shared-nothing implementation already has an edge in performance because it uses commodity hardware. SQL Server 2000's cluster management capabilities lag behind some of the established enterprise players' tools, but Microsoft has promised that cluster management tools will be available with SQL Server Yukon, the next version of SQL Server.

Comments

Plain text