An old saying about storage claims that data will expand to consume the available storage space, and many applications support this claim. E-commerce, imaging, data warehousing, enterprise resource planning (ERP), and customer relationship management (CRM) are applications that fill storage media quickly and seemingly without end. Data accessibility for these applications needs to be fast, and availability is paramount. Storage Area Networks (SANs) provide high-speed storage pools through a group of connected servers and high-speed workstations. (For more information about the rising need for SANs, see Mark Smith, Editorial, "Storage Area Networks," April 1999.)
Outside the mainframe world, a discrete instance of each crucial application (e.g., ERP) resides on a server (i.e., 10 servers will house 10 applications). This trend arises from the modularity of systems—especially client/server applications—and the history of adding applications after other successful application deployments. System modularity creates server farms and can result in multiple instances of data. If these instances need to relate to each other, you must use replication or synchronization methods to resolve them. Therefore, the monolithic server data becomes painful to organize and manage. SANs help ease this administrative burden.
SANs are networks within networks. The SAN design disassociates server applications from data storage without sacrificing storage access times and lets numerous servers and applications access the data.
SANs minimize the need for servers with discrete, enormous stores of data, and you can balance reliability and availability needs. You can also amortize storage costs over several servers and their applications.
SAN storage farms support many host OSs and data filing systems. The host OS defines how SAN members access a file system. SANs logically appear to Windows NT as locally accessible volumes under FAT or NTFS.
Servers (or high-speed I/O workstations) with connections to a high-speed I/O channel make up SANs. For example, in a SAN, which Figure 1 shows, servers and workstations connect to the hub through a switch. SCSI or fibre channel connects the workstations and servers to the storage. The SAN connection method dictates the SAN design and affects the extensibility and accessibility of the data that the SAN stores. Let's examine the available methods and their features.
The SCSI Method
SCSI is a good connection method for SANs because most servers have a SCSI host bus adapter (HBA). SCSI is a parallel bus (i.e., each bit occupies a separate wire) that has a 25m distance limitation (i.e., length). Wide SCSI can send two bytes (16 bits over 16 wire pairs) per time frame, and Fast SCSI increases the data transfer rate from the standard 5MBps to 10MBps. Ultra SCSI enhances the speed and the number of bytes that it can send concurrently. Ultra 160 and Ultra3 SCSI increase the speed to 160MBps.
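The relationship between bus width and data rate described above can be sketched as simple arithmetic: the peak burst rate is the number of bytes per transfer multiplied by the number of transfers per second. The following Python sketch is illustrative only. The transfer rates in the table (in millions of transfers per second) and the helper name peak_mbps are assumptions chosen to reproduce the 5MBps, 10MBps, and 160MBps figures quoted above.

```python
# Rough model: peak rate (MBps) = bytes per transfer * million transfers per second.
# The MT/s values are assumptions picked to match the rates quoted in the text.
SCSI_VARIANTS = {
    "Standard SCSI": (8, 5),     # (bus width in bits, million transfers/sec)
    "Fast SCSI": (8, 10),
    "Fast Wide SCSI": (16, 10),
    "Ultra 160 SCSI": (16, 80),
}


def peak_mbps(bus_width_bits: int, mega_transfers_per_sec: int) -> int:
    """Return the peak burst rate in megabytes per second."""
    return (bus_width_bits // 8) * mega_transfers_per_sec


for name, (width, rate) in SCSI_VARIANTS.items():
    print(f"{name}: {peak_mbps(width, rate)} MBps")
```

Doubling the bus width or doubling the transfer rate each doubles the burst rate, which is why the Wide and Fast enhancements combine.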
SANs use SCSI because of its speed. At 160MBps, Ultra 160 SCSI can far exceed (at burst speed) the speed of full-duplex Fast Ethernet (200 megabits per second) and even comes close to that of full-duplex Gigabit Ethernet (theoretically 2000 megabits per second, but closer to 186MBps before packetizing overhead).
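Because SCSI rates are quoted in megabytes per second and Ethernet rates in megabits per second, the comparison is easier to follow after converting to one unit. The Python sketch below converts the 200 megabits-per-second and 2000 megabits-per-second Ethernet figures to megabytes per second and compares them with Ultra 160 SCSI's burst rate. It uses theoretical peak rates only and ignores protocol overhead; the function and variable names are illustrative.

```python
BITS_PER_BYTE = 8


def megabits_to_megabytes(megabits_per_sec: float) -> float:
    """Convert a rate in megabits per second to megabytes per second."""
    return megabits_per_sec / BITS_PER_BYTE


ULTRA160_MBPS = 160.0                                 # megabytes per second, burst
ethernet_200 = megabits_to_megabytes(200)             # full-duplex, 200 megabits/sec
ethernet_2000 = megabits_to_megabytes(2000)           # full-duplex, 2000 megabits/sec

print(f"200 megabits/sec  = {ethernet_200} MBps")
print(f"2000 megabits/sec = {ethernet_2000} MBps")
print(f"Ultra 160 vs. 200 megabits/sec:  {ULTRA160_MBPS / ethernet_200:.1f}x")
print(f"Ultra 160 vs. 2000 megabits/sec: {ULTRA160_MBPS / ethernet_2000:.2f}x")
```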
SCSI has its limitations. One problem with SCSI is that an electrical interruption, a SCSI reset, occurs when you add or remove devices from the SCSI bus. During the reset, the bus loses pending commands. Although some vendors have made their devices less vulnerable to reset, a post-reset hiatus often occurs while the device sorts out the commands that were pending.
SCSI connections also have practical limitations. Each HBA uses one SCSI ID out of the seven available. As Figure 2 shows, three hosts that each have an internal disk and a SAN connection use up six of the seven IDs, leaving only one SCSI ID available for a storage unit, such as a RAID system. (A RAID subsystem represents one SCSI ID because of the subsystem cabinet's intelligent controllers. The cabinet can contain several SCSI disks, which it presents as one ID to a host server.) Wide SCSI variants increase the number of SCSI IDs and devices available on a bus to 16, but as speed increases, the maximum distance between devices drops from 25m to 12m, just as it does with Ethernet. SCSI bus extenders and other repeating devices extend distances, but these devices are expensive. Despite SCSI's limitations and problems, it's the least expensive way to provide SANs with multiple-host connectivity.
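The ID arithmetic in the three-host example can be written out as a short calculation. The Python sketch below assumes, as the example does, that each host consumes two IDs on the shared bus (one for its HBA and one for its internal disk). The function name remaining_scsi_ids is hypothetical.

```python
def remaining_scsi_ids(total_ids: int, hosts: int, ids_per_host: int = 2) -> int:
    """Return how many SCSI IDs are left for shared storage units.

    ids_per_host defaults to 2: one ID for the HBA and one for an
    internal disk, following the three-host example in the text.
    """
    used = hosts * ids_per_host
    if used > total_ids:
        raise ValueError("hosts need more IDs than the bus provides")
    return total_ids - used


# Three hosts on a bus with seven available IDs leave one ID for storage.
print(remaining_scsi_ids(total_ids=7, hosts=3))

# The same three hosts on a 16-ID bus leave far more room.
print(remaining_scsi_ids(total_ids=16, hosts=3))
```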
The High-Fibre Diet
The other SAN connection method, fibre channel, removes many SCSI limitations. Many IT professionals have the impression that to use fibre channel, they must string fibre (instead of copper, as with SCSI) between fibre channel devices, but fibre isn't necessary. Fibre merely increases the maximum distance of fibre channel SANs from 25m to 10km.
Fibre channel mimics the SCSI command set but uses a different communication method that handily bypasses SCSI limitations. SAN fibre channel connections come in three varieties: point-to-point, arbitrated loop, and switched topologies. A point-to-point connection has one hard disk connected to a host. An arbitrated loop is similar to a SCSI bus, but it's faster than SCSI, has a 127-node maximum, and is much easier to cable. Switched topologies are similar to switched Ethernet and Token-Ring networking and have more than 16 million possible nodes.
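The node limits of the three topologies suggest a simple rule of thumb for choosing among them. The Python sketch below picks the smallest topology that fits a given node count. It assumes that a point-to-point link joins exactly two nodes and that the "more than 16 million" switched limit comes from a 24-bit address space; the function name smallest_topology is hypothetical, and real designs also weigh cost, distance, and availability.

```python
# (topology name, maximum number of nodes), ordered from smallest to largest.
TOPOLOGY_LIMITS = [
    ("point-to-point", 2),        # one host and one disk (assumed)
    ("arbitrated loop", 127),     # limit quoted in the text
    ("switched", 2 ** 24),        # assumes 24-bit addressing
]


def smallest_topology(node_count: int) -> str:
    """Return the smallest fibre channel topology that can hold node_count nodes."""
    if node_count < 2:
        raise ValueError("a connection needs at least two nodes")
    for name, limit in TOPOLOGY_LIMITS:
        if node_count <= limit:
            return name
    raise ValueError("node count exceeds the switched topology limit")


for nodes in (2, 100, 128):
    print(nodes, "nodes ->", smallest_topology(nodes))
```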
The most commonly deployed fibre channel SANs use arbitrated loops for a couple of reasons. Fibre Channel Arbitrated Loops (FC-ALs) cost about the same as Ultra 2 SCSI in terms of HBA and hard disk expense. And FC-ALs provide up to 127 nodes and a 10km length, so you can easily scale FC-ALs by adding workstations, servers, or storage devices. FC-AL SANs typically use hubs for bus stability.
As with SCSI, devices on an FC-AL enter and leave the channel electrically (or optically). A hub stabilizes the connectivity electrically and logically, the same way an Ethernet 100Base-TX hub adds and removes connections electrically to stabilize the Ethernet logical bus.
Fibre channel SAN components are similar to SCSI SAN components and consist of an HBA, a cable, and a disk subsystem. A fibre channel hard disk takes one fibre channel ID. Some fibre channel subsystems also take only one ID for the whole subsystem, but this arrangement isn't as common as it is with SCSI RAID subsystems.
Disk enclosures that have quick-disconnect hub connections to the FC-AL are available. Unlike with SCSI RAID disk cabinets, you can chain these disk enclosures together quickly to create huge logical storage areas that are visible to FC-AL hosts. You can also use SCSI-fibre channel bridges to link legacy SCSI-based systems to fibre channel.
Creating large SANs makes sense when you need to use large data stores over a wide geography, such as a large WAN. Even the 10km fibre channel limitation can be detrimental when SAN data needs to be accessible to a wider geography, so groups are attempting to extend SANs' distance. SANs running over asynchronous transfer mode (ATM) or Synchronous Optical Network (SONET) are possible, and SanCastle Technologies has proposed piping or tunneling fibre channel SANs through Gigabit Ethernet. (For information about SanCastle's gigabit network technology, see http://www.sancastle.com.) As SAN popularity increases and quantity drives down the price of tunneling technologies, these ideas will become more practical.
When you begin investigating SAN deployment, you have a few resources available. Many large vendors of computers, disk controllers, and hard disks include product-specific SAN information on their Web sites. This information typically describes how the product relates to SANs, SCSI, and fibre channel.
Microsoft also has a Fibre Channel SAN Management Work Program (SNMWG-FC) that the company designed to aid the development of fibre channel-based management. Microsoft has proposed combining its Windows Management Instrumentation (WMI) and the Common Information Model (CIM) to develop common management services for SAN data. This combination would provide cross-platform data-access support and let users map and aggregate data from disparate data sources. For more information about SNMWG-FC, see http://www.microsoft.com/winhec/presents/enterprise/enterprise9.htm.
How NT Works with SANs
A SAN's cost can be high because a SAN is a huge pool of expensive storage devices. Most organizations use SANs in heterogeneous environments to spread the storage cost over several hosts and workstations. In this model, NT can share a SAN pool with non-Server Message Block (SMB) filing systems—usually UNIX filing systems. NT typically dominates the LUNs that represent devices in a fibre channel loop chain, and NT won't relinquish control of a LUN to a requestor, such as UNIX. Under NT, relinquishing control would destabilize the device because a change in the device's state would occur that NT doesn't expect. A UNIX or other filing system can dominate the LUN, but NT and UNIX can't both dominate.
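The rule that only one filing system can dominate a LUN at a time can be modeled as exclusive ownership. The Python sketch below is a hypothetical illustration of that rule, not an NT or fibre channel API: the class LunRegistry and its methods are invented for this example, and ownership is never released, which mirrors the statement that NT won't relinquish a LUN to another requestor.

```python
class LunOwnershipError(Exception):
    """Raised when a second filing system tries to claim an owned LUN."""


class LunRegistry:
    """Hypothetical model: each LUN has at most one owning host OS."""

    def __init__(self) -> None:
        self._owners: dict[int, str] = {}

    def claim(self, lun: int, host_os: str) -> None:
        """Record host_os as the owner of lun, or fail if another OS owns it."""
        owner = self._owners.setdefault(lun, host_os)
        if owner != host_os:
            raise LunOwnershipError(f"LUN {lun} is already owned by {owner}")

    def owner(self, lun: int):
        """Return the owning OS for lun, or None if it is unclaimed."""
        return self._owners.get(lun)


registry = LunRegistry()
registry.claim(3, "NT")        # NT takes LUN 3
registry.claim(4, "UNIX")      # UNIX can own a different LUN
try:
    registry.claim(3, "UNIX")  # but not one that NT already owns
except LunOwnershipError as err:
    print(err)
```

In this model, sharing the pool means dividing LUNs among the hosts rather than sharing any single LUN.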
Storage servers that don't fall into the category of SAN devices (e.g., Network Attached Storage, or NAS, appliances) connect to LANs through Ethernet. The media in storage servers appear as ordinary shares on the LAN, whereas SAN devices appear (e.g., in Windows Explorer) as drives first and shares of the drive second. For more information about the differences between storage servers and SANs, see C. Thi Nguyen and Barrie Sosinsky, "NAS vs. SAN," page 87.
Under NTFS, no filing system mount currently lets NT share the same logical volume in a SAN with UNIX. Several organizations are working on SMB emulators that will make one logical storage area available to SMB and popular UNIX filing systems, such as NFS. These emulator devices would arbitrate calls that various hosts (e.g., NT, NFS) make, then translate the call into a common internal access command to complete read and write requests that users make to the emulation device. You can use NFS (which third-party software vendors supply) with NT servers, but NFS file-locking mechanisms are less desirable to NT than NTFS because the NFS file-locking mechanisms are a subset of SMB and NTFS.
Because SANs can consist of many RAID arrays and RAID arrays consist of a controller and multiple disks, an FC-AL's LUN (1 to 127) can represent many devices. Ironically, heterogeneous networks that have NTFS and other file-system devices don't support multiple OSs, yet dual RAID controllers (which diminish the chance of a single point of failure) have cache for multiple OSs. Some vendors, such as Mylex, use a technique called cache coherency to mirror controller cache so that if a failure occurs, calls that the cache accepted don't result in data delivery from the wrong OS. NTFS will survive a controller crash, as will the data on the surviving RAID array.
SANs that consist solely of NT servers are somewhat rare but are increasing in number. You can connect two or more NT servers to the same FC-AL LUN to let them share the same physical and logical media. To the server or its applications, the LUN appears as a storage device, such as an internal disk or external RAID device. You manage the SAN device (e.g., formatting, use, tape backup) the same way you manage locally connected storage media. File and record-locking mechanisms also follow the standard NT rules of management because the SAN media come preformatted with NTFS.
Today, NT ordinarily shares a SAN with non-NT hosts, and NT rarely shares data with other devices in the SAN unless the device uses a foreign host call, such as an ODBC call made to an Oracle database running on a UNIX server. Instead, administrators dedicate fractions of SANs to NT use and will continue to use this methodology in Windows 2000 (Win2K) until SMB emulators arrive or someone writes NTFS, UNIX, or other file-system access translators for Win2K and NT.
Closing the Loop
Microsoft has used SANs to demonstrate the potential messaging throughput of Microsoft Exchange Server and has put a great deal of emphasis on Win2K's SAN and fibre channel capabilities. Although computer vendors aren't eager to give up their internal-disk storage revenue, they also realize that data-management and SCSI ID limitations force organizations that need huge storage space (often on a variety of devices) to use SANs to perform the job. Fortunately, SAN technology has evolved dramatically, and SAN extensibility is mind-boggling. Whether an organization is large or small, the available options and the lower cost of fibre channel technology make SANs a very reasonable alternative to monolithic server farms.