Technology is a funny thing. It can accomplish amazing, unimaginable feats—and then make you numb to them. I distinctly remember the first time I saw a demonstration of RAID 5. A colleague pulled a drive out of a running server . . . and it kept running! I was flabbergasted that it actually worked. Fortunately, being in the technology industry, I get to periodically experience these "Oh wow!" moments. It's part of why I love my job so much.
Storage area networks (SANs) are another example of technological magic. SANs allow us to pool storage in central locations, and then dole it out to individual servers, as needed. In this article, I cover how we can intelligently and responsibly use SANs in Microsoft SharePoint environments. Because SharePoint employs a complicated infrastructure, I also discuss how you can use SANs with supporting technologies such as Microsoft SQL Server and virtualization hosts. By the end of this article, you'll have a good understanding of how SANs work with SharePoint and its friends, as well as how best to use the technology in your production environments.
Some SAN Basics
At a basic, physical level, a SAN is a box the size of a refrigerator, filled with hard disks. (Blinking lights everywhere; it can be quite soothing sometimes.) These disks can be carved into a nearly endless number of combinations, using every RAID level we've ever heard of, and even a couple we haven't. These combinations, called logical unit numbers (LUNs), are presented to servers and appear as local storage.
This approach provides many benefits. It allows us to centralize storage, making it easier to get a snapshot of our storage usage and needs. It also gives us the flexibility of adding storage to our servers without doing the "RAID drive shuffle" each time we want to expand the storage. SANs also provide additional disaster recovery options. Most SANs support mirroring drives within the SAN, and many offer the ability to mirror the drives to another SAN enclosure.
In some cases, SAN storage can be faster than the direct-attached storage (DAS) that might come with a server. SANs are highly tuned devices and routinely have many gigabytes of cache, which can result in very good performance.
A word of caution about performance is due here. To many of us, SANs are magic boxes that provide bountiful storage and unlimited performance. We assume they're configured correctly and that they will be fast. That isn't always a correct assumption.
Those shelves and shelves of hard disks in the SAN don't configure themselves. Someone needs to go in and decide how to best group those disks for the application for which they'll be used. File servers are happy enough with RAID 5; SQL Server transaction logs perform best with mirroring, or RAID 1. However, just because we're running a SQL Server doesn't mean that the LUN we're using for the transaction logs must be RAID 1. We might have different tiers of the same type of application: For example, one SQL Server instance might have the databases for an application that requires speed, whereas another might store the databases for a rarely used legacy application and so can use slower, less-expensive storage. Knowing the type of load and the I/O performance that it requires is an important part of working with your storage team to make sure your needs are met.
To illustrate this point, imagine a SAN shelf with ten 300GB disks. These disks can be carved up and exposed in different ways, each having an effect on their performance. For instance, those ten disks can be configured as RAID 6, which means that the equivalent of two drives is used for parity, and the available storage is (n - 2) × drive size (in our case, 8 × 300GB = 2.4TB). This 2.4TB of storage can be exposed as a single LUN or split into smaller LUNs. The server consuming that LUN or LUNs has no idea how it's configured on the back end. RAID 6 has good read performance but poor write performance because the parity must be calculated before the data can be written. That configuration is a bad fit for SQL Server transaction logs, which benefit from fast write times.
On the other end of the spectrum, those same ten 300GB drives could be configured as mirrored pairs (RAID 1, with the five pairs striped together as RAID 10), providing 1.5TB of storage. This LUN would have much better write performance than the previous example, but it provides less storage. Same SAN, same disks, radically different experience for the server consuming those disks.
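The capacity arithmetic in these two examples can be sketched in a few lines of Python. This is only an illustration: the function name and the RAID labels are made up for this sketch, and it ignores hot spares, formatting overhead, and anything vendor-specific.

```python
def usable_capacity_gb(disks: int, disk_size_gb: int, raid_level: str) -> int:
    """Return usable capacity in GB for a few common RAID layouts."""
    if raid_level == "raid5":
        # One drive's worth of capacity goes to parity.
        return (disks - 1) * disk_size_gb
    if raid_level == "raid6":
        # Two drives' worth of capacity goes to parity.
        return (disks - 2) * disk_size_gb
    if raid_level == "raid10":
        # Every disk is mirrored, so half the raw capacity is usable.
        return (disks // 2) * disk_size_gb
    raise ValueError(f"unsupported RAID level: {raid_level}")


# The ten-disk, 300GB shelf from the example above.
for level in ("raid6", "raid10"):
    print(level, usable_capacity_gb(10, 300, level), "GB")
```

For the shelf described above, this prints 2400 GB for RAID 6 and 1500 GB for the mirrored layout, matching the 2.4TB and 1.5TB figures.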
SharePoint on a SAN
When it comes to disk usage, SharePoint is a bit like a younger sibling. It doesn't use much I/O itself; it talks its siblings into doing all the heavy lifting (and taking all the blame). The Microsoft article "Capacity management and sizing overview for SharePoint Server 2010" doesn't even discuss the I/O that's needed for SharePoint Server 2010, only for SQL Server. That article assumes that any hard disks you can scrape together to host SharePoint will be more than sufficient for the program's modest demands. For the most part, this assumption is correct. If you have a SAN, though, you can leverage it to make SharePoint a little easier to manage and to tweak SharePoint's performance.
From a management standpoint, SANs make it easy to adjust the size and number of SharePoint's hard disks. According to the Microsoft article "Hardware and software requirements (SharePoint Server 2010)," SharePoint's only disk requirement is an 80GB system drive.
SharePoint itself doesn't need much disk space: It uses about 1GB, excluding logs, search index files, and any custom solutions. But that disk also needs to hold Windows and all its associated patches for the next few years. And the disk needs enough space for SharePoint's logs, plus enough space to perform a memory dump in the unlikely event of a problem. Also, NTFS gets fussy when disks are more than 90-percent full, so leave enough space for some overhead, too.
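The 90-percent guideline is easy to check with a small script. The following is a minimal sketch using Python's standard library; the function names and the default threshold parameter are assumptions made for this illustration, not part of any SharePoint or Windows tooling.

```python
import shutil


def is_too_full(used_bytes: int, total_bytes: int, threshold_pct: float = 90.0) -> bool:
    """Return True when the volume is fuller than the threshold."""
    return used_bytes / total_bytes * 100 > threshold_pct


def check_volume(path: str, threshold_pct: float = 90.0) -> bool:
    """Look up real usage for the volume holding `path` and apply the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return is_too_full(usage.used, usage.total, threshold_pct)


if __name__ == "__main__":
    # "." is the current volume; on a SharePoint server you might pass "C:\\".
    print("Needs attention:", check_volume("."))
```

Run on a schedule, a check like this gives you time to ask the storage team for more space before the disk fills up.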
SharePoint requires at least 80GB, but sometimes that isn't enough. If a SAN is hosting your SharePoint drives, expanding that 80GB system drive to, say, 120GB is painless. The SAN administrators turn a few knobs, pull a few levers, and Windows thinks it has a 120GB physical disk. A quick trip to Disk Management, and your server now has 40GB more storage. Try that with a physical hard disk.
Performance is another area in which proper use of a SAN can help SharePoint. For the most part, SharePoint is understanding when it comes to performance, and its demands aren't very . . . well, demanding. SharePoint needs to read little from the local disk, and what it does need, it loads and caches. Still, disk performance can improve users' experience with SharePoint in a couple of places: BLOB caching and search. Let's start by discussing BLOB caching.
In the context of SharePoint, BLOBs are files such as JPGs, GIFs, and MP3s. Large binary objects such as these don't change very often, so they're great candidates for caching. And because their size is usually large compared with the page size, they benefit the most from the process.
BLOB caching is a function of Microsoft IIS and is configured in each web application's web.config file (usually located in C:\inetpub\wwwroot\wss\virtualdirectories). Because wading through line after line of XML to make changes isn't for the faint of heart, SharePoint makes it easy for us to take advantage of BLOB caching. When SharePoint creates a web application, the program puts all the settings needed for BLOB caching in each web.config file but doesn't turn on BLOB caching. How very thoughtful! To take advantage of BLOB caching, you need only find the relevant line in web.config and change the enabled value from false to true.
Figure 1 shows how the line looks before being altered. This figure is chock-full of good information, all of which Microsoft documents well. For this article, we're interested only in the location and maxsize parameters. By default, the location is set to the C drive. That makes some sense, because every Windows computer under the sun has that drive. However, it's a good practice to move as much as possible off the C drive, and BLOB caching is no exception. The BLOB cache location should be a secondary drive. That's where our SAN comes into play.
The reason we're turning on caching in the first place is to improve performance for end users. Each time IIS can serve up a file locally instead of pestering the back-end SQL Server instance, that file gets into the user's hands more quickly, which makes for happy users. By putting the BLOB cache location on a SAN drive that's configured for high performance, we can get that file to end users as quickly as possible. The maxsize parameter dictates how large (in gigabytes) this web application's BLOB cache can be. The default is 10GB. The larger the cache, the more things can be cached and retrieved quickly. Remember that this setting is per web application, so make sure to plan for enough space. You'll also need to edit the web.config file for each web application on each server in your farm.
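Because the same edit has to be repeated for every web application on every server, it can help to script it. The sketch below uses Python's standard XML library against an in-memory sample. The sample element is an assumption modeled on the usual SharePoint 2010 default (including the maxSize attribute spelling and a shortened path pattern), and the D:\BlobCache\14 location and 20GB size are hypothetical values; compare against your own web.config before relying on any of it.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified stand-in for the relevant part of a web.config file.
SAMPLE_WEB_CONFIG = r"""<configuration>
  <SharePoint>
    <BlobCache location="C:\BlobCache\14" path="\.(gif|jpg|jpeg|png|css|js)$" maxSize="10" enabled="false" />
  </SharePoint>
</configuration>"""


def enable_blob_cache(config_xml: str, location: str, max_size_gb: int) -> str:
    """Turn on the BLOB cache and point it at a new location and size."""
    root = ET.fromstring(config_xml)
    blob = root.find("./SharePoint/BlobCache")
    if blob is None:
        raise ValueError("no BlobCache element found")
    blob.set("location", location)        # move the cache off the C drive
    blob.set("maxSize", str(max_size_gb))  # size in GB, per web application
    blob.set("enabled", "true")            # the cache ships switched off
    return ET.tostring(root, encoding="unicode")


# Hypothetical target: a fast SAN-backed D drive with a 20GB cache.
updated = enable_blob_cache(SAMPLE_WEB_CONFIG, r"D:\BlobCache\14", 20)
print(updated)
```

Note that ElementTree drops XML comments and may change whitespace when it rewrites a file, so back up the real web.config first, or use the script only to report which files still have the cache disabled.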
SharePoint search can benefit the most from a high-performing SAN drive. Search has two primary roles: index and query. Both enjoy performance improvements from speedier I/O.
Index comes first. During indexing (and crawling, a related task), SharePoint scours itself and other configured content sources for the documents that you want to be discoverable in SharePoint. The files are crawled and then copied to the index server's RAM, where they're broken apart by an iFilter (not unlike a coconut on a rock), and all the words are listed. Those words are written to the index files on the index server's file system. The faster the index server can get those words out of its memory and onto its file system, the faster it can move on to the next file in its list. The larger your SharePoint farm gets, the more important crawl times are. A fast SAN drive on your index servers can reduce those times.
After your files have been indexed, they can be found by users. This is where the query servers come into play. As the index servers index files and write the information to their file system, that information is combined with the existing index files on the query servers. When a user searches for a document in SharePoint, the query servers spring into action. They take the user's search term and compare it to the index files on their file system, looking for matches. As your farm grows larger, the number of documents the query server must slog through grows as well. User queries take longer and longer. At some point, your users will become impatient as they wait for results. (And they'll spend that waiting time plotting ways to make sure that there's no Mountain Dew left in the cafeteria by the time you get there.) Putting the index files on a SAN drive with fast read performance reduces the time that's needed for queries to run. (Your users will be happy, and your tummy can get all the sweet caffeinated deliciousness it requires.)
As with the BLOB cache mentioned earlier, getting the index files off the C drive and onto a secondary drive is the best approach. If your query component is already configured to store its index files on the C drive, don't fret. The Search service deals well with change and happily lets you move the index to a different drive at a moment's notice. To do so, edit the Search Service Application Topology in Central Administration. For each of your index partitions, edit the value in the Location of Index field. Figure 2 shows an example in which the location has been changed to the D drive, which is hopefully a super-fast SAN drive.
I recommend just changing the drive letter; don't get crazy with the path. Leave the index files in their usual path. Doing so makes it easier when dealing with material that refers to the files in the default location, and it'll make it easier for someone else to step into your shoes after you get that big promotion or win the lottery. After you've changed the path, click OK and then click Apply Topology Changes.
Search is happy to move your index files for you, but it's in no rush to do so. The service's first concern is being able to respond to end-user requests, so moving the index doesn't get its full attention. Depending on the size of your index, the move process could take a while. It's worth the wait, though. After your index files are on a blazing fast SAN drive, your users will no longer be plotting your demise.
Virtualization on a SAN
SharePoint relies pretty heavily on other technologies to render pages for your users. In some cases, those technologies can make use of a SAN. Virtualization is one example. These days, almost every SharePoint farm has at least one virtualized server. In many cases, the entire farm (test or production) is virtualized. Whether you use virtualization software from Microsoft, VMware, or Jim-Bob's Discount Virtualization, it all can benefit from speedy SAN drives, or suffer from poor decisions.
All major virtualization software suites support hosting guests on SAN drives. You should consider the purpose of a farm (i.e., development, test, or production) when deciding how to configure the SAN drives that host that farm. Development farms, for instance, can have all their drives carved from the same spindles because their performance isn't important. If the servers in the farm are slow, only the developers are affected, and honestly, I'm OK with that. (It teaches them patience.) Test farms might need a little better performance, so those servers should be hosted on better-performing SAN drives. Production farms, if they're virtualized and using SAN drives, should have the best-performing LUNs that the SAN can provide. Unfortunately, there's no good way for you, as the SharePoint administrator, to know how your LUNs are configured. You'll need to take your storage administrators' word for it. Don't be afraid to bribe them with gifts of candy and World of Warcraft figurines.
SQL Server on a SAN
If I've said it once, I've said it 100 times: "SharePoint gets all its performance from SQL Server." In the beginning of this article, I talked about how different SAN configurations affect performance. Nowhere is that more important than with SQL Server. Every document, every list item, every search result, and every user profile is delivered out of a SQL Server database. If those databases are on a slow drive, then SQL Server is slow. If SQL Server is slow, then no amount of configuration trickery or swearing can make SharePoint fast. SQL Server really is that important.
The Microsoft article "Storage and SQL Server Capacity Planning and Configuration (SharePoint Server 2010)" gives some guidance on which services rely heavily on SQL Server I/O and how to calculate the I/O operations per second (IOPS) that you'll need for SharePoint. Again, there's no way to determine the configuration of the SAN drives that SQL Server is using. You can, however, use a tool such as the Microsoft SQLIO disk subsystem benchmark tool to determine the IOPS of the drives in the system. You can use this information to determine whether the drives are up to the task of supporting SharePoint. If they aren't, then you can work with your storage team to get the performance you need. (Although this article is about SAN drives, SQLIO works just as well with physical disks if you're curious to see how they're performing.) You can download the tool from the Microsoft site.
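When you compare benchmark results with a target, it helps to remember that throughput and IOPS are two views of the same measurement: IOPS equals bytes per second divided by the I/O block size. The sketch below shows that arithmetic. The function names are invented for this illustration, and the 80MB/s, 8KB, and 5,000 IOPS figures are made-up inputs, not measurements or Microsoft recommendations.

```python
def iops_from_throughput(mb_per_sec: float, block_size_kb: float) -> float:
    """Convert throughput (MB/s) at a given block size (KB) into IOPS."""
    return mb_per_sec * 1024 / block_size_kb


def meets_target(measured_iops: float, required_iops: float) -> bool:
    """Return True when the measured IOPS covers the requirement."""
    return measured_iops >= required_iops


# Hypothetical inputs: 80MB/s measured with 8KB I/Os, against a 5,000 IOPS target.
measured = iops_from_throughput(80, 8)
print(measured)                      # 10240.0
print(meets_target(measured, 5000))  # True
```

If the measured number falls short of what your capacity planning calls for, that's the figure to bring to your storage team.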
The Magic of SAN
Any SharePoint administrator worth his or her salt knows that SharePoint is a complicated beast. It has its fingers in Windows, IIS, and SQL Server, to name a few places. Many of these aspects of SharePoint can benefit from the magic of a SAN. SharePoint can use a SAN to cache frequently used files or store search index files. SharePoint servers can be virtualized on SAN drives, and SharePoint can use SQL Server databases that are stored on SAN drives. Regardless of how you leverage SANs for your SharePoint environment, keep your eyes on performance, and make sure to watch those blinking lights at least once. They're mesmerizing.