SharePoint administrators are defenders of order in any SharePoint installation. The integrity of SharePoint data comes from proper organization of sites, site collections, content types, and metadata tags.
As owners of the configuration and functionality of SharePoint, administrators share the responsibility in the overall performance and stability of the SharePoint platform. While ensuring the optimum performance of a SharePoint farm takes more than a well-oiled set of databases, it certainly helps to keep the lowest-level components operating as smoothly as possible.
Fortunately, you can keep SharePoint's databases in optimum condition with standard maintenance tasks. It's important for SharePoint administrators to understand what those maintenance tasks are and to know the best practices for how and when to run them.
SQL Server includes the Microsoft SQL Server Maintenance Plan Wizard, which helps SharePoint administrators automate these tedious maintenance tasks, but it's important to understand exactly what needs to be done.
SharePoint 2007 was a simple affair when it came to understanding and managing the databases required for healthy operation of the application, with only a content database, a configuration database, and perhaps a search database or two.
SharePoint 2010 is a different animal. With the introduction of service applications and the large set of core databases, one SharePoint 2010 installation might have as many as 20 databases, each of which needs to be backed up and maintained.
Therefore, it's important to know which databases are core to the system or required for the proper functioning of various SharePoint service applications (Figure 1 and Figure 2 show the various databases and info about each).
Before diving too deeply into SharePoint databases, you should know which versions of SQL Server are supported by SharePoint 2010 installations. SharePoint 2010 supports 64-bit versions of SQL Server 2008 R2, SQL Server 2008, and SQL Server 2005.
We recommend you use SQL Server 2008 R2 because it has the widest set of functionality and features for SharePoint. For example, Remote BLOB Storage is available only on SQL Server 2008 Enterprise.
For more information about the different SQL Server versions, see the Microsoft article, "SQL Server 2008 R2 and SharePoint 2010: Better Together."
Be sure that any SQL Server setup is up to date with the latest patches and hotfixes per the recommendations in the TechNet article "Hardware and Software Requirements (SharePoint Server 2010)." This page is kept up to date as new cumulative updates and patches are released. Be sure to visit this page at least once a month to ensure compliance. At the time of this writing, the page had been updated three times, most recently in July 2010.
DBA Created vs. Auto-provisioned Databases
With few exceptions, SharePoint 2010 will create all the databases required for the healthy operation of a SharePoint environment. Given the ease of auto-provisioning, when should a DBA insist on pre-creating SharePoint databases?
In production environments, certain circumstances dictate a more prudent approach than auto-provisioning, including situations where you need
- guaranteed control over database names (no GUIDs in the database names)
- guaranteed control over database sizing
- procedural separation of control over application and data environments
If you decide to create databases by hand rather than through an auto-provisioning technique, you must use the appropriate PowerShell cmdlets to create the databases and register them with SharePoint. For example, you can use the following PowerShell command to create a new configuration database:
New-SPConfigurationDatabase -DatabaseName "SharePointConfigDB1" -DatabaseServer "SQL-01" -Passphrase (ConvertTo-SecureString "MyPassword" -AsPlainText -force) -FarmCredentials (Get-Credential)
You can find a complete walkthrough of how a DBA might create various content and configuration databases in the TechNet article, "Deploy by using DBA-created databases (SharePoint Server 2010)."
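Content databases follow the same pattern. As a sketch (the database name, server name, and web application URL here are placeholders), a pre-created content database can be attached to a web application with the New-SPContentDatabase cmdlet:

```powershell
# Attach a DBA-created content database to an existing web application
New-SPContentDatabase -Name "WSS_Content_Team" -WebApplication "http://sharepoint" -DatabaseServer "SQL-01"
```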
Nothing can ruin a business-critical content repository's reputation faster than data corruption. As the DBA or administrator responsible for the quality of the platform, it's important that you understand database corruption and how you can correct it.
Protecting data is a tricky task, especially if errant power spikes and sags cause your SQL Server's I/O subsystem to fail when writing data to disk. Corruption is frequently the result, and without the proper checks, that corruption will lurk in your database undetected until Murphy's Law dictates that you need quality data.
Physical corruption occurs when the on-disk contents of a log file or data file are altered outside SQL Server's control. This type of corruption tends to affect sectors on a disk and stems from problems in the I/O subsystem, such as controllers, storage network hardware, and the disk drives themselves.
Physical corruption, which is what the built-in SQL function DBCC CHECKDB looks for and reports, is usually caused by physical hardware problems.
Logical corruption is caused by data being altered in some unanticipated way that severs a data relationship. This type of corruption usually is caused by an application error or human error that causes data problems but doesn't affect the physical structure of the database.
SQL Server bugs can also cause this type of corruption, which you can learn about in the Microsoft article "When you use a file handle for FileStream access in a SQL Server 2008 transaction, the transaction may randomly fail to commit in Windows Server 2003 or in Windows XP Professional x64 Edition."
Best practice dictates that you should run DBCC CHECKDB at least as often as you run a full backup. DBCC CHECKDB will produce a report of any errors found, which you can then investigate further.
It's important to note that DBCC CHECKDB doesn't check for logical corruption. But the command can cause logical corruption when the REPAIR_ALLOW_DATA_LOSS option is used, because the option doesn't take any constraints into consideration when repairing physical corruption issues.
So what's a DBA to do? When DBCC CHECKDB returns error messages, the appropriate solution is to turn to your database backups.
However, this requires regular backups that aren't themselves corrupted. As mentioned earlier, DBCC CHECKDB can potentially introduce errors if it's used to repair corruption, and without backups there's no way to get clean data back from a corrupt database.
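A routine, read-only integrity check along these lines (the database name is a placeholder) can be scheduled alongside full backups; note that no repair option is specified:

```sql
-- Check a content database for physical corruption;
-- ALL_ERRORMSGS and NO_INFOMSGS keep the output focused on actual problems
DBCC CHECKDB ('WSS_Content') WITH ALL_ERRORMSGS, NO_INFOMSGS;
```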
SharePoint is a complex application that relies on many different frameworks, components, and server applications to function and perform properly. Improper maintenance at any of these levels can contribute to problems and perhaps downtime.
Fortunately, configuring SQL Server for optimum performance isn't complicated. Performance tuning SQL Server largely involves a couple of configuration settings, proper placement of data and log files, and the occasional rebuild of table indices.
First is the proper selection of SQL Server hardware. As identified in Microsoft's "SQL Server 2008 R2 and SharePoint 2010: Better Together" article, SharePoint 2010 requires a 64-bit SQL Server installation because 32-bit hardware and software are no longer supported.
Basic RAID disk configuration, SAN slice allocations, and local storage requirements also play a major role in determining the performance of SQL Server because several recommendations require the physical separation of data files and log files onto different spindles.
Finally, memory requirements for the SQL Server system start at 16GB and move up from there. Microsoft provides some SharePoint-specific SQL Server sizing recommendations in the TechNet article "Storage and SQL Server capacity planning and configuration (SharePoint Server 2010)."
Managing Database Files
When it comes to managing SQL Server data files, best practices dictate that data and log files reside on their own physical spindle.
This is more than just disk volume separation; the recommendation is to place log files and data files on different disks and to ensure that no other application uses those disks. This setup minimizes overall write access to the disks, and lessens the opportunities for file fragmentation.
Another recommendation is to try to pre-create database and log files of the appropriate size ahead of time, rather than allowing the files to auto-grow by small increments. The reason for this configuration is that an auto-grow operation can take time and slow down write-intensive environments such as SharePoint collaboration farms.
This doesn't mean that DBAs should disable auto-grow on SharePoint databases and logs, but you shouldn't rely on this capability for the initial sizing of the databases. It's also worth noting that auto-grow behavior improved in SQL Server 2005 and SQL Server 2008.
In earlier versions of SQL Server, files were always zero-filled when auto-grown, which is a large part of why the operation was slow. With the instant file initialization option properly configured (the SQL Server service account needs the Perform Volume Maintenance Tasks right), data files can skip that zero-filling step. Log files, however, must still be zero-initialized when they grow, which is one more reason to size them appropriately up front.
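For example, a DBA might pre-size a content database's files and set a fixed-increment growth as a safety net (the logical file names and sizes here are illustrative, not recommendations):

```sql
-- Pre-size the data and log files, then set a fixed growth increment
-- so any auto-grow that does occur is predictable
ALTER DATABASE WSS_Content
    MODIFY FILE (NAME = 'WSS_Content', SIZE = 50GB, FILEGROWTH = 1GB);
ALTER DATABASE WSS_Content
    MODIFY FILE (NAME = 'WSS_Content_log', SIZE = 10GB, FILEGROWTH = 512MB);
```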
Measuring and Reducing Fragmentation
Data fragmentation inside SQL Server is usually the result of the normal course of data manipulation, including inserts, updates, and deletes. Its basic symptom is a growing proportion of unused free space inside the data files relative to the volume of actual data.
This is because SQL Server stores data in fixed-size 8KB database pages, each containing a header and the data records themselves. During an index rebuild, the fill factor setting determines how much free space is left on each page.
When that free space is consumed unevenly, or when full pages split to make room for new records, tables end up with many half-empty pages. SharePoint exacerbates the problem because it relies on GUID-based index keys, which insert data randomly throughout the range of records rather than appending at the end.
To correct this problem, best practices suggest either changing the index schema or rebuilding the index to compact and defragment the data. Because SharePoint's index schema can't be changed, the only option is to rebuild the tables or indexes that require it.
The temptation is to periodically rebuild all indexes using an automated maintenance plan, but this tends to require significant amounts of available free space and is a poor choice for large systems.
A more appropriate choice is to use the dynamic management function sys.dm_db_index_physical_stats to find the indexes that are most fragmented. MSDN provides sample code.
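A query along the following lines (the thresholds are common rules of thumb, not SharePoint-specific mandates) surfaces the worst offenders in the current database:

```sql
-- Find fragmented indexes worth rebuilding or reorganizing
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 10
  AND ips.page_count > 1000          -- ignore tiny indexes
ORDER BY ips.avg_fragmentation_in_percent DESC;
```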
Keeping Statistics Up to Date
When SQL Server executes a query, it does so along a calculated and compiled execution plan. The execution plan is created by SQL Server's Query Processor, and it defines which tables and indexes to use to achieve the best possible query performance.
The metrics for determining query performance are derived from statistics that help SQL Server understand how data is distributed inside a table or index. These statistics are generated by a variety of read operations, including full and sample data scans.
If these statistics are out of date, the execution plan won't be as efficient as it could otherwise be.
Statistics are usually kept up to date automatically, but certain maintenance operations or intensive data manipulation operations can cause the statistics to become out of date.
In these cases, you can force SQL Server to update statistics with the built-in sp_updatestats stored procedure. Running sp_updatestats after an index rebuild isn't recommended, however, because the rebuild has just updated that index's statistics with a full scan, and resampling them at the default rate might produce less accurate results.
Per the August 2008 TechNet article "Top Tips for Effective Database Maintenance," the recommended database maintenance plan will perform the following steps:
- Analyze indexes and determine which indexes to operate on and how to do the fragmentation removal.
- For all indexes that weren't rebuilt, update the statistics.
- Update statistics for all the non-indexed columns.
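For an individual index, those steps translate roughly into statements like these (the table and index names are hypothetical, and the 30 percent rebuild threshold is a widely used rule of thumb):

```sql
-- Heavily fragmented (> 30%): rebuild, which also refreshes its statistics
ALTER INDEX IX_DocStreams ON dbo.DocStreams REBUILD;

-- Lightly fragmented (5-30%): reorganize, then update statistics manually,
-- because REORGANIZE does not refresh them
ALTER INDEX IX_DocVersions ON dbo.DocVersions REORGANIZE;
UPDATE STATISTICS dbo.DocVersions IX_DocVersions;
```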
Right-sizing your data tier is a critical part of any SharePoint implementation because it has direct impact on the cost, performance, and scalability of the entire application. To understand how much data your SharePoint environment must be able to hold, you need to consider the total number of documents, the number of versions of each document, and the average size of each document.
Additionally, you must consider the number of list items that will be stored in the application. Microsoft has provided the formula below for data sizing:
Database size = ((# Documents × # Versions) × Avg Size) + (10KB × (# List items + (# Versions × # Documents)))
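As a worked example, consider a hypothetical farm with 10,000 documents, an average of 5 versions each, an average document size of 250KB, and 20,000 list items:

Database size = ((10,000 × 5) × 250KB) + (10KB × (20,000 + (5 × 10,000))) = 12,500,000KB + 700,000KB ≈ 12.6GB

A starting allocation of roughly 15GB to 20GB would then leave reasonable headroom for growth.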
Although this simple calculation will give you a rough idea of the required storage capacity, you can do several things to affect the resulting number. As previously mentioned, database content is stored in database pages, whose free space is governed by the configured fill factor.
Adjusting the fill factor can affect fragmentation and overall size on disk. It's also possible to shrink a database, although Microsoft is issuing some fairly strong warnings these days about wanton use of the DBCC SHRINKDATABASE command.
Finally, if allocating all data storage responsibilities to SQL Server entails too much cost and overhead, you can configure SharePoint to look to the file system for storage of large binary objects or BLOBs.
You can read more in the TechNet article "Storage and SQL Server capacity planning and configuration (SharePoint Server 2010)."
Set the Fill Factor for a Server
By changing the default FILLFACTOR configuration, the DBA can reduce fragmentation and page splits (a symptom of fragmentation that affects performance). The side effect is that the database takes more space on disk, because each page is left partially empty and more pages are therefore needed to hold the same data.
The database fill factor defines how much free space is left on each database page during an index rebuild. During regular operation of the database, new content can then be inserted into this free space without forcing page splits that shuffle large amounts of clustered index data.
Kimberly Tripp offers a great deal of information about this important attribute in her blog series on database maintenance best practices.
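Changing the server-wide default is done through sp_configure (the value of 80 here is illustrative, not a SharePoint-specific recommendation); note that the new default takes effect only after the SQL Server service restarts:

```sql
-- Leave 20% free space on each page when indexes are rebuilt
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'fill factor (%)', 80;
RECONFIGURE;
```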
All the versions of SQL Server that SharePoint supports can shrink data files, which recovers disk space by releasing unused space in the files back to the operating system. None of the SharePoint databases is configured to shrink its data files automatically.
The strong recommendation from Microsoft and knowledgeable SQL Server MVPs is not to perform auto-shrink of the database or to configure a maintenance plan that does it automatically on a database.
The reason is that the shrink ignores the fill factor setting and causes all the indexes to become fragmented. Then, when you run a rebuild indexes command, the database grows back to its original size.
Instead of relying on automated DBCC SHRINKDATABASE commands, it's safer to partition content databases or to remove data from existing databases. The following list shows the activities you can perform to create free space in a SharePoint environment:
- Use STSADM -o mergecontentdbs
- Delete documents
- Delete libraries
- Delete lists
- Delete list items
- Delete sites
Remote BLOB Storage
To free up critical database resources, you can use the new Remote BLOB Storage (RBS) mechanism available with SQL Server 2008 to move large binary objects out of the database and onto the file system.
Although this sounds like a great way to move large data items (e.g., image files, streaming video, sound clips) out of your database, you need to evaluate the advantages and disadvantages first. If your content consists of many small BLOBs rather than large files, RBS can actually decrease server performance.
So make sure to evaluate your content to determine whether you need to implement RBS. The current recommendation is to consider RBS when your content database is larger than 500GB and the BLOB data files are larger than 256KB.
RBS will provide the most performance increase on systems that have large to extremely large files that aren't frequently accessed. Adding RBS to your write-intensive SharePoint collaboration implementation could actually make the user experience slower.
Database Maintenance Plans
Almost all tasks that you can perform with SQL Server can be automated, and it's crucial for any SharePoint implementation to have an automated plan that helps maintain your site. Keep track of when these automated tasks run, because you might have to notify users or plan for the system to run a little slower during those maintenance windows.
A database maintenance plan is like routine upkeep on your car. You must do maintenance on a regular schedule to have your SQL Server deployment perform at an optimum level and to help keep your website running at peak performance.
The great part about the SQL Server Maintenance Plan is that you can use the Maintenance Plan Wizard to set it up. The wizard gives you the ability to add items such as database backups and transaction logs, to update database statistics, and to manage data such as indexes.
Based on the August 2008 TechNet Magazine article "Top Tips for Effective Database Maintenance," here's a checklist of tasks that you should include in your maintenance plan:
- Remove excessive transaction log file fragmentation by ensuring the appropriate recovery model and backup schedule.
- Turn off any scheduled shrink operations to reduce the risk of unnecessary index fragmentation.
- Set auto-growth correctly by using a predetermined set file size rather than a percentage. Follow this up by periodically examining database sizes and determining whether manual database growth is necessary to ensure optimum performance.
- Turn on instant file initialization such that database auto-growth is an instantaneous operation rather than a slow operation that requires zero-filling new space.
- Put a regular process in place to detect and remove index fragmentation.
- Turn on AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS, and have a regular process in place to update statistics.
- Turn on page checksums.
- Have a regular process to run DBCC CHECKDB.
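Several of these settings are one-line changes. For example, page checksums can be confirmed or enabled per database (the database name is a placeholder):

```sql
-- Use CHECKSUM page verification so I/O-level corruption is detected on read
ALTER DATABASE WSS_Content SET PAGE_VERIFY CHECKSUM;
```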
Any SharePoint administrator should spend a little time creating at least a basic maintenance plan to keep his or her system running well. The time spent setting it up will pay off in performance and speed, and in having good backups if you ever need them.