Get Proactive with SharePoint 2010’s Improved Monitoring

For SharePoint administrators, 2010 is shaping up to be a great year. SharePoint 2010’s new and improved features will help skilled administrators in many areas, including finding problems in their farm. Let’s look at the improved monitoring features in SharePoint 2010—in particular, timer jobs, reporting, and the health analyzer, as they show up in Central Administration. We’ll cover what’s new, then demonstrate how these features can help you to be more proactive. By the end of this article, your powers to prevent SharePoint problems will make it seem like you can almost predict the future.

Timer Jobs

The first stop on our whirlwind tour of SharePoint 2010’s monitoring improvements is timer jobs. Timer jobs are unsung heroes, the workhorses of SharePoint. They work in the background making sure things are provisioned, email alerts are sent, and other ugly tasks get done.

In SharePoint 2007, if timer jobs weren’t happy, nobody was happy. The problem was there was no good way to troubleshoot timer jobs, and if you needed a timer job to run, you had no choice but to wait for it to run the next time it was scheduled. Nobody likes to wait, and it’s been scientifically proven that a watched timer job will not fire.

The first improvement in SharePoint 2010 is the timer job dashboard. This gives us a snapshot of the timer job subsystem and what’s going on. You get to the dashboard by going to Central Administration and clicking the Monitoring link on the left, which Figure 1 shows.

The set of links pertaining to timer jobs is in the second group of links, cleverly hidden under the heading labeled Timer Jobs. When you click the Check timer jobs link, you’re taken to a page akin to a scene in Dickens’ A Christmas Carol: the Timer Job Status page. You see the ghosts of timer jobs past, present, and future (see Figure 2).

The top of the page shows the timer jobs that are scheduled to run. Clicking any of the timer jobs brings up its definition, a screen that explains what the timer job will do. You can also edit the schedule of the timer job, disable it completely, or run it immediately. This is a huge improvement.

Now if a timer job fails for some reason, or if you need to execute a timer job’s functionality (like collecting incoming e-mail), you don’t have to wait for its regularly scheduled occurrence. To get the full list of scheduled timer jobs, click Scheduled Jobs under Timer Links in the upper left corner of the page.

The middle section of the page shows running tasks. This is an improvement over SharePoint 2007, where we had no idea which timer jobs were currently running and no information about them. They were like submarines: They ran silent, and they ran deep. With SharePoint 2010, we can see which jobs are currently running on which servers, how far along they are, and when they started. And because we SharePoint administrators are visual types, this section comes with a progress bar at no extra charge. There is also a page dedicated to displaying the running jobs only. You can get to it by clicking Running Jobs in the top left.

The bottom part of the timer job dashboard shows the timer jobs that have run in the past. SharePoint 2007 has a similar screen, but SharePoint 2010 takes it a step further. Each finished timer job has a status attached to it: succeeded or failed. Clicking the status takes you to the job history page, where you can get information about that instance of the timer job execution, such as how long the job took and which web apps and content databases it ran against. In the case of a timer job failure, the history screen tells you why the failure occurred, which helps in troubleshooting.
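The same history can also be pulled from Windows PowerShell. The following is a minimal sketch, not the only way to do it: it assumes you run it on a farm server with farm administrator rights, and it reads the history through the timer service object rather than through a dedicated cmdlet. The cutoff of 20 entries is arbitrary.

```powershell
# Load the SharePoint snap-in if this is a plain PowerShell window
# (the SharePoint 2010 Management Shell loads it for you).
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# The farm-wide timer service holds the job history entries.
$timerService = (Get-SPFarm).TimerService

# Show the 20 most recent failed runs, newest first.
# The number 20 is an arbitrary choice for this sketch.
$timerService.JobHistoryEntries |
    Where-Object { $_.Status -eq "Failed" } |
    Sort-Object StartTime -Descending |
    Select-Object -First 20 |
    Format-List JobDefinitionTitle, ServerName, StartTime, EndTime, ErrorMessage
```

If nothing comes back, either no jobs have failed recently or the history has already been trimmed.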

Finally, the trusty old timer job definitions page from SharePoint 2007 has gotten a facelift in SharePoint 2010. It now lists all the timer jobs defined in the farm, regardless of whether they’re scheduled to run or not. Clicking a job definition opens its properties in the same way it does if you view the definition from the Scheduled Jobs link, which Figure 3 shows.

Not to be left out, Windows PowerShell also lets you manage timer jobs from the SharePoint 2010 Management Shell. I won’t cover Windows PowerShell options very deeply here, but I will in a later article. Open the SharePoint 2010 Management Shell and type

Get-Command *SPTimerJob

to get a list of all the cmdlets you can use to manipulate timer jobs. To get specific help on any of them use Get-Help, like this:

Get-Help Start-SPTimerJob

This command shows you how to start timer jobs at will. The other cmdlets work similarly.
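As a short illustration of how these cmdlets fit together, here is a hedged sketch that lists the farm’s timer jobs and then starts the incoming e-mail job on demand. The display-name wildcard is an assumption; check the output of the first command on your own farm and adjust the filter to match.

```powershell
# Load the SharePoint snap-in if this is a plain PowerShell window.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# List every timer job with its schedule and last run time.
Get-SPTimerJob |
    Sort-Object DisplayName |
    Format-Table DisplayName, Schedule, LastRunTime -AutoSize

# Find the incoming e-mail job by display name.
# The "*Incoming E-Mail*" pattern is an assumption; adjust it to your farm.
$job = Get-SPTimerJob |
    Where-Object { $_.DisplayName -like "*Incoming E-Mail*" } |
    Select-Object -First 1

if ($job) {
    # Run the job now instead of waiting for its schedule.
    $job | Start-SPTimerJob
} else {
    Write-Warning "No timer job matched the display-name filter."
}
```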

While timer jobs function very similarly to how they did in SharePoint 2007, the administration experience in SharePoint 2010 is much better, which helps us manage timer jobs more effectively.

Reporting

In a similar vein to timer jobs, the reporting system in SharePoint 2010 has been improved and enhanced. Like timer jobs, Reporting has its own heading with links on the Monitoring page of Central Administration.

The first link, View administrative reports, takes us to a library of administrative reports. As of the beta, this library included only reports from the Search team on statistics like query latency and crawl rate per content source. Hopefully other groups will include reports here too. The structure for these reports will be documented, so custom reports can be created as well.

The second link takes you to the page where diagnostic logging is configured. Several aspects of logging are configured here, and you’ll see two big improvements. First, any category not using the default logging settings shows up in bold. In SharePoint 2007, if you altered any category’s settings, you had no way to know which ones you had changed or what values you had changed them from.

That leads us to the second improvement, a new logging level, Reset to default. Now you can crank up your SharePoint logging with reckless abandon, knowing that bolded categories and Reset to default will help you get things back to normal. This page also lets you restrict log size by number of days kept or by space used. It’s also a good idea to use this page to move the Unified Logging Service (ULS) logs off your servers’ C drives and onto another drive. Just remember that this is a farm-wide setting, so the path you choose must exist on every SharePoint server in the farm.
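The same settings are available from Windows PowerShell. The sketch below is illustrative only: the category name, the E:\SharePointLogs path, and the retention numbers are all assumptions chosen for the example, so substitute values that make sense for your farm.

```powershell
# Load the SharePoint snap-in if this is a plain PowerShell window.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Crank up trace logging for a single category.
# The identity uses the "Area:Category" form; this one is just an example.
Set-SPLogLevel -TraceSeverity Verbose -Identity "SharePoint Foundation:Timer"

# ...reproduce the problem and read the logs...

# The PowerShell equivalent of Reset to default: every category
# goes back to its default level.
Clear-SPLogLevel

# Move the ULS logs off the C drive and cap how much is kept.
# E:\SharePointLogs\ULS is a hypothetical path and must exist on
# every server in the farm. 14 days and 20 GB are example values.
Set-SPDiagnosticConfig -LogLocation "E:\SharePointLogs\ULS" `
    -DaysToKeepLogs 14 `
    -LogMaxDiskSpaceUsageEnabled:$true `
    -LogDiskSpaceUsageGB 20
```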

At the View health reports link, automatically generated health reports give you information about two potential issues with your farm. One report provides a list of the slowest pages in your farm, which should let you isolate and deal with trouble pages before the users come to you. The second report lists your most active users and their activity. These reports, like the administrative reports, allow some basic filtering to help you get the information you’re interested in.

The next link under Reporting lets you configure usage and health data collection. This screen lets you configure what data, if any, is logged by SharePoint. You can choose which events SharePoint logs as well as where SharePoint stores its usage files. Like your ULS logs, it’s a good idea to save your usage logs on a drive other than the C drive.
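If you would rather script the usage log location, a minimal sketch looks like this. The E:\SharePointLogs\Usage path is hypothetical, and like the ULS location it should exist on every server in the farm.

```powershell
# Load the SharePoint snap-in if this is a plain PowerShell window.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Review the current usage service settings first.
Get-SPUsageService | Format-List

# Point the usage files at a drive other than C.
# The path below is an example and must exist on each server.
Set-SPUsageService -UsageLogLocation "E:\SharePointLogs\Usage"
```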

The page does have one setting you can’t change: the location of the logging database. SharePoint 2010 requires you to use the PowerShell cmdlet Set-SPUsageApplication to alter the location of this database. Central Administration reports only the location of the logging database.

Moving the logging database is a good idea. Because SharePoint aggregates all its usage and health data to this database, it can get large, and it can also experience a lot of disk I/O. If either of these becomes a problem for your SQL Server instance, you might consider moving the logging database to its own instance or at least to its own spindles on your default SQL Server instance. Both SQL Server and your users will appreciate it.
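Here is a hedged sketch of what that Set-SPUsageApplication call might look like. The server and database names are made up for the example, and it assumes the farm has a single usage application. Test it outside production first, and don’t assume that existing data is carried over to the new location.

```powershell
# Load the SharePoint snap-in if this is a plain PowerShell window.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Grab the farm's usage application (assumes there is only one).
$usageApp = Get-SPUsageApplication

# Point the usage application at a logging database on another
# SQL Server instance. "SQL02\Logging" and "WSS_Logging_New" are
# hypothetical names used only for this example.
Set-SPUsageApplication -Identity $usageApp `
    -DatabaseServer "SQL02\Logging" `
    -DatabaseName "WSS_Logging_New"
```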

SharePoint Health Analyzer

You might have noticed I didn’t start at the top of the Monitoring page in Central Admin (see Figure 1) and work down. This was by design. I was building anticipation for the big finish, the SharePoint Health Analyzer. If there were any part of SharePoint 2010 that I thought was magic, this would be it.

The Health Analyzer uses a set of rules, run on a schedule by timer jobs, to periodically scan different aspects of your SharePoint farm and look for problems. When it finds aspects of your farm that violate the defined rules, it reports them under the Review problems and solutions link on the Monitoring page. Notice that the link shows not only the problems but also the solutions. Each rule specifies the error condition and provides an explanation of the problem and a link to the remedy. Now that’s full service.

For most of us, our first introduction to the SharePoint Health Analyzer is after installation. Unless you did a very good and thorough scripted installation of SharePoint, the Health Analyzer will show up the first time Central Administration is loaded. You’ll recognize it as the red bar across the top of Central Administration, which you can see in Figure 4. Clicking the View these issues link takes you to the same page as Review problems and solutions does under the Monitoring section.

To fully appreciate the gift we’ve been given with Health Analyzer, let’s go to the Review problems and solutions page. As if finding the problems for you wasn’t enough, it has a couple more tricks up its sleeve. As you can see in Figure 5, the list of problems is just, well, a list. As in a SharePoint list. Because of that, you can subscribe to alerts on that list or follow it with an RSS feed. Not only is Health Analyzer out there patrolling your perimeter, it contacts you when it finds something. That alone is worth the cost of admission.

When a problem does show up in the list, you have some options. If you click the item, a pop-up window, which you can see in Figure 6, shows a wealth of information. I don’t have the space to cover it all, but I’ll point out some notable features. First, you can see a good explanation of the problem. There’s also a Remedy section that describes how to fix the problem and an external link with more information. Microsoft really put a lot of work into making sure that administrators have all the information we need to understand and deal with problems when they surface. If the problem is scoped to a particular server, web app, or service, it’s also called out here.

The Ribbon at the top also offers a few more options. For all rules, the Reanalyze Now button offers the chance to verify we’ve fixed a problem. This way we don’t have to wait for the next scheduled run for verification.

Some, though not all, rules also have a button labeled Repair Automatically. As Arthur C. Clarke said, “Any sufficiently advanced technology is indistinguishable from magic.” This might not be magic, but it’s close. To piggyback on that functionality, you can click View next to Rule Settings, then Edit Item, and check the box next to Repair Automatically. That tells SharePoint to fix this problem any time it comes up. Or you can leave the check box alone and just click the Repair Automatically button when the problem occurs. The fact that not every rule has this option isn’t necessarily a bad thing. Letting the rule Drives are running out of free space do anything automatic seems a touch scary.
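The rules themselves can also be inspected from Windows PowerShell. This is a rough sketch that only lists rules and looks for the disk space ones; the "*free space*" filter is an assumption based on the rule title above, and the rule name in the commented-out line is a placeholder, not a real rule name.

```powershell
# Load the SharePoint snap-in if this is a plain PowerShell window.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# List every Health Analyzer rule and whether it is enabled.
Get-SPHealthAnalysisRule |
    Sort-Object Category |
    Format-Table Name, Enabled, Category, Summary -AutoSize

# Find the rules that talk about free disk space.
# The wildcard is an assumption; adjust it after reading the list above.
Get-SPHealthAnalysisRule |
    Where-Object { $_.Summary -like "*free space*" } |
    Format-List Name, Enabled, Summary

# To switch a rule off entirely, pass its name to Disable-SPHealthAnalysisRule.
# "SomeRuleName" is a placeholder; copy a real name from the output above.
# Disable-SPHealthAnalysisRule -Identity "SomeRuleName"
```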

All's Well on the Farm

SharePoint 2010’s improved monitoring should help overworked and underappreciated administrators keep a better eye on the SharePoint farm. This will free up your time to do things other than fight fires, and you’ll be able to keep your users happy, too. But whatever you do, don’t let it clean up drive space for you automatically—that’s just asking for trouble.
