What determines how soon you should load a patch designed to correct a newly discovered security vulnerability? As soon as a security hole is publicized, a properly skilled attacker might be able to exploit it, but you're more likely to be hit first by a less skilled attacker equipped with a proof-of-concept "tool" written by someone more skilled, or by automated malware such as a worm that uses the hole as one of its attack vectors. The goal for most organizations is to load a patch before either of these events happens. The amount of time between the release of a security patch and when a worm or proof-of-concept program hits the Internet is the patch opportunity window. Unfortunately, that window is shrinking; you have less time than ever to evaluate, test, and deploy patches. In this article, we'll consider current trends and take into account business risk and cost to provide a fresh look at how much testing you should do before rolling out patches.
Even though the quality of Microsoft patches has improved over the years, a significant number of security updates have to be re-released. Usually, a re-release means the patch introduced new problems or, less frequently, opened new security holes. Sometimes a patch is re-released to compensate for newly discovered variants of the original vulnerability. The bottom line: patch testing is still necessary if you're to avoid destabilizing your environment.
How Much Testing?
But how much testing is required? "Right-sizing" your testing process saves IT staff hours and lowers security risk, because your systems are protected sooner. On the other hand, as you reduce testing you increase the risk of destabilizing your environment, which can be costly: a damaged reputation among user departments and customers, lost user and IT staff hours while the patch is rolled back, and lost sales if the patch affects your e-commerce systems. You can't eliminate security risk, business risk, or the costs of patching; it's a matter of balancing them. Consider the following activities and concepts relating to security patching. You'll find many opportunities to lower costs, optimize your patch management process, and reduce business and security risks.
Defense-in-depth. Putting the principle of defense-in-depth to work on your network can help eliminate the need to load a significant portion of patches. For example, when you disable unneeded services and features, you are automatically immune to exploits discovered in those components.
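To make this concrete, here's a minimal sketch of how an inventory of enabled services can rule a patch out for most systems. The inventory format, host names, and the `systems_needing_patch` helper are all illustrative assumptions, not a real tool's API.

```python
# Hypothetical inventory: host -> set of services left enabled after
# hardening. Host names and services are made up for illustration.
inventory = {
    "web01":  {"IIS", "SMB"},
    "file01": {"SMB"},
    "desk17": {"SMB"},   # IIS was disabled during attack surface reduction
}

def systems_needing_patch(inventory, affected_component):
    """Return only the hosts where the patched component is actually enabled."""
    return sorted(host for host, services in inventory.items()
                  if affected_component in services)

# An IIS patch applies to just one host; the hardened systems are immune.
print(systems_needing_patch(inventory, "IIS"))  # ['web01']
```

The same cross-reference works against any asset database that records which services and features are enabled per host.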
Evaluation. Comprehensive evaluation of a patch really pays off. First, when evaluating a patch make sure you understand which systems on your network are vulnerable. Your defense-in-depth measures may have already ruled out the patch as a risk on a large number of systems (e.g., if the exploit affects a feature such as IIS that you've previously disabled on your typical desktop and laptop class of systems). For the remaining systems that are technically vulnerable without the patch, consider mitigating factors. Is local execution of rogue code required in order to exploit the vulnerability? If exploitation requires local execution of malicious code, what countermeasures are in place on vulnerable systems? If the only vulnerable systems are highly controlled servers, you may discover that the probability of malicious code being installed and executed on the servers is very low. On the other hand, if the exploit affects workstations or a terminal server with loose restrictions on user sessions, you may find that the risk of malware being executed is much higher.
Once you identify the systems that are truly vulnerable, consider the impact that a defective patch might have and compare that to the impact of malware exploiting the vulnerability. Such a consideration must take into account the number of users or transactions involved, the information or processes supported by the systems involved, and management's sensitivity to stability and security. Security decisions aren't always a cold, logical process, because goodwill is an important asset to businesses, and the impact to goodwill from stability or security problems varies from one industry to another, as well as from one upper management team to the next. All things considered, the goal of patch evaluation should be to determine which systems actually need the patch and whether to err on the side of stability or security.
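The mitigating factors above can be combined into a rough triage score. This is only a sketch: the field names and weights are arbitrary assumptions chosen for illustration, and any real scoring scheme should be tuned to your own environment and management's risk tolerance.

```python
# Sketch of a per-system triage score from the evaluation factors above.
# Weights are illustrative assumptions, not an industry standard.
def triage(system):
    score = 0
    if system["component_enabled"]:
        score += 3   # the system is technically vulnerable
    if not system["requires_local_execution"]:
        score += 2   # remotely exploitable holes are more urgent
    if system["loose_user_restrictions"]:
        score += 2   # malicious code is more likely to run here
    return score

locked_down_server = {"component_enabled": True,
                      "requires_local_execution": True,
                      "loose_user_restrictions": False}
terminal_server = {"component_enabled": True,
                   "requires_local_execution": True,
                   "loose_user_restrictions": True}

print(triage(locked_down_server))  # 3: patch on the normal cycle
print(triage(terminal_server))     # 5: prioritize this system
```

Even a crude score like this makes the stability-versus-security trade-off explicit and repeatable across patches.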
Process alignment. You might consider whether you can speed up your evaluation by aligning your internal process with your vendor's patch process. For instance, Microsoft patches follow a much more predictable schedule now, with most patches being released monthly. You can get a head start on evaluating a patch by making use of Microsoft's advance notification of patches. Currently, Microsoft releases patches on the second Tuesday of each month. Three business days before that - on the preceding Thursday - Microsoft provides advance notification of the patches it plans to release on Tuesday. The advance notification provides information about the products affected, as well as other information to help you make an initial evaluation. If you use this advance information and begin your evaluation right away, you should be able to identify the types of systems on your network requiring the patch and be ready to download and begin testing patches as soon as they are released the following week.
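If you want to put release dates on your team's calendar automatically, the second Tuesday is easy to compute. This sketch uses only the Python standard library; it encodes the schedule as described above, which Microsoft could of course change.

```python
import calendar
from datetime import date, timedelta

def patch_tuesday(year, month):
    """Second Tuesday of the given month (Microsoft's regular release day)."""
    first = date(year, month, 1)
    # Days from the 1st to the first Tuesday, then add one more week.
    offset = (calendar.TUESDAY - first.weekday()) % 7
    return first + timedelta(days=offset + 7)

print(patch_tuesday(2004, 11))  # 2004-11-09, a Tuesday
```

Subtracting a few days from the result gives you the advance-notification date to pre-schedule evaluation time for your staff.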
Testing. The goal of testing is to determine if a patch causes any problems with your unique combination of configuration settings, hardware, and applications. How long does it take to find such a problem? If you test the patch by rolling it out to a limited number of production servers, you may require an entire usage cycle to discover any problem, because patch defects often manifest themselves only when you exercise a certain function of the software or exceed a certain workload threshold. Depending on the business processes supported by your servers, a usage cycle may be a business day, a week, or a month if an application on the server has end-of-period processing to perform.
While you're performing your own internal testing, the rest of Microsoft's customer base is testing also - or even rolling out the patch to production. Chances are good that someone else will find any defects before you do. Interestingly, most Microsoft patch defects are found and the patch re-issued within 30 days of the patch's original release. Therefore, a month would seem to be the sweet spot for testing server patches via limited production rollout. Patch testing on workstations is easier because you have far more workstations than servers to choose from, the impact of a defective patch on a given workstation is a fraction of that on a server, and workstation usage cycles can often be compressed by having the user exercise all applications and activities, something that's much harder to do on servers. Therefore you may find that a week of testing on a sampling of production workstations that covers the breadth of user types and configurations provides enough time to flush out any defects that affect your organization. That week also gives the user community at large time to identify any problems with the patch.
But sometimes 30 or 7 days is 30 or 7 days too long - if your evaluation shows the potential impact from the vulnerability is severe and you don't identify any compensating controls in the form of defense-in-depth measures or mitigating factors. For organizations that implement defense-in-depth and attack surface reduction measures, such high-risk vulnerabilities will usually affect only a small subset of systems on the network. For instance, let's say that a vulnerability in the PPTP implementation of Windows Server's Routing and Remote Access Service is discovered. The corresponding patch will be needed only on the company's VPN servers that have PPTP enabled. In such cases, the emphasis clearly shifts from stability to security.
The only way to shorten patch testing below a limited production rollout's usage cycle is with a formal testing lab that duplicates as much of the hardware configuration and application profile of your production systems as possible. On such a realistic test bed, you must compress into the test schedule the usage cycles and peak processing workloads that production sees over a much longer period. Another benefit of a dedicated test environment is the opportunity to test the patch against simulated workload spikes caused by seasonal business peaks specific to your industry. Testing and workload simulation tools vary in sophistication and price, but maintaining a dedicated test environment is a significant cost no matter how you implement it.
Deployment. Without a way to roll security patches out automatically to even a modest number of computers, your protection efforts will lag behind and be exposed to operator error during the manual patch process. A tool that rolls out patches automatically is therefore critical to your overall security strategy. You can choose from Microsoft's free Software Update Services or a host of other, more powerful patch automation tools.
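Whatever tool you use, the rollout logic itself can follow a staged pattern: patch a small pilot ring first, and touch the rest of the fleet only if every pilot install succeeds. The sketch below is tool-agnostic; `deploy` is a hypothetical stand-in for whatever install call your patch tool exposes, and the host and patch names are illustrative.

```python
# Sketch: staged rollout - pilot ring first, the rest of the fleet
# only if every pilot install succeeds. deploy(patch_id, host) is a
# hypothetical stand-in for your patch tool's install call.
def staged_rollout(patch_id, pilot, fleet, deploy):
    results = {host: deploy(patch_id, host) for host in pilot}
    if not all(results.values()):
        return results   # stop here and investigate pilot failures
    results.update({host: deploy(patch_id, host) for host in fleet})
    return results

# Simulated runs with stub deploy functions:
ok = staged_rollout("KB835732", ["pilot01"], ["web01", "file01"],
                    lambda patch, host: True)
failed = staged_rollout("KB835732", ["pilot01"], ["web01", "file01"],
                        lambda patch, host: False)
print(sorted(ok))      # ['file01', 'pilot01', 'web01']
print(sorted(failed))  # ['pilot01'] - fleet was never touched
```

The key property is that a defective patch is contained to the pilot ring, which pairs naturally with the rollback process described next.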
Rollback. To limit the risk of rolling out a critical patch with little or no testing, it's crucial to have a rollback process in place and tested, so that you can back out the patch if it suddenly causes a problem. The current tool from Microsoft included with Windows for patch deployment (Software Update Services) does not provide a way to centrally roll back defective patches, but the next version of SUS (called Windows Update Services) is supposed to have such a feature.
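Whether or not your tool supports central rollback, you need a record of exactly which hosts received which patch. Here's a minimal sketch of such a record; the in-memory class is a stand-in for whatever database or spreadsheet tracks your deployments, and the patch and host identifiers are illustrative.

```python
# Sketch: track deployments so a defective patch can be backed out
# host by host. An in-memory stand-in for a real deployment database.
class DeploymentLog:
    def __init__(self):
        self._deployed = {}   # patch id -> set of hosts patched

    def record(self, patch_id, host):
        self._deployed.setdefault(patch_id, set()).add(host)

    def rollback_targets(self, patch_id):
        """Hosts that received the patch and would need it backed out."""
        return sorted(self._deployed.get(patch_id, set()))

log = DeploymentLog()
log.record("MS04-011", "web01")
log.record("MS04-011", "file01")
print(log.rollback_targets("MS04-011"))  # ['file01', 'web01']
```

When a patch is re-released or found defective, the rollback target list is then one query away instead of a scramble through change tickets.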
While Microsoft security patches average about one a week (usually released in monthly batches) and can seem overwhelming at first, you can eliminate a lot of risk, work, and cost by following security best practices such as defense-in-depth and attack surface reduction, then evaluating each patch to determine which systems really need it and how much testing is really necessary before rollout. Aligning your patch process with the vendor's release process reduces the risk incurred between the time an exploit is announced and the time you deploy the patch. Automating the testing and rollout process further reduces deployment costs and the exposure window. Finally, a tested rollback process provides an emergency button to push if your previous efforts fail to prevent a defective patch from affecting your network.