Describing performance tuning and scalability problems and concepts in a real-world context can help novice or intermediate IT professionals avoid similar mistakes in the future. I learned that lesson more than a decade ago, and it has served me well over the years as I’ve tried to assist my tuning customers. So I was especially interested when I saw the MSN article, "Last-minute files swamp tax servers," which describes a massive slowdown experienced by Intuit's TurboTax online tax return submission capability. Last Tuesday, April 17 (tax day), Intuit's servers were overloaded as many last-minute filers were trying to submit their taxes. Many users waited hours to receive confirmation that their taxes had been filed.
"Usually, it takes only a few minutes after hitting the submit button for TurboTax users to get a message indicating the transaction had gone through. By Tuesday evening, however, it was taking hours," said Intuit Vice President of Communications Harry Pforzheimer. "Don't wait until the last minute is the moral of the story."
Luckily the situation ended happily; people learned not to wait until the last minute to file their taxes (not that I haven't ever been in line at the post office at 11:00 PM on tax day, hoping to drop off my tax returns in time to get them postmarked that day), and the IRS promised not to penalize the tardy filers. However, there's another moral to this story that’s relevant to the people who design transaction-oriented systems that are subject to massive fluctuations in demand.
Let’s do some simple math before we go any further. In the MSN article, Pforzheimer said that “during times of peak demand, Intuit was processing 50 to 60 returns per second,” which means that the Intuit site suffered catastrophic performance failures at no more than 60 transactions per second, or 216,000 transactions per hour. I’m not a tax professional, but I suspect that it shouldn't have been difficult for Intuit to guess that it might have to handle that many returns on tax day. To be fair, I’m sure that Intuit did some stress testing on its servers, but I have to presume that it fell into a common trap that development teams encounter when they plan stress tests. My guess is that the Intuit site experiences most of its sustained load only a few days per year, and that mid-April tends to be the site's busiest time, with the sustained concurrent user workload being substantially lower the rest of the year. In such a situation, there are at least two core problems that we need to address.
The first problem is that the peak workload diverges widely from the average workload. Database and development teams commonly design for average workloads without considering peak workloads or what will happen if the system can’t keep up with that load. In some cases, it’s OK to design for average or slightly above-average workloads; in others, however, failing to handle the peak leads to a massive system failure.
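To put rough numbers on that gap, here is a minimal back-of-the-envelope sketch in Python. The only figure taken from the article is the ceiling of 60 returns per second; the hourly arrival rates are hypothetical values I made up for illustration, and the model simply assumes that any arrivals above capacity pile up in a backlog.

```python
def backlog_over_time(arrivals_per_sec, capacity_per_sec):
    """Return the backlog (queued transactions) at the end of each hour.

    arrivals_per_sec: average arrival rate for each one-hour interval.
    capacity_per_sec: constant rate at which the system can process work.
    """
    backlog = 0
    history = []
    for rate in arrivals_per_sec:
        # Excess arrivals accumulate; spare capacity drains the backlog,
        # which can never drop below zero.
        backlog = max(0, backlog + (rate - capacity_per_sec) * 3600)
        history.append(backlog)
    return history


# Capacity is the upper figure quoted in the article.
CAPACITY = 60
# Hypothetical tax-day evening demand, in returns per second (assumed values).
hourly_arrivals = [40, 60, 90, 120, 90, 30]

history = backlog_over_time(hourly_arrivals, CAPACITY)
for hour, queued in enumerate(history, start=1):
    # Time a newly arriving return would wait behind the existing backlog.
    wait_minutes = queued / CAPACITY / 60
    print(f"hour {hour}: backlog={queued:>7} wait={wait_minutes:.0f} min")
```

Under these assumed arrival rates, a system that keeps up comfortably at 60 returns per second still ends up with waits measured in hours once demand stays above capacity for a few hours in a row, and the backlog takes a long time to drain after demand drops.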
The second problem, which the article makes apparent, is that Intuit's site has the logical characteristics of a queuing system. I don’t know if the application was built using queue management techniques, but it's certainly managing a queue whether or not it was designed to. Queuing systems don’t necessarily have to keep up with peak workload, but they must be able to handle the backlog gracefully because they expose an entry point to front-end users. I didn’t use the Intuit site to do my taxes this year, so I don’t know what the user experience was, but this article makes me think that it probably went like this: hit submit, then stare at an hourglass for hours before receiving confirmation that your return was filed. I suspect it would have been possible for Intuit to design a system that could gracefully handle the peak workload on tax day.
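One way to handle a backlog gracefully is to acknowledge each submission immediately and do the slow work in the background. The sketch below is my own illustration of that idea, not a description of how Intuit's system works. The `SubmissionQueue` class, its method names, and the `max_pending` limit are all hypothetical, and a real system would use a durable queue rather than an in-memory one.

```python
import queue
import threading
import uuid


class SubmissionQueue:
    """Accept submissions immediately; process them on a background thread."""

    def __init__(self, handler, max_pending=1000):
        self._handler = handler  # slow function that actually files a return
        self._pending = queue.Queue(maxsize=max_pending)
        self._status = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, payload):
        """Return a receipt ID at once, or None if the backlog is full."""
        receipt = uuid.uuid4().hex
        with self._lock:
            self._status[receipt] = "queued"
        try:
            self._pending.put_nowait((receipt, payload))
        except queue.Full:
            # Shed load explicitly instead of leaving the user hanging.
            with self._lock:
                del self._status[receipt]
            return None
        return receipt

    def status(self, receipt):
        """Report 'queued', 'filed', 'failed', or 'unknown' for a receipt."""
        with self._lock:
            return self._status.get(receipt, "unknown")

    def wait_until_drained(self):
        """Block until every accepted submission has been processed."""
        self._pending.join()

    def _drain(self):
        while True:
            receipt, payload = self._pending.get()
            try:
                self._handler(payload)
                result = "filed"
            except Exception:
                result = "failed"
            with self._lock:
                self._status[receipt] = result
            self._pending.task_done()


processed = []
submissions = SubmissionQueue(processed.append)
receipt = submissions.submit({"filer": "example"})
submissions.wait_until_drained()
print(submissions.status(receipt))
```

The design choice here is that the user gets a receipt right away and can check its status later, so a long backlog shows up as a "queued" status rather than an hourglass, and a full queue produces a clear "try again later" instead of a silent stall.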
Live and learn. Should you wait until the last minute to file your taxes? Probably not. Should you pay attention to this scenario and strive to ensure that the applications you build avoid such problems? Absolutely.