SQL Server MVP Dan Guzman and his team at Abiliti Solutions have a multithreaded C++ application that uses IRowsetFastLoad to perform parallel bulk inserts. In Abiliti's test environment, the application runs on a 4-processor server against a 2-processor database server. Performance is usually good, but sometimes throughput drops by half without an obvious reason and with no warning. The team can reproduce the problem by using bulk copy program (bcp), so they're certain the problem isn't an application-level glitch and doesn't involve locking. The problem occurs even when the team performs bulk inserts into different databases on the same server.
After much investigation, Dan's team found what appeared to be a correlation between the SQL Server connection server process ID (SPID) and performance. On the 2-processor database server, when two concurrent bulk inserts both occur on even- or odd-numbered SPIDs, performance is slow and the database server CPU utilization is about 40 percent. However, when one of the two concurrent SPIDs is even and the other is odd, CPU utilization jumps to about 80 percent and the team observes almost twice the level of throughput. The same problem occurs on a quad-processor database server but is associated with every fourth SPID instead of every other SPID.
Ultimately, Dan's team found that the problem isn't related to the SPID but rather the User Mode Scheduler (UMS) ID that's assigned to the SPID. (To learn two undocumented techniques for associating a SPID with the UMS that controls it, see the Q&A "SPID Association," page 15.) Performance is slow only when concurrent bulk inserts happen to get assigned the same UMS ID. (SQL Server assigns SPIDs to a UMS in a round-robin fashion, so it's a matter of chance whether SQL Server will assign concurrently executing bulk inserts to the same UMS.) This behavior is consistent on all the machines Dan's team tested.
An in-depth discussion of the SQL Server UMS is beyond the scope of this column. For more information about UMSs, see Kalen Delaney's book Inside Microsoft SQL Server 2000 (Microsoft Press, 2001). But you need to be aware that the UMS is responsible for scheduling a SQL Server connection (represented by a SPID) on a CPU. And SPIDs don't migrate between different UMSs during the life of the connection.
So if you're running multiple, concurrent, large data-load operations, you might be getting only about half the throughput that's possible, depending on the order in which you make connections. You can't control which UMS SQL Server assigns a SPID to during the login process, so you can't directly solve the problem. However, you can code a relatively simple workaround. First, you create twice as many connections as you need on a 2-processor system. Then, inspect the UMS ID that's assigned to each SPID and ensure that you choose connections that aren't running on the same UMS. This process might sound complicated, but it's not difficult, and Dan's experiment shows that the technique can offer significant throughput increases when loading large volumes of data in parallel. Kudos to Dan and his team at Abiliti Solutions for tracking down this obscure but important performance solution.