Curious Case of the missing SOS_SCHEDULER_YIELD waits

Question: I came across a weird case recently where we had a server that was running at 100% across all CPUs for a time but when I used the DMVs to look at wait statistics I couldn’t see anything going on. How can I see what waits were occurring at the time?

Answer: This is a question that came up in our Performance Tuning class this week in London after I’d lectured about wait statistics.

The key point here is that in the situation described there weren’t any actual waits occurring.

Many of the monitoring scripts/tools for looking at wait statistics (for instance mine at http://bit.ly/fSWeO5) aggregate all the wait statistics and return the top 95% of all waits occurring, sorted by most prevalent waits by total wait time. This is done so that your view of the pertinent waits on the server isn’t obscured by those waits that are benign and always occur (e.g. from certain system tasks like the lazywriter in the buffer pool).

A side effect of this monitoring is that unless there are no other waits occurring on the system, you will not see the SOS_SCHEDULER_YIELD wait type show up, even though they may be occurring in great numbers.

Why is this?

The SOS_SCHEDULER_YIELD wait type occurs when a thread executing on a CPU manages to execute for its entire quantum – the uninterrupted time a thread is allowed to be using the CPU before it must allow other threads to have a turn executing. The SQL Server scheduling quanta is 4 milliseconds (and this is not configurable). When a thread exhausts its quantum, it moves off the CPU, but instead of being suspended and put on the waiter list, it moves directly to the bottom of the runnable queue in the scheduler.

This makes perfect sense as the thread isn’t waiting for a resource – so has no reason to be put on the waiter list. However, a wait type must be registered and so the SOS_SCHEDULER_YIELD type is used and it will have a zero resource wait time. There may be a signal wait time though (the time the thread spends on the runnable queue waiting to get back on the CPU again), depending on how many other tasks are running on that CPU.

This means that if you have a server where CPU is pegged at 100%, it’s possible to look at sys.dm_os_waiting_tasks and see zero rows returned – because nothing is waiting! It’s possible that the threads are all exhausting their quantas, yielding the CPUs, and then immediately getting back on the CPU again because the runnable queue has nothing else on it. In that case I would look specifically at sys.dm_os_wait_stats to see if the number of SOS_SCHEDULER_YIELD waits is going up, using code similar to that below:

SELECT [waiting_tasks_count] from sys.dm_os_wait_stats
WHERE [wait_type] = 'SOS_SCHEDULER_YIELD';

Comments

Plain text