My grandfather (who was a Marine pilot during WWII and who was later issued the Navy Commendation Medal in person, by President John F. Kennedy, for his role in saving several endangered ambassadors in Japan) once told a great story that has always stuck with me.
Guards at a factory were always suspicious of a guy working the night shift. The factory made some expensive products and theft was always a bit of a problem – hence the guards. Yet, every morning when this guy headed home, he’d report to the security checkpoint with a wheelbarrow – chock full of sand. The guards would scour through that sand, sometimes dumping practically ALL of it out into buckets or anything else they could find to make sure he wasn’t smuggling something within that sand. Try as they might, they NEVER found him smuggling anything in that sand.
That’s because he was stealing wheelbarrows.
What if Your Watchdog Falls Asleep?
Sometimes you just need a better vantage point – or a watchdog to watch your watchdog. And one area where that’s particularly true is when it comes to Alerts for your servers. Stated differently, when it comes to setting up alerts and monitoring for your servers, there are two main goals you typically want to shoot for:
- Signal to Noise Ratio (Minimize False Alarms): One thing I always try to do when setting up alerting on any server or for any system is minimize the potential for false alarms – and other kinds of noise. Because the last thing you want to do when setting up any type of alerting system that you’ll care about is teach or train yourself (or others) to IGNORE these alerts because they’re being sent too frequently. (In other words, I always try to avoid alerts or notifications about success – as I only want to be notified when there’s a failure or something else that needs my attention. Because, otherwise, it’s too easy and too integral to human nature to set up mailbox rules, ‘mental filters’, and other ‘helps’ to just ignore ‘noise’ – which is the last thing you want happening for a system you care about.)
- Redundancy (Make sure you’ve got a guard guarding your guard): Of course, IF you’re going to go the route of only being alerted or notified when something’s busted or needs your attention, THEN you need to make sure that said alerts/notifications are actually sent. Otherwise, something might break or need your attention, and you’ll never be notified.
So, for example, if you’re using Database Mail to send alerts and notifications when something ugly or bad happens (either at the server level with Alerts or within your Jobs), then seeing errors show up in your Database Mail event log isn’t something that should make you happy.
As such, if you’re relying upon Database Mail to send you regular alerts, then you’ll want to periodically go in and check to make sure that Database Mail isn’t throwing errors and failing to send emails. Otherwise, you don’t have a watchdog to watch your watchdog and you could miss something very important.
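One way to do that periodic check is to query the Database Mail views in msdb directly rather than clicking through the log viewer. What follows is only a rough sketch: the 24-hour lookback window is an assumption picked for illustration, and you’d want to match it to however often you actually check.

```sql
-- Assumption: a 24-hour lookback window; adjust to your own checking interval.
DECLARE @since datetime = DATEADD(HOUR, -24, GETDATE());

-- Errors recorded in the Database Mail event log.
SELECT log_id, log_date, description
FROM msdb.dbo.sysmail_event_log
WHERE event_type = 'error'
  AND log_date >= @since
ORDER BY log_date DESC;

-- Messages that Database Mail tried and failed to send.
SELECT mailitem_id, recipients, subject, send_request_date
FROM msdb.dbo.sysmail_faileditems
WHERE send_request_date >= @since
ORDER BY send_request_date DESC;
```

If both queries come back empty, Database Mail hasn’t logged any failures in that window. Keep in mind that this only tells you what Database Mail itself noticed, so it’s a spot check and not a replacement for a second watchdog.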
Redundancy Isn’t Always a Bad Thing
Of course, ‘regularly’ is a very subjective term – meaning that it’s really going to depend upon your environment, the criticality/cost of missing an important alert, and so on – to the point where regularly might mean daily, or weekly. And, in the case where regularly would mean something more like daily or even hourly…
then you’re likely going to need some form of redundancy – so that you can be sure that alerts are being sent. (Which, of course, is a bit problematic as it means you’ll run an INCREASED risk of blurring noise and signal – but such is life when things are important and simply CAN’T be missed.) Accordingly, if something IS so important that it can’t be missed, then looking into adding redundancy via the Windows Event Logs can be a great option – especially if you take advantage of how the WITH LOG option of RAISERROR works.
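As a rough illustration of that idea, here is a minimal sketch of raising an error WITH LOG from a CATCH block. The message text and the divide-by-zero ‘failure’ are hypothetical placeholders, not anything from a real job. WITH LOG writes the error to both the SQL Server error log and the Windows Application event log, and using it requires sysadmin membership or ALTER TRACE permission.

```sql
BEGIN TRY
    -- Stand-in for the work you actually care about (hypothetical failure).
    SELECT 1 / 0;
END TRY
BEGIN CATCH
    -- Capture the original error text before raising our own.
    DECLARE @detail nvarchar(2000) = ERROR_MESSAGE();

    -- WITH LOG sends this to the SQL Server error log and the
    -- Windows Application event log, independent of Database Mail.
    -- The message prefix is a placeholder; use whatever identifies your job.
    RAISERROR(N'Critical job failed: %s', 16, 1, @detail) WITH LOG;
END CATCH;
```

Passing the detail as a `%s` argument, instead of concatenating it into the format string, keeps any stray percent signs in the original error message from being misread as format specifiers. Once the entry lands in the Windows Event Log, anything that watches that log can pick it up even if Database Mail is down.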