More logging, less debugging

DevOps practitioners are responsible for a range of technologies. "Responsibility" doesn't mean we have to know them all perfectly, and it certainly isn't an excuse to be equally ignorant about all of them. We do need familiarity with everything from HTML to cloud orchestration to storage hierarchies to DNS configuration, though. "Familiarity" amounts to clarity about what we know, and where to get help for the rest.

Many DevOps practitioners "look for the debugger" when tackling a new technology. As well-meaning as this is, it's also probably a mistake: there are approaches that pay off better.

One of the difficulties is that debuggers are often--always?--more complicated and harder to learn than the technologies they debug. I can teach someone enough PHP or PowerShell to tackle simple tasks in fifteen minutes. The conventional "debugger" for the former language can require an hour just to install effectively, and the debugging cmdlet sublanguage of the latter is rich to the point of overload.

Debuggers seem to bring out the worst in many practitioners: they pound keys with no plan beyond single-stepping until something turns up, they practice simplistic and unrewarding trial-and-error approaches, they target "the problem went away (maybe)" more than "I understand the cause", they make no effort to keep separate sessions reproducible, and they adopt a "heroic" and error-prone speculative attitude rather than a more modest, incremental one.

I favor logging, monitoring, and throttling over reliance on debugging. All these techniques require investment ahead of time. Here are a few examples of how they can work out better than waiting for incidents that demand debugging:

I have had systems that were subject to apparent deadlocks under load. I was able to put together quick rate-controls that limited loads to safe bands. While this approach lacked the intellectual satisfaction of deeper understanding, it ensured that end users received reliable results, and allowed our team to concentrate on even more consequential problems.
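A quick rate control of this kind can be very small. What follows is only a minimal sketch of one common technique, a token bucket, and not the control I actually deployed; the class name `TokenBucket`, its parameters, and the injectable `clock` are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Admit roughly `rate` requests per second, with bursts up to `capacity`.

    Hypothetical helper for illustration; the names are not from any library.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum tokens held at once
        self.clock = clock               # injectable so the bucket can be tested
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if a request may proceed now, False if it should be shed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check `bucket.allow()` before handing work to the fragile subsystem and return a "busy, try later" response when it comes back False. The safe values for `rate` and `capacity` have to come from observing where the system starts to misbehave under load.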

Capacity planning can also be regarded as a throttle. Rather than invest in sophisticated programming of out-of-space responses, install easy alarms and controls to ensure filesystems always have a surplus of room. Report to end users that the back end is busy and they'll need to check back later, well before the back end hits its absolute limit; that's more satisfying to end users than having response times balloon while complicated algorithms split prioritization hairs.
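An "easy alarm" for filesystem headroom might look like the sketch below. It uses the standard library's `shutil.disk_usage`; the function names and the 20% and 10% thresholds are assumptions chosen for illustration, not recommendations.

```python
import shutil


def classify_free_space(total_bytes, free_bytes,
                        warn_fraction=0.20, refuse_fraction=0.10):
    """Map free space to 'ok', 'warn', or 'refuse'.

    The thresholds are illustrative assumptions; tune them per system.
    """
    free_fraction = free_bytes / total_bytes
    if free_fraction < refuse_fraction:
        return "refuse"   # stop accepting new work well before the disk is full
    if free_fraction < warn_fraction:
        return "warn"     # raise an alarm while there is still time to act
    return "ok"


def check_filesystem(path=".", **thresholds):
    """Classify the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_free_space(usage.total, usage.free, **thresholds)
```

Run from a scheduler or a health endpoint, a "warn" result can page someone and a "refuse" result can switch the front end to its "busy, check back later" message.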

Intelligent logging also reduces the need to resort to debugging. When an organization encourages careful logging and an incident happens, the logs will often reveal exactly the details necessary to understand the cause, without any elaborate debugging. If the organization is smart enough, it will have made logging configurable in a lightweight way, so that it can turn up logging detail quickly enough to catch an incident while it's happening.
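One lightweight way to make logging configurable is sketched here with Python's standard `logging` module. The logger name `app`, the environment variable `APP_LOG_LEVEL`, and both function names are hypothetical; how the "turn it up" signal reaches a running process (an admin endpoint, a watched config file, a signal handler) is left open.

```python
import logging
import os


def configure_logging(default="INFO", env_var="APP_LOG_LEVEL"):
    """Set up the 'app' logger, taking its level from the environment.

    `env_var` and the logger name are assumptions for this sketch.
    """
    logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    level_name = os.environ.get(env_var, default).upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back quietly on an unrecognized name
    logger = logging.getLogger("app")
    logger.setLevel(level)
    return logger


def set_verbosity(logger, level_name):
    """Change detail while the process keeps running, e.g. during an incident."""
    logger.setLevel(level_name.upper())  # setLevel accepts names such as 'DEBUG'
```

During an incident, calling `set_verbosity(logger, "debug")` makes existing `logger.debug(...)` calls start emitting without a restart; setting it back to "info" afterward keeps log volume manageable.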

Throughout the range of your DevOps responsibilities, then, don't wait for things to go wrong and then debug what might have happened after the fact. Look ahead of time for ways you can prevent problems, and have problems report on themselves as much as possible. The need for debugging wizardry will still arise, but so rarely that you might reasonably leave it to a specialized teammate.
