Errors, like Thanos, are Inevitable (and Where to Track Them)

Here are some questions you should be able to ask and easily answer about errors that occur in your web application.

What are the top errors that have occurred over a given period of time, like today, the last week or month?
Similarly, what are the top web pages/screens where errors occur over a given period of time? And is it the same error occurring on the page or 10 different ones?
When did a particular error first occur?
Which errors first appeared, or disappeared, since the last application update?
How recently has a particular error occurred?
Which browsers are users running when errors occur?
Who are the users that are affected by errors?
Which servers see the most errors, and which see the fewest?
Do certain errors occur only on particular hosts?
What action was the user taking in the application when the error occurred?
Since raw error counts rise and fall with application usage, how many errors occur per page view?
Etc.

These questions are impossible to answer if you’re not tracking errors at all (obviously). If you own or manage an application that doesn’t track its errors, one of the best things you can do is start tracking them, or possibly start a rewrite, because the vendors who write such applications tend to produce impossibly buggy code. I don’t know exactly why the two go hand in hand, but I have a theory or two. If you’re logging errors to a file somewhere on your server, good news: the questions are now only nearly impossible to answer. Even supposing you’ve captured all the data you need, you’ll still have to aggregate it somehow, and that takes real work.

What you ultimately need is to track errors in a database. I know, I know… some errors might be caused by the database server being unavailable, so even attempting to store error information in a database might itself generate an error. Don’t let that hold you up; if it’s a concern, you can mitigate it. The value here is being able to answer the questions above for the 99.99% of errors that don’t prevent themselves from being written to the error tracking database. You’ll learn about the remaining 0.01% in other ways, or else you’re not doing application monitoring correctly.
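One possible mitigation, sketched in Python as an assumption rather than a prescribed method: attempt the database write, and if it fails for any reason, record the original error in a local log instead of raising a second error. The `connect` callable and the `exception_log` table are hypothetical placeholders for your real error database.

```python
import logging
import sqlite3
from typing import Callable

# Fallback logger. Point it at a local file (e.g. via logging.basicConfig)
# so it does not depend on the database that may be down.
fallback_log = logging.getLogger("error_tracking_fallback")


def record_error(connect: Callable[[], sqlite3.Connection], message: str) -> bool:
    """Store an error in the tracking database, falling back to a log.

    `connect` and the `exception_log` table are hypothetical placeholders.
    Returns True if the row was stored, False if the fallback log was used.
    """
    try:
        conn = connect()
        try:
            with conn:  # commits on success, rolls back if the INSERT raises
                conn.execute(
                    "INSERT INTO exception_log (message) VALUES (?)", (message,)
                )
        finally:
            conn.close()
        return True
    except Exception:
        # The tracking database is itself the problem: don't raise a second
        # error, just keep the original message somewhere that survives.
        fallback_log.exception(
            "Error tracking DB unavailable; original error: %s", message
        )
        return False
```

Anything that lands in the fallback log has, by definition, missed the database, which makes that log one candidate for the “other ways” you learn about the remaining 0.01%.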

Five years ago I put together the database structure below to help Pixsys Technologies get better visibility into their application errors. Beyond that, we started sharing the “errors per page view” metric with management and owners, which began the process of managing that metric and holding ourselves accountable for it. Today, I can’t feel comfortable building an application without this error tracking structure in place. Below is what that database structure looks like.
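The “errors per page view” metric is a simple ratio, computed per period so that a busy day doesn’t look worse just because it was busy. A minimal Python sketch follows. It assumes daily page-view counts come from somewhere outside the error tables (web server logs or an analytics tool, for example), since none of the tables described below store page views; the function name and inputs are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable


def errors_per_page_view(
    error_dates: Iterable[date], page_views: Dict[date, int]
) -> Dict[date, float]:
    """Return errors per page view for each day that has page-view data.

    error_dates: one entry per recorded exception (e.g. the date part of
                 each occurrence timestamp).
    page_views:  day -> page views; a hypothetical input from web logs or
                 analytics, not from the error tracking tables.
    Days with zero page views are skipped to avoid dividing by zero.
    """
    errors = Counter(error_dates)
    return {
        day: errors[day] / views for day, views in page_views.items() if views > 0
    }


# Example: 3 errors on a day with 1,500 page views, none on the next day.
rates = errors_per_page_view(
    [date(2024, 1, 1)] * 3,
    {date(2024, 1, 1): 1500, date(2024, 1, 2): 1200},
)
```

Here January 1 works out to 3 / 1500 = 0.002 errors per page view, and January 2 to zero.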

Sample Error Tracking Database Structure

Tables:

Exception Signature – Captures the “fingerprint” of an error. This is necessary because many error messages are artificially unique: they embed details such as the particular value that caused the error. We want one entry in this table for, say, a primary key violation on the Employee table, not one entry per value that violates the constraint. Scrubbing error messages to generate the fingerprint is the magic here, and it’s likely where you’ll spend 90% of your setup time.
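Scrubbing rules depend entirely on the messages your platform produces, so the Python sketch below is illustrative only. It assumes messages shaped like SQL Server’s primary key violation, and replaces GUIDs, parenthesised values, and bare numbers with placeholders before hashing. Quoted identifiers such as the constraint and table names are deliberately left alone, so a violation on Employee and one on Department still get different fingerprints. The rule list, function names, and the `"SqlException"` type string are hypothetical.

```python
import hashlib
import re

# Illustrative scrubbing rules (assumptions; tune them to your own messages).
# Order matters: GUIDs are replaced before bare numbers so their digits
# aren't scrubbed piecemeal.
SCRUB_RULES = [
    (re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"), "<guid>"),
    (re.compile(r"\([^()]*\)"), "(?)"),  # e.g. "The duplicate key value is (42)."
    (re.compile(r"\b\d+\b"), "<n>"),     # order IDs, timeouts, row counts...
]


def scrub(message: str) -> str:
    """Replace volatile values in an error message with placeholders."""
    for pattern, placeholder in SCRUB_RULES:
        message = pattern.sub(placeholder, message)
    return message


def fingerprint(exception_type: str, message: str) -> str:
    """Stable signature key: exception type plus scrubbed message, hashed."""
    key = f"{exception_type}|{scrub(message)}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


pk_42 = ("Violation of PRIMARY KEY constraint 'PK_Employee'. Cannot insert duplicate "
         "key in object 'dbo.Employee'. The duplicate key value is (42).")
pk_99 = pk_42.replace("(42)", "(99)")
# Same fingerprint: only the key value differed.
same = fingerprint("SqlException", pk_42) == fingerprint("SqlException", pk_99)
```

The parenthesis rule is blunt (it would also scrub something like a function signature in a message), which is exactly why this step tends to eat most of the setup time.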
Exception – Captures each occurrence of an error described in the Exception Signature table.
Exception Web Context – Captures information about the web request in which the error occurred, such as the user’s IP address (be mindful of GDPR concerns), the URL, the referrer, the session ID, the query string and form data sent to the page, and so on.
Exception Web Context User Agent – The user agent tells us about the browser and OS in use at the time. We keep each unique user agent value in this table to avoid duplication.
Exception Web Context Url – The web address of the page the user was on when the error occurred. We keep each unique URL value in this table to avoid duplication.
Client – Lets us track errors across our various clients. It is sometimes useful to know that only client X sees a particular error.
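To make the table list concrete, here is a minimal sketch of one way to lay it out, using Python’s built-in sqlite3 module so it runs anywhere. The snake_case table and column names, the `host_name` column, and the ISO-8601 text timestamps are all assumptions; a production database and its types will differ. The fingerprint is passed in already computed, and `top_errors` answers the first question from the top of this post: which errors occurred most often since a given time, and when each was last seen.

```python
import sqlite3
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE client (client_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE exception_signature (
    signature_id   INTEGER PRIMARY KEY,
    fingerprint    TEXT NOT NULL UNIQUE,  -- hash of the scrubbed message
    sample_message TEXT NOT NULL          -- first raw message seen, for humans
);
CREATE TABLE exception (
    exception_id INTEGER PRIMARY KEY,
    signature_id INTEGER NOT NULL REFERENCES exception_signature(signature_id),
    client_id    INTEGER REFERENCES client(client_id),
    host_name    TEXT,                    -- which server raised it
    message      TEXT NOT NULL,           -- raw, unscrubbed message
    occurred_at  TEXT NOT NULL            -- ISO-8601 UTC, sorts correctly as text
);
CREATE TABLE exception_web_context_url (url_id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE);
CREATE TABLE exception_web_context_user_agent (
    user_agent_id INTEGER PRIMARY KEY, user_agent TEXT NOT NULL UNIQUE);
CREATE TABLE exception_web_context (
    exception_id  INTEGER PRIMARY KEY REFERENCES exception(exception_id),
    url_id        INTEGER REFERENCES exception_web_context_url(url_id),
    user_agent_id INTEGER REFERENCES exception_web_context_user_agent(user_agent_id),
    ip_address    TEXT,                   -- mind GDPR: consider truncating or omitting
    referrer TEXT, session_id TEXT, query_string TEXT, form_data TEXT
);
"""


def _lookup_id(conn: sqlite3.Connection, table: str, id_col: str, col: str, value: str) -> int:
    """Get-or-create a row in a de-duplicating lookup table (URL, user agent, client).
    Table/column names are fixed constants from this module, never user input."""
    conn.execute(f"INSERT OR IGNORE INTO {table} ({col}) VALUES (?)", (value,))
    return conn.execute(f"SELECT {id_col} FROM {table} WHERE {col} = ?", (value,)).fetchone()[0]


def record_exception(conn: sqlite3.Connection, fingerprint: str, message: str, occurred_at: str,
                     url: Optional[str] = None, user_agent: Optional[str] = None,
                     client: Optional[str] = None, host: Optional[str] = None) -> int:
    """Insert one occurrence, creating its signature and lookup rows if needed."""
    with conn:  # one transaction per occurrence
        conn.execute("INSERT OR IGNORE INTO exception_signature (fingerprint, sample_message) "
                     "VALUES (?, ?)", (fingerprint, message))
        sig_id = conn.execute("SELECT signature_id FROM exception_signature WHERE fingerprint = ?",
                              (fingerprint,)).fetchone()[0]
        client_id = _lookup_id(conn, "client", "client_id", "name", client) if client else None
        exc_id = conn.execute(
            "INSERT INTO exception (signature_id, client_id, host_name, message, occurred_at) "
            "VALUES (?, ?, ?, ?, ?)", (sig_id, client_id, host, message, occurred_at)).lastrowid
        if url or user_agent:
            url_id = _lookup_id(conn, "exception_web_context_url", "url_id", "url", url) if url else None
            ua_id = (_lookup_id(conn, "exception_web_context_user_agent", "user_agent_id",
                                "user_agent", user_agent) if user_agent else None)
            conn.execute("INSERT INTO exception_web_context (exception_id, url_id, user_agent_id) "
                         "VALUES (?, ?, ?)", (exc_id, url_id, ua_id))
    return exc_id


def top_errors(conn: sqlite3.Connection, since: str, limit: int = 10) -> List[Tuple[str, int, str]]:
    """Most frequent signatures since `since`: (sample message, count, last seen)."""
    return conn.execute(
        """SELECT s.sample_message, COUNT(*) AS occurrences, MAX(e.occurred_at) AS last_seen
           FROM exception e JOIN exception_signature s ON s.signature_id = e.signature_id
           WHERE e.occurred_at >= ?
           GROUP BY s.signature_id
           ORDER BY occurrences DESC
           LIMIT ?""", (since, limit)).fetchall()
```

Because each URL and user agent string is stored once in its own table, a page that throws a thousand errors adds a thousand small rows that point at one URL row rather than a thousand copies of the same long string. Answering “when did this error first occur?” is then just `MIN(occurred_at)` for its signature with no date filter.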

Closing Thoughts

Feel free to copy as much or as little of this database structure as you want. In terms of intellectual property, it’s about as proprietary as instructions for making a PB&J sandwich; the only advantage is actually having it in place. Some days I don’t have much time to code. But when I have 45 minutes or an hour between meetings with the evil marketing folks about my blog assignments, I can go to this database and quickly find the best way to make that time count.