Your logs are important. They’ve got to be, otherwise why do you have them? So why do the logs say things like “request handled” thousands of times, with no IP addresses or response codes? That’s a waste of everybody’s time. You can Do Better ™️.
I’ve used structured logging for a few years, and it’s been a game changer for me. It’s enabled some useful patterns in my code that I now can’t go without.
Use Structured Logging
Structured logging is where you use a data structure instead of a string for your logs. Straight up, this is the only option when it comes to your log format, and anything else is less than what you deserve. The biggest advantage by far is that you can add and remove fields at will, because it’s just an object. You can use whatever serialization format you’re happy with, but you should really just use JSON, one object per line. Here’s an example:
{"message": "hello", "level": 11, "time": "2020-08-16T00:56:52.977287Z", "module": "__main__", "pid": 61197, "application": "whatever", "thread": "11ee23dc0.0"}
Some things to take note of:
- Make sure you serialize fields in a stable order, so you don’t go insane when you’re skimming. Trust me.
- Log levels as numbers instead of labels make it way easier to filter them in your log aggregator. E.g. say INFO maps to `11`; everything above it is a warning or an error, so you’d do `level > 11` to get all of your logs that are worth a closer look.
- Use UTC for the time, not machine time. You don’t have to worry about random machines that are in different timezones, and daylight saving time changes will never bite you.
- The module, pid, application and thread fields are required. You need to be able to tell where your logs are coming from. I try very hard not to write multi-threaded software, and the thread ID is still useful sometimes. The application field is nice when you have multiple tools making up one piece of software and you want to know which one is producing the logs.
- I know it’s not as easy to read as a traditional syslog format, but trust me, you’ll get used to it. The benefits outweigh the cost.
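To make those properties concrete, here’s a minimal sketch of a function that ticks all the boxes above: stable field order, numeric level, UTC timestamp, and the required provenance fields. The `emit` helper and its defaults are my own illustration, not any particular library:

```python
import json
import os
import sys
import threading
from datetime import datetime, timezone

def emit(message, level=11, **fields):
    """Write one JSON object per line, with the properties listed above."""
    record = {
        "message": message,
        "level": level,  # numeric, so `level > 11` finds warnings and errors
        "time": datetime.now(timezone.utc).isoformat(),  # always UTC
        "module": __name__,
        "pid": os.getpid(),
        "thread": threading.current_thread().name,
    }
    record.update(fields)  # extra fields are just more dict entries
    # sort_keys=True keeps the field order stable from line to line
    sys.stdout.write(json.dumps(record, sort_keys=True) + "\n")

emit("hello", application="whatever")
```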
The Code Should Be Simple
It should be simple. You can’t build the structured logs by hand; that’s awful and you’ll hate yourself immediately. Setting it up and using it shouldn’t be any harder than this:
log = my_log_library.Logger()

def my_function():
    log.info("hello", field="my thing", another_field="another thing")

def main():
    my_log_library.init(application="cool application", path="/var/log/cool-application.log")
Notice that you’re not producing artisanal JSON by hand, like some kind of weird cyber monk producing a copy of the bible. I’ve talked with people who were trying to do that and they seemed to think it’s acceptable… I’m sorry my friends, it’s not.
The next important thing is that you can easily add extra fields to your log messages. Notice the `log.info` call includes the keyword arguments `field` and `another_field`. You’re going to want to add extra fields all the time, like usernames or request IDs or whatever. It needs to be easy to do that.
This gives you a good baseline for your logs. You’ve got structured logs that you can dynamically add new fields to without much trouble.
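As a sketch of what might sit behind an API like that (all names hypothetical, mirroring the snippet above, and writing to stdout instead of a file for brevity): a module-level `init` that stores defaults, and a `Logger` whose methods fold keyword arguments straight into the JSON object:

```python
import json
import sys
from datetime import datetime, timezone

_defaults = {}  # fields stamped onto every line, set once by init()

def init(**fields):
    _defaults.update(fields)

class Logger:
    def _emit(self, level, message, **fields):
        record = {
            "message": message,
            "level": level,
            "time": datetime.now(timezone.utc).isoformat(),
        }
        record.update(_defaults)
        record.update(fields)  # per-call keyword arguments become fields
        sys.stdout.write(json.dumps(record, sort_keys=True) + "\n")

    def info(self, message, **fields):
        self._emit(11, message, **fields)

log = Logger()
init(application="cool application")
log.info("hello", field="my thing")
```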
Contextual Logging: The Best
Another useful concept is being able to do something like this:
with my_log_library.context(username="cool name"):
    log.info("Checking username")
    # ... do stuff
    log.info("Done")
The context manager inserts the username into every log line emitted underneath. This is useful in a million different scenarios, but the most obvious ones are things like usernames, request IDs, etc.
This is what takes your logging next level. You’ve made it easy to add relevant metadata to your logs in bulk, and modifying your code to do this is extremely simple. No more needing to backtrack in your logs and use other lines to figure out who was doing what - every line will tell you what you need to know.
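One plausible way to implement such a context manager is with the standard library’s `contextvars`, which also keeps contexts isolated across threads and async tasks. This is a sketch, not any real library; the log calls would merge `current_context()` into every record:

```python
import contextvars
from contextlib import contextmanager

# holds the fields that should be stamped onto every log line
_ctx = contextvars.ContextVar("log_context", default={})

@contextmanager
def context(**fields):
    # layer the new fields on top of whatever context is already active
    token = _ctx.set({**_ctx.get(), **fields})
    try:
        yield
    finally:
        _ctx.reset(token)  # restore the previous context on exit

def current_context():
    """What a logger would merge into each record before serializing."""
    return _ctx.get()
```

Nesting works naturally: an inner `context(...)` adds its fields on top of the outer one, and everything unwinds when the blocks exit.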
I’ve dealt with software that doesn’t use structured logs, and committed many sins trying to find the lines relevant to the one in question. I’ve spent multiple minutes trying to figure out which request matched a particular log line, because I only had the PID and thread ID to go on.
Hell, I’ve watched people do that recently. I’ve seen them modify dozens of log lines and call signatures so that they could include some metadata on every line. This is such a minimal change you can make that will have a huge impact on your log and code quality.
Similarly, you can add timers:
with my_log_library.timer("request handler", username="cool name") as timer:
    resp = handle_request()
    timer["response_code"] = resp.response_code
The timer logs the time taken, along with any relevant context, when the block ends. Measuring how long something took is extremely standard, but having it in your logs is very useful, particularly when you can attach extra context from computations done inside the timer. This helps you build some really useful dashboards.
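A minimal sketch of how such a timer might work, assuming it emits one JSON line when the block exits (the names are hypothetical): the trick is that the dict yielded to the `with` block is the same one that gets logged, so anything the block adds comes along for free.

```python
import json
import time
from contextlib import contextmanager

@contextmanager
def timer(message, **fields):
    extra = dict(fields)  # the block can add fields to this, e.g. a response code
    start = time.monotonic()  # monotonic clock: immune to system clock changes
    try:
        yield extra
    finally:
        extra["duration_ms"] = (time.monotonic() - start) * 1000
        print(json.dumps({"message": message, **extra}, sort_keys=True))

with timer("request handler", username="cool name") as t:
    t["response_code"] = 200  # stands in for resp.response_code
```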
Log Rotation
Rotate every day. Don’t do weekly or monthly; inodes are cheap. Include the date in your log file name, e.g. `my-application-2020-05-30.log`. You’ll invariably trigger some scenario where you log massive amounts of data and want to clean up, and deleting logs by day instead of by week is much easier to justify internally.
My personal preference is to configure logging to rotate to a new file at midnight, and nothing more. Use a daily cron job running tmpwatch or similar to compress or delete old files. Your start-up or run time should not be impacted by cleaning up log files.
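If you’re on Python, the standard library already does the midnight-only rotation described above via `logging.handlers.TimedRotatingFileHandler`. A sketch (note it names rotated files `my-application.log.2020-05-30`, base name plus a suffix, rather than the exact format above):

```python
import logging
import logging.handlers

handler = logging.handlers.TimedRotatingFileHandler(
    "my-application.log",
    when="midnight",   # rotate once per day, nothing fancier
    utc=True,          # roll over at UTC midnight, consistent with UTC timestamps
    backupCount=0,     # 0 means never delete; leave cleanup to an external cron job
    delay=True,        # don't open the file until the first log record
)
handler.suffix = "%Y-%m-%d"  # rotated files get the date appended
logging.getLogger().addHandler(handler)
```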
Logging Aggregator
This is gonna be short and sweet, but you need one if you’re managing a lot of different machines or want to produce dashboards from your data. Splunk is popular; Humio is up-and-coming and my personal favourite. It’s faster to load a web interface than to SSH into a machine, especially if you’re collecting logs from multiple machines. Be kind to yourself.
You can build dashboards and gather all sorts of metrics about your systems from your logs. Super worth it. The structured logs make this significantly easier - you don’t need to parse your log lines, add regular expressions and whatnot, just take the data as you put it in.
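For example, counting response codes across structured log lines is just JSON parsing, no regular expressions required (the sample lines below are made up):

```python
import json
from collections import Counter

lines = [
    '{"message": "request handled", "level": 11, "response_code": 200}',
    '{"message": "request handled", "level": 11, "response_code": 500}',
    '{"message": "request handled", "level": 11, "response_code": 200}',
]

# each line parses straight back into the dict you logged
codes = Counter(json.loads(line)["response_code"] for line in lines)
print(codes)  # 200 appeared twice, 500 once
```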
Trim The Fat
Each log line should justify its own existence - they should be terse, clean and informative. It’s common to see code like this:
def do_work():
    log.info("Calling worker")
    worker()

def worker():
    log.info("Starting worker")
    # ... idk ...
    log.info("Worker finished")
Having `worker` and `do_work` both log that the worker is starting is useless, duplicate logging. Remove one of them and you’ve got 30% less to read.
Similarly, if you have a lot of crufty log lines that don’t provide any value, strongly contemplate deleting them. They’re actively harming the value of your logs if you need to ignore them every time they’re on your screen.
Again, each line should justify its own existence.
Conclusion
Your logs are a reflection of your software, and they should help you understand it better. Structured logging takes only a small amount of effort to implement and makes a massive impact on your systems.