Logging: Basic Structure and Objects

28 Oct 2020

Overview

The logging module in Python is a very powerful tool that I’ve summarily neglected learning until this week. However, as it’s often utilized as part of a robust ML pipeline, I finally took the plunge. In this notebook, I’m going to distill the basics I got from the docs, as well as show a simple project that highlights some of the behavior and configuration patterns.

Levels

logging operates using a concept of “levels,” which correspond to varying levels of message-severity. Going from least-severe to most-severe, we have:

DEBUG: Super detailed information
INFO: Confirmation that thing are working correctly
WARNING: Indication that something unexpected happened, but things still work
ERROR: Something ran incorrectly in our program
CRITICAL: The most serious message. Things severely broken

In general, once you set the “level” for a logging object, it operates at “that level or more severe” and ignores all activity lower than it’s configured for.

For example, if I’ve got my logging set to WARNING, then any time I log with level='INFO', nothing will happen. On the other hand, if I’ve configured my level to DEBUG, then any log of any level will be served.

Objects

Generally, there are four base classes that power the library.

Logger: Exposes the overall interface
Handler: Sends the records created by the Logger objects
Filter: Determines which logs to output
Formatter: Specifies the layout of the Handler output

Going down the list:

`Logger`

The Logger object is the one that holds all the other logging objects and is the main interface that the client code taps into to leverage the library.

You can literally skip all knowledge of intentional instantiation and just do the following

import logging

logging.warning('This hit')

But I don’t think that’s a very good decision, because then the lack of output for something like this is confusing.

logging.info('This also hit')

(More on this in a few)

Additionally, you can broadly configure your logging instance with the following

logging.basicConfig(filename='example.log')

But this makes things difficult to trace/debug. For instance, filename='example.log' suggests that we’ll be logging to some file. So why does the following print in standard out?

logging.warning('This should have written do disk')

And no, it didn’t both print and write.

!cat example.log

cat: example.log: No such file or directory

Instead, I like being more explicit in how I build this out with getLogger(). Note, that the docs explicitly say

Note that Loggers should NEVER be instantiated directly, but always through the module-level function logging.getLogger(name)

logger = logging.getLogger()
logger.warning('This hit')

WARNING:root:This hit

So, looking at the last few outputs, it should be clear that WARNING is the default level that we instantiate with.

But what’s going on with root in the print above?

Namespaces

Like modules, logging is namespace-aware.

When called with no arguments, as above, getLogger() looks to the root of the project and

Creates a new Logger object under root if it doesn’t exist
Returns the root Logger if it does

You can also do similar tricks to what you’d do with the os or pathlib libraries, such as

logging.getLogger(__name__)

<Logger __main__ (WARNING)>

Or calling it with __file__ (which doesn’t work in Notebooks). But the preferred convention is __name__.

This allows us to leverage clean hierarchical relationships between modules/submodules and their Logger objects. For instance, if I’ve got the line logging.getLogger(__name__) in each of my files, then much like imports and modules, there’s a clear relationship between the root level foo, and the Logger objects found in the child foo.bar and foo.bar.baz submodules. More on this at the end.

Using the `Logger`

We already did a bit of it earlier, but for the sake of completeness, when we log we can either do it generically, specifying the level explicitly:

logger.log(logging.CRITICAL, "This is srs bsns")

CRITICAL:root:This is srs bsns

Or by using the utility functions that correspond to the same name as the level

logger.critical("This is also srs. But simpler to write")

CRITICAL:root:This is also srs. But simpler to write

`Handler`

These, unlike the Logger are always instantiated directly. Sort of.

More accurately, we’ll always find ourselves instantiating some subclass of the Handler object. The two main examples of this are:

StreamHandler, which writes to the console/std.out
FileHandler, which wraps the open() function and writes logs to a file

There are a TON of other handlers batteries-included, but for the sake of keeping it simple, these are the only two we’ll concern ourselves with.

`Formatter`

These provide string formatting for our writes and then some.

It’s got its own special syntax/keywords for handling, among other things:

Time
File the log was generated in
Line number the long was generated on
Arbitrary messages

and a lot more. We’ll see an example of this at the end.

Filter

Looks too complicated for our purposes. Maybe another time.

Putting these All Together

So now that we know the constituent parts to a logging system, how do we assemble them for use?

Adding `Handler` objects

We’ll start by making a brand new logger (with an arbitrary name)

logger = logging.getLogger('new_logger')
logger.handlers

[]

Inspecting, this totally bare at the start.

Then, we’ll instantiate a StreamHandler object to write to sys.stdout and then apply the addHandler() method of our logger

import sys

stream_info_handler = logging.StreamHandler(sys.stdout)

logger.addHandler(stream_info_handler)

Looking again, we can see that there’s now a StreamHandler object that meets our definition. The NOTSET simply refers to the fact that we haven’t specified a level for the StreamHandler, and thus, it will defer to however its parent operates– which again, is WARNING by default.

logger.handlers

[<StreamHandler stdout (NOTSET)>]

This is a little tricky– why are there two prints?

logger.warning('Boop')

Boop


WARNING:new_logger:Boop

To investigate this, we’ll change the level of our Handler object to its most restrictive state, and rerun the .warning()

stream_info_handler.setLevel(logging.CRITICAL)

Only one print.

logger.warning('Boop')

WARNING:new_logger:Boop

This is because the Handler only fires on level=50. Which is code for CRITICAL, as specified

display(stream_info_handler.level)
logging.getLevelName(50)

50





'CRITICAL'

The logger itself, however, has an implicit (seemingly-hidden, IMO) writer to stderr, at the warning WARNING level and above.

display(logger.getEffectiveLevel())
logging.getLevelName(30)

30





'WARNING'

Still no new Handler objects in sight.

logger.handlers

[<StreamHandler stdout (CRITICAL)>]

But, if we overwrite the default behavior, you’ll notice that it no longer shows up.

logger.setLevel(logging.CRITICAL)

logger.warning("This shouldn't hit")

Adding `Formatter` objects

The setFormatter behaves similarly to the Logger.addHandler() method except applies to the Handler objects, instead.

So we start off by making another fresh Logger object

logger = logging.getLogger('even_newer_logger')

Then we’ll make a simple StreamHandler that writes to standard out at a permissive INTO level

new_stream_handler = logging.StreamHandler(sys.stdout)
new_stream_handler.setLevel(logging.INFO)

Then we’ll add the Handler to the Logger

logger.addHandler(new_stream_handler)

logger.warning('And we can see that this printed to standard out')

And we can see that this printed to standard out


WARNING:even_newer_logger:And we can see that this printed to standard out

However, if we wanted to go back and change the output format of the log generated by the StreamHandler, we’d use

FMTR = logging.Formatter("%(asctime)s — %(name)s — %(levelname)s —" " %(message)s")

new_stream_handler.setFormatter(FMTR)

logger.warning('Notice the difference?')

2020-10-28 18:02:33,042 — even_newer_logger — WARNING — Notice the difference?


WARNING:even_newer_logger:Notice the difference?

A Final Note on Inheritence

The docs have a great image to elucidate further, but I wanted to illustrate how level-inheritence works in logging.

Basically, there are two general rules that are followed:

If a Logger is initialized in some submodule (e.g. foo.bar), if you don’t specify the level, it will look to the parent (foo) to see if it has a declared level. If it doesn’t find it there, then it will look higher and higher until it finds root, which will either be declared, or set to WARNING, by default.
When calling logger.log() with any arbitrary level, if the call isn’t more severe than the logger level, it won’t get passed down to any of its Handler objects, even if those accept that severity.

To illustrate that second point, we’ll make a new Logger that only accepts CRITICAL, and give it a StreamHandler that will accept DEBUG, and therefore everything.

firewall = logging.getLogger('firewall')
firewall.setLevel(logging.CRITICAL)

easygoing_stream_handler = logging.StreamHandler(sys.stdout)
easygoing_stream_handler.setLevel(logging.DEBUG)

firewall.addHandler(easygoing_stream_handler)

Finally, we’ll make a simple .info() call that doesn’t call the StreamHandler, whatsoever.

firewall.info('Alas, this was blocked.')

# cleaning up our environment for the next section

del (FMTR, logger, easygoing_stream_handler,
     stream_info_handler, firewall, new_stream_handler)

Example Project

To tie these concepts together, I liked how this repo structured their logging and structured a dummy project that highlights the most essential aspects of their setup.

Say we’ve got some module, app, that we want to run via main.py to generate some logs into the logs directory.

simple_logging
|   main.py
|
+---app
|   |   app_code.py
|   |   __init__.py
|   |
|   \---config
|           logging_config.py
|
\---logs

Files

main.py is super straight-forward.

from app.app_code import run_app

run_app()

app/app_code.py contains our app. The goal here is to log each App ran for X to a log file, but print the i=6 case to the console.

import logging

logger = logging.getLogger(__name__)
print(__name__, 'printed from', __file__)

def run_app():
    for i in range(10):
        logger.info(f"App ran for {i}")

        if i == 6:
            logger.warn("We saw the number 6!")

The fact that we want to have two different log outputs should be a dead giveaway that we want to leverage both a StreamHandler and a FileHandler. Which is precisely what we specify in app/config/logging_config.py

import logging
import sys

FORMATTER = logging.Formatter(
    "%(asctime)s — %(name)s — %(levelname)s —" "%(funcName)s:%(lineno)d — %(message)s"
)

def get_console_handler():
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setFormatter(FORMATTER)
    console_handler.setLevel(logging.WARNING)
    return console_handler


def get_file_handler():
    file_handler = logging.FileHandler("logs/app.log")
    file_handler.setFormatter(FORMATTER)
    file_handler.setLevel(logging.INFO)
    return file_handler

I’m not sure if the get___() functions above are convention or not, but I really liked how clean it looked in the implementation, so I’m copying it here.

These functions are imported to app/__init__.py, where we define the app-level Logger object.

import logging

from app.config import logging_config

print(__name__, 'printed from', __file__)
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

logger.addHandler(logging_config.get_console_handler())
logger.addHandler(logging_config.get_file_handler())

In Action

So when you consider the call stack of running main.py, the first thing to execute is from app.app_code import run_app. Recall from our notebook on Packages that the __init__.py in the root of the module is the first thing that gets executed. Here, that’s app/__init__.py.

In that file, we import logging. Easy enough. Then on line 3 we execute from app.config import logging config, which means we jump into app/config/logging_config.py. This file executes top to bottom, defining our logging.Formatter, FORMATTER, and stuffing it into the two get____ functions.

Once that’s done running, the functions get passed back to app/__init__.py, and are both executed. This creates both Handler subclasses, which are added to the Logger.

So let me stop with this wall of text and ask: What gets passed back after app/__init__.py runs?

Nothing… Sort of.

Before we run the whole thing, let’s just see what happens when we import simple_logging.app

import os

try:
    os.remove('simple_logging/logs/app.log')
except FileNotFoundError:
    pass

# change into the same directory as `main.py` for
# same behavior
os.chdir('simple_logging')

As you can see, the Logger instantiated in app/__init__.py comes before the one in app/app_code.py

from app.app_code import run_app

[<StreamHandler stderr (NOTSET)>]
__name__ app.app_code    printed from app/app_code.py
__name__ app             printed from app/__init__.py

But there don’t appear to be any artifacts created from their run.

whos

Variable   Type        Data/Info
--------------------------------
logging    module      <module 'logging' from 'C<...>b\\logging\\__init__.py'>
os         module      <module 'os' from 'C:\\Us<...>\\anaconda3\\lib\\os.py'>
run_app    function    <function run_app at 0x0000023BB8EDA8B8>
sys        module      <module 'sys' (built-in)>

But if you know where to look, you can find them. For this, we’ll dig into the logging namespace and find the dictionary of all active loggers.

There are a lot of them. A lot of which are a product of writing this post in a Jupyter Notebook.

But you’ll also notice that the Logger objects that we blew away, up above, are still here (new_logger, firewall, etc).

print(logging.Logger.manager.loggerDict.keys())

dict_keys(['concurrent.futures', 'concurrent', 'asyncio', 'prompt_toolkit', 'parso.python.diff', 'parso.python', 'parso', 'parso.cache', 'tornado.access', 'tornado', 'tornado.application', 'tornado.general', 'IPKernelApp', '__main__', 'new_logger', 'even_newer_logger', 'firewall', 'app.app_code', 'app'])

This is because, critically, the logging module hangs onto every object created– unless forced otherwise– for the duration of the program being run. So nothing is returned when we walked through all of the __init__ code, but we added our user-defined Logger instances to the logging module itself.

Let’s go ahead and clean these up… hidden_loggers = list(logging.Logger.manager.loggerDict.keys())

for logger_key in hidden_loggers: if logger_key not in [‘app’, ‘app.app_code’]: del logging.Logger.manager.loggerDict[logger_key]

logging.Logger.manager.loggerDict This allows us to cherry-pick the objects that got created from the import statement

app_logger = logging.Logger.manager.loggerDict['app']
app__app_code_logger = logging.Logger.manager.loggerDict['app.app_code']

And as you can see, because we created the app Logger first, and because app.app_code inherits from its parent class (unless specified) both Logger objects are of level='DEBUG'.

app_logger

<Logger app (DEBUG)>

app__app_code_logger

<Logger app.app_code (DEBUG)>

Actually Running the Code

However, if you look our app/config/logging_config.py you’ll notice that the get_console_handler() and get_file_handler() functions both explicitly overwrite the loose strictness of the Logger they’re assigned to.

The StreamHandler is set to WARNING
The FileHandler is set to INFO

In practice, it means that when we execute run_app(), we expect to see 10 INFO logs written to our logfile location, and 1 WARNING message, printed to the terminal.

# clearing the fallback Logger for the logging module.
# the un-instantiated stuff we did up top made a 
# tricky, hidden logger that wrote to standard out, in
# addition to the loggers we actually want. Made for gross
# notebook output. Signed, I've been debugging this for 3 hrs.
logging.root.handlers = []

Here’s our one WARNING

%run main.py

2020-10-28 18:02:33,117 — app.app_code — WARNING —run_app:13 — We saw the number 6!

and here’s 10 logged records

with open('logs/app.log', 'r') as f:
    print(f.read())

2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 0
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 1
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 2
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 3
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 4
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 5
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 6
2020-10-28 18:02:33,117 — app.app_code — WARNING —run_app:13 — We saw the number 6!
2020-10-28 18:02:33,117 — app.app_code — INFO —run_app:10 — App ran for 7
2020-10-28 18:02:33,117 — app.app_code — INFO —run_app:10 — App ran for 8
2020-10-28 18:02:33,117 — app.app_code — INFO —run_app:10 — App ran for 9