Logging: Basic Structure and Objects
Overview
The logging module in Python is a very powerful tool that I'd summarily neglected learning until this week. However, since it's often a core part of a robust ML pipeline, I finally took the plunge. In this notebook, I'm going to distill the basics I got from the docs, as well as show a simple project that highlights some of the behavior and configuration patterns.
Levels
logging operates using a concept of "levels," which correspond to varying degrees of message severity. Going from least severe to most severe, we have:
- DEBUG: Super detailed information
- INFO: Confirmation that things are working correctly
- WARNING: Indication that something unexpected happened, but things still work
- ERROR: Something ran incorrectly in our program
- CRITICAL: The most serious message. Things are severely broken
In general, once you set the "level" for a logging object, it operates at "that level or more severe" and ignores all activity lower than it's configured for.
For example, if I've got my logging set to WARNING, then any time I log with level='INFO', nothing will happen. On the other hand, if I've configured my level to DEBUG, then any log of any level will be served.
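Under the hood, these levels are just integers, which is what makes "that level or more severe" a simple numeric comparison. A quick sketch:

import logging

# each named level is an integer constant on the module
for name in ('DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'):
    print(name, getattr(logging, name))
# DEBUG 10, INFO 20, WARNING 30, ERROR 40, CRITICAL 50

A record is handled only when its level is greater than or equal to the configured threshold.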
Objects
Generally, there are four base classes that power the library.
- Logger: Exposes the overall interface
- Handler: Sends the records created by the Logger objects
- Filter: Determines which logs to output
- Formatter: Specifies the layout of the Handler output
Going down the list:
Logger
The Logger object is the one that holds all the other logging objects and is the main interface that the client code taps into to leverage the library.
You can literally skip all intentional instantiation and just do the following
import logging
logging.warning('This hit')
But I don’t think that’s a very good decision, because then the lack of output for something like this is confusing.
logging.info('This also hit')
(More on this in a few)
Additionally, you can broadly configure your logging instance with the following
logging.basicConfig(filename='example.log')
But this makes things difficult to trace/debug. For instance, filename='example.log' suggests that we'll be logging to some file. So why does the following print to the console?
logging.warning('This should have written to disk')
And no, it didn’t both print and write.
!cat example.log
cat: example.log: No such file or directory
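The culprit: basicConfig() is a no-op once the root logger already has a handler, and our very first logging.warning() call implicitly installed one. In a fresh interpreter, where basicConfig() runs before any logging call, the file shows up as advertised (a minimal sketch, not run in this notebook):

import logging

# configure the root logger BEFORE any logging call sneaks a handler in
logging.basicConfig(filename='example.log', level=logging.INFO)
logging.warning('this lands in example.log, not the console')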
Instead, I like being more explicit in how I build this out with getLogger(). Note that the docs explicitly say:
Note that Loggers should NEVER be instantiated directly, but always through the module-level function logging.getLogger(name)
logger = logging.getLogger()
logger.warning('This hit')
WARNING:root:This hit
So, looking at the last few outputs, it should be clear that WARNING is the default level that we instantiate with.
But what's going on with root in the print above?
Namespaces
Like modules, logging is namespace-aware.
When called with no arguments, as above, getLogger() looks to the root of the project and:
- Creates a new Logger object under root if it doesn't exist
- Returns the root Logger if it does
You can also do similar tricks to what you'd do with the os or pathlib libraries, such as
logging.getLogger(__name__)
<Logger __main__ (WARNING)>
Or calling it with __file__ (which doesn't work in notebooks). But the preferred convention is __name__.
This allows us to leverage clean hierarchical relationships between modules/submodules and their Logger objects. For instance, if I've got the line logging.getLogger(__name__) in each of my files, then much like imports and modules, there's a clear relationship between the root-level foo, and the Logger objects found in the child foo.bar and foo.bar.baz submodules. More on this at the end.
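To make that concrete, here's a minimal sketch using the hypothetical names foo and foo.bar: the dotted name wires up the parent/child link, and a child with no level of its own defers to its ancestors.

import logging

parent = logging.getLogger('foo')
child = logging.getLogger('foo.bar')

print(child.parent is parent)     # True: the dotted name established the link
parent.setLevel(logging.DEBUG)
print(child.getEffectiveLevel())  # 10 (DEBUG): the child defers to foo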
Using the Logger
We already did a bit of it earlier, but for the sake of completeness, when we log we can either do it generically, specifying the level explicitly:
logger.log(logging.CRITICAL, "This is srs bsns")
CRITICAL:root:This is srs bsns
Or by using the utility methods that share a name with each level
logger.critical("This is also srs. But simpler to write")
CRITICAL:root:This is also srs. But simpler to write
Handler
These, unlike the Logger, are always instantiated directly. Sort of.
More accurately, we'll always find ourselves instantiating some subclass of the Handler object. The two main examples of this are:
- StreamHandler, which writes to a stream (the console: sys.stderr by default, or sys.stdout if you hand it one)
- FileHandler, which wraps the open() function and writes logs to a file
There are a TON of other handlers included, batteries and all, but for the sake of keeping it simple, these are the only two we'll concern ourselves with.
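For reference, here's a minimal sketch of instantiating each (the demo.log filename is just a stand-in):

import logging
import sys

to_stderr = logging.StreamHandler()            # no argument: defaults to sys.stderr
to_stdout = logging.StreamHandler(sys.stdout)  # or hand it an explicit stream
to_file = logging.FileHandler('demo.log')      # opens demo.log in append mode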
Formatter
These provide string formatting for our writes and then some.
It's got its own special syntax/keywords for handling, among other things:
- Time
- The file the log was generated in
- The line number the log was generated on
- Arbitrary messages
and a lot more. We'll see an example of this at the end.
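As a quick taste ahead of that fuller example, here's a minimal sketch exercising the time, file, line-number, and message attributes (fmt_demo is just a throwaway logger name):

import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(filename)s:%(lineno)d %(levelname)s %(message)s"
))

demo = logging.getLogger('fmt_demo')
demo.addHandler(handler)
demo.warning('formatted!')  # e.g. 2020-10-28 18:02:33,042 my_script.py:11 WARNING formatted!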
Filter
Looks too complicated for our purposes. Maybe another time.
Putting these All Together
So now that we know the constituent parts to a logging system, how do we assemble them for use?
Adding Handler objects
We’ll start by making a brand new logger (with an arbitrary name)
logger = logging.getLogger('new_logger')
logger.handlers
[]
Inspecting it, we see it's totally bare at the start.
Then, we'll instantiate a StreamHandler object to write to sys.stdout and then apply the addHandler() method of our logger
import sys
stream_info_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_info_handler)
Looking again, we can see that there's now a StreamHandler object that meets our definition. The NOTSET simply refers to the fact that we haven't specified a level for the StreamHandler, and thus it will defer to however its parent operates, which, again, is WARNING by default.
logger.handlers
[<StreamHandler stdout (NOTSET)>]
This is a little tricky: why are there two prints?
logger.warning('Boop')
Boop
WARNING:new_logger:Boop
To investigate this, we'll change the level of our Handler object to its most restrictive state, and rerun the .warning()
stream_info_handler.setLevel(logging.CRITICAL)
Only one print.
logger.warning('Boop')
WARNING:new_logger:Boop
This is because the Handler now only fires at level=50 and above, which is the integer code for CRITICAL, as specified:
display(stream_info_handler.level)
logging.getLevelName(50)
50
'CRITICAL'
The logger itself, however, still has an effective level of WARNING (inherited from root). And any record it accepts also propagates up to root, whose implicit (seemingly-hidden, IMO) handler writes it to stderr.
display(logger.getEffectiveLevel())
logging.getLevelName(30)
30
'WARNING'
Still no new Handler objects in sight.
logger.handlers
[<StreamHandler stdout (CRITICAL)>]
But, if we overwrite the default behavior, you’ll notice that it no longer shows up.
logger.setLevel(logging.CRITICAL)
logger.warning("This shouldn't hit")
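As an aside: if you ever want handler-only output without root's echo, you can cut propagation instead of fiddling with levels (a minimal sketch; we won't rely on it below):

logger.propagate = False  # records stop bubbling up to root's implicit stderr handler
logger.critical('Only our stdout StreamHandler sees this now')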
Adding Formatter objects
The setFormatter() method behaves similarly to the Logger.addHandler() method, except it applies to Handler objects instead.
So we start off by making another fresh Logger object
logger = logging.getLogger('even_newer_logger')
Then we'll make a simple StreamHandler that writes to standard out at a permissive INFO level
new_stream_handler = logging.StreamHandler(sys.stdout)
new_stream_handler.setLevel(logging.INFO)
Then we'll add the Handler to the Logger
logger.addHandler(new_stream_handler)
logger.warning('And we can see that this printed to standard out')
And we can see that this printed to standard out
WARNING:even_newer_logger:And we can see that this printed to standard out
However, if we wanted to go back and change the output format of the log generated by the StreamHandler, we'd use
FMTR = logging.Formatter("%(asctime)s — %(name)s — %(levelname)s — %(message)s")
new_stream_handler.setFormatter(FMTR)
logger.warning('Notice the difference?')
2020-10-28 18:02:33,042 — even_newer_logger — WARNING — Notice the difference?
WARNING:even_newer_logger:Notice the difference?
A Final Note on Inheritance
The docs have a great image to elucidate further, but I wanted to illustrate how level inheritance works in logging.
Basically, there are two general rules that are followed:
1. If a Logger is initialized in some submodule (e.g. foo.bar) and you don't specify the level, it will look to the parent (foo) to see if it has a declared level. If it doesn't find one there, it will look higher and higher until it reaches root, which will either be declared, or set to WARNING by default.
2. When calling logger.log() with any arbitrary level, if the call isn't at least as severe as the logger's level, it won't get passed down to any of its Handler objects, even if those handlers accept that severity.
To illustrate that second point, we'll make a new Logger that only accepts CRITICAL, and give it a StreamHandler that will accept DEBUG, and therefore everything.
firewall = logging.getLogger('firewall')
firewall.setLevel(logging.CRITICAL)
easygoing_stream_handler = logging.StreamHandler(sys.stdout)
easygoing_stream_handler.setLevel(logging.DEBUG)
firewall.addHandler(easygoing_stream_handler)
Finally, we'll make a simple .info() call that doesn't reach the StreamHandler at all.
firewall.info('Alas, this was blocked.')
# cleaning up our environment for the next section
del (FMTR, logger, easygoing_stream_handler,
stream_info_handler, firewall, new_stream_handler)
Example Project
To tie these concepts together: I liked how this repo structured their logging, so I structured a dummy project that highlights the most essential aspects of their setup.
Say we've got some module, app, that we want to run via main.py to generate some logs into the logs directory.
simple_logging
| main.py
|
+---app
| | app_code.py
| | __init__.py
| |
| \---config
| logging_config.py
|
\---logs
Files
main.py is super straightforward.
from app.app_code import run_app
run_app()
app/app_code.py contains our app. The goal here is to log each "App ran for X" message to a log file, but print the i=6 case to the console.
import logging

logger = logging.getLogger(__name__)

print(__name__, 'printed from', __file__)


def run_app():
    for i in range(10):
        logger.info(f"App ran for {i}")
        if i == 6:
            logger.warning("We saw the number 6!")
The fact that we want to have two different log outputs should be a dead giveaway that we want to leverage both a StreamHandler and a FileHandler. Which is precisely what we specify in app/config/logging_config.py
import logging
import sys

FORMATTER = logging.Formatter(
    "%(asctime)s — %(name)s — %(levelname)s —%(funcName)s:%(lineno)d — %(message)s"
)


def get_console_handler():
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setFormatter(FORMATTER)
    console_handler.setLevel(logging.WARNING)
    return console_handler


def get_file_handler():
    file_handler = logging.FileHandler("logs/app.log")
    file_handler.setFormatter(FORMATTER)
    file_handler.setLevel(logging.INFO)
    return file_handler
I'm not sure if the get___() functions above are convention or not, but I really liked how clean it looked in the implementation, so I'm copying it here.
These functions are imported into app/__init__.py, where we define the app-level Logger object.
import logging
from app.config import logging_config
print(__name__, 'printed from', __file__)
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging_config.get_console_handler())
logger.addHandler(logging_config.get_file_handler())
In Action
So when you consider the call stack of running main.py, the first thing to execute is from app.app_code import run_app. Recall from our notebook on Packages that the __init__.py in the root of the module is the first thing that gets executed. Here, that's app/__init__.py.
In that file, we import logging. Easy enough. Then we execute from app.config import logging_config, which means we jump into app/config/logging_config.py. This file executes top to bottom, defining our logging.Formatter instance, FORMATTER, and the two get____() functions that reference it.
Once that's done running, control returns to app/__init__.py, where both functions are called. This creates one instance of each Handler subclass, both of which are added to the Logger.
So let me stop with this wall of text and ask: what gets passed back after app/__init__.py runs?
Nothing… Sort of.
Before we run the whole thing, let’s just see what happens when we import simple_logging.app
import os
try:
os.remove('simple_logging/logs/app.log')
except FileNotFoundError:
pass
# change into the same directory as `main.py` for
# same behavior
os.chdir('simple_logging')
As you can see, the Logger instantiated in app/__init__.py comes before the one in app/app_code.py
from app.app_code import run_app
[<StreamHandler stderr (NOTSET)>]
__name__ app.app_code printed from app/app_code.py
__name__ app printed from app/__init__.py
But there don’t appear to be any artifacts created from their run.
whos
Variable Type Data/Info
--------------------------------
logging module <module 'logging' from 'C<...>b\\logging\\__init__.py'>
os module <module 'os' from 'C:\\Us<...>\\anaconda3\\lib\\os.py'>
run_app function <function run_app at 0x0000023BB8EDA8B8>
sys module <module 'sys' (built-in)>
But if you know where to look, you can find them. For this, we'll dig into the logging namespace and find the dictionary of all active loggers.
There are a lot of them, many of which are a product of writing this post in a Jupyter Notebook.
But you'll also notice that the Logger objects that we blew away up above are still here (new_logger, firewall, etc).
print(logging.Logger.manager.loggerDict.keys())
dict_keys(['concurrent.futures', 'concurrent', 'asyncio', 'prompt_toolkit', 'parso.python.diff', 'parso.python', 'parso', 'parso.cache', 'tornado.access', 'tornado', 'tornado.application', 'tornado.general', 'IPKernelApp', '__main__', 'new_logger', 'even_newer_logger', 'firewall', 'app.app_code', 'app'])
This is because, critically, the logging module hangs onto every object created (unless forced otherwise) for the duration of the program being run. So nothing was returned when we walked through all of the __init__ code, but we added our user-defined Logger instances to the logging module itself.
Let's go ahead and clean these up…

hidden_loggers = list(logging.Logger.manager.loggerDict.keys())

for logger_key in hidden_loggers:
    if logger_key not in ['app', 'app.app_code']:
        del logging.Logger.manager.loggerDict[logger_key]
logging.Logger.manager.loggerDict
This allows us to cherry-pick the objects that got created from the import statement
app_logger = logging.Logger.manager.loggerDict['app']
app__app_code_logger = logging.Logger.manager.loggerDict['app.app_code']
And as you can see, because we created the app Logger first, and because app.app_code inherits from its parent Logger (unless specified otherwise), both Logger objects are at level='DEBUG'.
app_logger
<Logger app (DEBUG)>
app__app_code_logger
<Logger app.app_code (DEBUG)>
Actually Running the Code
However, if you look at our app/config/logging_config.py, you'll notice that the get_console_handler() and get_file_handler() functions both set stricter levels than the permissive DEBUG of the Logger they're attached to.
- The StreamHandler is set to WARNING
- The FileHandler is set to INFO
In practice, this means that when we execute run_app(), we expect to see 10 INFO logs written to our logfile location, and 1 WARNING message printed to the terminal. (The WARNING also clears the FileHandler's INFO bar, so it lands in the logfile as well.)
# clearing the implicit handler on the root logger.
# the un-instantiated stuff we did up top gave root a
# tricky, hidden handler that echoed every record to the
# console, in addition to the handlers we actually want. Made
# for gross notebook output. Signed, I've been debugging this for 3 hrs.
logging.root.handlers = []
Here’s our one WARNING
%run main.py
2020-10-28 18:02:33,117 — app.app_code — WARNING —run_app:13 — We saw the number 6!
and here are the logged records
with open('logs/app.log', 'r') as f:
print(f.read())
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 0
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 1
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 2
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 3
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 4
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 5
2020-10-28 18:02:33,116 — app.app_code — INFO —run_app:10 — App ran for 6
2020-10-28 18:02:33,117 — app.app_code — WARNING —run_app:13 — We saw the number 6!
2020-10-28 18:02:33,117 — app.app_code — INFO —run_app:10 — App ran for 7
2020-10-28 18:02:33,117 — app.app_code — INFO —run_app:10 — App ran for 8
2020-10-28 18:02:33,117 — app.app_code — INFO —run_app:10 — App ran for 9