Nothing Special

Friday, 15 November 2019

A Qt GUI for logging

A question that comes up from time to time is how to log to a GUI application. The Qt framework is a popular cross-platform UI framework with Python bindings available via the PySide2 and PyQt5 libraries.


The following example shows how to log to a Qt GUI. It introduces a simple QtHandler class which takes a callable; this should be a slot in the main thread that does the GUI updates. A worker thread is also created, to show how you can log to the GUI both from the UI itself (via a button for manual logging) and from a worker thread doing work in the background (here, just logging messages at random levels with random short delays in between).

The worker thread is implemented using Qt’s QThread class rather than the threading module, as there are circumstances where one has to use QThread, which offers better integration with other Qt components.

The code should work with recent releases of either PySide2 or PyQt5. You should be able to adapt the approach to earlier versions of Qt. Please refer to the comments in the code snippet for more detailed information.

import datetime
import logging
import random
import sys
import time

# Deal with minor differences between PySide2 and PyQt5
try:
    from PySide2 import QtCore, QtGui, QtWidgets
    Signal = QtCore.Signal
    Slot = QtCore.Slot
except ImportError:
    from PyQt5 import QtCore, QtGui, QtWidgets
    Signal = QtCore.pyqtSignal
    Slot = QtCore.pyqtSlot


logger = logging.getLogger(__name__)


#
# Signals need to be contained in a QObject or subclass in order to be correctly
# initialized.
#
class Signaller(QtCore.QObject):
    signal = Signal(str, logging.LogRecord)

#
# Output to a Qt GUI is only supposed to happen on the main thread. So, this
# handler is designed to take a slot function which is set up to run in the main
# thread. In this example, the function takes a string argument which is a
# formatted log message, and the log record which generated it. The formatted
# string is just a convenience - you could format a string for output any way
# you like in the slot function itself.
#
# You specify the slot function to do whatever GUI updates you want. The handler
# doesn't know or care about specific UI elements.
#
class QtHandler(logging.Handler):
    def __init__(self, slotfunc, *args, **kwargs):
        super(QtHandler, self).__init__(*args, **kwargs)
        self.signaller = Signaller()
        self.signaller.signal.connect(slotfunc)

    def emit(self, record):
        s = self.format(record)
        self.signaller.signal.emit(s, record)

#
# This example uses QThreads, which means that the threads at the Python level
# are named something like "Dummy-1". The function below gets the Qt name of the
# current thread.
#
def ctname():
    return QtCore.QThread.currentThread().objectName()


#
# Used to generate random levels for logging.
#
LEVELS = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR,
          logging.CRITICAL)

#
# This worker class represents work that is done in a thread separate to the
# main thread. The way the thread is kicked off to do work is via a button press
# that connects to a slot in the worker.
#
# Because the default threadName value in the LogRecord isn't much use, we add
# a qThreadName which contains the QThread name as computed above, and pass that
# value in an "extra" dictionary which is used to update the LogRecord with the
# QThread name.
#
# This example worker just outputs messages sequentially, interspersed with
# random delays of the order of a few seconds.
#
class Worker(QtCore.QObject):
    @Slot()
    def start(self):
        extra = {'qThreadName': ctname() }
        logger.debug('Started work', extra=extra)
        i = 1
        # Let the thread run until interrupted. This allows reasonably clean
        # thread termination.
        while not QtCore.QThread.currentThread().isInterruptionRequested():
            delay = 0.5 + random.random() * 2
            time.sleep(delay)
            level = random.choice(LEVELS)
            logger.log(level, 'Message after delay of %3.1f: %d', delay, i, extra=extra)
            i += 1

#
# Implement a simple UI for this cookbook example. This contains:
#
# * A read-only text edit window which holds formatted log messages
# * A button to start work and log stuff in a separate thread
# * A button to log something from the main thread
# * A button to clear the log window
#
class Window(QtWidgets.QWidget):

    COLORS = {
        logging.DEBUG: 'black',
        logging.INFO: 'blue',
        logging.WARNING: 'orange',
        logging.ERROR: 'red',
        logging.CRITICAL: 'purple',
    }

    def __init__(self, app):
        super(Window, self).__init__()
        self.app = app
        self.textedit = te = QtWidgets.QPlainTextEdit(self)
        # Set whatever the default monospace font is for the platform
        f = QtGui.QFont('nosuchfont')
        f.setStyleHint(f.Monospace)
        te.setFont(f)
        te.setReadOnly(True)
        PB = QtWidgets.QPushButton
        self.work_button = PB('Start background work', self)
        self.log_button = PB('Log a message at a random level', self)
        self.clear_button = PB('Clear log window', self)
        self.handler = h = QtHandler(self.update_status)
        # Remember to use qThreadName rather than threadName in the format string.
        fs = '%(asctime)s %(qThreadName)-12s %(levelname)-8s %(message)s'
        formatter = logging.Formatter(fs)
        h.setFormatter(formatter)
        logger.addHandler(h)
        # Set up to terminate the QThread when we exit
        app.aboutToQuit.connect(self.force_quit)

        # Lay out all the widgets
        layout = QtWidgets.QVBoxLayout(self)
        layout.addWidget(te)
        layout.addWidget(self.work_button)
        layout.addWidget(self.log_button)
        layout.addWidget(self.clear_button)
        self.setFixedSize(900, 400)

        # Connect the non-worker slots and signals
        self.log_button.clicked.connect(self.manual_update)
        self.clear_button.clicked.connect(self.clear_display)

        # Start a new worker thread and connect the slots for the worker
        self.start_thread()
        self.work_button.clicked.connect(self.worker.start)
        # Once started, the button should be disabled
        self.work_button.clicked.connect(lambda : self.work_button.setEnabled(False))

    def start_thread(self):
        self.worker = Worker()
        self.worker_thread = QtCore.QThread()
        self.worker.setObjectName('Worker')
        self.worker_thread.setObjectName('WorkerThread')  # for qThreadName
        self.worker.moveToThread(self.worker_thread)
        # This will start an event loop in the worker thread
        self.worker_thread.start()

    def kill_thread(self):
        # Just tell the worker to stop, then tell it to quit and wait for that
        # to happen
        self.worker_thread.requestInterruption()
        if self.worker_thread.isRunning():
            self.worker_thread.quit()
            self.worker_thread.wait()
        else:
            print('worker has already exited.')

    def force_quit(self):
        # For use when the window is closed
        if self.worker_thread.isRunning():
            self.kill_thread()

    # The functions below update the UI and run in the main thread because
    # that's where the slots are set up

    @Slot(str, logging.LogRecord)
    def update_status(self, status, record):
        # This slot uses the formatted message passed in, but also uses
        # information from the record to display the message in a color
        # appropriate to its severity (level).
        color = self.COLORS.get(record.levelno, 'black')
        s = '<pre><font color="%s">%s</font></pre>' % (color, status)
        self.textedit.appendHtml(s)

    @Slot()
    def manual_update(self):
        # This function logs a message at a random level, from the main
        # thread, to show that GUI-originated logging works too.
        level = random.choice(LEVELS)
        extra = {'qThreadName': ctname() }
        logger.log(level, 'Manually logged!', extra=extra)

    @Slot()
    def clear_display(self):
        self.textedit.clear()


def main():
    QtCore.QThread.currentThread().setObjectName('MainThread')
    logging.getLogger().setLevel(logging.DEBUG)
    app = QtWidgets.QApplication(sys.argv)
    example = Window(app)
    example.show()
    sys.exit(app.exec_())

if __name__ == '__main__':
    main()

Wednesday, 27 January 2016

Using a context manager for selective logging

There are times when it would be useful to temporarily change the logging configuration and revert it after doing something. For this, a context manager is the most obvious way of saving and restoring the logging context. Here is a simple example of such a context manager, which allows you to optionally change the logging level and add a logging handler, purely within the scope of the context manager:

import logging
import sys

class LoggingContext(object):
    def __init__(self, logger, level=None, handler=None, close=True):
        self.logger = logger
        self.level = level
        self.handler = handler
        self.close = close

    def __enter__(self):
        if self.level is not None:
            self.old_level = self.logger.level
            self.logger.setLevel(self.level)
        if self.handler:
            self.logger.addHandler(self.handler)

    def __exit__(self, et, ev, tb):
        if self.level is not None:
            self.logger.setLevel(self.old_level)
        if self.handler:
            self.logger.removeHandler(self.handler)
        if self.handler and self.close:
            self.handler.close()
        # implicit return of None => don't swallow exceptions

If you specify a level value, the logger's level is set to that value in the scope of the with block covered by the context manager. If you specify a handler, it is added to the logger on entry to the block and removed on exit from the block. You can also ask the manager to close the handler for you on block exit - you could do this if you don't need the handler any more.

To illustrate how it works, we can add the following block of code to the above:

if __name__ == '__main__':
    logger = logging.getLogger('foo')
    logger.addHandler(logging.StreamHandler())
    logger.setLevel(logging.INFO)
    logger.info('1. This should appear just once on stderr.')
    logger.debug('2. This should not appear.')
    with LoggingContext(logger, level=logging.DEBUG):
        logger.debug('3. This should appear once on stderr.')
    logger.debug('4. This should not appear.')
    h = logging.StreamHandler(sys.stdout)
    with LoggingContext(logger, level=logging.DEBUG, handler=h, close=True):
        logger.debug('5. This should appear twice - once on stderr and once on stdout.')
    logger.info('6. This should appear just once on stderr.')
    logger.debug('7. This should not appear.')

We initially set the logger's level to INFO, so message #1 appears and message #2 doesn't. We then change the level to DEBUG temporarily in the following with block, and so message #3 appears. After the block exits, the logger's level is restored to INFO and so message #4 doesn't appear. In the next with block, we set the level to DEBUG again but also add a handler writing to sys.stdout. Thus, message #5 appears twice on the console (once via stderr and once via stdout). After the with statement's completion, the status is as it was before so message #6 appears (like message #1) whereas message #7 doesn't (just like message #2).

If we run the resulting script, the result is as follows:

vinay@ubuntu:~/projects/scratch$ python logctx.py 
1. This should appear just once on stderr.
3. This should appear once on stderr.
5. This should appear twice - once on stderr and once on stdout.
5. This should appear twice - once on stderr and once on stdout.
6. This should appear just once on stderr.

If we run it again, but pipe stderr to /dev/null, we see the following, which is the only message written to stdout:

vinay@ubuntu:~/projects/scratch$ python logctx.py 2>/dev/null
5. This should appear twice - once on stderr and once on stdout.

Once again, but piping stdout to /dev/null, we get:

vinay@ubuntu:~/projects/scratch$ python logctx.py >/dev/null
1. This should appear just once on stderr.
3. This should appear once on stderr.
5. This should appear twice - once on stderr and once on stdout.
6. This should appear just once on stderr.

In this case, the message #5 printed to stdout doesn't appear, as expected.

Of course, the approach described here can be generalised, for example to attach logging filters temporarily. The content of this post will shortly be added to the Logging Cookbook in the Python documentation. Note that the code works in Python 2 as well as Python 3.
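For instance, the same save-and-restore pattern works for temporarily attaching a filter. Here is a minimal sketch (the FilteringContext name is mine, not part of the code above):

class FilteringContext(object):
    """Attach a filter to a logger for the duration of a with block."""
    def __init__(self, logger, filter):
        self.logger = logger
        self.filter = filter

    def __enter__(self):
        self.logger.addFilter(self.filter)

    def __exit__(self, et, ev, tb):
        self.logger.removeFilter(self.filter)
        # implicit return of None => don't swallow exceptions

As with LoggingContext, the filter is removed on exit even if the block raises an exception.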

Thursday, 26 July 2012

A potential security vulnerability in logging.fileConfig()

Although now superseded by the dictConfig() API proposed in PEP 391, the much older fileConfig() API has been widely used to configure logging. It is also possible to do on-the-fly reconfiguration of logging through logging’s listen() API, which listens on a socket for configuration data in either the format understood by fileConfig(), or JSON format for use with dictConfig(). Although listen() only ever binds to localhost and so cannot be reached from a remote machine, it is possible in certain scenarios, if you use the listen() function, that code which you don’t control will be executed in the process which calls listen(). To see how, consider a shared hosting environment where multiple users, who have no connection with each other other than being hosted on the same server, run processes. Ordinarily these processes are isolated from each other, but if one user’s process uses listen(), another user’s process could connect to the listened-on socket and send arbitrary configuration data. Because fileConfig() calls eval() under the hood to convert from text elements in the configuration text to Python objects for use in the configuration operation, potentially arbitrary code could be sent in a logging configuration from a malicious user to a victim, perhaps resulting in undesirable consequences for the victim when that code is run by eval(). (Thanks are due to Graham Dumpleton for reporting this vulnerability.)

Why does fileConfig() use eval(), and how can the vulnerability be tackled? Although recent versions of Python provide the ast module to assist with processing Python source code, this module was not available when fileConfig() was written. The type of configuration data which needs to be evaluated can be more complex than a simple parser can cope with: for example, a configuration may refer to a handler’s class, whether it be a stdlib-included one such as handlers.WatchedFileHandler, or a user-defined or third-party handler such as mailinglogger.MailingLogger. The configuration can also specify the constructor arguments for a handler, which can also be fairly involved to parse in ad hoc fashion. For this reason, eval() was used.
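To make this concrete, here is a representative handler section from a fileConfig() configuration (my example, not from the original post); both the class and args values have to be evaluated in the context of the logging package's namespace:

[handler_hand01]
class=handlers.WatchedFileHandler
level=DEBUG
formatter=form01
args=('/var/log/app.log', 'a')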

Users have already been alerted to the vulnerability via an update to the documentation, but that is perhaps not enough. One way of addressing the vulnerability is to not use eval(), but some other evaluation mechanism which is more sandboxed. One suggestion which has been made is to use ast.literal_eval(), but this only copes with Python literals – it does no name lookup, and so can’t cope with expressions like handlers.WatchedFileHandler.

As an experiment, I put together a simple expression evaluator which uses the ast module to parse input text, and which provides probably the bare minimum evaluation of AST nodes to support common use cases, together with a simple test harness which evaluates user-entered expressions in the context of the logging package.
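By way of illustration, here is a minimal sketch in the same spirit (my reconstruction, not the actual code from the post; it targets Python 3.8+, where literals parse as ast.Constant):

import ast
import logging
import logging.handlers

class LimitedEvaluator(ast.NodeVisitor):
    """Evaluate an expression, allowing only literals, name lookup in a
    supplied namespace, and attribute access; anything else (calls,
    operators, subscripts) is rejected."""
    def __init__(self, namespace):
        self.namespace = namespace

    def evaluate(self, text):
        return self.visit(ast.parse(text, mode='eval').body)

    def visit_Constant(self, node):
        return node.value

    def visit_Name(self, node):
        try:
            return self.namespace[node.id]
        except KeyError:
            raise NameError('name %r is not defined' % node.id)

    def visit_Attribute(self, node):
        return getattr(self.visit(node.value), node.attr)

    def generic_visit(self, node):
        raise ValueError('disallowed construct: %s' % type(node).__name__)

evaluator = LimitedEvaluator(vars(logging))
print(evaluator.evaluate('handlers.WatchedFileHandler'))

An expression like __import__('os').system('rm -rf /') fails at the Call node, because generic_visit rejects any construct not explicitly allowed.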

However, limiting the scope of evaluation is not the complete answer, and perhaps not the correct answer. A malicious user could still send a bogus configuration which, for example, just turns the verbosity of all loggers off, or configures a huge number of bogus loggers. Such a configuration would certainly be allowed even by a limited-evaluation scheme if a user legitimately wanted it; but if a malicious user sends the exact same “legal” configuration, it is still a security exploit, because the consequences may be undesirable to the victim.

But how is the listener to know whether or not the configuration is coming from a legitimate source (a client process controlled by the same user who is running the process which uses listen())  or a malicious user (a client process controlled by some other user)? The simplest answer would appear to be a shared secret: When listen() is called, it is passed a text passphrase, which is also known to legitimate clients. When handling a configuration request via the socket, the configuration is checked to see if it contains the passphrase. If it does, the request is processed; otherwise, it is ignored.

In the fileConfig() input data, the passphrase could be provided via a passphrase=secret entry in the [default] section. In the dictConfig() input data, the passphrase could be provided against the passphrase key in the dict which is passed to dictConfig(). The checking would be done in the request handler code before calling fileConfig() or dictConfig(). If the passphrase argument to the listen() call is None (the default, preserving the current behaviour) no passphrase checking would be done.
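To make the proposal concrete, here is a hedged sketch of the client side: sending a fileConfig()-format configuration containing the proposed passphrase entry to the listener. The passphrase entry is the proposal above, not an existing feature; the length-prefixed socket protocol is the one listen() already uses, and 9030 is logging.config.DEFAULT_LOGGING_CONFIG_PORT.

import socket
import struct

config = '''
[default]
passphrase=secret

[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=simple

[logger_root]
level=INFO
handlers=console

[handler_console]
class=StreamHandler
level=INFO
formatter=simple
args=(sys.stderr,)

[formatter_simple]
format=%(levelname)s:%(name)s:%(message)s
'''

data = config.encode('utf-8')
sock = socket.create_connection(('localhost', 9030))
# listen() expects a 4-byte big-endian length prefix, then the config data
sock.sendall(struct.pack('>L', len(data)) + data)
sock.close()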

As always, comments are welcome. Although this vulnerability is limited to specific scenarios, it would be good to address it as soon as possible. As Python 3.3 has entered beta, and is therefore in feature freeze, it is likely that a fix will go into Python 3.4 – the vulnerability is not currently considered serious enough to warrant fixing in earlier versions.

Sunday, 29 April 2012

Introspecting the logger hierarchy

Brandon Rhodes’ post about his logging_tree project got me thinking about providing this sort of functionality in the logging package. Though introspection of the logger hierarchy has been mentioned a couple of times in newsgroup postings over the years, it has never actually been requested as a fleshed-out feature, and until Brandon’s post, it was below my radar. Now, thanks to him, I have given it some attention.

There are a number of things which need to be considered when providing some way of introspecting the logger hierarchy:

  • In many systems, the logging hierarchy doesn’t change once it’s configured. However, there are also systems where it does – logging provides the facility for on-the-fly reconfiguration without having to restart a Python process. Therefore, a general introspection facility must act as a snapshot – the state of the hierarchy at a particular time when the snapshot was requested.
  • The introspection facility must provide useful information for both built-in and user-defined Logger, Handler, Formatter and Filter instances.
  • The data returned by the introspection facility should be, as far as practicable, immutable. Users should not expect to get direct access to individual logging elements (as there are already specific APIs for doing that).
  • The introspection facility should perhaps eschew any presentation of the hierarchy; this is likely to be very application-dependent, and it is debatable whether it is worth providing a “default” presentation. Example presentations would be a textual printout (such as logging_tree provides) or a depiction of the hierarchy in a browser-based or desktop GUI.

I’ve made an attempt at addressing the above points when implementing my own take on Brandon’s work. Although I’ve no definite plan to include this functionality in Python 3.3, I’m publishing this post in order to get feedback about the approach used, and also whether something like this belongs in the core package.

The basic API is provided by a single function, get_snapshot(), which returns a dictionary. The keys of that dictionary are logger names, with '' being the key for the root logger. The corresponding values are instances of namedtuple subclasses; they are snapshots of the loggers, handlers, formatters and filters.

Here’s how it might work. Running the code snippet

import logging
import textwrap

from logtree import get_snapshot  # the introspection module described in this post

logging.getLogger('logtree.test')
logging.basicConfig(filename='logtree.log')
result = get_snapshot()
tw = textwrap.TextWrapper()
tw.subsequent_indent = ' ' * 4
for name, info in sorted(result.items()):
    print(tw.fill('%r -> %r' % (name, info)))
    print('-'*40)

would display something like

'' -> logging_RootLoggerInfo(class_='logging.RootLogger', level=30,
    propagate=1, children=(('logtree',
    logging_PlaceHolderInfo(class_='logging.PlaceHolder',
    children=(('logtree.test',
    logging_LoggerInfo(class_='logging.Logger', name='logtree.test',
    level=0, handlers=(), disabled=0, propagate=1, filters=(),
    children=())),))),), filters=(),
    handlers=(logging_FileHandlerInfo(class_='logging.FileHandler',
    encoding=None, level=0, delay=None,
    filename='C:\\temp\\logtree.log', mode='a', filters=(),
    formatter=logging_FormatterInfo(class_='logging.Formatter',
    style='%', datefmt=None,
    format='%(levelname)s:%(name)s:%(message)s')),))
----------------------------------------
'logtree' -> logging_PlaceHolderInfo(class_='logging.PlaceHolder',
    children=(('logtree.test',
    logging_LoggerInfo(class_='logging.Logger', name='logtree.test',
    level=0, handlers=(), disabled=0, propagate=1, filters=(),
    children=())),))
----------------------------------------
'logtree.test' -> logging_LoggerInfo(class_='logging.Logger',
    name='logtree.test', level=0, handlers=(), disabled=0,
    propagate=1, filters=(), children=())
----------------------------------------

Of course, this is very simplified formatting, and can be much improved on, but it gives the gist of what’s being returned. The various classes whose names end in Info are the namedtuple subclasses, and the corresponding instances represent particular logging objects. You can see that specific attributes are displayed; these can be customised quite flexibly.

Customising is done via a configuration dictionary – if you don’t supply one, a default is used. The full definition of get_snapshot() is:

def get_snapshot(config=None):
    return Snapper(config).snapshot()

If no configuration dictionary is specified, a default one – available at Snapper.DEFAULT_CONFIG – is used. You can make a copy of that, add or change elements in your copy, and then pass that in to get_snapshot. (You could also subclass the Snapper class if you really need to, but in most cases, you shouldn’t need to.)

The configuration dictionary currently has two keys – attrmap and format – whose values are dictionaries. The configuration is used when converting each logging object to its snapshot representation: each object, and each attribute of that object, is passed recursively through the conversion process. This process works roughly as follows:

  1. A dict instance is converted to another dict instance whose keys and values have passed through the conversion process.
  2. Instances of list and tuple are converted to a tuple instance whose elements are the converted source elements.
  3. Instances of int, float, and str are passed through unchanged. None is passed through unchanged. (If needed, this could also be applied to complex and decimal.Decimal types, but it seems unlikely these would be used in a logging configuration.)
  4. For instances, a qualified class name is computed from instance.__class__.__name__ and instance.__module__ (if present). This is used as the key to the attrmap and format configuration sub-dictionaries. The format dictionary is checked first – if the key is present, the value should be a callable which takes a single argument – the instance – and returns the representation. This is used, for example, to convert internal implementation class instances such as PercentStyle, StrFormatStyle and StringTemplateStyle (in Python 3.2 and later) to the strings '%', '{' and '$'. (The value for config['format']['logging.PercentStyle'] is lambda x: '%'.) If the key is not present in format, then attrmap is checked for the key. If present, the value should be a space-separated list of attribute names to bring into the snapshot; if an attribute name starts with an underscore, you can represent this as e.g. _fmt:format, where the instance attribute in the logging object will be _fmt but the field name in the corresponding namedtuple will be format. (Field names in named tuples can't start with an underscore.) All named tuple classes have the field name class_ included, which holds the class name of the type represented by the named tuple.

This arrangement allows user-defined handler classes, for example, to be easily accommodated by just adding corresponding entries in the configuration dictionary. A part of the default configuration dictionary is shown below:

DEFAULT_CONFIG = {
    'attrmap': {
        'TextIOWrapper': 'name mode encoding', # for Python 3.x
        'file': 'name mode encoding', # for Python 2.x
        'logging.Formatter': 'datefmt _fmt:format _style:style',
        'logging.Logger': 'disabled filters handlers level name '
                          'propagate children',
        'logging.StreamHandler': 'filters formatter level stream',
        'logging.FileHandler': 'filters formatter level '
            'baseFilename:filename mode encoding delay',
        'logging.handlers.SMTPHandler': 'filters formatter level mailhost '
            'mailport fromaddr toaddrs username password subject secure '
            'timeout',
    },
    'format': {
        'logging.PercentStyle': lambda x : '%',
        'logging.StrFormatStyle': lambda x : '{',
        'logging.StringTemplateStyle': lambda x : '$',
    },
}
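For example, here is a hedged sketch of catering for a user-defined handler (MyCustomHandler and its host and port attributes are invented for illustration; Snapper and get_snapshot are from the logtree.py implementation described in this post):

import copy

from logtree import Snapper, get_snapshot

config = copy.deepcopy(Snapper.DEFAULT_CONFIG)
config['attrmap']['myapp.MyCustomHandler'] = ('filters formatter level '
                                              'host port')
result = get_snapshot(config)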

Just as an exercise, I coded up an implementation of a presentation which echoes Brandon’s formatting. You can see these in this gist, which contains two files: logtree.py, which contains the introspection implementation, and lttest.py, a simple test which defines a logging configuration similar to Django’s default configuration and prints out representations using Brandon’s logging_tree and the logtree.py implementation. The differences between the representations are not especially significant, and relate to the representation of individual handlers.


Comments are welcome on the approach used, the desirability (or otherwise) of adding the logtree.py functionality to the stdlib, or any other pertinent point.

Wednesday, 7 March 2012

Structured logging, standardised

Moves are afoot in the Linux world to standardise structured logging – logging in such a way that log event data can be parsed in a standard fashion to extract useful information for later analysis. One such project is Fedora’s lumberjack, which aims to implement the specifications proposed in MITRE’s CEE (Common Event Expression). Another Fedora initiative is embodied in the ELAPI project, which covers a slightly wider problem domain.

The basic idea is to format messages using structured formats such as JSON or XML. These messages can be sent conventionally across existing transport mechanisms such as syslog, and the aim is that in due course, structured messages can be routed to structured message stores (e.g. MongoDB databases) rather than plain text files.

Python logging already supports structured logging in a generic way, with third-party handlers available both for RDBMS and MongoDB. However, a key element of the emerging standards is interoperability, so it may be that some Python application developers will need to support these standards at some point.

This support can be provided using a custom formatter class. For example, let’s look at using the CEE standard, assuming that JSON will be used as the structured format and syslog as the transport. A CEE event must contain, as a minimum, the following fields:

  • p_proc: The name of the process that produced the event.
  • p_sys: The hostname of the system that generated the event.
  • time: The time the event occurred. This timestamp should have microsecond granularity (1E-6 seconds) and include the timezone GMT offset.

The time format needed is not quite the same as the default provided by logging, but it should be easy enough to provide using the datetime class. The complete CEE schema is quite involved and I won’t attempt to detail it here – you can use the links provided above if you need more information.

Here is an example script showing how you can use a formatter to provide CEE-compliant log output:

#!/usr/bin/env python
import datetime
import json
import logging
import os
import socket
import sys

import pytz

class CEEFormatter(logging.Formatter):
    cee_token = ' cee: '

    def __init__(self, fmt=None, datefmt=None, tz=None):
        # We don't use super because logging classes are
        # old-style classes on 2.x
        logging.Formatter.__init__(self, fmt, datefmt)
        self.hostname = socket.gethostname()
        self.procname = os.path.basename(sys.argv[0])
        self.tz = tz

    def format(self, record):
        result = logging.Formatter.format(self, record)
        structured = self.format_structured(record)
        if structured:
            result += '%s%s' % (self.cee_token, json.dumps(structured))
        return result

    def get_event_time(self, record):
        return datetime.datetime.fromtimestamp(record.created,
                                               tz=self.tz).isoformat()

    def format_structured(self, record):
        result = {
            'Event': {
                'p_proc': self.procname,
                'p_sys': self.hostname,
                'time': self.get_event_time(record)
            },
        }
        return result

def main():
    if len(sys.argv) > 1:
        try:
            tz = pytz.timezone(sys.argv[1])
        except pytz.UnknownTimeZoneError:
            print('Unknown timezone %r, using Europe/London' % sys.argv[1])
            tz = pytz.timezone('Europe/London')
    else:
        tz = pytz.timezone('Europe/London')

    formatter = CEEFormatter('%(message)s', tz=tz)
    record = logging.makeLogRecord(dict(msg='Hello, world!'))

    print(formatter.format(record))

if __name__ == '__main__':
    sys.exit(main())

In the above script, the CEEFormatter class represents just one way in which you can implement a formatter to implement CEE-compatible logging. The CEE structured data is appended to the message, which can (for example) be sent via a SysLogHandler to a local or remote host. The cee_token is a value which indicates to a CEE-aware receiver that the message contains structured data. This is given as "cee:" in the CEE online documentation, but I’ve seen uses of "@cee:" as well. Whatever the standard decides upon, you can set this at a per-instance level for testing etc.

The format method of the CEEFormatter class calls the superclass method to format the message, and then calls format_structured with the record to get a dict indicating what the CEE structure should be. If the method returns a non-empty dict, it is formatted into JSON and appended to the message, preceded by the CEE token. In the above example, the absolutely minimal CEE-compatible message is formatted.

The get_event_time method is provided to factor out the event time format, just in case it has minor format variations you want to support – in which case you need only override this method. The example shows use of pytz to set up timezones, but this may not be needed in your case.

In practice, you can use one of the established methods (extra parameter to logging call, LoggerAdapter or Filter) to pass context information in the LogRecord, which can be used by the format_structured method to add additional, context-dependent information to the structured portion of the message.
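For example, here is a hedged sketch building on the script above (the username attribute and the 'user' key are invented for illustration):

class ContextCEEFormatter(CEEFormatter):
    def format_structured(self, record):
        result = CEEFormatter.format_structured(self, record)
        # 'username' is assumed to have been attached to the record, e.g.
        # via logger.info('Hi', extra={'username': 'alice'}), a
        # LoggerAdapter or a Filter
        if hasattr(record, 'username'):
            result['Event']['user'] = record.username
        return result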

If you run the above script (converted into an executable using ln -sT ceefmt.py ceefmt; chmod u+x ceefmt), you might see results similar to the following.

$ ./ceefmt
Hello, world! cee: {"Event": {"time": "2012-03-07T20:21:16.612857+00:00", "p_proc": "ceefmt", "p_sys": "eta-jaunty"}}
$ ./ceefmt Europe/Rome
Hello, world! cee: {"Event": {"time": "2012-03-07T21:21:29.021881+01:00", "p_proc": "ceefmt", "p_sys": "eta-jaunty"}}
$ ./ceefmt America/Chicago
Hello, world! cee: {"Event": {"time": "2012-03-07T14:21:41.745214-06:00", "p_proc": "ceefmt", "p_sys": "eta-jaunty"}}
$ ./ceefmt America/Anchorage
Hello, world! cee: {"Event": {"time": "2012-03-07T11:21:47.686027-09:00", "p_proc": "ceefmt", "p_sys": "eta-jaunty"}}
$ ./ceefmt Pacific/Auckland
Hello, world! cee: {"Event": {"time": "2012-03-08T09:21:57.687873+13:00", "p_proc": "ceefmt", "p_sys": "eta-jaunty"}}

Comments welcome.

Tuesday, 13 December 2011

Improved flexibility for log file rotation

The RotatingFileHandler and TimedRotatingFileHandler classes provide basic log file rotation functionality, but there are times when more functionality is required than is provided in these classes. One requirement is to compress log files when they are rotated. Currently, if you want this kind of functionality, you have to subclass the relevant handler and override the doRollover method. This works, but requires you to copy and paste the entire method, and then adapt it to your needs; clearly, this is not ideal. An example of this can be found in this ActiveState recipe for a TimedCompressedRotatingFileHandler. This uses the .zip format to compress log files.

In order to provide improved flexibility for log file processing (which will generally be compression, but in practice any transformation could be performed), I propose to add two new methods to these handlers (in their common base class, BaseRotatingHandler).

The first method is named rotate. It takes two arguments, source and dest. The source argument is the name of the log file about to be rotated (e.g. test.log), and the dest argument names the target of the rotation (e.g. test.log.1). Under normal circumstances, the file named by dest should not exist, having been explicitly removed by the rollover logic. The default behaviour is what happens currently – a call to os.rename(source, dest).

Since we are potentially transforming log files, we might also want to change their extension, so that they are more usable with e.g. file system viewers like Nautilus or Windows Explorer. To facilitate this, a second method is provided, called filename. This takes a single argument – the filename of a destination file in a rotation (for example, test.log.1)  – and returns a modified version (for example, adding an extension – test.log.1.gz). The default behaviour is to return the name unchanged.

The definition of the new methods is as follows.

    def filename(self, name):
        """
        Modify the filename of a log file when rotating.

        This is provided so that a custom filename can be provided,
        say by adding an extension.

        The default implementation returns the name unchanged.

        :param name: The default name for the destination log file.
        """
        return name

    def rotate(self, source, dest):
        """
        When rotating, rotate the current log in the file named by
        source to the file named by dest.

        On entry, the file named by source should exist and the file
        named by dest should not exist.

        On exit, the file named by source should not exist and the
        file named by dest should exist.

        The default implementation simply renames source to dest.

        :param source: The source filename. This is normally the base
                       filename, e.g. 'test.log'.
        :param dest:   The destination filename. This is normally
                       what the source is rotated to, e.g. 'test.log.1'.
        """
        os.rename(source, dest)

This will allow a custom rotation strategy to be implemented relatively easily: for example, if you want to gzip rotated log files, you can ensure that filename returns test.log.1.gz when passed test.log.1, and that rotate gzips test.log to test.log.1.gz.

The ActiveState example I cited earlier used the .zip format, but what if you want to use .gz, .bz2, .7z or some other compression algorithm? You would need to support all of them in a single handler, or write multiple handler classes for each format.

To avoid the need for users to subclass the rotating handlers, I further propose to allow configurability of the rotate and filename methods by providing two new handler attributes (both, by default, set to None in BaseRotatingHandler).

The first handler attribute is named rotator. You can set it to a callable that takes two arguments, source and dest. The meanings of these arguments are the same as for the rotate method. If a callable is provided in the rotator attribute, it is called by the rotate method to perform the rotation.

The second handler attribute is called namer. You can set this to a callable which takes a single argument – the filename of a destination file in a rotation (for example, test.log.1)  – and modifies it to, for example, add an extension. If a callable is set, then it is called by the filename method to return the new name.

An example of how you can define a namer and rotator is given in the following snippet, which shows zlib-based compression of the log file.

import logging.handlers
import os
import zlib

def namer(name):
    return name + ".gz"

def rotator(source, dest):
    with open(source, "rb") as sf:
        data = sf.read()
        compressed = zlib.compress(data, 9)
        with open(dest, "wb") as df:
            df.write(compressed)
    os.remove(source)

rh = logging.handlers.RotatingFileHandler(...)
rh.rotator = rotator
rh.namer = namer

I’m aware that these are not “true” .gz files, as they are bare compressed data, with no “container” such as you’d find in an actual gzip file. This snippet is just for illustration purposes.
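If you need real .gz files which gunzip can read, a rotator based on the gzip module will do the job (a minimal sketch, usable with the same namer as above):

import gzip
import os
import shutil

def gz_rotator(source, dest):
    # Write a genuine gzip file, container and all, then remove the source
    with open(source, 'rb') as sf, gzip.open(dest, 'wb') as df:
        shutil.copyfileobj(sf, df)
    os.remove(source)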

Note: The namer function is called quite a few times during rollover, so it should be as simple and as fast as possible. It should also return the same output every time for a given input, otherwise the rollover behaviour may not work as expected. If either the namer or rotator function raises an exception, this will be handled in the same way as any other exception during an emit() call, i.e. via the handleError method of the handler.

The internal logic for determining which files to delete (when a backupCount is specified on a rotating file handler) uses, in the case of  TimedRotatingFileHandler, regular expressions for the date/time-based suffix. The regular expressions used have been updated to allow an optional file extension consisting of a period followed by one or more letters and digits.

These changes will soon appear in Python 3.3’s development repository. Your comments are welcome.

Thursday, 9 June 2011

Configurability tweaks for SysLogHandler and Formatter

There have been some recent changes to SysLogHandler and Formatter which will appear in forthcoming Python releases.

  1. SysLogHandler currently appends a NUL byte to messages it sends to syslog daemons. This is because earlier versions of these daemons expected the NUL terminator to be present, even though it’s not part of the relevant specification (RFC 5424). More recent releases of these daemons don’t expect the NUL byte to be there, but strip it off if it’s present. And some even more recent daemons follow the specification more strictly, and pass the NUL byte on as part of the message. In order to facilitate dealing with all the various possible destinations for SysLogHandler messages, a new class-level attribute, append_nul, has been added to SysLogHandler. The default value is True, thus preserving current behaviour; however, you can set it to False for specific handler instances where you don’t want the NUL byte to be appended. This change is expected to appear in Python 3.2.1, which should be released soon.
  2. In the same vein, the date formatting logic used by Formatter has been changed to make it more configurable. Previously, the default ISO 8601 format was hard-coded as in this example: 2010-09-06 22:38:15,292, where the part before the comma is handled by a strftime format string ('%Y-%m-%d %H:%M:%S'), and the part after the comma is a millisecond value. Because strftime does not have a format placeholder for milliseconds, the millisecond value is appended using another format string, '%s,%03d' – both of these format strings were hard-coded into the Formatter.formatTime method. That’s set to change with Python 3.3 – these strings will be defined as class-level attributes which can be overridden at the instance level when desired. The names of the attributes are default_time_format (for the strftime format string) and default_msec_format (for appending the millisecond value). A sketch of both tweaks follows this list.
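Here is a minimal sketch of how both tweaks might be used once available (append_nul from Python 3.2.1, the Formatter attributes from Python 3.3; the attribute values shown are illustrative):

import logging
import logging.handlers

# 1. Suppress the trailing NUL byte, for a daemon which passes it through
sh = logging.handlers.SysLogHandler(address='/dev/log')
sh.append_nul = False

# 2. Override the date format strings on a per-instance basis
f = logging.Formatter('%(asctime)s %(message)s')
f.default_time_format = '%Y-%m-%dT%H:%M:%S'  # 'T' as the date/time separator
f.default_msec_format = '%s.%03d'            # '.' rather than ',' before msecs
sh.setFormatter(f)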