Exception leaks in Python 2 and 3

Recently I decided to port a little package that I had to Python 3, and ran into the traceback reference cycle problem.  This blog is the result of the detective work I had to do, both to re-familiarize myself with this issue (I haven’t been doing this sort of stuff for a few years) and to uncover the exact behaviour in Python 3.

Background

In Python 2, exceptions are stored internally as three separate objects: The type, the value and the traceback objects. The value is normally an instance of the type by the time your python code runs, so mostly we are dealing with value and traceback only. There are two pitfalls one should be aware of when writing exception handling code.

The traceback cycle problem

Normally, you don’t worry about the traceback object. You write code like:

def foo():
    try:
        return bar()
    except Exception as e:
        print "got this here", e

The trouble starts when you want to do something with the traceback. This could be to log it, or maybe translate the exception to something else:

def foo():
    try:
        return bar()
    except Exception as e:
        type, val, tb = sys.exc_info()
        print "got this here", e, repr(tb)

The problem is that the traceback, stored in tb, holds a reference to the execution frame of foo, which again holds the definition of tb. This is a cyclic reference and it means that both the traceback, and all the frames it contains, won’t disappear immediately.

This is the “traceback reference cycle problem” and it should be familiar to serious Python developers. It is a problem because a traceback contains a link to all the frames from the point of catching the exception to where it occurred, along with all temporary variables. The cyclic garbage collector will eventually reclaim it (if enabled) but that occurs unpredictably, and at some later point. The ensuing memory wastage may be problematic, the latency involved when gc finally runs, or it may cause problems with unittests that rely on reference counts to detect when objects die. Ideally, things should just go away when not needed anymore.

The same problem occurs whenever the traceback is present in a frame where an exception is raised, or caught. For example, this pattern here will also cause the problem in the called function translate() because tb is present in the frame where it is raised.

def translate(tp, val, tb):
    # translate this into a different exception and re-raise
    raise MyException(str(val)), None, tb

In python 2, the standard solution is to either avoid retrieving the traceback object if possible, e.g. by using

tp, val = sys.exc_info()[:2]

or by explicitly clearing it yourself and thus removing the cycle:

def translate(tp, val, tb):
    # translate this into a different exception and re-raise
    try:
        raise MyException(str(val)), None, tb
    finally:
        del tb

By vigorous use of try-finallythe prudent programmer avoids leaving references to traceback objects on the stack.

The lingering exception problem

A related problem is the lingering exception problem. It occurs when exceptions are caught and handled in a function that then does not exit, for example a driving loop:

def mainloop():
    while True:
        try:
            do_work()
        except Exception as e:
            report_error(e)

As innocent as this code may look, it suffers from a problem: The most recently caught exception stays alive in the system. This includes its traceback, even though it is no longer used in the code. Even clearing the variable won’t help:

report_error(e)
e = None

This is because of the following clause from the Python documentation:

If no expressions are present, raise re-raises the last exception that was active in the current scope.

In Python 2, the exception is kept alive internally, even after the try-except construct has been exited, as long as you don’t return from the function.

The standard solution to this, in Python 2, is to use the exc_clear() function from the sys module:

def mainloop():
    while True:
        try:
            do_work()
        except Exception as e:
            report_error(e)
        sys.exc_clear() # clear the internal traceback

The prudent programmer liberally sprinkles sys.exc_clear() into his mainloop.

Python 3

In Python 3, two things have happened that change things a bit.

  1. The traceback has been rolled into the exception object
  2. sys.exc_clear() has been removed.

Let’s look at the implications in turn.

Exception.__traceback__

While it unquestionably makes sense to bundle the traceback with the exception instance as an attribute, it means that traceback reference cycles can become much more common. No longer is it sufficient to refrain from examining sys.exc_info(). Whenever you store an exception object in a variable local to a frame that is part of its traceback, you get a cycle. This includes both the function where the exception is raised, and where it is caught.

Code like this is suspect:

def catch():    
    try:
        result = bar()
    except Exception as e:
        result = e
    return result

The variable result is part of the frame that is referenced result.__traceback__ and a cycle has been created.

(Note that the variable e is not problematic. In Python 3, this variable is automatically cleared when the except clause is exited.)

similarly:

def reraise(tp, value, tb=None):
    if value is None:
        value = tp()
    if value.__traceback__ is not tb:
        raise value.with_traceback(tb)
    raise value

(The above code is taken from the six module)

Both of these cases can be handled with a well placed try-finally to clear the variables result, value and tb respectively:

def catch():    
    try:
        result = bar()
    except Exception as e:
        result = e
    try:
        return result
    finally:
        del result
def reraise(tp, value, tb=None):
    if value is None:
        value = tp()
    try:
        if value.__traceback__ is not tb:
            raise value.with_traceback(tb)
        raise value
    finally:
        del value, tb
Note that the caller of reraise() also has to clear his locals that he used as an argument, because the same exception is being re-raised and the caller's frame will get added to the exception:
try:
    reraise(*exctuple):
finally:
    del exctuple

The lesson learned from this is the following:

Don’t store exceptions in local objects for longer than necessary. Always clear such variables when leaving the function using try-finally.

sys.exc_clear()

This method has been removed in Python 3. This is because it is no longer needed:

def mainloop():
    while True:
        try:
            do_work()
        except Exception as e:
            report_error(e)
        assert sys.exc_info() == (None, None, None)

As long as sys.exc_info() was empty when the function was called, i.e. it was not called as part of exception handling, then the inner exception state is clear outside the Except clause.

However, if you want to hang on to an exception for some time, and are worried about reference cycles or memory usage, you have two options:

  1. clear the exception’s __traceback__ attribute:
    e.__traceback__ = None
  2. use the new traceback.clear_frames() function
    traceback.clear_frames(e.__traceback__)

clear_frames() was added to remove local variables from tracebacks in order to reduce their memory footprint. As a side effect, it will clear the reference cycles.

Conclusion

Exception reference cycles are a nuisance when developing robust Python applications. Python 3 has added some extra pitfalls. Even though the local variable of the except clause is automatically cleared, the user must himself clear any other variables that might contain exception objects.

 

 

Advertisements

Idioms for proxy function interfaces

At PyCon 2013 I saw a presentation, with a common function signature:

def call_later(when, function, *args):
    ...

This got me thinking about some guidelines I wrote recently on our internal tech blog about how to write such proxy functions. The current recommendation I have is for a different signature, for the reason I shall now explain:

Let’s say that you have a function that calls another function for some reason. You start with something like this:

def mywrapper(func, *args, **kwargs):
   do_something()
   return func(*args, **args)

At some point though, you add another higher level wrapper:

def mybigwrapper(func, *args, **kwargs):
   do_something()
   return mywraper(func, *args, **args)

This is ok, until someone notices that this is rather slow. The reason is, that arguments are constantly being packed and unpacked. Unnecessarily so, because no one is really looking at them. So a clever software engineer comes up with a solution:

def mywrapper(func, *args, **kwargs):
   return mywrapper_without_the_stars(args, args)

def mywrapper_without_the_stars(func, args, kwargs):
   do_something()
   return func(*args, **args)

def mybigwrapper(func, *args, **kwargs):
   do_something()
   return mywraper_without_the_stars(func, args, args)

What has happened? Yes, we have created a set of functions that do not take variable arguments, but rather just take the argument tuple and keyword dict. When you nest a number of those, there is no argument packing and unpacking going on and they are all passed through verbatim. We then have a thin layer outside that does the argument packing, for api backwards compatibility.

But there is a lesson here: Perhaps it is not such a good idea to do this style of interface in the first place. Why didn’t we just write:

def mywrapper(func, args=(), kwargs={}):
   do_something()
   return func(*args, **args)

to begin with? In my opinion, this is actually a much better interface. To illustrate, lets say that we want to wrap a call to myfunc(1,2,3). Compare these two styles:

return mywrapper(myfunc, 1, 2, 3)

 

return mywrapper(myfunc, (1, 2, 3))

In the former case, we are mixing the callable (myfunc) and its arguments (1, 2, 3) into one big list. This doesn’t really make the distinction that “myfunc” is the callable and “1” is its first argument, but rather they look semantically to be equivalent, as if they were all just a chunk of arguments. In my opinion it is much clearer, when using this sort of proxy functions, to make a distinction between the callable and its arguments.
Therefore, this is currently the recommended way within CCP to write such wrappers. They take the argument tuple (and keyword dict) as a non-variable argument to the function.  Variable argument lists are only used in two cases:

  1. When writing a function where that is appropriate, such as logging functions
  2. When writing wrapper functions that emulat other function’s signature.

But recently, I have been thinking even more about this because passing around “args” and “kwargs” everywhere seems unnecessarily clunky. And we arrive at the thesis of this blog post:

Wrapper functions should be written and used like this:

# wrapper takes an argument-less callable
def mywrapper(func):
   do_something()
   return func()

# call myfunc with default args
a = mywrapper(myfunc)

# call myfunc with some arguments
a = mywrapper(lambda : myfunc(1, 2, 3))

# call myfunc with something from this context
def call():
    return myfunc(foo, bar)
a = mywrapper(call)

In other words: How about using Python’s powerful lambda and closure semantics to add those arguments if and when they are needed, rather than to write layer upon layer of functions that manually carry around argument tuples and keyword dicts?

Blog moved

So! My previous blog (at blogs.ccpgames.com) disappeared.  It has taken this long for me to get things back up and running.  The previous blog was on a private server run by an external party and it was compromised by one of those sneaky internet maladies that infect sites like these.

I’m in the process of salvaging all the posts from there and get them running here.  This is a somewhat painstaking process.  I was provided with some files in a folder and I have had to re-learn unix (Its been 10 years since I used that regularly), learn about Apache, WordPress, mysql, multi-site installs and all kinds of things.  10 years ago, they didn’t have sudo.  I’m not sure that I like it.

Anyway, I hope to finish this soon and start blogging again about my adventures in Python. PyCon 2013 is coming up and I have, as always, some ideas to put forward and Swiss army knives to grind.

Float object reuse

I thought I’d mention a cool little patch we did to Python some years back.

We work with database tables a lot.  Game configuration data is essentially rows in a vast database.  And those rows contain a lot of floats.  At some point I recognized that common float values were not being reused.  In particular, id(0.0) != id(0.0).  I was a bit surprized by this, since I figured, some floats must be more common than others.  Certainly, 0.0 is a bit special.

I mentioned this on python-dev some years back but with somewhat underwhelming results.  A summary of the discussion can be found here.

Anyway, I thought I’d mention this to people doing a lot of floating point.  We saved a huge amount of memory on our servers just caching integral floating point values between -10 and +10, including both the negative and positive 0.0.  These values are very frequent, for example as multipliers in tables, and so on.

Here’s some of the code:

[C]

PyObject *
PyFloat_FromDouble(double fval)
{
    register PyFloatObject *op;
    int ival;
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL)
            return NULL;
        /* CCP addition, cache common values */
        if (!f_reuse[0]) {
            int i;
            for(i = 0; i<21; i++)
                f_reuse[i] = PyFloat_FromDouble((double)(i-10));
        }
    }
    /* CCP addition, check for recycling */
    ival = (int)fval;
    if ((double)ival == fval && ival>=-10 && ival <= 10) {
#ifdef MS_WINDOWS
        /* ignore the negative zero */
        if (ival || _fpclass(fval) != _FPCLASS_NZ) {
#else
        /* can't differentiate between positive and negative zeroes, ignore both */
        if (ival) {
#endif
            ival+=10;
            if (f_reuse[ival]) {
                Py_INCREF(f_reuse[ival]);
                return f_reuse[ival];
            }
        }
    }

    /* Inline PyObject_New */
    op = free_list;
    free_list = (PyFloatObject *)Py_TYPE(op);
    PyObject_INIT(op, &PyFloat_Type);
    op->ob_fval = fval;
    return (PyObject *) op;
}

[/C]

(Please excuse the lame syntax highlighter with its &amp; and &lt; thingies 🙂

Temporary thread state overhead

When doing IO, it is sometimes useful for a worker thread to notify Python that something has happened. Previously we have just had the Python main thread “Poll” some external variable for that, but recently we have been experimenting with having the main thread just grab the GIL and perform python work itself.

This should be straightforward. Python has an api called PyGILState_Ensure() that can be called on any thread. If that thread doesn’t already have a Python thread state, it will create a temporary one. Such a thread is sometimes called an external thread.

On a server loaded to some 40% with IO, this is what happened when I turned on this feature:

process cpu

The dark gray area is main thread CPU, (initially at around 40%) and the rest is other threads.  Turning on the “ThreadWakeup” feature adds some 20% extra cpu work to the process.

When the main thread is not working, it is idle doing a MsgWaitForMultipleObjects() Windows system call (with the GIL unclaimed).  So the worker thread should have no problem acquiring the GIL.  Further, there is only ever one woker thread doing a PyGILState_Ensure()/PyGILState_Release() at the same time, and this is ensured using locking on the worker thread side.

Further tests seem to confirm that if the worker thread already owns a Python thread state, and uses that to aquire the GIL (using a PyEval_RestoreThread() call) this overhead goes away.

This was surprising to me, but it seems to indicate that it is very expensive to “acquire a thread state on demand” to claim the GIL.  This is very unfortunate, because it means that one cannot easily use arbitrary system threads to call into Python without significant overhead.  These might be threads from the Windows thread pool for example, threads that we have no control over and therefore cannot assign thread state to.

I will try to investigate this furter, to see where the overhead is coming from.  It could be the extra TLS calls made, or simply the cost of malloc()/free() involved.  Depending on the results, there are a few options:

  1. Keep a single thread state on the side for (the single) external thread that can claim the GIL at a time, ready and initialized.
  2. Allow an external thread to ‘borrow’ another thread state and not use its own.
  3. Streamline the stuff already present.

Update, oct. 6th 2011:
Enabling dynamic GIL with tread state caching did notthing to solve this issue.
I think the problem is likely to be that spin locking is in effect for the GIL. I’ll see what happens if I explicitly define the GIL to not use spin locking.