Exception leaks in Python 2 and 3

Recently I decided to port a little package that I had to Python 3, and ran into the traceback reference cycle problem.  This blog is the result of the detective work I had to do, both to re-familiarize myself with this issue (I haven’t been doing this sort of stuff for a few years) and to uncover the exact behaviour in Python 3.

Background

In Python 2, exceptions are stored internally as three separate objects: The type, the value and the traceback objects. The value is normally an instance of the type by the time your python code runs, so mostly we are dealing with value and traceback only. There are two pitfalls one should be aware of when writing exception handling code.

The traceback cycle problem

Normally, you don’t worry about the traceback object. You write code like:

def foo():
    try:
        return bar()
    except Exception as e:
        print "got this here", e

The trouble starts when you want to do something with the traceback. This could be to log it, or maybe translate the exception to something else:

def foo():
    try:
        return bar()
    except Exception as e:
        type, val, tb = sys.exc_info()
        print "got this here", e, repr(tb)

The problem is that the traceback, stored in tb, holds a reference to the execution frame of foo, which again holds the definition of tb. This is a cyclic reference and it means that both the traceback, and all the frames it contains, won’t disappear immediately.

This is the “traceback reference cycle problem” and it should be familiar to serious Python developers. It is a problem because a traceback contains a link to all the frames from the point of catching the exception to where it occurred, along with all temporary variables. The cyclic garbage collector will eventually reclaim it (if enabled) but that occurs unpredictably, and at some later point. The ensuing memory wastage may be problematic, the latency involved when gc finally runs, or it may cause problems with unittests that rely on reference counts to detect when objects die. Ideally, things should just go away when not needed anymore.

The same problem occurs whenever the traceback is present in a frame where an exception is raised, or caught. For example, this pattern here will also cause the problem in the called function translate() because tb is present in the frame where it is raised.

def translate(tp, val, tb):
    # translate this into a different exception and re-raise
    raise MyException(str(val)), None, tb

In python 2, the standard solution is to either avoid retrieving the traceback object if possible, e.g. by using

tp, val = sys.exc_info()[:2]

or by explicitly clearing it yourself and thus removing the cycle:

def translate(tp, val, tb):
    # translate this into a different exception and re-raise
    try:
        raise MyException(str(val)), None, tb
    finally:
        del tb

By vigorous use of try-finallythe prudent programmer avoids leaving references to traceback objects on the stack.

The lingering exception problem

A related problem is the lingering exception problem. It occurs when exceptions are caught and handled in a function that then does not exit, for example a driving loop:

def mainloop():
    while True:
        try:
            do_work()
        except Exception as e:
            report_error(e)

As innocent as this code may look, it suffers from a problem: The most recently caught exception stays alive in the system. This includes its traceback, even though it is no longer used in the code. Even clearing the variable won’t help:

report_error(e)
e = None

This is because of the following clause from the Python documentation:

If no expressions are present, raise re-raises the last exception that was active in the current scope.

In Python 2, the exception is kept alive internally, even after the try-except construct has been exited, as long as you don’t return from the function.

The standard solution to this, in Python 2, is to use the exc_clear() function from the sys module:

def mainloop():
    while True:
        try:
            do_work()
        except Exception as e:
            report_error(e)
        sys.exc_clear() # clear the internal traceback

The prudent programmer liberally sprinkles sys.exc_clear() into his mainloop.

Python 3

In Python 3, two things have happened that change things a bit.

  1. The traceback has been rolled into the exception object
  2. sys.exc_clear() has been removed.

Let’s look at the implications in turn.

Exception.__traceback__

While it unquestionably makes sense to bundle the traceback with the exception instance as an attribute, it means that traceback reference cycles can become much more common. No longer is it sufficient to refrain from examining sys.exc_info(). Whenever you store an exception object in a variable local to a frame that is part of its traceback, you get a cycle. This includes both the function where the exception is raised, and where it is caught.

Code like this is suspect:

def catch():    
    try:
        result = bar()
    except Exception as e:
        result = e
    return result

The variable result is part of the frame that is referenced result.__traceback__ and a cycle has been created.

(Note that the variable e is not problematic. In Python 3, this variable is automatically cleared when the except clause is exited.)

similarly:

def reraise(tp, value, tb=None):
    if value is None:
        value = tp()
    if value.__traceback__ is not tb:
        raise value.with_traceback(tb)
    raise value

(The above code is taken from the six module)

Both of these cases can be handled with a well placed try-finally to clear the variables result, value and tb respectively:

def catch():    
    try:
        result = bar()
    except Exception as e:
        result = e
    try:
        return result
    finally:
        del result
def reraise(tp, value, tb=None):
    if value is None:
        value = tp()
    try:
        if value.__traceback__ is not tb:
            raise value.with_traceback(tb)
        raise value
    finally:
        del value, tb
Note that the caller of reraise() also has to clear his locals that he used as an argument, because the same exception is being re-raised and the caller's frame will get added to the exception:
try:
    reraise(*exctuple):
finally:
    del exctuple

The lesson learned from this is the following:

Don’t store exceptions in local objects for longer than necessary. Always clear such variables when leaving the function using try-finally.

sys.exc_clear()

This method has been removed in Python 3. This is because it is no longer needed:

def mainloop():
    while True:
        try:
            do_work()
        except Exception as e:
            report_error(e)
        assert sys.exc_info() == (None, None, None)

As long as sys.exc_info() was empty when the function was called, i.e. it was not called as part of exception handling, then the inner exception state is clear outside the Except clause.

However, if you want to hang on to an exception for some time, and are worried about reference cycles or memory usage, you have two options:

  1. clear the exception’s __traceback__ attribute:
    e.__traceback__ = None
  2. use the new traceback.clear_frames() function
    traceback.clear_frames(e.__traceback__)

clear_frames() was added to remove local variables from tracebacks in order to reduce their memory footprint. As a side effect, it will clear the reference cycles.

Conclusion

Exception reference cycles are a nuisance when developing robust Python applications. Python 3 has added some extra pitfalls. Even though the local variable of the except clause is automatically cleared, the user must himself clear any other variables that might contain exception objects.

 

 

Reference cycles with closures

Polishing our forthcoming console game, our team in Shanghai are relentlessly trying to minimize python memory use.
Today, an engineer complained to me that “cell” objects were being leaked(*).

This rang a bell with me. In 2009, I had posted about this to python-dev.
The response at the time wasn’t very sympathetic. I should be doing stuff differently or simply rely on the cyclic garbage collector and not try to be clever. Yet, as I pointed out, parts of the library are aware of the problem and do help you with these things, such as the xml.dom.minidom.unlink() method.

The data being leaked now appeared to pertain to the json module:

[2861.88] Python: 0: <bound method JSONEncoder.default of <json.encoder.JSONEncoder object at 0x12e14010>>
[2861.88] Python: 1: <bound method JSONEncoder.default of <json.encoder.JSONEncoder object at 0x12e14010>>
[2861.88] Python: 2: <bound method JSONEncoder.default of <json.encoder.JSONEncoder object at 0x12e14010>>

This prompted me to have a look in the json module, and behold, json.encoder contains this pattern:
[python]
def _make_iterencode(…)

def _iterencode(o, _current_indent_level):
if isinstance(o, basestring):
yield _encoder(o)
elif o is None:
yield ‘null’
elif o is True:
yield ‘true’
elif o is False:
yield ‘false’
elif isinstance(o, (int, long)):
yield str(o)
elif isinstance(o, float):
yield _floatstr(o)
elif isinstance(o, (list, tuple)):
for chunk in _iterencode_list(o, _current_indent_level):
yield chunk
elif isinstance(o, dict):
for chunk in _iterencode_dict(o, _current_indent_level):
yield chunk
else:
if markers is not None:
markerid = id(o)
if markerid in markers:
raise ValueError(“Circular reference detected”)
markers[markerid] = o
o = _default(o)
for chunk in _iterencode(o, _current_indent_level):
yield chunk
if markers is not None:
del markers[markerid]

return _iterencode
[/python]

The problem is this: The returned closure has a func_closure() member containing the “cell” objects, one of which points to this function. There is no way to clear the func_closure method after use. And so, iterencoding stuff using the json module causes reference cycles that persist until the next collection, possibly causing python to hang on to all the data that was supposed to be encoded and then thrown away.

Looking for a workaround, I wrote this code, emulating part of what is going on:
[python]
def itertest(o):
def listiter(l):
for i in l:
if isinstance(i, list):
chunks = listiter(i)
for i in chunks:
yield i
else:
yield i
return listiter(o)
[/python]

Testing it, confirmed the problem:

>>> import celltest
>>> l = [1, [2, 3]]
>>> import gc, celltest
>>> gc.collect()
>>> gc.set_debug(gc.DEBUG_LEAK)
>>> l = [1, [2, 3]]
>>> i = celltest.itertest(l)
>>> list(i)
[1, 2, 3]
>>> gc.collect()
gc: collectable <cell 01E96B50>
gc: collectable <function 01E97330>
gc: collectable <tuple 01E96910>
gc: collectable <cell 01E96B30>
gc: collectable <tuple 01E96950>
gc: collectable <function 01E973F0>
3

To fix this, it is necessary to clear the “cell” objects once there is no more need for them. It is not possible to do this from the outside, so how about from the inside? Changing the code to:
[python]
def itertest2(o):
def listiter(l):
for i in l:
if isinstance(i, list):
chunks = listiter(i)
for i in chunks:
yield i
else:
yield i

chunks = listiter(o)
for i in chunks:
yield i
chunks = listiter = None
[/python]
Does the trick. the function becomes a generator, yields the stuff, then cleans up:

>>> o = celltest.itertest2(l)
>>> list(o)
[1, 2, 3]
>>> gc.collect()
0

It is an unfortunate situation. The workaround requires work to be done inside the function. It would be cool if it were possible to clear the function’s closure by calling, e.g. func.close(). As it is, people have to be aware of these hidden cycles and code carfully around them.

(*) Leaking in this case means not being released immediately by reference counting but lingering. We don’t want to rely on the gc module’s quirkiness in a video game.

Update:

In my toy code, I got the semantics slightly wrong.  Actually, it is more like this:
[python]
def make_iter():
def listiter(l):
for i in l:
if isinstance(i, list):
chunks = listiter(i)
for i in chunks:
yield i
else:
yield i
return listiter

def get_iterator(data):
it = make_iter()
return it(data)
[/python]

This complicates things. Nowhere is, during iteration, any code running in the scope of make_iter that we can use to clear those locals after iteration. Everything is running in nested functions and since I am using Python 2.7 (which doesn’t have the “nonlocal” keyword) there seems to be no way to clear the outer locals from the inner functions once iteration is done.

I guess that means that I’ll have to modify this code to use class objects instead.

Also, while on the topic, I think Raymond Hettinger’s class-like objects are subject to this problem if they have any sort of mutual or recursive relationship among their “members”.