Finding C reference leaks using the gc module

A while back I got a defect assigned to me complaining that clicking a button on our server backend web pages caused the server to freeze.  The link was on our “python” page, containing various tools and information on the python interpreter embedded in EVE.  The link itself was, interestingly enough, called “Leaky C++”.

Looking at the source code I saw code similar to this:

import gc
def get_leaking_objects_naive():
    all = gc.get_objects()
    result = []
    find = [all]
    for i in xrange(len(all)):
        if gc.get_referrers(all[i]) == find:
            result.append(all[i])
    return result

The idea is to find all objects that do not appear to be referenced by any other object and produce a report on those objects.  The problem with this code is, however, that it is O(N^2).   When there are sufficient objects in the system, this code takes forever to run.

I disabled this code and thought nothing more of it.  But recently, it occurred to me that the idea might not be too bad and whether there might be a better way to do this.  It turns out there is.  Instead of finding the referrers in this manner, we use another gc method, gc.get_referents() that returns objects immediately referred to by an object.  This is an O(1) operation and by repeately using it and eliminating objects that are such targets we can weed out everything that has referrers.  We then  end up with the list of objects that have no referrers, in one fell O(N) swoop:

def get_leaking_objects():
    #create a dict of ids to objects
    all = dict((id(i), i) for i in gc.get_objects())

    #find all the objects that aren't referred to by any other object
    ids = set(all.keys())
    for i in all.values():
        ids.difference_update(id(j) for j in gc.get_referents(i))

    #this then is our set of objects without referrers
    return [all[i] for i in ids]

This turns out to work surprisingly well.   Combined with a object hierarchy browser, this allows us to find suspicious objects, identify them and thus home in on the C code that may be causing trouble.

There is a caveat to this, and it is that gc.get_objects() and gc.get_referents() are documented to only return objects that can be part of a reference cycle.  So your leaking strings and integers won’t show up using this tool.

update:

I just made two improvements to the code.

  1. It is faster and uses less memory to skip creating a dict out of the objects.
  2. We must make sure not to leave cyclic references lying about.  the “all” variable contains the current function frame so unless we clear it, the frame and its contents (including “all”) isn’t released immediately.
def get_leaking_objects2():
    all = gc.get_objects()
    try:
        ids = set(id(i) for i in all)
        for i in all:
            ids.difference_update(id(j) for j in gc.get_referents(i))
        #this then is our set of objects without referrers
        return [i for i in all if id(i) in ids]
    finally:
        all = i = j = None #clear cyclic references to frame
Advertisement