A while back I got a defect assigned to me complaining that clicking a link on our server backend web pages caused the server to freeze. The link was on our “python” page, which contains various tools and information on the Python interpreter embedded in EVE. The link itself was, interestingly enough, called “Leaky C++”.
Looking at the source code I saw code similar to this:
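    import gc

    def get_leaking_objects_naive():
        all = gc.get_objects()
        result = []
        # an unreferenced object has exactly one referrer: the list "all" itself
        find = [all]
        for i in range(len(all)):
            # get_referrers() scans every tracked object on each call
            if gc.get_referrers(all[i]) == find:
                result.append(all[i])
        return result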
The idea is to find all objects that do not appear to be referenced by any other object and produce a report on those objects. The problem with this code is, however, that it is O(N^2): gc.get_referrers() has to scan every tracked object to find the referrers of a single one, and we call it once per object. When there are sufficient objects in the system, this code takes forever to run.
I disabled this code and thought nothing more of it. But recently it occurred to me that the idea might not be too bad, and I wondered whether there might be a better way to do this. It turns out there is. Instead of finding the referrers in this manner, we use another gc function, gc.get_referents(), which returns the objects immediately referred to by an object. This is effectively an O(1) operation (its cost depends on the object itself, not on the total number of objects), and by repeatedly using it and eliminating objects that are such targets we can weed out everything that has referrers. We then end up with the list of objects that have no referrers, in one fell O(N) swoop:
    def get_leaking_objects():
        # create a dict of ids to objects
        all = dict((id(i), i) for i in gc.get_objects())
        # find all the objects that aren't referred to by any other object
        ids = set(all.keys())
        for i in all.values():
            ids.difference_update(id(j) for j in gc.get_referents(i))
        # this then is our set of objects without referrers
        return [all[i] for i in ids]
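To see the elimination step in isolation, here is a minimal sketch that runs the same idea on a small, explicit list of objects instead of everything returned by gc.get_objects(). The helper name `unreferenced` and the toy objects are made up for illustration; only gc.get_referents() comes from the standard library.

```python
import gc

def unreferenced(objects):
    """Return the objects in `objects` that no object in the list refers to."""
    by_id = {id(o): o for o in objects}
    candidates = set(by_id)
    for o in objects:
        # every direct referent of `o` has a referrer, so it is not a candidate
        candidates.difference_update(id(r) for r in gc.get_referents(o))
    return [by_id[i] for i in candidates]

# toy object graph: outer -> middle -> inner
inner = [1, 2, 3]
middle = {"inner": inner}
outer = [middle]

roots = unreferenced([inner, middle, outer])
print(roots[0] is outer)
```

Only `outer` survives, because `middle` is a referent of `outer` and `inner` is a referent of `middle`. Each object is visited once, which is where the linear behaviour comes from.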
This turns out to work surprisingly well. Combined with an object hierarchy browser, this allows us to find suspicious objects, identify them and thus home in on the C code that may be causing trouble.
There is a caveat to this, and it is that gc.get_objects() and gc.get_referents() are documented to only return objects that can be part of a reference cycle. So your leaking strings and integers won’t show up using this tool.
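A quick way to check whether a given object is visible to the collector at all is gc.is_tracked(). The following is only a sketch with made-up sample values; the helper name `tracked_report` is hypothetical.

```python
import gc

def tracked_report(samples):
    """Map each sample's type name to whether the collector tracks that object."""
    return {type(obj).__name__: gc.is_tracked(obj) for obj in samples}

report = tracked_report([42, "some text", []])
for type_name, tracked in report.items():
    print(type_name, tracked)
```

Integers and strings cannot hold references to other objects, so the collector does not track them, while a list is tracked. An untracked object that leaks will therefore never show up in the results.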
I just made two improvements to the code.
- It is faster and uses less memory to skip creating a dict out of the objects.
- We must make sure not to leave cyclic references lying about. The list in the “all” variable contains the current function frame, so unless we clear it, the frame and its contents (including “all”) aren’t released immediately.
    def get_leaking_objects2():
        all = gc.get_objects()
        try:
            ids = set(id(i) for i in all)
            for i in all:
                ids.difference_update(id(j) for j in gc.get_referents(i))
            # this then is our set of objects without referrers
            return [i for i in all if id(i) in ids]
        finally:
            all = i = j = None  # clear cyclic references to frame
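The returned list can be long, so a first step when reading it might be to tally the objects by type before looking at individual ones. This is a minimal sketch, not part of the original tool; the helper name `summarize_by_type` and the sample list are made up for illustration.

```python
from collections import Counter

def summarize_by_type(objects):
    """Return (type name, count) pairs, most common type first."""
    counts = Counter(type(o).__name__ for o in objects)
    return counts.most_common()

# stand-in for the list returned by get_leaking_objects2()
sample = [(1, 2), (3,), [], {}]
summary = summarize_by_type(sample)
for type_name, count in summary:
    print(type_name, count)
```

In practice one would pass the result of get_leaking_objects2() instead of `sample` and then inspect the types that look out of place.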
6 thoughts on “Finding C reference leaks using the gc module”
Cool idea. Can I steal this for http://pypi.python.org/pypi/objgraph ?
Sure. Looks like objgraph is something I might find useful too!
Thanks, added to the upcoming 1.7.0dev version.
On my machine get_leaking_objects2() returns thousands of objects — mostly tuples. Do you see that too? How do you filter out interesting ones from all that chaff?
Thanks for this. I had a similar issue a couple of years ago, and wish I had thought of this solution then. It would have saved me a lot of waiting.
[…] This post was mentioned on Twitter by Joseph Curtis, Planet Python. Planet Python said: Kristján Valur: Finding C reference leaks using the gc module http://bit.ly/hv3KcY […]