Optimizing the dict

This is another of those memory conservation stories on the PS3.

Our engineers were worried about how much memory was being spent/wasted in dictionaries. Python dicts are these sparse datastructures, optimized for performance and trading off memory usage to achieve speed.

This code shows you the memory used by dicts of various sizes:

for i in range(20): print i, sys.getsizeof(dict((j,j) for j in range(i)))
...
0 148
1 148
2 148
3 148
4 148
5 148
6 532
7 532
8 532
9 532
10 532
11 532
12 532
13 532
14 532
15 532
16 532
17 532
18 532
19 532

It’s rather striking that a 10 element dict on a 32 bit is consuming more than 1/2k of memory. I’m pretty sure BBC Basic can’t have used dicts.

Now, I was interested in tuning the dict implementation for the PS3, sacrificing performance for memory. Looking at the code led me to a file called dictnotes.txt explaining much about dicts. The section on tuning only considers performance. Too sparse a dict, you see, looses performance because of memory cache effects. Otherwise, I’m sure, we would want dicts infinitely sparse.

It turns out that only one parameter is easily tunable, PyDict_MINSIZE. Python 2.7 sets this to 8 for reasons of cache line size, although I find that an odd generalization across a huge number of platforms. in dictobject.h, I came across this comment:

* PyDict_MINSIZE is the minimum size of a dictionary. This many slots are
* allocated directly in the dict object (in the ma_smalltable member).
* It must be a power of 2, and at least 4.

It turns out this is wrong. Python will happily run with it set to 1.

As for the other tunable parameters, I ended up macrofying things that were hard-coded in various places in the code:

/* CCP change: Tunable parameters for dict growth */
#if _PyCCP_TIGHT_DICT
/* Save memory for dust */
#define _PyDICT_MAX_LOAD_NUM 4 /* grow at 80% load */
#define _PyDICT_MAX_LOAD_DENOM 5

#define _PyDICT_GROWTHRATE_NUM 3 /* scale by 1.5 */
#define _PyDICT_GROWTHRATE_DENOM 2
#else
/* max load 2/3, default python: */
#define _PyDICT_MAX_LOAD_NUM 2
#define _PyDICT_MAX_LOAD_DENOM 3

#define _PyDICT_GROWTHRATE_NUM (mp->ma_used > 50000 ? 2 : 4)
#define _PyDICT_GROWTHRATE_DENOM 1
#endif

And then later:

#if 0
    if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
        return 0;
    return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
#else
    /* CCP change, tunable growth parameters */
    if (!(mp->ma_used > n_used && mp->ma_fill*_PyDICT_MAX_LOAD_DENOM >= (mp->ma_mask+1)*_PyDICT_MAX_LOAD_NUM))
        return 0;
    return dictresize(mp, _PyDICT_GROWTHRATE_NUM * mp->ma_used / _PyDICT_GROWTHRATE_DENOM);
#endif

By using a smalltable of size 1, setting the max fill rate to 4/5 and growth rate to 1.5, we get this:
[python]
for i in range(20): print i, sys.getsizeof(dict((j,j) for j in range(i)))

0 64
1 88
2 112
3 160
4 160
5 160
6 256
7 256
8 256
9 256
10 256
11 256
12 448
13 448
14 448
15 448
16 448
17 448
18 448
19 448
[/python]

The size of the dicts is still governed by the fact that the tables must have sizes that are powers of two, so we can only delay the onset of growth. But at least we get rid of the super optimistic quadruple growth factor and the large small table.

Perhaps a different, again less optimal, version of the dict wouldn’t have that power of two requirement. There is no inherent need for that when hashing except that it makes for nice bitwise arithmetic.

Update:

The effect these changes had:

I just do a quick test. It saved 2MB in login screen. That’s awesome.

Thanks,

Kevin Zhang

Advertisement

Reference cycles with closures

Polishing our forthcoming console game, our team in Shanghai are relentlessly trying to minimize python memory use.
Today, an engineer complained to me that “cell” objects were being leaked(*).

This rang a bell with me. In 2009, I had posted about this to python-dev.
The response at the time wasn’t very sympathetic. I should be doing stuff differently or simply rely on the cyclic garbage collector and not try to be clever. Yet, as I pointed out, parts of the library are aware of the problem and do help you with these things, such as the xml.dom.minidom.unlink() method.

The data being leaked now appeared to pertain to the json module:

[2861.88] Python: 0: <bound method JSONEncoder.default of <json.encoder.JSONEncoder object at 0x12e14010>>
[2861.88] Python: 1: <bound method JSONEncoder.default of <json.encoder.JSONEncoder object at 0x12e14010>>
[2861.88] Python: 2: <bound method JSONEncoder.default of <json.encoder.JSONEncoder object at 0x12e14010>>

This prompted me to have a look in the json module, and behold, json.encoder contains this pattern:
[python]
def _make_iterencode(…)

def _iterencode(o, _current_indent_level):
if isinstance(o, basestring):
yield _encoder(o)
elif o is None:
yield ‘null’
elif o is True:
yield ‘true’
elif o is False:
yield ‘false’
elif isinstance(o, (int, long)):
yield str(o)
elif isinstance(o, float):
yield _floatstr(o)
elif isinstance(o, (list, tuple)):
for chunk in _iterencode_list(o, _current_indent_level):
yield chunk
elif isinstance(o, dict):
for chunk in _iterencode_dict(o, _current_indent_level):
yield chunk
else:
if markers is not None:
markerid = id(o)
if markerid in markers:
raise ValueError(“Circular reference detected”)
markers[markerid] = o
o = _default(o)
for chunk in _iterencode(o, _current_indent_level):
yield chunk
if markers is not None:
del markers[markerid]

return _iterencode
[/python]

The problem is this: The returned closure has a func_closure() member containing the “cell” objects, one of which points to this function. There is no way to clear the func_closure method after use. And so, iterencoding stuff using the json module causes reference cycles that persist until the next collection, possibly causing python to hang on to all the data that was supposed to be encoded and then thrown away.

Looking for a workaround, I wrote this code, emulating part of what is going on:
[python]
def itertest(o):
def listiter(l):
for i in l:
if isinstance(i, list):
chunks = listiter(i)
for i in chunks:
yield i
else:
yield i
return listiter(o)
[/python]

Testing it, confirmed the problem:

>>> import celltest
>>> l = [1, [2, 3]]
>>> import gc, celltest
>>> gc.collect()
>>> gc.set_debug(gc.DEBUG_LEAK)
>>> l = [1, [2, 3]]
>>> i = celltest.itertest(l)
>>> list(i)
[1, 2, 3]
>>> gc.collect()
gc: collectable <cell 01E96B50>
gc: collectable <function 01E97330>
gc: collectable <tuple 01E96910>
gc: collectable <cell 01E96B30>
gc: collectable <tuple 01E96950>
gc: collectable <function 01E973F0>
3

To fix this, it is necessary to clear the “cell” objects once there is no more need for them. It is not possible to do this from the outside, so how about from the inside? Changing the code to:
[python]
def itertest2(o):
def listiter(l):
for i in l:
if isinstance(i, list):
chunks = listiter(i)
for i in chunks:
yield i
else:
yield i

chunks = listiter(o)
for i in chunks:
yield i
chunks = listiter = None
[/python]
Does the trick. the function becomes a generator, yields the stuff, then cleans up:

>>> o = celltest.itertest2(l)
>>> list(o)
[1, 2, 3]
>>> gc.collect()
0

It is an unfortunate situation. The workaround requires work to be done inside the function. It would be cool if it were possible to clear the function’s closure by calling, e.g. func.close(). As it is, people have to be aware of these hidden cycles and code carfully around them.

(*) Leaking in this case means not being released immediately by reference counting but lingering. We don’t want to rely on the gc module’s quirkiness in a video game.

Update:

In my toy code, I got the semantics slightly wrong.  Actually, it is more like this:
[python]
def make_iter():
def listiter(l):
for i in l:
if isinstance(i, list):
chunks = listiter(i)
for i in chunks:
yield i
else:
yield i
return listiter

def get_iterator(data):
it = make_iter()
return it(data)
[/python]

This complicates things. Nowhere is, during iteration, any code running in the scope of make_iter that we can use to clear those locals after iteration. Everything is running in nested functions and since I am using Python 2.7 (which doesn’t have the “nonlocal” keyword) there seems to be no way to clear the outer locals from the inner functions once iteration is done.

I guess that means that I’ll have to modify this code to use class objects instead.

Also, while on the topic, I think Raymond Hettinger’s class-like objects are subject to this problem if they have any sort of mutual or recursive relationship among their “members”.