This is another of those memory conservation stories on the PS3.
Our engineers were worried about how much memory was being spent, or wasted, in dictionaries. Python dicts are sparse data structures, optimized for performance, trading off memory usage to achieve speed.
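To see that trade-off concretely, here is a small sketch of my own (not from the original measurements): it compares a dict's container footprint to a dense list holding the same pairs. The exact byte counts vary by Python version and platform, but the dict is consistently the larger container.

```python
import sys

# A dict keeps a sparse hash table with empty slots held in reserve,
# so its container footprint is larger than a dense list of the same
# number of entries.
d = dict((j, j) for j in range(10))
pairs = list(d.items())

print(sys.getsizeof(d))      # sparse hash table
print(sys.getsizeof(pairs))  # dense array of pointers
```

Note that `sys.getsizeof` only counts the container itself, not the keys and values it refers to; that is exactly the part the dict implementation controls.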
This code shows you the memory used by dicts of various sizes:
```
>>> for i in range(20):
...     print i, sys.getsizeof(dict((j,j) for j in range(i)))
...
0 148
1 148
2 148
3 148
4 148
5 148
6 532
7 532
8 532
9 532
10 532
11 532
12 532
13 532
14 532
15 532
16 532
17 532
18 532
19 532
```
It’s rather striking that a 10-element dict on a 32-bit platform consumes more than half a kilobyte of memory. I’m pretty sure BBC Basic can’t have used dicts.
Now, I was interested in tuning the dict implementation for the PS3, sacrificing performance for memory. Looking at the code led me to a file called dictnotes.txt explaining much about dicts. The section on tuning only considers performance: too sparse a dict, you see, loses performance because of memory cache effects. Otherwise, I’m sure, we would want dicts to be infinitely sparse.
It turns out that only one parameter is easily tunable: PyDict_MINSIZE. Python 2.7 sets this to 8 for reasons of cache line size, although I find that an odd generalization across a huge number of platforms. In dictobject.h, I came across this comment:
```c
 * PyDict_MINSIZE is the minimum size of a dictionary.  This many slots are
 * allocated directly in the dict object (in the ma_smalltable member).
 * It must be a power of 2, and at least 4.
```
It turns out this is wrong. Python will happily run with it set to 1.
As for the other tunable parameters, I ended up macrofying things that were hard-coded in various places in the code:
```c
/* CCP change: Tunable parameters for dict growth */
#if _PyCCP_TIGHT_DICT
/* Save memory for dust */
#define _PyDICT_MAX_LOAD_NUM 4       /* grow at 80% load */
#define _PyDICT_MAX_LOAD_DENOM 5

#define _PyDICT_GROWTHRATE_NUM 3     /* scale by 1.5 */
#define _PyDICT_GROWTHRATE_DENOM 2
#else
/* max load 2/3, default python: */
#define _PyDICT_MAX_LOAD_NUM 2
#define _PyDICT_MAX_LOAD_DENOM 3

#define _PyDICT_GROWTHRATE_NUM (mp->ma_used > 50000 ? 2 : 4)
#define _PyDICT_GROWTHRATE_DENOM 1
#endif
```
And then later:
```c
#if 0
    if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
        return 0;
    return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
#else
    /* CCP change, tunable growth parameters */
    if (!(mp->ma_used > n_used &&
          mp->ma_fill*_PyDICT_MAX_LOAD_DENOM >=
          (mp->ma_mask+1)*_PyDICT_MAX_LOAD_NUM))
        return 0;
    return dictresize(mp,
        _PyDICT_GROWTHRATE_NUM * mp->ma_used / _PyDICT_GROWTHRATE_DENOM);
#endif
```
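To get a feel for how these knobs interact, here is a toy Python model of mine, not CPython itself: a table grows once the load factor reaches load_num/load_denom, to the smallest power of two larger than growth_num/growth_denom times the item count. The real code triggers the check inside insertdict and counts dummy slots in ma_fill; this sketch ignores deletions.

```python
def table_sizes(n_items, minsize=8, load_num=2, load_denom=3,
                growth_num=4, growth_denom=1):
    """Toy model: the table sizes a dict passes through while
    n_items keys are inserted one by one.  Parameter names mirror
    the macros above; this is a sketch, not CPython's code."""
    size = minsize
    sizes = [size]
    for used in range(1, n_items + 1):
        # Grow once the load factor reaches load_num/load_denom.
        if used * load_denom >= size * load_num:
            target = used * growth_num // growth_denom
            # dictresize rounds up to a power of two > target.
            newsize = minsize
            while newsize <= target:
                newsize <<= 1
            size = newsize
            sizes.append(size)
    return sizes

# Default CPython 2.x parameters: one early jump straight to 32 slots.
print(table_sizes(20))  # [8, 32]
# Tuned PS3 parameters: a size-1 small table, 4/5 load, 1.5x growth.
print(table_sizes(20, minsize=1, load_num=4, load_denom=5,
                  growth_num=3, growth_denom=2))
```

The tuned version resizes more often, but a dict stays in a much smaller table until it has genuinely earned the space.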
By using a smalltable of size 1, setting the max fill rate to 4/5 and growth rate to 1.5, we get this:
```
>>> for i in range(20):
...     print i, sys.getsizeof(dict((j,j) for j in range(i)))
```
The size of the dicts is still governed by the fact that the tables must have sizes that are powers of two, so we can only delay the onset of growth. But at least we get rid of the super-optimistic quadrupling growth factor and the large small table.
Perhaps a different, again less optimal, version of the dict wouldn’t have that power-of-two requirement. There is no inherent need for it when hashing, except that it makes for nice bitwise arithmetic.
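The bitwise arithmetic in question: with a power-of-two table size, reducing a hash to a slot index is a single AND against size - 1, whereas an arbitrary (say, prime) table size needs a genuine modulo, i.e. an integer division. A quick sketch, with hypothetical helper names of my own:

```python
def slot_pow2(h, size):
    # size is a power of two, so h % size == h & (size - 1):
    # one cheap AND instead of an integer division.
    return h & (size - 1)

def slot_any(h, size):
    # An arbitrary table size needs a real modulo, as some other
    # hash table designs use.
    return h % size

# The two agree whenever size is a power of two.
assert all(slot_pow2(h, 32) == slot_any(h, 32) for h in range(1000))
```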
The effect these changes had:
“I just did a quick test. It saved 2 MB on the login screen. That’s awesome.”