Optimizing the dict

This is another of those memory conservation stories on the PS3.

Our engineers were worried about how much memory was being spent/wasted in dictionaries. Python dicts are these sparse datastructures, optimized for performance and trading off memory usage to achieve speed.

This code shows you the memory used by dicts of various sizes:

for i in range(20): print i, sys.getsizeof(dict((j,j) for j in range(i)))
...
0 148
1 148
2 148
3 148
4 148
5 148
6 532
7 532
8 532
9 532
10 532
11 532
12 532
13 532
14 532
15 532
16 532
17 532
18 532
19 532

It’s rather striking that a 10 element dict on a 32 bit is consuming more than 1/2k of memory. I’m pretty sure BBC Basic can’t have used dicts.

Now, I was interested in tuning the dict implementation for the PS3, sacrificing performance for memory. Looking at the code led me to a file called dictnotes.txt explaining much about dicts. The section on tuning only considers performance. Too sparse a dict, you see, looses performance because of memory cache effects. Otherwise, I’m sure, we would want dicts infinitely sparse.

It turns out that only one parameter is easily tunable, PyDict_MINSIZE. Python 2.7 sets this to 8 for reasons of cache line size, although I find that an odd generalization across a huge number of platforms. in dictobject.h, I came across this comment:

* PyDict_MINSIZE is the minimum size of a dictionary. This many slots are
* allocated directly in the dict object (in the ma_smalltable member).
* It must be a power of 2, and at least 4.

It turns out this is wrong. Python will happily run with it set to 1.

As for the other tunable parameters, I ended up macrofying things that were hard-coded in various places in the code:

/* CCP change: Tunable parameters for dict growth */
#if _PyCCP_TIGHT_DICT
/* Save memory for dust */
#define _PyDICT_MAX_LOAD_NUM 4 /* grow at 80% load */
#define _PyDICT_MAX_LOAD_DENOM 5

#define _PyDICT_GROWTHRATE_NUM 3 /* scale by 1.5 */
#define _PyDICT_GROWTHRATE_DENOM 2
#else
/* max load 2/3, default python: */
#define _PyDICT_MAX_LOAD_NUM 2
#define _PyDICT_MAX_LOAD_DENOM 3

#define _PyDICT_GROWTHRATE_NUM (mp->ma_used > 50000 ? 2 : 4)
#define _PyDICT_GROWTHRATE_DENOM 1
#endif

And then later:

#if 0
    if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
        return 0;
    return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
#else
    /* CCP change, tunable growth parameters */
    if (!(mp->ma_used > n_used && mp->ma_fill*_PyDICT_MAX_LOAD_DENOM >= (mp->ma_mask+1)*_PyDICT_MAX_LOAD_NUM))
        return 0;
    return dictresize(mp, _PyDICT_GROWTHRATE_NUM * mp->ma_used / _PyDICT_GROWTHRATE_DENOM);
#endif

By using a smalltable of size 1, setting the max fill rate to 4/5 and growth rate to 1.5, we get this:
[python]
for i in range(20): print i, sys.getsizeof(dict((j,j) for j in range(i)))

0 64
1 88
2 112
3 160
4 160
5 160
6 256
7 256
8 256
9 256
10 256
11 256
12 448
13 448
14 448
15 448
16 448
17 448
18 448
19 448
[/python]

The size of the dicts is still governed by the fact that the tables must have sizes that are powers of two, so we can only delay the onset of growth. But at least we get rid of the super optimistic quadruple growth factor and the large small table.

Perhaps a different, again less optimal, version of the dict wouldn’t have that power of two requirement. There is no inherent need for that when hashing except that it makes for nice bitwise arithmetic.

Update:

The effect these changes had:

I just do a quick test. It saved 2MB in login screen. That’s awesome.

Thanks,

Kevin Zhang

Using an isolated python.exe

Executive Summary:

If you want to completely control the sys.path of your copy of python.exe, do the following:

  1. Create a site.py next to it that contains nothing but the line “import sitecustomize”
  2. place a sitecustomize.py next to it, that rewrites sys.path to your liking
  3. run your python.exe, somewhat safe in the knowledge that it won’t be affected by any other python installed on the system.

The Long Explanation:

The problem

At CCP we work with many branches of many games.  All of these games use Python to some extent.  Each branch comes complete with its own python source tree, where local patches are added to Python and which may be of different python versions.  Each branch then builds its own python2x.dll and a python.exe, in addition, perhaps, to static versions of the python core for inclusion into a game executable.

What is more, each of these branches contains a plethora of offline tools.  These may be build scripts, test scripts and so on, and most of them are written in Python.  For general harmony, of course, this python version must be the same version as the one used in the game, that is, the offline tools used in a branch should use a Python version local to that branch.  This is where things become messy.

I’ve often vented my frustration to my colleagues about how “install oriented” Python appears to be.  For embedding, until recently there wasn’t even a way for a host application to completely control Python’s sys.path.  Python.exe will, when executed, go through a series of magic moves to guess an initial sys.path.  After this it will, unless instructed not to, try to import site.py which continues with the magic path munging process.

There is one alternative behaviour built into Python (yes, built in.)  If Python upon initialization detects that it is being run from something that looks like a build folder structure, it will initialize sys.path locally to that structure.  Otherwise, it will go ahead and set sys.path to what it thinks is the system wide sensible locations before importing site.py.

This then, is how python.exe is designed.  Either it is being built and then can live in a local, isolated, setting, or it is installed and uses machine global information for its environment.

For our branch specific tools, it was important to override this behaviour.  A buildstuff.cmd batch file in /games/foo/tools ought to be able to call ../bin/python.exe and have that particular copy of python.exe set up sys.path to, say, ../src/python27/Lib.  Further, this needs to happen without the kludgy help of environment variables or, dear I say it, registry settings.  These are a nightmare to manage in a distributed environment with gazillion developer machines, build bots, and so on.

Fortunately, there is something called sitecustomize.py.  The documentation specifies that after site.py is imported, python will try to “import sitecustomize” and this can be used to do local path adjustments.

The initial approach

When we started doing this in earnest we were using Python 2.6 both as an installed “tool” on developer machines and as the Python version in use for that particular branch.  We found that by copying python.exe out of the PCBuild build folder into a game/foo/bin folder, and placing a sitecustomize.py next to it, we could fully control sys.path the way we wanted.  The our sitecustomize.py would look something like this:

import sys
localdir = "/".join(__file__.split('\')[:-1])
root = localdir + "/../"
python = root + "src/python/python27/"
sys.path = [python+"PCBuild", python+"Lib", root + "modules"]

This works because python.exe’s directory is put in sys.path by Python built-in startup magic!  Of course, there is no reliable way to find it from the .py file, so the trick with __file__ is used.  Also, we can’t use os.path for our path manipulation since we really don’t want to import it.  Who knows what side effect is may have, importing stuff from the “default” sys.path.  But it is was good enough for our purposes.  For a while.

Switching to Python 2.7

Then we moved one branch to use Python 2.7.  Python was recompiled, and python.exe put in the bin folder as before, but suddenly, some machines (particularly the build machines) started failing.  The python tools complained that site could not be imported.

On investigation it turned out that previously all machines using this scheme had, by a happy coincidence, had Python 2.6 installed on them, and our local python.exe had been importing site.py from c:Python26Lib.  Now, python was looking for site.py in, among other places, c:Python27Lib (one of the magic path entries set up by python.exe and which it is impossible to override.)

To fix this, I placed an empty site.py next to sitecustomize.py and python.exe in the bin folder, hoping that would work.  Indeed, the build machines now succeeded in finding a site.py, but now sitecustomize.py wasn’t being run.

It wasn’t until I actually looked at pythonrun.c that I realized that sitecustomize.py is being imported and executed by the site.py in python’s standard library!.  So, it is site.py’s responsibility to call sitecustomize.py (and something called usercustomize.py that I had previously not known about.)

The solution then:  Instead of an empty site.py, have a site.py containing this code:

import sitecustomize
del sitecustomize

Musings

So, we have found that to have an isolated python.exe for which you control it’s sys.path absolutely with no external influences, you need a sitecustomize.py to override your sys.path.  But for that to run, you must provide your own site.py, in case a system-wide site.py isn’t found.

It is annoying that if a system-wide site.py is found, than this is run.  There, already, you have lost some control over your python.exe.  Who knows what a malicious system administrator may do in such a file?  So even with our approach, there is no absolute control.

It’s also annoying that you need two files to accomplish this task.  It would be nice if python.exe could be instructed to not do any automatic path guessing and just take its initial sys.path from a startup file.  The fact that python.exe has a built-in mechanism to set a different initial sys.path if it detects that it is being run out of a build folder, indicates that someone sometime recognized the need for this, but only took it half the way.

My suggestion would be this:  If a site.py (or site.pyc) is found in an initial place, e.g. next to the executable, then set the initial sys.path to only that place and skip all other magic.

This could then be used to great effect in the build system.  Instead of building into python.exe a separate folder structure for built environments, just create a site.py in the PCBuild directory (or the equivalent platform place) and set it up there.  Similarly, a site.py could be put in place in c:Python27bin and the entire ugly logic of initial path setting could be removed from python.exe.

As a side note, I have always thought that the initial path-guessing should be a feature of python.exe and not python27.dll as such.  python.exe should, in my opinion, call something like Py_Guess_Path() before calling Py_Initialize(), since it is a very specific behaviour of that particular embedding application.

Embedding Python and sys.path

When embedding a Python interpreter in another application, such as a game, a tricky parts is to get it to start up and find the right modules.  You typically want Python to:

  • Find your version of the standard library
  • Find your application modules
  • Not be confused by any other Python system that may be installed on the machine.

The application has absolute knowledge of the location of the python modules.  It has been installed in a known location and it knows where its resources are.

Unfortunately, Python often appears to be designed for the specific needs of python.exe, the “canonical embedding application”.  When one calls Py_Initialize(), a number of magic steps are invoked whereby the Python runtime tries to guess the initial setting for sys.path.  This may involve finding the program location, looking at the PYTHONPATH environment variable, and so on.  When Py_Initialize() returns, it may be too late to modify sys.path because in the mean time, module imports could have occured, such as that of copy_reg.

I have found that circumventing this involves patching pythoncore itself.  What we do in EVE before calling Py_Initialize() is to call a custom API, Py_SetPath() with an application-crafted semicolon-separated path string.  This will turn off any path-guessing in pythoncore and gives us absolute control.

Before we started doing this, for example, we could not encourage developers to have Python installed on their workstations, for example.  We would invariably run into conflicts where the embedded python picked up modules from the installed python and mysterious things would happen.  Now there is no need, and the embedded Python and the installed Python can coexsist without conflict.

I’ve recently contributed this patch to python.org.