Optimizing the dict

This is another of those memory conservation stories on the PS3.

Our engineers were worried about how much memory was being spent/wasted in dictionaries. Python dicts are these sparse datastructures, optimized for performance and trading off memory usage to achieve speed.

This code shows you the memory used by dicts of various sizes:

for i in range(20): print i, sys.getsizeof(dict((j,j) for j in range(i)))
...
0 148
1 148
2 148
3 148
4 148
5 148
6 532
7 532
8 532
9 532
10 532
11 532
12 532
13 532
14 532
15 532
16 532
17 532
18 532
19 532

It’s rather striking that a 10 element dict on a 32 bit is consuming more than 1/2k of memory. I’m pretty sure BBC Basic can’t have used dicts.

Now, I was interested in tuning the dict implementation for the PS3, sacrificing performance for memory. Looking at the code led me to a file called dictnotes.txt explaining much about dicts. The section on tuning only considers performance. Too sparse a dict, you see, looses performance because of memory cache effects. Otherwise, I’m sure, we would want dicts infinitely sparse.

It turns out that only one parameter is easily tunable, PyDict_MINSIZE. Python 2.7 sets this to 8 for reasons of cache line size, although I find that an odd generalization across a huge number of platforms. in dictobject.h, I came across this comment:

* PyDict_MINSIZE is the minimum size of a dictionary. This many slots are
* allocated directly in the dict object (in the ma_smalltable member).
* It must be a power of 2, and at least 4.

It turns out this is wrong. Python will happily run with it set to 1.

As for the other tunable parameters, I ended up macrofying things that were hard-coded in various places in the code:

/* CCP change: Tunable parameters for dict growth */
#if _PyCCP_TIGHT_DICT
/* Save memory for dust */
#define _PyDICT_MAX_LOAD_NUM 4 /* grow at 80% load */
#define _PyDICT_MAX_LOAD_DENOM 5

#define _PyDICT_GROWTHRATE_NUM 3 /* scale by 1.5 */
#define _PyDICT_GROWTHRATE_DENOM 2
#else
/* max load 2/3, default python: */
#define _PyDICT_MAX_LOAD_NUM 2
#define _PyDICT_MAX_LOAD_DENOM 3

#define _PyDICT_GROWTHRATE_NUM (mp->ma_used > 50000 ? 2 : 4)
#define _PyDICT_GROWTHRATE_DENOM 1
#endif

And then later:

#if 0
    if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
        return 0;
    return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
#else
    /* CCP change, tunable growth parameters */
    if (!(mp->ma_used > n_used && mp->ma_fill*_PyDICT_MAX_LOAD_DENOM >= (mp->ma_mask+1)*_PyDICT_MAX_LOAD_NUM))
        return 0;
    return dictresize(mp, _PyDICT_GROWTHRATE_NUM * mp->ma_used / _PyDICT_GROWTHRATE_DENOM);
#endif

By using a smalltable of size 1, setting the max fill rate to 4/5 and growth rate to 1.5, we get this:
[python]
for i in range(20): print i, sys.getsizeof(dict((j,j) for j in range(i)))

0 64
1 88
2 112
3 160
4 160
5 160
6 256
7 256
8 256
9 256
10 256
11 256
12 448
13 448
14 448
15 448
16 448
17 448
18 448
19 448
[/python]

The size of the dicts is still governed by the fact that the tables must have sizes that are powers of two, so we can only delay the onset of growth. But at least we get rid of the super optimistic quadruple growth factor and the large small table.

Perhaps a different, again less optimal, version of the dict wouldn’t have that power of two requirement. There is no inherent need for that when hashing except that it makes for nice bitwise arithmetic.

Update:

The effect these changes had:

I just do a quick test. It saved 2MB in login screen. That’s awesome.

Thanks,

Kevin Zhang

Advertisement

Reference cycles with closures

Polishing our forthcoming console game, our team in Shanghai are relentlessly trying to minimize python memory use.
Today, an engineer complained to me that “cell” objects were being leaked(*).

This rang a bell with me. In 2009, I had posted about this to python-dev.
The response at the time wasn’t very sympathetic. I should be doing stuff differently or simply rely on the cyclic garbage collector and not try to be clever. Yet, as I pointed out, parts of the library are aware of the problem and do help you with these things, such as the xml.dom.minidom.unlink() method.

The data being leaked now appeared to pertain to the json module:

[2861.88] Python: 0: <bound method JSONEncoder.default of <json.encoder.JSONEncoder object at 0x12e14010>>
[2861.88] Python: 1: <bound method JSONEncoder.default of <json.encoder.JSONEncoder object at 0x12e14010>>
[2861.88] Python: 2: <bound method JSONEncoder.default of <json.encoder.JSONEncoder object at 0x12e14010>>

This prompted me to have a look in the json module, and behold, json.encoder contains this pattern:
[python]
def _make_iterencode(…)

def _iterencode(o, _current_indent_level):
if isinstance(o, basestring):
yield _encoder(o)
elif o is None:
yield ‘null’
elif o is True:
yield ‘true’
elif o is False:
yield ‘false’
elif isinstance(o, (int, long)):
yield str(o)
elif isinstance(o, float):
yield _floatstr(o)
elif isinstance(o, (list, tuple)):
for chunk in _iterencode_list(o, _current_indent_level):
yield chunk
elif isinstance(o, dict):
for chunk in _iterencode_dict(o, _current_indent_level):
yield chunk
else:
if markers is not None:
markerid = id(o)
if markerid in markers:
raise ValueError(“Circular reference detected”)
markers[markerid] = o
o = _default(o)
for chunk in _iterencode(o, _current_indent_level):
yield chunk
if markers is not None:
del markers[markerid]

return _iterencode
[/python]

The problem is this: The returned closure has a func_closure() member containing the “cell” objects, one of which points to this function. There is no way to clear the func_closure method after use. And so, iterencoding stuff using the json module causes reference cycles that persist until the next collection, possibly causing python to hang on to all the data that was supposed to be encoded and then thrown away.

Looking for a workaround, I wrote this code, emulating part of what is going on:
[python]
def itertest(o):
def listiter(l):
for i in l:
if isinstance(i, list):
chunks = listiter(i)
for i in chunks:
yield i
else:
yield i
return listiter(o)
[/python]

Testing it, confirmed the problem:

>>> import celltest
>>> l = [1, [2, 3]]
>>> import gc, celltest
>>> gc.collect()
>>> gc.set_debug(gc.DEBUG_LEAK)
>>> l = [1, [2, 3]]
>>> i = celltest.itertest(l)
>>> list(i)
[1, 2, 3]
>>> gc.collect()
gc: collectable <cell 01E96B50>
gc: collectable <function 01E97330>
gc: collectable <tuple 01E96910>
gc: collectable <cell 01E96B30>
gc: collectable <tuple 01E96950>
gc: collectable <function 01E973F0>
3

To fix this, it is necessary to clear the “cell” objects once there is no more need for them. It is not possible to do this from the outside, so how about from the inside? Changing the code to:
[python]
def itertest2(o):
def listiter(l):
for i in l:
if isinstance(i, list):
chunks = listiter(i)
for i in chunks:
yield i
else:
yield i

chunks = listiter(o)
for i in chunks:
yield i
chunks = listiter = None
[/python]
Does the trick. the function becomes a generator, yields the stuff, then cleans up:

>>> o = celltest.itertest2(l)
>>> list(o)
[1, 2, 3]
>>> gc.collect()
0

It is an unfortunate situation. The workaround requires work to be done inside the function. It would be cool if it were possible to clear the function’s closure by calling, e.g. func.close(). As it is, people have to be aware of these hidden cycles and code carfully around them.

(*) Leaking in this case means not being released immediately by reference counting but lingering. We don’t want to rely on the gc module’s quirkiness in a video game.

Update:

In my toy code, I got the semantics slightly wrong.  Actually, it is more like this:
[python]
def make_iter():
def listiter(l):
for i in l:
if isinstance(i, list):
chunks = listiter(i)
for i in chunks:
yield i
else:
yield i
return listiter

def get_iterator(data):
it = make_iter()
return it(data)
[/python]

This complicates things. Nowhere is, during iteration, any code running in the scope of make_iter that we can use to clear those locals after iteration. Everything is running in nested functions and since I am using Python 2.7 (which doesn’t have the “nonlocal” keyword) there seems to be no way to clear the outer locals from the inner functions once iteration is done.

I guess that means that I’ll have to modify this code to use class objects instead.

Also, while on the topic, I think Raymond Hettinger’s class-like objects are subject to this problem if they have any sort of mutual or recursive relationship among their “members”.
 

Clearing weakrefs

I just had this problem which would have been elegantly solved with the ability to manually clear weak references pointing to an object. I am (for technical reasons) recycling an object, so instead of killing it and re-creating it, I re-initialize it. But that leaves old weak references in place. How nice wouldn’t it be to be able to call “myobject.clear_weakrefs()”?

Float object reuse

I thought I’d mention a cool little patch we did to Python some years back.

We work with database tables a lot.  Game configuration data is essentially rows in a vast database.  And those rows contain a lot of floats.  At some point I recognized that common float values were not being reused.  In particular, id(0.0) != id(0.0).  I was a bit surprized by this, since I figured, some floats must be more common than others.  Certainly, 0.0 is a bit special.

I mentioned this on python-dev some years back but with somewhat underwhelming results.  A summary of the discussion can be found here.

Anyway, I thought I’d mention this to people doing a lot of floating point.  We saved a huge amount of memory on our servers just caching integral floating point values between -10 and +10, including both the negative and positive 0.0.  These values are very frequent, for example as multipliers in tables, and so on.

Here’s some of the code:

[C]

PyObject *
PyFloat_FromDouble(double fval)
{
    register PyFloatObject *op;
    int ival;
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL)
            return NULL;
        /* CCP addition, cache common values */
        if (!f_reuse[0]) {
            int i;
            for(i = 0; i<21; i++)
                f_reuse[i] = PyFloat_FromDouble((double)(i-10));
        }
    }
    /* CCP addition, check for recycling */
    ival = (int)fval;
    if ((double)ival == fval && ival>=-10 && ival <= 10) {
#ifdef MS_WINDOWS
        /* ignore the negative zero */
        if (ival || _fpclass(fval) != _FPCLASS_NZ) {
#else
        /* can't differentiate between positive and negative zeroes, ignore both */
        if (ival) {
#endif
            ival+=10;
            if (f_reuse[ival]) {
                Py_INCREF(f_reuse[ival]);
                return f_reuse[ival];
            }
        }
    }

    /* Inline PyObject_New */
    op = free_list;
    free_list = (PyFloatObject *)Py_TYPE(op);
    PyObject_INIT(op, &PyFloat_Type);
    op->ob_fval = fval;
    return (PyObject *) op;
}

[/C]

(Please excuse the lame syntax highlighter with its &amp; and &lt; thingies 🙂

Temporary thread state overhead

When doing IO, it is sometimes useful for a worker thread to notify Python that something has happened. Previously we have just had the Python main thread “Poll” some external variable for that, but recently we have been experimenting with having the main thread just grab the GIL and perform python work itself.

This should be straightforward. Python has an api called PyGILState_Ensure() that can be called on any thread. If that thread doesn’t already have a Python thread state, it will create a temporary one. Such a thread is sometimes called an external thread.

On a server loaded to some 40% with IO, this is what happened when I turned on this feature:

process cpu

The dark gray area is main thread CPU, (initially at around 40%) and the rest is other threads.  Turning on the “ThreadWakeup” feature adds some 20% extra cpu work to the process.

When the main thread is not working, it is idle doing a MsgWaitForMultipleObjects() Windows system call (with the GIL unclaimed).  So the worker thread should have no problem acquiring the GIL.  Further, there is only ever one woker thread doing a PyGILState_Ensure()/PyGILState_Release() at the same time, and this is ensured using locking on the worker thread side.

Further tests seem to confirm that if the worker thread already owns a Python thread state, and uses that to aquire the GIL (using a PyEval_RestoreThread() call) this overhead goes away.

This was surprising to me, but it seems to indicate that it is very expensive to “acquire a thread state on demand” to claim the GIL.  This is very unfortunate, because it means that one cannot easily use arbitrary system threads to call into Python without significant overhead.  These might be threads from the Windows thread pool for example, threads that we have no control over and therefore cannot assign thread state to.

I will try to investigate this furter, to see where the overhead is coming from.  It could be the extra TLS calls made, or simply the cost of malloc()/free() involved.  Depending on the results, there are a few options:

  1. Keep a single thread state on the side for (the single) external thread that can claim the GIL at a time, ready and initialized.
  2. Allow an external thread to ‘borrow’ another thread state and not use its own.
  3. Streamline the stuff already present.

Update, oct. 6th 2011:
Enabling dynamic GIL with tread state caching did notthing to solve this issue.
I think the problem is likely to be that spin locking is in effect for the GIL. I’ll see what happens if I explicitly define the GIL to not use spin locking.

namedtuple and exec()

In our port of Python 2.7 to the PS3 console, we have deliberately removed the python compiler. This was mainly done to save on the code size, since on a console every byte is sacred.  An additional benefit is slight hardening against certain kinds of attacks, since evil constructs such as eval() and exec() now raise the NotImplementedError when used.

Program code is pre-compiled and put in .zip archives so there is no need for regular compilation on the console. The most serious problem we encountered though, was with the new namedtuple construct.

The namedtuple is implemented in the collections module by constructing a class declaration with string interpolation and then calling exec() on it. With exec() removed, a lot of the standard library turned out to fail on import.

Our initial fix was simply to replace the namedtuples with regular tuples:
[python]
def namedtuple(typename, field_names, verbose=False, rename=False):
return tuple
[/python]
This worked surprisingly well. The parts of the library we were using were still using namedtuples just like regular tuples and all was well.

Recently, however, we found that the urlparse module was making non-trivial use of it so something needed to be done.  My initial reflex was to dive in and reimplement it using a metaclass or some such. But then I thought of asking the internet.

It turns out that this exists as an issue in the Python bug tracker.  Someone else had come across this oddity in the standard library and submitted an alternative implementation.  This works perfectly for our purposes.

I know that there is nothing inherently evil about using exec in Python, but this particular case still doesn’t quite ring true to me:  If the best way to implement a class is by resorting to the meta-language, doesn’t that indicate some shortcoming in the language itself?

Lazy Import

As Richard Tew mentioned on his blog, we are using the lazy importing trick to reduce memory overhead.

We improved the original module by adding some features:

  • A simpler and more robust injection mechanism using gc.get_referrers()
  • Supporting zip imports
  • Supporting reload()
  • A reporting capability.
  • Supporting bypassing selected modules or packages.

The last bit is important because some modules may be used internally by C and those cannot be treated with this.  In our case, we actually import this module from C right after importing site.py, in order to get maximum benefit, so we may be seeing more problems than the casual user who imports it from a .py file.

I don’t have any other good place to put this, so I’m just leaving it here for the time being.

[python]
# Copyright rPath, Inc., 2006
# Available under the python license
“”” Defines an on-demand importer that only actually loads modules when their
attributes are accessed. NOTE: if the ondemand module is viewed using
introspection, like dir(), isinstance, etc, it will appear as a
ModuleProxy, not a module, and will not have the correct attributes.
Barring introspection, however, the module will behave as normal.
“””

# modified for CCP by Kristján Valur Jónsson:
# – Use the gc.getreferrers() method to replace module references
# – Add zip support
# – support reload()
# – Add reporting and memory analysis
# – Add bypass mechanism for modules where this causes problems

import sys
import imp
import gc
import __builtin__
import zipimport

memory_query_func = None #set this to something returning memory use
verbose = False

ModuleType = type(sys)

#modules that bypass this mechanism
ignorenames = set() #module names
ignorepkg = set() #package names
ignorepath = set() #paths to ignore

#side effect register string unicode handling / conversion
ignorenames |= set([“encodings”])
#side effect prevent internal Python borrowed reference choking
ignorenames |= set([“warnings”])

#statistics
proxies = set()
proxyTally = 0
reals = set()
ignored = set()
existing = set(k for k,v in sys.modules.iteritems() if v)

def report(arg=””):
if not verbose:
return
loaded = arg.startswith(“load “)
if loaded:
if memory_query_func is not None:
print >> sys.stderr, “lazyimport: %s (now using %0.3f Mb)” % (arg, memory_query_func())
else:
print >> sys.stderr, “lazyimport: %s” % arg
else:
if memory_query_func is not None:
print >> sys.stderr, “lazyimport report: %s (now using %0.3f Mb)” % (arg, memory_query_func())
else:
print >> sys.stderr, “lazyimport report: %s” % arg

if verbose > 1 or not loaded:
print >> sys.stderr, “proxy imported %d %r”%(len(proxies), sorted(proxies))
print >> sys.stderr, “proxy imported (maximum size reached) %d” % proxyTally
print >> sys.stderr, “fully imported (pre lazyimport) %d %r”%(len(existing), sorted(existing))
print >> sys.stderr, “fully imported (via lazyimport) %d %r”%(len(reals), sorted(reals))
print >> sys.stderr, “fully imported (via allowed bypass) %d %r”%(len(ignored), sorted(ignored))

modules = set(k for k,v in sys.modules.iteritems() if v)
diff = modules-reals-proxies-ignored-existing
print >> sys.stderr, “fully imported (lost track of these) %d %r”%(len(diff), sorted(diff))

builtins = set(sys.builtin_module_names)
diff = builtins & proxies
print >> sys.stderr, “builtins (proxied) %d %r” % (len(diff), diff)
diff = builtins & (reals | existing)
print >> sys.stderr, “builtins (fully imported) %d %r” % (len(diff), diff)
diff = builtins – proxies – reals – existing
print >> sys.stderr, “builtins (not imported) %d %r” % (len(diff), diff)

def loadModule(proxy, name, loader):
#see if the module is already loaded
mod = sys.modules.get(name, None)
#avoid isinstace on mod, because it will cause __class__ lookup and this
#causes recursion
if mod is not proxy and isinstance(mod, ModuleType):
return mod

#load the module
mod = loader.load_module(name)
replaceModule(proxy, mod)

reals.add(name)
try:
proxies.remove(name)
except KeyError:
pass
report(“load “+name)
return mod

def replaceModule(proxy, mod):
“”” Find all dicts where proxy is, and replace it with the actual module.
Typcially, this is the sys.modules and any module dicts.
“””
for e in gc.get_referrers(proxy):
if isinstance(e, dict):
for k, v in e.iteritems():
if v is proxy:
e[k] = mod

class ModuleProxy(object):
def __init__(self, name, loader):
global proxyTally
object.__setattr__(self, “_args”, (name, loader))
proxies.add(name)
proxyTally += 1
#report(“proxy “+name)

# we don’t add any docs for the module in case the
# user tries accessing ‘__doc__’
def __getattribute__(self, key):
if key in [“_args”]:
return object.__getattribute__(self, key)
mod = loadModule(self, *self._args)
return getattr(mod, key)

def __setattr__(self, key, value):
mod = loadModule(self, *self._args)
setattr(mod, key, value)

def __dir__(self):
#modules have special dir handling, invoke that.
return dir(loadModule(self, *self._args))

def __repr__(self):
return “” %(self._args,)

class StandardLoader(object):
“”” A class that wraps the standard imp.load_module into
the new style object hook api, for consistency here
“””
def __init__(self, pathname, desc):
self.pathname, self.desc = pathname, desc

def __repr__(self):
return “” %(self.pathname, self.desc)

def load_module(self, fullname):
try:
f = open(self.pathname, ‘U’)
except:
f = None
try:
return imp.load_module(fullname, f, self.pathname, self.desc)
finally:
if f:
f.close()

class OnDemandLoader(object):
“”” The loader takes a name and real loader of the module to load and
“loads” it – in this case returning loading a proxy that
will only load the class when an attribute is accessed.
“””
def __init__(self, real_loader):
self.real_loader = real_loader

def load_module(self, fullname):
mod = sys.modules.get(fullname)
if not mod:
mod = ModuleProxy(fullname, self.real_loader)
sys.modules[fullname] = mod
return mod

class OnDemandImporter(object):
“”” The on-demand importer imports a module proxy that
inserts the desired module into the calling scope only when
an attribute from the module is actually used.
“””
def find_module(self, fullname, path=None):
if path:
#only bother with sub-modules if they are being loaded
#correctly, i.e. the parent module is already in sys.modules
head, tail = fullname.rsplit(‘.’, 1)
if not sys.modules.get(head):
return None
else:
tail = fullname

# See if the module can be found. It might be trying a relative
# import for example, so often modules are not found.
try:
f, pathname, desc = imp.find_module(tail, path)
if f:
f.close()
except ImportError:
return None #no zip found either

#Now, ignore some modules that we don’t want
#Since this is the meta_path, we just pass it on to the
#rest of the machinery, i.e. pretend not to have found it.
if ignore_module(fullname, pathname):
return None

#Ok, we are going to load this lazily
real_loader = StandardLoader(pathname, desc)
return OnDemandLoader(real_loader)

class OnDemandZipImporter(object):
def __init__(self, path):
importer = zipimport.zipimporter(path)
self.real_importer = importer
self.is_package = importer.is_package
self.get_code = importer.get_code
self.get_source = importer.get_source
self.get_data = importer.get_data
self.get_filename = importer.get_filename

def find_module(self, fullname, path=None):
result = self.real_importer.find_module(fullname, path)
if result is None:
return None
return self

def load_module(self, fullname):
if ignore_module(fullname, self.real_importer.archive):
return self.real_importer.load_module(fullname)

mod = sys.modules.get(fullname)
if not mod:
mod = ModuleProxy(fullname, self.real_importer)
sys.modules[fullname] = mod
return mod

onDemandImporter = OnDemandImporter()
RealReload = reload
def LazyReload(module):
if type(module) is ModuleType:
return RealReload(module)

def install():
if onDemandImporter not in sys.meta_path:
sys.meta_path.append(onDemandImporter)
try:
idx = sys.path_hooks.index(zipimport.zipimporter)
sys.path_hooks[idx] = OnDemandZipImporter
except ValueError:
pass

__builtin__.reload = LazyReload

def uninstall():
try:
sys.meta_path.remove(onDemandImporter)
try:
idx = sys.path_hooks.index(OnDemandZipImporter)
sys.path_hooks[idx] = zipimport.zipimporter
except ValueError:
pass
except ValueError:
return
__builtin__.reload = RealReload

def ignore_module(fullname, pathname=None):
“””
See if we want to ignore demand-loading of this module for any reason
“””
ignore = False
if fullname in ignorenames:
ignore = True
for pkg in ignorepkg:
if fullname.startswith(pkg):
ignore = True
if pathname:
for path in ignorepath:
if path in pathname.lower():
ignore = True
if ignore:
ignored.add(fullname)
return ignore

[/python]

Evaluating Nagare

Introduction

A little known feature of EVE Online, disabled in the client but very much active in the server, is a web server.  This was added early in the development, before I started on the project back in 2003.  It is the main back-end access point to the game server, used for all kinds of management, status information and debugging.

Back then, Python was much less mature as a web serving framework.  Also we initially wanted just very rudimentary functionality.  So we wrote our own web server.  It, and the site it presents, were collectively called ESP, for Eve Server Pages and over the years it has grown in features and content.  The heaviest use that it sees is as the dashboard for Game Managers, where everything a GM needs to do is done through HTML pages.  It is also one of the main tools for content authoring, where game designers access a special authoring server.  ESP presents a content authoring interface that then gets stored in the backend database.

Recently we have increasingly started to look for alternatives to our homegrown HTTP solution, though.  The main reasons are:

  1. We want to use a standard Python web framework with all the bells and whistles and support that such frameworks offer
  2. We want modern Web 2.0 features without having to write them ourselves
  3. We want something that our web developers can be familiar with already
  4. We want to share expertise between the ESP pages and other web projects run by CCP.  Confluence of synergies and all that.

Stackless Python

EVE Online is based on Stackless Python.  Embedded into the the game engine is a locally patched version of Stackless Python 2.7.  We have been using Stackless since the very beginning, the existence of Stackless being the reason we chose Python as a scripting solution for our game.

Systems like the web server have always depended heavily on the use of Stackless.  Using it we have been able to provide a blocking programming interface to IO which uses asynchronous IO behind the scenes.  This allows for a simple and intuitive programming interface with good performance and excellent scalability.  For some years now we use the in-house developed StacklessIO solution which provides an alternative implementation of the socket module.  By also providing emulation of the threading module by using tasklets instead of threads, many off the shelf components simply work out of the box.  As an example, the standard library’s xmlrpc module, itself based on the socketserver module, just works without modification.

Of course, the use of Stackless python is not limited to IO.  A lot of the complicated game logic takes advantage of its programming model, having numerous systems that run as their own tasklets to do different things.  As with IO, this allows for a more intuitive programming experience.  We can write code in an imperative manner where more traditional solutions would have to rely on event driven approaches, state machines and other patterns that are better left for computers than humans to understand.  This makes CCP very much a Stackless Python shop and we are likely to stay that way for quite a bit.

Nagare

It was therefore with a great deal of interest that we noticed the announcement of Nagare on the Stackless mailing list a few years ago.

Nagare promises a different approach to web development.  Instead of web applications that are in effect giant event handlers (responding to the HTTP connectionless requests) it allows the user to write the web applications imperatively much as one would write desktop applications.  Their web site has a very interesting demo portal with a number of applications demonstrating their paradigm, complete with running apps and source code.

This resonates well with me.  I am of the opinion that the programmer is the slowest part of software development.  Anything that a development environment can do to make a programmer able to express himself in familiar, straight-forward manner is a net win.  This is why we we use tasklet blocking tasklet IO instead of writing a game based on something as befuddling as Twisted.  And for this reason I thought it worthwhile to see if a radical, forward thinking approach to web development might be right for CCP.

Tasklet pickling

Nagare achieves its magic by using a little known Stackless feature called tasklet pickling.  A tasklet that isn’t running can have its execution state pickled and stored as binary data.  The pickle contains the execution frames, local variables and other such things.

Stackless Python contains some extensions to allow pickling of various objects whose pickling isn’t supported in regular Python.  These include frames, modules and generators among other things.

When a Nagare web application reaches the point where interaction with the user is required, its state is pickled and stored with the Nagare server, and a HTTP response sent back to the client.  When a subsequent request arrives, the tasklet is unpickled and its execution continues.  From the programmer’s point of view, a function call simply took a long time.  Behind the scenes, Stackless and Nagare were working its magic, making the inherently stateless HTTP protocol seem like smooth program flow to the application.

IO Model

In other ways, Nagare is a very traditional Python web framework.  It is based around WSGI and so uses whatever threading and IO model provided to it by a WSGI server.

Application Model

Unlike some smaller frameworks, Nagare is designed and distributed as a complete web server.  One typically installs it as the single application of a virualenv root, and then configures and runs it using a script called nagare_admin.  This is very convenient for someone simply running a web server, but it becomes less obvious how to use it as a component in a larger application.

Our tests

What we were interested in doing was to see if Nagare would work within the application that is EVE.  To do this we would have to:

  1. Extract Nagare and its dependencies as a set of packages that can be used with the importing framework that EVE uses.  As I have blogged about before, Python by default assumes a central package directory and doesn’t lend itself well to isolation by embedded applications.  We have done that with EVE, however, and would want nagare  to be an “install free” part of our application.
  2. Set up the necessary WSGI infrastructure within EVE for Nagare to run on
  3. Configure and run Nagare programmatically rather than by using the top level application scripts provided by the Nagare distribution.

I specifically didn’t intend to evaluate Nagare from the web developer’s point of view.  This is because I am not one of those and find the whole domain rather alien.  This would be a technical evaluation from an embedding point of view.

Extracting

My first attempt at setting up Nagare was to fetch the source from its repository.

 svn co svn://www.nagare.org/trunk/nagare

I then intended to fetch any dependencies in a similar manual manner.  However, I soon found that to be a long and painstaking process.  In the end I gave up and used the recommended approach:  Set up a virtualenv and use:

<NAGARE_HOME>Scriptseasy_install.exe nagare

This turned out to install a lot of stuff.  This is the basic install and a total of 13 packages were installed in addition to Nagare itself, a total of almost 12Mb. The full install of Nagare increases this to a total of 22 packages and 22Mb.

The original plan was to take this and put it in a place where EVE imports its own files from.  But because of the amount of files in question, we ended up keeping them in place, and hacking EVE to import from <NAGARE_HOME>Libsite-packages.

Setting up WSGI

Nagare requires Paste and can make use of the Paste WSGI server.   Fortunately, EVE already has a built-in WSGI service, based on Paste.  It uses the standard socket module, monkeypatched to use StacklessIO tasklet-blocking sockets.  So this part is easy.

Fitting it together

This is where it finally got tricky.  Normally, Nagare is a stand-alone application, managed with config files.  A master script, nagare_admin, then reads those config files and assembles the various Nagare components into a working application.  Unfortunately, documentation about how to do this programmatically was lacking.

However, the good people of Nagare were able to help me out with the steps I needed to take, thus freeing me from having to reverse-engineer a lot of configuration code.  What I needed to do was to create a Publisher, a Session manager and the Nagare application I need to run.  For testing purposes I just wanted to run the admin app that comes with Nagare.

After configuring EVE’s importer to find Nagare and its dependencies in its default install location, the code I ended up with was this:

#Instantiate a Publisher (WSGI application)
from nagare.publishers.common import Publisher
p = Publisher()

#Register the publisher as a WSGI app with our WSGI server
sm.StartService('WSGIService').StartServer(p.urls, 8080) #our WSGI server

#instantiate a simple session manager
from nagare.sessions.memory_sessions import SessionsWithMemoryStates
sm = SessionsWithMemoryStates()

#import the app and configure it
from nagare.admin import serve, util, admin_app
app = admin_app.app
app.set_sessions_manager(sm)

#register the app with the publisher
p.register_application('admin', 'admin', app, app)

#register static resources
def lookup(r, path=r"D:nagareLibsite-packagesnagare-0.3.0-py2.5.eggstatic"):
    return serve.get_file_from_root(path, r)
p.register_static('nagare', lookup)

This gave the expected result. Browsing to port 8080 gave this image:

So, success!  EVE was serving a Nagare app from its backend.

Conclusion

These tests showed that Nagare does indeed work as a backend webserver for EVE.  In particular, the architecture of StaclessIO sockets allows most socket-based applications to just work out of the box.

Also, because Nagare is a Python package, it is inherently programmable.  So, it is possible to configure it to be a part of a larger application, rather than the stand-alone application that it is primarily designed to be.  Using Nagare as a library and not an application, however, wasn’t well documented and I had to have some help from its friendly developers and read the source code to get it to work.

On the other hand, Nagare is a large application.  Not only is Nagare itself a substantial package, it also has a lot of external dependencies.  For an embedded application of Python, such as a computer game, this is a serious drawback.   We like to be very selective about what modules we make available within EVE.  The reasons range from the purely practical (space restraints, versioning hell, build management complexity) to externally driven issues like security and licensing.

It is for this reason that we ultimately decided that Nagare wasn’t right for us as part of the EVE backend web interface.  The backend web interface started out as a minimal HTTP server and we want to keep it as slim as possible.  We are currently in the process of picking and choosing some standard WSGI components and writing special case code for our own use.  This does, however, mean that we miss out on the cool web programming paradigm that is Nagare within EVE.

Evaluating Nagare

Introduction

A little known feature of EVE Online, disabled in the client but very much active in the server, is a web server.  This was added early in the development, before I started on the project back in 2003.  It is the main back-end access point to the game server, used for all kinds of management, status information and debugging.

Back then, Python was much less mature as a web serving framework.  Also we initially wanted just very rudimentary functionality.  So we wrote our own web server.  It, and the site it presents, were collectively called ESP, for Eve Server Pages and over the years it has grown in features and content.  The heaviest use that it sees is as the dashboard for Game Managers, where everything a GM needs to do is done through HTML pages.  It is also one of the main tools for content authoring, where game designers access a special authoring server.  ESP presents a content authoring interface that then gets stored in the backend database.

Recently we have increasingly started to look for alternatives to our homegrown HTTP solution, though.  The main reasons are:

  1. We want to use a standard Python web framework with all the bells and whistles and support that such frameworks offer
  2. We want modern Web 2.0 features without having to write them ourselves
  3. We want something that our web developers can be familiar with already
  4. We want to share expertise between the ESP pages and other web projects run by CCP.  Confluence of synergies and all that.

Stackless Python

EVE Online is based on Stackless Python.  Embedded into the the game engine is a locally patched version of Stackless Python 2.7.  We have been using Stackless since the very beginning, the existence of Stackless being the reason we chose Python as a scripting solution for our game.

Systems like the web server have always depended heavily on the use of Stackless.  Using it we have been able to provide a blocking programming interface to IO which uses asynchronous IO behind the scenes.  This allows for a simple and intuitive programming interface with good performance and excellent scalability.  For some years now we use the in-house developed StacklessIO solution which provides an alternative implementation of the socket module.  By also providing emulation of the threading module by using tasklets instead of threads, many off the shelf components simply work out of the box.  As an example, the standard library’s xmlrpc module, itself based on the socketserver module, just works without modification.

Of course, the use of Stackless python is not limited to IO.  A lot of the complicated game logic takes advantage of its programming model, having numerous systems that run as their own tasklets to do different things.  As with IO, this allows for a more intuitive programming experience.  We can write code in an imperative manner where more traditional solutions would have to rely on event driven approaches, state machines and other patterns that are better left for computers than humans to understand.  This makes CCP very much a Stackless Python shop and we are likely to stay that way for quite a bit.

Nagare

It was therefore with a great deal of interest that we noticed the announcement of Nagare on the Stackless mailing list a few years ago.

Nagare promises a different approach to web development.  Instead of web applications that are in effect giant event handlers (responding to the HTTP connectionless requests) it allows the user to write the web applications imperatively much as one would write desktop applications.  Their web site has a very interesting demo portal with a number of applications demonstrating their paradigm, complete with running apps and source code.

This resonates well with me.  I am of the opinion that the programmer is the slowest part of software development.  Anything that a development environment can do to make a programmer able to express himself in familiar, straight-forward manner is a net win.  This is why we we use tasklet blocking tasklet IO instead of writing a game based on something as befuddling as Twisted.  And for this reason I thought it worthwhile to see if a radical, forward thinking approach to web development might be right for CCP.

Tasklet pickling

Nagare achieves its magic by using a little known Stackless feature called tasklet pickling.  A tasklet that isn’t running can have its execution state pickled and stored as binary data.  The pickle contains the execution frames, local variables and other such things.

Stackless Python contains some extensions to allow pickling of various objects whose pickling isn’t supported in regular Python.  These include frames, modules and generators among other things.

When a Nagare web application reaches the point where interaction with the user is required, its state is pickled and stored with the Nagare server, and a HTTP response sent back to the client.  When a subsequent request arrives, the tasklet is unpickled and its execution continues.  From the programmer’s point of view, a function call simply took a long time.  Behind the scenes, Stackless and Nagare were working its magic, making the inherently stateless HTTP protocol seem like smooth program flow to the application.

IO Model

In other ways, Nagare is a very traditional Python web framework.  It is based around WSGI and so uses whatever threading and IO model provided to it by a WSGI server.

Application Model

Unlike some smaller frameworks, Nagare is designed and distributed as a complete web server.  One typically installs it as the single application of a virualenv root, and then configures and runs it using a script called nagare_admin.  This is very convenient for someone simply running a web server, but it becomes less obvious how to use it as a component in a larger application.

Our tests

What we were interested in doing was to see if Nagare would work within the application that is EVE.  To do this we would have to:

  1. Extract Nagare and its dependencies as a set of packages that can be used with the importing framework that EVE uses.  As I have blogged about before, Python by default assumes a central package directory and doesn’t lend itself well to isolation by embedded applications.  We have done that with EVE, however, and would want nagare  to be an “install free” part of our application.
  2. Set up the necessary WSGI infrastructure within EVE for Nagare to run on
  3. Configure and run Nagare programmatically rather than by using the top level application scripts provided by the Nagare distribution.

I specifically didn’t intend to evaluate Nagare from the web developer’s point of view.  This is because I am not one of those and find the whole domain rather alien.  This would be a technical evaluation from an embedding point of view.

Extracting

My first attempt at setting up Nagare was to fetch the source from its repository.

 svn co svn://www.nagare.org/trunk/nagare

I then intended to fetch any dependencies in a similar manual manner.  However, I soon found that to be a long and painstaking process.  In the end I gave up and used the recommended approach:  Set up a virtualenv and use:

<NAGARE_HOME>Scriptseasy_install.exe nagare

This turned out to install a lot of stuff.  This is the basic install and a total of 13 packages were installed in addition to Nagare itself, a total of almost 12Mb. The full install of Nagare increases this to a total of 22 packages and 22Mb.

The original plan was to take this and put it in a place where EVE imports its own files from.  But because of the amount of files in question, we ended up keeping them in place, and hacking EVE to import from <NAGARE_HOME>Libsite-packages.

Setting up WSGI

Nagare requires Paste and can make use of the Paste WSGI server.   Fortunately, EVE already has a built-in WSGI service, based on Paste.  It uses the standard socket module, monkeypatched to use StacklessIO tasklet-blocking sockets.  So this part is easy.

Fitting it together

This is where it finally got tricky.  Normally, Nagare is a stand-alone application, managed with config files.  A master script, nagare_admin, then reads those config files and assembles the various Nagare components into a working application.  Unfortunately, documentation about how to do this programmatically was lacking.

However, the good people of Nagare were able to help me out with the steps I needed to take, thus freeing me from having to reverse-engineer a lot of configuration code.  What I needed to do was to create a Publisher, a Session manager and the Nagare application I need to run.  For testing purposes I just wanted to run the admin app that comes with Nagare.

After configuring EVE’s importer to find Nagare and its dependencies in its default install location, the code I ended up with was this:

#Instantiate a Publisher (WSGI application)
from nagare.publishers.common import Publisher
p = Publisher()

#Register the publisher as a WSGI app with our WSGI server
sm.StartService('WSGIService').StartServer(p.urls, 8080) #our WSGI server

#instantiate a simple session manager
from nagare.sessions.memory_sessions import SessionsWithMemoryStates
sm = SessionsWithMemoryStates()

#import the app and configure it
from nagare.admin import serve, util, admin_app
app = admin_app.app
app.set_sessions_manager(sm)

#register the app with the publisher
p.register_application('admin', 'admin', app, app)

#register static resources
def lookup(r, path=r"D:nagareLibsite-packagesnagare-0.3.0-py2.5.eggstatic"):
    return serve.get_file_from_root(path, r)
p.register_static('nagare', lookup)

This gave the expected result. Browsing to port 8080 gave this image:

So, success!  EVE was serving a Nagare app from its backend.

Conclusion

These tests showed that Nagare does indeed work as a backend webserver for EVE.  In particular, the architecture of StaclessIO sockets allows most socket-based applications to just work out of the box.

Also, because Nagare is a Python package, it is inherently programmable.  So, it is possible to configure it to be a part of a larger application, rather than the stand-alone application that it is primarily designed to be.  Using Nagare as a library and not an application, however, wasn’t well documented and I had to have some help from its friendly developers and read the source code to get it to work.

On the other hand, Nagare is a large application.  Not only is Nagare itself a substantial package, it also has a lot of external dependencies.  For an embedded application of Python, such as a computer game, this is a serious drawback.   We like to be very selective about what modules we make available within EVE.  The reasons range from the purely practical (space restraints, versioning hell, build management complexity) to externally driven issues like security and licensing.

It is for this reason that we ultimately decided that Nagare wasn’t right for us as part of the EVE backend web interface.  The backend web interface started out as a minimal HTTP server and we want to keep it as slim as possible.  We are currently in the process of picking and choosing some standard WSGI components and writing special case code for our own use.  This does, however, mean that we miss out on the cool web programming paradigm that is Nagare within EVE.

_ssl modifications in 2.7

I recently pushed into action a plan that had been brewing for long while: To make the SSL module in the standard library work for our StacklessIO socket library.

The problem with _ssl is that it just uses internally an native socket BIO.  It doesn’t allow the application to control details of the communications.  This is unsuitable for anyone that is using a non-standard socket implementation.

I admit that I didn’t look around, otherwise I probably would have stumbled across pyOpenSSL I simply assumed that the standard library ssl was the only one and that was clearly unsuitable since it does its own socket IO.

So, anyway, what I did was to add two different features to _ssl.sslwrap():

  1. Instead of only allowing wrapping a socket, one can pass in a Python object. This is assumed to be a PyBIO object, an object that implements the read() and write() methods. the SSLContext object thus created will internally create a BIO pair object and then any “read” or “write” calls on it may invoke corresponding Python callbacks to the PyBIO object to satisfy the SSL context’s need to send or receive data.
    This feature can be used to send the data over whatever Python IO channel one chooses to implement in Python.
  2. Additionally, the user may wrap None. In this case, a “naked” SSLContext will be created, with a BIO pair. This is then suitable for use by any other C code that knows about Open SSL, the layout of PySSLObject and how to pump data back and forth out of a BIO pair. To facilitate this, I export the _ssl api in the same way that _socket does (to _ssl), using a new function PySSLModule_ImportModuleAndAPI(void).
    The purpose of this feature is to be able to use standard Python code to manage certificates and set up the SSL contex, then hand off this context to a custom transport layer written in C/C++.

This is now complete. The changes were not large. I have written extenstions to the test_ssl.py unittests that wrap a regular socket in a PyBIO and the entire thing works.

We are now using this in EVE, to add SSL support to the backend webserver. We need to pipe data trough the _ssl module and back into our code where stackless stack swapping magic happens to emulate blocking IO.

The question that remains, is this: I´d like to contribute this stuff back, but due to Python 2.x being feature frozen, there is no good place for it. Maybe I could push this into 3.x but I haven’t looked. Maybe it does ssl differently.

So it remains, like many other goodies, part of the custom branch of stackless 2.7 that CCP uses. For the time being.