Mutable default arguments are your friend.

In a recent comment to an elderly post of mine, I was asked about the following code:

def mywrapper(func, args=(), kwargs={}):
    ...

The commenter thought that I should have made a special mention about using a dict as a default argument, “because it’s such a common gotcha.”

My response is twofold:

  1. This particular case is idiomatic, and widely used for functions that call other functions.
  2. I actually don’t think mutable default arguments are a problem, and they don’t deserve all the stigma they are getting.

I want to expand on point 2 a bit here.
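
To see why the idiom is harmless, consider a minimal sketch (the wrapper body here is hypothetical; the original post elides it). The default dict is created once, at function definition time, but since it is only ever read and never mutated, every call sees the same, safely empty, defaults:

[python]
def mywrapper(func, args=(), kwargs={}):
    # hypothetical body: the defaults are only read, never mutated
    return func(*args, **kwargs)

def greet(name="world"):
    print("hello %s" % name)

mywrapper(greet)                             # hello world
mywrapper(greet, kwargs={"name": "python"})  # hello python
[/python]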


PythonPlus

Time to write a little bit about this little project of mine.

tl;dr

More responsive multithreading in Python 2.7.  30% more requests per second.  Satisfaction guaranteed!

Introduction

After leaving my old job at CCP Games last year, I had the urge to try to collect some of the stuff that we had done for Python 2.7 over there and make it available to the world.  So I started this little fork off 2.7.

The idea was to have a place to add “improvements” to vanilla (as opposed to Stackless) 2.7, so that they could be kept separately and in sync with CPython 2.7.

Thus far, I have mostly been focusing on modernizing the thread support (for a full list of changes, see the whatsnew file).

When we were working on DUST 514 for the PlayStation 3, I had to make certain improvements to make networking work more efficiently on that platform.  We were interfacing Stackless Python with the native HTTP API of the PS3 and had to use blocking worker threads.  Marshaling data from those threads to tasklets was causing unnecessary latency.

We ended up doing a lot of experiments with condition variables, in the end providing native C implementations to minimize GIL thrashing and reduce wakeup latency to the minimum.

In PythonPlus I have done this and some other stuff in order to improve threading performance.

The threading-related changes cover, among other things:

  1. Adding timeout parameters to blocking calls, as in the 3.x API (see the sketch below).
  2. Adding a native threading.Condition object.
  3. Improving the GIL.
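
As a quick illustration of the first item, here is a hypothetical usage sketch of the backported 3.x-style timeout (the 0.5 second value is arbitrary; vanilla 2.7 locks accept no timeout argument, so this only runs on the patched build):

[python]
import threading

lock = threading.Lock()

# 3.x-style acquire with a timeout: returns False if the lock
# could not be acquired within 0.5 seconds
if lock.acquire(True, 0.5):
    lock.release()
[/python]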

Adding a native Condition object aims to reduce the thread thrashing that is otherwise associated with condition variables: with the normal .py version of those constructs, a lot of locking and context switching has to happen for a thread to wake up.  To do this, however, the internal non-recursive locks need to be implemented using a lock and a condition variable themselves, rather than using native semaphore objects.
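
For context, this is roughly what the pure-Python Condition.wait() in CPython 2.7’s threading.py does (paraphrased, with the timeout branch omitted); each wait allocates a fresh lock and acquires it twice, so a single rendezvous involves several lock operations and context switches:

[python]
# paraphrased from threading.py in CPython 2.7 (timeout branch omitted)
def wait(self):
    waiter = _allocate_lock()
    waiter.acquire()                        # first acquire always succeeds
    self.__waiters.append(waiter)
    saved_state = self._release_save()      # release the outer lock
    try:
        waiter.acquire()                    # blocks until notify() releases it
    finally:
        self._acquire_restore(saved_state)  # reacquire the outer lock
[/python]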

Changing the lock types required revisiting the GIL, since the behaviour of the old GIL was just a random side effect of the choice of internal locks.  This also allowed me to address the old Beazley problem while I was at it.

The GIL change is minor.  It is simply a separate function: when a CPU-bound thread wishes to yield the GIL to another thread, it calls a new API function, _PyThread_yield_GIL().  Threads that are trying to re-acquire the GIL after releasing it are considered to be IO threads and get priority for the GIL when a CPU thread yields it.  But if no such thread is present, the GIL won’t actually be handed over in 99 out of every 100 yields.  This minimizes unnecessary thrashing among CPU threads, while still letting IO threads quickly get their foot in the door when required.
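
In rough pseudo-Python, the policy looks something like this (a model of the behaviour described above, not the actual C code; everything except the name _PyThread_yield_GIL is made up):

[python]
YIELD_INTERVAL = 100   # only 1 yield in 100 actually gives up the GIL

def yield_gil(gil):
    # model of _PyThread_yield_GIL(): waiting IO threads get priority,
    # otherwise CPU threads mostly keep running to avoid thrashing
    gil.yield_count += 1
    if gil.io_threads_waiting():
        gil.release()       # an IO thread grabs the GIL here
        gil.acquire()
    elif gil.yield_count % YIELD_INTERVAL == 0:
        gil.release()       # a rare yield to other CPU-bound threads
        gil.acquire()
[/python]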

Performance

I quickly got this all up and running, but then I had to prove it to be actually better than regular 2.7.  To do this, I set up two test scenarios:

  1. Tools/plus/giltest.py – a test platform to measure the performance of concurrent CPU threads, as well as that of pairs of producer/consumer threads synchronized with either threading.Condition or threading.Lock (a miniature of such a pair follows below).
  2. Tools/plus/testserver.py – a multithreaded web server using a thread pool and socketserver.py, exercised by ab.
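
As a flavour of what giltest.py measures, here is a hypothetical miniature of one producer/consumer pair (the real benchmark ships in Tools/plus/):

[python]
import threading
import time

cond = threading.Condition()
items = []
transactions = [0]

def producer():
    while True:
        with cond:
            while items:            # bounded buffer of one item
                cond.wait()
            items.append(None)
            cond.notify()

def consumer():
    while True:
        with cond:
            while not items:
                cond.wait()
            items.pop()
            transactions[0] += 1
            cond.notify()

for func in (producer, consumer):
    t = threading.Thread(target=func)
    t.daemon = True
    t.start()

time.sleep(1.0)
print("transactions/sec: %d" % transactions[0])
[/python]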

On Windows, I found it easy to see improvements.  I got the GIL to behave better and the web server to increase its throughput.  Producer/consumer pairs using condition variables got a real performance boost, and IO threads got priority over regular CPU-bound threads, as expected.

My virtual Linux box, however, was more disappointing.  Tests showed that just replacing the native non-recursive lock, previously based on the posix sem_t object, with a construct using pthread_mutex_t and pthread_cond_t slowed down execution.

Fixing Linux

I decided that there ought to be no good reason for a pthread_cond_t to be so much slower than a sem_t, so I wrote my own condition object using a sem_t.  To make a long story short, it worked: my emulated condition variable (written using a pthread_mutex_t and a sem_t) is faster than a pthread_cond_t, at least on my dual-core virtual box.  Go figure.

The making of this new condition variable is a topic for a blog post on its own.  I doggedly refused to look up other implementations of condition variables based on semaphores, and wanted to come up with a solution on my own that did not violate the more subtle promises that the protocol makes.  Along the way, I was guided by failing unittests of the threading.Barrier class, which relies on the underlying threading.Condition to be true to its promise.  I was actually stuck on this problem for a few months, but after a recent eureka moment I think I succeeded.

The results

So, this has been some months in the making.  I set up the header files so that various aspects of my patch could be switched on or off, and a macro especially for performance testing then sets these in a sensible way.

giltest.py

First, the results from giltest.py, with various settings of the macro, on both Windows and Linux:

[figure: giltest.py results, Windows and Linux]

Some notes on this are in order.

  1. e is “efficiency”: the CPU throughput of two concurrent CPU threads (each incrementing a variable), compared to just one thread.
  2. prod/con is a pair of producer/consumer threads using a threading.Lock primitive; the column shows the number of transactions within a fixed time-frame (one second).
  3. The green bit shows why a GIL improvement was necessary: IO threads just couldn’t get any priority over a CPU thread.  This column shows prod/con transactions in the presence of a CPU thread.
  4. In the end, the improvements on Linux are modest, perhaps because of my virtual machine.  But the Beazley problem is fixed, and IO responsiveness picks up.  On Windows the effect is more pronounced.
  5. The final column is a pair of producer/consumer threads synchronized using a threading.Condition object.  Notice how on Windows performance picks up almost threefold, ending up at some 60% of a pair synchronized with a threading.Lock.

testserver.py

Now for some more real-world-like results.  Here the aim was to show that many parallel requests are handled better by the new system.  Again, improvements on Linux are harder to gauge.  In fact, my initial attempts were so disappointing on Linux that I almost scrapped the project.  But when I thought to rewrite the condition variable, things changed.

[figure: testserver.py results, Windows and Linux]

  1. Notice how performance picks up with the “emulated condvar” on Linux (green boxes); on Windows, it is always emulated.
  2. p=1 and p=10 are the number of parallel requests being made.  ab is single-threaded: it shoots off n requests and then waits for them all to finish before doing the next batch, so this is perhaps not completely real-world.
  3. On Linux, rps (requests per second) goes up in the multi-threaded case, both when we add the new GIL (better IO responsiveness) and when we add the native threading.Condition.  Combined, they improve throughput by 30%.
  4. On Windows, we see the same, except that the biggest improvement comes from modifying the locks (orange boxes).
  5. On Windows, we achieve better throughput with multithreading, i.e. multiple parallel requests now work better than single requests, whereas on Linux, multiple requests performed worse.

Conclusion

These tests were performed on a dual-core laptop running Windows 7.  The Linux tests were done in a virtual Ubuntu machine on the same laptop, using two CPUs.  I’m sure that the virtual nature has its effect on the results, and so, caveat emptor.

Overall, we get a 30% improvement in responsiveness when multiple threads are making requests using this new threading code in Python Plus.  For real-world applications serving web pages, that ought to matter.

On Windows, the native implementation of threading.Condition provides a staggering 167% boost in the performance of two threads doing a rendezvous via a condition variable.

While optimizing the Linux case, I discovered that pthread_cond_t is curiously inefficient.  A “greedy” implementation of a condition variable using the posix sem_t showed a dramatic improvement on my virtual machine.  I haven’t replicated this on native Linux, but I suspect that the implementors of the pthread library are using explicit scheduling, whereas we rely on the presumably greedy scheduling semantics of the semaphore primitive.  But perhaps a separate blog post on this is in order, after some more research.

Fun stuff.

Blog moved

So! My previous blog (at blogs.ccpgames.com) disappeared, and it has taken this long for me to get things back up and running.  The previous blog was on a private server run by an external party, and it was compromised by one of those sneaky internet maladies that infect sites like these.

I’m in the process of salvaging all the posts from there and getting them running here.  This is a somewhat painstaking process.  I was provided with some files in a folder, and I have had to re-learn unix (it’s been 10 years since I used it regularly) and learn about Apache, WordPress, MySQL, multi-site installs and all kinds of things.  10 years ago, they didn’t have sudo.  I’m not sure that I like it.

Anyway, I hope to finish this soon and start blogging again about my adventures in Python. PyCon 2013 is coming up and I have, as always, some ideas to put forward and Swiss army knives to grind.

Zombieframes. A gratuitous optimization?

Examining a recent crash case, I stumbled across this code in frameobject.c:

PyFrameObject *
PyFrame_New(PyThreadState *tstate, PyCodeObject *code, PyObject *globals,
            PyObject *locals)
    ...
    if (code->co_zombieframe != NULL) {
        f = code->co_zombieframe;
        code->co_zombieframe = NULL;
        _Py_NewReference((PyObject *)f);
        assert(f->f_code == code);
    }

Intrigued by the name, I examined the header where it is defined, code.h:

...
    void *co_zombieframe;   /* for optimization only (see frameobject.c) */
...
} PyCodeObject;

It turns out that for every PyCodeObject that has been executed, a PyFrameObject of a suitable size is cached and kept with the code object. Now, caching is fine and good, but this cache is unbounded: every code object has the potential to hang on to a frame, which may then never be released.
Further, there is already a separate freelist cache for PyFrameObjects, used in case a frame is not found on the code object:

    if (free_list == NULL) {
        f = PyObject_GC_NewVar(PyFrameObject, &PyFrame_Type,
                               extras);
        if (f == NULL) {
            Py_DECREF(builtins);
            return NULL;
        }
    }
    else {
        assert(numfree > 0);
        --numfree;
        f = free_list;
        free_list = free_list->f_back;
    ...

Always conscious of memory these days, I tried disabling this in version 3.3 and running the pybench test.  I was not able to see any conclusive difference in execution speed.

Update:

Disabling the zombieframe on the PS3 shaved some 50k off startup.  Not the jackpot, but still, small things add up.

-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.3.0a3+ (default, May 23 2012, 20:02:34) [MSC v.1600 64 bit (AMD64)]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.perf_counter
* timer: resolution=2.9680909446810176e-07, implementation=QueryPerformanceCounter()

-------------------------------------------------------------------------------
Benchmark: nozombie
-------------------------------------------------------------------------------

Rounds: 10
Warp: 10
Timer: time.perf_counter

Machine Details:
Platform ID: Windows-7-6.1.7601-SP1
Processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel

Python:
Implementation: CPython
Executable: D:\pydev\hg\cpython2\pcbuild\amd64\python.exe
Version: 3.3.0a3+
Compiler: MSC v.1600 64 bit (AMD64)
Bits: 64bit
Build: May 23 2012 20:02:34 (#default)
Unicode: UCS4

-------------------------------------------------------------------------------
Comparing with: zombie
-------------------------------------------------------------------------------

Rounds: 10
Warp: 10
Timer: time.perf_counter

Machine Details:
Platform ID: Windows-7-6.1.7601-SP1
Processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel

Python:
Implementation: CPython
Executable: D:\pydev\hg\cpython2\pcbuild\amd64\python.exe
Version: 3.3.0a3+
Compiler: MSC v.1600 64 bit (AMD64)
Bits: 64bit
Build: May 23 2012 20:00:42 (#default)
Unicode: UCS4

Test                         minimum run-time        average run-time
                             this   other   diff     this   other   diff
-------------------------------------------------------------------------------
BuiltinFunctionCalls: 51ms 52ms -3.3% 52ms 53ms -2.0%
BuiltinMethodLookup: 33ms 33ms +0.0% 34ms 34ms +0.8%
CompareFloats: 50ms 50ms +0.1% 50ms 50ms +0.4%
CompareFloatsIntegers: 99ms 98ms +0.8% 99ms 99ms +0.6%
CompareIntegers: 77ms 77ms -0.5% 77ms 77ms -0.3%
CompareInternedStrings: 60ms 60ms +0.0% 61ms 61ms -0.1%
CompareLongs: 46ms 45ms +1.5% 46ms 45ms +1.2%
CompareStrings: 61ms 59ms +3.6% 61ms 59ms +3.6%
ComplexPythonFunctionCalls: 60ms 58ms +3.3% 60ms 58ms +3.2%
ConcatStrings: 48ms 47ms +2.4% 48ms 47ms +2.1%
CreateInstances: 58ms 57ms +1.3% 59ms 58ms +1.3%
CreateNewInstances: 43ms 43ms +1.1% 44ms 44ms +1.1%
CreateStringsWithConcat: 79ms 79ms -0.3% 79ms 79ms -0.1%
DictCreation: 71ms 71ms +0.4% 72ms 72ms +1.0%
DictWithFloatKeys: 72ms 70ms +2.1% 72ms 71ms +1.8%
DictWithIntegerKeys: 46ms 46ms +0.7% 46ms 46ms +0.4%
DictWithStringKeys: 41ms 41ms +0.0% 41ms 41ms -0.1%
ForLoops: 35ms 37ms -4.0% 35ms 37ms -4.0%
IfThenElse: 64ms 64ms -0.1% 64ms 64ms -0.4%
ListSlicing: 49ms 50ms -1.0% 53ms 53ms -0.8%
NestedForLoops: 54ms 51ms +6.7% 55ms 51ms +6.7%
NestedListComprehensions: 54ms 54ms -0.7% 54ms 55ms -2.2%
NormalClassAttribute: 94ms 94ms +0.1% 94ms 94ms +0.1%
NormalInstanceAttribute: 54ms 54ms +0.3% 54ms 54ms +0.2%
PythonFunctionCalls: 58ms 57ms +0.8% 58ms 58ms +0.6%
PythonMethodCalls: 65ms 61ms +6.3% 66ms 62ms +5.9%
Recursion: 84ms 85ms -1.0% 85ms 85ms -0.9%
SecondImport: 74ms 76ms -2.5% 74ms 77ms -3.5%
SecondPackageImport: 75ms 78ms -3.8% 76ms 79ms -3.9%
SecondSubmoduleImport: 163ms 169ms -3.4% 164ms 170ms -3.3%
SimpleComplexArithmetic: 43ms 43ms +1.0% 43ms 43ms +1.0%
SimpleDictManipulation: 80ms 78ms +2.2% 81ms 79ms +2.4%
SimpleFloatArithmetic: 42ms 42ms +0.1% 42ms 42ms -0.0%
SimpleIntFloatArithmetic: 52ms 53ms -1.2% 52ms 53ms -1.1%
SimpleIntegerArithmetic: 52ms 52ms -0.7% 52ms 53ms -0.8%
SimpleListComprehensions: 45ms 45ms -0.2% 45ms 45ms +0.3%
SimpleListManipulation: 44ms 46ms -4.0% 44ms 46ms -3.9%
SimpleLongArithmetic: 32ms 32ms -0.9% 32ms 32ms -0.1%
SmallLists: 58ms 57ms +1.2% 58ms 67ms -12.8%
SmallTuples: 64ms 65ms -0.5% 65ms 65ms -0.2%
SpecialClassAttribute: 148ms 149ms -0.8% 149ms 150ms -1.0%
SpecialInstanceAttribute: 54ms 54ms +0.2% 54ms 54ms +0.0%
StringMappings: 120ms 117ms +2.5% 120ms 117ms +2.5%
StringPredicates: 62ms 62ms +0.9% 62ms 62ms +1.0%
StringSlicing: 69ms 68ms +1.6% 69ms 68ms +2.1%
TryExcept: 37ms 37ms +0.0% 37ms 37ms +0.5%
TryFinally: 40ms 37ms +6.7% 40ms 37ms +6.5%
TryRaiseExcept: 19ms 20ms -1.0% 20ms 20ms -0.4%
TupleSlicing: 65ms 65ms +0.5% 66ms 65ms +1.2%
WithFinally: 57ms 56ms +1.9% 57ms 56ms +2.1%
WithRaiseExcept: 53ms 53ms +0.3% 54ms 54ms -0.8%
-------------------------------------------------------------------------------
Totals: 3154ms 3145ms +0.3% 3176ms 3177ms -0.0%

(this=nozombie, other=zombie)

I’m going to remove this weird, unbounded cache from the Python interpreter we use on the PS3.

Clearing weakrefs

I just had a problem that would have been elegantly solved by the ability to manually clear the weak references pointing to an object. I am (for technical reasons) recycling an object: instead of killing it and re-creating it, I re-initialize it. But that leaves old weak references in place. How nice would it be to be able to call “myobject.clear_weakrefs()”?
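
A minimal sketch of the problem (the Connection class is hypothetical):

[python]
import weakref

class Connection(object):
    def initialize(self):
        self.buffer = []

conn = Connection()
conn.initialize()
ref = weakref.ref(conn)   # someone, somewhere, takes a weak reference

conn.initialize()         # "recycle" the object by re-initializing it

assert ref() is conn      # the stale weakref still sees the recycled
                          # object; there is no clear_weakrefs() to sever it
[/python]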

Temporary thread state overhead

When doing IO, it is sometimes useful for a worker thread to notify Python that something has happened. Previously, we just had the Python main thread “poll” some external variable for that, but recently we have been experimenting with having the worker thread grab the GIL and perform the Python work itself.

This should be straightforward. Python has an API, PyGILState_Ensure(), that can be called on any thread. If the thread doesn’t already have a Python thread state, a temporary one is created for it. Such a thread is sometimes called an external thread.

On a server under some 40% load from IO, this is what happened when I turned this feature on:

[figure: process CPU usage, before and after enabling ThreadWakeup]

The dark gray area is main-thread CPU (initially at around 40%) and the rest is other threads.  Turning on the “ThreadWakeup” feature adds some 20% of extra CPU work to the process.

When the main thread is not working, it sits idle in a MsgWaitForMultipleObjects() Windows system call (with the GIL unclaimed), so the worker thread should have no problem acquiring the GIL.  Further, only one worker thread at a time ever does a PyGILState_Ensure()/PyGILState_Release() pair, and this is ensured using locking on the worker-thread side.

Further tests seem to confirm that if the worker thread already owns a Python thread state and uses that to acquire the GIL (with a PyEval_RestoreThread() call), this overhead goes away.

This surprised me, but it seems to indicate that it is very expensive to “acquire a thread state on demand” in order to claim the GIL.  This is very unfortunate, because it means that one cannot easily use arbitrary system threads to call into Python without significant overhead.  These might be threads from the Windows thread pool, for example: threads that we have no control over and therefore cannot assign thread states to.

I will try to investigate this further, to see where the overhead is coming from.  It could be the extra TLS calls made, or simply the cost of the malloc()/free() involved.  Depending on the results, there are a few options:

  1. Keep a single thread state on the side, ready and initialized, for the (single) external thread that can claim the GIL at any one time.
  2. Allow an external thread to ‘borrow’ another thread state rather than using its own.
  3. Streamline the machinery already present.

Update, Oct. 6th 2011:
Enabling the dynamic GIL with thread state caching did nothing to solve this issue.
I think the problem is likely to be that spin locking is in effect for the GIL. I’ll see what happens if I explicitly define the GIL to not use spin locking.

namedtuple and exec()

In our port of Python 2.7 to the PS3 console, we have deliberately removed the Python compiler. This was mainly done to save on code size, since on a console every byte is sacred.  An additional benefit is a slight hardening against certain kinds of attacks, since evil constructs such as eval() and exec() now raise NotImplementedError when used.

Program code is pre-compiled and put into .zip archives, so there is no need for regular compilation on the console. The most serious problem we encountered, though, was with the new namedtuple construct.

The namedtuple is implemented in the collections module by constructing a class declaration with string interpolation and then calling exec() on it. With exec() removed, a lot of the standard library turned out to fail on import.
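
To illustrate, here is a paraphrased miniature of that mechanism (the real template in collections.py is much larger; Point and its fields are just an example):

[python]
def make_point():
    # build class source as a string and exec() it, just like
    # collections.namedtuple does; this fails where exec is removed
    source = (
        "class Point(tuple):\n"
        "    x = property(lambda self: self[0])\n"
        "    y = property(lambda self: self[1])\n"
    )
    namespace = {}
    exec source in namespace
    return namespace["Point"]

Point = make_point()
print(Point((1, 2)).x)  # 1
[/python]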

Our initial fix was simply to replace the namedtuples with regular tuples:
[python]
def namedtuple(typename, field_names, verbose=False, rename=False):
    return tuple
[/python]
This worked surprisingly well. The parts of the library we were using treated their namedtuples just like regular tuples, and all was well.

Recently, however, we found that the urlparse module was making non-trivial use of it, so something needed to be done.  My initial reflex was to dive in and reimplement namedtuple using a metaclass or some such. But then I thought of asking the internet.

It turns out that this already exists as an issue in the Python bug tracker.  Someone else had come across this oddity in the standard library and submitted an alternative implementation, which works perfectly for our purposes.
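
For flavour, here is a minimal exec-free sketch in the same spirit; this is not the actual patch from the tracker issue, just an illustration that type() and properties can do the job:

[python]
from operator import itemgetter

def namedtuple(typename, field_names, verbose=False, rename=False):
    # minimal and exec-free: no _make, _replace or other conveniences
    if isinstance(field_names, basestring):
        field_names = field_names.replace(",", " ").split()
    fields = tuple(map(str, field_names))

    def __new__(cls, *args):
        return tuple.__new__(cls, args)

    namespace = {"__new__": __new__, "_fields": fields, "__slots__": ()}
    for index, name in enumerate(fields):
        namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), namespace)

Point = namedtuple("Point", "x y")
print(Point(3, 4).x)  # 3
[/python]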

I know that there is nothing inherently evil about using exec in Python, but this particular case still doesn’t quite ring true to me:  if the best way to implement a class is by resorting to the meta-language, doesn’t that indicate some shortcoming in the language itself?

Lazy Import

As Richard Tew mentioned on his blog, we are using the lazy importing trick to reduce memory overhead.

We improved the original module by adding some features:

  • A simpler and more robust injection mechanism using gc.get_referrers().
  • Support for zip imports.
  • Support for reload().
  • A reporting capability.
  • Support for bypassing selected modules or packages.

The last bit is important because some modules may be used internally by the C runtime, and those cannot be treated this way.  In our case, we actually import this module from C right after importing site.py, in order to get the maximum benefit, so we may be seeing more problems than the casual user who imports it from a .py file.

I don’t have any other good place to put this, so I’m just leaving it here for the time being.

[python]
# Copyright rPath, Inc., 2006
# Available under the python license
"""Defines an on-demand importer that only actually loads modules when their
attributes are accessed. NOTE: if the ondemand module is viewed using
introspection, like dir(), isinstance, etc, it will appear as a
ModuleProxy, not a module, and will not have the correct attributes.
Barring introspection, however, the module will behave as normal.
"""

# modified for CCP by Kristján Valur Jónsson:
# - Use the gc.get_referrers() method to replace module references
# - Add zip support
# - support reload()
# - Add reporting and memory analysis
# - Add bypass mechanism for modules where this causes problems

import sys
import imp
import gc
import __builtin__
import zipimport

memory_query_func = None  # set this to something returning memory use
verbose = False

ModuleType = type(sys)

# modules that bypass this mechanism
ignorenames = set()  # module names
ignorepkg = set()    # package names
ignorepath = set()   # paths to ignore

# side effect: registers string unicode handling / conversion
ignorenames |= set(["encodings"])
# side effect: prevents internal Python borrowed reference choking
ignorenames |= set(["warnings"])

# statistics
proxies = set()
proxyTally = 0
reals = set()
ignored = set()
existing = set(k for k, v in sys.modules.iteritems() if v)


def report(arg=""):
    if not verbose:
        return
    loaded = arg.startswith("load ")
    if loaded:
        if memory_query_func is not None:
            print >> sys.stderr, "lazyimport: %s (now using %0.3f Mb)" % (arg, memory_query_func())
        else:
            print >> sys.stderr, "lazyimport: %s" % arg
    else:
        if memory_query_func is not None:
            print >> sys.stderr, "lazyimport report: %s (now using %0.3f Mb)" % (arg, memory_query_func())
        else:
            print >> sys.stderr, "lazyimport report: %s" % arg

    if verbose > 1 or not loaded:
        print >> sys.stderr, "proxy imported %d %r" % (len(proxies), sorted(proxies))
        print >> sys.stderr, "proxy imported (maximum size reached) %d" % proxyTally
        print >> sys.stderr, "fully imported (pre lazyimport) %d %r" % (len(existing), sorted(existing))
        print >> sys.stderr, "fully imported (via lazyimport) %d %r" % (len(reals), sorted(reals))
        print >> sys.stderr, "fully imported (via allowed bypass) %d %r" % (len(ignored), sorted(ignored))

        modules = set(k for k, v in sys.modules.iteritems() if v)
        diff = modules - reals - proxies - ignored - existing
        print >> sys.stderr, "fully imported (lost track of these) %d %r" % (len(diff), sorted(diff))

        builtins = set(sys.builtin_module_names)
        diff = builtins & proxies
        print >> sys.stderr, "builtins (proxied) %d %r" % (len(diff), diff)
        diff = builtins & (reals | existing)
        print >> sys.stderr, "builtins (fully imported) %d %r" % (len(diff), diff)
        diff = builtins - proxies - reals - existing
        print >> sys.stderr, "builtins (not imported) %d %r" % (len(diff), diff)


def loadModule(proxy, name, loader):
    # see if the module is already loaded
    mod = sys.modules.get(name, None)
    # avoid isinstance on the proxy, because it will cause a __class__
    # lookup and that causes recursion
    if mod is not proxy and isinstance(mod, ModuleType):
        return mod

    # load the module
    mod = loader.load_module(name)
    replaceModule(proxy, mod)

    reals.add(name)
    try:
        proxies.remove(name)
    except KeyError:
        pass
    report("load " + name)
    return mod


def replaceModule(proxy, mod):
    """Find all dicts where proxy is, and replace it with the actual module.
    Typically, this is sys.modules and any module dicts.
    """
    for e in gc.get_referrers(proxy):
        if isinstance(e, dict):
            for k, v in e.iteritems():
                if v is proxy:
                    e[k] = mod


class ModuleProxy(object):
    def __init__(self, name, loader):
        global proxyTally
        object.__setattr__(self, "_args", (name, loader))
        proxies.add(name)
        proxyTally += 1
        #report("proxy " + name)

    # we don't add any docs for the module in case the
    # user tries accessing '__doc__'
    def __getattribute__(self, key):
        if key in ["_args"]:
            return object.__getattribute__(self, key)
        mod = loadModule(self, *self._args)
        return getattr(mod, key)

    def __setattr__(self, key, value):
        mod = loadModule(self, *self._args)
        setattr(mod, key, value)

    def __dir__(self):
        # modules have special dir handling, invoke that.
        return dir(loadModule(self, *self._args))

    def __repr__(self):
        # repr format is approximate; the original string was lost in transcription
        return "<ModuleProxy %r>" % (self._args,)


class StandardLoader(object):
    """A class that wraps the standard imp.load_module into
    the new style object hook api, for consistency here.
    """
    def __init__(self, pathname, desc):
        self.pathname, self.desc = pathname, desc

    def __repr__(self):
        # repr format is approximate; the original string was lost in transcription
        return "<StandardLoader %r, %r>" % (self.pathname, self.desc)

    def load_module(self, fullname):
        try:
            f = open(self.pathname, 'U')
        except:
            f = None
        try:
            return imp.load_module(fullname, f, self.pathname, self.desc)
        finally:
            if f:
                f.close()


class OnDemandLoader(object):
    """The loader takes a name and the real loader of the module to load,
    and "loads" it - in this case returning a proxy that will only load
    the module when an attribute is accessed.
    """
    def __init__(self, real_loader):
        self.real_loader = real_loader

    def load_module(self, fullname):
        mod = sys.modules.get(fullname)
        if not mod:
            mod = ModuleProxy(fullname, self.real_loader)
            sys.modules[fullname] = mod
        return mod


class OnDemandImporter(object):
    """The on-demand importer imports a module proxy that
    inserts the desired module into the calling scope only when
    an attribute from the module is actually used.
    """
    def find_module(self, fullname, path=None):
        if path:
            # only bother with sub-modules if they are being loaded
            # correctly, i.e. the parent module is already in sys.modules
            head, tail = fullname.rsplit('.', 1)
            if not sys.modules.get(head):
                return None
        else:
            tail = fullname

        # See if the module can be found. It might be trying a relative
        # import for example, so often modules are not found.
        try:
            f, pathname, desc = imp.find_module(tail, path)
            if f:
                f.close()
        except ImportError:
            return None  # no zip found either

        # Now, ignore some modules that we don't want.
        # Since this is the meta_path, we just pass it on to the
        # rest of the machinery, i.e. pretend not to have found it.
        if ignore_module(fullname, pathname):
            return None

        # Ok, we are going to load this lazily
        real_loader = StandardLoader(pathname, desc)
        return OnDemandLoader(real_loader)


class OnDemandZipImporter(object):
    def __init__(self, path):
        importer = zipimport.zipimporter(path)
        self.real_importer = importer
        self.is_package = importer.is_package
        self.get_code = importer.get_code
        self.get_source = importer.get_source
        self.get_data = importer.get_data
        self.get_filename = importer.get_filename

    def find_module(self, fullname, path=None):
        result = self.real_importer.find_module(fullname, path)
        if result is None:
            return None
        return self

    def load_module(self, fullname):
        if ignore_module(fullname, self.real_importer.archive):
            return self.real_importer.load_module(fullname)

        mod = sys.modules.get(fullname)
        if not mod:
            mod = ModuleProxy(fullname, self.real_importer)
            sys.modules[fullname] = mod
        return mod


onDemandImporter = OnDemandImporter()
RealReload = reload


def LazyReload(module):
    if type(module) is ModuleType:
        return RealReload(module)


def install():
    if onDemandImporter not in sys.meta_path:
        sys.meta_path.append(onDemandImporter)
        try:
            idx = sys.path_hooks.index(zipimport.zipimporter)
            sys.path_hooks[idx] = OnDemandZipImporter
        except ValueError:
            pass

        __builtin__.reload = LazyReload


def uninstall():
    try:
        sys.meta_path.remove(onDemandImporter)
        try:
            idx = sys.path_hooks.index(OnDemandZipImporter)
            sys.path_hooks[idx] = zipimport.zipimporter
        except ValueError:
            pass
    except ValueError:
        return
    __builtin__.reload = RealReload


def ignore_module(fullname, pathname=None):
    """
    See if we want to ignore demand-loading of this module for any reason.
    """
    ignore = False
    if fullname in ignorenames:
        ignore = True
    for pkg in ignorepkg:
        if fullname.startswith(pkg):
            ignore = True
    if pathname:
        for path in ignorepath:
            if path in pathname.lower():
                ignore = True
    if ignore:
        ignored.add(fullname)
    return ignore

[/python]
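
For completeness, a hypothetical usage sketch, assuming the code above is saved as lazyimport.py (the modules chosen for the demonstration are arbitrary):

[python]
import lazyimport

lazyimport.verbose = True
lazyimport.ignorepkg.add("logging")  # bypass a problematic package

lazyimport.install()

import xml.dom                # only a ModuleProxy is created here
xml.dom.getDOMImplementation  # first attribute access loads it for real

lazyimport.report()           # print the bookkeeping statistics
[/python]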