Idioms for proxy function interfaces

At PyCon 2013 I saw a presentation that used a common function signature:

def call_later(when, function, *args):
    ...

This got me thinking about some guidelines I wrote recently on our internal tech blog about how to write such proxy functions. The current recommendation I have is for a different signature, for the reason I shall now explain:

Let’s say that you have a function that calls another function for some reason. You start with something like this:

def mywrapper(func, *args, **kwargs):
   do_something()
   return func(*args, **kwargs)

At some point though, you add another higher-level wrapper:

def mybigwrapper(func, *args, **kwargs):
   do_something()
   return mywrapper(func, *args, **kwargs)

This is OK until someone notices that it is rather slow. The reason is that the arguments are constantly being packed and unpacked at every layer, unnecessarily so, because no one is really looking at them. So a clever software engineer comes up with a solution:

def mywrapper(func, *args, **kwargs):
   return mywrapper_without_the_stars(func, args, kwargs)

def mywrapper_without_the_stars(func, args, kwargs):
   do_something()
   return func(*args, **kwargs)

def mybigwrapper(func, *args, **kwargs):
   do_something()
   return mywrapper_without_the_stars(func, args, kwargs)

What has happened? Yes, we have created a set of functions that do not take variable arguments, but rather just take the argument tuple and keyword dict. When you nest a number of those, there is no argument packing and unpacking going on and they are all passed through verbatim. We then have a thin layer outside that does the argument packing, for API backwards compatibility.
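To see the difference for yourself, here is a minimal sketch that times the two styles against each other. The names target, star_inner, star_outer, plain_inner, plain_outer and compare are made up for this illustration, and kwargs defaults to None here rather than to a dict. How big the gap is will depend on your interpreter version and nesting depth, so treat the numbers as something to measure, not as a promise.

```python
import timeit


def target(a, b, c=0):
    """Hypothetical function that the wrappers end up calling."""
    return a + b + c


# Star style: every layer packs the arguments into a new tuple and dict,
# then unpacks them again for the next call.
def star_inner(func, *args, **kwargs):
    return func(*args, **kwargs)


def star_outer(func, *args, **kwargs):
    return star_inner(func, *args, **kwargs)


# Tuple style: the same tuple and dict objects are handed through as-is
# and only unpacked once, at the innermost call.
def plain_inner(func, args=(), kwargs=None):
    return func(*args, **(kwargs or {}))


def plain_outer(func, args=(), kwargs=None):
    return plain_inner(func, args, kwargs)


def compare(number=200000):
    """Return (star_seconds, plain_seconds) for `number` calls of each style."""
    args, kwargs = (1, 2), {"c": 3}
    star = timeit.timeit(lambda: star_outer(target, 1, 2, c=3), number=number)
    plain = timeit.timeit(lambda: plain_outer(target, args, kwargs), number=number)
    return star, plain
```

Calling compare() and printing the two numbers gives a rough idea of what the extra packing costs on a particular machine.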

But there is a lesson here: Perhaps it is not such a good idea to do this style of interface in the first place. Why didn’t we just write:

def mywrapper(func, args=(), kwargs={}):
   do_something()
   return func(*args, **kwargs)

to begin with? In my opinion, this is actually a much better interface. To illustrate, let's say that we want to wrap a call to myfunc(1, 2, 3). Compare these two styles:

return mywrapper(myfunc, 1, 2, 3)

 

return mywrapper(myfunc, (1, 2, 3))

In the former case, we are mixing the callable (myfunc) and its arguments (1, 2, 3) into one big list. This doesn’t really make the distinction that “myfunc” is the callable and “1” is its first argument; rather, they look semantically equivalent, as if they were all just a chunk of arguments. In my opinion it is much clearer, when using this sort of proxy function, to make a distinction between the callable and its arguments.

Therefore, this is currently the recommended way within CCP to write such wrappers. They take the argument tuple (and keyword dict) as a non-variable argument to the function.  Variable argument lists are only used in two cases:

  1. When writing a function where that is appropriate, such as logging functions.
  2. When writing wrapper functions that emulate another function’s signature.
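As a rough sketch of those two exceptions: log, traced and scale below are hypothetical names invented for this example, not anything from our codebase. The first is a logging-style function where a variable argument list is the natural interface. The second is a decorator whose inner wrapper has to accept whatever the wrapped function accepts.

```python
import functools


def log(fmt, *args):
    """Case 1: a logging-style function; the extra arguments fill in the format."""
    return fmt % args


def traced(func):
    """Case 2: a wrapper that emulates the signature of whatever it wraps."""
    @functools.wraps(func)  # copy the name and docstring over from func
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@traced
def scale(value, factor=2):
    return value * factor
```

Here scale(3, factor=3) behaves exactly like the undecorated function, which is the whole point of case 2: the caller should not be able to tell that a wrapper is in the way.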

But recently, I have been thinking even more about this because passing around “args” and “kwargs” everywhere seems unnecessarily clunky. And we arrive at the thesis of this blog post:

Wrapper functions should be written and used like this:

# wrapper takes an argument-less callable
def mywrapper(func):
   do_something()
   return func()

# call myfunc with default args
a = mywrapper(myfunc)

# call myfunc with some arguments
a = mywrapper(lambda: myfunc(1, 2, 3))

# call myfunc with something from this context
def call():
    return myfunc(foo, bar)
a = mywrapper(call)

In other words: How about using Python’s powerful lambda and closure semantics to add those arguments if and when they are needed, rather than to write layer upon layer of functions that manually carry around argument tuples and keyword dicts?
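Here is a small runnable version of the idea, as a sketch only. The bodies of do_something and myfunc are stand-ins I made up so that the example produces something to look at. It also shows functools.partial from the standard library, which is an alternative to the lambda: partial captures the argument values when it is created, whereas a lambda looks up the names it uses when it is finally called.

```python
import functools

calls = []  # records each do_something() call, for illustration only


def do_something():
    calls.append("do_something")  # hypothetical stand-in for real work


def myfunc(a=0, b=0, c=0):
    return a + b + c  # hypothetical target function


# The wrapper takes an argument-less callable
def mywrapper(func):
    do_something()
    return func()


# Nesting needs no argument plumbing at all
def mybigwrapper(func):
    do_something()
    return mywrapper(func)


default_result = mywrapper(myfunc)
lambda_result = mywrapper(lambda: myfunc(1, 2, 3))
partial_result = mywrapper(functools.partial(myfunc, 1, 2, c=3))
nested_result = mybigwrapper(lambda: myfunc(1, 2, 3))
```

Note that neither wrapper mentions args or kwargs anywhere, no matter how many layers are stacked up.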

Blog moved

So! My previous blog (at blogs.ccpgames.com) disappeared.  It has taken this long for me to get things back up and running.  The previous blog was on a private server run by an external party and it was compromised by one of those sneaky internet maladies that infect sites like these.

I’m in the process of salvaging all the posts from there and getting them running here.  This is a somewhat painstaking process.  I was provided with some files in a folder and I have had to re-learn Unix (it’s been 10 years since I used that regularly), learn about Apache, WordPress, MySQL, multi-site installs and all kinds of things.  10 years ago, they didn’t have sudo.  I’m not sure that I like it.

Anyway, I hope to finish this soon and start blogging again about my adventures in Python. PyCon 2013 is coming up and I have, as always, some ideas to put forward and Swiss army knives to grind.

Spam filtering

For the past months this blog has been completely overwhelmed with comment spam. I’ve finally had a solution installed (Akismet) and cleaned up the thousands of pending comments. I’ll be able to contribute stuff again!

Float object reuse

I thought I’d mention a cool little patch we did to Python some years back.

We work with database tables a lot.  Game configuration data is essentially rows in a vast database.  And those rows contain a lot of floats.  At some point I recognized that common float values were not being reused.  In particular, id(0.0) != id(0.0).  I was a bit surprised by this, since I figured some floats must be more common than others.  Certainly, 0.0 is a bit special.

I mentioned this on python-dev some years back but with somewhat underwhelming results.  A summary of the discussion can be found here.

Anyway, I thought I’d mention this to people doing a lot of floating point.  We saved a huge amount of memory on our servers just caching integral floating point values between -10 and +10, including both the negative and positive 0.0.  These values are very frequent, for example as multipliers in tables, and so on.

Here’s some of the code:

[C]

PyObject *
PyFloat_FromDouble(double fval)
{
    register PyFloatObject *op;
    int ival;
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL)
            return NULL;
        /* CCP addition, cache common values */
        if (!f_reuse[0]) {
            int i;
            for(i = 0; i<21; i++)
                f_reuse[i] = PyFloat_FromDouble((double)(i-10));
        }
    }
    /* CCP addition, check for recycling */
    ival = (int)fval;
    if ((double)ival == fval && ival>=-10 && ival <= 10) {
#ifdef MS_WINDOWS
        /* ignore the negative zero */
        if (ival || _fpclass(fval) != _FPCLASS_NZ) {
#else
        /* can't differentiate between positive and negative zeroes, ignore both */
        if (ival) {
#endif
            ival+=10;
            if (f_reuse[ival]) {
                Py_INCREF(f_reuse[ival]);
                return f_reuse[ival];
            }
        }
    }

    /* Inline PyObject_New */
    op = free_list;
    free_list = (PyFloatObject *)Py_TYPE(op);
    PyObject_INIT(op, &PyFloat_Type);
    op->ob_fval = fval;
    return (PyObject *) op;
}

[/C]

(Please excuse the lame syntax highlighter with its &amp; and &lt; thingies :)
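For those who cannot patch the interpreter, the same idea can be approximated in pure Python when loading data. This is only a sketch: shared_float and _SMALL_FLOATS are names I made up for it, and it is not the CCP patch itself. Like the Windows branch of the C code above, it leaves negative zero alone so that the sign is never lost.

```python
import math

# One shared float object for each integral value from -10 to +10.
_SMALL_FLOATS = [float(i) for i in range(-10, 11)]


def shared_float(value):
    """Return a shared float object for small integral values, else a plain float."""
    value = float(value)
    # is_integer() is False for inf and nan, so those fall straight through.
    if value.is_integer() and -10.0 <= value <= 10.0:
        # Leave negative zero alone so that its sign is preserved.
        if value == 0.0 and math.copysign(1.0, value) < 0.0:
            return value
        return _SMALL_FLOATS[int(value) + 10]
    return value
```

Running every value through shared_float() while building the rows means that all the 0.0 and 1.0 entries in a table refer to the same few objects. Whether that saves a worthwhile amount of memory depends entirely on the data.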

Temporary thread state overhead

When doing IO, it is sometimes useful for a worker thread to notify Python that something has happened. Previously we have just had the Python main thread poll some external variable for that, but recently we have been experimenting with having the worker thread just grab the GIL and perform Python work itself.

This should be straightforward. Python has an API called PyGILState_Ensure() that can be called on any thread. If that thread doesn’t already have a Python thread state, it will create a temporary one. Such a thread is sometimes called an external thread.

On a server loaded to some 40% with IO, this is what happened when I turned on this feature:

[Figure: process CPU]

The dark gray area is main thread CPU (initially at around 40%) and the rest is other threads.  Turning on the “ThreadWakeup” feature adds some 20% extra CPU work to the process.

When the main thread is not working, it is idle doing a MsgWaitForMultipleObjects() Windows system call (with the GIL unclaimed).  So the worker thread should have no problem acquiring the GIL.  Further, there is only ever one worker thread doing a PyGILState_Ensure()/PyGILState_Release() at the same time, and this is ensured using locking on the worker thread side.

Further tests seem to confirm that if the worker thread already owns a Python thread state, and uses that to acquire the GIL (using a PyEval_RestoreThread() call), this overhead goes away.

This was surprising to me, but it seems to indicate that it is very expensive to “acquire a thread state on demand” to claim the GIL.  This is very unfortunate, because it means that one cannot easily use arbitrary system threads to call into Python without significant overhead.  These might be threads from the Windows thread pool for example, threads that we have no control over and therefore cannot assign thread state to.

I will try to investigate this further, to see where the overhead is coming from.  It could be the extra TLS calls made, or simply the cost of the malloc()/free() involved.  Depending on the results, there are a few options:

  1. Keep a single thread state on the side for (the single) external thread that can claim the GIL at a time, ready and initialized.
  2. Allow an external thread to ‘borrow’ another thread state and not use its own.
  3. Streamline the stuff already present.

Update, Oct. 6th 2011:
Enabling dynamic GIL with thread state caching did nothing to solve this issue.
I think the problem is likely to be that spin locking is in effect for the GIL. I’ll see what happens if I explicitly define the GIL to not use spin locking.

Hey look! It’s a blog!

So, here we are, starting yet another blog.  The purpose of this one is for me to rant about the work I do for CCP.  Most of this will probably be related to Python, or even Stackless Python, but some might be completely different.  Time will tell.


