Winsock timeout / closesocket()

I was working on StacklessIO for Stackless Python. When running the test_socket unittest, I came across a single failure: The testInsideTimeout would fail, with the server receiving a ECONNRESET when trying to write its “done!” string.

Normally this shouldn’t happen. Because even though the client has closed the connection, a RST packet is only sent in response to the send() call, so at the time of calling send all appears well and the call should succeed.

Investigating this further, it turned out to be due to the way I’m implementing timeout.

StacklessIO uses Winsock overlapped IO.  A request is issued and then when it is finished a thread waiting on a IOCompletion port will wake up and cause the waiting tasklet to be resumed.  To time out, for example, a recv() call, I schedule a Windows timer as well.  If it fires before the request is done, the tasklet is woken up with a timeout error.  There appears to be no way in the API to cancel a pending IO request, so at this point, the IO is still pending.

Anyway, this all is well and good, but where does the RST come from then?  Well, when the timeout occurs, the tasklet wakes up and the socket is closed.  And calling closesocket() on a connection with pending IO has at least two effects, only one of which is documented:

  1. All pending IO is canceled with the WSA_OPERATION_ABORTED error.
  2. A RST is sent to the remote party

I’ve never seen the latter behaviour documented.  But apparently then, calling closesocket() when IO is pending is equivalent to an abortive close().

I’m not sure if this is significant.  If a socket call times out, the usual recommendation is to close the connection anyway since the connection may be in an undefined state due to race conditions.  But it is a bit annoying all the same.

Advertisement

selectmodule on PS3

As part of a game that we are developing on the Sony PS3, we are porting Stackless Python 2.7 to run on it.  What would be more natural?

Python is to do what it does best, act as a supervising puppetmaster, running high-level logic that you don’t want to code using something primitive such as C++.

While there are many issues that we have come across and addressed (most are minor), I’m just going to mention a particular issue with sockets.

An important role of Python will be to manage network connection using sockets.  We are using the Stacklesssocket module with Stackless Python do do this.  For this to work, we need not only the socketmodule but also the selectmodule.

The PS3 networking api is mostly BSD compliant with a few quirks.  Porting these modules to use it is mostly a straightforward affair of providing #defines for some API calls that have different names, dealing with APIs that are missing (return NotImplementedError and such) and mapping error values to something sensible.  For the most part, it behaves exactly as you would expect.

We ran into one error though, prompting me to write special handling code for select() and poll().  the Sony implementation of these functions not only returns with an error if they themselves fail (which would be serious) but they also indicate a error return if a socket error is pending on one of the sockets.  For example, if a socket receives a ECONNRESET, while you are waiting for data to arrive, select() will return with an ECONNRESET error indicator.  Not what you would expect.

The workaround is to simply filter the error values from select() and poll() and ignore the unexpected socket errors.  Rather, such a return must be considered a successful select()/poll() and for the latter function, ‘n’, the number of valid file descriptors, having been set as -1, must be recreated by walking the list of file descriptors.