So I spent the better part of my evening tracking down a bug which turned out to be a “feature” of Python.
I had just installed the GitPlugin for Trac when I started to experience problems. When browsing the source inside Trac, pages took over 30 seconds to load and sometimes failed completely. A lot of searching didn’t help much, so I attempted to debug the problem myself. The first thing I noticed was that Apache was taking 100% of the processor for a good 30 seconds. I attached strace to it and saw something like this:
[pid 22682] close(43029) = -1 EBADF (Bad file descriptor)
[pid 22682] close(43030) = -1 EBADF (Bad file descriptor)
[pid 22682] close(43031) = -1 EBADF (Bad file descriptor)
[pid 22682] close(43032) = -1 EBADF (Bad file descriptor)
[pid 22682] close(43033) = -1 EBADF (Bad file descriptor)
[pid 22682] close(43034) = -1 EBADF (Bad file descriptor)
[pid 22682] close(43035) = -1 EBADF (Bad file descriptor)
[pid 22682] close(43036) = -1 EBADF (Bad file descriptor)
This obviously didn’t look good! After some tinkering I found that the problem went away when I ran Trac standalone instead of under mod_python or fcgi. This turned out to be a bit of a red herring, because I then spent my time trying to figure out what was different between a standalone executable and one running inside Apache.
After playing around with environment variables, I gave up and resorted to printf-debugging the Trac Git plugin. I found that the actual call to git was taking on the order of seconds, whereas calling it myself from the command line took milliseconds. The line of code (in PyGIT.py) looked a bit like this:
p = Popen(self.__build_git_cmd(git_cmd, *cmd_args), stdin=None, stdout=PIPE, stderr=PIPE, close_fds=True)
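For context, close_fds controls whether the child process inherits the parent’s extra file descriptors beyond stdin/stdout/stderr. Here’s a minimal sketch of what it does, assuming a POSIX system and a modern Python 3 (where descriptors are non-inheritable by default, hence the set_inheritable call):

```python
import os
import subprocess
import sys
import tempfile

# Open an extra fd in the parent, and make it inheritable
# (modern Python 3 marks fds non-inheritable by default).
f = tempfile.TemporaryFile()
fd = f.fileno()
os.set_inheritable(fd, True)

# The child fstat()s the fd; exit status 0 means it was inherited.
probe = [sys.executable, "-c", "import os; os.fstat(%d)" % fd]

inherited = subprocess.call(probe, close_fds=False,
                            stderr=subprocess.DEVNULL) == 0
closed_in_child = subprocess.call(probe, close_fds=True,
                                  stderr=subprocess.DEVNULL) != 0
print(inherited, closed_in_child)
```

With close_fds=False the child can fstat the descriptor; with close_fds=True it has been closed before exec — which is the safety guarantee the plugin was paying so dearly for.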
Now, when I removed the close_fds argument the problems went away! After some more digging I found this bug report, which describes the behaviour of close_fds: Python spins in a tight loop calling close() on every possible valid fd number, just in case it was previously used. WTF! You can see the Python code here:
def _close_fds(self, but):
    os.closerange(3, but)
    os.closerange(but + 1, MAXFD)
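To get a feel for how expensive that loop is, you can check the ceiling it runs to: the old subprocess module takes MAXFD from sysconf. A quick way to inspect the equivalent limits on your own system (a sketch, assuming a POSIX platform):

```python
import os
import resource

# The naive loop makes one close() call for every number up to the
# fd ceiling, whether or not the fd is actually open.
maxfd = os.sysconf("SC_OPEN_MAX")
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("SC_OPEN_MAX:", maxfd)
print("RLIMIT_NOFILE soft/hard:", soft, hard)
```

A server that raises its fd limit into the tens of thousands (as Apache commonly does) pays for every one of those close() calls on each process it spawns.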
So the simple fix was to remove close_fds, so that Python doesn’t stupidly spin calling close(). I suspect the reason I only noticed this when running inside Apache is that Apache must have a larger MAXFD. Hopefully in the future Python will change this behaviour and find a more sensible way to close all file descriptors, especially since I read this bug report, which advises changing the close_fds default to true.
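For what it’s worth, a more sensible way does exist on Linux: instead of probing every possible number, a process can ask the kernel which descriptors are actually open by listing /proc/self/fd (roughly what later Python versions ended up doing). A hedged sketch, Linux-only:

```python
import os

def open_fds(keep=(0, 1, 2)):
    """Return the fds currently open in this process, read from
    /proc/self/fd rather than probing every number up to MAXFD.
    (The directory fd used by listdir itself may briefly appear
    in the snapshot, but it is closed again before we return.)"""
    fds = set()
    for name in os.listdir("/proc/self/fd"):
        fd = int(name)
        if fd not in keep:
            fds.add(fd)
    return fds
```

Closing only those entries (minus stdin/stdout/stderr) gives the same safety as close_fds=True without the busy loop over tens of thousands of dead fd numbers.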