Be careful about socket timeouts in Python
23 Dec 2023 - John Z. Li
The socket
package in Python’s standard library has an API to set timeout,
that is, the settimeout
member function. my_socket.settimeout(None)
sets
my_socket
to blocking mode, and my_socket.settimeout(0.0)
sets my_socket
to non-blocking mode. (They are equivalent to calling my_socket.setblocking(True)
and my_socket.setblocking(False)
, respectively.) When settimeout
is called
with a paramter of positive normal floating pointer number, the socket is set to
blocking mode with the specified timeout in seconds.
If you are a programmer who are farmilar with Linux’s TCP stack, it is tempting
to think that the timeout mechanism of Python’s socket library is implemented via
socket options SO_RCVTIMEO
and SO_SNDTIMEO
. These two options specify that
when receiving or sending data via a blocking TCP, the corresponding operations
are blocked for this period of time at most (specified using setsockopt
with
struct timeval
). The behavior of these two options are as below:
- The default values of them are
0
, meaning no timeout will never occur for blocking mode TCP’s, and having no effect on non-blocking mode TCP’s. - Only affect socket system calls that perform I/O, that is
accept
,connect
,send
,recv
and friends, but notselect
,poll
oropoll_wait
. - For I/O related system calls, if some data has been transferred before timeout,
the number of bytes transferred will be returned; If no data is transfered before
timeout,
-1
is returned with errno set toEAGAIN
orEWOULDBLOCK
, orEINPROGRESS
. (The last one is only forconnect
.)
We can see that the Linux TCP stack is designed with flexibility in mind. It allows
the user to twist how I/O operations should be done to best suit his needs. For
example, if SO_SNDTIMEO
is set to 1
second, and send
is called on a
blocking-mode TCP with a very large chuck of data, say 1GiB
,
send
is guaranteed to return within 1
second. This means, a very large size
of data being written or read can not freeze the whole application, which sometimes
is important.
Back to Python’s socket
in its standard library, it is tempting to assume that
settimeout()
works in a similar fansion. But that is not what happens. Users
of Python are sometimes caught off guard if they hold the wrong assumption.
For example, if the timeout
of a blocking socket is set to 5
seconds, is it
guaranteed that the send
function call will retuan within 5
seconds? The answer
is NO. The reason is that in each send
or recv
call on sockets, a select
is
first called with the specified timeout. If the corresponding send
or recv
can
be performed after the select
call, it move forward to actually send or receive
data from the socket. But for blocking sockets, once the sending or receiving starts,
unless there is an error, the system call will only return after all data is sent
for send
, or either all available data is received given a big enough receiving
buffer.
We can see that the design of Python’s socket
module emphasize on ease of use
at the cost of some performance loss. This means even you use epoll
to wiat on
I/O events on sockets, at the time of actually calling send
or recv
, there is
always the overhead of calling select
. This is obviously suboptimal in terms of
performance. The bright side of this design is that it makes writing simple message
exchange applications easy. This means, for blocking sockets, the send
function
is a All or nothing interface, as well as the recv
function given a big
enough receiving buffer. With this design, the users usually do not have to worry
about message fragmentation at the socket level.
This design align well with the overall design goals of Python, that is ease of use.
Some other Python network related libraties inherit the same design principles of
socket
, for example, with the below code, what is the expected behavior here?
from requests import Session
session = Session()
session.get("http://example.com/example.zip", timeout = 5)
# equivalent to
# session.get("http://example.com/example.zip", timeout = (5, 5))
It is easy to think that the session.get()
function call is guaranteed to
return in 10
seconds (5
seconds for connection timeout and 5
seconds for receiving
data). The actual behavior is that if the ‘zip’ file is very big, if the server
starts sending data within the timeout, the function will only return after the whole
file has been received. The documentation of requests
is explicit about this, quote
timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). Say, if a remote server sends 1 bytes per 4 seconds,
session.get
is totally fine with it and will never trigger a timeout exception. In a sense, this is also a All or nothing interface.