Lately I’ve been playing with asyncio, a new package being introduced in Python 3.4 for rebooted asynchronous IO support in the standard library.
It’s very nice, and despite there being no documentation apart from the PEP at the moment, I’ve found it quite straightforward to work with. I thought I’d share some examples and compare it to my experiences with Gevent and Twisted.
While Gevent and Twisted aim to be higher level frameworks, asyncio aims to be a lower-level implementation of an asynchronous event loop, with the intention that higher level frameworks like Twisted, Gevent or Tornado will build on top of it. That said, it makes a perfectly suitable framework on its own.
By providing a common event loop for all the major frameworks to plug into, the intent is that you can mix and match all the different frameworks together and have it just work.
Here are some quick examples of what the code looks like:
Asynchronous Sleep
Nothing fancy. Just sleep for 5 seconds. But without blocking the main loop.
import asyncio
import time

@asyncio.coroutine
def sleepy():
    print("before sleep", time.time())
    yield from asyncio.sleep(5)
    print("after sleep", time.time())

asyncio.get_event_loop().run_until_complete(sleepy())
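To make the "without blocking the main loop" part visible, here is a minimal sketch that runs two sleeps at once. It is written with the async/await syntax and asyncio.run() from later Python releases rather than the yield from style used in this post, so that it runs on current interpreters; the names sleepy(name, delay) and run_both() are made up for illustration.

```python
import asyncio
import time


async def sleepy(name, delay):
    # Sleeping hands control back to the event loop instead of blocking it.
    await asyncio.sleep(delay)
    return name


async def run_both():
    start = time.monotonic()
    # Both coroutines sleep concurrently, so the total wait is roughly one
    # delay rather than the sum of the two.
    results = await asyncio.gather(sleepy("first", 0.2), sleepy("second", 0.2))
    elapsed = time.monotonic() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_both())
    print(results, "finished in %.2f seconds" % elapsed)
```

If the sleeps blocked the loop, the elapsed time would be about twice as long.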
Simple echo server
This is a simple echo server, used for showing you how to start a server with a given protocol. You can connect to it with telnet 127.0.0.1 4444. Everything you type into the telnet session will be sent back to you from the server.
import asyncio

loop = asyncio.get_event_loop()

# an instance of EchoProtocol will be created for each client connection.
class EchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)

    def connection_lost(self, exc):
        # stop the server as soon as a client disconnects, so the script exits.
        server.close()

# run the coroutine to establish the server connection, then keep running
# the event loop until the server is stopped.
server = loop.run_until_complete(loop.create_server(EchoProtocol, '127.0.0.1', 4444))
loop.run_until_complete(server.wait_closed())
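If you don't have telnet handy, the round trip can be checked from Python as well. This is only a sketch under a few assumptions: it uses the later async/await syntax, it binds to port 0 so the OS picks a free port (instead of 4444), and echo_roundtrip() is a hypothetical helper name. Its EchoProtocol also leaves out the connection_lost hook, because the helper shuts the server down itself.

```python
import asyncio


class EchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Send back whatever the client sent.
        self.transport.write(data)


async def echo_roundtrip(message):
    loop = asyncio.get_running_loop()
    # Port 0 asks the OS for any free port (an assumption made for this sketch).
    server = await loop.create_server(EchoProtocol, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # Connect as a client, send the message and read the same number of bytes back.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message)
    await writer.drain()
    reply = await reader.readexactly(len(message))

    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(echo_roundtrip(b"hello")))
```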
Async clamd client (virus scanner)
This is the longest example, but it’s basically a clamd client. It will open the file named in sys.argv[1] and send it to clamd for scanning. It shows how you can use asyncio.Future objects to communicate between coroutines, and how you create client TCP connections.
import asyncio
import sys
import struct

loop = asyncio.get_event_loop()

class ClamAVProtocol(asyncio.Protocol):
    def __init__(self, future, payload):
        self.future = future
        self.payload = payload
        self.response_data = b''

    def connection_made(self, transport):
        self.transport = transport
        # INSTREAM takes length-prefixed chunks, terminated by a zero-length chunk.
        self.transport.write(b'nINSTREAM\n')
        size = struct.pack(b'!L', len(self.payload))
        self.transport.write(size + self.payload)
        self.transport.write(struct.pack(b'!L', 0))

    def data_received(self, data):
        self.response_data += data
        if b'\n' not in self.response_data:
            return
        self.transport.close()
        response = self.response_data.split(b'\n')[0]
        # set the result on the Future so that the main() coroutine can
        # resume
        if response.endswith(b'FOUND'):
            name = response.split(b':', 1)[1].strip()
            self.future.set_result((True, name))
        else:
            self.future.set_result((False, None))

def clamav_scan(payload):
    future = asyncio.Future()
    if payload:
        scanner = ClamAVProtocol(future, payload)
        # kick off a task to create the connection to clamd.
        # (this was asyncio.async() in the original 3.4.0 API; it was renamed
        # to ensure_future() in 3.4.4.)
        asyncio.ensure_future(loop.create_connection(lambda: scanner, host='127.0.0.1', port=3310))
    else:
        future.set_result((False, None))
    # return the future for the main() coroutine to wait on.
    return future

@asyncio.coroutine
def main():
    with open(sys.argv[1], 'rb') as f:
        body = f.read()
    found_virus, name = yield from clamav_scan(body)
    if found_virus:
        print("Found a virus! %s" % name)
    else:
        print("No virus. Everything is safe.")

if __name__ == '__main__':
    loop.run_until_complete(main())
There’s obviously a lot more to it, but that’s the basic gist of using the library. How does it compare to Twisted and Gevent, though?
Versus Twisted
As @eevee said:
That really does capture it quite well. asyncio’s Protocol class provides much of the same interface as Twisted’s Protocol class, in that you can pause and resume producing on transports, you have connection_made, data_received and connection_lost methods, as well as other things. So in terms of the Protocol/Transport API, Twisted and asyncio can be considered roughly the same. I won’t go into it very much - you can read more in the PEP and draw your own conclusions.
Both libraries provide a way to defer blocking operations to threads so they don’t block the main loop, and to communicate the results back to the main loop when done.
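On the asyncio side, deferring to a thread goes through the loop's run_in_executor() method. A minimal sketch, using the later async/await syntax; blocking_work() is a made-up stand-in for whatever blocking call you need to make.

```python
import asyncio
import time


def blocking_work(seconds):
    # Hypothetical blocking call; it would stall the event loop if called directly.
    time.sleep(seconds)
    return "done after %.1fs" % seconds


async def run_blocking():
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default thread pool executor. The
    # coroutine is suspended until the thread finishes and the result is
    # handed back to the loop.
    return await loop.run_in_executor(None, blocking_work, 0.1)


if __name__ == "__main__":
    print(asyncio.run(run_blocking()))
```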
In Twisted, you typically plug things into the event loop by using methods on twisted.internet.reactor, like callLater. Similarly in asyncio, you plug things into the event loop by using the object returned by asyncio.get_event_loop(). Beware that in asyncio, though, you cannot pass a coroutine to any of the call_* methods, as they expect plain callbacks. This bit me and took me a while to figure out.
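One way around the call_* limitation is to schedule a plain function that in turn wraps the coroutine in a task. The following is a sketch of that idea, not the only way to do it; it uses the later async/await syntax, and greet(), start_greet() and schedule_demo() are names invented for the example.

```python
import asyncio


async def greet(results):
    await asyncio.sleep(0)
    results.append("hello")


async def schedule_demo():
    loop = asyncio.get_running_loop()
    results = []
    done = asyncio.Event()

    def start_greet():
        # call_later() wants a plain callable, so the coroutine is wrapped in
        # a task from inside the callback.
        task = asyncio.ensure_future(greet(results))
        task.add_done_callback(lambda _: done.set())

    loop.call_later(0.05, start_greet)
    await done.wait()
    return results


if __name__ == "__main__":
    print(asyncio.run(schedule_demo()))
```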
Coroutines written for asyncio use yield from syntax to signify asynchronous operations while still looking reasonably synchronous otherwise. This is similar to Twisted’s defer.inlineCallbacks decorator. If callbacks are more your style, though, you can add them to Tasks and Futures with the add_done_callback method.
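For the callback style, a small sketch of add_done_callback on a task, written with the later async/await syntax; compute() and callback_demo() are illustrative names only.

```python
import asyncio


async def compute():
    await asyncio.sleep(0.01)
    return 42


async def callback_demo():
    seen = []
    task = asyncio.ensure_future(compute())
    # The callback receives the finished task (a Future) as its only argument.
    task.add_done_callback(lambda fut: seen.append(fut.result()))
    await task
    # Done callbacks are scheduled with call_soon, so yield once more to be
    # sure the callback has had a chance to run.
    await asyncio.sleep(0)
    return seen


if __name__ == "__main__":
    print(asyncio.run(callback_demo()))
```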
Something that asyncio provides that Twisted doesn’t is asynchronous signal handling. This is something I’ve wanted to see in Twisted for a while, and I’m not sure why it doesn’t have it.
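Signal handlers are registered on the loop with add_signal_handler(), which is only available on Unix event loops and has to be called from the main thread. The sketch below uses the later async/await syntax and sends SIGUSR1 to its own process purely so the example is self-contained; wait_for_signal() is a hypothetical helper name.

```python
import asyncio
import os
import signal


async def wait_for_signal(signum=signal.SIGUSR1):
    loop = asyncio.get_running_loop()
    received = loop.create_future()
    # The handler runs inside the event loop, not in the middle of whatever
    # bytecode happened to be executing when the signal arrived.
    loop.add_signal_handler(signum, received.set_result, signum)
    try:
        # Send the signal to ourselves so there is something to receive.
        os.kill(os.getpid(), signum)
        return await asyncio.wait_for(received, timeout=1)
    finally:
        loop.remove_signal_handler(signum)


if __name__ == "__main__":
    print(asyncio.run(wait_for_signal()))
```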
By far, Twisted’s advantage is in its protocols and helpers, though. It has protocols for just about everything, meaning you very rarely have to implement anything yourself. If you’ve used twisted.web.client.getPage, you’ll feel a bit of frustration using asyncio as it doesn’t have anything like this - you need to implement it all yourself. The same goes for twisted.internet.defer.maybeDeferred - if you want anything like this, you’ll have to implement it yourself. This is basically Twisted’s biggest selling point to me.
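To give an idea of what "implement it yourself" means, here is a rough sketch of a maybeDeferred-like helper for asyncio. It is not a port of Twisted's function, just the same idea under my own assumptions: call something that may or may not be asynchronous and always get an awaitable result. It uses the later async/await syntax, and maybe_await(), sync_double() and async_double() are invented names.

```python
import asyncio
import inspect


async def maybe_await(func, *args, **kwargs):
    # Call func; if it handed back an awaitable, wait for it, otherwise
    # return the plain value unchanged.
    result = func(*args, **kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result


def sync_double(x):
    return x * 2


async def async_double(x):
    await asyncio.sleep(0)
    return x * 2


if __name__ == "__main__":
    print(asyncio.run(maybe_await(sync_double, 2)))
    print(asyncio.run(maybe_await(async_double, 3)))
```

Callers can then await maybe_await(...) without caring which kind of function they were given.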
As I mentioned earlier, asyncio is intended as more of a lower-level implementation, and there are already efforts to run Twisted on it, so this likely won’t be an issue for much longer, but at that point, you’re just running Twisted.
Versus Gevent
Gevent’s big pitch is that it makes synchronous code asynchronous, typically by monkey patching the standard library to make other packages think they’re still running synchronously. This means that if your network-bound application is written synchronously, you can, with a bit of effort, get it running asynchronously under Gevent for a good performance boost under heavier loads. Definitely much less effort than rewriting it to use another library.
As hinted by the monkey patching, Gevent is very big on implicit behaviour. As such, there is no “event loop” object. This goes against the zen of Python in a pretty big way, but I can see the advantage - it’s great at taking existing synchronous code and making it asynchronous without the effort of a full rewrite.
As with all things like this, there are naturally edge cases. Gevent has been working towards a 1.0 release for a few years now (which they shipped today!), and that work has mostly been ironing out these edge cases and ensuring that the API of what is being monkey patched has been replicated thoroughly.
Traditionally, Gevent can patch out the threading module as well, turning threads into Greenlets, which are basically coroutines. This has the disadvantage and implication that you aren’t really supposed to use threads in Gevent-based applications. Compared to both Twisted and asyncio’s ability to defer tasks to threads, it’s a little frustrating. You’re given no primitives for deferring long-blocking operations to threads, leaving you to write your own and avoid monkey patching the threading module.
So what are your options? You could go with a fork-based approach to spread out across multiple cores, but there are caveats to that as well. You could also go with something like celery for your blocking tasks, but it is very unclear from my research how well the client library works with Gevent. But what should you do? That’s unclear. I did say there were edge cases.
I spent a few months working with Gevent for a personal project, and I found the implicit behaviour more mind-bending than Twisted and asyncio’s reactive behaviour. What do I mean by that? Basically in Gevent, you have to know what direction data is (supposed to be) going in. You need to know “at this point, I should be receiving some data on this socket”. If you’re put in a situation where both sides of the connection disagree on this, you’ll enter a deadlock. The same goes for closed connections - every time you do a read, you need to check whether the connection was closed by the other end (i.e. an empty read). You don’t have to do these checks in a reactive event loop; you just react to the event that it was closed.
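The empty-read check looks something like the sketch below. It uses plain standard library sockets rather than Gevent itself, on the assumption that monkey-patched code is written in exactly this blocking style; read_until_closed() and closed_connection_demo() are made-up names, and socket.socketpair() stands in for a real network connection.

```python
import socket


def read_until_closed(sock):
    chunks = []
    while True:
        data = sock.recv(1024)
        if not data:
            # An empty read means the other end closed the connection.
            break
        chunks.append(data)
    return b"".join(chunks)


def closed_connection_demo():
    left, right = socket.socketpair()
    with left, right:
        left.sendall(b"ping")
        # Shut down the writing side so the reader sees end-of-stream.
        left.shutdown(socket.SHUT_WR)
        return read_until_closed(right)


if __name__ == "__main__":
    print(closed_connection_demo())
```

In a Protocol-based design the same situation arrives as a connection_lost call instead, so there is no loop to get wrong.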
By far, though, the biggest thing for me is that Gevent has no Python 3 support. There’s a third party fork that brings Gevent to Python 3, but nothing official. There’s a lot of talk about the matter on the GitHub repo, but no action as of yet.
Conclusions
I like asyncio. It’s basically a minimal Twisted, and I don’t think that’s a bad thing. Reflecting on what I’ve written here, I haven’t actually written very much about it, but that’s because anything I said would basically be the purple monkey dishwasher version of the PEP.
Despite the similarities, I don’t think people will complain about asyncio the way they complain about Twisted. This is because of the way coroutine support has been integrated with yield from from the beginning. The defer.inlineCallbacks decorator was a late addition to Twisted, though, and wasn’t even possible until Python 2.5. That makes it a victim of history and circumstance, I suppose. That and the camelCasing… but whatever. It’s still pretty awesome.
Another complaint that people have with Twisted is that you’re “buying into the framework” because so many other things don’t interact well with it. But that’s true of a lot of things. That’s what Gevent attempts to solve, but as I pointed out earlier, there are a few caveats to this. Even then, you’re never not buying into Gevent - you’re buying into its ability to trick other libraries into thinking that they’re still synchronous. asyncio, however, aims to fix this problem by allowing you to run Twisted and Gevent code side-by-side, provided the glue code exists.
This whole thing kind of reads as a big “asyncio and Twisted vs Gevent-and-Gevent-is-frustrating-sometimes” but it’s really not intended to be like that. They’re all libraries that are good at solving the problems they intend to, and asyncio is a good foundation for bringing them all together.