I’ve noticed a small pattern emerging in my Twisted code of late, one that can drastically clean up interfaces where threading is involved. Here it is.
Say you have a small piece of Twisted code like this:
    from queue import Queue  # "from Queue import Queue" on Python 2

    from twisted.internet import defer, threads


    class MyThingy(object):
        def __init__(self, bunch_of_stuff):
            self.bunch_of_stuff = bunch_of_stuff
            self.output = Queue()

        def go(self):
            # Run worker() ten times in the reactor's thread pool.
            workers = [threads.deferToThread(self.worker)
                       for i in range(10)]
            return defer.DeferredList(workers)

        def worker(self):
            for i in self.bunch_of_stuff:
                self.output.put(i)
            return self.output


    m = MyThingy(range(20))
    d = m.go()

    @d.addCallback
    def callback(results):
        print('woo results', results)
As it stands, the callback ends up with some really, really ugly results: a list of ten (success, result) pairs, each one carrying the very same queue, which gives you no clear picture of how much work was actually done.
You could do something hacky like pushing the queue through a set, but that’s even worse; it runs the risk of destroying data, removing duplicate results when that’s likely not what you want to do. Alternatively, you could store the output in a local list, rather than on the object, and have the worker return it.
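To make the problem concrete, here is a small sketch in plain Python, no reactor needed. It assumes the usual DeferredList result shape, a list of (success, result) pairs, and builds that list by hand; the names `shared_queue`, `results` and `values` are made up for illustration.

```python
from queue import Queue

shared_queue = Queue()
for i in range(3):
    shared_queue.put(i)

# Hand-built stand-in for what the callback receives when ten workers
# all return the same queue: ten (success, result) pairs.
results = [(True, shared_queue) for _ in range(10)]

# Every pair points at the very same object, so the list tells you
# nothing about which worker produced what.
distinct_queues = set(result for _ok, result in results)
print(len(results), len(distinct_queues))

# The set trick becomes lossy as soon as workers return plain values:
# legitimate duplicates silently disappear.
values = [1, 1, 2]
deduplicated = sorted(set(values))
print(deduplicated)
```

The set collapses the ten entries into one, which looks tidy here, but the same trick applied to real values throws away duplicates you probably wanted to keep.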
Here are the changes I recommend, taking advantage of how Twisted’s Deferred objects work:
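    def go(self):
        # Run worker() ten times in the reactor's thread pool.
        workers = [threads.deferToThread(self.worker)
                   for i in range(10)]
        # fireOnOneErrback makes the list fail as soon as any worker fails;
        # without it a DeferredList never errbacks. consumeErrors keeps the
        # individual failures from being logged as unhandled.
        workers = defer.DeferredList(workers, fireOnOneErrback=True,
                                     consumeErrors=True)
        d = defer.Deferred()

        @workers.addCallback
        def callback(results):
            d.callback(self.output)

        @workers.addErrback
        def errback(reason):
            d.errback(reason)

        return d

    def worker(self):
        for i in self.bunch_of_stuff:
            self.output.put(i)
        # no need to return anything anymore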
It’s doing a little more, but here’s what’s going on.
We’re creating a Deferred object of our own that fires its callbacks (or errbacks) once all the threads are finished, passing the output along on success. By wrapping the threads like this, the handlers of the MyThingy().go() call receive a single set of data, with no need to worry about the problems I detailed above. Not only is this much more straightforward, but the handlers get to do much less work.
So, why not just store the output in a local list, and return it, like I mentioned above? That would be much simpler, wouldn’t it? It would, yes, but you’d still have to merge all the results, so it doesn’t entirely clean up the issue. Going the way I’ve detailed above is nicer for everyone, and from what I can see, the cleanest way possible.
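For comparison, this is roughly the merging step the local-list approach would push onto every handler. It is a sketch in plain Python: the `results` list is hand-built in the (success, result) shape a DeferredList produces, and `merge` is a hypothetical helper name.

```python
# Hand-built stand-in for a DeferredList result when each worker
# returns its own local list.
results = [(True, [0, 1]), (True, [2, 3]), (True, [4])]


def merge(results):
    """Flatten the per-worker lists from successful workers into one list."""
    merged = []
    for ok, chunk in results:
        if ok:  # a failed worker carries a Failure here, not a list
            merged.extend(chunk)
    return merged


merged = merge(results)
print(merged)
```

It is not much code, but every caller of go() would need it, or something like it, before it could touch the data.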
I love Twisted. This is the way that Deferred objects were meant to be used.