17 July 2012

Futures as a Design Pattern for Refactoring

We've been using the NDB library in our app to manage all interactions with the DB. With NDB you use Futures to express asynchronous Python code. It feels a lot like gevent mixed with SQLAlchemy's ORM, but more thoughtfully integrated. What's interesting is how future-based programming has emerged as a design pattern for refactoring for performance. Let me explain:

We often have code that looks like this, especially for dashboards, where we need to do N independent queries to construct a view:
class DashboardHandler(RequestHandler):
  def get(self):
    user = MyUserModel.get_by_id(self.request.get('user_id'))

    context = {}

    posts = MyPosts.query().filter(
        MyPosts.user == user).fetch(10)
    for post in posts:
      # Populate the template context dictionary...

    comments = MyComments.query().filter(
        MyComments.user == user).fetch(10)
    for comment in comments:
      # Populate the template context dictionary...

    self.render('my_template.html', context)

Initially this is fine because it's only a couple of queries in serial. But eventually our DashboardHandler grows to 10 or 20 separate operations that must be joined together to construct the response, and at that point running them serially is excruciatingly slow. The handler also gets long, as the various methods that fetch and traverse objects all accumulate inside get().

Using NDB we split code like this apart. We find the logical units of work in the DB (focusing on what's being fetched) and break them out into methods that are tasklets, which execute concurrently:
class DashboardHandler(RequestHandler):
  @toplevel
  def get(self):
    user = MyUserModel.get_by_id(self.request.get('user_id'))

    post_data = self.get_posts(user)  # These return futures
    comment_data = self.get_comments(user)

    context = {}
    context.update((yield post_data))
    context.update((yield comment_data))
    self.render('my_template.html', context)

  @tasklet
  def get_posts(self, user):
    context = {}
    # fetch_async() returns a future; yield it to get the results
    posts = yield MyPosts.query().filter(
        MyPosts.user == user).fetch_async(10)
    for post in posts:
      # Populate the local template context dictionary...
    raise Return(context)

  @tasklet
  def get_comments(self, user):
    context = {}
    # fetch_async() returns a future; yield it to get the results
    comments = yield MyComments.query().filter(
        MyComments.user == user).fetch_async(10)
    for comment in comments:
      # Populate the local template context dictionary...
    raise Return(context)

Now get_posts() and get_comments() run concurrently: while one tasklet is waiting on its RPC, the event loop runs the other, so the handler's latency approaches that of the slowest query rather than the sum of all of them. (This is overlapped I/O on a single thread, not parallel CPU work.) At the same time we've refactored our code to be more readable, logically separated, and potentially reusable. But it still reads like procedural code and can be tested synchronously, like this:
class MyTest(TestCase):
  def test_get_posts(self):
    handler = DashboardHandler()
    user = MyUserModel(id='my name')
    post = MyPosts(user=user)
    post.put()

    future = handler.get_posts(user)
    self.assertEquals(dict(posts=post), future.get_result())

So this style of refactoring using futures is all win for very little effort. With little more than copying and pasting code sections you can get tremendous latency improvements through overlapped I/O. And it's far easier to understand than continuation-passing-style asynchronous programming. In general I wish more APIs worked this way. Maybe Tornado could be paired with something similar for a non-App-Engine solution?
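Outside App Engine, the same refactoring shape is available with nothing but the standard library's concurrent.futures module (Python 3.2+). This is a minimal sketch, not NDB: fetch_posts() and fetch_comments() are hypothetical stand-ins for blocking DB queries, and a thread pool provides the overlapped I/O that NDB's event loop provides for tasklets:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_posts(user):
    # Stand-in for a blocking query that populates a context dict
    return {'posts': ['post by %s' % user]}


def fetch_comments(user):
    # Stand-in for another blocking query
    return {'comments': ['comment by %s' % user]}


def build_context(user):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both queries are submitted immediately and run concurrently;
        # result() blocks only until that particular future resolves.
        post_future = pool.submit(fetch_posts, user)
        comment_future = pool.submit(fetch_comments, user)

        context = {}
        context.update(post_future.result())
        context.update(comment_future.result())
    return context
```

As with the tasklet version, the caller still reads like procedural code and each unit of work can be tested synchronously by calling result() on its future.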


As an aside, this also illustrates why I'm not optimistic about nodejs's longevity. Programmers don't understand asynchronous programming, even when they've been warned. Futures are the sanest way to transition a synchronous project to async without using threads. When I see resistance to officially adopting the libraries that make Node code feel imperative, its future as an application platform looks grim to me. I think what gets popular is what's easy to learn and solves a real need; what lasts is what's easy to make robust and fast.


© 2009-2014 Brett Slatkin