24 April 2015

Response to "The Long-Term Problem With Dynamically Typed Languages"

I enjoyed this post quite a bit: "The Long-Term Problem With Dynamically Typed Languages". I think he's got some great points. I especially like the analogy with the "broken windows effect". It's interesting to hear about someone's experience using a software system or practice for a long time.

The best data point I have on this personally is my current project/team. The codebase is over 500KLOC now. The majority of it is in Python, followed by JS. I’ve been working on it since the beginning—over 4 years. We’ve built components and then extended them way beyond their original design goals. There’s a lot of technical debt. Some of it we’ve paid down through refactoring. Other parts we’ve rewritten. Mostly we live with it.

As time has gone on, we’ve gained a better understanding of the problem domain. The architecture of the software system we want is very different than what we have or what we started with. Now we’re spending our time figuring out how to get from where we are to where we want to be without having to rewrite everything from scratch.

I agree we have the lava layers problem the author describes, with multiple APIs to do the same thing. But I’m not sure if we would spend our time unifying them if we had some kind of miraculous tooling afforded by static types.

Our time is better spent reevaluating our architecture and enabling new use-cases. For example, one change we’ve been working towards reduces the turn-around time for a particular data analysis pipeline from 30 minutes to 1 millisecond (6 orders of magnitude). Now our product will be able to do a whole bunch of cool stuff that was impossible before. It took a lot of prototyping to get here. I don’t think static types would have helped.

My team’s biggest problem has always been answering the question: “How do we stay in business?” We've optimized for existence. We’ve had to adapt our system to enable product changes that make our users happy. Maybe once your product definition is so stable, like Google Search or Facebook Timeline, you can focus on a codebase that scales to 10,000 engineers and 10+ years of longevity. I haven't worked on such a project in my career. For me the requirements are always changing.

(Originally from my comment here)
And now an official Google blog post about Borg is up. To get all the info read the paper here.

22 April 2015

Another epic post by Aphyr about MongoDB's consistency problems.

19 April 2015

Some updates to dpxdt

I landed a few updates to Depicted today:

  • We now use virtualenv to manage dependencies. No more git submodules! That was a huge mistake. At the time (two years ago) I thought pip was just as bad. But now I'm fine with it.
  • I rewrote the deployment instructions to use App Engine Managed VMs. It's now 10x easier to deploy. Still non-trivial because Google's Cloud Console is so complicated.
  • Instructions for local dpxdt are now at the top of the README, thanks to Dan. I moved the whole set of dpxdt local instructions over. Hopefully this will make the project less scary for newbies.

What's left: Make it so you can install the whole server with pip install dpxdt_server and be done with it.

18 April 2015

This post explains how to implement the core API of React using jQuery. Good to understand.

The future is not Borg

The New Stack has an interesting write-up about Google's long-secret Borg system. I can't say anything specific about this and I haven't read the paper.

What I will say is when I first arrived at Google in 2005 I felt like I was stepping into the future. Tools like Docker, CoreOS, and Mesos are 10 years behind what Borg provided long ago, according to The New Stack's write-up. Following that delayed timeline, I wonder how long it will be before people realize that all of this server orchestration business is a waste of time?

Ultimately, what you really want is to never think about systems like Borg that schedule processes to run on machines. That's the wrong level of abstraction. You want something like App Engine, vintage 2008 platform as a service, where you run a single command to deploy your system to production with zero configuration.

Kubernetes is interesting to watch, but I worry that it suffers from requiring too much configuration (see this way-too-long "guestbook example" for what I mean). Amazon's Container Service or Google's Container Engine may make such tools more approachable, but it's still very early days.

I believe systems like Borg are necessary infrastructure, but they should be yet another component you take for granted (like your kernel, a disk driver, x86 instructions, etc).

15 April 2015

Out of stock

Effective Python sold out thanks to PyCon 2015! My publisher had to do an emergency second printing a few weeks ago because it looked like this was a possibility. Luckily, the book will be restocked everywhere on April 17. Amazon still has some left in their warehouse in the meantime. I'm happy to know people are enjoying it!

12 April 2015

Links from PyCon 2015

Instead of writing a conference overview, here's an assorted list of links for things I saw, heard of, or wondered about while attending talks and meeting new people at PyCon this year. These are in no particular order and I'm missing a bunch I forgot.

  • "SageMath is a free open-source mathematics software system licensed under the GPL. It builds on top of many existing open-source packages: NumPy, SciPy, matplotlib, Sympy, Maxima, GAP, FLINT, R and many more. Access their combined power through a common, Python-based language or directly via interfaces or wrappers."
  • "ROS [Robot Operating system] is an open-source, meta-operating system for your robot. It provides the services you would expect from an operating system, including hardware abstraction, low-level device control, implementation of commonly-used functionality, message-passing between processes, and package management."
  • "GeoJSON is a format for encoding a variety of geographic data structures."
  • "GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types."
  • "N-grams to the rescue! A collection of unigrams (what bag of words is) cannot capture phrases and multi-word expressions, effectively disregarding any word order dependence. Additionally, the bag of words model doesn’t account for potential misspellings or word derivations."
  • "A Few Useful Things to Know about Machine Learning: This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions."
  • "Compare randomized search and grid search for optimizing hyperparameters of a random forest. All parameters that influence the learning are searched simultaneously (except for the number of estimators, which poses a time / quality tradeoff)."
  • SainSmart 4-Axis Control Palletizing Robot Arm Model For Arduino UNO MEGA250
  • Nanopore sequencing of DNA
  • Advanced C++ Programming Styles and Idioms
  • "diff-cover: Automatically find diff lines that need test coverage. Also finds diff lines that have violations (according to tools such as pep8, pyflakes, flake8, or pylint). This is used as a code quality metric during code reviews."
  • "FuzzyWuzzy: Fuzzy string matching like a boss."
  • "Glue is a Python library to explore relationships within and among related datasets."
  • "Tabula: If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there's no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface."
  • "Blaze expressions: Blaze abstracts tabular computation, providing uniform access to a variety of database technologies"
  • "AppVeyor: Continuous Delivery service for Windows" (aka: Travis CI for Windows)
  • "Microsoft Visual C++ Compiler for Python 2.7: This package contains the compiler and set of system headers necessary for producing binary wheels for Python 2.7 packages."
  • "Ghost Inspector lets you create and manage UI tests that check specific functionality in your website or application. We execute these tests continuously from the cloud and alert you if anything breaks."
  • "Think Stats: Probability and Statistics for Programmers"
  • "The bytearray class is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has."
  • "git webdiff: Two-column web-based git difftool"
  • "Bug: use HTTPS by default for uploading packages to pypi"

If you ever wonder why people go to conferences, this is why. You get exposure to a wide range of topics in a very short period of time, plus you get to meet people who are excited about everything and inspire you to learn more.

11 April 2015

How to Be More Effective with Functions

Here are the slides and a video of my talk from PyCon 2015 Montréal. I'm still at the conference now and looking forward to another great day of talks. They're uploading the videos amazingly quickly, so be sure to check out the full list here.

10 April 2015

Facebook updated their open source patent clause

Facebook published an updated version of their patent clause. I'm not sure if this new clause is better than the old one (update: see more from DannyB here). It's definitely more words. See this previous post to understand why the old one was bad.

Here's the wdiff between them (also in this gist):

Additional Grant of Patent Rights {+Version 2+}
"Software" means [-fbcunn-] {+the osquery+} software distributed by Facebook, Inc.
{+Facebook, Inc. ("Facebook")+} hereby grants [-you-] {+to each recipient of the Software
("you")+} a perpetual, worldwide, royalty-free, non-exclusive, irrevocable
(subject to the termination provision below) license under any
[-rights in any patent claims owned by Facebook,-] {+Necessary
Claims,+} to make, have made, use, sell, offer to sell, import, and otherwise
transfer the Software. For avoidance of doubt, no license is granted under
Facebook’s rights in any patent claims that are infringed by (i) modifications
to the Software made by you or [-a-] {+any+} third [-party,-] {+party+} or (ii) the Software in
combination with any software or other [-technology
provided by you or a third party.-] {+technology.+}
The license granted hereunder will terminate, automatically and without notice,
[-for anyone that makes any claim (including by filing-]
{+if you (or+} any [-lawsuit, assertion or
other action) alleging (a) direct, indirect,-] {+of your subsidiaries, corporate affiliates or agents) initiate
directly+} or [-contributory infringement-] {+indirectly,+} or
[-inducement to infringe-] {+take a direct financial interest in,+} any [-patent:-] {+Patent
Assertion:+} (i) [-by-] {+against+} Facebook or any of its subsidiaries or {+corporate+}
affiliates, [-whether or not such claim is related to the Software,-] (ii) [-by-] {+against+} any party if such [-claim-] {+Patent Assertion+} arises in whole or
in part from any software, {+technology,+} product or service of Facebook or any of
its subsidiaries or {+corporate+} affiliates, [-whether or not
such claim is related to the Software,-] or (iii) [-by-] {+against+} any party relating
to the
[-Software;-] {+Software. Notwithstanding the foregoing, if Facebook+} or [-(b) that-] any [-right-] {+of its
subsidiaries or corporate affiliates files a lawsuit alleging patent
infringement against you+} in [-any-] {+the first instance, and you respond by filing a+}
patent {+infringement counterclaim in that lawsuit against that party that is
unrelated to the Software, the license granted hereunder will not terminate
under section (i) of this paragraph due to such counterclaim.
A "Necessary Claim" is a+} claim of {+a patent owned by+} Facebook {+that+} is [-invalid-]
{+necessarily infringed by the Software standing alone.
A "Patent Assertion" is any lawsuit or other action alleging direct, indirect,
or contributory infringement or inducement to infringe any patent, including a
cross-claim+} or
[-unenforceable.-] {+counterclaim.+}

09 April 2015

I'm giving a talk at PyCon Montréal tomorrow, April 10th. More info is here. Will post slides and code as soon as it's done!

01 April 2015

What is "founder code"?

I heard a fun story last week from Andy Smith about something he calls "founder code". This is code that's in your codebase that was written by a cofounder of the project. The purpose of founder code is to demonstrate the intended outcome of design choices. It goes beyond a requirements or design document because it's exactly specific about how the software is supposed to work.

Founder code is great for illustrating an important architectural choice (e.g., how to make a system scalable). Founder code is crucial for developer tools where the product is a system for building software (it's worthless to design developer tools without using them yourself). Founder code is especially helpful in UX design. Complex animations, transitions, and interactions can be hard to describe. Data visualizations take a long time to play with before you discover the right combination that's compelling.

The concept of "founder code" also explains a lot of the bad things I've seen in codebases over the years. You can imagine this is the mindset of a programmer solving a problem the first time around: "Better to have something working the slow way than not at all. Maybe someday we can make that awful bookkeeping function fast by using a better architecture. For now, this is going to get us by and provide the functionality we think our users need."

Looking back, this may justify my style in general. My teammates have described my code as having a particular flavor. When most programmers encounter a barrier in their path, they'll often find a way around it by writing a bit more code, a better abstraction, etc. They do something that solves the problem, but still preserves energy for the road ahead. When I encounter a barrier, I often find a way to bust through the center of it, even if it means causing long-term damage to the codebase or my forward momentum. My goal is not a sustainable, long-term implementation. The purpose of the code is to convey behavior.

Founder code is fundamentally just another name for prototyping. When a team has grown to the point where other (better) programmers have time to rewrite the original founder code, that's success. The job of founders is to hire the right people to rebuild the current or next version of the system correctly. Thus, the ultimate goal of founder code is to become obsolete. Andy pointed out that founders are totally comfortable with this idea, where many other developers may be enraged by the sight of their code being ripped out and replaced (it's perceived as spiteful).

For my current project, most of my original work has been replaced (some of it I even rewrote myself in a better way — I'm not always bad!). But even though my code is gone, the behaviors I outlined in the first prototypes carry on. Many of the fundamental assumptions and design choices I made still exist, for better or worse. This means I can still help my teammates reason about the codebase, even though it's changed so much since I wrote the first line over four years ago.

29 March 2015

"What I Wish I Knew When Learning Haskell" seems both useful and cautionary.

20 March 2015

Fun Setup of Dillon Markey, a stop-motion animator. He uses a Power Glove to do it:

19 March 2015

I got a nice shoutout from Scott Meyers on his blog (about Effective Python). Thanks!

08 March 2015

Effective Python is now published! Early birds get 35% off by following this link and using the discount code EFFPY. If you're reading this, thanks for your interest and support over the past year!

05 March 2015

First hard copy of Effective Python I've seen! Bizarre but awesome to have a year of work in your hand.

23 February 2015

Ghost of Threading Past

I got an interesting link from Luciano Ramalho in reference to the asynchronous Python discussion: A paper called "Why Events Are A Bad Idea (for high-concurrency servers)" from Eric Brewer's lab in 2003. I hadn't seen this one before. The abstract says:

We examine the claimed strengths of events over threads and show that the weaknesses of threads are artifacts of specific threading implementations and not inherent to the threading paradigm. As evidence, we present a user-level thread package that scales to 100,000 threads and achieves excellent performance in a web server. We also refine the duality argument of Lauer and Needham, which implies that good implementations of thread systems and event systems will have similar performance. Finally, we argue that compiler support for thread systems is a fruitful area for future research. It is a mistake to attempt high concurrency without help from the compiler, and we discuss several enhancements that are enabled by relatively simple compiler changes.

Fixing threads must have been a popular subject back then. I remember that Linux got fast user-space mutexes in December 2003. Python's PEP 342 "Coroutines via Enhanced Generators" is from May 2005. But that design builds on rejected PEP 325 "Resource-Release Support for Generators" from August 2003, which outlined a way to make generators more cooperative. Jeremy Hylton also wrote about "Scaling [Python] to 100,000 Threads" in September 2003. Coroutines are clearly a big part of the paper's solution and Python had them early on.

Node.js brought back the popularity of an event loop in May 2009. Many programmers rejoiced, others lamented its implications. Node is bound to a single thread like Python. It can only take advantage of multiple CPUs using subprocesses, also like Python. It's funny that Python's asyncio now focuses on event loops, like Node, for better asynchronous programming. But Node and JavaScript won't get async coroutines until ECMAScript 7. At least it's got a good JIT and you can use ES6 features today.

Go, released in March 2011, addresses all of the issues from the paper with its goroutines and channels. The paper mentions the importance of dynamic stack growth, which Go addressed with segmented stacks (as of Go 1.4, it uses a stack reallocation model instead). The paper suggests compiler support for preventing race conditions, which Go has built-in. The paper says that threads are bad for dynamic fan-in and fan-out, but Go and Python's asyncio solve those use-cases, too. Seems to me that Go is winning the race these days.

If I could retitle the original paper, I'd call it: "Why Coroutines Are a Good Idea". What's old is new; what's new is old — always.

22 February 2015

Two more cool tools from Facebook on their way to open source: Relay and GraphQL. Too bad the license makes them worthless.

21 February 2015

Cool story about a 3D printed shaving razor.
Here's a gem from r/Machinists. Made on a DMU 65 MonoBlock (a 5 axis milling machine). 1.3 million lines of G-code from Esprit. Some day I want to know how to do this!

17 February 2015

"Introducing DataFrames in Spark for Large Scale Data Science". Spark just keeps looking better and better.

16 February 2015

Python's asyncio is for composition, not raw performance

Mike Bayer (of SQLAlchemy and Mako) wrote a post entitled "Asynchronous Python and Databases". I enjoyed it and I think it's worth your time to read. He put Python's new asyncio library to the test and concludes:

My point is that when it comes to stereotypical database logic, there are no advantages to using [asyncio] versus a traditional threaded approach, and you can likely expect a small to moderate decrease in performance, not an increase.

The benchmarks he ran are interesting, but initially I thought they missed the biggest advantage of asyncio: trivial concurrency through coroutines. I gave a talk about that at PyCon 2014 (slides here). I think what asyncio enables is remarkable. Later on, Mike linked me to another reply of his and now I think I understand where he's coming from.

Basically: If you're dealing with a transactional database, why would you care about the type of concurrency that asyncio enables? When you're using a database with strong consistency, you need to wrap all operations in a transaction, you need to provide a clear ordering of execution, locking, etc. He's right that asynchronous Python doesn't help in this situation alone.

But what if you want to mix queries through SQLAlchemy with lookups in Memcache and Redis, and simultaneously enqueue some background tasks with Celery? This is when asyncio shines: When you want to use many disparate systems together with the same asynchronous programming model. Asyncio makes it trivial to compose these types of infrastructure.

Asyncio won't win in benchmarks that focus on raw performance, as Mike showed. But asyncio will be faster in practice when there are parallel RPCs to distributed systems. It's Amdahl's law in action. What I mean specifically is cases where you issue N coroutines and wait for them all later:

def do_work(ip_address, session_id):
  # Issue two RPCs
  session_future = memcache.get(session_id)
  location_future = geocoder.lookup(ip_address)

  # Wait for both
  done, _ = yield from asyncio.wait([session_future, location_future])
  session, location = done

  # Now do something with both results ...

The majority of my experience in asynchronous Python comes from the NDB library for App Engine, which was a precursor to asyncio and is very similar. In that environment, you can access all of the APIs (Database, memcache, URLFetch, Task queues, RPCs, etc) with a unified asynchronous model. Our codebase that uses NDB employs asynchronous coroutines almost everywhere. That makes it simple to combine many parallel RPCs into workflows and larger pipelines.

Here's a simplified example of one pipeline from my day job. You can think of this as a very basic search engine. Note how many parallel coroutines are executed.

  1. Receive an HTTP request
  2. In parallel:
    1. Send RPC to geocode the IP address
    2. Send RPC to lookup inbound IP in a remote database
      1. After receiving response, in parallel:
        1. Lookup N rate limiters in N memcache shards
      2. Return whether inbound IP is over rate limits
    3. Lookup the user's session in memcache
      1. If it's missing, create the new session object, then in parallel:
        1. Enqueue a task to save the session to the DB
        2. Populate the session into memcache
        3. Set the user session response header
      2. Return the session (new or existing)
  3. Wait for geocode and session RPCs to finish
  4. In parallel:
    1. Do N separate queries based on the user's attributes
      1. First, look in memcache for cached data by attribute
      2. If memcache is missing or empty, do a database query
      3. Look up query results in rate limiting cache
  5. As queries finish (i.e., asyncio.as_completed)
    1. Rank results by relevance
    2. Return best result after all queries finish
  6. Wait for rate limit check from #2b above
    1. If the rate limits are over, return the 503 response and abort
  7. In parallel:
    1. Update result rate limiting caches
    2. Enqueue task to log ranking decision
    3. Enqueue task to update user's session in database
    4. Update user's session in memcache
    5. Start writing response
  8. Wait for all coroutines to finish

This pipeline has grown a lot over time. It began as a simple linear process. Now it's 5 layers "deep" and 10 parallel coroutines "wide" in some places. But it's still straightforward to test and expand because coroutines make the asynchronous boundaries clear to new readers of the code.

I can't wait to have this kind of composability throughout the Python ecosystem! Such a unified asynchronous programming model has been a secret weapon for our team. I hope that Mike enables asyncio for SQLAlchemy because I want to use it along with other tools, asynchronously. My goal isn't to speed up the use of SQLAlchemy alone.

11 February 2015

Nest case study

I'm proud of how Nest uses Google Surveys to make decisions:

Word of the day is kerf: "the width of material that is removed by a cutting process".

06 February 2015

Video version of Effective Python

I'm working on a LiveLesson video version of Effective Python. I stopped by the Pearson office in San Francisco to start recording.

They have a lot of books in the office!

Here's the audio booth where I'm making the screencasts:

I wrote a Sublime Text plugin for the code demoing part of the video. As soon as I save a file it synchronously runs the Python script, sends its output to a file, and then immediately reloads the other pane to show the result. Here's an example video of it in action:

My favorite part is that I have two key commands: one for Python 3 (the default) and one for Python 2. It runs the code out of the same buffer. This makes it really easy to show the differences between the versions and how little is required to apply the advice from the book to either environment.

05 February 2015

Helpful reply about why people liked the Lisp Machines.

04 February 2015

Consider the source

I like reading people's opinions about building software, but it’s really important to consider the source. I think a lot of advice for programmers out there isn’t contextualized enough.

Examples of metadata I want to know:

  • Is the author running an agency? Do they just need to build X more things faster?
  • Is the author building a 10+ year company? Do they need long-term infrastructure?
  • Is the author a frontend, backend, full-stack, functional, etc programmer?
  • Does the author manage? Are they a tech lead? Do they write code every day?

Knowing the answer to such questions determines whether or not the author's guidance applies to your situation. It's highly likely that most of the advice you see out there isn't actually relevant to you at all. Including this post :)

(Originally from my comment here)
Apache Flink looks pretty cool! Like Spark but with iterative graph processing and optimization.

Here's a presentation about it from FOSDEM:

Rust may exceed the pain threshold of some programmers

This post on Lambda the Ultimate about Rust is interesting. One takeaway that resonated with me is:

Rust has to do a lot of things in somewhat painful ways because the underlying memory model is quite simple. This is one of those things which will confuse programmers coming from garbage-collected languages. Rust will catch their errors, and the compiler diagnostics are quite good. Rust may exceed the pain threshold of some programmers, though.

This is where I'm at. I was a die-hard C++ programmer for a long time. Yet Rust seems totally unapproachable to me. I've moved on to Python and Go. It seems the problems I solve these days aren't incompatible with garbage collection.

The problem starts with the new operators in Rust:

ref mut

As a newbie these might as well be on an APL keyboard:

With a history of using C, C++ (before C++11), or Java, none are obvious except for & and maybe mut. I realize some of these have changed in newer versions of the language, but this is my impression of it.

Most of the magical memory management and pointer types I've used in C++ are variations on templates like shared_ptr or unique_ptr or linked_ptr. Instead of special symbols or syntax it's just plain old names. These are readable and discoverable to a new reader of the code.

Python and Go are also pretty simple in the operators they use. The worst Python has is probably * and ** for function arguments. Go's worst is <- for channel operations. I realize that Rust is trying to do something different and needs more ways to express that, so I'm not putting it down here.

If it's worth anything, another data point is I'm not in favor of Python adopting the @ operator for matrix multiplication (PEP 465). I think it's too hard for newbies to quickly understand what it means.

(Originally from my comment here)

28 January 2015

Patent clauses

Update: They've updated the patent clause; see the latest here.

Interesting discussion of the license Facebook is now using for their open-sourced code:

For those who aren't aware, it says

1. If facebook sues you, and you counterclaim over patents (whether about software or not), you will lose rights under all these patent grants.

So essentially you can't defend yourself.

This is different than the typical apache style patent grant, which instead would say "if you sue me over patents in apache licensed software x, you lose rights to software x" (IE it's limited to software, and limited to the thing you sued over)

2. It terminates if you challenge the validity of any facebook patent in any way. So no shitty software patent busting!

I think React is a great tool, but I didn't realize this license change happened on October 8th, 2014. Prior to that it was Apache 2.0 licensed (which is my license of choice). After that, React is licensed with BSD plus their special patent document (described above). Bummer.

26 January 2015

Here's an example of how people are trying to improve C++ templating. I loved C++ and wrote code with crazy shit like compile-time polymorphism. At the time I thought it was worth the complexity-- hah! I feel bad for the folks maintaining my old code now. Why would you want to enable such behavior in Go?

25 January 2015

Seems that some folks are having problems using Open ID to post comments here. I've disabled that mode until I can figure it out. In the meantime, please use the other comment form. It'd be nice to hear from you. Sorry for the trouble!

24 January 2015

Sad but true: "Hacker News" isn't news for hackers anymore. The community I enjoyed has moved to Lobsters.

18 January 2015

Experimentally verified: "Why client-side templating is wrong"

This week Peter-Paul Koch ("PPK" of QuirksMode fame) wrote about his issues with AngularJS. Discussing Angular isn't the goal of my post here. It was one aside PPK made that caught me off guard (emphasis mine):

Although templating is the correct solution, doing it in the browser is fundamentally wrong. The cost of application maintenance should not be offloaded onto all their users’s browsers (we’re talking millions of hits per month here) — especially not the mobile ones. This job belongs on the server.

Wow! Turns out I'm not the only one who was surprised by that statement. So he wrote a follow-up that explains "why client-side templating is wrong"; it provides more nuance than the original post:

Populating an HTML page with default data is a server-side job because there is no reason to do it on the client, and every reason for not doing it on the client. It is a needless performance hit that can easily be avoided by a server-side templating system such as JSP or ASP.

To me this could mean one of two things:

  1. Do not use client-side templating at all. Render all data server-side as HTML for the initial page view.
  2. Do not send an AJAX request back to the server after loading the shell of your app (a style that plagues GWT apps like AdWords and Twitter's old UI). Instead, you should provide the initial view data server-side in the first page HTML (i.e., as a JSON blob) and render it with JavaScript immediately after load.

Style #2 is what I see becoming more popular. For example, check out the source of Google Analytics. The first page load is a small HTML shell, JS and CSS resources, and a blob of JSON. Some may say that this style prevents you from getting indexed, but that's old information. Web crawlers now support running JavaScript, meaning you don't need to use 2009-era hacks like /#! anymore.

PPK resolves the ambiguity with this:

Then, when both framework and application are fully initialised, the user starts interacting with the site. By this time all initial costs have been incurred, so of course you use JavaScript here to show new data and change the interface. It would be wasteful not to, after all the work the browser has gone through.

So he definitely means option #1 from above: Never use JavaScript for first page load rendering. Now other detail from his post makes more sense (emphasis mine):

I think it is wasteful to have a framework parse the DOM and figure out which bits of default data to put where while it is also initialising itself and its data bindings. Essentially, the user can do nothing until the browser has gone through all these steps, and performance (and, even more importantly, perceived performance) suffers. The mobile processor may be idle 99% of the time, but populating a new web page with default data takes place exactly in the 1% of the time it is not idle. Better make the process as fast and painless as possible by not doing it in the browser at all.

I'd guess PPK would be okay with how ReactJS can render a template client- or server-side using the same code. With React the first page load can be dumb HTML that's later decorated by JavaScript to come alive for user interaction. So I think PPK's concern isn't about JavaScript templating. The debate is whether JavaScript should be required to run before a page is first rendered (a requirement of style #2 above).

Instead of more words about philosophy and architecture, I decided to understand this with numbers by putting it to the test on real hardware. What follows is the experiment I ran.

Experiment HTML pages

I wrote a small server in Go that generates test pages that I figure are a fair approximation of "templating complexity". The silly example I used is straight out of the HTML5 spec for rendering a table of cats that are up for adoption. The server lets you vary the number of cats in the table from 1 to 10,000+. Here's what the output of the server looks like:

Name Colour Sex Legs
Maggie Brown Mackeral Torbie Unknown 3
Lucky Seal Point with White (Mitted Pattern) Male 4
Minka Silver Mackerel Tabby Female 4
Millie Solid Chocolate Unknown 2
Kitty Blue Classic Tabby Female (neutered) 4

The server can generate this table in two different ways.

The first way the server renders is a classic server-side template. Go templates are pretty fast and the results are served compressed and chunked, so I think this is a reasonable proxy for "the best you can do":

<!DOCTYPE html>
  <meta charset="utf-8">
  <title>Server render</title>
      {{range .}}

The second way the server renders is with the new HTML5 template tag used by frameworks like Polymer and Web Components. The server generates a blob of JSON and does all rendering from that data client-side. The only server-side templating required is injecting the JSON serialized data into the HTML shell:

<!DOCTYPE html>
  <meta charset="utf-8">
  <title>Template tag render</title>
    var data = {{.}};  // The JSON data goes here
      <template id="row">
    var template = document.querySelector('#row');
    for (var i = 0; i < data.length; i += 1) {
      var cat = data[i];
      var clone = template.content.cloneNode(true);
      var cells = clone.querySelectorAll('td');
      cells[0].textContent = cat.name;
      cells[1].textContent = cat.color;
      cells[2].textContent = cat.sex;
      cells[3].textContent = cat.legs;

Notably, neither of these pages has any external resources. I wanted to control for that variable and understand the best you could do theoretically with either approach.

Browser setup

I tested the two test pages in two different browsers.

The first browser is Chrome Desktop version 39.0.2171.99 on Mac OS X version 10.10.1; the machine is somewhat old, a mid-2012 MacBook Air (2GHz Core i7, 8GB RAM, Intel Graphics). The second browser is Chrome Mobile version 39.02171.93 on Android 4.4.4; the device is top of the line, a OnePlus One (Snapdragon 801, 2.5GHz quad-core, 3GB RAM, Adreno 330 578MHz). Granted, this is a high-end mobile device, but it's also one of the cheapest out there — I wouldn't bet against Moore's law (especially with what's happening to Samsung).

On desktop, the page loads were done directly to localhost:8080, eliminating any concerns of network overhead. On mobile, I used port forwarding from the Chrome DevTools to access the same localhost:8080 URLs.

Content size

Here's a chart of the size of the content as a function of the number of cats on the adoption page (note that this log-scale on the X axis and log-scale on the Y axis).

What I see in this graph:

  • The page that uses JSON is smaller than the HTML page
  • But the compressed size only varies by 10-20% at most

Conclusion: Response sizes don't matter if you're using compression.

Server response time

Here's a chart of the server response time as a function of the number of cats on the adoption page (again this is a log-log chart on both axes). This is the time it takes for the Go server to render the template without actually loading it in a browser. I measured this time with curl -w '%{time_total}\n' -H 'Accept-Encoding: gzip' -s -o /dev/null URL.

What I see in this graph:

  • The pages are about the same performance until you get to 250 cats
  • Above 250 cats, the JSON page becomes increasingly faster (5x for 10,000 cats)

Conclusion: The page that uses JSON is faster at higher numbers of cats likely because there's less data to copy to the client. But this latency is small enough and similar enough (JSON vs. HTML) that it can be ignored when comparing overall render time.

First & last paint time

This is the most important thing to measure. Essentially, PPK's argument comes down to the critical rendering path. That is: How long from when the user starts trying to load your webpage in their browser to when they actually see something useful appear on their screen?

Measuring the time from "request to glass" is pretty well documented these days. Ilya Grigorik gave a great talk at Velocity 2013 about optimizing the critical rendering path. Since then the same content has been turned into helpful documentation. Chrome even provides a window.chrome.loadTimes object that will give you all of the timing information (see this post for details).

I added instrumentation to the templates above to console.log two numbers:

  • Time to first paint: How long between the request start and drawn pixels on the screen. Importantly, the page may still be loading (via a chunked response) when this happens.
  • Time to last paint: How long between the request start and the page fully finishing its load and render. This is the time after all resources have loaded, all JavaScript has run, and all client-side templates have inserted data into the DOM.

I ran each timing test 5 times to account for environment variations (like GC pauses and random system interrupts).

Results: Desktop

Here's a chart of time to first paint on Chrome Desktop (log-scale X axis).

Here's a chart of time to last paint on Chrome Desktop (also log-scale X axis).

What I see in these graphs:

  • The first and last paint times for the JSON/JavaScript approach are the same (as expected)
  • The first and last paint times for the server-side approach are different because the browser can render HTML before the whole chunked response has finished loading
  • Up to 1000 cats, there is practically no latency difference on Desktop between client-side rendering with JavaScript and server-side rendering with dumb HTML
  • Above 1000 cats, the server-side rendered approach will do first paint well before the client-side approach (3x faster in the case of 10,000 cats)
  • However, the client-side rendered approach will do last paint well before the server-side approach (2x faster in the case of 10,000 cats)

Conclusion: It depends on what you're doing. If you care about first paint time, server-side rendering wins. If your app needs all of the data on the page before it can do anything, client-side rendering wins.

Results: Mobile

I repeated the same painting tests for the mobile browser.

Here's a chart of time to first paint on Chrome Mobile (log-scale X axis, and note that the Y axis is 0 to 10 seconds instead of 0 to 2 seconds used in the desktop charts above).

Here's a chart of time to last paint on Chrome Mobile (also log-scale X axis and 0 to 10 second Y axis).

What I see in these graphs:

  • Everything is about 5x slower than desktop
  • There's a lot more variance in latency on mobile
  • Performance up to 1000 cats is roughly the same between server-side and client-side rendering (with up to a 30% difference in once case; 15% or less otherwise)
  • Above 1000 cats, the server-side rendered approach has a lower time to first paint (up to 5x faster) just like desktop
  • The client-side rendered approach will do last paint well before the server-side approach (up to 2x faster) just like desktop

Conclusion: Almost the same comparative differences as client- and server-side rendering on desktop.


Is PPK correct in saying that "client-side templating is wrong"? Yes and no. My test pages use the number of cats in the adoption table as a proxy for the complexity of the page. Below 1,000 cats worth of complexity, the client- and server-side rendering approaches have essentially the same time to first paint on both desktop and mobile. Above 1,000 cats worth of complexity, server-side rendering will do first paint faster than client-side rendering. But the client-side rendering approach will always win for last paint time above 1,000 cats.

That leads to two important questions:

1. How many cats worth of complexity are you rendering in your pages?

For most of the things I've built, I usually don't render a huge amount of data on a page (besides images). 1,000 cats worth of complexity in the data model (which corresponds to ~75KB of JSON data) seems like quite a bit of room to work with. For example, the Google Analytics page I linked to above only has ~30KB of JSON data embedded in it for initial load (in my case).

After the first load, modern tools make it trivial to dynamically load content beneath the fold. You can even do this in a way that doesn't render elements that aren't visible, significantly improving performance. And I'd say optimizing the size and shape of the first render data payload is straightforward.

Practically speaking, this all means that if your first load is less than 1,000 cats of complexity, client-side rendering is just as fast as server-side rendering on desktop and mobile. To me, the benefits of client-side rendering (e.g., UX, development speed) far outweigh the downsides. It's good to know I'm not making this tradeoff blindly anymore.

2. Do you care about first paint time or last paint time?

With Google's guide on performance, Facebook's BigPipe system, and companies like CloudFlare and Fastly making a splash, it's clear that some people really care a lot about first paint performance. Should you?

I'd say it really depends on what you're doing. If your business model relies on showing advertisements to end-users as soon as possible, time to first paint may make all the difference between a successful business and a failed one. If your website is more like an app (like MailChimp or Basecamp), the most important thing is probably time to last paint: how soon the app will become interactive after first load.

The take-away here is to choose the right architecture for your problem. Most of what I build these days is more app-like. For me, time to first paint doesn't matter as much as time to last paint. So again, I'll stick with the benefits of client-side rendering and last-paint performance over server-side rendering and first-paint performance.


My conclusion from all this: I hope never to render anything server-side ever again. I feel more comfortable in making that choice than ever thanks to all this data. I see rare occasions when server-side rendering could make sense for performance, but I don't expect to encounter many of those situations in the future.

(PS: You can find all of the raw data in my spreadsheet here and all of the test code here)

(PPS: If you're particularly inspired by this post to adopt a cat, check out the ASPCA)

16 January 2015

M.C. Escher and Monument Valley

I love the visual style of M.C. Escher.

It's probably old news but I finally learned about the game Monument Valley this week. It's beautiful and clearly an homage to Escher's art. The puzzle game requires traversing a map of impossible constructions and geometry.

Here's a good example of what you have to twist your mind around (you have to go up the stairs, then across the bridge, then down the ladder, then rotate the platform, then go back up in an impossible direction):

The original game is absolutely gorgeous and worth it just for the visuals and audio. The new levels ($2 extra) are more difficult than the first set and even more reminiscent of Escher's wackier art. More like this please!

13 January 2015

"Territory Annexed" on The Awl is a great article about how media and silos are evolving on the internet.
Insightful post by legendary (Python) programmer Ian Bicking: "Being A Manager Is Lonely".

Even if you don't manage people, this is helpful in that it illustrates what your boss is probably worried about and what you have to look forward to if management is thrust upon you.

11 January 2015

$ make clean

Agreeing to the Xcode/iOS license requires admin privileges, please re-run as root via sudo.


09 January 2015

Attempts at adding Go's select to Python

I found a great explanation of how Go channels are implemented under the covers (January 2014). It compliments a similar implementation from Plan 9 and some related discussion on go-nuts.

I also found an implementation of channels for Python called goless that relies on Stackless or gevent. The project's big open issue is that it's missing support for asyncio. That should be possible given asyncio has built-in support for synchronization primitives. But even with that, I'm not enthusiastic about how goless implements the select statement from Go:

chan = goless.chan()
cases = [goless.rcase(chan), goless.scase(chan, 1), goless.dcase()]
chosen, value = goless.select(cases)
if chosen is cases[0]:
    print('Received %s' % value)
elif chosen is cases[1]:
    assert value is None
    assert chosen is cases[2]

An earlier attempt at select for Stackless has the same issue. Select is just really ugly. Send and receive on a channel aren't too pretty either. I wonder if there's a better way?

08 January 2015

Interesting summary on LWN of Guido's type hinting proposal for Python. The comments are the best part.

07 January 2015

Update from Jason Scott about the Internet Archive's cool DOS games treasure trove.
Apache Samza, LinkedIn’s Framework for Stream Processing. I still don't know what tool I'd use for this on an open source stack.
Useful ECMAScript 6 compatibility table. Looks like 6to5 and Traceur are almost ready for prime time.

06 January 2015

What would Von Neumann say about data binding?

Interesting quote and book summary on High Scalability about building the ENIAC:

Von Neumann had one piece of advice for us: not to originate anything... One of the reasons our group was successful, and got a big jump on others, was that we set up certain limited objectives, namely that we would not produce any new elementary components... We would try and use the ones which were available for standard communications purposes. We chose vacuum tubes which were in mass production, and very common types, so that we could hope to get reliable components, and not have to go into component research.

This makes me wonder about the evolution of tools we're seeing around data binding in building websites these days: React, Angular, Polymer. The outcomes, APIs, and benefits of these projects are very similar. But somehow using these tools feels like you're doing a type of UX development research — they're unproven.

The conservative approach, as Von Neumann suggests, would be to avoid these tools entirely. However, I believe these next generation UX tools are actually a secret weapon akin to Paul Graham's description of using LISP instead of a median programming language:

We wrote our software in a weird AI language, with a bizarre syntax full of parentheses. For years it had annoyed me to hear Lisp described that way. But now it worked to our advantage. In business, there is nothing more valuable than a technical advantage your competitors don't understand. In business, as in war, surprise is worth as much as force.

Obviously, the median language has enormous momentum. I'm not proposing that you can fight this powerful force. What I'm proposing is exactly the opposite: that, like a practitioner of Aikido, you can use it against your opponents.

If you work for a big company, this may not be easy. You will have a hard time convincing the pointy-haired boss to let you build things in Lisp, when he has just read in the paper that some other language is poised, like Ada was twenty years ago, to take over the world. But if you work for a startup that doesn't have pointy-haired bosses yet, you can, like we did, turn the Blub paradox to your advantage: you can use technology that your competitors, glued immovably to the median language, will never be able to match.

And so we're trying to build some things using Polymer. I hate doing anything trendy but I must admit that it feels extremely productive to use. Given that the <template> tag is built into many browsers now, I think this is more stable than it appears. I'll let you know how it goes.
Effective Python's first public review (I figure they read the Rough Cut).
Cool explanation of some Lua/LISP tools for an RPG.
© 2009-2015 Brett Slatkin