
One Big Fluke

18 March 2010

The highlight from SxSW (besides catching up with nice people) is food.

BBQ crash course (Snow's brisket in the center)
Country-fried Pork chop, Mexican martini
Tacos from La Casa del Fuego with flour tortillas
Delicious beer on tap at the Gingerman
→ reply
The third dimension of scaling.

Wondering about the next frontier for data systems is a recurring impulse for us programmers. Recently it's become clearer to me what the third dimension of scaling could be. The divisions blur and this is not meant to be rigorous, but I think an abbreviated timeline for each dimension helps me define progress and the trajectory. In the diagrams the green squares represent data, the outlines are machines, and the purple is for views.

1. Data
The first dimension was data. The military had their firing tables. People like Diebold envisioned data processing. All input and output data were private to the machine and user doing the computation. This was scaling up, mostly through dollars; it reminds me of the work on the moon landing.

  • One machine
    • More bits (BCD, 12 bit, 32 bit, 64 bit)
    • More storage (memory, tape, disk)
    • Bigger chips (IBM360 had a polynomial divide instruction!)
  • Big-iron supercomputers
  • Monolithic

2. Machines
The second dimension was machines. It was made possible by high-speed networking and dropping prices. Instead of scaling up this was scaling out-- as wide as possible. This has led to infrastructure as a service and the end of monolithic vendors. This effort is still underway, mostly solved, and the future holds efficiency gains more than raw performance improvements. Interestingly, in this dimension the input data is often public and the output data is also semi-public (probably the web's influence).

  • Commodity hardware
    • Three-tier architecture (frontend, appserver, database)
    • Use a bigger DB machine and cheaper stateless nodes
    • Consistent caching for high-throughput reads
  • Vertical partitions
  • Distributed data
    • Bigtable, Cassandra, and "NoSQL" systems; analysis through MapReduce
    • Hotspots handled by tablet-splitting and automatic rebalancing
    • Eventually consistent via multi-master replication, strongly consistent via Paxos

3. Views
The third dimension is views: unique perspectives on shared data where attributes vary based on who or what is viewing. Its necessity came in part from personalization and socialization on the web, but also multi-tenancy, hosted services, and growing datasets. For this dimension the input data is both public and private and the output data blends. Caching of shared, public views isn't nearly as effective. The scaling challenge is connecting disparate, multifaceted information.

  • Personalization
    • Offline analysis of past history and behavior to derive new recommendations
    • Blend user's analyzed preferences with global data, common signals
    • Work done at query time, sometimes cacheable, often results are one-time-only
  • Broadcast messaging
    • Fan-out of messages to per-user consumption feeds
    • Work done at write-time (Twitter), read-time (Facebook), or both
    • Message destinations are contacts, groups for access control
  • Filtering and correlation
    • Search by keyword, geospatially across public and private data in real time
    • Cross-publisher integration with access control and immediacy
    • Relevance computed for new information at moment of collection/creation
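
To make the broadcast messaging bullets concrete, here is a minimal in-memory sketch in Python that contrasts write-time and read-time fan-out. Every name in it (`FeedStore`, `feed_from_writes`, `feed_from_reads`) is hypothetical, and it says nothing about how Twitter or Facebook actually build their feeds; it only shows where the work lands in each approach.

```python
from collections import defaultdict


class FeedStore:
    """Toy in-memory store contrasting two fan-out strategies (hypothetical names)."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> users who follow them
        self.following = defaultdict(set)   # user -> authors they follow
        self.outbox = defaultdict(list)     # author -> [(seq, text)]
        self.inbox = defaultdict(list)      # user -> [(seq, author, text)]
        self.seq = 0                        # global ordering counter

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author, text):
        self.seq += 1
        # Always keep the author's own copy (read-time fan-out reads from it).
        self.outbox[author].append((self.seq, text))
        # Write-time fan-out: one extra write per current follower.
        for user in self.followers[author]:
            self.inbox[user].append((self.seq, author, text))

    def feed_from_writes(self, user):
        """Cheap read: the per-user view was materialized at write time."""
        return [(a, t) for _, a, t in sorted(self.inbox[user])]

    def feed_from_reads(self, user):
        """Read-time fan-out: gather from every followed author on each query."""
        merged = [
            (seq, author, text)
            for author in self.following[user]
            for seq, text in self.outbox[author]
        ]
        return [(a, t) for _, a, t in sorted(merged)]
```

In this sketch the write-time path costs one write per follower on every post, while the read-time path costs one lookup per followed author on every query. That is the trade-off behind doing the work at write-time, read-time, or both.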
~
I'm not sure if the "third dimension of scaling" is the perfect phrase, but I think it's good for describing where we've been and where we're going.
→ reply

15 March 2010

Highlight of open questions for Twitter founder from SxSW (whole exchange here):

Q: @ev Why would I ever, ever want a newspaper to @anywhere link to @anildash instead of using HTML to link to anildash.com?
A: @anildash It’s not an either/or. It’s a hover action. Link still exists. Will result in more followers and ultimately traffic

--For whom?
→ reply
Tons o' Stats.

The reference PubSubHubbub hub has been growing briskly (more on that another time). We needed more fine-grained measurement of growth and the status of feed publishers and subscribers. To that end I implemented a reservoir sampler to give us a statistical estimate of the high-volume traffic. This approach samples both the input key and value, so it can estimate latency and error rate (numerical metrics) easily.
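
For readers who haven't seen the technique, here is a minimal sketch of a reservoir sampler over (key, value) pairs, using the classic Algorithm R. It is not the MultiSampler code; the class and method names are made up for illustration, and it keeps everything in local memory rather than in memcache or the datastore.

```python
import random


class ReservoirSampler:
    """Uniform sample of at most `size` (key, value) pairs from a stream (Algorithm R)."""

    def __init__(self, size, rng=None):
        self.size = size
        self.seen = 0            # total items offered so far
        self.samples = []        # the reservoir
        self.rng = rng or random.Random()

    def add(self, key, value):
        self.seen += 1
        if len(self.samples) < self.size:
            self.samples.append((key, value))
            return
        # Keep the new item with probability size / seen, replacing a random slot.
        slot = self.rng.randrange(self.seen)
        if slot < self.size:
            self.samples[slot] = (key, value)

    def mean(self, key):
        """Estimated mean of the values sampled for `key`, or None if unsampled."""
        values = [v for k, v in self.samples if k == key]
        return sum(values) / len(values) if values else None

    def estimated_count(self, key):
        """Scale the key's share of the reservoir up to the full stream."""
        if not self.samples:
            return 0.0
        hits = sum(1 for k, _ in self.samples if k == key)
        return hits * self.seen / len(self.samples)
```

With a key such as the callback's domain and a value such as latency in seconds (or 1.0/0.0 for error/success), `mean` approximates average latency or error rate and `estimated_count` approximates that domain's share of traffic.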

Below are various screenshots of this new code in action (it's live). The subscriber diagnostics and publisher diagnostics pages on the reference hub will let you see these stats for your feed or subscriber callback's domain (if you supply the right secret token). The redacted boxes protect sensitive URLs and stats.

Hub stats delivery to subscriber callbacks error rate (1 day window)


Hub stats delivery to subscriber callbacks latency (1 hour window)


Feed publisher statistics


Subscriber callback statistics

The code for the MultiSampler is here; it's modular (and tested!) and I encourage you to use it in your own App Engine projects. Eventually I hope to combine N of these samplers, synchronized together, for a moving window/average function so you don't lose data resolution when the reservoir resets.
→ reply
Video from this past week's Gillmor Gang is up.
→ reply

14 March 2010

Holy shit the panel got on CNN.com too-- crazy! → reply
Nice coverage of my panel yesterday from Louis Gray and GigaOm. Thanks everyone who came! 1 reply

12 March 2010

Cool the video from my PyCon talk on PubSubHubbub is up!
→ reply
Watch the Gillmor Gang at 12:30pm Pacific with me and Chris about DiSo2 and the Google Buzz API. → reply
Come to my SxSW panel on Saturday at 11am CST! Can the real-time web be realized? with Jack from Collecta, Scott from Gowalla, Dare from Microsoft, and Marshall from ReadWriteWeb. → reply

05 March 2010

Nerdy topic of the day: elisp to HTML5 interpreter (so we can code on browser-centric netbooks). → reply

03 March 2010

Many thanks again to Joseph Scott for implementing PubSubHubbub support for WordPress.com. → reply

18 February 2010

Yay in-flight Wifi! MapReduce from 30,000 feet~ → reply
Off to PyCon Atlanta to present about PubSubHubbub and App Engine. Looking forward to Buzzing at a conference! → reply

12 February 2010

Own your content on the web with WebFinger

Now that all Gmail users have WebFinger enabled, it's time for some real talk about why it's useful.

Tantek is passionate about his presence on the web. He's mentioned the importance of link shorteners but this has nothing to do with a 140 character limit; the concern here is attribution. When content is redistributed through services like Google Buzz and FriendFeed the URLs shared are owned by Google or Facebook. This exposes content creators to the risk of these services changing unexpectedly, but more importantly it disconnects creators from the means of distribution: links.

As a straw-man solution, let's say that Google Buzz let you redirect all of your content through your own domain (using a CNAME alias); that means your items would be shared on URLs like http://buzz.example.com/1234 instead of http://www.google.com/buzz/1234. At any time in the future you could change your links to redirect to another server (not Buzz) without losing any attribution. All of the URLs distributed throughout the web would continue to work, but you would make them point at a new place. For example, someone could export all of their Buzz data to WordPress and host it independently. Enabling URL ownership like this is an extension of the Data Liberation efforts, but for what we do on the web: share.

There are many problems with this straw man: it doesn't scale well because you would need to configure a new CNAME for every site you use on the web; it's error-prone because you could forget to set your redirection preference; and it's probably too complex for an average user to do.

Instead, what if sites could determine, just from your logged-in identity, how you want your URLs to be shared? The site would see that you're johndoe@example.com and magically determine the standard web service for generating short links on your behalf. You could configure this shortener a single time, associate it with your email address a single time, and then Google Buzz and hundreds of other sites could use your service to generate links to your content.

This is the promise of what you can do with WebFinger: associate global preferences with identity.

For the technical people, this works because the site would find a URL shortening service in the user's XRD file that looks like this:
<?xml version='1.0'?>
<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>
  <Subject>acct:johndoe@example.com</Subject>
  <Alias>http://johndoe.example.com</Alias>
  <Link rel='http://openshortener.org/spec' 
        href='http://johndoe.example.com/shortener'/>
</XRD>
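
A site consuming this file might pull out the shortener link roughly as follows. This is a sketch using only Python's standard `xml.etree.ElementTree`; the `find_link` helper is a made-up name, the `rel` value is the illustrative one from the XRD above, and the WebFinger discovery step that fetches the XRD in the first place is left out.

```python
import xml.etree.ElementTree as ET

XRD_NS = "{http://docs.oasis-open.org/ns/xri/xrd-1.0}"
# rel value taken from the example XRD; it is illustrative, not an established standard.
SHORTENER_REL = "http://openshortener.org/spec"


def find_link(xrd_xml, rel):
    """Return the href of the first <Link> with the given rel, or None."""
    root = ET.fromstring(xrd_xml)
    for link in root.findall(XRD_NS + "Link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None


EXAMPLE_XRD = """<?xml version='1.0'?>
<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>
  <Subject>acct:johndoe@example.com</Subject>
  <Alias>http://johndoe.example.com</Alias>
  <Link rel='http://openshortener.org/spec'
        href='http://johndoe.example.com/shortener'/>
</XRD>"""

shortener_url = find_link(EXAMPLE_XRD, SHORTENER_REL)
```

How a site would then call the shortener at `shortener_url` is exactly the part that still needs a spec.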

Note that this shortening standard isn't established yet, but it should be. Defining services like this is part of what DeWitt and others meant by "work with everyone here on where we are going next". Now that we have WebFinger, we're calling on you and other developers out there to run with it: Define the services you want to use, how they will work, and provide sample implementations. This is how the ecosystem of WebFinger services will grow.
→ reply

11 February 2010

Noticed your blog's reach stats lately? If you've got Buzz hooked up it probably exploded! The secret: Liberal HTML embedding; i.e., tracking gifs work!
→ reply

10 February 2010

Calling it the "Buzz API" is kinda funny because it's not an API in the common sense. We're just reusing the best practices on the web (e.g., feeds). → reply
Could someone (like twitterfeed) use the Buzz API to make Buzz posts from Gmail work as a Twitter publishing client? This can be real-time via PubSubHubbub.

Filter for entries with the http://activitystrea.ms/schema/1.0/note activity type and a //feed/entry/link[@rel="alternate"] pointing to google.com/buzz/*.
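
A rough sketch of that filter in Python, using only the standard library. It assumes the note type shows up in an `activity:object-type` element (in the `http://activitystrea.ms/spec/1.0/` namespace) somewhere inside each entry, and `buzz_notes` is a hypothetical helper name; check the real feed before relying on either.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: the type is carried in an activity:object-type element in this namespace.
ACTIVITY = "{http://activitystrea.ms/spec/1.0/}"
NOTE_TYPE = "http://activitystrea.ms/schema/1.0/note"


def buzz_notes(feed_xml):
    """Yield (title, alternate_href) for note entries that link to google.com/buzz/."""
    root = ET.fromstring(feed_xml)
    for entry in root.findall(ATOM + "entry"):
        types = [el.text.strip() for el in entry.iter(ACTIVITY + "object-type") if el.text]
        if NOTE_TYPE not in types:
            continue
        for link in entry.findall(ATOM + "link"):
            href = link.get("href", "")
            # Loose substring match standing in for the google.com/buzz/* pattern.
            if link.get("rel") == "alternate" and "google.com/buzz/" in href:
                yield entry.findtext(ATOM + "title", default=""), href
                break
```

A PubSubHubbub subscriber callback could run each pushed feed document through this and hand the matching entries to whatever posts to Twitter.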
→ reply

09 February 2010

Brad and I whipped up a tool for forcing Social Graph recrawls. Use this to fix your feed associations for Buzz! 2 replies
Wunderbar seeing all the excitement about open standards today. → reply
Check out the Buzz API docs. PubSubHubbub supported inbound and outbound TODAY! ActivityStreams too! → reply
Be sure to watch this at 10am Pacific (30 mins from now)!! http://www.youtube.com/feb0910googleevent → reply

08 February 2010

Meep Meep~ → reply

07 February 2010

Anchor Bock + New Belgium Mighty Arrow-- Yay spring! → reply
Wow amazing in SF today-- forgot what the sun looks like. Too bad it's the Super Bowl. → reply

06 February 2010

The "like" button is ubiquitous but often inappropriate ("My dog died"-- *like*). How about adding a tomato button too? I want "negative acknowledge". → reply

05 February 2010


Pizza trials. Try, try, try again.


Arugula is delicious.
→ reply

30 January 2010

This talk of Flash's death is missing the point: Flash's appeal is ease of development. Adobe will do a hail-mary with a Flash-to-HTML5 cross-compiler (like GWT) and save their ecosystem. I bet a prototype already exists. → reply

27 January 2010

iPadOS is At Ease 2.0? → reply

26 January 2010

Tantek is now pushing updates from his own site. Self-hosted + personal URL shortener + h/Atom feeds + PubSubHubbub = independent awesome! → reply

25 January 2010

Just searched for "Billy Joel Keeping the Fail" -- whoops. → reply

22 January 2010

Funny excerpt from John Mayer interview regarding Twitter:
Shouldn't the journalists be the skeptical ones?
→ reply

20 January 2010

Became Today is a new blog about entrepreneurs in San Francisco that Colleen put together. First video interview is with Eve Batey from SFAppeal. Check it out! → reply
Insane prediction: Apple tablet will use color e-ink display; works for websites, music, ebooks, but not movies. → reply
Bill Gates launches a blog, no RSS feeds to be found (right?). Come on, man-- who's reading this? 1 reply

19 January 2010

Gillmor Gang video from last week is up!

→ reply
Long day of video editing thanks to Mr. Skidgel and a DVX-100. Good timing for the storm. → reply

18 January 2010

Panasonic Lumix GF1 on the way. Micro-4/3s cameras are the future; they leave the past of mirrors behind. More detail here. My review forthcoming. → reply

14 January 2010

Colbert + Philip Glass holy shit -- lame, Hulu took it down. → reply

13 January 2010

Going on Gillmor Gang tomorrow for a PubSubHubbub update. Watch at http://www.building43.com/realtime/ → reply
Fake word of the day: agnosticate-- verb. To make agnostic; e.g., we need to agnosticate our API. → reply

12 January 2010

In case you live under a rock: A new approach to China--

"These attacks and the surveillance they have uncovered--combined with the attempts over the past year to further limit free speech on the web--have led us to conclude that we should review the feasibility of our business operations in China. We have decided we are no longer willing to continue censoring our results on Google.cn, and so over the next few weeks we will be discussing with the Chinese government the basis on which we could operate an unfiltered search engine within the law, if at all. We recognize that this may well mean having to shut down Google.cn, and potentially our offices in China."
→ reply
GWT 2.0's DevMode is super convenient. I can use Firebug with GWT widgets now-- Yay! Upgrade from GWT 1.5.3 to 2.0 was surprisingly painless. → reply

09 January 2010

6.5 off coast of Eureka: http://earthquake.usgs.gov/earthquakes/recenteqsus/Quakes/nc71338066.php → reply
Another earthquake? Anyone else feel that? Third in a row? → reply

08 January 2010

Cheesiest PubSubHubbub promo video with Brad and me.

→ reply

06 January 2010

A tell, the poker term, exists in other fields-- kinda. An engineering tell indicates what someone doesn't know. Contrived example: "An external hard disk backup is always safe" → Never heard of bit rot. Some recent indicators I've come across: never used Linux; never used threads; never had a problem bigger than a single machine; never heard of Reed-Solomon; and many more. I want to know: What are my tells? → reply

05 January 2010

Nexus One usage has obviated my need for computers over the holiday. Device convergence is multitasking, and the N1 does it better than any mobile I've seen. → reply

04 January 2010

JavaScript is a cousin of Lisp, so they say. It's feeling especially functional with the strict continuation passing style of the Chrome Extensions API. → reply

01 January 2010

NY Times' rent vs. buy housing widget is great. It accounts for the opportunity cost of investing in a house over something more predictable (e.g., bonds). Recently they're saying that this is the bottom of the US real-estate market. Shrug.

These are scenarios I consider realistic for exploring potential home ownership. They are in San Francisco and only apply to metropolitan conditions. Inflation and rent increases are pegged at 2% and they use a price-to-annual-rent ratio of ~24: $4k/month for rent, house of ~$1.2M (the outcome is the same for different rent and price values; only the ratio matters).
→ reply