I'm Brett Slatkin and this is where I write about programming and related topics.

15 March 2010

Tons o' Stats

The reference PubSubHubbub hub has been growing briskly (more on that another time). We needed finer-grained measurements of that growth and of the status of feed publishers and subscribers. To that end I implemented a reservoir sampler that gives us a statistical estimate of the high-volume traffic. The sampler records both an input key (such as a feed or a callback domain) and a value for each event, so it can easily estimate numerical metrics like latency and error rate.
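For readers unfamiliar with the technique, here is a minimal reservoir sampler (Vitter's Algorithm R) that keeps a fixed-size uniform sample of `(key, value)` events. It is a sketch under stated assumptions, not the hub's MultiSampler: the class name `ReservoirSampler`, its methods, and the convention of recording a failure as `1.0` and a success as `0.0` (so the mean of the values is an error rate) are all hypothetical.

```python
import random


class ReservoirSampler:
    """Uniform random sample of at most `size` (key, value) events.

    Hypothetical sketch of Algorithm R; not the hub's MultiSampler API.
    """

    def __init__(self, size, rng=None):
        self.size = size
        self.seen = 0        # total number of events offered so far
        self.samples = []    # the reservoir: list of (key, value) tuples
        self.rng = rng if rng is not None else random.Random()

    def add(self, key, value):
        self.seen += 1
        if len(self.samples) < self.size:
            # Until the reservoir is full, keep every event.
            self.samples.append((key, value))
        else:
            # Keep the new event with probability size / seen,
            # overwriting a uniformly chosen existing slot.
            j = self.rng.randrange(self.seen)
            if j < self.size:
                self.samples[j] = (key, value)

    def mean(self, key=None):
        """Sample mean of the values, optionally for a single key.

        With values recorded as latencies this estimates mean latency;
        with 1.0 = failure and 0.0 = success it estimates the error rate.
        Returns None if no sampled event matches.
        """
        values = [v for k, v in self.samples if key is None or k == key]
        if not values:
            return None
        return sum(values) / len(values)
```

Every one of the `seen` events has the same chance, `size / seen`, of being in the reservoir, so the mean of the sampled values is an unbiased estimate of the true mean while memory stays fixed at `size` entries.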

Below are various screenshots of this new code in action (it's live). The subscriber diagnostics and publisher diagnostics pages on the reference hub let you see these stats for your feed or for your subscriber callback's domain (if you supply the right secret token). The redacted boxes hide sensitive URLs and stats.
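Per-feed and per-domain diagnostics like these can be derived from a single shared sample. A minimal sketch, assuming each sampled delivery is stored as a hypothetical `(domain, latency_seconds, failed)` tuple and that `seen` is the total number of deliveries the sampler observed: the domain's share of the sample, scaled by `seen`, estimates its delivery volume, and the error rate and mean latency come straight from its sampled entries. The function name `domain_report` and its output fields are illustrative, not the hub's actual code.

```python
def domain_report(samples, seen, domain):
    """Estimate delivery stats for one domain from a reservoir sample.

    samples: hypothetical list of (domain, latency_seconds, failed) tuples,
             drawn uniformly at random from `seen` delivery attempts.
    Returns None if the domain does not appear in the sample.
    """
    hits = [(latency, failed) for d, latency, failed in samples if d == domain]
    if not hits:
        return None
    n = len(hits)
    return {
        # The domain's share of the sample, scaled up to all observed deliveries.
        "estimated_deliveries": seen * n / len(samples),
        # Fraction of this domain's sampled deliveries that failed.
        "error_rate": sum(1 for _, failed in hits if failed) / n,
        # Average latency over this domain's sampled deliveries.
        "mean_latency": sum(latency for latency, _ in hits) / n,
    }
```

A domain that receives little traffic may have few or no entries in the sample, which is why the function returns `None` rather than reporting a misleadingly precise zero.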

Hub stats: error rate of deliveries to subscriber callbacks (1-day window)

Hub stats: latency of deliveries to subscriber callbacks (1-hour window)

Feed publisher statistics

Subscriber callback statistics

The code for the MultiSampler is here; it's modular (and tested!), and I encourage you to use it in your own App Engine projects. Eventually I hope to combine N of these samplers, synchronized together, into a moving-window average so you don't lose data resolution when the reservoir resets.
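One way the synchronized-samplers idea might look, sketched under assumptions: each of `buckets` reservoirs covers a fixed slice of wall-clock time, and buckets that fall out of the window are discarded as time advances, so a query right after a new bucket starts still sees the previous buckets' samples instead of an empty reservoir. The names `WindowedSampler`, `bucket_seconds`, and the helper `_Reservoir` are hypothetical, and the estimator (each bucket's sample scaled up by the number of events it saw) is one reasonable choice, not necessarily what the hub does.

```python
import random
from collections import deque


class _Reservoir:
    """Minimal Algorithm R reservoir over (key, value) pairs (hypothetical helper)."""

    def __init__(self, size, rng):
        self.size, self.rng = size, rng
        self.seen, self.samples = 0, []

    def add(self, key, value):
        self.seen += 1
        if len(self.samples) < self.size:
            self.samples.append((key, value))
        else:
            j = self.rng.randrange(self.seen)
            if j < self.size:
                self.samples[j] = (key, value)


class WindowedSampler:
    """Hypothetical moving window built from up to `buckets` time-sliced reservoirs."""

    def __init__(self, buckets, bucket_seconds, size, rng=None):
        self.buckets = buckets
        self.bucket_seconds = bucket_seconds
        self.size = size
        self.rng = rng if rng is not None else random.Random()
        self.window = deque()  # (bucket_index, _Reservoir), oldest first

    def _expire(self, current_index):
        # Drop buckets that have slid out of the window.
        while self.window and self.window[0][0] <= current_index - self.buckets:
            self.window.popleft()

    def add(self, key, value, now):
        index = int(now // self.bucket_seconds)
        if not self.window or self.window[-1][0] < index:
            # Time moved into a new slice: start a fresh reservoir for it.
            self.window.append((index, _Reservoir(self.size, self.rng)))
        self._expire(index)
        self.window[-1][1].add(key, value)

    def mean(self, key=None, now=None):
        """Estimated mean value over the window, optionally for one key."""
        if now is not None:
            self._expire(int(now // self.bucket_seconds))
        est_sum = est_count = 0.0
        for _, res in self.window:
            matched = [v for k, v in res.samples if key is None or k == key]
            if not matched:
                continue
            # Each sampled event stands in for seen / len(samples) real events.
            scale = res.seen / len(res.samples)
            est_sum += scale * sum(matched)
            est_count += scale * len(matched)
        return est_sum / est_count if est_count else None
```

Weighting by each bucket's `seen` count matters because buckets can carry very different amounts of traffic; averaging the per-bucket means directly would over-weight quiet intervals.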
© 2009-2016 Brett Slatkin