I'm Brett Slatkin and this is where I write about programming and related topics.

19 August 2014

The Data Triumvirate

There's a huge focus right now on building products and services that do data analysis. Developing these systems involves three distinct groups of stakeholders that have opposing viewpoints.

  • The product managers are trying to sell something. They want the data to show that what they're selling is working for someone (themselves, customers, end-users). They want impact.
  • The statisticians are trying to ensure correctness. They want the data to be unbiased. They want the methodology for finding results in the data to be defensible to their peers.
  • The engineers are trying to ship the simplest thing possible. They want to minimize the complexity of analyzing the data. They want a data pipeline that is maintainable and extensible.

The tension between these roles is crucial. If one outlook dominates a joint effort, you're setting yourself up for failure.

  • If the product managers always get their way, you're letting a fox guard the henhouse. They'll find significance in the data at the expense of unbiased data and methodological validity. You'll be selling snake oil.
  • If the statisticians always get their way, you'll never ship your product. Compensating for every bias in a dataset is nearly impossible. You'll never have the 99% confidence they want for every measure.
  • If the engineers always get their way, your product will be too simplistic. The most maintainable implementation will undermine the statistical methods. The impact measured won't be compelling enough to sell.

If you're doing analysis, make sure you're part of a data triumvirate. It's similar to the relationship between product managers and tech leads. You need a balance of power to build the right thing.
© 2009-2024 Brett Slatkin