There's a huge focus right now on building products and services that do data analysis. Developing these systems involves three distinct groups of stakeholders that have opposing viewpoints.
The product managers are trying to sell something. They want the data to show what they're selling is working for someone (themselves, customers, end-users). They want impact.
The statisticians are trying to ensure correctness. They want the data to be unbiased. They want the methodology for finding results in the data to be defensible to their peers.
The engineers are trying to ship the simplest thing possible. They want to minimize the complexity of analyzing the data. They want a data pipeline that is maintainable and extensible.
The tension between these roles is crucial. If one outlook dominates a joint effort you're setting yourself up for failure.
If the product managers always get their way you're letting a fox guard the hen house. They'll find significance in the data at the expense of bias and methodological validity. You'll be selling snake oil.
If the statisticians get their way you'll never ship your product. Compensating for every bias in a dataset is nearly impossible. You'll never have the 99% confidence they want for every measure.
If the engineers get their way your product will be too simplistic. The most maintainable implementation will undermine the statistical methods. The impact measured won't be compelling enough to sell.
I'm still trying to find the boundaries between Python, C++, and Go when building something new. The split between Python and C++ is a clear tradeoff of bare-metal performance for developer productivity. The split between Go and C++ is easy for me because of scars from templates and the difficulty of concurrent programming in C++.
What's been difficult is finding the dividing line between Python and Go. What I've been able to come up with so far is a trite analogy involving types of bicycles.
Python is a touring bicycle. It's approachable and easy for anyone to learn how to ride. It has features like a basket, fenders, lights, and a pump that make it practical for almost every situation. Its limited gears mean it's dependable, but slow unless you pedal hard.
C++ is a race bike. It's difficult to ride and easy to crash. It has multiple sets of handlebars and every other feature a bike could offer for maximum speed. Its fragility makes it impractical for simple riding, but it can get you there faster than anything else.
Go is a modern cyclo-cross bike. It's simpler and safer than a race bike, but still fast. It has most of the features you want (light-weight, aerodynamic) and some that are uniquely surprising (durability, knobby tires). It's a whole new category of riding.
Continuing with the bad joke, Java would be the bicycle factory factory. Okay — the analogy breaks down quickly. But it's not so awful.
For an unexpected adventure I'd choose the touring bike (Python): quickly building something new, handling all environments easily, everything I need built-in. For competing on a more well-defined course with treacherous obstacles I'd choose the modern cyclo-cross bike (Go): speed, versatility, safety. For a time trial on smooth roads with no obstacles I may even risk riding the race bike (C++).
That's the best assessment I've got for now. Where do you draw the line?
From plastic 3D print to milled aluminum — Making a bicycle gear shifter
Earlier this year I spent 30 days designing and 3D-printing a bicycle gear shifter for the Shimano 8 speed internal hub. The goal was to make one for my Mission Bicycle. I wrote about the experience here. After posting that story I worked with the team to revise the design. Here's the result as a plastic prototype. I've been pedaling around with this for the past few months.
My next goal was to find a machinist to turn this part into 6061 aluminum for strength and durability. I searched around the SF Bay Area and found a few machine shops. A couple folks I contacted never responded. One replied but declined because they couldn't open my 3D design files. Another met me in person, but wasn't excited about the project (his primary business is building things that go into space!).
Finally, a friend of mine connected me to someone whose day job is running a machine shop. For the sake of privacy I'll refer to them as "the machinist". It's important to note that there's no way any of what follows would have happened without their work. I'm grateful for all of their effort on the project. I've just been along for the ride.
Design for manufacturing
The machinist also had problems opening my CAD files (from 123D Design — $10/month). The issue is the DWG files exported by one company's software can't always be imported by another company's software. There are tools out there for format conversions (like Teigha) but they aren't good enough. Professionals seem to use Solidworks (MSRP: $5000) over Autodesk. It's the Cadillac of CAD. I can't afford it.
Luckily, I found a free tool by the makers of Solidworks called Draftsight that can be used to open and annotate the DWG files from Autodesk products. As long as it could be opened in Draftsight it could also be opened in Solidworks. When it didn't work I knew something in my model was funky and needed to be fixed.
Once the machinist could open my files we went through another cycle of design tweaking. There were a lot of things I thought were simple to mill out of aluminum that turned out to be impossible. Machining is really an art form like blacksmithing. It's full of arcane knowledge. The only way to learn is to actually use the tools and gain experience from your mistakes. Here's the final design for manufacturing, which looks better than the original.
Milling the part
With the final design in hand I bought all of the parts to build and assemble a prototype. I got the screws, washers, and ball springs from McMaster-Carr. I bought aluminum from TCI Aluminum. Machinists call raw uncut aluminum blocks "blanks". A blank is the starting material that is cut down to the final part by the CNC machine. Here's what my blanks looked like.
You normally buy 12 foot long bars of aluminum that best fit the profile of your part. Then you cut it yourself to size. Here's what that process looks like. It's quite a saw.
The control panel on the right lets you modify the program before or while it runs. What you see in the middle on a spindle is the business end of the machine. Here's a picture of the 3 inch diameter facing endmill. It's hardened steel and extremely heavy. It has carbide inserts so it doesn't wear down too quickly.
While the mill is doing its work it needs to change tools for particular cuts. To do this it has an enormous cache of end mills that can be changed automatically by the machine. This lets the machine cut, drill, face, and tap without human intervention. The machine uses a robot arm and compressed air fittings to swap the bits. It takes less than a second to swap (insane!). Here's a picture of the revolving cache, about 10 feet off the ground.
The mill is a dumb machine like a computer. It runs programs in G-Code, which describe where to move the cutting bits in XYZ coordinates. The mill does little to ensure the G-Code is valid. If the G-Code said to bash the bit into the vise the machine would do it. It's similar to how a bad set of instructions can crash a computer program.
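To make the lack of validation concrete, here is a minimal sketch in Python of the kind of sanity check the mill won't do for you. It looks only at simple rapid and linear moves (G0 and G1 words with X, Y, Z coordinates) and flags any that leave a safe envelope. The envelope values, the function name, and the sample program are all made up for illustration, and real G-Code has many more features (arcs, work offsets, modal state) that this ignores.

```python
import re

# Hypothetical safe envelope in millimeters: (min, max) per axis.
SAFE_ENVELOPE = {'X': (0.0, 100.0), 'Y': (0.0, 50.0), 'Z': (-10.0, 25.0)}

# Matches words like G1, X90, or Z-2.5 (other letters are ignored).
WORD = re.compile(r'([GXYZ])(-?\d+(?:\.\d+)?)')


def find_unsafe_moves(program, envelope=SAFE_ENVELOPE):
    """Return (line_number, axis, value) for each move outside the envelope."""
    problems = []
    for number, line in enumerate(program.splitlines(), 1):
        line = line.split(';')[0]  # Drop trailing comments
        words = dict(WORD.findall(line.upper()))
        if words.get('G') not in ('0', '00', '1', '01'):
            continue  # Only check rapid and linear moves
        for axis in 'XYZ':
            if axis in words:
                value = float(words[axis])
                low, high = envelope[axis]
                if not low <= value <= high:
                    problems.append((number, axis, value))
    return problems


program = """\
G0 X10 Y10 Z5 ; rapid to start
G1 Z-2 F200 ; plunge
G1 X90 Y10 ; cut along X
G1 X90 Y10 Z-40 ; would bash into the vise
"""
print(find_unsafe_moves(program))  # [(4, 'Z', -40.0)]
```

A check like this belongs on the programming side, before the G-Code ever reaches the machine, since the mill itself will happily follow line 4.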
The G-Code comes from CAD software that's the machinist's equivalent of a compiler (Mastercam and HSMWorks are popular tools). The CAD lets you build a set of "operations". An operation is all of the various cuts you need to make in one side of an aluminum blank. This can include multiple passes by many different types of cutting tools. The software doesn't automatically convert your 3D model into operations. You need to manually look at the outlines of the part and come up with a series of "tool paths" for the cutting blades to follow. The hope is the cuts will result in your design by subtracting from the aluminum blank. Machinists call this conversion process "programming".
One of the coolest things the design software can do is simulate the mill in action. It's like a highly sophisticated interactive debugger. Here's what it looks like.
Once you have the G-Code you put it on a USB stick and upload it to the mill. Then you put the blank in the machine and clamp it down with a vise. You bang it with a dead blow hammer to make sure it's seated properly. Then you use the mill control panel and its 1000 buttons to tell the machine to measure the XYZ coordinate of one of the blank's corners. This tells the mill where to start cutting so you get exactly the physical object that you expect. Here's one blank in the vise.
You can see aluminum shavings all over the inside of the mill. Those are the "chips" from the end mill. There's also this weird fluid on everything. That's lubricating coolant to ensure the end mills don't overheat while they're spinning at 8000+ RPM and cutting. When the machine's actually going you can't see anything because coolant is everywhere. But the cutting sound is extremely loud. Here's a video of the machine making a cut.
The bike shifter ended up being 7 operations in all. 3 were for the base that goes over the steering tube. 4 were for the knob that pulls the shifter cable. The design also required 3 different pairs of "soft jaws", which are essentially jigs that hold the aluminum blanks in place to handle mating surfaces that aren't square.
I was surprised by how important it is to hold the aluminum blank properly in the mill. It significantly affects how you choose the order of your operations. Each operation must leave enough "fixture stock" to let you grip the part from the opposing side until everything is square or fits in a soft jaw. Here you can see each step in cutting the shifter's base piece and the various fixtures.
The shifter knob provided an additional challenge because it was important that all of the cuts for the center hole and ball springs were concentric. Here's what the blank looked like after each operation for the knob.
Built and biking
At last! It's all done. Here's the shifter actually mounted on my bicycle.
And here's what it looks like from the top when you're riding it.
For the next month we'll try these prototypes and see how they feel. We'll probably make some design revisions and update the G-Code programs accordingly. We'll get some anodized in black. Once that's done I'm hoping we'll do a manufacturing run at a local machine shop. I'd like to start putting these on Mission Bikes and selling them to anyone who likes the Nexus 8 hub. Let me know if you're interested in getting one!
I've been working on Effective Python for just over two months now. The plan is to have 8 chapters in total. I've written a first draft of 5 so far. Chapter 3, the first one I wrote, was the hardest for many reasons. I had to establish a consistent voice for talking about Python. I took on the most difficult subjects first (objects and metaclasses) to get that work out of the way. I also had to build tools to automate my workflow for writing.
Each chapter consists of a number of short items that are about 2-4 pages in length. The title of an item is the "what", the shortest possible description of its advice (e.g., "Prefer Generators to Returning Lists"). The text of the item is the "why", the explanation that justifies following the advice. It's important to make the core argument for each item using Python code. But it's also important to surround the code with detailed reasoning.
Before I started I read a nice retrospective on how one author wrote their programming book. They had separate source files for the example code and used special comments to stitch it into the book's text at build time. That's a great idea because it ensures the code that's in the book definitely compiles and runs. There's nothing worse in programming tutorials than typing in code from the text and having it barf.
I wanted to go a step further. I wanted my examples to be very short. I wanted to intermix code and prose more frequently, so the reader could focus on smaller pieces one step at a time. I wanted to avoid huge blocks of code followed by huge blocks of prose. I needed a different approach.
After some experimenting what I landed on is a script that processes GitHub Flavored Markdown. It incrementally reads the input Markdown, finds blocks that are Python code, runs them dynamically, and inserts the output into a following block.
Here's an example of what the input Markdown looks like:
The basic form of the slicing syntax is `list[start:end]`
where `start` is inclusive and `end` is exclusive.
a = [1, 2, 3, 4, 5, 6, 7, 8]
print('First four:', a[:4])
print('Last four: ', a[-4:])
print('Middle two:', a[3:-3])
First four: [1, 2, 3, 4]
Last four:  [5, 6, 7, 8]
Middle two: [4, 5]
When slicing from the start of a list you should leave
out the zero index to reduce visual noise.
assert a[:5] == a[0:5]
I write the files in Sublime Text. When I press Command-B it builds the Markdown by running my script, which executes all the Python, inserts the output back into the text, and then overwrites the original file in-place. This makes it easy to develop the code examples at the same time I'm writing the explanatory prose. It feels like the read/eval/print loop of an interactive Python shell.
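The build step can be sketched roughly as follows. This is not the actual script, just a minimal illustration under some assumptions: code blocks are fenced and tagged `python`, output goes in an untagged fenced block right after, and all blocks share one namespace so later examples can use earlier variables. The function name `run_markdown` is made up.

```python
import contextlib
import io

FENCE = '```'


def run_markdown(text):
    """Run each fenced python block and put its stdout in a following block."""
    lines = text.split('\n')
    out = []
    namespace = {}  # Shared so later blocks can see earlier variables
    i = 0
    while i < len(lines):
        if lines[i].strip() != FENCE + 'python':
            out.append(lines[i])
            i += 1
            continue
        # Copy the code block through unchanged, up to its closing fence.
        end = lines.index(FENCE, i + 1)
        code = '\n'.join(lines[i + 1:end])
        out.extend(lines[i:end + 1])
        i = end + 1
        # Execute the code and capture anything it prints.
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            exec(code, namespace)
        # Skip a stale output block left over from a previous run.
        if lines[i:i + 2] == ['', FENCE] and FENCE in lines[i + 2:]:
            i = lines.index(FENCE, i + 2) + 1
        if captured.getvalue():
            out.extend(['', FENCE, captured.getvalue().rstrip('\n'), FENCE])
    return '\n'.join(out)


sample = '\n'.join([
    'Slicing example:',
    '',
    FENCE + 'python',
    'a = [1, 2, 3, 4]',
    "print('First two:', a[:2])",
    FENCE,
    '',
    'More prose.',
])
result = run_markdown(sample)
print(result)
```

Because stale output blocks are replaced rather than appended to, running the function on its own result gives the same text back, which is what makes overwriting the file in-place on every build safe.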
My favorite part is how I made Python treat the Markdown files as input source code. That means when there's an error in my examples and an exception is raised, I'll get a traceback into the Markdown file at exactly the line where the issue occurred.
Here's an example of what that looks like in the Sublime build output:
Traceback (most recent call last):
  File ".../Slicing.md", line 29, in <module>
IndexError: list index out of range
It's essentially IPython Notebook, but tuned for my specific needs and checked into a git repo as Markdown flat files. Update: A couple people mentioned that this is a variation of Knuth's Literate Programming. Indeed it is!
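One way to get tracebacks that point into the Markdown file is to compile each code block with the Markdown file's name and pad it with blank lines so the line numbers match. The post doesn't say how the real script does it, so treat this as a sketch of one possible approach; `run_block` and the line numbers below are hypothetical.

```python
import traceback


def run_block(code, filename, first_line, namespace):
    """Run code so tracebacks report positions in the Markdown file.

    first_line is the 1-based line number in the Markdown file where
    the block's first line of code appears.
    """
    padded = '\n' * (first_line - 1) + code  # Shift the line numbers
    compiled = compile(padded, filename, 'exec')
    exec(compiled, namespace)


code = "a = [1, 2, 3]\nprint(a[10])\n"
try:
    # Pretend the block starts on line 28 of Slicing.md (made-up values).
    run_block(code, 'Slicing.md', 28, {})
except IndexError:
    report = traceback.format_exc()
    print(report)
```

The second line of the block raises, so the last frame of the report names line 29 of `Slicing.md`, even though Python never read that file directly.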
Unfortunately, my deliverable for each chapter must be a Microsoft Word document. As a supporter of open source software and open standards this requirement made me wince when I first heard it. But the justification is understandable. The publisher has a technical book making system that uses Word-based templates and formatting. They have their own workflow for editing and preparing the book for print. This is the reality of desktop publishing. More modern tools like O'Reilly Atlas exist, but they are new and still in beta.
There is no way I'm going to manually convert my Markdown files into Word files. The set of required paragraph and character styles is vast and complicated. These styles are part of why the published book will look good, but it's tedious work that's easy to get wrong. Sounds like the perfect job for automation!
I have a second script that reads the input Markdown (using mistune) and spits out a Word .docx file (using python-docx). The script has a bunch of rules to map Markdown syntax to Word formatting. The script also passes all of the Python code examples through the Python lexer to generate syntax highlighting in the resulting document.
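To give a feel for the syntax highlighting step, here is a small sketch that splits Python source into tokens and assigns each one a style name. It uses the standard library's `tokenize` module as a stand-in for whatever lexer the real script uses, and the style names are invented; a publisher's Word template would define its own. The mistune and python-docx parts are left out entirely.

```python
import io
import keyword
import tokenize

# Hypothetical character style names; a real template defines its own.
STYLES = {
    tokenize.STRING: 'CodeString',
    tokenize.NUMBER: 'CodeNumber',
    tokenize.COMMENT: 'CodeComment',
    tokenize.OP: 'CodeOperator',
}


def styled_runs(source):
    """Yield (text, style_name) pairs for each meaningful token."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type == tokenize.NAME:
            if keyword.iskeyword(tok.string):
                style = 'CodeKeyword'
            else:
                style = 'CodeName'
        elif tok.type in STYLES:
            style = STYLES[tok.type]
        else:
            continue  # Skip NEWLINE, INDENT, DEDENT, ENDMARKER, and so on
        yield tok.string, style


runs = list(styled_runs("x = 'hi'  # greet\n"))
print(runs)
```

Each pair could then become one run of text with a character style applied in the output document. Whitespace between tokens is dropped here, so a real converter would also need the token positions to rebuild the spacing.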
The other important thing the publishing script does is post-process the source code. Often in an example there are only two lines out of 20 I need to show to the reader to demonstrate my point. The other 18 lines are for setup and ensuring the example actually demonstrates the right thing (testing). So I have special directives in the code as comments that can hide lines or collapse them with ellipses.
Here's an example of what this looks like in the Markdown file:
def __getattr__(self, name):
if name == 'missing':
raise AttributeError('That property is missing!')
value = 'Value for %s' % name
setattr(self, name, value)
data = MissingPropertyDB()
data.foo # Test the success case
except AttributeError as e:
The actual output in the published book would look like this:
def __getattr__(self, name):
if name == 'missing':
raise AttributeError('That property is missing!')
data = MissingPropertyDB()
except AttributeError as e:
AttributeError('That property is missing!',)
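The hide and collapse behavior can be sketched like this. The directive names (`# HIDE`, `# COMPRESS`, `# END`) and the `# ...` placeholder are assumptions made for the sake of illustration, not necessarily what the real script uses.

```python
def publish(code):
    """Apply hypothetical '# HIDE' / '# COMPRESS' ... '# END' directives."""
    output = []
    mode = None  # None, 'hide', or 'compress'
    for line in code.splitlines():
        marker = line.strip()
        if marker == '# HIDE':
            mode = 'hide'
        elif marker == '# COMPRESS':
            mode = 'compress'
            # Keep the directive's indentation for the ellipsis line.
            indent = line[:len(line) - len(line.lstrip())]
            output.append(indent + '# ...')
        elif marker == '# END':
            mode = None
        elif mode is None:
            output.append(line)  # Lines inside a directive are dropped
    return '\n'.join(output)


source = '''\
cache = {}

def lookup(name):
    # COMPRESS
    value = 'Value for %s' % name
    cache[name] = value
    # END
    return value

# HIDE
assert lookup('foo') == 'Value for foo'  # Test, not shown to readers
# END
print(lookup('bar'))
'''
published = publish(source)
print(published)
```

The full source still runs and tests itself at build time, while the reader only sees the lines that carry the point.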
If you have any interest in using these tools let me know! Writing a book is already hard enough. Having a good workflow helps a lot. I'd like to save you the trouble. Otherwise, if you have any suggestions on what I should put in the book, please email me here.
I'm less interested in generics in Go after reading this intro to Boost, the C++ template library. The resulting code is impenetrable to anyone but an expert. I don't think generalization is as important as computer scientists say it is. Approachability should be the biggest concern given that developing software is almost always a social problem, not a technical one.
Rob Pike's solution for generic programming in Go could be his proposal for the "generate" command in the Go toolchain. This addresses a whole range of requirements, including lexers, binary embedding, and protocol buffers. Templating for Go generics is just a natural consequence of the design. It'll be interesting to see if existing attempts at generics in Go move to use "generate" instead.
Dropbox rewrote some infrastructure and produced 200KLOC of Go. They open sourced their common Go libraries. That's great. Some folks are saying this means they've abandoned Python. I don't read it that way. Dropbox needs to reduce costs and scale up if they want to have an IPO. Moving to Go seems like a natural choice for a company heavily invested in Python. Brad says Go gives you "90% of the ease of scripting languages with 90% of the performance of systems languages." That sounds right to me, but it makes the choice between Python and Go even more murky. I'm still searching for the dividing line.
Von Neumann's crucial insight is that part of the replicator has a double use; being both an active component of the construction mechanism, and being the target of a passive copying process. This part is played by the tape of instructions in Von Neumann's combination of universal constructor plus instruction tape.
The combination of a universal constructor and a tape of instructions would i) allow self-replication, and also ii) guarantee that the open-ended complexity growth observed in biological organisms was possible. The image below illustrates this possibility.
This insight is all the more remarkable because it preceded the discovery of the structure of the DNA molecule by Watson and Crick, though it followed the Avery-MacLeod-McCarty experiment which identified DNA as the molecular carrier of genetic information in living organisms. The DNA molecule is processed by separate mechanisms that carry out its instructions and copy the DNA for insertion for the newly constructed cell. The ability to achieve open-ended evolution lies in the fact that, just as in nature, errors (mutations) in the copying of the genetic tape can lead to viable variants of the automaton, which can then evolve via natural selection.
I'm overwhelmingly excited to be writing the Effective Python book, a follow-on to Scott Meyers' classic, Effective C++. I'm honored to have the opportunity to author this book. I first read Effective C++ when I was 15 years old. Scott's Effective books led to my 7 year obsession with C++ and my first job. At Google I learned Python. I've been building infrastructure and applications with it for the past 9 years. Hopefully I have valuable advice to share from my own experience and what I've learned from the Python community.
Like the original, my book will be ~50 specific pieces of advice, 2-4 pages each, on how to write better Python programs. It should be the second thing you read after wonderful introductory books like Alex Martelli's Python in a Nutshell (which is how I learned Python) and Zed Shaw's Learn Python the Hard Way. It will be a stepping stone towards Python mastery via more thorough references like Wesley Chun's Core Python books and the Python Cookbook.
The goal of Effective Python is to give readers a sense of the "right way" of writing Python code in general. It's not a language introduction, a cookbook, an encyclopedic reference, or a guide for a specific area like Django, NumPy, or Kivy. This book is for programmers who want to know the dirt, the real stuff, the guidance of hard-won experience. It should transcend specific problem domains. This is what made Effective C++ so awesome.
A team at Google released FlatBuffers, yet another data encoding and IDL. It makes the same important tradeoff as Cap'n Proto: There is no encoding or decoding; how it's serialized is the same as what's in memory. Having a distinct encode/decode step is the fatal flaw in Protocol Buffers.
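The difference can be illustrated with a toy example in Python. This is not the FlatBuffers or Cap'n Proto wire format or API; the record layout, offsets, and function names are invented. The point is only the contrast between unpacking a whole message into new objects and reading a single field in place from the buffer.

```python
import struct

# Made-up fixed layout: little-endian uint32 id, float64 score, uint16 level.
LAYOUT = struct.Struct('<IdH')
SCORE_OFFSET = 4  # The score starts right after the 4-byte id


def decode(buffer):
    """Decode-step style: unpack every field into new Python objects."""
    record_id, score, level = LAYOUT.unpack_from(buffer)
    return {'id': record_id, 'score': score, 'level': level}


def read_score(buffer):
    """In-place style: read one field straight from its offset."""
    return struct.unpack_from('<d', buffer, SCORE_OFFSET)[0]


wire = LAYOUT.pack(7, 98.5, 3)
view = memoryview(wire)  # No copy of the underlying bytes
print(decode(view))      # {'id': 7, 'score': 98.5, 'level': 3}
print(read_score(view))  # 98.5
```

With the in-place style, the cost of touching one field doesn't depend on how large the rest of the message is.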
Slogged through some Java today. It's fine. But I've never achieved Java nirvana. When I see a runtime error caused by tools like Guice and Dagger I want to delete my homedir and go on a silent retreat. People say they'll never use Python in production because of syntax errors at runtime. It seems like injection warnings are another dimension of the same problem.
My current project is the first time I've managed other people directly. But I'm still a software engineer, so my responsibilities are:
Lead our engineering team to get things done
#3 is the "engineering management" part of my job. The gist of engineering management:
Ask the right questions
Prioritize the team's efforts to address these risks
Escalate issues beyond your control
It's meetings with your team, meetings with other managers, and planning.
There is another type of engineering manager at Google that only does #3. These folks are pure managers, not software engineers. They are not responsible for direct technical contributions (even though many still write code and participate in design). Their only goal is leading their team. Most Directors and VPs have this role. Many large teams are led by pure engineering managers because building software at scale is primarily a social problem, not a technical one.
What's interesting is what happens in an emergency.
When something goes wrong I want to write code to solve the problem because that's what I do. Engineering managers want to schedule meetings because communication is what they do. There's nothing wrong with this, but it causes a conflict like cats and dogs. Dogs bow when they want to play. Cats think dogs are posturing to pounce and are being aggressive. So cats and dogs often have trouble getting along. Similarly, engineering managers schedule meetings with programmers when there's something wrong. Programmers want to be writing code, etc, so they see these meetings as a waste of precious time, interrupting progress towards a solution.
Neither group is "right", it's just the difference between the roles. You need both roles to create the healthy, constructive tension that makes an engineering organization function properly at scale. But I hadn't really appreciated how much this difference in priorities matters until I experienced the friction first-hand in a recent episode.
Here are some suggestions I've come up with to make it easier going forward.
Be clear about why you want to avoid meetings (i.e., you're debugging)
Have more meetings with pure managers to enable them to do their jobs
Maximize engineers' time by having scalable meetings instead of one-on-ones
Minimize interruptions by preferring asynchronous channels like bug trackers and mailing lists
Cool to see Micro Python, a specialized version of Python for embedded devices.
Micro Python is a new implementation of the Python 3 language, which aims to be properly compatible with CPython, while sporting a very minimal RAM footprint, a compact compiler, and a fast and efficient runtime. These goals have been met by employing many tricks with pointers and bit stuffing, and placing as much as possible in read-only memory.
Unfortunately as long as the language / runtime relies on garbage collection important use-cases like "real-time" aren't possible. Go has the same problem, even though it's a lot closer to the metal. That's not preventing people from trying, though.