I'm Brett Slatkin and this is where I write about programming and related topics. You can contact me here or view my projects.

27 January 2013

It is ridiculously easy to refactor Go

I've written Go in the past, but not enough to understand its nuances or appreciate its design tradeoffs. Over the holiday I hacked quite a bit on a project in Go. I discovered a profound outcome of Go's support for static type checking and duck typing: It is ridiculously easy to refactor Go. This wasn't obvious to me until I experienced it myself.

My project involved a bunch of email parsing, specifically RFC822 messages with multipart MIME content. This is normally hell on earth, but I've written code to deal with multipart so many times now that it's one of my "learn the language" exercises.

Note: I swear to you this will be light on how MIME works. I want to show a simple motivating example that surprised me.


My first attempt

The biggest problem with MIME is that it's turtles all the way down. You can have MIME-in-MIME-in-MIME, multipart/alternative in multipart/mixed, and so on. While hacking away I found two pieces of code doing pretty much the same thing to deal with the turtles. The code initially looked like this:
func main() {
  var myHeader mail.Header
  var body io.Reader

  // ↑ Somehow fill these variables ...

  result, err := extractBody(
      myHeader.Get("Content-Type"),
      myHeader.Get("Content-Transfer-Encoding"),
      body)

  // Do something with the email ...
}

func extractBody(contentType string, transferEncoding string, bodyReader io.Reader) (*Email, error) {
  mediaType, params, _ := mime.ParseMediaType(contentType)
  // strings.HasPrefix can't panic on a short or empty media type the way slicing can
  if strings.HasPrefix(mediaType, "text") {
    return extractTextBody(contentType, transferEncoding, bodyReader)
  } else if strings.HasPrefix(mediaType, "multipart") {
    return extractMimeBody(params["boundary"], bodyReader)
  }
  return nil, fmt.Errorf("Unsupported content type: %s", contentType)
}

func extractTextBody(contentType string, transferEncoding string, bodyReader io.Reader) (*Email, error) {
  // Decode the body bytes based on the content type, and the encoding
  // of the text, returning a newly allocated Email struct ...
}

func extractMimeBody(boundary string, bodyReader io.Reader) (*Email, error) {
  mimeReader := multipart.NewReader(bodyReader, boundary)
  for {
    part, err := mimeReader.NextPart()
    if err != nil {
      // io.EOF here means no part had a text body
      return nil, err
    }

    // ↓ This looks identical to "main" and "extractBody" -- gross!
    var result *Email
    contentType := part.Header.Get("Content-Type")
    mediaType, params, _ := mime.ParseMediaType(contentType)
    if strings.HasPrefix(mediaType, "text") {
      result, err = extractTextBody(
          contentType,
          part.Header.Get("Content-Transfer-Encoding"),
          part)
    } else if strings.HasPrefix(mediaType, "multipart") {
      result, err = extractMimeBody(params["boundary"], part)
    }
    // End gross section

    // Close now rather than defer, so parts don't pile up until the function returns
    part.Close()

    if err == nil && result != nil && strings.HasPrefix(result.ContentType, "text") {
      return result, nil
    }
  }
}

The problem

When I noticed this duplicate code extracting headers, my first thought was, "Hey I'll just pass the Header all the way down." In Python that's fine, but in Go mail.Header and multipart.Part.Header are distinct named types with different method sets, so Go won't let you pass one where the other is expected:
// mail.Header
type Header map[string][]string
func (h Header) AddressList(key string) ([]*Address, error)
func (h Header) Date() (time.Time, error)
func (h Header) Get(key string) string

// multipart.Part.Header == textproto.MIMEHeader
type MIMEHeader map[string][]string
func (h MIMEHeader) Add(key, value string)
func (h MIMEHeader) Del(key string)
func (h MIMEHeader) Get(key string) string
func (h MIMEHeader) Set(key, value string)
That seems stupid, because they have the same underlying map type. But it makes sense that Go enforces this constraint at compile time. If my "extractTextBody" function called MIMEHeader.Del, that wouldn't work on a mail.Header because mail.Header.Del does not exist.


The solution

So now it's time for the trick, which in hindsight is obvious. Sometimes the closest exit is behind you: I defined a new interface that is smaller than both Header and MIMEHeader, containing only the one common method I actually needed:
type emailHeader interface {
  Get(key string) string
}
And behold! Now I could change my functions to accept the emailHeader interface. I could pass either Header or MIMEHeader values to these functions as an emailHeader parameter, even though neither type knows my interface exists. With Java interfaces or C++ abstract base classes you can't do this, because a type has to declare what it implements up front, and I would be shit out of luck. In Go, this let me delete a bunch of duplicate code:
// My new magic interface!
// Because mail.Header.Get == multipart.Part.Header.Get
type emailHeader interface {
  Get(key string) string
}

func main() {
  var myHeader mail.Header
  var body io.Reader

  // ↑ Somehow fill these variables ...

  // Here I pass a mail.Header
  result, err := extractBody(myHeader, body)

  // Do something with the email ...
}

func extractBody(header emailHeader, bodyReader io.Reader) (*Email, error) {
  contentType := header.Get("Content-Type")
  mediaType, params, _ := mime.ParseMediaType(contentType)
  if strings.HasPrefix(mediaType, "text") {
    // Here I pass any kind of emailHeader!
    return extractTextBody(header, bodyReader)
  } else if strings.HasPrefix(mediaType, "multipart") {
    return extractMimeBody(params["boundary"], bodyReader)
  }
  return nil, fmt.Errorf("Unsupported content type: %s", contentType)
}

func extractTextBody(header emailHeader, bodyReader io.Reader) (*Email, error) {
  // Decode the body bytes based on whatever headers this needs,
  // returning a newly allocated Email struct ...
}

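func extractMimeBody(boundary string, bodyReader io.Reader) (*Email, error) {
  mimeReader := multipart.NewReader(bodyReader, boundary)
  for {
    part, err := mimeReader.NextPart()
    if err != nil {
      // io.EOF here means no part had a text body
      return nil, err
    }
    // Here I pass a MIMEHeader
    result, err := extractBody(part.Header, part)
    // Close now rather than defer, so parts don't pile up until the function returns
    part.Close()
    if err == nil && result != nil && strings.HasPrefix(result.ContentType, "text") {
      return result, nil
    }
  }
}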

Conclusion

I've written previously about how futures are a design pattern for refactoring. I see static, compile-time duck typing as another strong example.
© 2009-2024 Brett Slatkin