Thoughts on (premature) optimization

We all know premature optimization is bad, or, as Donald Knuth put it, “the root of all evil”. Here’s the full quote:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

There’s a lot of information packed into that quote. First, it is important to note that it is premature optimization that is bad, not optimization in general. Of course you should optimize your systems, but not before you know whether the change will have any measurable impact, and not so often that it derails your project.

Second, you have to find the critical 3% of opportunities in your system that will have the most impact on performance. Most of your energy should be focused on that 3%, not on the other 97% that may not make a significant difference.

This is a great rule of thumb, but there are exceptions to it. That’s what we’ll discuss in this post.

What is Premature Optimization?

Let’s say you are building a web backend server with a JSON API. Performance is critical to the success of this project and you want to keep your response times as fast as possible.

The standard library in the language of your choice comes with a package for converting native objects/structs to and from JSON strings. For example, Go comes with the encoding/json package to do just this.
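As a quick illustration, here is a minimal sketch of a round trip through encoding/json. The User struct is hypothetical, made up just for this example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a hypothetical struct used only for illustration.
type User struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

func main() {
	// Encode a native struct into a JSON string.
	data, err := json.Marshal(User{ID: 1, Name: "Ada"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // {"id":1,"name":"Ada"}

	// Decode the JSON string back into a struct.
	var u User
	if err := json.Unmarshal(data, &u); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", u) // {ID:1 Name:Ada}
}
```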

You happen to know of a few optimizations to JSON encoding & decoding that have not been implemented in the Go standard library package. Now, do you write your own implementation with those optimizations, or go ahead with the standard library package for now?

At this point, if you chose to implement a new package to optimize the encoding & decoding of JSON strings, that would be premature optimization. In most cases (if not all), you should use what’s readily available to complete your project.

You don’t know how many milliseconds your JSON library would actually shave off. You are far more likely to introduce bugs than the heavily tested standard library package is. And it is far more likely that your API is slow because of a poorly written database query than because of encoding & decoding JSON structs.

However unlikely, in some cases you may need to optimize even that. And there’s a method to do it correctly.

The right way to optimize

When your project is complete and you are ready to optimize, you have to follow three (not so) simple steps:

1. Measure
2. Optimize
3. Re-measure

Keep repeating these steps until you have hit your target response times.

If you are trying to optimize an endpoint, you need a baseline number to start with. Run load tests or check its performance in production to find its 95th percentile response time. Let’s say your tests show that the endpoint responds within 100ms 95% of the time.
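If you are collecting raw response times yourself, a rough p95 takes only a few lines of Go. This is a sketch using a simple floor-index rule and made-up latencies, not a production-grade percentile estimator:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns an approximate p-th percentile (0-100) of the given
// durations using a simple floor-index rule; good enough for a quick baseline.
func percentile(durations []time.Duration, p float64) time.Duration {
	sorted := append([]time.Duration(nil), durations...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	idx := int(float64(len(sorted)-1) * p / 100.0)
	return sorted[idx]
}

func main() {
	// Hypothetical response times collected from a load test.
	latencies := []time.Duration{
		80 * time.Millisecond, 85 * time.Millisecond, 90 * time.Millisecond,
		95 * time.Millisecond, 100 * time.Millisecond, 120 * time.Millisecond,
	}
	fmt.Println("p95:", percentile(latencies, 95))
}
```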

Now, find a block of code that you suspect could be causing the delay. This could be database operations, making network calls, I/O operations, or in your case, encoding & decoding JSON. Wrap the block of code with time measurements to see how fast or slow that particular operation is.
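Here is a minimal sketch of what that wrapping might look like, using time.Now and time.Since around a JSON decode (the payload is made up):

```go
package main

import (
	"encoding/json"
	"log"
	"time"
)

func main() {
	payload := []byte(`{"id": 1, "name": "Ada"}`)

	// Wrap the suspect block with time measurements.
	start := time.Now()
	var decoded map[string]interface{}
	if err := json.Unmarshal(payload, &decoded); err != nil {
		log.Fatal(err)
	}
	log.Printf("JSON decoding took %s", time.Since(start))
}
```

In a real service you would log or export these timings per operation rather than printing them, but the idea is the same: measure the block, not the whole request.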

If your JSON encoding & decoding takes 5ms, you now know that optimizing it could, in the best case, cut it by 50% and save 2.5ms. Is that worth the optimization? It depends on your project and its requirements, but now you have all of the information you need to make the right decision.

Optimize now or later?

The example I discussed above was an obvious case of premature optimization and it was the right decision to wait until the end of the project to measure and optimize the encoding & decoding of JSON. In some cases, the answer is not so obvious.

As a backend engineer, you may be familiar with the N+1 problem. For a problem as common as this, do you optimize now, or follow the measure → optimize → re-measure steps discussed above? Is it premature optimization to fix a problem you already know about and have a fairly simple solution for?

In some cases, the cost to optimize now is so low that it outweighs any benefit of waiting. This is likely when the problem is common and a solution is readily available. Yes, it might take a few extra lines of code to preload all of the rows from your SQL query, but, in my opinion, that is just optimization, not premature optimization.
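To make that concrete, here is a sketch of preloading with Go's database/sql. The posts table, its user_id and title columns, and the loadPosts helper are all hypothetical; the point is that one IN query replaces N per-user queries:

```go
package store

import (
	"database/sql"
	"strings"
)

// loadPosts avoids the N+1 pattern (one query for the users, then one query
// per user for their posts) by fetching every user's posts in a single query.
func loadPosts(db *sql.DB, userIDs []int) (map[int][]string, error) {
	if len(userIDs) == 0 {
		return map[int][]string{}, nil
	}

	// Build a "?,?,?" placeholder list for the IN clause.
	placeholders := strings.TrimSuffix(strings.Repeat("?,", len(userIDs)), ",")
	args := make([]interface{}, len(userIDs))
	for i, id := range userIDs {
		args[i] = id
	}

	rows, err := db.Query(
		"SELECT user_id, title FROM posts WHERE user_id IN ("+placeholders+")",
		args...,
	)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	// Group posts by user so callers never issue per-user queries.
	postsByUser := make(map[int][]string)
	for rows.Next() {
		var userID int
		var title string
		if err := rows.Scan(&userID, &title); err != nil {
			return nil, err
		}
		postsByUser[userID] = append(postsByUser[userID], title)
	}
	return postsByUser, rows.Err()
}
```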

When you are deciding whether to optimize now or later, ask yourself: what is the cost to optimize? Have you seen this problem before, and are there solutions readily available for you to implement? If the cost to optimize with an existing solution is negligible, you should optimize.
