Kyle Simpson | February 14, 2011
One of the most quoted maxims in programming comes from Donald Knuth in his famous paper “Structured Programming With Go To Statements”:
Premature optimization is the root of all evil.
The surrounding context of this quote is important. His assertion is that optimizing on the 97% of non-critical code paths leads to uglier/harder-to-maintain code for little-to-no benefits. But, this implies that optimizing on the 3% of critical code paths is quite important. So, in reality, he’s more saying: “Optimizing non-critical code paths is the root of all evil.”
Knuth’s full intended meaning is lost almost entirely when developers use “Don’t prematurely optimize” to discredit any code that they feel is overly complex for no foreseeable benefits. Of course, such criticisms are entirely subjective; what’s overly complex to you may be perfectly readable and maintainable to me. It’s also untrue that performant code is always more complex, as the opposite is quite often more true.
The key in keeping to the spirit of Knuth’s admonition is of course how to determine what is “critical code path” and what is “non-critical”. Is his assertion of a 97/3 split even valid, or only suggestive? On the surface, the answer seems dead-simple, but underneath it’s very complex. For instance, what may be a critical path for one use-case of the software may be a non-critical path in another use-case, and vice versa. Should code in the “critical path” be subject to the same code style guidelines as other code in the project?
To extract something actionable out of this complexity, many people suggest a more pragmatic definition for premature optimization: optimizing any piece of code before you observe it performing unacceptably slow/non-performant. In other words, code is only a candidate for “critical path” optimization if you can observe it slowing down the overall software. Otherwise, it should be assumed non-critical, and thus shouldn’t be optimized.
But, how do we define observable? The conventional wisdom is that human beings can’t perceive less than 100ms response time in cause-effect. So, as long as some piece of code doesn’t slow things down by 100ms or more, then it isn’t observable.
There are others who don’t subscribe to this exact application of Knuth’s wisdom. A user on a slow computer with a slow internet connection is far more likely to cross that threshold than someone with a powerhouse computer and blazing fiber connection. The debate is far from a settled issue.
Give Me Speed or Give Me Death
Premature optimization is often compared to “death by a thousand paper-cuts”; it’s considered futile effort because it’s attacking lots of little things that individually aren’t critical, in the pessimistic assumption that all of them will conspire simultaneously to take the application’s performance down (in an observable way). In practice, it seems, only a few of those paper-cuts typically show up at once -- not enough to matter.
While this perspective is wise and level-headed, the other side of the coin is how much hurt even a few paper-cuts can cause if they aren’t known about and instead go untreated. In other words, rather than subjectively choosing specific optimizations and not others, the more productive path is usually greater awareness of how and where all the “cuts” can happen, so as to better avoid them.
As you read the rest of this article, consider this rephrasing of Knuth’s quote as you evaluate your own code optimization efforts:
Immature optimization is the root of all evil.
The Usual Suspects
Where should we begin looking for areas of our code that should be optimized? Are there patterns we can look for which are more likely to create observable performance bottlenecks? How do we make code more optimal without making it uglier and harder to maintain?
These are by no means the only offenders, but it’s a good bet that they’ll be the first places bad performance creeps into your application. If you can become more aware of these common pitfalls, your application performance will likely be more optimized by default.
So let’s take a look at a few common ways that developers unintentionally slow down their loops. Consider this code:
Notice on line 14 that `countOdds(...)` is called inside the loop, meaning it’s called every single iteration (1000 times!). Not only is it called every time, but inside `countOdds`, another loop occurs. So that means this inner-loop is happening with every call, and thus every time the outer-loop iterates.
This may seem like a silly example, but it’s actually quite common for someone to, in the process of refactoring code, move something like an inner-loop into another function, which masks the performance implications of calling such a function. If you couldn’t see the implementation of `countOdds`, you might not realize the inner-loop that’s happening.
Of course, in this contrived example, there’s no reason `countOdds` needs to be calculated over and over again. It can be calculated once at the end, saving on performance. But if there was some reason that the number of odds needed to be recalculated to be used inside each iteration, then you would have this performance tradeoff to keep in mind.
You could consider another algorithm to tackle the problem in a different way that avoids the performance penalties:
If you are moving code into a function because it will be called from more than one location, this is probably a re-factoring that makes sense, from a code maintainability perspective. But if you’re simply abstracting away a single block of inline code into a single function call, you’ve only marginally improved your code readability and taken on some potential performance baggage.
Lesson: when you make such a code change, keep in mind the potential for performance degradation, especially if the loop could eventually be called lots of times.
Another example of function calling inside loops can be seen here:
Here we see a very common pattern in object-oriented programming: the idea of creating a “data structure” that encapsulates some data and exposes some public methods to operate on (or iterate through) that data set. While this definitely has some attractive properties in terms of code style/structure, again, we must be aware of the potential performance implications.
For instance, on line 12 we see that the data structure doesn’t provide any API for a “bulk load” of data, so we must call the `add` function with each iteration of the first for-loop. What if the API could be adjusted slightly to allow `add` to accept an array, which we could build up in our for-loop and make one function/method call to the data structure to pass it in? That would save 999 function calls in this example, and we’d still have a pretty good “objected oriented” data structure abstraction.
Then, in the second loop, we see two method calls being made to the data structure for each iteration. The first one (line 16) is obvious: `obj.next()`. The second one (line 15) is sometimes harder to spot in your code. `obj.size()` is being called as an accessor method to an internal property. But, what if this data structure simply had an automatically-updated public `size` property that we could examine, instead of making a function/method call?
And what if instead of forcing us to call `obj.next()` over and over again, we could ask for a bulk dump of items by passing in a number parameter to `next`? Together, those two changes will save 1,998 method calls for this example, and again, our data structure won’t have sacrificed too much of its “object-oriented’ness” style. Take a look:
There’s no “free lunch” in programming, so admittedly everything is a trade-off, either for code style, or performance, etc. But knowing and understanding all sides of these issues is more than half the battle in being prepared to make those decisions in your code when the time comes.
A special topic I want to call out is a common pattern of degraded performance that often occurs in innocuous looking code like jQuery loops with `$.fn.each` and the like. Again, loops just tend to multiply small performance paper-cuts into bigger gashes.
In this snippet, we’re telling jQuery to loop over every <a> link element in the page, and we’re decorating each one with a className. Which class is added depends on the `rel` attribute of the element. Immediately, you should ask yourself, what it this code runs on a page that has 1000 links on it? You may be designing now for a page with only a few links, but will this code totally fall over 2 months from now when your page has a lot more content on it?
Let’s ignore for a moment that this jQuery pattern has implicitly resulted in the anonymous function (definition starting line 1) being called once for every single iteration of the loop. We already talked above about how common such patterns are, and the possible performance issues it can raise, depending on how many iterations there will be.
Inside the function, we have an object reference (`this`) which jQuery has bound to the specific <a> DOM element for the current iteration. It’s pretty common (in fact, I’d say much more than not) that we want the jQuery’ified reference to `this` rather than the raw DOM element reference.
This is because, like in this example, we like the handy jQuery helpers of `attr(...)` and `addClass(...)` that are available to us if we’re using a jQuery collection object. We don’t really know/care that we’re actually just wrapping a collection object around a single element (line 2) to get those helpers.
However, this is where potentially our biggest performance hit can come in. For reasons that are far beyond the scope of this article, calling `$(...)` and passing in an object is not quite as simple as just wrapping an array/collection around the object.
jQuery has a whole bunch of complex internal functionality for managing collections/traversal/stack, and even though we are only passing in a single object just so we can get the helper functions, all that other machinery must be consulted. Not surprisingly then, `$(...)` is not a particularly quick/efficient function call.
Fortunately, there is an answer to the slowness of `$(...)` for the case where we want to wrap a collection object around only one element. This code is inspired by the helpful each2 jQuery plugin by Ben Alman ( @cowboy):
On line 1, outside the loop, we set up a dummy collection that we can re-purpose for each loop iteration. On line 3, we fake the wrapping of the collection object by direct assignment (instead of the `$(...)` function call), avoiding most of the bad performance. As performance tests show, this type of technique, while a bit uglier, can improve performance by a significant amount.
Of course, as with everything else here, use with caution and careful thought. You should not assume that your faked collection object will necessarily operate exactly the same in terms of adding to the collection, traversing multiple items, etc. But if you’re only ever dealing with one object in the collection, this approach may be something to consider.
The first pitfall is scope lookup, as illustrated here:
In this example, the inner function `baz`, when called (not shown), will need to look for where it can find the variable reference `fun`. It will first look in its local function (`baz`) scope, then the parent function (`bar`) scope, then `foo`s scope, and finally the global scope, where it will find `fun` finally. Scope lookup is handy and convenient, but it does cost a small amount in performance.
Now, our example only uses `fun` once, so it’s not a big deal. But imagine if `fun` were accessed thousands of times inside `baz` (like in a loop; see above); those fractions of milliseconds can add up. It’s unlikely this alone would ever cost you more than a few dozen milliseconds in worst case, but as I said above, the more of these things that happen at once, the more likely you are to noticeably detract from performance. And it’s not difficult to think about such issues as you code, and in some cases, proactively help yourself out by avoiding the problem before it occurs.
In this case, if `baz` is going to need to read the value of `fun` frequently and often, we can declare a local copy of `fun` inside `baz`, which reduces the scope chain lookup for each usage from 4 hops to 1:
Side note: Be careful with making local aliases if you need to change the value of the variable. In this example, since `fun` is a primitive string, the local assignment will be by value-copy rather than by reference, and so changing `fun` inside the `baz` scope will not affect the global copy. Use `window.fun` for such changes, similar to the assignment on line 6.
A close cousin to this pitfall is the prototype-chain lookup. For instance:
Side note: I’m using the new ES5 built-in `Object.create()` as a quick way to illustrate creating objects that are connected via their prototype chain. A shim for browsers that don’t yet support ES5’s `Object.create` is easy to write.
In this example, `baz` is a descendant object (aka, “child”) of `bar`, which is in turn a child of `foo`. As the code stands, `foo` is the object which holds the `fun` property. Contrary to how you may think about prototypal inheritance, `bar` and `baz` do not have a copy of `foo` on them. Instead, all the objects are live-linked via the hidden prototype chain, such that if I ask for `fun` on the `baz` object, and that property is not found, it will traverse up the prototype chain until it finds `fun` (or it dead-ends).
In much the same way as scope traversal above, this prototype-chain traversal has a small cost associated with it. Similarly, if you need to access a property via the prototype chain a bunch of times, you can abate this performance hit by declaring a local variable copy/reference. The same caution from above applies: use this trick only for read access.
Yet another cousin scenario is nested property access, which is a common occurrence when code uses “namespaces” to collect variables/functions together, as seen here:
No surprise, if you are going to make repeated read access to `fun` as it’s nested in that namespace, you can avoid a small performance hit by simply creating a local variable copy of it once and reusing that. Don’t forget: read access, not write.
Before we look at code, let’s make that toll-bridge analogy a bit more colorful as it applies to performance optimization. Imagine you have 1000 kittens that you need to take across a bridge from your house to grandma’s house on the other side. And each time you cross the bridge (both directions), you pay a $1 toll. Your car can fit 50 or so kittens at a time. But you can borrow your neighbor’s trailer, which can hold another 200 kittens. Using that trailer will only cost you $0.50 extra toll.
Would you rather take 20 round-trips across the toll bridge (total $40) in just your car, or would you rather take a little extra time in preparation of each trip, load up 250 kittens in your car+trailer, and only need to make 4 round-trips (for a total cost of $12)? Duh.
That’s the concept we’re pointing at. If you are going to manipulate the DOM, you want to do so with a large batch of operations at once, rather than a bunch of individual DOM manipulation calls.
Or, let’s say that another way, you need to cross the toll-bridge to visit the store to see if they got in a shipment of your favorite cookies yet or not. Do you want to cross the bridge every day? Or would you like to visit the store once, then remember that they said the next shipment wouldn’t be in until next week (and save yourself the unnecessary trips)?
I’m sure this is all completely obvious to most readers, and you probably feel like I’m talking down to you. But you’d really be surprised how easy it is to fall under the spell of the magical and whimsically easy syntax of a library like jQuery, and wistfully forget all your performance concerns as you code away. I’m not promising that this will help you write perfectly performant code, but hopefully this reminder will keep you a little more aware of such issues the next time you open up your code editor.
Consider this code:
Here, we’re performing a very simple task: create a <ul> element, and add one <li> child element to the list for each item in our array.The code above reads pretty straightforwardly. Step 1: create the <ul>, and add it to the page. Step 2: loop through the list, and add a new <li> for each array item. Makes sense, right?
Even though it’s probably obvious to you that the above code has opted for simplicity and readability over some potentially bad performance, you’d be surprised how many seasoned jQuery professionals still write code like that from time to time. Sometimes, we know the rules well enough that we can break them in a willful but cautious way. But other times, we just get to coding quickly, and we don’t see the problem with our small unit-test data of 3 array items.
But, then, we hook this up to live data, and 1000 items get added, and boy does our code start to slow to a crawl. We find ourselves quickly having to go back to this code and refactor. In and of itself, this is a good thing, but refactoring is always more risky (especially if you don’t have proper and full unit-tests to regression-proof your code).
We’ve solved both the read and write cases of performance problems in this code snippet. Look at line 5 in the previous snippet, and you’ll see that each time we wanted to add an <li> item to the list, we re-queried the DOM by its “id” (“#my_list”). In this code snippet, we store a reference to the <ul> element in a local variable `$ul`. Each time we access the element, or make changes to it, we’re using the cached reference, and don’t have to pay the query penalty.
But more than that, we’ve created a <ul> in memory that we’ve not yet added to the DOM. We are free to modify this element directly in memory (kind of in a document fragment if you will), add items to it, etc, and we’re not paying the DOM toll, because the live DOM of the page doesn’t yet know about our in-memory <ul>. Once we’re done with all our changes to the in-memory element, we inject the entire element (along with all its children) into the live DOM with one `appendTo(...)` call.
In other words, we load up our car with all the kittens, and we make one trip across the bridge, paying only one toll. And, while we’re over there, we double-check with the cookie store, just in case.
To reinforce the usefulness of caching a DOM query, and bring several discussed performance hints together, here’s one final snippet to examine:
We only need to obtain a reference to the “my_list” DOM element once, so we do that and cache it in a variable `$list`. All of the click handler functions can share that one reference; no need to re-query for the element. Again, just as above, we create a temporary <ul> called `$tmp_ul` that we add our 100 list items to, then we replace all the children of the live `$list` with the children of our in-memory `$tmp_list`. Finally, notice that we only extract the `rel` attribute from the clicked <a> element once, and reuse that local string variable in our loop instead of re-querying.
Side note: I’m sure many of you jQuery pros are screaming, “Use .live() or .delegate()!” at that last snippet of code. And you’re correct. If you’re going to be doing lots of identical event handling across many DOM elements, it’s much better to delegate to a single handler rather than to 1000. This snippet in only intended to illustrate how the optimizations discussed in this article can be brought together in a single piece of code. Lesson: performance optimization of code has many layers, and it takes a keen eye to evaluate your code (sometimes multiple times) looking for those rough edges to smooth out.
In this article, we looked at a variety of different potential performance bottlenecks and the common coding patterns to look out for. Some of the optimizations suggested are very small, and are unlikely to, on their own, have a noticeable impact on your application’s performance. Other optimizations are huge, and are almost certainly going to slow things down if you ignore them.
But small or big, I hope the more important take-away message here is that awareness of performance issues as you write code is the key. Many times, you can avoid performance pitfalls altogether without much effort, if you’re already thinking “fast by default”. On the other hand, if you never even consider such issues until your application slows to a crawl in front of the whole web audience, you’re going to have a really hard time effectively addressing them in the fire.
Most importantly, don’t just rely on faster browser engines to speed up your code for you, and abdicate your responsibility (and ability) as an engineer to make good choices about when to write smarter, better code from the beginning.
In each of the example snippets shown above, the performance-optimized code was not much uglier or more complicated than the original code. If you know what you’re doing, mature optimization is something every developer can do well. So, what’s your excuse for slow code?
About the Author
Find Kyle on: