Avoiding Hash Lookups in a Ruby Implementation

I had an interesting realization tonight: I'm terrified of hash tables. Specifically, my work on JRuby (and even more directly, my work optimizing JRuby) has made me terrified to ever consider using a hash table in the hot path of any program or piece of code if there's any possibility of eliminating it. And what I've learned over the years is that the vast majority of execution-related (as opposed to data-related, purely dynamic-sourced lookup tables) hash tables are totally unnecessary.

Some background might be interesting here.

Hashes are a Language Designer's First Tool

Anyone who's ever designed a simple language knows that pretty much everything you do is trivial to implement as a hash table. Dynamically-expanding tables of functions or methods? Hash table! Variables? Hash table! Globals? Hash table!

In fact, some languages never graduate beyond this phase and remain essentially gobs and gobs of hash tables even in fairly recent implementations. I won't name your favorite language here, but I will name one of mine: Ruby.

Ruby: A Study in Hashes All Over the Freaking Place

As with many dynamic languages, early (for some definition of "early") implementations of Ruby used hash tables all over the place. Let's just take a brief tour through the many places hash tables are used in Ruby 1.8.7

(Author's note: 1.8.7 is now, by most measures, the "old" Ruby implementation, having been largely supplanted by the 1.9 series which boasts a "real" VM and optimizations to avoid most hot-path hash lookup.)

In Ruby (1.8.7), all of the following are (usually) implemented using hash lookups (and of these, many are hash lookups nearly every time, without any caching constructs):

  • Method Lookup: Ruby's class hierarchy is essentially a tree of hash tables that contain, among other things, methods. Searching for a method involves searching the target object's class. If that fails, you must search the parent class, and so on. In the absence of any sort of caching, this can mean you search all the way up to the root of the hierarchy (Object or Kernel, depending what you consider root) to find the method you need to invoke. This is also known as "slow".
  • Instance Variables: In Ruby, you do not declare ahead of time what variables a given class's object instances will contain. Instead, instance variables are allocated as they're assigned, like a hash table. And in fact, most Ruby implementations still use a hash table for variables under some circumstances, even though most of these variables can be statically determined ahead of time or dynamically determined (to static ends) at runtime.
  • Constants: Ruby's constants are actually "mostly" constant. They're a bit more like "const" in C, assignable once and never assignable again. Except that they are assignable again through various mechanisms. In any case, constants are also not declared ahead of time and are not purely a hierarchically-structured construct (they are both lexically and hierarchically scoped), and as a result the simplest implementation is a hash table (or chains of hash tables), once again.
  • Global Variables: Globals are frequently implemented as a top-level hash table even in modern, optimized language. They're also evil and you shouldn't use them, so most implementations don't even bother making them anything other than a hash table.
  • Local Variables: Oh yes, Ruby has not been immune to the greatest evil of all: purely hash table-based local variables. A "pure" version of Python would have to do the same, although in practice no implementations really support that (and yes, you can manipulate the execution frame to gain "hash like" behavior for Python locals, but you must surrender your Good Programmer's Card if you do). In Ruby's defense, however, hash tables were only ever used for closure scopes (blocks, etc), and no modern implementations of Ruby use hash tables for locals in any way.

There are other cases (like class variables) that are less interesting than these, but this list serves to show how easy it is for a language implementer to fall into the "everything's a hash, dude!" hole, only to find they have an incredibly flexible and totally useless language. Ruby is not such a language, and almost all of these cases can be optimized into largely static, predictable code paths with nary a hash calculation or lookup to be found.

How? I'm glad you asked.

JRuby: The Quest For Fewer Hashes

If I were to sum up the past 6 years I've spent optimizing JRuby (and learning how to optimize dynamic languages) it would be with the following phrase: Get Rid Of Hash Lookups.

When I tweeted about this realization yesterday, I got a few replies back about better hashing algorithms (e.g. "perfect" hashes) and a a few replies from puzzled folks ("what's wrong with hashes?"), which made me realize that it's not always apparent how unnecessary most (execution-related) hash lookups really are (and from now on, when I talk about unnecessary or optimizable hash lookups, I'm talking about execution-related hash lookups; you data folks can get off my back right now).

So perhaps we should talk a little about why hashes are bad in the first place.

What's Wrong With a Little Hash, Bro?

The most obvious problem with using hash tables is the mind-crunching frustration of finding THE PERFECT HASH ALGORITHM. Every year there's a new way to calculate String hashes, for example, that's [ better | faster | securer | awesomer ] than all precedents. JRuby, along with many other languages, actually released a security fix last year to patch the great hash collision DoS exploit so many folks made a big deal about (while us language implementers just sighed and said "maybe you don't actually want a hash table here, kids"). Now, the implementation we put in place has again been "exploited" and we're told we need to move to cryptographic hashing. Srsly? How about we just give you a crypto-awesome-mersenne-randomized hash impl you can use for all your outward-facing hash tables and you can leave us the hell alone?

But I digress.

Obviously the cost of calculating hash codes is the first sin of a hash table. The second sin is deciding how, based on that hash code, you will distribute buckets. Too many buckets and you're wasting space. Too few and you're more likely to have a collision. Ahh, the intricate dance of space and time plagues us forever.

Ok, so let's say we've got some absolutely smashing hash algorithm and foresight enough to balance our buckets so well we make Lady Justice shed a tear. We're still screwed, my friends, because we've almost certainly defeated the prediction and optimization capabilities of our VM or our M, and we've permanently signed over performance in exchange for ease of implementation.

It is conceivable that a really good machine can learn our hash algorithm really well, but in the case of string hashing we still have to walk some memory to give us reasonable assurance of unique hash codes. So there's performance sin #1 violated: never read from memory.

Even if we ignore the cost of calculating a hash code, which at worst requires reading some object data from memory and at best requires reading a cached hash code from elsewhere in memory, we have to contend with how the buckets are implemented. Most hash tables implement the buckets as either of the typical list forms: an array (contiguous memory locations in a big chunk, so each element must be dereferenced...O(1) complexity) or a linked list (one entry chaining to the next through some sort of memory dereference, leading to O(N) complexity for searching collided entries).

Assuming we're using simple arrays, we're still making life hard for the machine since it has to see through at least one and possibly several mostly-opaque memory references. By the time we've got the data we're after, we've done a bunch of memory-driven calculations to find a chain of memory dereferences. And you wanted this to be fast?

Get Rid Of The Hash

Early attempts (of mine and others) to optimize JRuby centered around making hashing as cheap as possible. We made sure our tables only accepted interned strings, so we could guarantee they'd already calculated and cached their hash values. We used the "programmer's hash", switch statements, to localize hash lookups closer to the code performing them, rather than trying to balance buckets. We explored complicated implementations of hierarchical hash tables that "saw through" to parents, so we could represent hierarchical method table relationships in (close to) O(1) complexity.

But we were missing the point. The problem was in our representing any of these language features as hash tables to begin with. And so we started working toward the implementation that has made JRuby actually become the fastest Ruby implementation: eliminate all hash lookups from hot execution paths.

How? Oh right, that's what we were talking about. I'll tell you.

Method Tables

I mentioned earlier that in Ruby, each class contains a method table (a hash table from method name to a piece of code that it binds) and method lookup proceeds up the class hierarchy. What I didn't tell you is that both the method tables and the hierarchy are mutable at runtime.

Hear that sound? It's the static-language fanatics' heads exploding. Or maybe the "everything must be mutable always forever or you are a very bad monkey" fanatics. Whatever.

Ruby is what it is, and the ability to mix in new method tables and patch existing method tables at runtime is part of what makes it attractive. Indeed, it's a huge part of what made frameworks like Rails possible, and also a huge reason why other more static (or more reasonable, depending on how you look at it) languages have had such difficulty replicating Rails' success.

Mine is not to reason why. Mine is but to do and die. I have to make it fast.

Proceeding from the naive implementation, there are certain truths we can hold at various times during execution:

  • Most method table and hierarchy manipulation will happen early in execution. This was true when I started working on JRuby and it's largely true now, in no small part due to the fact that optmizing method tables and hierarchies that are wildly different all the time is really, really hard (so no implementer does it, so no user should do it). Before you say it: even prototype-based languages like Javascript that appear to have no fixed structure do indeed settle into a finite set of predictable, optimizable "shapes" which VMs like V8 can take advantage of.
  • When changes do happen, they only affect a limited set of observers. Specifically, only call sites (the places where you actually make calls in code) need to know about the changes, and even they only need to know about them if they've already made some decision based on the old structure.

So we can assume method hierarchy structure is mostly static, and when it isn't there's only a limited set of cases where we care. How can we exploit that?

First, we implement what's called an "inline cache" at the call sites. In other words, every place where Ruby code makes a method call, we keep a slot in memory for the most recent method we looked up. In another quirk of fate, it turns out most calls are "monomorphic" ("one shape") so caching more than one is usually not beneficial.

When we revisit the cache, we need to know we've still got the right method. Obviously it would be stupid to do a full search of the target object's class hierarchy all over again, so what we want is to simply be able to examine the type of the object and know we're ok to use the same method. In JRuby, this is (usually) done by assigning a unique serial number to every class in the system, and caching that serial number along with the method at the call site.

Oh, but wait...how do we know if the class or its ancestors have been modified?

A simple implementation would be to keep a single global serial number that gets spun every time any method table or class hierarchy anywhere in the system is modified. If we assume that those changes eventually stop, this is good enough; the system stabilizes, the global serial number never changes, and all our cached methods are safely tucked away for the machine to branch-predict and optimize to death. This is how Ruby 1.9.3 optimizes inline caches (and I believe Ruby 2.0 works the same way).

Unfortunately, our perfect world isn't quite so perfect. Methods do get defined at runtime, especially in Ruby where people often create one-off "singleton methods" that only redefine a couple methods for very localized use. We don't want such changes to blow all inline caches everywhere, do we?

Let's split up the serial number by method name. That way, if you are only redefining the "foobar" method on your singletons, only inline caches for "foobar" calls will be impacted. Much better! This is how Rubinius implements cache invalidation.

Unfortunately again, it turns out that the methods people override on singletons are very often common methods like "hash" or "to_s" or "inspect", which means that a purely name-based invalidator still causes a large number of call sites to fail. Bummer.

In JRuby, we went through the above mechanisms and several others, finally settling on one that allows us to only ever invalidate the call sites that actually called a given method against a given type. And it's actually pretty simple: we spin the serial numbers on the individual classes, rather than in any global location.

Related:
1 2 Page 1
Page 1 of 2