A First Taste of InvokeDynamic

Page 2 of 4

Adding insult to injury, JVMs even run verification against the bytecode you feed them to make sure you're following the rules. One little mistake and zooop...off to the exception farm you go. It's downright unfair.

The traditional way to get around all this rigidity (a technique used heavily even by normal Java libraries, since everyone wants to bend the rules sometimes) is to abstract out the act of "invoking" itself, usually by creating "Method" objects that do the call for you. And oddly enough, the reflection capabilities of the JVM come into heavy play here. "Method" happens to be one of the types in the java.lang.reflect package, and it even has an "invoke" method on it. Even better, "invoke" returns Object, and accepts as parameters an Object receiver and an array of Object arguments. Can it truly be this easy? Well, yes and no.
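To make that concrete, here's what the reflective path looks like in plain Java (a minimal, self-contained sketch; the method and receiver chosen here are arbitrary examples, not anything from the article's libraries):

```java
import java.lang.reflect.Method;

public class ReflectiveCall {
    public static void main(String[] args) throws Exception {
        // A Method must be retrieved from a specific type...
        Method toUpper = String.class.getMethod("toUpperCase");

        // ...but invocation itself is completely generic:
        // Object receiver in, Object result out
        Object result = toUpper.invoke("hello");
        System.out.println(result); // HELLO
    }
}
```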

Using reflection to invoke methods works great...except for a few problems. Method objects must be retrieved from a specific type, and can't be created in a general way. You can't ask the JVM to give you a Method that just represents a signature, or even a name and a signature; it must be retrieved from a specific type available at runtime. Oh, but that's at runtime, right? We're ok, because we do actually have types at runtime, right? Well, yes and no.

First off, you're ignoring the second inconvenience above. Language implementations like JRuby or Rhino, which have interpreters, often simply don't *have* normal Java types they can present for reflection. And if you don't have normal types, you don't have normal methods either; JRuby, for example, has a method object type that represents a parsed bit of Ruby code and logic for interpreting it.

Second, reflected invocation is a lot slower than direct invocation. Over the years, the JVM has gotten really good at making reflected invocation fast. Modern JVMs actually generate a bunch of code behind the scenes to avoid much of the overhead old JVMs dealt with. But the simple truth is that reflected access through any number of layers will always be slower than a direct call, partially because the completely generified "invoke" method must check and re-check receiver type, argument types, visibility, and other details, but also because arguments must all be objects (so primitives get object-boxed) and must be provided as an array to cover all possible arities (so arguments get array-boxed).
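Here's a small self-contained comparison to make the boxing concrete (the method chosen is arbitrary): the reflected path object-boxes the primitive argument and array-boxes the argument list, while the direct call touches neither:

```java
import java.lang.reflect.Method;

public class BoxingCost {
    public static void main(String[] args) throws Exception {
        Method indexOf = String.class.getMethod("indexOf", int.class);

        // Direct call: a primitive int goes in, a primitive int comes out
        int direct = "banana".indexOf('n');

        // Reflected call: the int is boxed to an Integer and wrapped in a
        // varargs Object[], and the result comes back boxed as well
        Object reflected = indexOf.invoke("banana", (int) 'n');

        System.out.println(direct + " " + reflected); // 2 2
    }
}
```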

The performance difference may not matter for a library doing a few reflected calls, especially if those calls are mostly to dynamically set up a static structure in memory against which it can make normal calls. But in a dynamic language, where every call must use these mechanisms, it's a severe performance hit.

Build a Better Mousetrap?

As a result of reflection's poor (relative) performance, language implementers have been forced to come up with new tricks. In JRuby's case, this means we generate our own little invoker classes at build time, one per core class method. So instead of calling through our DynamicMethod to a java.lang.reflect.Method object, boxing argument lists and performing type checks along the way, we're able to create a fast, specialized bit of bytecode that does the trick for us.

public org.jruby.runtime.builtin.IRubyObject call(org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject,
            org.jruby.RubyModule, java.lang.String, org.jruby.runtime.builtin.IRubyObject);
   0: aload_2
   1: checkcast #13; //class org/jruby/RubyString
   4: aload_1
   5: aload 5
   7: invokevirtual #17; //Method org/jruby/RubyString.split:(Lorg/jruby/runtime/ThreadContext;
   10: areturn

Here's an example of a generated invoker for RubyString.split, the implementation of String#split, taking one argument. We pass into the "call" method a ThreadContext (runtime information for JRuby), an IRubyObject receiver (the String itself), a RubyModule target Ruby type (to track the hierarchy during super calls), a String method name (to allow aliased methods to present an accurate backtrace), and the argument. Out of it we get an IRubyObject return value. And the bytecode is pretty straightforward; we prepare our arguments and the receiver and we make the call directly. What would normally be perhaps a dozen layers of reflected logic has been reduced to 10 bytes of bytecode, plus the size of the class/method metadata like type signatures, method names, and so on.
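In plain Java, the generated invoker boils down to something like the following sketch. The JRuby types are stubbed out here so the example stands alone; the real invoker is emitted directly as bytecode, and its types come from org.jruby.*:

```java
// Stubs standing in for the real org.jruby.* types
interface IRubyObject {}
class ThreadContext {}
class RubyModule {}

class RubyString implements IRubyObject {
    IRubyObject split(ThreadContext context, IRubyObject arg) {
        return arg; // stand-in for the real split logic
    }
}

// The whole body of the generated invoker: a checkcast and an
// invokevirtual, nothing more
class RubyStringSplitInvoker {
    public IRubyObject call(ThreadContext context, IRubyObject self,
            RubyModule clazz, String name, IRubyObject arg) {
        return ((RubyString) self).split(context, arg);
    }
}
```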

But there's still a problem here. Take a look at this other invoker for RubyString.slice_bang, the implementation of String#slice!:

public org.jruby.runtime.builtin.IRubyObject call(org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject,
            org.jruby.RubyModule, java.lang.String, org.jruby.runtime.builtin.IRubyObject);
   0: aload_2
   1: checkcast #13; //class org/jruby/RubyString
   4: aload_1
   5: aload 5
   7: invokevirtual #17; //Method org/jruby/RubyString.slice_bang:(Lorg/jruby/runtime/ThreadContext;
   10: areturn

Oddly familiar, isn't it? What we have here is called "wastefulness". In order to provide optimal invocation performance for all core methods, we must generate hundreds of these tiny methods into tiny classes with everything neatly tied up in a bow so the JVM will pretty please perform that invocation for us as quickly as possible. And the largest side effect of all this is that we generate the same bytecode, over and over again, with only the tiniest of changes. In fact, this case changes only one thing: the string name of the method we eventually call on RubyString. There are dozens of these cases in JRuby's core classes, and if we attempted to extend this mechanism to all the Java types we encounter (we don't, for memory-saving purposes), there would be hundreds of cases of nearly-complete duplication.

I smell an opportunity. Our first step is to trim all that fat.

Hitting the Wall

Let me tell you a little story.

Little Billy developer wanted to freely generate bytecode. He'd come to recognize the power of code generation, and knew his language implementation was dynamic enough that compiling once would not be optimal. He also knew his language needed to do dynamic invocation on top of a statically-typed language, and needed lots of little invokers.

So one day, Billy's happily playing in the sandbox, building invokers and making "vroom, vroom" sounds, when along comes mean old Polly Permgen.

"Get out of my sandbox, Billy," cried Polly, "you're taking up too much space, and this is *my* heap!"

"Oh, but Polly," said Billy, rising to his feet. "I'm having ever so much fun, and there's lots of room to play on that heap over there. It's oh so large, and there's plenty of open space," he desperately replied.

"But I told you...this is MY heap. I don't want to play over there, because I like playing *right here*." She threw her exceptions at Billy, smashing his invokers to dust. Satisfied by the look of horror on Billy's face, she plopped down right where he had been sitting, and smiled terribly up at him.

Dejected, Billy sulked away and became a Lisp programmer, living forever in a land where data is code and code is data and everyone eats butterscotches and rides unicorns. He was never seen nor heard from again.

This story will be very familiar to anyone who's tried to push the limits of code generation on the JVM. The JVM keeps in memory a large, pre-allocated chunk of reserved space called the "heap". The heap is maintained as a contiguous area of space to allow the JVM's garbage collector to move objects around at will. All objects allocated by the system come out of this heap, which is usually split up into "generations". The "young" generation sees the most activity. Objects that are created and immediately abandoned never make it out of this generation. Longer-lived objects survive into the older generations, and a few live essentially forever, but most objects die an early death. And when they die, their bodies become the grass, and the antelope eat the grass. It's a beautiful circle of life. But why are there no butterscotches and unicorns?

The dirty secret of several JVM implementations, Hotspot included, is that there's a separate heap (or a separate generation of the heap) used for special types of data like class definitions, class metadata, and sometimes bytecode or JITted native code. And it couldn't have a scarier name: The Permanent Generation. Except in rare cases, objects loaded into the PermGen are never garbage collected (because they're supposed to be permanent, get it?) and if not used very, very carefully, it will fill up, resulting in the dreaded "java.lang.OutOfMemoryError: PermGen space" that ultimately caused little Billy to go live in the clouds and have tea parties with beautiful mermaids.

So it is with great reluctance that we are forced to abandon the idea of generating a lot of fat, wasteful, but speedy invokers. And it's with even greater reluctance we must abandon the idea of recompiling, since we can barely afford to generate all that code once. If only there were a way to share all that code and decrease the amount of PermGen we consume, or at least make it possible for generated code to be easily garbage collected. Hmmm.


Now it starts to get cool.

Enter java.dyn.AnonymousClassLoader. AnonymousClassLoader is the first artifact introduced by the InvokeDynamic work, and it's designed to solve two problems:

  1. Generating many classes with similar bytecode and only minor changes is very inefficient, wasting a lot of precious memory.
  2. Generated bytecode must be contained in a class, which must be contained in a ClassLoader, which keeps a hard reference to the class; as a result, to make even one byte of bytecode garbage-collectable, it must be wrapped in its own class and its own classloader.
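Problem 2 is why the pre-InvokeDynamic workaround was so heavyweight: to make a single generated class collectible, you had to give it its very own throwaway ClassLoader. A minimal sketch of that pattern (the loader subclass and demo class names here are mine, not any library's API; defineClass is the standard protected hook on ClassLoader):

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// One throwaway loader per generated class: when the last references to
// the class and its loader disappear, both become garbage-collectable.
class OneShotClassLoader extends ClassLoader {
    public Class<?> defineOneClass(String name, byte[] bytecode) {
        return defineClass(name, bytecode, 0, bytecode.length);
    }
}

public class PerClassLoaderDemo {
    public static void main(String[] args) throws Exception {
        // For demonstration, reuse the bytecode of an existing class
        // rather than generating fresh bytecode
        InputStream in = PerClassLoaderDemo.class
                .getResourceAsStream("/OneShotClassLoader.class");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int b; (b = in.read()) != -1; ) out.write(b);

        Class<?> copy = new OneShotClassLoader()
                .defineOneClass("OneShotClassLoader", out.toByteArray());

        // The copy lives in its own loader, separate from the original
        System.out.println(copy.getClassLoader()
                != OneShotClassLoader.class.getClassLoader()); // true
    }
}
```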

It solves these problems in a number of ways.

First, classes loaded by AnonymousClassLoader are not given full-fledged symbolic names in the global symbol tables; they're given rough numeric identifiers. They are effectively anonymized, allowing much more freedom to generate them at will, since naming conflicts essentially do not happen.

Second, the classes are loaded without a parent ClassLoader, so there's no overprotective mother keeping them on a short leash. When the last normal references to the class disappear, it's eligible for garbage collection like any other object.

Third, it provides a mechanism whereby an existing class can be loaded and slightly modified, producing a new class with those modifications but sharing the rest of its structure and data. Specifically, AnonymousClassLoader provides a way to alter the class's constant pool, changing method names, type signatures, and constant values.

    public static class Invoker implements InvokerIfc {
        public Object doit(Integer b) {
            return fake(new Something()).target(b);
        }
    }

    public static Class rewrite(Class old) throws IOException, InvalidConstantPoolFormatException {
        HashMap<String,String> constPatchMap = new HashMap<String,String>();
        constPatchMap.put("fake", "real");

        ConstantPoolPatch patch = new ConstantPoolPatch(Invoker.class);
        patch.putPatches(constPatchMap, null, null, true);

        return new AnonymousClassLoader(Invoker.class).loadClass(patch);
    }
Here's a very simple example of passing an existing class (Invoker) through AnonymousClassLoader, translating the method name "fake" in the constant pool into the name "real". The resulting class has exactly the same bytecode for its "doit" method and the same metadata for its fields and methods, but instead of calling the "fake" method it will call the "real" method. If we needed to adjust the method signature as well, it's just another entry in the constPatchMap.

So if we put these three items together with our two invokers above, we see first that generating those invokers ends up being a much simpler affair. Where before we had to be very cautious about how many invokers we created, and take care to stuff them into their own classloaders (in case they need to be garbage-collected later), now we can load them freely, and we will see neither symbolic collisions nor PermGen leaks. And where before we ended up generating mostly the same code for dozens of different classes, now we can simply create that code once (perhaps as normal Java code) and use that as a template for future classes, sharing the bulk of the class data in the process. Plus we're still getting the fastest invocation money can buy, because we don't have to use reflection.

Who could ask for more?

Parametric Explosion

I could. There's still a problem with our invokers: we have to create the templates.

Let's consider only Object-typed signatures for a moment. Even if we accept that everything's going to be an Object, we still want to avoid stuffing arguments into an Object[] every time we want to make a call. It's wasteful, because of all those transient Object[] we create and collect, and it's slow, because we need to populate those arrays and read from them on the other side. So you end up hand-generating many different methods to support signatures that don't box arguments into Object[]. For example, the many call signatures on JRuby's DynamicMethod type, which is the supertype of all Ruby method objects in a JRuby runtime:
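A sketch of what that arity splitting looks like (the class and method names here are illustrative, loosely modeled on DynamicMethod rather than copied from it): one overload per arity, each defaulting to the generic Object[] path, so that implementations only override the shapes they care about:

```java
// Illustrative arity-split method object; only the Object[] overload is
// abstract, and the fixed-arity overloads avoid array-boxing whenever a
// subclass overrides them directly.
abstract class SketchDynamicMethod {
    // Generic fallback: always correct, but array-boxes the arguments
    public abstract Object call(Object self, Object[] args);

    public Object call(Object self) {
        return call(self, new Object[0]);
    }

    public Object call(Object self, Object arg0) {
        return call(self, new Object[] { arg0 });
    }

    public Object call(Object self, Object arg0, Object arg1) {
        return call(self, new Object[] { arg0, arg1 });
    }
}
```

A specialized implementation then overrides, say, the one-argument overload with a body that never touches an Object[] at all, and only truly variadic calls pay the array tax.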
