The Pain of Broken Subprocess Management on JDK

I prefer to write happy posts...I really do. But tonight I'm completely defeated by the JDK's implementation of subprocess launching, and I need to tell the world why.

JRuby has always strived to mimic MRI's behavior as much as possible, which in many cases has meant we need to route around the JDK to get at true POSIX APIs and behaviors.

For example, JRuby has provided the ability to manipulate symbolic links since well before Java 7 provided that capability, using a native POSIX subsystem built atop jnr-ffi, our Java-to-C FFI layer (courtesy of Wayne Meissner). Everyone in the Java world knew for years the lack of symlink support was a gross omission, but most folks just sucked it up and went about their business. We could not afford to do that.

We've repeated this process for many other Ruby features: UNIX sockets, libc-like IO, selectable stdin, filesystem attributes...on and on. And we've been able to provide the best POSIX runtime on the JVM bar none. Nobody has gone as far or done as much as JRuby has.

Another area where we've had to route around the JDK is in subprocess launching and management. The JDK provides java.lang.ProcessBuilder, an API for assembling the appropriate pieces of a subprocess launch, producing a java.lang.Process object. Process in turn provides methods to wait for the subprocess, get access to its streams, and destroy it forcibly. It works great, on the surface.

Unfortunately, the cake is a lie.

Under the covers, the JDK implements Process through a complicated series of tricks. We want to be able to interactively control the child process, monitor it for writes, govern its lifecycle exactly. The JDK attempts to provide a consistent experience across all platforms. Unfortunately, those two worlds are not currently compatible, and the resulting experience is consistently awful.

We'll start at the bottom to see where things go wrong.

POSIX, POSIX, Everywhere

At the core of ProcessBuilder, inside the native code behind UNIXProcess, we do find somewhat standard POSIX calls to fork and exec, wrapped up in a native downcall forkAndExec:

The C code behind this is a bit involved, so I'll summarize what it does.

  1. Sets up pipes for in, out, err, and fail to communicate with the eventual child process.
  2. Copies the parent's descriptors from the pipes into the "fds" array.
  3. Launches the child through a fairly standard fork+exec sequence.
  4. Waits for the child to write a byte to the fail pipe indicating success or failure.
  5. Scrubs the unused sides of the pipes in parent and child.
  6. Returns the child process ID.

This is all pretty standard for subprocess launching, and if it proceeded to put those file descriptors into direct, selectable channels we'd have no issues. Unfortunately, things immediately go awry once we return to the Java code.

Interactive?

The call to forkAndExec occurs inside the UNIXProcess constructor, as the very first thing it does. At that point, it has in hand the three standard file descriptors and the subprocess pid, and it knows that the subprocess has at least been successfully forked. The next step is to wrap the file descriptors in appropriate InputStream and OutputStream objects, and this is where we find the first flaw.

This is the code to set up an OutputStream for the input channel of the child process, so we can write to it. Now we know the operating system is going to funnel those written bytes directly to the subprocess's input stream, and ideally if we're launching a subprocess we intend to control it...perhaps by sending it interactive commands. Why, then, do we wrap the file descriptor with a BufferedOutputStream?

This is where JRuby's hacks begin. In our process subsystem, we have the following piece of code, which attempts to unwrap buffering from any stream it is given.

The FieldAccess.getProtectedFieldValue call there does what you think it does...attempt to read the "out" field from within FilteredOutputStream, which in this case will be the FileOutputStream from above. Unwrapping the stream in this way allows us to do two things:

  1. We can do unbuffered writes to (or reads from, in the case of the child's out and err streams) the child process.
  2. We can get access to the more direct FileChannel for the stream, to do direct ByteBuffer reads and writes or low-level stream copying.

So we're in good shape, right? It's a bit of hackery, but we've got our unbuffered Channel and can interact directly with the subprocess. Is this good enough?

I wish it were.

Selectable?

The second problem we run into is that users very often would like to select against the output streams of the child process, to perform nonblocking IO operations until the child has actually written some data. It gets reported as a JRuby bug over and over again because there's simply no way for us to implement it. Why? Because FileChannel is not selectable.

FileChannel implements methods for random-access reads and writes (positioning) and blocking IO interruption (which NIO implements by closing the stream...that's a rant for another day), but it does not implement any of the logic necessary for doing nonblocking IO using an NIO Selector. This comes up in at least one other place: the JVM's own standard IO streams are also not selectable, which means you can't select for user input at the console. Consistent experience indeed...it seems that all interaction with the user or with processes must be treated as file IO, with no selection capabilities.

(It is interesting to note that the JVM's standard IO streams are *also* wrapped in buffers, which we dutifully unwrap to provide a truly interactive console.)

Why are inter-proces file descriptors, which would support selector operations just wonderfully, wrapped in an unselectable channel? I have no idea, and it's impossible for us to hack around.

Let's not dwell on this item, since there's more to cover.

Fear the Reaper

You may recall I also wanted to have direct control over the lifecycle of the subprocess, to be able to wait for it or kill it at my own discretion. And on the surface, Process appears to provide these capabilities via the waitFor() and destroy() methods. Again it's all smoke and mirrors.

Further down in the UNIXProcess constructor, you'll find this curious piece of code:

For each subprocess started through this API, the JVM will spin up a "process reaper" thread. This thread is designed to monitor the subprocess for liveness and notify the parent UNIXProcess object when that process has died, so it can pass on that information to the user via the waitFor() and exitValue() API calls.

The interesting bit here is the waitForProcessExit(pid) call, which is another native downcall into C land:

There's nothing too peculiar here; this is how you'd wait for the child process to exit if you were writing plain old C code. But there's a sinister detail you can't see just by looking at this code: waitpid can be called exactly once by the parent process.

Part of the Ruby Process API is the ability to get a subprocess PID and wait for it. The concept of a process ID has been around for a long time, and Rubyists (even amateur Rubyists who've never written a line of C code) don't seem to have any problem calling Process.waitpid when they want to wait for a child to exit. JRuby is an implementation of Ruby, and we would ideally like to be able to run all Ruby code that exists, so we also must implement Process.waitpid in some reasonable way. Our choice was to literally call the C function waitpid(2) via our FFI layer.

Here's the subtle language from the wait(2) manpage (which includes waitpid):

RETURN VALUES
     If wait() returns due to a stopped or terminated child
     process, the process ID of the child is returned to the
     calling process.  Otherwise, a value of -1 is returned
     and errno is set to indicate the error.

     If wait3(), wait4(), or waitpid() returns due to a
     stopped or terminated child process, the process ID of
     the child is returned to the calling process.  If there
     are no children not previously awaited, -1 is returned
     with errno set to [ECHILD].  Otherwise, if WNOHANG is
     specified and there are no stopped or exited children,
     0 is returned. If an error is detected or a caught
     signal aborts the call, a value of -1 is returned and
     errno is set to indicate the error.
Related:
1 2 Page 1
Notice to our Readers
We're now using social media to take your comments and feedback. Learn more about this here.