Can Dynamic Languages Scale?

 

The recent "failure"
of the Chandler PIM project
generated the question, "Can
Dynamic Languages Scale?"
on TheServerSide, and, as is all too typical these days,
it turned into a "You suck"/"No you suck" flamefest between a couple of posters to
the site.

I now make the perhaps vain attempt to address the question meaningfully.

What do you mean by "scale"?

There's an implicit problem with using the word "scale" here, in that we can think
of a language scaling in one of two very orthogonal directions:

  1. Size of project, as in lines-of-code (LOC)

  2. Capacity handling, as in "it needs to scale to 100,000 requests per second"

Part of the problem on the TSS thread, I think, is that the posters never really
delineate the difference between these two. Assembly language can
scale(2), but it can't really scale(1) very well. Most people believe that C scales(2)
well, but doesn't scale(1) well. C++ scores better on scale(1), and usually does well
on scale(2), but you get into all that icky memory-management stuff. (Unless, of course,
you're using the Boehm GC implementation, but that's another topic entirely.)

Scale(1) is a measurement of a language's ability to extend or enhance the complexity
budget of a project. For those who've not heard the term "complexity budget": I first
heard it from Mike Clark, he of Pragmatic Project Automation fame (though I can't find
a citation for it via Google--if anybody's got one, holler and I'll slip it in here).
It's essentially a statement that "humans can only deal with a fixed amount of complexity
in their heads; therefore, every project has a fixed complexity budget, and the more you
spend on infrastructure and tools, the less you have to spend on the actual business
logic." In many ways, this is a reflection of a language or tool's ability to raise the
level of abstraction--when projects began to exceed the abstraction level of assembly,
for example, we moved to higher-level languages like C to hide some of the complexity
and let us spend more of the project's complexity budget on the program itself, rather
than on figuring out which register had to hold the interrupt number before invoking it.
The same reasoning shows up in the case against EJB in favor of Spring: too much of the
complexity budget was spent getting the details of the EJB beans correct, and Spring
reduced that amount and gave us more room to work with. This argument is now at the core
of the Ruby/Rails-vs-Java/JEE debate, and it sits, implicitly, in the middle of the room
in the whole discussion over Chandler.
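
To make the EJB-versus-Spring point concrete, here is a rough sketch of what an EJB 2.x
stateless session bean demanded of its author, next to the plain object Spring lets you
write instead. The InvoiceService names are invented purely for illustration, and the
EJB half also needed an ejb-jar.xml deployment descriptor that isn't shown here:

    import java.rmi.RemoteException;
    import javax.ejb.*;

    // --- EJB 2.x: three types plus a deployment descriptor, mostly plumbing ---
    interface InvoiceService extends EJBObject {          // remote interface
        double total(String customerId) throws RemoteException;
    }

    interface InvoiceServiceHome extends EJBHome {        // home (factory) interface
        InvoiceService create() throws CreateException, RemoteException;
    }

    class InvoiceServiceBean implements SessionBean {     // the bean itself
        public void ejbCreate() {}                        // container lifecycle callbacks
        public void ejbActivate() {}
        public void ejbPassivate() {}
        public void ejbRemove() {}
        public void setSessionContext(SessionContext ctx) {}

        public double total(String customerId) {          // the actual business logic
            return 42.0;                                  // placeholder calculation
        }
    }

    // --- Spring: the business logic, and not much else ---
    class PojoInvoiceService {
        public double total(String customerId) {
            return 42.0;                                  // same placeholder logic
        }
    }

Everything above the PojoInvoiceService line is complexity budget spent on the container
rather than on invoicing.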

Scale(2) is an equally important measurement, since a project that cannot handle the
expected user load during peak usage times will have effectively failed just as surely
as if the project had never shipped in the first place. Part of this will be reflected
in not just the language used but also the tools and libraries that are part of the
overall software footprint, but choice of language can obviously have a major impact
here: Erlang is being tossed about as a good choice for high-scale systems because
of its intrinsic Actors-based model for concurrent processing, for example.
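
For readers who haven't bumped into the Actors model, here is a deliberately tiny Java
sketch of the idea; the CounterActor class is invented for illustration, and it glosses
over everything that makes Erlang's version industrial-strength (lightweight processes,
supervision, distribution):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // A toy actor: one private mailbox, one thread draining it, so the actor's
    // state is only ever touched by that single thread -- no locks, no sharing.
    class CounterActor {
        private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<String>();
        private int count = 0;                 // touched only by the actor's own thread

        CounterActor() {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            String msg = mailbox.take();   // block until a message arrives
                            if ("increment".equals(msg)) count++;
                            else if ("print".equals(msg)) System.out.println("count = " + count);
                            else if ("stop".equals(msg)) return;
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }).start();
        }

        void send(String msg) { mailbox.offer(msg); }      // senders only ever enqueue
    }

Callers communicate only through send(), never through shared state, which is the property
Erlang bakes into its language and runtime (with processes far cheaper than Java threads),
and why it keeps surfacing in scale(2) conversations.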

Both of these get tossed back and forth rather carelessly during this debate, usually
along the following lines:

  1. Pro-Java (and pro-.NET, though they haven't gotten into this particular debate so
    much as the Java guys have) adherents argue that a dynamic language cannot scale(1)
    because of the lack of type-safety commonly found in dynamic languages: the compiler
    is not there to methodically ensure that parameters obey a certain type contract, that
    objects are not asked to execute methods they couldn't possibly satisfy, and so on.
    In essence, the compiler for a statically-typed language is a theorem prover: it takes
    the assertion (by the programmer) that this program is type-correct, and validates it.
    This means less work for the programmer, as an automated tool now runs through a series
    of tests that the programmer doesn't have to write by hand; as one contributor to the
    TSS thread put it:
    "With static languages like Java, we get a select subset of code
    tests, with 100% code coverage, every time we compile. We get those tests for "free".
    The price we pay for those "free" tests is static typing, which certainly has hidden
    costs." (The short Java sketch just after this list tries to make that "free tests"
    point concrete.)

    Note that this argument frequently derails into the world of IDE support and refactoring
    (as witnessed on the TSS thread), pointing out that Eclipse
    and IntelliJ provide powerful automated refactoring support that is widely believed
    to be impossible on dynamic language platforms.

  2. Pro-Java adherents also argue that dynamic languages cannot scale(2) as well as Java
    can, because those languages are built on top of their own runtimes, which arguably
    reflect vastly less engineering effort than has gone into the garbage collection
    facilities found in the HotSpot JVM or CLR implementations.

  3. Pro-Ruby (and pro-Python, though again they're not in the frame of this argument quite
    so much) adherents argue that the dynamic nature of these languages means less work
    during the creation and maintenance of the codebase, resulting in a far lower lines-of-code
    count than one would have with a more verbose language like Java, thus implicitly
    improving the scale(1) of a dynamic language.

    On the subject of IDE refactoring, scripting language proponents point out that the
    original refactoring browser was built for (and into) Smalltalk,
    one of the world's first dynamic languages.

  4. Pro-Ruby adherents also point out that there are plenty of web applications and web
    sites that scale(2) "well enough" on top of the MRI (Matz's Ruby Interpreter) runtime
    that comes "out of the box" with Ruby, despite the widely-described fact that MRI's
    threads are what Java used to call "green threads", where the interpreter manages
    thread scheduling and management entirely on its own, effectively using one native
    thread underneath.

  5. Both sides tend to get caught up in "you don't know as much as me about this" kinds
    of arguments as well, essentially relying on the idea that the less you've coded in
    a language, the less you could possibly know about that language, and the more you've
    coded in a language, the more knowledgeable you must be. Both positions are fallacies:
    I know a great deal about D, even though I've barely written a thousand lines of code
    in it, because D inherits much of its feature set and linguistic expression from both
    Java and C++. Am I a certified expert in it? Hardly--there are likely dozens of D
    idioms that I don't yet know, and certainly haven't elevated to the state of intuitive
    use, and those will come as I write more lines of D code. But that doesn't mean I
    don't already have a deep understanding of how to design D programs, since it fundamentally
    remains, as its genealogical roots imply, an object-oriented language. Similar rationale
    holds for Ruby and Python and ECMAScript, as well as for languages like Haskell, ML,
    Prolog, Scala, F#, and so on: the more you know about "neighboring" languages in the
    linguistic geography, the more you know about that particular language. If two of you
    are learning Ruby and you're coming from Python, you already have a leg up on the guy
    who's never left C++. At the other end of this continuum, the programmer
    who's written half a million lines of C++ code and still never uses the "private"
    keyword is not an expert C++ programmer, no matter what his checkin metrics
    claim. (And believe me, I've met way too many of these guys, in more than just the
    C++ domain.)
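
To make the "free tests" quote from argument 1 concrete, here is a small Java sketch; the
Account and deposit names are invented for the example, not taken from the TSS thread:

    public class Account {
        private long balanceInCents;

        // The signature itself is a checked assertion: every call site must pass a
        // long, and the compiler verifies that on every single build.
        public void deposit(long amountInCents) {
            if (amountInCents < 0)
                throw new IllegalArgumentException("negative deposit");
            balanceInCents += amountInCents;
        }

        public static void main(String[] args) {
            Account a = new Account();
            a.deposit(500);           // fine
            // a.deposit("500");      // rejected at compile time -- no unit test needed
            // a.depositt(500);       // misspelled method name: also a compile-time error
            System.out.println("balance = " + a.balanceInCents);
        }
    }

In Ruby or Python the two commented-out calls would only fail at runtime, which is exactly
why the dynamic-language camp leans so heavily on unit tests to cover the same ground. Note,
though, that the negative-amount rule is the kind of constraint the type system does not
express; that one needs a real test in either camp, and it is the "select subset" the quote
concedes.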

A couple of thoughts come to mind on this whole mess.

Just how refactorable are you?

First of all, just how refactorable dynamic languages actually are is a widely debated
point. On NFJS speaker panels, Dave Thomas (he of the PickAxe book) would routinely
admit that not all of the refactorings currently supported in Eclipse were possible
on a dynamic language platform given that type information (such as it is in a language
like Ruby) isn't present until runtime. He would also take great pains to point out
that simple search-and-replace across files, something any non-trivial editor supports,
will do many of the same refactorings that Eclipse or IntelliJ provide, since type
is no longer an issue. Having said that, however, it's relatively easy to imagine
that the IDE could be actively "running" the code as it is being typed, in much the
same way that Eclipse is doing constant compiles, tracking type information throughout
the editing process. This is an area I personally expect the various IDE vendors will
explore in depth as they look for ways to capture the dynamic language dynamic (if
you'll pardon the pun) currently taking place.

Who exactly are you for?

What sometimes gets lost in this discussion is that not all dynamic languages need
be for programmers; a tremendous amount of success has been achieved by creating a
core engine and surrounding it with a scripting engine that non-programmers use to
exercise the engine in meaningful ways. Excel and Word do it, Quake and Unreal (along
with other equally impressively-successful games) do it, UNIX shells do it, and various
enterprise projects I've worked on have done it, all successfully. A model whereby
core components are written in Java/C#/C++ and are manipulated from the UI (or other
"top-of-the-stack" code, such as might be found in nightly batch execution) by these
less-rigorous languages is a powerful and effective architecture to keep in mind,
particularly in combination with the next point....
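
In the Java case, the javax.script (JSR 223) API that shipped with Java 6 makes this
"rigorous core, scriptable surface" model almost trivial to wire up. Here is a minimal
sketch, assuming the JavaScript engine bundled with the JDK is available and using an
invented ReportEngine class to stand in for the core:

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;

    public class ScriptableCore {
        // The "core engine", written by programmers in Java.
        public static class ReportEngine {
            public String run(String name) { return "Report '" + name + "' generated"; }
        }

        public static void main(String[] args) throws ScriptException {
            ScriptEngine js = new ScriptEngineManager().getEngineByName("JavaScript");

            // Expose the core object to the scripting layer under a simple name.
            js.put("reports", new ReportEngine());

            // This string stands in for what a power user or ops person might write
            // to drive the engine: batch jobs, UI glue, one-off tweaks.
            Object result = js.eval("reports.run('nightly-sales')");
            System.out.println(result);
        }
    }

The same handful of lines works with JRuby's JSR 223 engine (typically registered under
the name "jruby") once its jars are on the classpath.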

Where do you run again?

With the release of JRuby, and the work on projects like IronRuby and Ruby.NET, it's
entirely reasonable to assume that these dynamic languages can and will now run on
top of modern virtual machines like the JVM and the CLR, completely negating arguments
2 and 4. While a dynamic language will usually take some kind of performance and memory
hit when running on top of VMs that were designed for statically-typed languages,
work on the DLR and the MLVM, as well as enhancements to the underlying platform that
will be more beneficial to these dynamic language scenarios, will reduce that hit. Parrot
may change the picture in time, but right now it sits at a 0.5 release and doesn't seem to
be making huge inroads into reaching a 1.0 release that will be attractive to anyone
outside of the "bleeding-edge" crowd.

So where does that leave us?

The allure of the dynamic language is strong on numerous levels. Without having to
worry about type details, the dynamic language programmer can typically slam out more
work-per-line-of-code than his statically-typed compatriot, given that both write
the same set of unit tests to verify the code. However, I think this idea that the
statically-typed developer must produce the same number of unit tests as his dynamically-minded
coworker is a fallacy--a large part of the point of a compiler is to provide those
same tests, so why duplicate its work? Plus we have the guarantee that the compiler
will always execute these tests, regardless of whether the programmer using it remembers
to write those tests or not.

Having said that, by the way, I think today's compilers (C++, Java and C#) are pretty
weak in the type expressions they require and verify. Type-inferencing languages,
like ML or Haskell and their modern descendants, F# and Scala, clearly don't require
the degree of verbosity currently demanded by the traditional O-O compilers. I'm pretty
certain this will get fixed over time, a la how C# has introduced implicitly typed
variables.
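
For what it's worth, Java itself eventually proved the prediction right: Java 10 added
implicitly typed local variables via the var keyword. A small sketch of the contrast
(the InferenceDemo class is just for illustration):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class InferenceDemo {
        public static void main(String[] args) {
            // The traditional O-O level of ceremony: the type is spelled out twice,
            // generic parameters and all.
            Map<String, List<Integer>> explicit = new HashMap<String, List<Integer>>();

            // What inference buys you (Java 10+): the static type is still
            // Map<String, List<Integer>>, the compiler just works it out.
            var inferred = new HashMap<String, List<Integer>>();
            inferred.put("alice", new ArrayList<Integer>());

            System.out.println(explicit.size() + " " + inferred.size());
        }
    }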

Meanwhile, why the rancor between these two camps? It's eerily reminiscent of the
ill-will that flowed back and forth between the C++ and Java communities during Java's
early days, leading me to believe that it's more a concern over the job market and employability
than it is a real technical argument. In the end, there will continue to be a ton
of Java work for the rest of this decade and well into the next, and JRuby (and Groovy)
afford the Java developer lots of opportunities to learn those dynamic languages and
still remain relevant to her employer.

It's as Marx said, lo these many years ago: "From each language, according to its
abilities, to each project, according to its needs."

Oh, except Perl. Perl just sucks, period. :-)

PostScript

I find it deeply ironic that the news piece TSS cited at the top of the discussion
claims that the Chandler project failed due to mismanagement, not its choice of implementation
language. It doesn't even mention what language was used to build Chandler, leading
me to wonder if anybody even read the piece before choosing up their sides and throwing
dirt at one another.




