Modular Toolchains


During the Lang.NET Symposium, a couple of things "clicked" simultaneously, giving
me one of those "Oh, I get it now" moments that just won't leave you alone.

During the Intentional Software presentation, as the demo wound on, I (and the
rest of the small group gathered there) found ourselves looking at the same source code,
but presented in a variety of new ways: some appealed to me as a programmer, others
to the mathematicians in the room, still others to the non-programmers in the room.
(I heard one of the Microsoft hosts, a non-technical program manager, I think, say,
"Wow, even I could understand that spreadsheet view, and that was writing code?")

During the spreadsheet-written-in-IronPython presentation (ResolverOne), we were essentially
looking at new ways of writing IronPython code, thus leveraging all the syntactic
power of a programming language with a nicer front end.

During the aspect-oriented talk (the one by Stefan Wenig and Fabian Schmeid), we found
ourselves looking at a tool that takes compiled assemblies and weaves in additional
code based on descriptors from outside that codebase; in essence, just another
aspect-oriented tool.

But combine this with my own investigations into Soot, LLVM, Parrot, and Phoenix,
alongside the usual discussions around the DLR, CLR, JVM and DaVinci machine, couple
that with the presentation Harry gave about parser expression grammars and the research
in the functional community into parser combinators, throw in the aspect-oriented
and metaprogramming facilities that the Rubyists and other dynamic linguists go on
for days about, and what do you end up with?

Folks, the future is in modular toolchains.

This is an oversimplification, and a radical oversimplification at that, but imagine
for a moment:

  1. A parser takes your source code (let's assume it's Java, just for grins) and builds
    an AST out of it. Not an AST that's deeply coupled to the Java language, mind you,
    but a general-purpose one that stands as a union of Java, C#, C++, Perl, Python,
    Smalltalk, and other languages. (Note that some of the linguistic concepts in some
    of those languages may not end up in this AST, but instead operate on the AST itself,
    a la C++'s template facilities.) The parser is then finished, and can either write
    a binary (or potentially XML, though it'd probably be hideously verbose) version
    of this AST to disk for later consumption, or, more likely, hand it directly to
    the next beast in the chain. (A minimal sketch of such an AST appears after this
    list.)

  2. In the simplest scenario, the next beast would be a code generator, which takes the
    AST and generates some kind of back-end code from it. Here, since we're working
    with a general-purpose AST, we can assume that this back-end is flexible and open,
    a la the Phoenix toolkit (where either native code or MSIL can be generated).

  3. In a slightly more complicated scenario, the AST is verified for correctness (against
    whatever libraries are specified), usually prior to code-gen, thus making this particular
    toolchain a statically-checked chain; were verification left out, it would need to
    happen at runtime, in which case we'd be talking about a dynamically-checked chain.
    Note that I stay away from the terms "statically-typed" and "dynamically-typed" for
    the moment: those would be a measurement of the parser, not the verifier. Verification
    still occurs in a lot of dynamically-typed languages, just as it does in
    statically-typed ones.

    Assuming the verification process succeeds, the AST can again be written out or
    passed to the next step in the chain. (Steps 3 through 5 are sketched after this
    list as composable AST-to-AST passes.)

  4. Another potential step in the process, usually post-parser and pre-verification, would
    be an "aspect" step, in which a tool takes the AST, consults some external descriptors,
    and modifies the AST based on what it finds there. (This is how most of your non-AspectJ-like
    AOP tools work today, except that they have to rebuild the AST from compiled .class
    files or assemblies first.)

  5. Naturally, another step in the process would be an optimization step, but this has
    to be considered carefully, since some "high-level" optimizations can be done without
    regard to the code-gen backend, while others can't; for example, register-spill
    decisions are (from what I've heard--can't say I know too much about this) generally
    only useful if you know how many registers you're targeting. Plus, it's not hard
    to imagine certain optimizations that are only useful on the x86 architecture, versus
    those that are useful on other CPU platforms. Even operating systems, I would imagine,
    would have an impact here. (It turns out that many compiler toolchains go through
    a dozen or so optimization steps today, so it's not hard to imagine a "code-gen
    backend" being a series of a half-dozen or so targeted optimization steps before
    actually generating code.)

  6. Bear in mind, too, that these ASTs should have enough information to be directly executable,
    thus giving us an interpreter back-end instead of a code-generation back-end, a la
    the DLR instead of the CLR. (Both back-ends are sketched after this list.)

  7. Also, given the standard AST format, it would be relatively trivial to create a whole
    series of different "parsers" to get to the AST, along the lines of what the Intentional
    Software guys have created, thus blowing open the whole concept of "DSL" into areas
    that heretofore have only been imagined. You still get the complete support of the
    rest of the toolchain, which is what makes the whole DSL concept viable in the first
    place, including aspects and verification and your choice of either interpretation
    or compilation.

  8. While we're at it, bear in mind that this AST could (and should) also be reachable
    from within the code itself, giving languages that want to operate on their own AST
    at runtime the ability to do so; because the AST is in a standard format, the interpreter
    could be bundled as part of the generated executable, providing the
    compile-when-you-can, interpret-when-you-must flavor that is currently the reigning
    meme in language/platform environments like JRuby. (A tiny example after this list
    shows a program running its own AST. It would also have the happy side effect of
    making Paul Graham shut up about Lisp, at least for a while. Yes, Paul, code-as-data,
    it's brilliant, it's wonderful, we get it.)

  9. Nothing says this toolchain need be one-way, by the way: many of the toolkits I mentioned
    before (LLVM, Phoenix, Soot) can start from a compiled binary and work back to an
    AST, thus offering us the opportunity to do surgery of either the exploratory kind
    (static analysis) or the manipulative kind (aspect weaving, etc.) on compiled code
    in a relatively clean way. Reflector demonstrates the power of being able to go "back
    and forth" in this way (even in the relatively limited way Reflector does so), so
    imagine how powerful it would be to do this end-to-end throughout the toolchain.
    (The toy "lifter" after this list shows the round trip in miniature.)
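
To make steps 1 and 7 concrete, here is a minimal sketch (in Java, since we assumed
Java source) of what a general-purpose AST might look like: a handful of
language-neutral node types, a visitor so later stages can walk the tree, and a
Parser interface that any front end--a Java grammar, a C# grammar, an
Intentional-style projection--could implement. Every name here is a hypothetical
illustration, not an existing library.

    import java.util.List;

    // A deliberately tiny, language-neutral node vocabulary.
    interface Node { <R> R accept(Visitor<R> v); }

    record IntLiteral(int value) implements Node {
        public <R> R accept(Visitor<R> v) { return v.visitInt(this); }
    }
    record BinaryOp(String op, Node left, Node right) implements Node {
        public <R> R accept(Visitor<R> v) { return v.visitBinary(this); }
    }
    record Call(String target, List<Node> args) implements Node {
        public <R> R accept(Visitor<R> v) { return v.visitCall(this); }
    }

    // Every later stage in the chain is just a walk over the tree...
    interface Visitor<R> {
        R visitInt(IntLiteral n);
        R visitBinary(BinaryOp n);
        R visitCall(Call n);
    }

    // ...and every front end is just a producer of it.
    interface Parser { Node parse(String source); }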
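Steps 3 through 5 then fall out as nothing more than AST-to-AST transforms, and a
"toolchain" is just a composition of whichever stages you choose to run. Again a
sketch, reusing the hypothetical Node types from above; the weaver and verifier
bodies are stubs marking where the real work would go.

    import java.util.List;
    import java.util.function.UnaryOperator;

    // Every stage maps an AST to an AST.
    interface Stage extends UnaryOperator<Node> { }

    class Toolchain {
        private final List<Stage> stages;
        Toolchain(List<Stage> stages) { this.stages = stages; }

        Node run(Node ast) {
            for (Stage s : stages) ast = s.apply(ast);
            return ast;
        }
    }

    class AspectWeaver implements Stage {
        public Node apply(Node ast) {
            // Hypothetical: consult external descriptors and splice advice
            // nodes into the tree before verification ever sees them.
            return ast;
        }
    }

    class Verifier implements Stage {
        public Node apply(Node ast) {
            // Hypothetical: check names and types against whatever libraries
            // are specified; throwing here makes the chain statically checked.
            return ast;
        }
    }

    // A backend-neutral "high-level" optimization: fold 1 + 2 into 3.
    // (Folds only the root node here; a real pass would recurse.)
    class ConstantFolder implements Stage {
        public Node apply(Node ast) {
            if (ast instanceof BinaryOp b
                    && b.op().equals("+")
                    && b.left() instanceof IntLiteral l
                    && b.right() instanceof IntLiteral r)
                return new IntLiteral(l.value() + r.value());
            return ast;
        }
    }

    // Usage: new Toolchain(List.of(new AspectWeaver(), new Verifier(),
    //        new ConstantFolder())).run(ast);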
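Steps 2 and 6 are then just two interchangeable consumers of the same tree: point an
interpreter at the AST and you get the DLR-flavored path, point a code generator at
it and you get the CLR/Phoenix-flavored path. The emitter below prints a toy
stack-machine listing rather than real MSIL or bytecode, purely for illustration.

    // Back-end #1: interpret the AST directly.
    class Interpreter implements Visitor<Integer> {
        public Integer visitInt(IntLiteral n) { return n.value(); }
        public Integer visitBinary(BinaryOp n) {
            int l = n.left().accept(this), r = n.right().accept(this);
            switch (n.op()) {
                case "+": return l + r;
                case "*": return l * r;
                default:  throw new UnsupportedOperationException(n.op());
            }
        }
        public Integer visitCall(Call n) {
            throw new UnsupportedOperationException("no calls in this toy");
        }
    }

    // Back-end #2: emit code for a (toy) stack machine.
    class StackEmitter implements Visitor<Void> {
        public Void visitInt(IntLiteral n) {
            System.out.println("push " + n.value());
            return null;
        }
        public Void visitBinary(BinaryOp n) {
            n.left().accept(this);
            n.right().accept(this);
            System.out.println(n.op().equals("+") ? "add" : "mul");
            return null;
        }
        public Void visitCall(Call n) {
            System.out.println("call " + n.target());
            return null;
        }
    }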
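Step 8 in miniature: because the AST is an ordinary object graph and the interpreter
can ship inside the executable, a program can build and run its own code at
runtime--code-as-data without being Lisp.

    class SelfHosting {
        public static void main(String[] args) {
            // (1 + 2) * 10, built by the running program itself.
            Node program = new BinaryOp("*",
                    new BinaryOp("+", new IntLiteral(1), new IntLiteral(2)),
                    new IntLiteral(10));
            System.out.println(program.accept(new Interpreter())); // prints 30
        }
    }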
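And for step 9, going backwards is just another transform. Here is a toy "lifter"
that rebuilds the expression tree from the listing the emitter above produced--in
miniature, what Soot, LLVM, and Phoenix do when they recover an IR from compiled
binaries.

    import java.util.ArrayDeque;
    import java.util.Deque;

    class StackLifter {
        Node lift(String listing) {
            Deque<Node> stack = new ArrayDeque<>();
            for (String line : listing.split("\n")) {
                String[] parts = line.trim().split(" ");
                switch (parts[0]) {
                    case "push":
                        stack.push(new IntLiteral(Integer.parseInt(parts[1])));
                        break;
                    case "add":
                    case "mul":
                        Node r = stack.pop(), l = stack.pop();
                        stack.push(new BinaryOp(parts[0].equals("add") ? "+" : "*", l, r));
                        break;
                    default:
                        throw new IllegalArgumentException("unknown op: " + line);
                }
            }
            return stack.pop(); // the reconstructed expression tree
        }
    }

    // Usage: new StackLifter().lift("push 1\npush 2\nadd") rebuilds the same
    // tree the emitter flattened: BinaryOp("+", IntLiteral(1), IntLiteral(2)).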

How likely is this utopian vision? I'm not sure, honestly--certainly tools like LLVM
and Phoenix seem to imply that there are ways to represent code across languages in
a fairly generic form, but clearly there's much more work to be done, starting with
this notion of the "uber-AST" that I've been so casually tossing around without definition.
Every AST is more or less tied to the language it is supposed to represent, and there's
clearly no way to imagine an AST that could represent every language ever invented.
Just imagine trying to create an AST that could incorporate Java, COBOL and Brainf*ck,
for example. But if we can get to a relatively stable 80/20, where we manage to represent
the most-commonly-used 80% of languages within this AST (such as an AST that can incorporate
Java, C#, and C++, for starters), then maybe there's enough of a critical mass there
to move forward.

Now all I need to do is find somebody who'll fund this little bit of research... anybody
got a pile of cash they don't know what to do with? :-)

Update: By the way, in case you want a graphical depiction of what I'm thinking about,
the Phoenix page has one (though obviously it's limited to the Phoenix scope of vision,
and you may have to be a Microsoft CONNECT member to see it).




