The other night I ran into Jackie Barbetta and Don Rosenberg at our local Apache Spark meetup group. Barbetta and Rosenberg work in IBM Emerging Technologies on an open source project called EclairJS, which lets Node.js developers work with Spark from JavaScript.
Why on earth would you want to do that?
Streaming analytics aside, merely having access to a stream processing system -- more particularly, stream processing with an appropriate storage mechanism -- is a key advantage for Node.js developers. If you, the Node.js developer, recently discovered that neither MySQL nor MongoDB is the best choice for high-end data streams, Spark and HBase might turn out to be your new best friends.
How EclairJS works
This component wrapper connects through the Jupyter Gateway to Apache Toree on the server side. Toree is an implementation of the IPython protocol but is (clearly) not limited to Python; it has its roots in the Jupyter/IPython project. Toree is more or less an RPC wrapper for Spark.
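To make the "RPC wrapper" idea a bit more concrete, here is a rough sketch of the kind of message a Node.js client hands to a kernel that speaks the Jupyter messaging protocol. The message layout follows that protocol's execute_request; everything else is an assumption for illustration. buildExecuteRequest is a hypothetical helper, the username and the code string are placeholders, and none of this is taken from the EclairJS source.

```javascript
// Sketch only: builds an "execute_request" message in the shape defined by
// the Jupyter messaging protocol. How EclairJS actually frames its requests
// is not shown here; buildExecuteRequest is a hypothetical helper.
const crypto = require('crypto');

// Random 32-character hex id, good enough for a message or session id.
function newId() {
  return crypto.randomBytes(16).toString('hex');
}

function buildExecuteRequest(code, sessionId) {
  return {
    header: {
      msg_id: newId(),
      username: 'node-client', // placeholder, not a real account
      session: sessionId,
      msg_type: 'execute_request',
      version: '5.0'
    },
    parent_header: {},
    metadata: {},
    content: {
      code: code,              // source text the kernel should evaluate
      silent: false,
      store_history: false,
      user_expressions: {},
      allow_stdin: false
    }
  };
}

// The code string is a placeholder for whatever the kernel should run.
const request = buildExecuteRequest('sc.version', newId());
console.log(JSON.stringify(request, null, 2));
```

The point is simply that the client ships source text and gets results back as messages; the heavy lifting stays on the Spark side.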
What's the catch?
Next, a lot of the type marshaling is limited to primitive or noncomplex types, so you may have to do your own stunts in marshaling/unmarshaling JSON. There is no “require” on the Nashorn side, only “load” -- if you dreamt there would be no learning curve and no places where you have to try real hard, you’ll be disappointed.
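As a sketch of what those marshaling stunts might look like, the snippet below flattens a nested object (including a Date) into a JSON string, which is a primitive, and rebuilds it on the other side. The toWire and fromWire helpers and the $date tagging convention are made up for illustration; they are not part of EclairJS.

```javascript
// Sketch only: turn a complex value into a JSON string before it crosses
// the bridge, and rebuild it afterwards. toWire/fromWire and the "$date"
// tag are hypothetical conventions, not part of EclairJS.
function toWire(value) {
  return JSON.stringify(value, function (key, v) {
    // `v` has already been through Date.prototype.toJSON, so inspect the
    // raw value on the holder object instead.
    if (this[key] instanceof Date) {
      return { $date: this[key].getTime() };
    }
    return v;
  });
}

function fromWire(text) {
  return JSON.parse(text, function (key, v) {
    // Rebuild any tagged date; pass everything else through untouched.
    if (v !== null && typeof v === 'object' && typeof v.$date === 'number') {
      return new Date(v.$date);
    }
    return v;
  });
}

const reading = { sensor: 'a1', at: new Date(0), values: [1.5, 2.5] };
const wire = toWire(reading);   // a plain string, safe to pass as a primitive
const back = fromWire(wire);    // nested structure with a real Date again
console.log(typeof wire, back.at instanceof Date);
```

Tagging the date explicitly means the receiving side does not have to guess which strings were once dates.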
Where you can find EclairJS and what it can do now
A small team of IBMers is doing most of the development for EclairJS, and they’d love your help. With the hotness that is Node.js and the hotness that is Spark, this could be a good place for you to distinguish yourself by contributing. Or if you were good at the maths but ended up as a Web developer and got into Node.js, maybe this is where you can show your data science chops without having to learn Scala or Python.