
Enforcing immutability yields architectural simplicity. A massively distributed storage system spanning networked nodes must contend with network congestion and component failures. A technically seductive approach is to distribute a single object's data across nodes, with erasure-coded partial redundancy to survive one or more node failures at reasonable economic cost. The limits of physics and metaphysics here are captured in Brewer's CAP theorem and in refinements such as Abadi's PACELC formulation.

These acronyms themselves express the trade-offs inherent in a distributed architecture. CAP stands for consistency, availability, and partition tolerance: under a network partition, a system must choose between consistency and availability. PACELC refines this: under a partition (P), choose availability (A) or consistency (C); else (E), choose between latency (L) and consistency (C). Without going into the math, data writes orchestrated at the grain of an entire object are either synchronous (consistency first) or asynchronous (availability first).
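To make that write-path trade-off concrete, here is a minimal Java sketch. It is our own illustration, not Manta's or any real system's API, and ReplicaClient is a hypothetical interface. The synchronous path waits for every replica to acknowledge before confirming the write (consistency first); the asynchronous path confirms immediately and replicates in the background (availability first).

    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    // Hypothetical replica endpoint; write() completes once the replica has the bytes.
    interface ReplicaClient {
        CompletableFuture<Void> write(String objectId, byte[] data);
    }

    class ObjectWriter {
        private final List<ReplicaClient> replicas;

        ObjectWriter(List<ReplicaClient> replicas) { this.replicas = replicas; }

        // Consistency first: block until every replica acknowledges, so a reader
        // contacting any replica afterward sees the new data.
        void writeSynchronous(String objectId, byte[] data) {
            CompletableFuture<?>[] acks = replicas.stream()
                    .map(r -> r.write(objectId, data))
                    .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(acks).join(); // stalls or fails under a partition
        }

        // Availability first: acknowledge once the request is accepted and let
        // replication finish in the background; readers may briefly see stale data.
        void writeAsynchronous(String objectId, byte[] data) {
            replicas.forEach(r -> r.write(objectId, data)); // fire and forget
        }
    }

Under a network partition, the synchronous variant stalls or fails (sacrificing availability), while the asynchronous variant keeps accepting writes at the price of temporarily inconsistent replicas.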

Unwittingly, these popular redundant-array-of-independent-nodes (RAIN) architectures force an expensive reconstitution operation across the network for every object read. Furthermore, no meaningful computation can be performed at a storage node, because each node holds only a fractional view of the object, tessellated arbitrarily by the erasure code chosen. By design, this popular object storage architecture requires non-negotiable network bandwidth to operate on an object.
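A back-of-the-envelope Java sketch makes that bandwidth tax visible; the 1 GiB object size and the 9+2 coding parameters are illustrative assumptions, not measurements. When an object's fragments are spread across nodes, roughly the whole object must cross the network on every read, whereas a node holding a full copy reads from local disk:

    // Back-of-the-envelope: network bytes needed to read one object.
    public class ReadCost {
        public static void main(String[] args) {
            long objectBytes = 1L << 30;   // assume a 1 GiB object
            int dataFragments = 9;         // assume a 9+2 erasure code across nodes

            // RAIN-style: a reader must gather 9 fragments (~1/9 of the object
            // each) from other nodes and reconstitute them -- essentially the
            // whole object crosses the network on every read.
            long rainNetworkBytes = dataFragments * (objectBytes / dataFragments);

            // Full-copy placement: the node holding the copy reads local disk.
            long fullCopyNetworkBytes = 0;

            System.out.printf("RAIN read moves %d MiB over the network%n",
                    rainNetworkBytes >> 20);
            System.out.printf("Local full copy moves %d MiB over the network%n",
                    fullCopyNetworkBytes >> 20);
        }
    }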

Joyent Manta Storage and Compute Architecture
The Joyent Manta Storage Service's architecture is a departure from these storage-only approaches: a high-performance compute cluster is included in each storage node. Converged storage and compute was a design goal from the start.

Rather than erasure-coding across nodes, Manta uses erasure codes across disks within a node (currently 9+2 across three stripes, with three hot standby disks, if you must know). It also defaults to full-copy redundancy across multiple data centers to achieve data durability equal to that of RAIN systems. The economic overhead of the default two full copies is surprisingly small compared to what common RAIN architectures charge for standard redundancy. This suggests that those services either carry roughly two-fold redundancy themselves or enjoy a rich gross margin.
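As a rough sanity check on that claim, consider the raw-bytes-per-usable-byte overhead of each scheme. The 9+2 figure comes from the text above; treating a RAIN competitor as using the same code across nodes is our assumption for illustration:

    // Raw-bytes-per-usable-byte for the two redundancy schemes discussed above.
    public class RedundancyOverhead {
        public static void main(String[] args) {
            int data = 9, parity = 2;  // 9+2 erasure code per stripe
            double stripeOverhead = (data + parity) / (double) data; // ~1.22x

            // Manta default: two full copies, each erasure-coded 9+2 on-node.
            double mantaOverhead = 2 * stripeOverhead;               // ~2.44x

            // Assumed RAIN system erasure-coding 9+2 across nodes: ~1.22x raw.
            double rainOverhead = stripeOverhead;

            System.out.printf("Per-stripe overhead:  %.2fx%n", stripeOverhead);
            System.out.printf("Manta two-copy total: %.2fx%n", mantaOverhead);
            System.out.printf("RAIN across nodes:    %.2fx%n", rainOverhead);
            // If price tracked raw bytes, Manta's default would cost ~2x a RAIN
            // offering; that the prices are comparable is what suggests extra
            // redundancy or margin in those RAIN services.
        }
    }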

Regardless, the architectural limitations of distributing objects across nodes remain. Because a Manta storage node begins with a full copy of an object, it already has the entire object available over a very high-bandwidth PCIe 3.0 bus (a theoretical 256Gbps), and it can perform meaningful computation on that object with minimal data-movement latency.

What types of meaningful computation, you ask? The biggest use cases we've seen are delightfully poetic one-liners operating on large data sets. They may extract user or other cohort data from logs for aggregate, operational, billing, sentiment, intention, or geographic insights. Another use case leverages the Internet-visible nature of Manta: bulk in-place media transcoding, including more prosaic needs like e-commerce catalog thumbnail creation or video format changes. We're beginning to see higher-level machine learning and classification computations that exploit Manta's map/reduce framework even further.
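To give a flavor of the log-aggregation case, here is a stdin-to-stdout Java sketch of a map phase in this style of system. It is our own illustration rather than Manta's API, and the whitespace-delimited log format with a user ID in the third field is an assumption. A reduce phase would then sum the per-user tallies emitted by each node:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative map phase: read one log object from stdin, emit "user count"
    // lines on stdout for a downstream reduce phase to sum across nodes.
    public class CohortCount {
        public static void main(String[] args) throws Exception {
            Map<String, Long> counts = new HashMap<>();
            try (BufferedReader in =
                     new BufferedReader(new InputStreamReader(System.in))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] fields = line.split("\\s+");
                    if (fields.length > 2) {
                        // Assume the third whitespace-separated field is a user ID.
                        counts.merge(fields[2], 1L, Long::sum);
                    }
                }
            }
            counts.forEach((user, n) -> System.out.println(user + " " + n));
        }
    }

Because the program runs where the object already lives, only the small per-user counts, not the raw logs, cross the network to the reduce phase.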

