The Well-Grounded Java Developer

What does it mean to be a Java/JVM developer today and in the future? This blog explores Java 7+, languages on the JVM and new approaches to SDLC topics


Snake Oil Salesmen, Java, General Relativity and Low-Latency Trading

I've always wanted to write a post which had that as the title.

Sharp practices abound in this world, and one of the most disturbing aspects is the extent to which the media (and people in general) uncritically repeat "information" or "news" that is really just a press release from a commercial interest.

So, I was pleased to see the launch of Churnalism (http://churnalism.com/), a website where you can cut and paste the text of a news "story" and see how much of it was pulled, verbatim, from a press release.

What does all this have to do with Java and low-latency trading?
Well, the technology that underpins low-latency trading is sometimes seen as something of a Dark Art. Certainly, it's not well-understood by most technology journalists / pundits. On the other hand, it's seen as a sexy subject - and a good way to drum up page views. Faced with this situation, recycling press releases seems like a good solution to the overworked journalist.

Let's consider this story from a financial tech news site near you. I'll keep it anonymous to spare their blushes:

"Speaking at a London conference on Tuesday, Donal Byrne, chief executive of Corvil, a high-speed trading technology company ... suggested that trading speeds could be reduced to picoseconds in the not too distant future."

Before I go any further, I'd better explain that I’m not knocking Corvil as a company - a lot of their tech is very sound. I think it’s one of the better network monitoring solutions in the market. I've met some of their engineers - and they seem to have their heads screwed on. The subject I want to address is the hyperbole and outlandish claims that many vendors make, and the extent to which this seems to be much worse in the low-latency space.

To make sense of this claim, let's review how Java thinks about time. There are two methods - currentTimeMillis() and nanoTime() - both in the System class. The VM needs to ask the OS what the time is, so these methods are implemented as native code (as they boil down to system calls in a fairly straightforward fashion).

currentTimeMillis() will give an answer in milliseconds, so for most everyday needs, it's perfect. It's accurate up to a thousandth of a second, and there's probably some sort of network time protocol that keeps it in sync with the rest of the world to a reasonable tolerance.
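As a quick illustrative sketch (class and variable names are mine, not anything standard), here's the everyday use of currentTimeMillis():

```java
// Sketch: wall-clock timestamps with millisecond granularity.
public class WallClock {
    public static void main(String[] args) {
        // Milliseconds since the Unix epoch (1 Jan 1970 UTC)
        long now = System.currentTimeMillis();
        System.out.println("Epoch millis: " + now);
        System.out.println("As a date:    " + new java.util.Date(now));
    }
}
```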

The situation with nanoTime() is a bit different. It returns a counter value rather than a time, so to use it you measure a duration by subtracting one counter value from another, giving an interval in nanoseconds (a nanosecond is a billionth of a second, and equals 1000 picoseconds). The reason it's implemented like this is that on most hardware / OSs, what's being read is the CPU cycle counter, not an actual high-precision clock.
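The subtract-two-counter-values pattern looks like this (a minimal sketch; the names are mine):

```java
// Sketch: nanoTime() values are only meaningful as differences.
public class StopwatchDemo {
    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();  // an arbitrary counter value, not a time of day
        Thread.sleep(10);                // stand-in for the work we want to time
        long elapsedNanos = System.nanoTime() - start;
        System.out.println("Elapsed: " + elapsedNanos + " ns (~"
                + elapsedNanos / 1_000_000 + " ms)");
    }
}
```

Note that a single nanoTime() value in isolation tells you nothing - only the difference between two of them does.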

There are all sorts of things which can go wrong when using a cycle counter instead of a true clock - different CPUs can tick at different rates, power saving can change the tick rate, and so on. Then there's the CPU clock speed to think about. A 2GHz clock cycles 2 billion times a second, which means a single cycle takes 500 picoseconds. The fastest production chips ever produced still take around 200 picoseconds to complete a cycle.
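The arithmetic is simple enough to check directly (a back-of-the-envelope sketch, nothing more):

```java
// Back-of-the-envelope check: picoseconds per cycle at a given clock speed.
public class CycleTime {
    static double picosPerCycle(double clockHz) {
        return 1e12 / clockHz;  // one second = 10^12 picoseconds
    }

    public static void main(String[] args) {
        System.out.println("2 GHz: " + picosPerCycle(2e9) + " ps/cycle");  // 500.0
        System.out.println("5 GHz: " + picosPerCycle(5e9) + " ps/cycle");  // 200.0
    }
}
```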

One useful yardstick is the "distinguishability" time. This is the minimum time that can occur between two events and an ordinary bit of code in userspace can still say that they happened at different times. If two events happen closer together than the distinguishability time, then non-privileged code probably can't tell that they were really ever separate events. Depending on your system, this is probably in the range 10-100ns (or a bit less for specially-tuned low-latency kit).
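You can get a rough feel for the distinguishability time on your own machine with a probe like this (an illustrative sketch only - results vary wildly with OS, hardware and load, and back-to-back nanoTime() calls are an imperfect proxy for "two events"):

```java
// Rough probe: the smallest non-zero gap we can observe between
// two consecutive nanoTime() calls from ordinary userspace code.
public class Distinguishability {
    public static void main(String[] args) {
        long minDelta = Long.MAX_VALUE;
        for (int i = 0; i < 1_000_000; i++) {
            long t0 = System.nanoTime();
            long t1 = System.nanoTime();
            long delta = t1 - t0;
            if (delta > 0 && delta < minDelta) {
                minDelta = delta;
            }
        }
        System.out.println("Smallest observed gap: " + minDelta + " ns");
    }
}
```

Events closer together than whatever this prints are, from this code's point of view, effectively simultaneous.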

Let's turn to Relativity. One well-known result is that the minimum latency between London and New York down a fibre-optic cable is around 27.5ms. This is a pretty straightforward consequence of Einstein's Special Relativity (because the speed of light in glass is 200,000 km per sec).
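The 27.5ms figure is easy to sanity-check. The distance used below is my own assumption (roughly the great-circle London to New York distance; a real cable route would be longer):

```java
// Sanity check: one-way latency at the speed of light in glass fibre.
// The 5,570 km distance is an assumed approximate great-circle
// London-New York figure, not a real cable length.
public class FibreLatency {
    public static void main(String[] args) {
        double distanceKm = 5_570;
        double speedKmPerSec = 200_000;  // light in glass, ~2/3 of c
        double oneWayMs = distanceKm / speedKmPerSec * 1_000;
        System.out.printf("One-way latency: %f ms%n", oneWayMs);
    }
}
```

This lands just under 28ms, in line with the commonly quoted minimum - and no amount of clever engineering gets you below it.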

But if we're to believe Corvil's numbers, we will need to deal with the full glory and mathematical sophistication of General Relativity. Why? Because a nanosecond really isn't very long at all - short enough that we can observe general relativistic effects.

Let's do a quick example. Take two datacentres, one at sea level (London) and one at 500m above sea level (Zurich). General Relativity tells us that Zurich will experience more time than London.

Let me say that again: a person in Zurich will actually experience more time (i.e. will age faster) than someone in London. How much more? Well, as we're talking about heights close to sea level, we can use this equation for the extra time Zurich sees compared to London:

T - T0 = T g k / c^2

Let's plug in T = 1 year and k = 500m (Zurich's height above sea level), with g being gravitational acceleration and c the speed of light. The answer we get is that Zurich experiences about 1.72 microseconds extra per year that London simply doesn't. That's almost 5000 picoseconds per day.
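The calculation is simple enough to do in a few lines (a sketch using standard approximate constants):

```java
// Plugging the numbers into T - T0 = T*g*k/c^2 for a site 500m above
// sea level, over one year. Constants are standard approximations.
public class GravitationalDilation {
    public static void main(String[] args) {
        double g = 9.81;                // m/s^2, gravitational acceleration
        double k = 500;                 // m, height above sea level (Zurich)
        double c = 299_792_458;         // m/s, speed of light
        double secondsPerYear = 365.25 * 24 * 3600;

        double extraSeconds = secondsPerYear * g * k / (c * c);
        System.out.printf("Extra time per year: %.2f microseconds%n",
                extraSeconds * 1e6);                          // ~1.72
        System.out.printf("Extra time per day:  %.0f picoseconds%n",
                extraSeconds * 1e12 / 365.25);                // ~4700
    }
}
```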

While it's possible that Corvil have built network taps sensitive enough to cater for all possible sources of error in a datacentre setting (including those induced by General Relativity!) - and that they can detect someone swapping out a 5m patch cable for a 10m one - I have to be frank: I doubt it.

In fact, the Corvil press release for their new tech uses a slightly slippery language trick. It refers to the "granularity" of the measurement. This is utterly meaningless on its own. I can quote answers to as many decimal places as I like, but if they're only accurate to 2 decimal places, then the extra digits - the granularity - tell you nothing.

Every desktop Windows machine and Linux server in the world can report time to a granularity of nanoseconds. The accuracy, however, is another story. With care, you might be able to get the accuracy down to a couple of microseconds or so, and with specialized timing kit, you can get lower. This is really ticklish stuff, however, and going any lower is really quite difficult.

Unfortunately, that's one of the problems with high-performance and low-latency trading tech. So many of the people involved are determined to have an edge (and to be seen to have one), and guard their ideas and numbers and tech so jealously, that no-one dares to suggest that actually the emperor is somewhat scantily-clad. How many times are these types of latency numbers actually backed up with a proper statistical analysis? How many vendors have even established confidence intervals for their latency numbers, never mind releasing them to their customers? Some vendors even attempt to force non-disclosure agreements onto unwary purchasers which forbid publication of performance testing numbers.

Blatant Plug Alert!: You can read more about timing and its impact on performance tuning in Chapter 6 of our book: http://www.manning.com/evans/