The Simple-Talk newsletter is
a monthly e-zine that the folks over at Red Gate Software (makers of some pretty cool
toys, including their ANTS Profiler, and recent inheritors of the Reflector utility
legacy) produce, usually to good effect.
But this month's issue carried with it an interesting editorial piece, which I reproduce in its entirety here:
When the market is slack, nothing succeeds better at tightening it up than promoting
serial group-panic within the community. As an example of this, a wave of multi-core
panic spread across the Internet about 18 months ago. IT organizations, it was said,
urgently had to improve application performance by an order of magnitude in order
to cope with rising demand. We wouldn't be able to meet that need because we were
at the "end of the road" with regard to step changes in processor power
and clock speed. Multi-core technology was the only sure route to improving the speed
of applications but, unfortunately, our current "serial" programming techniques,
and the limited multithreading capabilities of our programming languages and programmers,
left us ill-equipped to exploit it. Multi-core mania gripped the industry.

However, the fever was surprisingly short-lived. Intel's "largest open-source
effort ever" to provide a standard tool for writing multi-threaded code, caused
little more than a ripple of interest. Various books, rushed out while the temperature
soared, advocated the urgent need for new "multi-core-friendly" programming
models, involving such things as "software pipelines". Interesting as they
undoubtedly are, they sit stolidly on bookshelves, unread.

The truth is that it's simply not a big issue for the majority of people. Writing
truly "concurrent" applications in languages such as C# is difficult, as
you get very little help from the language. It means getting involved with low-level
concurrency primitives, such as lock statements and so on.

Many programmers lack the skills to do this, but more pertinently lack the need. Increasingly,
programmers work in a web environment. As long as these web applications are deployed
to a load-balanced web farm, then page requests can be handled in parallel so all
available cores will be used efficiently without the need for the programmer to be
concerned with fine-grained parallelism.

Furthermore, the SQL Server engine behind these web applications is intrinsically
"parallel", and can handle and use effectively about as many cores as you
care to throw at it. SQL itself is a declarative rather than procedural language,
so it is fundamentally concurrent.

A minority of programmers, for example games programmers or those who deal with "embarrassingly
parallel" desktop applications such as Photoshop, do need to start working with
the current tools and 'low-level' coding techniques that will allow them to exploit
multi-core technology. Although currently perceived to be more of "academic"
interest, concurrent languages such as Erlang, and concurrency techniques such as
"software transactional memory", may yet prove to be significant.

For most programmers and for most web applications, however, the multi-core furore
is a storm in a teacup; it's just not relevant. The web and database platforms already
cope with concurrency requirements. We are already doing it.
My hope is that this newsletter, sent on April 1st, was intended to be a
joke. Having said that, I can’t find any verbiage in the email that suggests that it
is, in which case, I have to treat it as a legitimate editorial.
And frankly, I think it’s all crap.
It's dangerously ostrichian in nature—it encourages developers to simply bury their
heads in the sand and ignore the freight train that's coming their way. Permit me,
if you will, a few minutes of your time, that I may be allowed to go through and demonstrate
the reasons why I say this.
To begin ...
When the market is slack, nothing succeeds better at tightening it up than promoting
serial group-panic within the community. As an example of this, a wave of multi-core
panic spread across the Internet about 18 months ago. IT organizations, it was said,
urgently had to improve application performance by an order of magnitude in order
to cope with rising demand. [...] Multi-core mania gripped the industry.
Point of fact: The “panic” cited here didn’t start about 18 months ago, it started
with Herb Sutter’s most excellent (and not only highly recommended but highly required)
article, “The Free Lunch is Over: A Fundamental Turn Toward Concurrency in Software”,
which appeared in the pages of Dr. Dobb’s Journal in March of 2005. (Herb’s website notes
that “a much briefer version under the title “The Concurrency Revolution” appeared
in C/C++ User’s Journal” the previous month.) And the panic itself wasn’t rooted in
the idea that we weren’t going to be able to cope with rising demand, but that multi-core
CPUs, back then a rarity and reserved only for hardware systems in highly-specialized
roles, were in fact becoming commonplace in servers, and worse, as they migrated into
desktops, they would quickly become a fact of life that every developer would need to face.
Herb demonstrated this by pointing out that CPU speeds had taken an interesting change
of pace in early 2003:
Around the beginning of 2003, [looking at the website Figure 1 graph] you’ll
note a disturbing sharp turn in the previous trend toward ever-faster CPU clock speeds.
I’ve added lines to show the limit trends in maximum clock speed; instead of continuing
on the previous path, as indicated by the thin dotted line, there is a sharp flattening.
It has become harder and harder to exploit higher clock speeds due to not just one
but several physical issues, notably heat (too much of it and too hard to dissipate),
power consumption (too high), and current leakage problems.
Joe Armstrong, creator of Erlang, noted in a presentation at QCon London 2007 that
another of those physical limitations was the speed of light—that for the first time,
a CPU signal couldn't get from one end of the chip to the other in a single clock cycle.
Quick: What’s the clock speed on the CPU(s) in your current workstation? Are you running
at 10GHz? On Intel chips, we reached 2GHz a long time ago (August 2001), and according
to CPU trends before 2003, now in early 2005 we should have the first 10GHz Pentium-family
chips.
Just to (re-)emphasize the point, here, now, in early 2009, we should
be seeing the first 20 or 40 GHz processors, and clearly we’re still plodding along
in the 2 – 3 GHz range. The "Quake Rule" (when asked about perf problems,
tell your boss you'll need eighteen months to get a 2X improvement, then bury yourselves
in a closet for 18 months playing Quake until the next gen of Intel hardware comes
out) no longer works.
For the near-term future, meaning for the next few years, the performance gains in
new chips will be fueled by three main approaches, only one of which is the same as
in the past. The near-term future performance growth drivers are:
- hyperthreading
- multicore
- cache
Hyperthreading is about running two or more threads in parallel inside a single CPU.
Hyperthreaded CPUs are already available today, and they do allow some instructions
to run in parallel. A limiting factor, however, is that although a hyper-threaded
CPU has some extra hardware including extra registers, it still has just one cache,
one integer math unit, one FPU, and in general just one each of most basic CPU features.
Hyperthreading is sometimes cited as offering a 5% to 15% performance boost for reasonably
well-written multi-threaded applications, or even as much as 40% under ideal conditions
for carefully written multi-threaded applications. That’s good, but it’s hardly double,
and it doesn’t help single-threaded applications.

Multicore is about running two or more actual CPUs on one chip. Some chips, including
Sparc and PowerPC, have multicore versions available already. The initial Intel and
AMD designs, both due in 2005, vary in their level of integration but are functionally
similar. AMD’s seems to have some initial performance design advantages, such as better
integration of support functions on the same die, whereas Intel’s initial entry basically
just glues together two Xeons on a single die. The performance gains should initially
be about the same as having a true dual-CPU system (only the system will be cheaper
because the motherboard doesn’t have to have two sockets and associated “glue” chippery),
which means something less than double the speed even in the ideal case, and just
like today it will boost reasonably well-written multi-threaded applications. Not
single-threaded ones.

Finally, on-die cache sizes can be expected to continue to grow, at least in the near
term. Of these three areas, only this one will broadly benefit most existing applications.
The continuing growth in on-die cache sizes is an incredibly important and highly
applicable benefit for many applications, simply because space is speed. Accessing
main memory is expensive, and you really don’t want to touch RAM if you can help it.
On today’s systems, a cache miss that goes out to main memory often costs 10 to 50
times as much as getting the information from the cache; this, incidentally, continues
to surprise people because we all think of memory as fast, and it is fast compared
to disks and networks, but not compared to on-board cache which runs at faster speeds.
If an application’s working set fits into cache, we’re golden, and if it doesn’t,
we’re not. That is why increased cache sizes will save some existing applications
and breathe life into them for a few more years without requiring significant redesign:
As existing applications manipulate more and more data, and as they are incrementally
updated to include more code for new features, performance-sensitive operations need
to continue to fit into cache. As the Depression-era old-timers will be quick to remind
you, “Cache is king.”
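(If you want to see the "space is speed" point for yourself, here is a quick sketch, in Java purely because it's handy; it is my illustration, not Herb's, and the exact numbers will vary by machine. It does the same number of additions twice, once walking memory in cache-friendly order and once striding across it, and on most hardware the second walk is several times slower.)

```java
// A minimal sketch (not from Herb's article) of "space is speed":
// identical work, done cache-friendly vs. cache-hostile.
public class CacheDemo {
    static final int N = 4096;
    static final int[] data = new int[N * N];   // ~64 MB: much bigger than any on-die cache

    // Walk memory sequentially: each cache line gets used fully before moving on.
    static long rowMajorSum() {
        long sum = 0;
        for (int row = 0; row < N; row++)
            for (int col = 0; col < N; col++)
                sum += data[row * N + col];
        return sum;
    }

    // Same additions, but striding N ints at a time: nearly every access misses the cache.
    static long columnMajorSum() {
        long sum = 0;
        for (int col = 0; col < N; col++)
            for (int row = 0; row < N; row++)
                sum += data[row * N + col];
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long a = rowMajorSum();
        long t1 = System.nanoTime();
        long b = columnMajorSum();
        long t2 = System.nanoTime();
        System.out.printf("sums %d/%d, row-major %d ms, column-major %d ms%n",
                a, b, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```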
Herb’s article was a pretty serious wake-up call to programmers who hadn’t noticed
the trend themselves. (Being one of those who hadn’t noticed, I remember reading his
piece, looking at that graph, glancing at the open ad from Fry’s Electronics sitting
on the dining room table next to me, and saying to myself, “Holy sh*t, he’s right!”.)
Does that qualify it as a “mania”? Perhaps if you’re trying to pooh-pooh the concern,
sure. But if you’re a developer who’s wondering where you’re going to get the processing
power to address the ever-expanding list of features your users want, something Herb
points out as a basic fact of life in the software development world ...
There’s an interesting phenomenon that’s known as “Andy giveth, and Bill taketh away.”
No matter how fast processors get, software consistently finds new ways to eat up
the extra speed. Make a CPU ten times as fast, and software will usually find ten
times as much to do (or, in some cases, will feel at liberty to do it ten times less
efficiently).
... then eking out the best performance from an application is going to remain
at the top of the priority list. Users are classic consumers: they will always want
more and more for the same money as before. Ignore this truth of software (actually,
of basic microeconomics) at your peril.
To get back to the editorial, we next come to ...
However, the fever was surprisingly short-lived. Intel's "largest open-source
effort ever" to provide a standard tool for writing multi-threaded code, caused
little more than a ripple of interest. Various books, rushed out while the temperature
soared, advocated the urgent need for new "multi-core-friendly" programming
models, involving such things as "software pipelines". Interesting as they
undoubtedly are, they sit stolidly on bookshelves, unread.
Wow. Talk about your pretty aggressive accusation without any supporting evidence
or citation whatsoever.
Intel's not big into the open-source space, so it doesn't take much for an open-source
project from them to be their "largest open-source effort ever". (What,
they're going to open-source the schematics for the Intel chipline? Who could read
them even if they did? Who would offer up a patch? What good would it do?) The fact
that Intel made the software available in the first place meant that they knew the
hurdle that had yet to be overcome, and wanted to aid developers in overcoming it.
They're members of the OpenMP group for the same reason.
Rogue Wave's software pipelines programming model is another case where real benefits
have accrued, backed by case studies. (Disclaimer: I know this because I ghost-wrote
an article for them on their Software Pipelines implementation.) Let's not knock something
that's actually delivered value. Pipelines aren't going to be the solution to every
problem, granted, but they're a useful way of structuring a design, one that's curiously
similar to what I see in functional programming languages.
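(For the curious: stripped of any vendor API, the general shape of the pipelines idea is just stages connected by queues, each stage running on its own thread, so different pieces of work are in different stages at the same time. Here's a toy sketch using nothing but java.util.concurrent; it is emphatically not Rogue Wave's implementation, just the concept.)

```java
import java.util.concurrent.*;

// A toy illustration of the general "software pipelines" shape: independent stages
// connected by queues, each stage on its own thread, so items flow through
// concurrently. Stage names and data are invented for illustration.
public class PipelineSketch {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> parsed = new LinkedBlockingQueue<>();
        BlockingQueue<String> enriched = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Stage 1: "parse" raw input and hand it downstream.
        pool.submit(() -> {
            for (String raw : new String[] {"alpha", "bravo", "charlie"}) {
                parsed.put(raw.toUpperCase());
            }
            parsed.put("EOF");          // poison pill to shut the pipeline down
            return null;
        });

        // Stage 2: "enrich" parsed items while stage 1 keeps producing.
        pool.submit(() -> {
            String item;
            while (!(item = parsed.take()).equals("EOF")) {
                enriched.put(item + "-enriched");
            }
            enriched.put("EOF");
            return null;
        });

        // Final stage: consume results on the main thread.
        String result;
        while (!(result = enriched.take()).equals("EOF")) {
            System.out.println(result);
        }
        pool.shutdown();
    }
}
```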
But simply defending Intel's generosity or the validity of an alternative programming
model doesn't support the idea that concurrency is still a hot topic. No, for that,
I need real evidence, something with actual concrete numbers and verifiable fact to
it.
Thus, I point to Brian Goetz’s Java Concurrency in Practice, one of those
“books, rushed out while the temperature soared”, which also turned out to be the
best-selling book at Java One 2007, and the second-best-selling book (behind
only Joshua Bloch’s unbelievably good Effective Java (2nd Ed)) at Java One
2008. Clearly, yes, bestselling concurrency books are just a myth, alongside the magical
device that will receive messages from all over the world and play them into your
brain (by way of your ears) on demand, or the magical silver bird that can wing its
way through the air with no visible means of support as it does so. Myths, clearly,
all of them.
To continue...
The truth is that it's simply not a big issue for the majority of people. Writing
truly "concurrent" applications in languages such as C# is difficult, as
you get very little help from the language. It means getting involved with low-level
concurrency primitives, such as lock statements and so on.

Many programmers lack the skills to do this, but more pertinently lack the need. Increasingly,
programmers work in a web environment. As long as these web applications are deployed
to a load-balanced web farm, then page requests can be handled in parallel so all
available cores will be used efficiently without the need for the programmer to be
concerned with fine-grained parallelism.
He’s right when he says you get very little help from the language, be it C# or Java
or C++. And getting involved with low-level concurrency primitives is clearly not
in anybody’s best interests, particularly if you’re not a concurrency guru like Brian.
(And let’s be honest, even low-level concurrency gurus like Brian, or Joe Duffy, who
wrote Concurrent Programming on Windows, or Mike Woodring, who co-authored Win32
Multithreaded Programming, have better things to do.) But to say that they “pertinently
lack the need” is a rather impertinent statement. “As long as these web applications
are deployed to a load-balanced web farm", which is very likely to continue to
happen, “then page requests can be handled in parallel so all available cores will
be used …”
Um... excuse me?
Didn’t you just say that programmers didn’t need to learn concurrency
constructs? It would strike me that if their page requests are being handled in
parallel, then they have to learn how to write code that won’t break, lead to
data-corruption problems, or hit race conditions when it’s accessed in parallel.
If parallelism is a fundamental
part of the Web, don’t you think it’s important for them to learn how to write programs
that can behave correctly in parallel?
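If that sounds abstract, here's the kind of code I mean, sketched in Java rather than C#, with a plain concurrent map standing in for the session object; the names are invented, the hazard is not. It looks perfectly reasonable in a one-request-at-a-time world and quietly loses updates the moment two requests arrive together:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// A minimal sketch of the classic read-increment-store race: several "requests"
// updating the same per-user value in parallel. The names (session, hitCount)
// are invented for illustration. Note that even a thread-safe map does not make
// the compound read-modify-write safe.
public class LostUpdateDemo {
    static final ConcurrentMap<String, Integer> session = new ConcurrentHashMap<>();

    static void handleRequest() {
        Integer count = session.get("hitCount");       // read
        int next = (count == null ? 0 : count) + 1;    // increment
        session.put("hitCount", next);                 // store: another thread may
                                                       // have stored in between
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] requests = new Thread[4];
        for (int i = 0; i < requests.length; i++) {
            requests[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) handleRequest();
            });
            requests[i].start();
        }
        for (Thread t : requests) t.join();
        // Expected 400000; on a multi-core box you will almost always see less.
        System.out.println("hitCount = " + session.get("hitCount"));
    }
}
```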
Look for just a moment at the average web application: if data is stored in a per-user
collection, and two simultaneous requests come in from a given user (perhaps because
the page has AJAX requests being generated by the user on the page, or perhaps because
there’s a frameset that’s generating requests for each sub-frame, or ...), what happens
if the code is written to read a value from the session, increment it, and store it
back? ASP.NET can save you here, a little, in that it used to establish a per-user
lock on the entirety of the page request (I don’t know if it still does this—I really
have lost any desire to build web apps ever again), but that essentially puts an artificial
throttle on the scalability of your system, and makes the end-users’ experience that
much slower. Load-balancer going to spray the request all over the farm? So long as
the user session state is stored on every machine in the farm, that’ll work... But
of course if you store the user’s state in the SQL instance behind each of those machines
on the farm, then you take the performance hit of an extra network round-trip
(at which point we’re back to concurrency in the database) ...
... all because the programmer couldn’t figure out how to make “lock” work? This is
progress?
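And no, the answer isn't "lock the whole request", either. If the developer knows where the hazard actually lives, the synchronization can be scoped to just that one piece of state. Another hedged sketch, with the same invented names as above, using the java.util.concurrent toolbox that has been sitting there since JDK 5:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Same invented session/hitCount names as before; the point is that the
// read-increment-store can be made atomic for just this one counter, rather
// than serializing the entire page request the way a per-user request lock does.
public class SafeUpdateDemo {
    static final ConcurrentMap<String, AtomicInteger> session = new ConcurrentHashMap<>();

    static void handleRequest() {
        // putIfAbsent + incrementAndGet: atomic at the granularity of one counter;
        // requests touching other keys (or other users' sessions) never block on it.
        session.putIfAbsent("hitCount", new AtomicInteger(0));
        session.get("hitCount").incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] requests = new Thread[4];
        for (int i = 0; i < requests.length; i++) {
            requests[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) handleRequest();
            });
            requests[i].start();
        }
        for (Thread t : requests) t.join();
        System.out.println("hitCount = " + session.get("hitCount").get()); // always 400000
    }
}
```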
The Java Servlet specification specifically backed away from this "lock on every
request" approach because of the performance implications. I heard a fair amount
of wailing and gnashing during the early ASP.NET days over this. I heard the ASP.NET
dev team say they made their decision because the average developer can't figure out
concurrency correctly anyway.
And, by the way folks, this editorial completely ignores XML services. I guess "real"
applications don't write services much, either.
The next part is even better:
Furthermore, the SQL Server engine behind these web applications is intrinsically
"parallel", and can handle and use effectively about as many cores as you
care to throw at it. SQL itself is a declarative rather than procedural language,
so it is fundamentally concurrent.
True… and false. SQL is fundamentally “parallel” (largely because SQL is a non-strict
functional language, not just a “declarative” one), but T-SQL isn’t. And how many
developers actually know where the line is drawn between SQL and T-SQL? More importantly,
though, how many effective applications can be written with a complete ignorance
of the underlying locking model? Why do DBAs spend hours tuning the database’s physical
constructs, establishing where isolation levels can be turned down, establishing where
the scope of a transaction is too large, putting in indexed columns where necessary,
and figuring out where page, row, or table locking will be most efficient? Because
despite the view that a relational database presents, these queries are being executed in
parallel, and if a developer wants to avoid writing an application that requires
a new server for each and every new user added to the system, they need to learn how
to maximize their use of the database’s parallelism. So even if the language is
"fundamentally concurrent" and can thus be relied upon to do the right thing
on behalf of the developer, the implementation isn't, and needs to be understood
in order to be used efficiently.
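To make "the implementation needs to be understood" concrete: the same read-increment-store mistake shows up at the database tier, and whether it bites depends on choices the developer has to make, such as the isolation level and where the atomic operation lives. Here's a rough JDBC sketch; the connection string, table, and column names are invented for illustration. The point is simply that pushing the increment into a single UPDATE lets the engine do the locking, instead of SELECTing the value, bumping it in code, and writing it back:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustration only: the connection string, table (UserCounters), and columns
// (UserId, HitCount) are invented. A single UPDATE keeps the read-modify-write
// inside the engine, where it can lock correctly, instead of racing in client code.
public class DatabaseCounter {
    public static void increment(String userId) throws SQLException {
        String url = "jdbc:sqlserver://localhost;databaseName=AppDb"; // placeholder
        try (Connection conn = DriverManager.getConnection(url, "app", "secret")) {
            conn.setAutoCommit(false);
            // Explicitly choosing an isolation level is part of understanding the
            // implementation; READ_COMMITTED is typical, not universal.
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE UserCounters SET HitCount = HitCount + 1 WHERE UserId = ?")) {
                ps.setString(1, userId);
                ps.executeUpdate();
            }
            conn.commit();
        }
    }
}
```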
He finishes:
For most programmers and for most web applications, however, the multi-core furore
is a storm in a teacup; it's just not relevant. The web and database platforms already
cope with concurrency requirements. We are already doing it.
This is one of those times I wish I had a time machine handy—I'd love to step forward
five years, have a look around, then come back and report the findings. I'm tempted
to close with the challenge to just let’s come back in five years and see what the
programming language landscape and hardware landscape looks like. But that's too easy
an "out", and frankly, doesn't do much to really instill confidence, in
my opinion.
To ignore the developers building "rich" applications (be they built in
Flex/Flash, Cocoa/iPhone, WinForms, Swing, WPF, or what-have-you) is to also ignore
a relatively large segment of the market. Not every application is being built on
the web and is backed by a relational database—to simply brush those off and not even
consider them as part of the editorial reveals a dangerous bias on the editor's part.
And those applications aren't hosted in an "intrinsically 'parallel'" container
that developers can just bury their head inside.
Like it or not, folks, the path forward isn't one that you get to choose. Intel, AMD,
and other chip manufacturers have already made that clear. They're not going
to abandon the multicore approach now, not when doing so would mean trying to wrestle
with so many problems (including trying to change the speed of light) that simply
aren't there when using a multicore foundation. That isn't up for debate anymore.
Multicore has won for the foreseeable future. And, as a result, multicore is going
to be a fact of the developer's life for the foreseeable future. Concurrency is thus
also a fact of the developer's life for the foreseeable future.
The web and database platforms “cope” with concurrency requirements by either making
"one-size-fits-all" decisions that almost always end up being the wrong
decision for high-scale systems (but I'm sure your new startup-based idea, like a
system that allows people to push "micro-entries" of no more than 140 characters
in length to a publicly-trackable feed would never actually take off and start carrying
millions and millions of messages every day, right?), or by punting entirely and forcing
developers to dig deeper beneath the covers to see the concurrency there. So if you're
happy with your applications running no faster than 2GHz for the rest of the foreseeable
future, then sure, you don't need to worry about learning concurrency-friendly kinds
of programming techniques. Bear in mind, by the way, that this essentially locks you
into small-scale, web-plus-database systems for the foreseeable future, and clearly
nothing with any sort of CPU intensiveness to it whatsoever. Be happy in your niche,
and wave to the other COBOL programmers who made the same decision.
This is a leaky abstraction, full stop, end of story. Anyone who tells you otherwise
is either trolling for hits, trying to sell you something, or striving to persuade
developers that ignorance isn't such a bad place to be.
All you ignorant developers, this is the phrase you will be forced to learn before
you start your next job: "Would you like fries with that?"