Gilad makes
the case that static, that staple of C++, C#/VB.NET, and Java, does not belong:
Most imperative languages have some notion of static variable. This is unfortunate,
since static variables have many disadvantages. I have argued against static state
for quite a few years (at least since the dawn of the millennium), and in Newspeak,
I’m finally able to eradicate it entirely.
I think Gilad conflates a few things, but he's also got some good points. To the dissecting
table!
To begin:
Static variables are bad for security. See the E
literature for extensive discussion on this topic. The key idea is that static
state represents an ambient capability to do things to your system, that may be taken
advantage of by evildoers.
Eh.... I'm not sure I buy into this. For evildoers to be able to change static state,
they have to have some kind of "poke" access inside the innards of your application,
and if they have that, then just about anything is vulnerable. Now, granted, I haven't
spent a great deal of time on the E literature, so maybe I'm missing the point here,
but if an attacker has data-manipulability into my program, then I'm in a whole world
of pain, whether he's attacking statics or instances. Having said that, statics have
to be stored in a particular well-known location inside the process, so maybe that
makes them a touch more vulnerable. Still, this seems a specious argument.
Static variables are bad for distribution. Static state needs to either be replicated
and sync’ed across all nodes of a distributed system, or kept on a central node accessible
by all others, or some compromise between the former and the latter. This is all difficult/expensive/unreliable.
Now this one I buy into, but the issue isn't the "static"ness of the data, but the
fact that it's effectively a Singleton, and Singletons in any distributed system are
Evil. I talked a great deal about this in Effective Enterprise Java, so I'll leave
that alone, but let me point out that any Singleton is evil, whether it's
represented in a static, a Singleton object, a Newspeak module, or a database. The
"static"ness here is a red herring.
Static variables are bad for re-entrancy. Code that accesses such state is not re-entrant.
It is all too easy to produce such code. Case in point: javac. Originally conceived
as a batch compiler, javac had to undergo extensive reconstructive surgery to make
it suitable for use in IDEs. A major problem was that one could not create multiple
instances of the compiler to be used by different parts of an IDE, because javac had
significant static state. In contrast, the code in a Newspeak module definition is
always re-entrant, which makes it easy to deploy multiple versions of a module definition
side-by-side, for example.
Absolutely, but this is true for instance fields, too--any state that is modified
as part of two or more method bodies is vulnerable to a re-entrancy concern, since
the field is now visibly modified state within that particular instance. How deeply do
you want your code to be re-entrant? Gilad's citation of the javac compiler points
out that the compiler was hardly re-entrant at any reasonable level, but the fact
is that the compiler *could* have been used in a parallelized fashion using the isolation
properties of ClassLoaders. (It's ugly, and Java desperately needs Isolates for that
reason.)
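To make the instance-field point concrete, here is a minimal Java sketch. The class and method names (Joiner, joinUnsafe, joinSafe) are made up for illustration and have nothing to do with javac; the only thing it tries to show is that a scratch field on an instance breaks under a nested call just as a static would.

```java
import java.util.function.Supplier;

class ReentrancyDemo {
    public static void main(String[] args) {
        Joiner j = new Joiner();
        // The right-hand supplier calls back into the same instance.
        System.out.println(j.joinSafe(() -> "a", () -> j.joinSafe(() -> "b", () -> "c")));     // a+b+c
        System.out.println(j.joinUnsafe(() -> "a", () -> j.joinUnsafe(() -> "b", () -> "c"))); // b+cb+c
    }
}

// Hypothetical class: the scratch buffer is an instance field, not a static.
class Joiner {
    private final StringBuilder buffer = new StringBuilder();

    // Not re-entrant: every call clears and reuses the same buffer.
    String joinUnsafe(Supplier<String> left, Supplier<String> right) {
        buffer.setLength(0);
        buffer.append(left.get()).append('+');
        // If right.get() calls joinUnsafe on this same instance, the nested
        // call wipes the "left+" prefix written above.
        buffer.append(right.get());
        return buffer.toString();
    }

    // Re-entrant: all working state lives in locals of this invocation.
    String joinSafe(Supplier<String> left, Supplier<String> right) {
        StringBuilder local = new StringBuilder();
        local.append(left.get()).append('+').append(right.get());
        return local.toString();
    }
}
```

No static appears anywhere in this sketch, yet the unsafe variant still produces the wrong answer once a call nests inside another call on the same object.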
Static variables are bad for memory management. This state has to be handled specially
by implementations, complicating garbage collection. The woeful tale of class unloading
in Java revolves around this problem. Early JVMs lost application’s static state when
trying to unload classes. Even though the rules for class unloading were already implicit
in the specification, I had to add a section to the JLS to state them explicitly,
so overzealous implementors wouldn’t throw away static application state that was
not entirely obvious.
This one I can't really comment on, since I'm not in the habit of writing memory-management
code. I'll take Gilad's word for it, though I'm curious to know why this is so, in
more detail.
Static variables are bad for startup time. They encourage excess initialization
up front. Not to mention the complexities that static initialization engenders: it
can deadlock, applications can see uninitialized state, and unless you have a really
smart runtime, you find it hard to compile efficiently (because you need to test if
things are initialized on every use).
I'm not sure I see how this is different for any startup/initialization code--anything
that the user can specify as part of startup will run the risk of deadlocks and viewing
uninitialized state. Consider the alternative, however--if the user didn't have the
ability to specify startup code, then they would have to either write their own post-runtime
startup code, or else constantly check the state of their uninitialized
objects and initialize them on first use, the very thing that Gilad claims is hard to
compile efficiently.
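For illustration, this is roughly what "check and initialize on first use" looks like when written by hand in Java. The Lazy class is a hypothetical helper, not a JDK type, and the sketch takes the simplest thread-safe route (a synchronized accessor) rather than anything clever.

```java
import java.util.function.Supplier;

class LazyDemo {
    public static void main(String[] args) {
        Lazy<String> config = new Lazy<>(() -> {
            System.out.println("initializing");
            return "loaded";
        });
        System.out.println(config.get()); // prints "initializing", then "loaded"
        System.out.println(config.get()); // prints "loaded" only
    }
}

// Hypothetical helper: defers construction until the first get().
final class Lazy<T> {
    private final Supplier<T> factory;
    private T value;
    private boolean initialized;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    // Every access pays for the lock and the initialized check.
    synchronized T get() {
        if (!initialized) {
            value = factory.get();
            initialized = true;
        }
        return value;
    }
}
```

The check on every call to get() is the per-use test the quoted passage complains about; here the cost has moved out of the runtime and into user code.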
Static variables are bad for concurrency. Of course, any shared state is bad for
concurrency, but static state is one more subtle time bomb that can catch you by surprise.
Absolutely: any shared state is bad for concurrency. However, I think we need to go
back to first principles here. Since any shared state is bad for concurrency, and
since static data is always shared by definition, it follows that static data is bad
for concurrency. Pay particular attention to that chain of reasoning, however: any shared
state is bad for concurrency, whether it's held by the process in a special non-instance-aligned
location or in a data store that happens to be reachable from multiple paths of control.
This means that your average database table is also bad for concurrency, were it not
for the transactional protections that surround the table. This isn't an indictment
of static variables, per se, but of shared state.
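A small Java sketch of that chain of reasoning, with hypothetical names (SharedStateDemo, PlainCounter, hammer): the shared data is an ordinary instance reachable from two threads, and it is the sharing that causes lost updates. AtomicInteger stands in here for the kind of protection a transaction gives a database table.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

class SharedStateDemo {
    // Runs `increment` perThread times on each of two threads, then reads the total.
    static int hammer(Runnable increment, IntSupplier read, int perThread) {
        Runnable loop = () -> {
            for (int i = 0; i < perThread; i++) {
                increment.run();
            }
        };
        Thread a = new Thread(loop);
        Thread b = new Thread(loop);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return read.getAsInt();
    }

    public static void main(String[] args) {
        PlainCounter plain = new PlainCounter(); // instance state, no static anywhere
        AtomicInteger atomic = new AtomicInteger();

        // May print less than 200000: value++ is a non-atomic read-modify-write.
        System.out.println(hammer(plain::increment, () -> plain.value, 100_000));
        // Prints 200000: incrementAndGet is atomic.
        System.out.println(hammer(atomic::incrementAndGet, atomic::get, 100_000));
    }
}

// Hypothetical unprotected counter held in an ordinary object.
class PlainCounter {
    int value;

    void increment() {
        value++;
    }
}
```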
Gilad goes on to describe how Newspeak solves this problem of static:
It may seem like you need static state, somewhere to start things off, but you don’t.
You start off by creating an object, and you keep your state in that object and in
objects it references. In Newspeak, those objects are modules. Newspeak isn’t the only language to eliminate static state. E has also done so, out
of concern for security. And so has Scala, though its close cohabitation with Java
means Scala’s purity is easily violated. The bottom line, though, should be clear.
Static state will disappear from modern programming languages, and should be eliminated
from modern programming practice.
I wish Newspeak were available for widespread use, because I'd love to explore this
concept further; in the CLR, for example, there is the same idea of "modules", in
that modules are singleton entities in which methods and data can reside, at a higher
level than individual objects themselves. Assemblies, for example, form modules, and
this is where "global variables" and "global methods" exist (when supported by the
compiling language in question). At the end of the day, though, these are just statics
by another name, and face most, if not all, of the same problems Gilad lays out above.
Scala "objects" have the same basic property.
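As a rough Java approximation of the contrast (this is not Newspeak, and StaticCompiler and CompilerModule are invented names), compare state held in a static with state held in an object that the program creates at startup:

```java
import java.util.ArrayList;
import java.util.List;

class ModuleDemo {
    public static void main(String[] args) {
        // Two independent "modules" living side by side.
        CompilerModule editor = new CompilerModule();
        CompilerModule debugger = new CompilerModule();
        editor.compile("A.java");
        debugger.compile("B.java");
        System.out.println(editor.diagnostics());   // [compiled A.java]
        System.out.println(debugger.diagnostics()); // [compiled B.java]

        // The static version funnels every caller into one list.
        StaticCompiler.compile("A.java");
        StaticCompiler.compile("B.java");
        System.out.println(StaticCompiler.diagnostics); // [compiled A.java, compiled B.java]
    }
}

// Static version: one diagnostics list per loaded class, shared by all callers.
class StaticCompiler {
    static final List<String> diagnostics = new ArrayList<>();

    static void compile(String unit) {
        diagnostics.add("compiled " + unit);
    }
}

// Object version: each instance owns its own state.
class CompilerModule {
    private final List<String> diagnostics = new ArrayList<>();

    void compile(String unit) {
        diagnostics.add("compiled " + unit);
    }

    List<String> diagnostics() {
        return new ArrayList<>(diagnostics); // defensive copy
    }
}
```

The object version only helps as long as each caller gets its own instance. Park a single CompilerModule somewhere globally reachable and it behaves like the static again, which is the "statics by another name" concern.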
I think the larger issue here is that one should be careful where one stores state,
period. Every piece of data has a corresponding scope of accessibility, and developers
have grown complacent about considering that scope when putting data there: they consider
the accessibility at the language level (public, private, what-have-you), and fail
to consider the scope beyond that (concurrency, re-entrancy, and so on).
At the end of the day, it's simple: static entities and instance entities are
just entities. Nothing more, nothing less. Caveat emptor.