Since we're examining various aspects of the canonical O-O languages (the three principals
being C++, Java and C#/VB.NET), let's take in a review of another recent post, this
time on the use of "new" in said languages.
All of us have probably written code like this:
Foo f = new Foo();
And what could be simpler? As long as the logic in the constructor is simple
(or better yet, the constructor is empty), it would seem that the simplest code is
the best, so just use the constructor. Certainly the MSDN documentation is rife
with code that uses public constructors. You can probably find plenty of public
constructors used right here on my blog. Why invest the effort in writing (and
using) a factory class that will probably never do anything useful, other than call
a public constructor?
In his excellent podcast entitled
"Emergent Design: The Evolutionary Nature of Software Development," Scott Bain of
Net Objectives nevertheless makes a strong case against the routine use of public
constructors. The problem, notes Scott, is that the use of a public constructor
ties the calling code to the implementation of Foo as a concrete class. But
suppose that you later discover that there need to be many subtypes of Foo, and Foo
should therefore be an abstract class instead of a concrete class--what then?
You've got a big problem, that's what; a lot of client code that has been making use
of Foo's public constructor suddenly becomes invalid.
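To make Scott's point concrete, here is a minimal C++ sketch of the alternative; the same shape carries over to Java or C#. The names BasicFoo, FancyFoo, create and describe are invented for illustration and do not come from the podcast. The idea is only that callers ask Foo for an instance instead of naming a concrete class, so Foo is free to become abstract later.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Foo has become abstract; any caller that wrote "new Foo()" would now break.
class Foo {
public:
    virtual ~Foo() = default;
    virtual std::string describe() const = 0;

    // Hypothetical factory: the one place that knows which subtype to build.
    static std::unique_ptr<Foo> create(bool fancy = false);
};

class BasicFoo : public Foo {
public:
    std::string describe() const override { return "basic"; }
};

class FancyFoo : public Foo {
public:
    std::string describe() const override { return "fancy"; }
};

std::unique_ptr<Foo> Foo::create(bool fancy) {
    if (fancy) {
        return std::make_unique<FancyFoo>();
    }
    return std::make_unique<BasicFoo>();
}

// Client code names only Foo, so it survives Foo turning abstract.
void usage() {
    std::unique_ptr<Foo> plain = Foo::create();
    std::unique_ptr<Foo> fancy = Foo::create(true);
    assert(plain->describe() == "basic");
    assert(fancy->describe() == "fancy");
}
```

Had the callers used a public constructor, every one of them would need editing when the subtypes appeared; here only Foo::create changes.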
I just love it when people rediscover advice that they could have had much earlier,
had they only been aware of the prior art in the field. I refer the curious C#/VB.NET
developer to the book Effective Java, by Joshua Bloch, in which Item 1 states,
"Consider providing static factory methods instead of constructors". Quoting from
said book, we see:
One advantage of static factory methods is that, unlike constructors, they
have names. If the parameters to a constructor do not, in and of themselves,
describe the object being returned, a static factory with a well-chosen name can make
a class easier to use and the resulting client code easier to read. ...
A second advantage of static factory methods is that, unlike constructors,
they are not required to create a new object each time they're invoked. This
allows immutable classes (Item 13) to use preconstructed instances or to cache instances
as they're constructed and to dispense these instances repeatedly so as to avoid creating
unnecessary duplicate values. ...
A third advantage of static factory methods is that, unlike constructors,
they can return an object of any subtype of their return type. This gives
you great flexibility in choosing the class of the returned object. ...
The main disadvantage of static factory methods is that classes without public
or protected constructors cannot be subclassed. The same is true for nonpublic
classes returned by public static factories.
A second disadvantage of static factory methods is that they are not readily
distinguishable from other static methods. They do not stand out in API documentation
the way that constructors do.
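The first two advantages in that quote are easy to see in a few lines. What follows is a rough C++ sketch rather than anything from the book: Temperature, fromCelsius, fromFahrenheit and the cache are all made-up names, the arithmetic is integer-only to keep things short, and the cache is not thread-safe.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical immutable value class. The constructor is private, so the
// named factories below are the only way to get an instance.
class Temperature {
public:
    // Advantage 1: the factory name says which unit the argument is in.
    static std::shared_ptr<const Temperature> fromCelsius(int degrees) {
        return cached(degrees);
    }
    static std::shared_ptr<const Temperature> fromFahrenheit(int degrees) {
        return cached((degrees - 32) * 5 / 9);  // integer math, sketch only
    }

    int celsius() const { return celsius_; }

private:
    explicit Temperature(int celsius) : celsius_(celsius) {}

    // Advantage 2: hand back an existing instance instead of building a
    // duplicate. Not thread-safe; a real cache would need locking.
    static std::shared_ptr<const Temperature> cached(int celsius) {
        static std::map<int, std::shared_ptr<const Temperature>> cache;
        auto it = cache.find(celsius);
        if (it == cache.end()) {
            std::shared_ptr<const Temperature> fresh(new Temperature(celsius));
            it = cache.emplace(celsius, fresh).first;
        }
        return it->second;
    }

    int celsius_;
};

void usage() {
    auto boiling = Temperature::fromCelsius(100);
    auto alsoBoiling = Temperature::fromFahrenheit(212);
    assert(boiling->celsius() == 100);
    assert(boiling == alsoBoiling);  // same cached object, not a copy
}
```

Two constructors taking a single int could not coexist, and neither could decline to allocate; the named factories do both.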
C# and VB.NET developers are encouraged to read the book to discover about 30 or so
other nuggets of wisdom that are directly applicable to the .NET framework. Note that
Josh is in the process, this very month, of revising the book for rerelease as a second
edition, taking into account the wide variety of changes that have taken place in
the Java language since EJ's initial release.
Meanwhile....
One thing that's been nagging at me is how I think Java and C# missed the boat in
respect to the various ways we'd like to construct objects. The presumption was always
that allocation and initialization would (a) always take place at the same time, and
(b) always take place in the same manner--the underlying system would allocate the
memory, the object would be laid out in this newly-minted chunk of heap, and your
constructor would then initialize the contents. Neither assumption can be taken to
be true, as we've seen over the years; the object may need to come from pre-existing
storage (a la the object cache), or the object may need to be a derived type (a la
the covariant return Josh mentions in the third advantage above), or in some cases you want
to mint the object from an entirely different part of the process.
C++ actually had an advantage over C# and Java here, in that you could overload operator
new() for a class (which then meant you had to overload operator delete(), and oh-by-the-way
don't forget to overload array new, that is, operator new[]() and its corresponding
twin, array delete, operator delete[](), which was a bit of a pain) to gain better
control over both allocation and initialization, to a degree. Initially we always
used it to control allocation--the idea being one would create a class-specific allocator,
on the grounds that knowing some of the assumptions of the class, such as its size,
would allow you to write faster allocation routines for it. But one of the rarely-used
features of operator new() was that it could take additional parameters, using a truly
obscure syntactic corner of C++:
#include <iostream>
#include <string>
using namespace std;

void* operator new(size_t s, const string& message)
{
    cout << "Operator new sez " << message << endl;
    // allocate s bytes and return; Foo ctor will be invoked automagically
    return ::operator new(s);
}

Foo* newFoo = new ("Howdy, world!") Foo();
Officially, one such overloaded operator was recognized, the placement
new operator, which took a void* as a parameter, indicating the exact location
in which your object was to be allocated and thus laid down. This meant that C++ developers
could allocate from some other part of the process (including, shudder, a pointer
they'd made up out of thin air) and drop the initialized object right there. While
useful in its own right, placement new opened up a whole new world of construction
options to the C++ developer that we never really took advantage of, since now you
could pass parameters to the construction process without involving the constructor.
That's kind of nifty, in an obscure and slightly terrifying fashion. One thought I'd
always had was that it would be cool if a C++ O/R-M overloaded operator new() for
database-bound objects to indicate which database connection to use during construction:
DBConnection conn;

Person* newFoo = new (conn) Person("Ted", "Neward");
Of course, such syntax has the immediate drawback of eliciting
a chorus of "WTF?!?" at the next code review, but still....
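For the curious, here is roughly how both forms might look in compilable C++. Treat it as a sketch under assumptions: DBConnection is an empty stand-in rather than a real library type, the overload only counts calls instead of talking to a database, and Point exists purely to show the standard void* placement form.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

struct Point { int x; int y; };

// Standard placement new: build an object in storage we already own.
int placementDemo() {
    alignas(Point) unsigned char buffer[sizeof(Point)];
    Point* p = new (buffer) Point{3, 4};
    assert(static_cast<void*>(p) == buffer);  // the object lives in buffer
    int sum = p->x + p->y;
    p->~Point();  // destroy by hand; nothing came from the heap, so no delete
    return sum;
}

// Hypothetical stand-in for a database connection.
struct DBConnection {
    int allocations = 0;
};

class Person {
public:
    Person(std::string first, std::string last)
        : first_(std::move(first)), last_(std::move(last)) {}
    const std::string& first() const { return first_; }

    // Class-specific operator new with an extra argument. It runs before the
    // constructor and only supplies raw storage; here it just counts the call.
    static void* operator new(std::size_t size, DBConnection& conn) {
        ++conn.allocations;
        return ::operator new(size);
    }
    // Matching form, called automatically only if the constructor throws.
    static void operator delete(void* p, DBConnection&) { ::operator delete(p); }
    // Ordinary form, used by a plain "delete".
    static void operator delete(void* p) { ::operator delete(p); }

private:
    std::string first_;
    std::string last_;
};

int connectionDemo() {
    DBConnection conn;
    Person* ted = new (conn) Person("Ted", "Neward");
    assert(ted->first() == "Ted");
    delete ted;
    return conn.allocations;
}
```

One wrinkle worth knowing: declaring any operator new inside Person hides the global forms for that class, so a plain "new Person(...)" no longer compiles unless the class also declares the ordinary overload or the caller writes "::new".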
Meanwhile, other languages choose to view new as one of those nasty static methods
Gilad dislikes so much, Ruby and Smalltalk being two of them. That is to say, construction
now basically calls into a static method on a class, which has the nice effect of
keeping the number of "special" parts of the language to a minimum (since now "new"
is just a method, not a keyword), makes it easier to have different-yet-similar names
to represent slightly different concepts ("create" vs "new" vs "fetch" vs "allocate",
and so on) sitting side by side, and helps eliminate Josh's second disadvantage above.
I'm not certain how exactly this could eliminate Josh's first disadvantage (that of
inheritance and inaccessible constructors), but it's not entirely unimaginable that
the language would have a certain amount of incestuous knowledge here to be able to
reach those static methods (constructors) in the same way it does currently.
(It actually works better if they aren't static methods at all, but instance methods
on class objects, to which the language automatically defers when it sees a "classname.new";
that is, when it sees
Person ann = Person.new("Ann", "Sheriff");
the language automatically changes this to read:
Person ann = Person.class.new("Ann", "Sheriff");
which would be eminently doable in Java, were class objects available for modification/definition
somehow. In a language built on top of the JVM or CLR, the class object would be a
standalone singleton, a la "object" definitions in Scala.)
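A loose C++ approximation of that last idea, offered only as a thought experiment: the nested Class type, instance() and create() are hypothetical names, and since "new" is a keyword in C++ the method has to be called something else. The singleton plays the part of the class object, and construction becomes an ordinary instance method on it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class Person {
public:
    const std::string& name() const { return name_; }

    // Stand-in for the "class object": a singleton whose instance method
    // plays the role of Person.class.new(...).
    class Class {
    public:
        static Class& instance() {
            static Class theClass;
            return theClass;
        }
        // An ordinary instance method, so it can keep state of its own.
        std::unique_ptr<Person> create(const std::string& first,
                                       const std::string& last) {
            ++created_;
            return std::unique_ptr<Person>(new Person(first + " " + last));
        }
        int created() const { return created_; }

    private:
        Class() = default;
        int created_ = 0;
    };

private:
    // Private: only the class object is allowed to construct a Person.
    explicit Person(std::string name) : name_(std::move(name)) {}
    std::string name_;
};

void usage() {
    int before = Person::Class::instance().created();
    auto ann = Person::Class::instance().create("Ann", "Sheriff");
    assert(ann->name() == "Ann Sheriff");
    assert(Person::Class::instance().created() == before + 1);
}
```

What C++ cannot do here is the automatic rewrite from Person.new to Person.class.new; the caller has to spell out the class object by hand.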