C#: A language alternative or just J--?, Part 2

The semantic differences and design choices between C# and Java

In Part 1 of this series, I covered the similarities between C# and Java, explained C#'s role in .Net, speculated on the possible outcomes of using multiple languages in application development, and looked at some high-level differences between Java and C#. This article covers C# language constructs in depth, points out various language features, and provides short pieces of sample code that illustrate C# programming principles. It concludes with a rant on C# and the tension between proprietary interests on one hand and open standardization on the other.

C# features

Some of the differences between C# and Java are simply cosmetic. In particular, there's the question of capitalization. I don't know for sure, but I'm guessing that the capitalization conventions in C# come from Delphi, as I suspect the WriteLine method also does. The capitalization in C#, with its Main() method and string built-in type, feels odd to the Java programmer. Are these conventions simply lifted from Delphi, or are they used rather to make C# look, on the surface, less like Java? If the reason was the latter, I'm afraid it wasn't very successful. The effect is rather like putting lipstick on your dog. He's still your dog, only with lipstick, and he doesn't necessarily look any better or worse for it.

Fortunately, C# has several features that go beyond the cosmetic. Some of these, such as enumerated values ("enums"), are simple "syntactic sugar," providing self-documentation and arguably clearer source code. Other features, such as delegates and events, are quite useful (though implemented in a confusing way) and provide functionality built into the language that, in Java, requires coding and a nontrivial understanding of Java design patterns.

This section will cover several interesting C# features.

Type system

Several features of the C# type system are interesting. Primitive values can always be treated as objects. There are a few class member types (properties, structs, enumerations, and delegates) that do not exist in Java. C# arrays work differently from Java arrays. Operators are overloadable in a fashion similar, but not identical, to that of C++. You can access indexed collections through a mechanism called an indexer, which works similarly to an operator.

Primitive/object unification

One feature of the C# type system that is bound to be popular is the use of all primitive types as objects. In fact, you can use primitive literal values, such as integers, as if they were objects, without first constructing a wrapper object such as an Integer in code, as is necessary in Java. So, for example, the Java code:

Integer iobj = new Integer(12);
System.out.println(iobj.toString());

could be expressed in C# more clearly as:

Console.WriteLine(12.ToString());

You'll notice in this example that the literal 12 can be used as an object. Since every object in C# is a subclass of class object and every literal value is an object in its own right, it is always possible to call any method of class object against a primitive variable or primitive literal value. Many Java programmers feel that the distinction between primitive value objects and primitive values is an unfortunate one in Java, and the designers of C# apparently agree. The processes of automatically converting a primitive value into an object and vice versa in C# are called boxing and unboxing, respectively.
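
For comparison, here is a minimal Java sketch of the wrapper-object step that the C# one-liner above avoids. The class name BoxingDemo and its two helper methods are made up for illustration; Integer.valueOf, toString, and intValue are standard java.lang.Integer methods. (Java releases from version 5 onward perform the wrapper conversion automatically in assignments, but a literal such as 12 still cannot be the target of a method call.)

```java
// BoxingDemo is a hypothetical name used only for this sketch.
class BoxingDemo {
    // Wrap a primitive int in an Integer object, then call a method on it.
    static String viaWrapper(int n) {
        Integer boxed = Integer.valueOf(n);   // explicit boxing
        return boxed.toString();
    }

    // Convert back from the wrapper object to the primitive value.
    static int unwrap(Integer boxed) {
        return boxed.intValue();              // explicit unboxing
    }

    public static void main(String[] args) {
        System.out.println(viaWrapper(12));                   // prints 12
        System.out.println(unwrap(Integer.valueOf(12)) + 1);  // prints 13
    }
}
```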

Array types

Array types differ somewhat between C# and Java. Both languages support simple and multidimensional arrays. In both languages, multidimensional arrays can be arrays of arrays, as shown in Table 1.

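Java                                   | C#
---------------------------------------+--------------------------------------------
int[] arrayOfInt = new int[10];        | int[] arrayOfInt = new int[10];
---------------------------------------+--------------------------------------------
int[][] multiArray = new int[10][3];   | int[][] multiArray = new int[10][];
                                       | for (int i = 0; i < 10; i++) {
                                       |    multiArray[i] = new int[3];
                                       | }
                                       | // Valid in both C# and Java
---------------------------------------+--------------------------------------------
int[][] multiArray2 = new int[][] {    | int[][] multiArray2 = new int[3][];
   new int[] { 1 },                    | multiArray2[0] = new int[] { 1 };
   new int[] { 1, 2, 1 },              | multiArray2[1] = new int[] { 1, 2, 1 };
   new int[] { 1, 3, 3, 1 }            | multiArray2[2] = new int[] { 1, 3, 3, 1 };
};                                     |
Table 1. Java and C# array definition and initialization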

Table 1 shows the differences in initialization syntax for Java and C# arrays. Oddly, Java's initialization syntax is more compact than C#'s, but C# has rectangular arrays that Java lacks. The following example shows the definition of two rectangular arrays in C#:

int[,] r2dArray  = new int[,] { { 1, 2, 3 },
                                { 2, 3, 4 } };
int[,,] r3dArray = { { { 1, 2 },
                       { 3, 4 } },
                     { { 5, 6 },
                       { 7, 8 } } };

You'll notice that the second example above, r3dArray, uses an acceptable shorthand (omitting new int[,,]) to initialize the array. The C# specification states that the element type in the array and the number of dimensions define the array type, but the size of each dimension does not. For example, int[4][3][5] and int[3][2][1] are the same type, because they have the same base type and dimensions.

Arrays are reference types, meaning they do not copy on assignment. All arrays inherit from the System.Array base class, and therefore inherit all of that class's properties, including the Length property. Since C# arrays are collection types, you can use them with the C# foreach keyword.
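
To make the contrast concrete, here is a small Java sketch (not from the C# specification) of the two array shapes Java can express: a jagged array of arrays, and a flat array indexed by hand to stand in for the rectangular arrays Java lacks. The names ArrayShapes, triangle, and flatIndex are hypothetical. The sketch also shows that assigning an array copies the reference rather than the data, which holds in Java just as it does in C#.

```java
// ArrayShapes is a hypothetical name used only for this sketch.
class ArrayShapes {
    // Build a jagged array whose i'th row has i + 1 elements
    // (the rows of Pascal's triangle).
    static int[][] triangle(int rows) {
        int[][] t = new int[rows][];
        for (int i = 0; i < rows; i++) {
            t[i] = new int[i + 1];
            t[i][0] = 1;
            t[i][i] = 1;
            for (int j = 1; j < i; j++) {
                t[i][j] = t[i - 1][j - 1] + t[i - 1][j];
            }
        }
        return t;
    }

    // Emulate a rectangular rows-by-cols array with one flat array.
    static int flatIndex(int row, int col, int cols) {
        return row * cols + col;
    }

    public static void main(String[] args) {
        int[][] t = triangle(4);
        System.out.println(t[3].length);   // prints 4
        int[] alias = t[3];                // copies the reference, not the data
        alias[1] = 99;
        System.out.println(t[3][1]);       // prints 99
        int[] flat = new int[2 * 3];       // stands in for a 2x3 rectangle
        flat[flatIndex(1, 2, 3)] = 7;
        System.out.println(flat[5]);       // prints 7
    }
}
```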

Indexers

A class's indexer lets you access any instance of that class as if it were an array. A class may define multiple indexers, each of which differs by the number and type of its arguments. Indexers are very similar to properties, especially in the syntax for defining them, as shown in the following code example:

public class Sparse2DArray {
   private double[][] _values;
   private int _i, _j;
   public Sparse2DArray(int i, int j) {
      _i = i;
      _j = j;
   }
   public double this[int i, int j] {
      set {
         // Check bounds
         if (i < 0 || i >= _i || j < 0 || j >= _j) {
            throw new IndexOutOfRangeException();
         }
         // Create values matrix if it doesn't exist
         if (_values == null) {
            _values = new double[_i][];
         }
         // Create i'th row if it doesn't exist
         if (_values[i] == null) {
            _values[i] = new double[_j];
         }
         _values[i][j] = value;
      }
      get {
         // Check bounds
         if (i < 0 || i >= _i || j < 0 || j >= _j) {
            throw new IndexOutOfRangeException();
         }
         // Sparse matrix is zero where no values exist
         if (_values == null || _values[i] == null) {
            return 0.0;
         } else {
            return _values[i][j];
         }
      }
   }
}

The indexer is defined with a syntax similar to the definition of a method, with the text: public double this[int i, int j]. The set block for the indexer (everything inside the block within the set { ... }) creates the private _values array, or any subarray of that array, as needed to store the value passed to the indexer. The get block (inside get { ... }) returns the value present at the given [i, j] position, or zero if the value does not exist. You can index instances of Sparse2DArray with the square bracket notation used for arrays, like so:

Sparse2DArray s2d = new Sparse2DArray(1000, 10);
s2d[512, 6] = 3.14159265;
Console.WriteLine(s2d[12,5]);  // will print zero
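
Since Java has no indexers, the same class would expose ordinary get and set methods instead. The following is a rough Java counterpart, offered as a sketch rather than a definitive port; the name SparseGrid and its method names are assumptions made for this example.

```java
// SparseGrid is a hypothetical Java counterpart to Sparse2DArray.
class SparseGrid {
    private final int rows, cols;
    private double[][] values;   // rows are allocated lazily

    SparseGrid(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
    }

    private void checkBounds(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols) {
            throw new IndexOutOfBoundsException(i + ", " + j);
        }
    }

    // Plays the role of the indexer's set block.
    void set(int i, int j, double value) {
        checkBounds(i, j);
        if (values == null) values = new double[rows][];
        if (values[i] == null) values[i] = new double[cols];
        values[i][j] = value;
    }

    // Plays the role of the indexer's get block.
    double get(int i, int j) {
        checkBounds(i, j);
        // Unallocated rows read as zero.
        if (values == null || values[i] == null) return 0.0;
        return values[i][j];
    }

    public static void main(String[] args) {
        SparseGrid g = new SparseGrid(1000, 10);
        g.set(512, 6, 3.14159265);
        System.out.println(g.get(512, 6));  // prints 3.14159265
        System.out.println(g.get(12, 5));   // prints 0.0
    }
}
```

The call g.set(512, 6, x) carries the same information as s2d[512, 6] = x; the indexer only changes the notation at the call site.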

Structs, enumerations, properties, delegates

C# defines several class member types that do not exist in Java: structs, enumerations, properties, and delegates. This section describes the function of each type.

Structs

A struct is somewhat like a struct in C, except that a C# struct may have any kind of class member, including constructors and methods; and the default accessibility for struct members is private, rather than public as in C. Like C structs, though, C# structs always copy by value and are therefore both mutable and exempt from dynamic memory management (i.e., garbage collection). Variables that copy by value don't need garbage collection because the memory used to represent them disappears when those variables go out of scope. In many ways, a struct is similar to a class: a struct can implement interfaces and can have the kinds of members that classes can have. But structs can't inherit from other structs.

Interestingly, the scalar types like int and double are implemented in C# as aliases for predefined structs in the System namespace. In other words, when you define an int variable in C#, for example, you're actually defining an instance of System.Int32, which is a struct predefined in the C# language. The struct System.Int32, in turn, inherits all of the members of System.Object. (While the language specification says structs can't inherit, it also says they do inherit from System.Object; presumably, this is an exception.) So, basically, every primitive type in C# is also an object, and therefore "object" in C# means something other than the traditional interpretation of "instance of a class."

Since primitives, classes, and structs inherit from System.Object, everything in C# is an object and can be treated as such. This is how C# represents primitives as objects: they already are objects. Yet instances of primitives and structs copy by value, while instances of classes copy by reference. Conversions from value types to reference types and vice versa are the boxing and unboxing operations mentioned earlier.

Enums

Another class member type Java does not provide is the enum. While similar to enums in C, C# enums are based on an "underlying type," which can be any signed or unsigned integer type. Enumerations are derived from the built-in class System.Enum, and therefore every enum inherits all of that class's members.

For example, an enum based on an unsigned long integer can be defined as:

enum description: ulong {
   Good,
   Bad,
   Ugly
};

Enums are inherently type-safe and require explicit type casts when they are assigned to and from integer types, even the type from which the enum is derived.

A common idiom in Java for "faking" enums is to define public final ints in an interface and then implement the interface, like this:

public interface Description {
   public final int Good = 0;
   public final int Bad = 1;
   public final int Ugly = 2;
}
public class Sheriff implements Description {
   protected int _description;
   public Sheriff() {
      _description = Good;
   }
}
While this idiom improves readability in Java, it doesn't provide type safety, nor can methods overload on two such "enums," because they're really integers. C# provides true enumerated types.
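
Java programmers who need type safety can approximate it by hand with a class whose only instances are a fixed set of constants. The sketch below illustrates that workaround under stated assumptions: Description reuses the name from the example above, while Mood, EnumDemo, and the report methods are hypothetical names added to show that overloading works once the "enums" are distinct types.

```java
// A hand-built type-safe enumeration: the private constructor means
// GOOD, BAD, and UGLY are the only Description instances that can exist.
final class Description {
    static final Description GOOD = new Description("Good");
    static final Description BAD  = new Description("Bad");
    static final Description UGLY = new Description("Ugly");

    private final String name;
    private Description(String name) { this.name = name; }
    public String toString() { return name; }
}

// A second, unrelated enumeration (hypothetical) to overload against.
final class Mood {
    static final Mood CALM = new Mood();
    private Mood() {}
}

class EnumDemo {
    // Overloads are possible because Description and Mood are distinct types.
    static String report(Description d) { return "description: " + d; }
    static String report(Mood m)        { return "mood"; }

    public static void main(String[] args) {
        System.out.println(report(Description.GOOD));  // prints description: Good
        System.out.println(report(Mood.CALM));         // prints mood
        // report(0) would not compile: an int is neither a Description nor a Mood.
    }
}
```

This costs noticeably more code than the three-line C# enum, which is the point of building the feature into the language.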

Properties

Directly accessing the public data members of objects is generally understood to be poor form. Direct access to a data member of an object inherently breaks data encapsulation and is a maintenance hazard. When a data member is removed from an existing class, any code that accesses that removed member will also need to be fixed. Further, code that accesses a class's public data members relies on a particular implementation of that class. The traditional solution to this problem is to provide "accessor methods," or "getters" and "setters," which provide access to information about the object. The value set or retrieved by an accessor method is typically called a "property." For example, a block of text in a word processor might have properties like "foreground color," "background color," and so on. Instead of exposing public Color members, the object representing a block of text could provide getter and setter methods to those properties. This concept of using access methods to encapsulate internal virtual object state is a design pattern that spans object-oriented languages. It isn't limited to C# or Java.

C# has taken the concept of properties a step further by actually building accessor methods into the language semantics. An object property has a type, a set method, and a get method. The set and get methods of the property determine how the property's value is set and retrieved. For example, a TextBlock class might define its background color property in this way:

public class TextBlock {
   // Assume Color is an enum
   private Color _bgColor;
   private Color _fgColor;
   public Color backgroundColor {
      get {
         return _bgColor;
      }
      set {
         _bgColor = value;
      }
   }
   //... and so on...
}

(Notice in the preceding set block _bgColor is set to value, which is a keyword that, in this context, means the value of the property.) Some other object could set or get a TextBlock's backgroundColor property like this:

TextBlock tb = new TextBlock();
if (tb.backgroundColor == Color.green) {  // "get" is called for comparison
   tb.backgroundColor = Color.red;        // "set" is called
} else {
   tb.backgroundColor = Color.blue;       // "set" is called
}

So, the syntax to access a property looks like a member but is implemented like a method. Either of the get and set methods is optional, providing a way of creating "read only" and "write only" properties.

Some would say that Java has properties, since the JavaBeans specification defines properties in terms of method naming conventions and the contents of JavaBean PropertyDescriptors. While JavaBeans properties do everything C# properties do, JavaBeans properties are not built into the Java language, and so the syntax for using them is a method call:

public class TextBlock {
   private Color _bgColor;
   public Color getBackgroundColor() {
      return _bgColor;
   }
   public void setBackgroundColor(Color value_) {
      _bgColor = value_;
   }
   // ...etc...
}
TextBlock tb = new TextBlock();
if (tb.getBackgroundColor() == Color.green) {
   tb.setBackgroundColor(Color.red);
} else {
   tb.setBackgroundColor(Color.blue);
}

The meaning of the preceding example code in Java and C# is identical, but the syntax in C# is cleaner.

You may have heard that property accessors are simply methods used for accessing objects' internal variables. This idea is a complete misunderstanding of what property accessors truly are. While property accessors often, even usually, simply set and return the value of a private variable, property accessors are most useful when they return a calculated or deferred value. What's more, a property accessor that today returns the value of an internal variable might tomorrow return a value it got from somewhere else, if the class's implementation changed. The classes that access the property don't need to know where the property value came from, as they would if they were accessing an internal variable. Property accessors cannot break encapsulation in the way that public field accessors do. They are simply not the same thing. Whether defined by convention as in Java or built into a language like C#, properties are an excellent way to improve decoupling between classes.
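
A short Java sketch may make the distinction concrete. In the hypothetical Rect class below (the class and its member names are assumptions for illustration), one accessor is backed directly by a private variable while another returns a calculated value; callers cannot tell the difference, and either implementation could change later without affecting them.

```java
// Rect and its members are hypothetical names for this sketch.
class Rect {
    private double width, height;

    Rect(double width, double height) {
        this.width = width;
        this.height = height;
    }

    // Backed directly by a private variable.
    double getWidth() { return width; }
    void setWidth(double width) { this.width = width; }

    // No "area" variable exists; the value is calculated on each call.
    double getArea() { return width * height; }

    public static void main(String[] args) {
        Rect r = new Rect(3.0, 4.0);
        System.out.println(r.getArea());   // prints 12.0
        r.setWidth(5.0);
        System.out.println(r.getArea());   // prints 20.0
    }
}
```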

Delegates

Delegates are C#'s answer to C++ function pointers, except that delegates are safe, and function pointers are dangerous. Both Java and C# have done away with function pointers, finding safer ways to maintain references to behavior that is determined at runtime (which is what function pointers are used for in C++).

C# borrowed the concept of delegates from Delphi and Visual J++. A delegate is a reference to a method, plus either a class or an instance against which the method can be called, depending on whether or not the method is static. Delegates are useful whenever behavior needs to be deferred to another object. For example, a sort algorithm might take a delegate as one of its arguments:

// Declaration of the delegate
delegate int Comparator(object a, object b);
class Quicksort {
   static void sort(Comparator c, object[] objectsToSort) {
      // ... quicksort logic leading to a comparison
      if (c(objectsToSort[left], objectsToSort[pivot]) > 0) {
         // recursive call...
      } else {
         // ...and so on...
      }
   }
}

In the preceding code you see the delegate called Comparator declared and then used as an argument in Quicksort.sort(), which is static. In the "if" statement, the delegate c is called just as if it were a method. This is equivalent to calling through a function pointer in C, except the delegate is type-safe and can't possibly have an illegal value.

The typical Java approach usually involves implementing an interface, like this:

public interface Comparator {
   int compare(Object a, Object b);
}
public class Quicksort {
   public static void sort(Comparator c, Object[] objectsToSort) {
      // ...quicksort logic...
      if (c.compare(objectsToSort[left], objectsToSort[pivot]) > 0) {
         // ... and so on
      }
   }
}

The object implementing Comparator in the preceding code will commonly be in an inner or even anonymous class. Again, as in the case of properties, the results in Java and C# are quite similar, but the C# syntax is slightly clearer. Delegates are more than just syntactic shorthand for interface calls, however. A delegate can maintain a reference both to a method and to an instance upon which that method can be invoked. You'll see an example of this usage in the next section on events. Also, you'll find a link to a Sun whitepaper on delegates in Resources.
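
To show the anonymous-class idiom end to end, here is a runnable Java sketch. It is only an illustration: the interface is named Ordering (a hypothetical name, chosen to avoid confusion with java.util.Comparator), and a simple insertion sort stands in for the quicksort logic elided in the fragments above.

```java
// Ordering and SortDemo are hypothetical names; an insertion sort
// stands in for the quicksort logic that the article elides.
interface Ordering {
    int compare(Object a, Object b);
}

class SortDemo {
    static void sort(Ordering c, Object[] items) {
        for (int i = 1; i < items.length; i++) {
            Object key = items[i];
            int j = i - 1;
            // Shift larger elements right; the comparison is deferred to c.
            while (j >= 0 && c.compare(items[j], key) > 0) {
                items[j + 1] = items[j];
                j--;
            }
            items[j + 1] = key;
        }
    }

    // An anonymous inner class supplies the comparison behavior.
    static Ordering byLength() {
        return new Ordering() {
            public int compare(Object a, Object b) {
                return ((String) a).length() - ((String) b).length();
            }
        };
    }

    public static void main(String[] args) {
        Object[] words = { "delegate", "C#", "Java" };
        sort(byLength(), words);
        System.out.println(java.util.Arrays.toString(words));
        // prints [C#, Java, delegate]
    }
}
```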

Events and event notification

C# events operate much like Java 1.1+ events, except that C#'s are integrated into the language. For a class to receive an event, it must have a field or property that is a delegate marked with the event keyword. From inside the class, this delegate member is just like any other delegate: it can be called, checked to see if it's null, and so forth. From outside the class, however, there are only two things you can do with it: add a method to the delegate or remove one from it, using operator+= or operator-=, respectively. Event delegates apparently act as collections or lists of references to methods and instances.

In essence, an event delegate is a list of methods to be called when the delegate is invoked. In my opinion, this usage is extremely confusing, since in this particular case delegates act like a collection of method pointers, rather than a single method pointer. The language spec does, however, specify that all delegates define addition and subtraction operators, implying that any delegate may be used in this way.

For example, here is the definition of part of a class called MyButton, which can respond to mouse clicks:

public delegate void ClickHandler(object source, Event e);
public class MyButton: Control {
   public event ClickHandler HandlerList;
   protected void OnClick(Event e) {
      if (HandlerList != null) {
         HandlerList(this, e);
      }
   }
   //... override other methods of Control...
}

A dialog class containing one of these buttons could then register itself to receive clicks from the button, adding one of its methods to the button's HandlerList with the operator+=, like so:

public class OneWayDialog: Control {
   protected MyButton DialPhoneButton;
   public OneWayDialog() {
      DialPhoneButton = new MyButton("Dial Phone");
      DialPhoneButton.HandlerList += new ClickHandler(DoDial);
   }
   public void DoDial(object source, Event e) {
      // Call method that dials phone number
   }
}

When a OneWayDialog is created, it creates a MyButton and then adds a delegate to its method DoDial() to the MyButton's HandlerList. When the user clicks the MyButton, the framework calls the MyButton's OnClick() method, which then invokes the HandlerList delegate, resulting in a callback to OneWayDialog.DoDial() against the original OneWayDialog instance.

In Java, this sort of mechanism (called event multicasting) is handled with lists of objects that implement event listener interfaces. Manipulation of these lists of interfaces must be done manually, although there are "support" classes that do it for you. C# manages these "listener lists" under the hood by building them into the delegate's semantics. While writing code that maintains listener lists in Java can be a bit tedious, at least it's clear what's going on. C# "simplifies" the handling of these lists by building it into the language, but at the cost of creating a language feature (the delegate) that behaves like a method pointer sometimes and like a collection of method pointers at other times. This may just be a matter of taste, but personally I think the delegates' semantics are messy and confusing. On the other hand, not everyone fully understands Java event listener interfaces the first time.
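
For readers who have not written one, a hand-maintained listener list in Java looks roughly like the sketch below. All of the names (ClickListener, SimpleButton, ListenerDemo) are hypothetical and are not part of AWT or Swing; the point is only to show the bookkeeping that a C# event delegate performs implicitly.

```java
import java.util.ArrayList;
import java.util.List;

// ClickListener, SimpleButton, and ListenerDemo are hypothetical names.
interface ClickListener {
    void clicked(Object source);
}

class SimpleButton {
    // The list that a C# event delegate would maintain under the hood.
    private final List<ClickListener> listeners = new ArrayList<ClickListener>();

    void addClickListener(ClickListener l)    { listeners.add(l); }     // like +=
    void removeClickListener(ClickListener l) { listeners.remove(l); }  // like -=

    // Multicast: notify every registered listener in registration order.
    void click() {
        for (ClickListener l : listeners) {
            l.clicked(this);
        }
    }
}

class ListenerDemo {
    static int dialCount = 0;

    public static void main(String[] args) {
        SimpleButton button = new SimpleButton();
        ClickListener dial = new ClickListener() {
            public void clicked(Object source) { dialCount++; }
        };
        button.addClickListener(dial);
        button.click();
        button.removeClickListener(dial);
        button.click();                   // no listeners remain
        System.out.println(dialCount);    // prints 1
    }
}
```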

Operator overloading

Operator overloading lets programmers define how operators can be used with their classes. Operations on instances of those classes can then be expressed in expression notation using the operators, rather than as explicit method calls. Operators used in expressions map directly to the class's operator overloading definitions, subject to the language's rules for operator precedence and associativity. Mathematics packages often use operator overloading to provide a more intuitive programming model for their objects. For example, instead of matrixA.matMul(matrixB), the operator called operator* can be overloaded to handle matrix multiplication, yielding matrixA = matrixA * matrixB.

Unlike Java, C# permits operator overloading, using a syntax almost identical to that of C++. Some operators that can be overloaded in C++, such as operator=, cannot be overloaded in C#. Experienced C++ programmers will have to learn and remember the differing rules for C# operator overloading, despite the almost identical syntax.

In light of their experience with C++, Java's designers decided deliberately to leave operator overloading out of Java. The question of operator overloading in Java is an ongoing debate. Fans of operator overloading claim that leaving it out of the language makes some code unreadable, if not virtually unwritable. Many in the numerics community would like to see operator overloading added to Java. Overloading detractors say that operator overloading is widely abused and that it all too often makes code unmaintainable. For the time being, the free program JFront (whose name is a pun on CFront) provides a preprocessor that implements operator overloading for Java (see Resources).
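
The readability argument is easier to judge with code in hand. The following Java sketch uses a hypothetical 2x2 matrix class, Matrix2, whose matMul method plays the role that operator* would play in C# or C++; a product of three matrices has to be written as chained method calls.

```java
// Matrix2 is a hypothetical 2x2 matrix class used only for this sketch.
class Matrix2 {
    final double a, b, c, d;   // laid out as [ a b ; c d ]

    Matrix2(double a, double b, double c, double d) {
        this.a = a; this.b = b; this.c = c; this.d = d;
    }

    // Stands in for what operator* would express in C# or C++.
    Matrix2 matMul(Matrix2 o) {
        return new Matrix2(a * o.a + b * o.c, a * o.b + b * o.d,
                           c * o.a + d * o.c, c * o.b + d * o.d);
    }

    public static void main(String[] args) {
        Matrix2 m = new Matrix2(1, 2, 3, 4);
        Matrix2 identity = new Matrix2(1, 0, 0, 1);
        // With overloading this could read m * identity * m.
        Matrix2 p = m.matMul(identity).matMul(m);
        System.out.println(p.a + " " + p.b + " " + p.c + " " + p.d);
        // prints 7.0 10.0 15.0 22.0
    }
}
```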

Methods

C# methods have some features that Java does not. In particular, C# provides several modifiers on method parameters and has keywords for virtual methods and method overriding.

As in C, C# method parameters are value parameters by default. The ref modifier on a method parameter makes that parameter a reference, so any change to its value is reflected in the caller's variable. The out modifier indicates that the parameter is an output parameter, which is identical to a reference parameter, except that the parameter must be definitely assigned before the method returns. The params keyword allows variable-length argument lists. For example, a method to print an arbitrary number of items might look like this:

void PrintItems(params object[] values) {
   foreach (object value in values) {
      Console.WriteLine(value);
   }
}
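
For comparison, here is a rough Java sketch of the nearest equivalents. It assumes a Java release with variable-arity methods (version 5 or later), where Object... plays the role of params object[]. Java has no ref or out modifiers, so the sketch falls back on one common workaround, a one-element array supplied by the caller. ParamsDemo, countItems, and tryParse are hypothetical names.

```java
// ParamsDemo and its methods are hypothetical names for this sketch.
class ParamsDemo {
    // Object... plays the role of C#'s params object[].
    static int countItems(Object... values) {
        int n = 0;
        for (Object value : values) {
            System.out.println(value);
            n++;
        }
        return n;
    }

    // Java passes arguments by value, so an "out" result is returned
    // here through a one-element array that the caller provides.
    static boolean tryParse(String text, int[] result) {
        try {
            result[0] = Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int count = countItems("a", 2, 3.0);   // prints each item on its own line
        System.out.println(count);             // prints 3
        int[] out = new int[1];
        if (tryParse("42", out)) {
            System.out.println(out[0]);        // prints 42
        }
    }
}
```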

C# methods are, by default, nonvirtual but can be made virtual by explicit use of the keyword virtual. The C# virtual keyword in a base class defines a method as virtual. Subclasses must use the override modifier to override inherited virtual methods. Without the override modifier, a method in a subclass with the same signature as an inherited virtual method simply hides the superclass implementation, although the compiler produces a warning in this case. The keyword new, used as a method modifier, hides the superclass implementation of an inherited virtual method and turns off the warning.
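
Java takes the opposite default: instance methods are virtual unless declared final, and hiding arises only for static methods. The sketch below, with hypothetical classes Base, Derived, and DispatchDemo, shows both behaviors side by side. It uses the @Override annotation, which assumes Java 5 or later and is optional.

```java
// Base, Derived, and DispatchDemo are hypothetical names for this sketch.
class Base {
    String name()         { return "Base"; }   // virtual by default in Java
    static String label() { return "Base"; }   // static: can only be hidden
}

class Derived extends Base {
    @Override
    String name()         { return "Derived"; }  // overrides Base.name()
    static String label() { return "Derived"; }  // hides Base.label()
}

class DispatchDemo {
    public static void main(String[] args) {
        Base b = new Derived();
        System.out.println(b.name());         // prints Derived (runtime type decides)
        System.out.println(Base.label());     // prints Base (named class decides)
        System.out.println(Derived.label());  // prints Derived
    }
}
```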

C#'s designers went to a great deal of trouble to create fine-grained control over method overriding and hiding, which interacts in various subtle ways with accessibility (public, private, internal, and so on). In this case, C# is actually more complex than C++. The ref and out method parameter modifiers are an old, useful idea, and even more useful in a network environment. Network object marshaling code can use them to decide which objects to serialize and which to ignore.

Preprocessor

The C preprocessor was not included in Java because it was seen as an invitation to poor coding practice and an enormous source of bugs. C# includes a set of preprocessing directives modeled on those of C, such as #define and #if for conditional compilation, although it omits C-style macros. Several preprocessors for Java are freely available for programmers addicted to this feature.

Native 'unsafe' code

One of C#'s most widely touted features is its direct integration of native code. Code blocks bearing the modifier unsafe can use C#'s pointer types to access unmanaged memory and platform-native resources. There is also a fixed keyword, used to "pin" an object to a specific address and protect it from the garbage collector. The word unsafe indicates to the compiler that the code being called uses memory pointers. While the C# specification targets performance as the reason for using unsafe code, it will probably be used more commonly for accessing the Win32 API and existing DLLs. The Java Native Interface (JNI) allows access to underlying system resources, but the Java language itself does not. JNI provides access through a software layer and in another language, usually C.

By this point, you have a good basic understanding of many C# features, which make it interesting and useful to Windows programmers. But the question remains: Why create this new language? Why not just integrate Java more closely with Windows or, even better, integrate Windows with Java? The next section addresses those questions.

The Standard Open Proprietary architecture

Microsoft has submitted C# to ECMA (European Computer Manufacturers Association), the European standards body that standardized ECMAScript. This is probably a thumb in the eye to Sun, which scrubbed Java standardization plans with both ECMA and ISO last year because of copyright concerns. (It remains to be seen how genuinely and successfully Microsoft will push for C# standardization.) These so-called standardizations of what are essentially proprietary technologies muddy the water for people doing real work in standardization development. True open standards are created in an open forum, with multiple parties providing input as the technology develops. Every high-tech company would like to be in the position of controlling an open proprietary architecture. To paraphrase Nixon aide Chuck Colson, if you've got customers by the architecture, their hearts and minds will follow.

Standards organizations should drive a hard bargain with companies who want to develop proprietary technologies behind closed doors and then come begging for "standardization" long after the time for meaningful community input has passed. Those that fail to do so risk becoming irrelevant repositories for vested interests instead of advocates for open systems and collaborative development. On the other hand, vendors can hardly be expected to contribute heavily to the success of their competitors' infrastructures. Consequently, standards organizations should accept proprietary technologies into their processes only if the originators of those technologies are willing to relinquish veto power over future development.

Standards are not open simply because their specifications are openly available (or, as in the case of ISO, available for a price). True open standards, like XML, are developed openly and collaboratively and often by competing interests. Those whose designs begin proprietarily become true open standards only when the technology originators have no special place at the table in the standards organization.

For example, Microsoft and Sun are both leaders in the development of XML technology, yet they will have to compete with everyone else on the basis of technical excellence, since the standard was developed in full public view. While other languages such as C and C++ were initially developed behind closed doors, the standards process effectively made them common property, in spirit if not in practice. The same cannot be said for Java, since Sun has retracted its bid for standardization. We'll see what ECMA will do about C#.

Closing remarks

If I were a Windows developer, I would be rejoicing at the creation of C#. It is much easier to use than C++, and yet is more full featured than Visual Basic. MFC programmers, in particular, should flock to this tool. It seems likely that C# will become a major language for developing under the Windows platform. Because of C# creator Anders Hejlsberg's excellent track record, I expect the language to live up to its promises, assuming that Microsoft delivers an implementation that does so. C# solves most of the same problems with C++ that Java solved five years ago, usually in the same way. C# also solves the business problems that Microsoft encountered when it found it could embrace and extend Java, but not extinguish it. And, if Microsoft marketing is to be believed, COM will finally be usable. C# itself is not particularly innovative: there is little in this language that has not been seen before. Still, its skillful combination of these well-understood features will provide good value to Windows programmers. Of course, those not wanting to limit themselves to Windows can still choose from among the many excellent implementations of Java for Windows.

Because of its lack of platform neutrality, C# is in no way a "Java killer." Even leaving aside Sun's five-year head start, and Java's successful capture of the "gorilla" (market-owning) position among enterprise server languages, C#'s Achilles' heel is that it is tied to Windows. Of course, in theory it isn't. But widespread cross-platform implementation of C# is like widespread multivendor implementation of Win32 and COM: possible, in theory.

High-technology consumers today, and especially IT managers, are appropriately wary of vendor lock-in. Encoding procedural information assets in a way that ties them to a particular vendor is a Faustian bargain. The Java platform is neutral with respect to operating systems. If you don't like the service you are getting from one vendor, or if your needs change, you can find another vendor that better meets your requirements. It will be some time before that can be said of C# or .Net. In short, while C# is a fine language for Windows programming, it will be able to compete with Java only when C# is freed from its Windows dependence. For the time being, C# users still won't get to decide where they're going today.

Acknowledgments

Many thanks to fellow ITworld.com author Michael Perry, who performed the technical review for this series, caught several potentially embarrassing mistakes for me, and tried his best to rein in my unruly opinions. Be sure to see Perry's article, as well as my discussion with him about C# versus Java, in Resources.

Mark Johnson is a writer, a consultant, and a JavaBeans columnist and XML guru at JavaWorld. His interests include design patterns, structured data representation, software architecture, enterprise data systems, and codeless software development. He is the president of elucify technical communications (etc), a Colorado-based corporation dedicated to clarifying novel or complex ideas through clear explanation and example. His client list currently includes Sun Microsystems, where he is a contract technical writer.

Learn more about this topic