Java language oddities

Java's handling of arrays and a few other language elements may surprise you

While learning Java, you'll occasionally encounter a language behavior that leaves you puzzled. For example, what does it tell you about arrays that the expression new int[10] instanceof Object evaluates to true? In this post, I'll examine some of Java's language oddities.

Arrays are objects

A long time ago, while writing about message formatters, I encountered something strange in Java's java.text.MessageFormat standard library class. Consider the following pair of formatting methods:

  • StringBuffer format(Object[] arguments, StringBuffer result, FieldPosition pos)
  • StringBuffer format(Object arguments, StringBuffer result, FieldPosition pos)

According to the Javadoc, either method formats an array of objects. Wait a minute! How can you pass an array of objects to Object arguments? Is this a Javadoc misprint? The answer is no: you can pass an array of objects to this parameter.
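The following is a minimal sketch of this behavior, not code from the MessageFormat documentation. The FormatOverloads class, its two helper methods, and the "{0} and {1}" pattern are names and values I've made up for illustration. The same Object[] is passed once through a parameter of type Object[] and once through a parameter of type Object, so each helper resolves to a different format() overload:

```java
import java.text.FieldPosition;
import java.text.MessageFormat;

// Hypothetical demo class; not part of the article's listings.
class FormatOverloads
{
   // The argument's compile-time type is Object[], so the compiler
   // selects format(Object[], StringBuffer, FieldPosition).
   static String viaArrayParameter(Object[] arguments)
   {
      MessageFormat mf = new MessageFormat("{0} and {1}");
      return mf.format(arguments, new StringBuffer(),
                       new FieldPosition(0)).toString();
   }

   // The argument's compile-time type is Object, so the compiler
   // selects format(Object, StringBuffer, FieldPosition), even though
   // the reference still points to an array.
   static String viaObjectParameter(Object arguments)
   {
      MessageFormat mf = new MessageFormat("{0} and {1}");
      return mf.format(arguments, new StringBuffer(),
                       new FieldPosition(0)).toString();
   }

   public static void main(String[] args)
   {
      Object[] array = { "A", "B" };
      System.out.println(viaArrayParameter(array));  // A and B
      System.out.println(viaObjectParameter(array)); // A and B
   }
}
```

Both calls produce the same text because the array is an Object as well as an Object[].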

The Java Language Specification explains this oddity. Section 10.1. Array Types states (in the fine print) that Object is also a supertype of all array types. Hence, each of the following lines of code will output true:

System.out.println(new int[10] instanceof Object);
System.out.println(new String[] { "A", "B" } instanceof Object);

I've created an ArraysAreObjects application that demonstrates arrays being objects. Listing 1 presents the application's source code.

Listing 1. ArraysAreObjects.java (version 1)

public class ArraysAreObjects
{
   public static void main(String[] args)
   {
      print(new String[] { "A", "B", "C" });
      print("Hello");
      print(new int[] { 1, 2, 3 });
      print(new Integer[] { 1, 2, 3 });
   }

   static void print(Object objects)
   {
      if (objects instanceof Object[])
         for (Object object: (Object[]) objects)
            System.out.println(object);
      else
         System.out.printf("[%s]%n", objects);
      System.out.println();
   }
}

ArraysAreObjects declares a print() method that prints an object or an array of objects. It differentiates between these cases via objects instanceof Object[], which returns true when objects references an array of objects.

Compile Listing 1 as follows:

javac ArraysAreObjects.java

Run the resulting application as follows:

java ArraysAreObjects

You should observe the following output (the hash code in the third group will probably differ):

A
B
C

[Hello]

[[I@42d3bd8b]

1
2
3

Perhaps you're surprised to see something like [[I@42d3bd8b] instead of each integer on a separate line when executing print(new int[] { 1, 2, 3 });. Section 4.10.3. Subtyping among Array Types provides an answer:

The following rules define the direct supertype relation among array types:

    If S and T are both reference types, then S[] >1 T[] iff S >1 T.

    Object >1 Object[]

    Cloneable >1 Object[]

    java.io.Serializable >1 Object[]

    If P is a primitive type, then:

        Object >1 P[]

        Cloneable >1 P[]

        java.io.Serializable >1 P[]

Essentially, this section tells us that Object, and not Object[], is the supertype of a primitive array type.
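These rules can be checked with a few instanceof tests. The sketch below is my own illustration (ArraySupertypes and isObjectArray() are hypothetical names). Note that each array is first assigned to a variable of type Object, because the compiler rejects an expression such as new int[3] instanceof Object[] outright:

```java
import java.io.Serializable;

// Hypothetical demo class; not part of the article's listings.
class ArraySupertypes
{
   // True only when o references an array whose element type is a
   // reference type.
   static boolean isObjectArray(Object o)
   {
      return o instanceof Object[];
   }

   public static void main(String[] args)
   {
      Object strings = new String[] { "A", "B" };
      Object ints = new int[] { 1, 2, 3 };
      Object grid = new int[2][2];

      System.out.println(isObjectArray(strings)); // true: String[] is a subtype of Object[]
      System.out.println(isObjectArray(ints));    // false: int[] is a subtype of Object only
      System.out.println(isObjectArray(grid));    // true: the elements are int[] objects

      System.out.println(ints instanceof Cloneable);    // true
      System.out.println(ints instanceof Serializable); // true
   }
}
```

The third case shows that a two-dimensional primitive array is still an Object[], because its elements are themselves arrays and therefore objects.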

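This information helps to explain how the compiler chooses between MessageFormat's two format() methods, which differ only in the type of the first parameter: Object[] or Object. The format() method with Object[] as its first parameter is selected for reference array type arguments (e.g., new String[] { "A", "B" }), whereas the other format() method is selected for arguments of type Object and for primitive array type arguments, as in format(new int[] { 1, 2, 3 }, sb, pos). Be careful with the latter, though: the Javadoc describes format(Object, ...) as equivalent to format((Object[]) arguments, result, pos), so a primitive array argument compiles but fails at run time with a ClassCastException. The Object version exists because MessageFormat must implement the abstract format(Object, StringBuffer, FieldPosition) method that it inherits from java.text.Format.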

Never write code like that shown in Listing 1. Instead, use Java's variable arguments (varargs) language feature (introduced in Java 5 long after Java 1.1's debut of MessageFormat) to achieve more concise code. Consider Listing 2.

Listing 2. ArraysAreObjects.java (version 2)

public class ArraysAreObjects
{
   public static void main(String[] args)
   {
      print("A", "B", "C");
      print("Hello");
      print(1, 2, 3);
   }

   static void print(Object... objects)
   {
      for (Object object: objects)
         System.out.println(object);
      System.out.println();
   }
}

Although this code is straightforward, you might be curious about print(1, 2, 3);. The compiler generates code to autobox each integer into an Integer object. These objects are stored in an Object[] array that's passed to print().

When you run this application, you should observe the following output:

A
B
C

Hello

1
2
3
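One caveat applies to the varargs version: passing an existing array doesn't always behave the way Listing 1 did. The sketch below is my own illustration (VarargsCaveat and count() are hypothetical names). It reports how many elements arrive in the varargs parameter for three kinds of arguments:

```java
// Hypothetical demo class; not part of the article's listings.
class VarargsCaveat
{
   // Returns the length of the array that reaches the varargs parameter.
   static int count(Object... objects)
   {
      return objects.length;
   }

   public static void main(String[] args)
   {
      // Three ints are autoboxed and wrapped in a new Object[3].
      System.out.println(count(1, 2, 3)); // 3

      // A reference array is passed through as the Object[] itself.
      // The cast makes that intent explicit.
      System.out.println(count((Object[]) new Integer[] { 1, 2, 3 })); // 3

      // An int[] is not an Object[], so it becomes the single element
      // of a new Object[1].
      System.out.println(count(new int[] { 1, 2, 3 })); // 1
   }
}
```

In other words, print(new int[] { 1, 2, 3 }) in Listing 2 would still print something like [I@42d3bd8b rather than three separate integers.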

The java.util package's Arrays and Objects classes also demonstrate the impact of arrays being objects. Arrays declares a boolean deepEquals(Object[] a1, Object[] a2) method to determine whether two arrays are deeply equal (defined in that method's Javadoc). Similarly, Objects declares boolean deepEquals(Object a, Object b) to determine whether two nonarray or array objects are deeply equal.

You don't have to use Objects.deepEquals() to compare a pair of nonarray objects. Instead, you could create a pair of arrays to hold these objects and pass these arrays to Arrays.deepEquals(). But isn't that a code smell?

In case you're wondering how primitive array types are handled, note that Objects.deepEquals() and Arrays.deepEquals() delegate to Arrays.deepEquals0(). Here's that method's source code:

static boolean deepEquals0(Object e1, Object e2) 
{
   assert e1 != null;
   boolean eq;
   if (e1 instanceof Object[] && e2 instanceof Object[])
      eq = deepEquals((Object[]) e1, (Object[]) e2);
   else if (e1 instanceof byte[] && e2 instanceof byte[])
      eq = equals((byte[]) e1, (byte[]) e2);
   else if (e1 instanceof short[] && e2 instanceof short[])
      eq = equals((short[]) e1, (short[]) e2);
   else if (e1 instanceof int[] && e2 instanceof int[])
      eq = equals((int[]) e1, (int[]) e2);
   else if (e1 instanceof long[] && e2 instanceof long[])
      eq = equals((long[]) e1, (long[]) e2);
   else if (e1 instanceof char[] && e2 instanceof char[])
      eq = equals((char[]) e1, (char[]) e2);
   else if (e1 instanceof float[] && e2 instanceof float[])
      eq = equals((float[]) e1, (float[]) e2);
   else if (e1 instanceof double[] && e2 instanceof double[])
      eq = equals((double[]) e1, (double[]) e2);
   else if (e1 instanceof boolean[] && e2 instanceof boolean[])
      eq = equals((boolean[]) e1, (boolean[]) e2);
   else
      eq = e1.equals(e2);
   return eq;
}

As you can see, each primitive array type is handled as a special case.
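To see the effect of this special-casing from the caller's side, consider the following sketch. It is my own example (DeepEqualsDemo and sameContents() are hypothetical names) and relies only on the public java.util.Arrays and java.util.Objects methods mentioned above:

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical demo class; not part of the article's listings.
class DeepEqualsDemo
{
   // Works for nonarray objects, reference arrays, and primitive arrays.
   static boolean sameContents(Object x, Object y)
   {
      return Objects.deepEquals(x, y);
   }

   public static void main(String[] args)
   {
      int[] a = { 1, 2, 3 };
      int[] b = { 1, 2, 3 };

      // Arrays inherit Object's identity-based equals().
      System.out.println(a.equals(b));          // false
      System.out.println(Objects.equals(a, b)); // false

      // deepEquals() compares the int[] contents.
      System.out.println(sameContents(a, b));   // true

      // Nested primitive arrays are compared by contents as well.
      System.out.println(Arrays.deepEquals(new Object[] { a },
                                           new Object[] { b })); // true
   }
}
```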

Bytes and shorts are second-class citizens

According to Section 4.2. Primitive Types and Values in the Java Language Specification, Java supports five integral types: byte integer, short integer, integer, long integer, and character. These primitive types are represented via keywords byte, short, int, long, and char, respectively. Each of the byte, short, int, and long types represents a signed integer. In contrast, char represents an unsigned UTF-16 code unit.

Consider byte, short, int, and long. These types differ only in their ranges of values, which are determined by the number of bits associated with each type: 8 (byte), 16 (short), 32 (int), or 64 (long). Because byte and short have smaller ranges (-128 through 127 for byte and -32768 through 32767 for short), the Java virtual machine (JVM) was designed with limited support for these types (which saved a few instructions).

The JVM provides various int-only instructions (e.g., iadd, isub, and imul). Similarly, the JVM provides various long-only instructions (e.g., ladd, ldiv, and lneg). In contrast, byte and short don't merit similar instructions.
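The Java language mirrors this at the source level: adding two byte values yields an int, so the result must be cast back to byte. The following sketch is my own illustration (PromotionDemo and its methods are hypothetical names):

```java
// Hypothetical demo class; not part of the article's listings.
class PromotionDemo
{
   // x + y has type int, so "return x + y;" would not compile here.
   static byte add(byte x, byte y)
   {
      return (byte) (x + y);
   }

   // Boxing the sum reveals its compile-time type: an int is boxed
   // to an Integer, not to a Byte.
   static boolean sumIsInt(byte x, byte y)
   {
      Object sum = x + y;
      return sum instanceof Integer;
   }

   public static void main(String[] args)
   {
      byte b1 = 100;
      byte b2 = 100;
      System.out.println(b1 + b2);          // 200: an int result
      System.out.println(add(b1, b2));      // -56: 200 doesn't fit in a byte
      System.out.println(sumIsInt(b1, b2)); // true
   }
}
```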

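The JVM does provide the following instructions to support byte and short values on the operand stack (array elements are additionally served by baload, bastore, saload, and sastore):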

  • bipush: Sign-extend 8-bit byte integer operand to 32-bit integer and push the result onto the operand stack.
  • i2b: Pop the 32-bit integer from the top of the operand stack, truncate this value to an 8-bit byte integer, sign-extend the result to a 32-bit integer, and push the result onto the operand stack.
  • i2s: Pop the 32-bit integer from the top of the operand stack, truncate this value to a 16-bit short integer, sign-extend the result to a 32-bit integer, and push the result onto the operand stack.
  • sipush: Sign-extend 16-bit short integer operand to 32-bit integer and push the result onto the operand stack.
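The truncate-then-sign-extend behavior described for i2b and i2s is what you observe when casting an int to byte or short. The sketch below is my own illustration (NarrowingDemo and its methods are hypothetical names). I'm assuming that javac compiles these casts to i2b and i2s, which you can confirm with javap:

```java
// Hypothetical demo class; not part of the article's listings.
class NarrowingDemo
{
   // Keeps the low 8 bits and sign-extends them (typically i2b).
   static byte toByte(int value)
   {
      return (byte) value;
   }

   // Keeps the low 16 bits and sign-extends them (typically i2s).
   static short toShort(int value)
   {
      return (short) value;
   }

   public static void main(String[] args)
   {
      System.out.println(toByte(27));     // 27: already in range
      System.out.println(toByte(200));    // -56: 200 - 256
      System.out.println(toByte(-129));   // 127: low 8 bits are 0x7F
      System.out.println(toShort(299));   // 299: already in range
      System.out.println(toShort(70000)); // 4464: 70000 - 65536
      System.out.println(toShort(32768)); // -32768: low 16 bits are 0x8000
   }
}
```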

The Java language reflects this second-class support for byte and short by not supporting byte or short integer literals. An integer literal is either of type int (with no suffix) or of type long (with the l or L suffix). However, the language does provide one convenience: when assigning an int literal to a byte or a short variable, you don't have to specify a cast operator when the literal's value lies in the range -128 through 127 (byte) or -32768 through 32767 (short). For example, you can specify byte b = 27; instead of having to specify byte b = (byte) 27;. Similarly, you can specify short s = 299; instead of having to specify short s = (short) 299;.
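The sketch below shows where this convenience applies and where it stops. It is my own illustration (ConstantNarrowing, LIMIT, and the method names are hypothetical). The lines that would not compile are left as comments:

```java
// Hypothetical demo class; not part of the article's listings.
class ConstantNarrowing
{
   static final int LIMIT = 100; // a compile-time constant of type int

   // The convenience also covers constant expressions, not just literals.
   static byte fromConstant()
   {
      byte c = LIMIT;
      return c;
   }

   // A non-constant int needs an explicit cast.
   static byte fromVariable(int i)
   {
      byte e = (byte) i;
      return e;
   }

   public static void main(String[] args)
   {
      byte b = 27;   // in byte range: no cast needed
      short s = 299; // in short range: no cast needed
      // byte f = 128; // would not compile: out of byte range
      // byte g = 27L; // would not compile: long literal

      System.out.println(b);                // 27
      System.out.println(s);                // 299
      System.out.println(fromConstant());   // 100
      System.out.println(fromVariable(27)); // 27
   }
}
```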

It's easier to understand this second-class citizen business when you examine the bytecode of a simple application. Consider Listing 3.

Listing 3. BytesAndShorts.java (version 1)

public class BytesAndShorts
{
   public static void main(String[] args)
   {
      byte b = 27;
      short s = 299;
   }
}

Assuming that you've compiled this listing to BytesAndShorts.class, execute the following command to obtain a disassembly:

javap -v BytesAndShorts

The following is that portion of the disassembly that's relevant to the main() method:

public static void main(java.lang.String[]);
  descriptor: ([Ljava/lang/String;)V
  flags: (0x0009) ACC_PUBLIC, ACC_STATIC
  Code:
    stack=1, locals=3, args_size=1
       0: bipush        27
       2: istore_1
       3: sipush        299
       6: istore_2
       7: return

There are three local variables: 0 (args), 1 (b), and 2 (s).

At the source code level, 27 is a 32-bit integer literal. For efficiency, 27 is stored as an 8-bit byte following the operation code (opcode) for the bipush instruction. As stated earlier, this instruction sign-extends this 8-bit value to a 32-bit value that's stored on the operand stack. This value will be popped off the stack and stored in local variable 1 (via istore_1). Recall that 1 refers to b in the source code.

Here is something interesting: the istore_1 instruction reveals that byte variable b is really of type int at the JVM level. After all, the istore instructions store 32-bit values.

Continuing with the disassembly, sipush 299 sign-extends 299 to a 32-bit value that's stored on the operand stack, and the subsequent istore_2 instruction stores this 32-bit value in int variable s.

It appears that the JVM does not recognize byte or short variables, but treats them as if they are of type int. Listing 4 presents an application that probes deeper into this situation.

Listing 4. BytesAndShorts.java (version 2)

public class BytesAndShorts
{
   public static void main(String[] args)
   {
      int i = 35;
      byte b = (byte) i;
      short s = (byte) i;
   }
}

Assuming that you've compiled this listing to BytesAndShorts.class, execute the following command to obtain a disassembly:

javap -v BytesAndShorts

The following is that portion of the disassembly that's relevant to the main() method:

public static void main(java.lang.String[]);
  descriptor: ([Ljava/lang/String;)V
  flags: (0x0009) ACC_PUBLIC, ACC_STATIC
  Code:
    stack=1, locals=4, args_size=1
       0: bipush        35
       2: istore_1
       3: iload_1
       4: i2b
       5: istore_2
       6: iload_1
       7: i2b
       8: i2s
       9: istore_3
      10: return

There are four local variables: 0 (args), 1 (i), 2 (b), and 3 (s).

The first two instructions convert 35 to a 32-bit integer and store it in int variable i. There are no surprises here. In contrast, the next three instructions retrieve this value, convert it to a byte (via i2b), and store the result in "int" variable b. Even though the JVM doesn't regard b as being of type byte, it still treats this variable as if it were a byte: i2b ensures that the 32-bit integer value won't lie outside the range -128 through 127.

The instruction sequence from offset 6 through offset 9 is interesting. I could have specified short s = (short) i; instead of short s = (byte) i; in the source code, but chose to deviate in order to see what happens at the JVM level. The i2b instruction at offset 7 first converts the 32-bit integer value stored in i to an 8-bit byte. The subsequent i2s instruction converts this result to a 16-bit short integer, which is then sign-extended to a 32-bit integer in preparation for being stored in s via istore_3. The bytecode sequence for short s = (byte) i; ensures that the value stored in "int" variable s doesn't lie outside the range -32768 through 32767 (and shows that you should avoid useless casts).
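For i = 35, the extra (byte) cast changes nothing, but that's only because 35 fits in a byte. The sketch below is my own illustration (DoubleCast and its methods are hypothetical names). It compares short s = (byte) i; with short s = (short) i; for a value outside the byte range:

```java
// Hypothetical demo class; not part of the article's listings.
class DoubleCast
{
   // Mirrors Listing 4: narrow to byte, then widen to short.
   static short viaByte(int i)
   {
      short s = (byte) i;
      return s;
   }

   // The cast the declaration actually calls for.
   static short viaShort(int i)
   {
      short s = (short) i;
      return s;
   }

   public static void main(String[] args)
   {
      System.out.println(viaByte(35));   // 35
      System.out.println(viaShort(35));  // 35
      System.out.println(viaByte(200));  // -56: the byte cast discards bits
      System.out.println(viaShort(200)); // 200
   }
}
```

So the mismatched cast is not merely redundant bytecode: for inputs above 127 or below -128 it silently produces a different value.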

Private fields and methods are accessible without reflection

Under certain circumstances, you can access an object's private field or call its private method without having to use Java's Reflection API. Consider Listing 5.

Listing 5. PrivateAccess.java
