Attack of the clones

Time and space considerations in four different approaches to implementing deep clone() methods

1 2 3 Page 2
Page 2 of 3

Approach 2 is more error prone than Approach 1 because it is easy to forget to override clone() and accidentally inherit a superclass's version that will return an object of the wrong type. If you make the same mistake in Approach 1, the result will be less disastrous. Additionally, it is harder to maintain the implementation in Approach 2 when fields are added and removed (compare the OBJECT_CLONE branch in TestBaseClass.clone() with similar code in the copy constructor). Also, Approach 1 requires less class cooperation in some cases: for a base class with only shallow fields, you don't need to implement Cloneable or even provide a clone() override if you do not intend to clone at the base class level.

However, an undeniable advantage of cloning via copy construction is that it can handle both final fields and inner classes. But due to dangers present when inheritance is involved, I recommend using this sparingly and preferably simultaneously with making the relevant classes final.

Approach 3: Cloning via Java serialization

Java serialization is convenient. Many classes are made serializable by simply declaring them to implement java.io.Serializable. Thus, a whole hierarchy of classes can be made cloneable by deriving them from a base Serializable class whose clone() is implemented as a simple, yet extremely generic method:

    public Object clone (Object obj)
    {
        try
        {
            ByteArrayOutputStream out = new ByteArrayOutputStream ();
            ObjectOutputStream oout = new ObjectOutputStream (out);
            oout.writeObject (obj);
            
            ObjectInputStream in = new ObjectInputStream (
                new ByteArrayInputStream (out.toByteArray ()));
            return in.readObject ();
        }
        catch (Exception e)
        {
            throw new RuntimeException ("cannot clone class [" +
                obj.getClass ().getName () + "] via serialization: " +
                e.toString ());
        }
    }

This is so generic it can be used for cloning classes that can be written and added to your application by someone else long after you provide the base classes. But this convenience comes at a price. After switching TestBaseClass.clone() and TestClass.clone() to the SERIALIZATION branch I get:

clone implementation: serialization
method duration: 2.724 ms

This is roughly 100 times slower than Approaches 1 and 2. You probably would not want this option for defensive cloning of parameters of otherwise fast intra-JVM methods. Even though this method can be used for generic containers with deep cloning semantics, cloning a few hundred objects would make you see times in the one-second range: a doubtful prospect.

There are several reasons why this approach is so slow. Serialization depends on reflective discovery of class metadata, known to be much slower than normal method calls. Furthermore, because a temporary input/output (I/0) stream is used to flatten the entire object, the process involves UTF (Universal Transformation Format) 8-encoding and writing out every character of, say, TestBaseClass.m_string. Compared to that, Approaches 1 and 2 only copy String references; each copy step has the same small fixed cost.

What's even worse, ObjectOutputStream and ObjectInputStream perform a lot of unnecessary work. For example, writing out class metadata (class name, field names, metadata checksum, etc.) that may need to be reconciled with a different class version on the receiving end is pure overhead when you serialize a class within the same ClassLoader namespace.

On the plus side, serialization imposes fairly light constructor requirements (the first non-Serializable superclass must have an accessible no-arg constructor) and correctly handles final fields and inner classes. This is because native code constructs the clone and populates its fields without using any constructors (something that can't be done in pure Java).

One more interesting advantage of Approach 3 is that it can preserve the structure of object graph rooted at the source object. Examine the dummy TestBaseClass constructor. It fills the entire m_strings array with the same m_string reference. Without any special effort on our part, the invariant m_strings[0] == m_string is preserved in the cloned object. In Approaches 1 and 2, the same effect is either purely incidental (such as when immutable objects remain shared by reference) or requires explicit coding (as with m_object1 and m_object2 in TestClass). The latter could be hard to get right in general, especially when object identities are established at runtime and not compile time (as is the case with TestClass).

Approach 4: Cloning via Java reflection

Approach 4 draws inspiration from Approach 3. Anything that uses reflection can work on a variety of classes in a generic way. If I require the class in question to have a (not necessarily public) no-arg constructor, I can easily create an empty instance using reflection. It is especially efficient when the no-arg constructor doesn't do anything. Then it is a straightforward matter to walk the class's inheritance chain all the way to Object.class and set all (not just public) declared instance fields for each superclass in the chain. For each field, I check whether it contains a primitive value, an immutable object reference, or an object reference that needs to be cloned recursively. The idea is straightforward but getting it to work well requires handling a few details. My full demo implementation is in class ReflectiveClone, available as a separate download. Here is the pseudo-code of the full implementation, with some details and all error handling omitted for simplicity:

public abstract class ReflectiveClone
{
    /**
     * Makes a  reflection-based deep clone of 'obj'. This method is mutually
     * recursive with {@link #setFields}.
     * 
     * @param obj current source object being cloned
     * @return obj's deep clone [never null; can be == to 'obj']
     */
    public static Object clone (final Object obj)
    {        
        final Class objClass = obj.getClass ();
        final Object result;
                
        if (objClass.isArray ())
        {           
            final int arrayLength = Array.getLength (obj);
            
            if (arrayLength == 0) // empty arrays are immutable
                return obj;
            else
            {                      
                final Class componentType = objClass.getComponentType ();
                
                // Even though arrays implicitly have a public clone(), it
                // cannot be invoked reflectively, so need to do copy construction:
                
                result = Array.newInstance (componentType, arrayLength);
                
                if (componentType.isPrimitive () ||
                    FINAL_IMMUTABLE_CLASSES.contains (componentType))
                {
                    System.arraycopy (obj, 0, result, 0, arrayLength);
                }
                else
                {
                    for (int i = 0; i < arrayLength; ++ i)
                    {
                        // Recursively clone each array slot:
                        final Object slot = Array.get (obj, i);
                        if (slot != null)
                        {
                            final Object slotClone = clone (slot);
                            Array.set (result, i, slotClone);
                        }
                    }
                }
                
                return result;
            }
        }
        else if (FINAL_IMMUTABLE_CLASSES.contains (objClass))
        {
            return obj;
        }
        
        // Fall through to reflectively populating an instance created
        // via a no-arg constructor:
        // clone = objClass.newInstance () can't handle private constructors:
            
        Constructor noarg = objClass.getDeclaredConstructor (EMPTY_CLASS_ARRAY);
        if ((Modifier.PUBLIC & noarg.getModifiers ()) == 0)
        {
            noarg.setAccessible (true);
        }
        result = noarg.newInstance (EMPTY_OBJECT_ARRAY);
        
        for (Class c = objClass; c != Object.class; c = c.getSuperclass ())
        {
            setFields (obj, result, c.getDeclaredFields ());
        }
        
        return result;
    }    
    /**
     * This method copies all declared 'fields' from 'src' to 'dest'.
     * 
     * @param src source object
     * @param dest src's clone [not fully populated yet]
     * @param fields fields to be populated
     */
    private static void setFields (final Object src, final Object dest,
                                   final Field [] fields)
    {
        for (int f = 0, fieldsLength = fields.length; f < fieldsLength; ++ f)
        {            
            final Field field = fields [f];
            final int modifiers = field.getModifiers ();
            
            if ((Modifier.STATIC & modifiers) != 0) continue;
            
            // Can also skip transient fields here if you want reflective
            // cloning to be more like serialization.
            
            if ((Modifier.FINAL & modifiers) != 0)
                throw new RuntimeException ("cannot set final field" +
                field.getName () + " of class " + src.getClass ().getName ());
            
            if ((Modifier.PUBLIC & modifiers) == 0) field.setAccessible (true);
            
            Object value = field.get (src);
            
            if (value == null)
                field.set (dest, null);
            else
            {
                final Class valueType = value.getClass ();
                
                if (! valueType.isPrimitive () &&
                    ! FINAL_IMMUTABLE_CLASSES.contains (valueType))
                {
                    // Value is an object reference, and it could be either an
                    // array or of some mutable type: try to clone it deeply
                    // to be on the safe side.
                        
                    value = clone (value);
                }
                
                field.set (dest, value);
            }
        }
    }
 
    private static final Set FINAL_IMMUTABLE_CLASSES; // Set in 
    private static final Object [] EMPTY_OBJECT_ARRAY = new Object [0];
    private static final Class [] EMPTY_CLASS_ARRAY = new Class [0];
    
    static
    {
        FINAL_IMMUTABLE_CLASSES = new HashSet (17);
        
        // Add some common final/immutable classes:
        FINAL_IMMUTABLE_CLASSES.add (String.class);
        FINAL_IMMUTABLE_CLASSES.add (Byte.class);
        ...
        FINAL_IMMUTABLE_CLASSES.add (Boolean.class);
    }
} // End of class

Note the use of java.lang.reflect.AccessibleObject.setAccessible() to gain access to nonpublic fields. Of course, this requires sufficient security privileges.

Since the introduction of JDK 1.3, setting final fields via reflection is no longer possible (see Note 1 in Resources); so, this approach resembles Approach 1 because it can't handle final fields. Note also that inner classes cannot have no-arg constructors by definition (see Note 2 in Resources), so this approach will not work for them either.

Coupled with the no-arg constructor requirement, this approach restricts the type of classes it can handle. But you would be surprised how far it can go. The full implementation adds a few useful features. While traversing the object graph rooted at the source object, it keeps an internal objMap parameter that maps values in source object graphs to their respective clones in the cloned graphs. This restores the ability to preserve object graphs that I had in Approach 3. Also, the metadataMap parameter caches class metadata for all classes that it encounters while cloning an object and improves performance by avoiding slow reflection. The relevant data structures are scoped to a single call to clone(), and the overall idea is very similar to Java serialization revamped to just do object cloning. Similar to the previous section, a whole hierarchy of suitable classes can be made cloneable by equipping the base class with one generic method:

    public Object clone ()
    {
        return ReflectiveClone.clone (this);
    }

What is this method's performance? Rerunning the test with the REFLECTION branch selected produces:

clone implementation: reflection
method duration: 0.537 ms

This is roughly five times faster than straightforward serialization—not too bad for another generic approach. In terms of its performance and capabilities, it represents a compromise between the other three solutions. It can work very well for JavaBean-like classes and other types that usually do not have final fields.

Resource considerations

Measuring memory overhead is more difficult than measuring performance. It should be obvious that the first two approaches shine in this area, as they instantiate only the data that will populate the cloned fields.

Cloning via serialization has an extra drawback that may have escaped your attention above. Even though serializing an object preserves the structure of the object graph rooted at that instance, immutable values will get duplicated across disjoint calls to clone(). As an example, you can verify for yourself that

    TestClass obj = new TestClass ("dummy");
    System.out.println (obj.m_string == ((TestClass) obj.clone ()).m_string);

will print false for Approach 3

only

. Thus, cloning via serialization will have a tendency to pollute heap with redundant copies of immutable objects like

String

s. Approaches 1 and 2 are completely free from this problem, and Approach 3 is mostly free from it.

A quick and dirty proof of these observations can be seen by changing the body of Main.main() to keep the clones in memory and track the object count when a given heap size is reached:

1 2 3 Page 2
Page 2 of 3