Finding the most efficient Java ObjectOutput method for strings

Coding distributed services and apps often calls for marshalling Java objects into a binary form that can be streamed over the network. Infinispan, for example, requires objects to be serialised so they can be multicast to the nodes of the data grid.

The standard Java serialisation implementation writes a lot of class metadata alongside the object data so that it can automatically reconstruct an object upon deserialisation. Devising your own custom serialiser can save a lot of bytes, as the following snippet demonstrates:

// Requires: import java.io.*; import java.util.Date;

// Serialise a date object
Date now = new Date();

// As object
ByteArrayOutputStream bout = new ByteArrayOutputStream();
ObjectOutput oout = new ObjectOutputStream(bout);
oout.writeObject(now);
oout.flush();
System.out.println("Date writeObject length: " + bout.toByteArray().length);

// As long representation
bout = new ByteArrayOutputStream();
oout = new ObjectOutputStream(bout);
oout.writeLong(now.getTime());
oout.flush();
System.out.println("Date writeLong length: " + bout.toByteArray().length);

Running the above code produces the following result:

Date writeObject length: 46
Date writeLong length: 14

Converting the Date object to its long representation shrinks the output from 46 to 14 bytes, a more than three-fold saving. For every 1 million Date objects transmitted this way, that means a difference of 32 Mbytes.
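Not all of those 14 bytes are payload. ObjectOutputStream writes a 4-byte stream header when it is created, and it wraps primitive data in a block-data record that carries a 2-byte header of its own, so the long itself accounts for only 8 bytes. A minimal sketch that makes the overhead visible by writing the same value through a plain DataOutputStream (the class name StreamOverhead and its helper methods are made up for this illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class StreamOverhead {

    // Number of bytes produced when a long is written through an ObjectOutputStream
    static int objectStreamLength(long value) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream oout = new ObjectOutputStream(bout)) {
            oout.writeLong(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.size();
    }

    // Number of bytes produced when the same long goes through a plain DataOutputStream
    static int dataStreamLength(long value) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream dout = new DataOutputStream(bout)) {
            dout.writeLong(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.size();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("ObjectOutputStream writeLong length: " + objectStreamLength(now));
        System.out.println("DataOutputStream writeLong length: " + dataStreamLength(now));
    }
}
```

The stream header is written once per stream, so a framework that keeps one stream open for many values does not pay it again for each value.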

The conclusion is clear: custom serialisers make sense. They can reduce network traffic and speed up the overall responsiveness of your distributed service or app.
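To make the idea concrete, here is a rough sketch of what a hand-written serialiser might look like. It is written against the plain java.io.DataOutput and DataInput interfaces rather than any particular framework's API, and the LoginEvent class, its fields and its helper methods are hypothetical names invented for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Date;

// Hypothetical value class, used only for this illustration
class LoginEvent {
    final String user;
    final Date time;

    LoginEvent(String user, Date time) {
        this.user = user;
        this.time = time;
    }

    // Write only what is needed to rebuild the object: the name and the epoch millis
    void writeTo(DataOutput out) throws IOException {
        out.writeUTF(user);
        out.writeLong(time.getTime());
    }

    // Read the fields back in exactly the order they were written
    static LoginEvent readFrom(DataInput in) throws IOException {
        String user = in.readUTF();
        long millis = in.readLong();
        return new LoginEvent(user, new Date(millis));
    }

    // Demo helper: marshal an event into a byte array
    static byte[] toBytes(LoginEvent event) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream dout = new DataOutputStream(bout)) {
            event.writeTo(dout);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }

    // Demo helper: rebuild an event from a byte array
    static LoginEvent fromBytes(byte[] data) {
        try (DataInputStream din = new DataInputStream(new ByteArrayInputStream(data))) {
            return readFrom(din);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LoginEvent original = new LoginEvent("alice", new Date());
        byte[] data = toBytes(original);
        LoginEvent copy = fromBytes(data);
        System.out.println("Custom form length: " + data.length);
        System.out.println("Round trip ok: "
                + (copy.user.equals(original.user) && copy.time.equals(original.time)));
    }
}
```

For the five-character name used in main the custom form comes to 15 bytes: a 2-byte length prefix, 5 bytes of text and 8 bytes for the timestamp. The price of the saving is that reader and writer must agree on the field order by hand.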

Developers are typically given the standard DataOutput interface to implement their custom serialisers (ObjectOutput, used in the snippets here, extends it). This is true for Infinispan, JBoss Marshalling, and probably other frameworks too. Use of the DataOutput interface is straightforward: you just have to pick the correct method for the data type that you wish to marshal. For serialising strings, however, there are three methods provided:

  • DataOutput.writeUTF(String)
  • DataOutput.writeBytes(String)
  • DataOutput.writeChars(String)

Let’s find out which one is the most efficient:

String s = "The quick brown fox jumps over the lazy dog";

ByteArrayOutputStream bout = new ByteArrayOutputStream();
ObjectOutput oout = new ObjectOutputStream(bout);
oout.writeUTF(s);
oout.flush();
System.out.println("writeUTF length: " + bout.toByteArray().length);

bout = new ByteArrayOutputStream();
oout = new ObjectOutputStream(bout);
oout.writeBytes(s);
oout.flush();
System.out.println("writeBytes length: " + bout.toByteArray().length);

bout = new ByteArrayOutputStream();
oout = new ObjectOutputStream(bout);
oout.writeChars(s);
oout.flush();
System.out.println("writeChars length: " + bout.toByteArray().length);

bout = new ByteArrayOutputStream();
oout = new ObjectOutputStream(bout);
oout.writeObject(s);
oout.flush();
System.out.println("writeObject length: " + bout.toByteArray().length);

Running the above produces the following:

writeUTF length: 51
writeBytes length: 49
writeChars length: 92
writeObject length: 50

The writeUTF and writeBytes methods output similar byte lengths. Surprisingly, so does writeObject: Java serialises strings in an optimised way, as a one-byte type tag followed by the same length-prefixed modified UTF-8 that writeUTF produces. writeChars, however, writes two bytes per character and so produces output that is almost twice as long.
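The figures above come from an ASCII-only sentence, which is the best case for writeUTF. Its modified UTF-8 encoding spends one byte on each ASCII character but three bytes on most non-Latin characters, on top of a 2-byte length prefix, while writeChars always spends two bytes per char. A small sketch to compare the two on different kinds of text (StringSizes and its methods are names made up for this example, and it uses a plain DataOutputStream so the ObjectOutputStream overhead does not blur the numbers):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class StringSizes {

    // Bytes written by writeUTF: 2-byte length prefix plus modified UTF-8
    static int utfLength(String s) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream dout = new DataOutputStream(bout)) {
            dout.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.size();
    }

    // Bytes written by writeChars: two bytes per char, no length prefix
    static int charsLength(String s) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream dout = new DataOutputStream(bout)) {
            dout.writeChars(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.size();
    }

    public static void main(String[] args) {
        String ascii = "The quick brown fox jumps over the lazy dog";
        String cjk = "\u65E5\u672C\u8A9E"; // three CJK characters
        System.out.println("ASCII writeUTF / writeChars: " + utfLength(ascii) + " / " + charsLength(ascii));
        System.out.println("CJK   writeUTF / writeChars: " + utfLength(cjk) + " / " + charsLength(cjk));
    }
}
```

So the ranking depends on the data: for text that is mostly CJK, writeChars can come out smaller than writeUTF. Note also that writeUTF rejects strings whose encoded form exceeds 65535 bytes.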

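The conclusion is that for mostly-ASCII strings writeUTF, writeBytes and writeObject are all about equally compact, so size alone does not separate them; just steer clear of writeChars. Bear in mind that writeBytes keeps only the low-order byte of each character, which makes writeUTF the safer default when strings may contain anything beyond Latin-1.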
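Size is not the only thing to check before picking a method. writeBytes writes just the low-order byte of each char and stores no length, so characters outside Latin-1 do not survive the trip and the reader has to know how many bytes to expect. writeUTF pairs with DataInput.readUTF and preserves the string. A quick sketch of the difference (WriteBytesLoss and its methods are invented for this demonstration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class WriteBytesLoss {

    // Write with writeBytes, then rebuild the string by mapping each byte back to one char
    static String roundTripWriteBytes(String s) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream dout = new DataOutputStream(bout)) {
            dout.writeBytes(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // No length was stored, so this demo simply consumes the whole array
        return new String(bout.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Write with writeUTF and read back with the matching readUTF
    static String roundTripWriteUTF(String s) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream dout = new DataOutputStream(bout)) {
            dout.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (DataInputStream din = new DataInputStream(new ByteArrayInputStream(bout.toByteArray()))) {
            return din.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String price = "100 \u20AC"; // "100" followed by a euro sign
        System.out.println("writeBytes preserved: " + price.equals(roundTripWriteBytes(price)));
        System.out.println("writeUTF preserved:   " + price.equals(roundTripWriteUTF(price)));
    }
}
```

For pure ASCII input both round trips return the original string; the euro sign is where writeBytes falls over.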

Happy coding! 🙂