While incorporating XML into your distributed applications, you may need to transfer binary data as part of an XML document. For example, you may need to pass the client binary images embedded within an XML document that also carries other data elements. Simply embedding the raw byte values in the XML document won't work, both because of the XML specification's valid-character restriction and because of the character encoding and decoding the document undergoes as it travels from its source to its parsing destination.
According to the XML 1.0 specification, valid character values include the following ranges of hexadecimal values: 0x9, 0xA, 0xD, 0x20-0xD7FF, 0xE000-0xFFFD, and 0x10000-0x10FFFF. The specification also uses the character definition specified by the ISO/IEC 10646 standard and requires that all conforming XML processors "...accept the UTF-8 and UTF-16 encodings of 10646."
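To make the valid-character restriction concrete, here is a minimal sketch that tests a code point against the ranges listed above. The class and method names (XmlCharCheck, isXmlChar, countIllegalByteValues) are my own illustrative choices, not part of any XML API.

```java
// Sketch only: checks a Unicode code point against the XML 1.0 character ranges.
// The names XmlCharCheck, isXmlChar and countIllegalByteValues are hypothetical.
final class XmlCharCheck {

    // True if the code point falls in one of the ranges the specification allows.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Counts how many of the 256 byte values would be illegal
    // if each byte were copied into the document as a character.
    static int countIllegalByteValues() {
        int illegal = 0;
        for (int b = 0; b <= 0xFF; b++) {
            if (!isXmlChar(b)) {
                illegal++;
            }
        }
        return illegal;
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar(0x41));          // 'A' is a valid character
        System.out.println(isXmlChar(0x00));          // NUL is not
        System.out.println(countIllegalByteValues()); // control codes other than tab, LF, CR
    }
}
```

Even before any character encoding comes into play, 29 of the 256 possible byte values (the control codes below 0x20, except tab, line feed, and carriage return) fall outside the permitted ranges, so arbitrary binary data is very likely to contain at least one of them.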
For readers not familiar with ISO/IEC 10646, the standard was first published in 1993 by the International Organization for Standardization (ISO). Its objective is to specify a binary encoding for the characters used in every written language. To provide compatibility between the multilingual encodings and the many existing software applications that use the ASCII standard, the ISO has defined several transformations, including the UTF-8 and UTF-16 encodings. For more information about the ISO/IEC 10646 standard and UTF encodings, see the Resources section.
What does all this have to do with the problem at hand? Well, if you embed the raw binary data inside an element of the XML document, the receiving XML processor attempts to interpret the byte sequence as UTF-8 or UTF-16 encoded text. The parser will most likely encounter invalid sequences and fail.
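The failure is easy to reproduce outside a parser. The sketch below, which assumes a Java 7 or later runtime, asks the standard java.nio UTF-8 decoder to reject malformed input instead of silently replacing it. The class name Utf8Probe and the method isValidUtf8 are hypothetical names chosen for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch only: Utf8Probe and isValidUtf8 are illustrative names.
final class Utf8Probe {

    // Returns true only if the whole array decodes as well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The first four bytes of a typical JPEG file; 0xFF never occurs in UTF-8.
        byte[] imageBytes = { (byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0 };
        byte[] textBytes = "hello".getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(imageBytes)); // false
        System.out.println(isValidUtf8(textBytes));  // true
    }
}
```

A conforming XML processor reading the document as UTF-8 faces the same malformed sequences, which is why raw image bytes cannot simply be dropped between two tags.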
This implies that you must encode your own binary data into the valid character set before embedding it into the XML document. Obviously, you then have to decode the data on the receiving side. In the rest of this tip, I describe three different approaches for encoding binary data before embedding it into an XML document.
The direct approach to solving this encoding problem converts each byte of binary data into its two-character hexadecimal representation.
That way, you encode each of the 256 possible byte values as two characters from the character set 0-9, a-f:
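// readFile and padHexString are helper methods not shown here;
// padHexString is expected to left-pad a one-digit hex string with "0".
byte[] buffer = readFile(filename);
int readBytes = buffer.length;
StringBuffer hexData = new StringBuffer();
for (int i = 0; i < readBytes; i++) {
    // Mask with 0xff so negative byte values map to 0-255 before conversion
    hexData.append(padHexString(Integer.toHexString(0xff & buffer[i])));
}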
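For completeness, here is a self-contained sketch of the same hexadecimal encoding together with the matching decoding step needed on the receiving side. The padHexString shown here is my guess at what such a helper would do (left-pad a single hex digit with a zero), and the class HexRoundTrip and its encode and decode methods are hypothetical names, so treat this as one possible arrangement rather than the exact code behind the measurements in this tip.

```java
// Sketch only: HexRoundTrip, encode and decode are illustrative names, and
// this padHexString is an assumed implementation of the helper used earlier.
final class HexRoundTrip {

    // Assumed behavior: left-pad a single hex digit so every byte yields two characters.
    static String padHexString(String hex) {
        return hex.length() == 1 ? "0" + hex : hex;
    }

    // Same loop as the encoding snippet, wrapped in a method.
    static String encode(byte[] buffer) {
        StringBuffer hexData = new StringBuffer(buffer.length * 2);
        for (int i = 0; i < buffer.length; i++) {
            hexData.append(padHexString(Integer.toHexString(0xff & buffer[i])));
        }
        return hexData.toString();
    }

    // Receiving side: turn each pair of hex characters back into one byte.
    static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = { 0x00, 0x0f, (byte) 0xff };
        String hex = encode(original);
        System.out.println(hex); // 000fff
        System.out.println(java.util.Arrays.equals(original, decode(hex))); // true
    }
}
```

Every byte becomes two characters, so the encoded text is twice the size of the original binary data.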
As the encoding loop above illustrates, the conversion is simple enough. Timing the conversion routine on a typical desktop PC (a Pentium III machine running at 800MHz with 256MB of memory) gave my team a conversion rate of 485 KB/sec. Note that we used a StringBuffer rather than plain String concatenation to build the character representation of the binary buffer. We did that to avoid the unnecessary cost of repeatedly creating and then releasing String class instances. If necessary, you could accelerate this conversion using a hexadecimal number lookup table as shown below.
Timing the conversion on the same PC gave my team a conversion rate of 1,920 KB/sec using this approach, about a fourfold increase: