I’m going to do a series of blog posts on some of the very basic issues in file trading—what needs to be done before you submit an ONIX file (or an e-book, if your e-book is in XML). In doing this I’m hoping that publishers will comment on software they like (and don’t), problems they have—and with any luck their successes.
So, for the first post: Why XML?
Any discussion of file exchange has to start with why XML works, and it works because of its underlying assumptions and the software that supports them. The main assumption is that all the characters, line returns, visible and hidden content—all of it—can be recognized by whoever receives the file. XML software tests for this, and it’s so important that the information normally appears in the first line of an XML file as an encoding statement, right after you identify that this is an XML document:
<?xml version="1.0" encoding="utf-8"?>
or
<?xml version="1.0" encoding="iso-8859-1"?>
Think about that for a moment: how obvious is that, and how could it be otherwise? And then think about just how unlikely it is to be true of a publisher’s ONIX file, built up over long periods of time through cut and paste from who knows what source documents. You don’t really know where all the millions of characters in your ONIX file came from, do you? And that’s why trading delimited files or database files doesn’t work: none of those formats tests the incoming data. But XML software does, and it won’t work with anything less than “well encoded” data.
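To make that concrete, here is a rough sketch in Python (my illustration, not anything from an ONIX toolkit): two tiny “files” that both claim to be UTF-8, but the second contains curly quotes pasted in as Windows-1252 bytes. The parser accepts the first and refuses the second instead of quietly passing the bad bytes along.

import xml.etree.ElementTree as ET

good = b'<?xml version="1.0" encoding="utf-8"?><Title>Plain title</Title>'
bad = b'<?xml version="1.0" encoding="utf-8"?><Title>\x93Curly\x94 quotes</Title>'

ET.fromstring(good)   # parses without complaint

try:
    ET.fromstring(bad)
except ET.ParseError as err:
    # expat reports something like "not well-formed (invalid token)"
    print("Rejected:", err)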
Publishers can think of it this way: You’ve probably heard of or published a book where an “incompetent freelance designer didn’t use the right font” (or used “outdated software,” or provided “bad thingies”) and the files screwed up when they went to the printer. And your production manager “fixed that file” with a lot of overtime and foul language. That’s an encoding problem: what you sent to someone else didn’t come out the way you intended. If you were trading files in XML and did it right, that wouldn’t happen. All sorts of other things might—but not that.
The trick to the encoding statement is that it doesn’t really matter where the characters came from—it’s not about your ability to answer the Zen koan “What is the encoding of the letter you’re typing now?” What matters is what happens when someone else loads the file. Does their software recognize all the characters? You may have software designed to create an ONIX file, but does it monitor what’s going into it? Does it stop you, with an error message, from loading dashes from Word 97 or WP5.1? Does it ask what you want the output encoding to be and prevent anything else from going in? It would be surprising if it did.
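For what it’s worth, here is the kind of check I mean, again sketched in Python (the field text is made up): if your file declares iso-8859-1, an em dash pasted in from Word simply cannot be written in that encoding, and good software would say so before the file goes anywhere.

title = "The Long Dash \u2014 A Memoir"   # em dash pasted in from a Word document

try:
    title.encode("iso-8859-1")            # the encoding your ONIX file claims to use
except UnicodeEncodeError as err:
    print("Can't write this field as iso-8859-1:", err)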
So the first rule of data exchange is that you must test the ONIX output every time you create it. You test your data with XML software before you send it. The XML standard demands it. The ONIX standard depends on it.
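If you want a sense of how little that testing takes, here is a bare-bones sketch in Python (“onix_output.xml” is just a placeholder filename). It only checks that the file is well-formed and that the bytes match the declared encoding; it says nothing about whether the ONIX itself is valid. But it catches the problems this post is about before your trading partner does.

import sys
import xml.etree.ElementTree as ET

def check_onix(path):
    # Parsing the whole file checks well-formedness and encoding in one call.
    try:
        ET.parse(path)
    except ET.ParseError as err:
        print(path, "FAILED:", err)
        return False
    print(path, "is well-formed")
    return True

if __name__ == "__main__":
    ok = check_onix(sys.argv[1] if len(sys.argv) > 1 else "onix_output.xml")
    sys.exit(0 if ok else 1)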
That’s why XML works. The XML standard and the software are designed to enforce things like this. You may think you can trade data using Excel or delimited formats, but none of those will do a good job of ensuring that what you send can be read at the other end. XML does (somewhat—don’t think it’ll be perfect), and that’s the main reason it’s better for data transfer.