An XML file is simply text, nothing very special, except that for XML software to read and interpret it, everything needs to be just so. XML is a markup language, a simple form of computer code: not as finicky as a proper programming language, but with much stricter rules than HTML.
Every part of the file’s structure, and some aspects of its contents, must match two defining documents. An ONIX file is validated by using XML software to compare it against the rules of the XML standard (www.W3.org), which decide whether the file is “well-formed”, and against the schema written by the ONIX developers at Editeur (www.editeur.org). So ONIX validation is partly generic, applying to every XML document, and partly specific to the ONIX data exchange standard, and a validation error can come from either. Don’t confuse XML validation with the Certification report generated by BiblioShare. Every file accepted into BiblioShare, after it passes the XML validation discussed here, gets a quality assessment that looks for data issues. That is a separate process from XML validation.
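If you’re comfortable with a little scripting, the XML-standard half of that check can be run with Python’s standard library. This is a minimal sketch, not a replacement for a proper validator: it only tests whether the file is well-formed and does not check it against the ONIX schema, which needs a schema-aware tool such as an XML editor (or, for example, the third-party lxml library). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_wellformedness_error(data: bytes):
    """Return None if `data` parses as XML, else (line, column, message).

    This checks only the XML-standard rules (well-formedness). It does NOT
    check the file against the ONIX schema.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # ParseError.position holds the (line, column) where parsing failed.
        line, column = err.position
        return line, column, str(err)
    return None
```

To check a file, read it as bytes, e.g. `find_wellformedness_error(open("whatever.xml", "rb").read())`. Reading bytes lets the parser honour the file’s own encoding declaration.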
You should research and try to understand as much about XML as time, energy and inclination allow. You’ll be happier producing ONIX if you do, and possibly more comfortable with EPUB too. There are good resources on the web; Wikipedia and www.w3schools.com/XML/ are good places to start.
What Gets Validated?
What gets validated is the ONIX file, the file you send to BiblioShare and your other trading partners. To fix validation problems you might correct your original dataset and re-output the ONIX file, or you might edit the file itself by hand, but either way it’s the file (whatever.xml) that we’re working with here. Validation is always the last step: check every XML file before you send it to anyone.
Taking Stock
First off: do you have any XML software, such as an XML editor or a development suite like XML Spy or oXygen? If you’ve inherited this job, check your program list and ask around. It can’t hurt, and you may as well use what you’ve already got or paid for. I’ll be recommending some specific software, one of them free, but there’s nothing special about my choices. You should consider getting more than one validation tool (you can never have enough validation).
You can find software through a web search on “XML editor” or look at the “XML Resources” at O’Reilly’s www.XML.com.
Text Editor
As noted above, an XML file is just text, and XML files can be opened in a text editor. If you’ve got an XML editor you might use that, since it’s designed for the work, but you absolutely can view and edit an ONIX file in a text editor. Your only concern is making sure the editor does not change the file. For example, you can open an XML file in MS-Word, but don’t! Word is set up to “help” you run XSLT transformation scripts and will make any number of assumptions and changes to the file content, none of them good for our purpose of using the XML standard to exchange data. (This warning about software changing files applies to a lot of XML software too. Until you’re sure it doesn’t, assume any software might be making changes to an XML file.)
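One way to find out whether a particular editor changes files is to open a copy, save it without making any edits, and compare it byte for byte with the original. A minimal sketch of that comparison in Python using SHA-256 hashes (the function names are hypothetical):

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hash a file in 1 MB chunks so a large ONIX file needn't fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def editor_changed_file(original: str, resaved: str) -> bool:
    """True if the re-saved copy differs from the original in any byte."""
    return sha256_of(original) != sha256_of(resaved)
```

If `editor_changed_file("whatever.xml", "whatever-resaved.xml")` returns True after a save with no edits, the editor altered something, for instance line endings or the character encoding.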
What you want is as simple a text editor as possible: Notepad or WordPad on a PC, TextEdit or SimpleText on a Mac. Make gentle changes to the text you can see, and save it without rendering the document unreadable to XML software. Really, that just means typing or cutting and pasting text, and exiting using the most straightforward options. If you’re forced to choose a format when saving, first try the one labeled “ASCII US” or “ASCII text”.
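Saving as ASCII is only safe if the file contains nothing but ASCII characters; accented letters, curly quotes and the like could be lost or changed. Before saving that way it’s worth listing any characters outside ASCII. A minimal sketch, assuming the file is UTF-8 encoded (a common default for XML, but check your file’s declaration); the function name is hypothetical:

```python
def non_ascii_characters(text: str):
    """Yield (line_number, column, character) for every non-ASCII character.

    Splits only on "\n" so that non-ASCII characters which str.splitlines()
    would treat as line breaks (e.g. U+2028) are still reported.
    """
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                yield line_number, column, char
```

For example, `open("whatever.xml", encoding="utf-8").read()` passed to `non_ascii_characters` lists each one with its position. In XML any such character can also be written as a numeric character reference (for example `&#233;` for é), which is plain ASCII.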
There are a number of text editors designed for programmers. They tend to have better “Go To Line” features, are usually tag-sensitive (you’ll understand that when you see it; it’s very handy in XML), and they don’t muck with the code. I’m fond of Notepad2, http://www.flos-freeware.ch/notepad2.html, but Notepad++, http://notepad-plus.sourceforge.net/uk/site.htm, might be worth checking out.
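Validators usually report errors by line number, which is why a good “Go To Line” feature matters. If you’d rather not open a very large file at all, a few lines of Python can print the reported line with a little context. This is a hypothetical helper, assuming a UTF-8 file:

```python
from itertools import islice


def show_line(path: str, line_number: int, context: int = 2,
              encoding: str = "utf-8") -> str:
    """Return line `line_number` plus `context` lines either side, numbered.

    The requested line is marked with ">>". Lines are read lazily, so only
    the part of the file up to the last shown line is read.
    """
    first = max(1, line_number - context)
    last = line_number + context
    shown = []
    with open(path, encoding=encoding) as f:
        for number, line in enumerate(islice(f, first - 1, last), start=first):
            marker = ">>" if number == line_number else "  "
            text = line.rstrip("\n")
            shown.append(f"{marker} {number:>6}: {text}")
    return "\n".join(shown)
```

For example, `print(show_line("whatever.xml", 4512))` shows line 4512 with two lines either side.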
As always, do all your work on a copy of your ONIX file: experiment, but don’t trash your work.
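If you make a lot of working copies, a timestamp in the file name keeps them apart. A minimal sketch with a hypothetical helper name:

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_working_copy(path: str) -> Path:
    """Copy `path` to a timestamped sibling such as
    whatever.20240101-120000-000000.xml and return the new path.
    """
    source = Path(path)
    # Microseconds in the stamp make name collisions very unlikely.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = source.with_name(f"{source.stem}.{stamp}{source.suffix}")
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

Do your experiments on the returned copy and leave the original untouched.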
PC vs Mac
Macs are better for a lot of things, but you have more options (and more free options) for XML software on a PC. If your Mac can run Windows, whether through emulation or a separate Windows boot partition, any PC solution should work. Mac solutions are typically Java-based, and there’s nothing wrong with that (PC software usually relies on the .NET Framework), but they are more likely to come with fees.
I would really appreciate feedback from Mac users as I’m not very familiar with what’s out there. oXygen seems to be the clear favorite but I’m sure there’s some good freeware too.
File Size
XML software is typically processor-intensive and needs a lot of RAM. Some software fails on large files, and most will be harder to work with when handling them. You’ll find validation faster and easier to understand on a smaller file (below 1000 records, or better still below 100), at least while doing your first validations. Once you’re familiar with the software and its responses, try larger files: most XML software has an upper limit at which it becomes unresponsive. If you haven’t first validated successfully at a smaller size, how would you know whether a failure comes from the file or just from its size?
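Before you start, it helps to know how many records a file holds. A minimal sketch in Python that counts product records while streaming through the file, so the whole document never has to be loaded at once. The function name is hypothetical; it matches both reference tags (`Product`) and short tags (`product`) and ignores any namespace:

```python
import xml.etree.ElementTree as ET


def count_product_records(path: str) -> int:
    """Count <Product> (reference tags) or <product> (short tags) records.

    iterparse reads the file incrementally; clearing each finished record
    frees its contents, which keeps memory use modest on large files.
    """
    count = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        # Drop any namespace, e.g. "{http://...}Product" -> "Product".
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name in ("Product", "product"):
            count += 1
            elem.clear()
    return count
```

Note that this only works on a file that is already well-formed; a parse error will stop the count.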
How do you cut a file down to size? Open the ONIX file in a text editor and remove individual product records, starting with the <Product> tag (or <product> for short tags) and including the corresponding </Product> tag (or </product>). As long as you remove whole product records (<Product> to </Product>) and leave the other tags alone, you can take out as many as you want.
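The same cut can be scripted. The sketch below takes a text-based approach, like the manual method: it finds whole product records with a regular expression and keeps only the first few, leaving the header and closing tags alone. It assumes product records are never nested, never self-closing, and never appear inside comments; the names are hypothetical:

```python
import re

# One whole product record: <Product>...</Product> (reference tags) or
# <product>...</product> (short tags). \b stops tags such as
# <ProductIdentifier> from matching as a record start, and the non-greedy
# .*? stops at the record's own closing tag.
PRODUCT_RECORD = re.compile(r"<(Product|product)\b[^>]*>.*?</\1>", re.DOTALL)


def keep_first_records(onix_text: str, keep: int) -> str:
    """Return the ONIX text with only the first `keep` product records.

    Everything before the first record (declaration, header) and after
    the last record (closing tag) is left untouched.
    """
    if keep < 1:
        raise ValueError("keep at least one product record")
    records = list(PRODUCT_RECORD.finditer(onix_text))
    if keep >= len(records):
        return onix_text
    end_of_kept = records[keep - 1].end()
    end_of_last = records[-1].end()
    return onix_text[:end_of_kept] + onix_text[end_of_last:]
```

Read the file as text in its declared encoding, write the result to a new file, and validate that smaller file.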
Internet Access
XML software usually needs internet access to work, so do this on a computer that’s connected.
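One likely reason (an assumption; check what your own software does) is that an ONIX file typically points to its DTD or schema by web address, and validating software fetches that document to check the file against it. A minimal sketch that lists the addresses a file refers to, covering a DOCTYPE system identifier and an xsi:schemaLocation attribute. The function name is hypothetical, and it assumes double-quoted values:

```python
import re

# <!DOCTYPE root SYSTEM "url"> or <!DOCTYPE root PUBLIC "public-id" "url">
DOCTYPE_ID = re.compile(
    r'<!DOCTYPE\s+\S+\s+(?:SYSTEM|PUBLIC\s+"[^"]*")\s+"([^"]+)"'
)
# xsi:schemaLocation="namespace address [namespace address ...]"
SCHEMA_LOCATION = re.compile(r'schemaLocation\s*=\s*"([^"]+)"')


def referenced_definitions(onix_text: str) -> list:
    """List the DTD/schema addresses the file refers to, in order found."""
    found = [m.group(1) for m in DOCTYPE_ID.finditer(onix_text)]
    for m in SCHEMA_LOCATION.finditer(onix_text):
        # schemaLocation holds namespace/address pairs; keep the addresses.
        found.extend(m.group(1).split()[1::2])
    return found
```

If the list is not empty, the software probably needs to reach those addresses to validate the file.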
The ONIX Documentation
It’s big, it’s dull and you need it on your computer. On www.editeur.org, go to ONIX / ONIX for Books / Previous releases / Release 2.1 Downloads / Download Release 2.1 format specifications. You’ll need to get the current release, so I haven’t provided a direct link. Having a copy of the Product Manual and the Message Specifications is invaluable: the PDF is linked to the code lists, and it’s the easiest way to look something up.