ONIX 2.1 to 3.0: Collections

[NOTE: For an updated look at this topic go here.]



This is the first in a series of blog posts (introduced here) offering solutions to improve the ONIX 3.0 data we’ve been seeing in BiblioShare. In this post, Tom Richardson looks at Collections, to which the old ONIX 2.1 data on Series and Sets migrated.

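This post on Collections will address the following:

  • How does Collection work?

  • Does it matter?

  • How to tell you’ve screwed up Collections in ONIX 3.0

  • Some thoughts on conversion

  • Three Oh’s Collection in action

  • Can you just use “some” of ONIX 3.0?

  • Resources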

I suggest giving this post a fast read and organizing the resources before looking in detail at your data.


How does Collection work?

There were a number of reasons for revising ONIX 2.1’s Set and Series sections, but a big one was that no one could define the difference between them. Therefore, “Collection” is a unified creation that doesn’t care what you call it: set, series, whatever. If it’s a collection of products, or if it’s part of a larger collection, the information is in Collection. It’s simpler that way.

As I noted in the introduction to this blog series, simplicity in ONIX often includes a meaningful choice and Collection allows its information to be presented in two ways: as a part of the P.6 Title block OR separated out into the P.5 Collection composite. There are a number of specialized situations where this arrangement can be extremely beneficial (academic publishing in particular needs the P.5 Collection composite), but in North American trade publishing I don’t think it makes much difference. The choice is simple enough:

Is the series closely integrated with the title? Typically that would mean the series information is presented on the title page or that you can’t sensibly reference the book’s “distinct book title” without referencing the broader series at the same time.

  • If yes: You’ll want to present the collection information within the P.6 Title composite. A classic and clear example would be a series called “Focus on (various disciplines)” where the title for the specific ISBN is “Workbook” and the number. You have to present the book title with the series information to define the book.

  • If no: You might want to present the collection information within the P.5 Collection composite, but you still have a choice. Penguin Modern Classics and Terry Pratchett’s Discworld series may be two examples: neither is “integrated” into the book titles. But note that both of these could be displayed in the Title block if that’s a good marketing choice. Also worth considering is that the book cover might carry a logo or these words associated with the series.

There is an element of choice. The point is that there are books where you must put the collection information into P.6 Title, but use of the P.5 Collection section is often optional and that is especially true for trade titles.

  • Exceptions: There aren’t many, but consistency is a big one. All the ONIX records that are part of the Collection should be presented the same way. So in terms of the choice above, if your series’ 2010 titles leaned one way and your 2015 titles lean another, present both the same way.
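As a rough illustration of the “yes” case, here is a minimal sketch of the “Focus on …” example carried entirely in the P.6 Title composite. The discipline (“Geometry”), the part number, and the sequence values are invented for illustration. The level codes assume 02 means collection level and 01 means product level in the TitleElementLevel code list, so confirm them against the current list before copying anything.

```xml
<!-- P.6 TITLE INFORMATION (hypothetical values) -->
<TitleDetail>
  <TitleType>01</TitleType>
  <!-- Collection-level element: the series name, fielded on its own -->
  <TitleElement>
    <SequenceNumber>1</SequenceNumber>
    <TitleElementLevel>02</TitleElementLevel>
    <TitleText>Focus on Geometry</TitleText>
  </TitleElement>
  <!-- Product-level element: meaningless without the series above -->
  <TitleElement>
    <SequenceNumber>2</SequenceNumber>
    <TitleElementLevel>01</TitleElementLevel>
    <PartNumber>3</PartNumber>
    <TitleText>Workbook</TitleText>
  </TitleElement>
</TitleDetail>
```

For the “no” case, the same collection-level TitleElement would sit inside a P.5 Collection composite instead, and the P.6 TitleDetail would carry only the product-level element.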

Does it matter?

Before you let the above choice irritate you, ask yourself: Does it matter to the retailer? What you call your series will determine their ability to create indexes and group them together. Getting it right benefits publishers, retailers, and consumers. If the information is presented consistently across the series and would be coded the same way in either P.5 or P.6 anyway, then that’s pretty trivial to do. The only question retailers need to deal with is how to display “Collection” with a book’s “Title.” Here’s another way to think about it: if you present information within the P.6 Title Composite, you can control exactly how the retailer should display title information.

ONIX 3.0 takes the ONIX 2.1 kludge solution used by publishers to control retailer display—which involves sticking the series information into the Title (rendering it useless for indexing)—and allows it to function as Collection information by fielding it in the P.6 Title composite. It’s brilliant. It’s simple. It allows for additional choices like brand name. But the one thing you must do is field your information, at least if you want retailers to index it and be able to present it to consumers as a unit outside of this particular product.

And never repeat information. You field the information to prevent duplication and to ensure the retailer uses it properly by giving them no choice.

How to tell you’ve screwed up Collections in ONIX 3.0

Open your ONIX 3.0 file and look at the P.5 Collection composite and the P.6 Product Title composite on a record where you expect series information. The things referenced in this list will be explored in the blog post, but you’ve done it wrong if you see:

  • Repetition of the same or similar text between these composites. Good ONIX 3.0 will allow you to transmit information with complete clarity and no repetition.

Other problem signs are:

  • If all of the Series or Set data has been mapped to P.5 Collections and none of it to Collections codes within the product’s P.6 Distinct Title entry, you have options that should improve data presentation (and you’ve probably done it wrong);

  • If you haven’t added Sequence Numbers to the TitleDetail in P.5 Collections and the P.6 Product Title, you should add them;

  • If branding and series information are not fielded as distinct elements within the TitleDetail in P.5 Collections or P.6 Product Title, you’re not letting retailers use the data fully and are replicating the poor data practices used in ONIX 2.1.
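For comparison, a P.5 Collection that avoids those problem signs might look something like the sketch below. The brand and series names are placeholders, and the level codes are assumptions (05 for a master brand, 02 for the collection itself), so check both against the current code lists.

```xml
<!-- P.5 COLLECTION INFORMATION (hypothetical names) -->
<Collection>
  <CollectionType>10</CollectionType>
  <TitleDetail>
    <TitleType>01</TitleType>
    <!-- Brand fielded as its own element, sequenced to display first -->
    <TitleElement>
      <SequenceNumber>1</SequenceNumber>
      <TitleElementLevel>05</TitleElementLevel>
      <TitleText>Example Brand</TitleText>
    </TitleElement>
    <!-- Series fielded separately, with this product's place in it -->
    <TitleElement>
      <SequenceNumber>2</SequenceNumber>
      <TitleElementLevel>02</TitleElementLevel>
      <PartNumber>2</PartNumber>
      <TitleText>Example Series</TitleText>
    </TitleElement>
  </TitleDetail>
</Collection>
```

Each name appears once, each has its own element, and the sequence numbers say how to order them for display.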

If you can’t make sense of what’s in your data, or see how a retailer can create indexes to group all the parts of your “Collection,” then it’s unlikely a retailer will be able to. The most likely cause is ONIX 2.1 data being used in an unproofed conversion without properly creating an ONIX 3.0 Collection.

Repetition is the worst, and to prove this, here is a code block pulled from an actual ONIX record submitted to BiblioShare—the only changes made were to replace a text string in the record with “Lazy Daisy Walk” and to change the Product Subtitle to protect the innocent:

<!-- P.5 COLLECTION INFORMATION -->
<Collection>
  <CollectionType>10</CollectionType>
  <TitleDetail>
    <TitleType>01</TitleType>
    <TitleElement>
      <TitleElementLevel>02</TitleElementLevel>
      <TitleText>Lazy Daisy Walk</TitleText>
    </TitleElement>
  </TitleDetail>
</Collection>
<Collection>
  <CollectionType>00</CollectionType>
  <TitleDetail>
    <TitleType>01</TitleType>
    <TitleElement>
      <TitleElementLevel>04</TitleElementLevel>
      <PartNumber>2 of 4</PartNumber>
      <TitleText>Lazy Daisy Walk Series</TitleText>
      <Subtitle>Lazy Daisy Walk: Division</Subtitle>
    </TitleElement>
  </TitleDetail>
</Collection>
<!-- P.6 TITLE INFORMATION -->
<TitleDetail>
  <TitleType>01</TitleType>
  <TitleElement>
    <TitleElementLevel>01</TitleElementLevel>
    <TitleText>Lazy Daisy Walk: Division</TitleText>
    <Subtitle>Product Subtitle was here</Subtitle>
  </TitleElement>
</TitleDetail>

“Lazy Daisy Walk” appears four times. There are two Collection composites: one as a “publisher” collection “10” and one as an “Undefined” collection “00”. The Product Title composite repeats information found in the Collection composite; only the Product Subtitle was unique. I contacted the publisher and discovered that this is the result of a conversion process from their ONIX 2.1. And, to be honest, I doubt that the ONIX 2.1 starting record was ideal. My guess is that it likely supported both Series and Set information: that Part Number “2 of 4” suggests one entry is a Set, and that resulted in two Collections in the conversion results.

Some thoughts on conversion

I have to applaud the creator of the conversion software for making a problem clear: one of the Collection composites is type “00” for “undefined,” and that appears intentional, because mapping two Collections and marking one “undefined” flags a problem. There is no realistic use case for an originating publisher to be unable to define a Collection: It’s your metadata, so define it. There is often an “undefined” option offered in the ONIX Code Lists, but it’s there for metadata creators who don’t know, which is very sad. Even if you’re a distributor and really don’t know, “undefined” should only be a temporary marker on an active record. Your clients should expect more from you than leaving their data undefined.

But even taking out the weird undefined Collection, there is still pointless repetition in the case above: it’s still not been made into “good” ONIX 3.0. Consider the following:

  • Ground Zero: An ongoing “production” process converting a 2.1 file and making it 3.0 cannot create a “good” 3.0 file—or it’s unlikely if you haven’t done an awful lot of work on the 2.1 file to make conversion accurate. (That’s a subtext to this series: the work of making the transition to 3.0 is the same work you’ll need to do in a conversion.)

  • Rule one (Ground Zero revisited): The ONIX 3.0 conversion can only be as good as the 2.1 source file. Graham Bell’s ONIX 2.1 to 3.0 webinar (around the 47-minute mark) points this out in detail, and has ways to improve your 2.1 before making the switch. Both you and the conversion creator should understand how ONIX 3.0 should appear: it is not identical in all cases to ONIX 2.1, even if what it’s communicating is broadly the same. You almost certainly will have data loss wherever you support deprecated elements, and you might have it where ONIX 2.1 offers composite versions where you only support the pre-2.1 element version.

  • Rule two: Your ONIX 3.0 conversion will only be as good as your proofing and updating of the ONIX 3.0 output file. If you make no data changes to your 3.0 file after its conversion from 2.1, then you can’t be using 3.0 to its full potential. Allow time to learn how your conversion works. Then expect to spend a similar amount of work updating and correcting the ONIX 3.0.

  • Rule three: Use the data to help you proof. Hopefully the conversion documentation will let you know what it does as default values and when it has a choice where the answer is unclear. And hopefully the conversion will be appropriate for most books. But all? As in the code example above, look for undefined composites and elements in the output and define them. (While “00” often indicates an undefined code, it can vary between code lists, so always confirm the right value from the ONIX code list.) A good quality conversion shouldn’t assume it knows the answer, and you should expect to be supplying it in some cases.

  • Rule four: ONIX records are pretty literal and literate—just scanning one finds problems. Ask yourself: How will a retailer load <that> to a dataset? There’ll be more on that <that> when we look at Market in the next blog post.
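Applying those rules to the “Lazy Daisy Walk” record from earlier, a proofed version might look something like this. It is a sketch under assumptions that can’t be verified from the record itself: that there is really one publisher collection, that this is book 2 in it, and that “Division” is the distinct title.

```xml
<!-- P.5 COLLECTION INFORMATION: one defined collection, named once -->
<Collection>
  <CollectionType>10</CollectionType>
  <TitleDetail>
    <TitleType>01</TitleType>
    <TitleElement>
      <TitleElementLevel>02</TitleElementLevel>
      <!-- Assumed position of this product within the collection -->
      <PartNumber>2</PartNumber>
      <TitleText>Lazy Daisy Walk</TitleText>
    </TitleElement>
  </TitleDetail>
</Collection>
<!-- P.6 TITLE INFORMATION: only what is unique to this product -->
<TitleDetail>
  <TitleType>01</TitleType>
  <TitleElement>
    <TitleElementLevel>01</TitleElementLevel>
    <TitleText>Division</TitleText>
    <Subtitle>Product Subtitle was here</Subtitle>
  </TitleElement>
</TitleDetail>
```

The collection name now appears once instead of four times, and nothing is left undefined.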

Three Oh’s Collection in action

Rio2.jpg

9780062284990 is a HarperCollins title and part of a children’s book series featuring some licensed, branded products as well as what I think are likely characters created by HC. This is an intertwined series that could be isolated by level across brand/characters, or read as a progression of levels based on brand/character. It’s clearly structured but how could a retailer replicate it? Check out the dedicated website and decide for yourself.

ONIX 3.0 handles it beautifully, and it illustrates the choice between P.5 and P.6 well. There’s no right answer, but HarperCollins’ marketing department can definitely meet retailer needs in their data feed if that’s a goal. At first look, you’d think, “All the information is on the cover, therefore it must all be put into P.6 Title,” but I don’t think so. I think the reading levels are better carried in the P.5 Collection composite, and the brand/character information specific to the product title is better carried in P.6 Title. Remember the rule of thumb: is it an integrated part of the title? “I Can Read” and “Reading With Help 2” aren’t really part of the product title, but all these elements could benefit from the structured presentation possible in Collection.

Would HarperCollins be wrong if they did include everything in the P.6 Title block? Would they be wrong if in addition to the “I Can Read” metadata they moved the branding/character information up to Collection? I don’t know about right or wrong here, but I’d say it’s a good reason to talk to a couple of major trading partners who might want to improve their data display. ONIX is all about your trading partners, so talking to them is never a mistake.

Let’s take it apart:

This is not so hard and pretty sensible. But as I write, I can hear the grinding of publisher teeth as they protest, “Fine… But. Retailers. Will. Screw. That. Up. If they can’t display Series right now, why would they take that?” Because you’ll tell them exactly how to do it. Because it’s simpler to do it right than to do it wrong in ONIX 3.0.

But you have to use ONIX 3.0. So if you haven’t updated your manual in a couple of years, you’re what we call a silly ninny in the standards community. Go on. Right now. Download the current manual. New “versions” are released once every year or so when new functionality is added. Code list updates update code lists, and version changes add new data functionality. Update your manual with every code list change and know you’re always up-to-date. 

If you’d like to see the code, you can download it here.

There you go. There are other ways this might be done, but this code is a good example of simplicity in ONIX:

  • A series is clearly labeled as well as a sub-series (and therefore easily indexed).

  • Two brands are clearly identified, one with Collection and one with Title. (Should both have been under Title? Blue Sky Studios could have been listed as a Corporate author. These are decisions that Marketing can make, and it really depends on how that entity wants to be identified. If Blue Sky Studios wants to be a brand, then they can be. If they want to be an author, that’s fine, too. It’s your contract and client.)

  • The book title is Vacation in the Wild. There’s no ambiguity between Title and Subtitle. This is the book’s title, and we can get there because Rio 2 is correctly fielded for what it is: a marketable brand name that will span multiple products, not a book title. But it should “display” before the Title, and the sequence number controls that.

The retailer is given two ways to create a Display: either they can follow the numbers and display the parts in order, or they can pick up the Display Statement and display it that way. Either way, there is no difficulty creating searches and indexes based on this. All the parts are clear.
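For readers who don’t download the file, the general shape can be sketched from the description above. This is not the downloadable record, only an approximation. The level codes (02 collection, 03 sub-collection, 05 master brand, 01 product), the sequence values, and the wording of the display statement (assumed here to be the TitleStatement element) are all assumptions. The second brand is left out because the description doesn’t pin down how it was coded.

```xml
<!-- P.5 COLLECTION INFORMATION: series and sub-series (assumed coding) -->
<Collection>
  <CollectionType>10</CollectionType>
  <TitleDetail>
    <TitleType>01</TitleType>
    <TitleElement>
      <SequenceNumber>1</SequenceNumber>
      <TitleElementLevel>02</TitleElementLevel>
      <TitleText>I Can Read</TitleText>
    </TitleElement>
    <TitleElement>
      <SequenceNumber>2</SequenceNumber>
      <TitleElementLevel>03</TitleElementLevel>
      <TitleText>Reading With Help 2</TitleText>
    </TitleElement>
  </TitleDetail>
</Collection>
<!-- P.6 TITLE INFORMATION: brand sequenced first, then the book's own title -->
<TitleDetail>
  <TitleType>01</TitleType>
  <TitleElement>
    <SequenceNumber>1</SequenceNumber>
    <TitleElementLevel>05</TitleElementLevel>
    <TitleText>Rio 2</TitleText>
  </TitleElement>
  <TitleElement>
    <SequenceNumber>2</SequenceNumber>
    <TitleElementLevel>01</TitleElementLevel>
    <TitleText>Vacation in the Wild</TitleText>
  </TitleElement>
  <!-- Assumed wording for the ready-made display form -->
  <TitleStatement>Rio 2: Vacation in the Wild</TitleStatement>
</TitleDetail>
```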

Can you just use “some” of ONIX 3.0?

I hope that the above illustrates a problem. ONIX 2.1 was defined by publishers trying to do minimal metadata to communicate with retailers. ONIX 3.0 needs granularity and fielded information to work properly. Nothing will stop data senders from refusing to support Collections, just as in ONIX 2.1 there were data senders who refused to supply Series information or who sent it embedded in Title. And retailers can choose to implement it badly or partially (though if they do then I suspect it will be due to their experience receiving 3.0 data that replicates the problems of 2.1). I can’t stress enough that retailers should implement and display ONIX 3.0 fully and reward complete and accurate use (see Introduction).

The answer to the question “Can I use just ‘some’ of ONIX 3.0?” is clearly: no. ONIX 3.0 is simpler, but it’s more integrated and needs all its parts to work. We need retailers to do the “simple” thing and implement the above. So if you send them data that’s fielded, with no repetition, and with all the parts appropriately labeled, then it’s simple. Fielded. Labeled. Ordered. Do anything else and it’s not.

And before you complain too much about complication, please note that TitleDetail is identical in both Collection and ProductTitle (and elsewhere). While there are more bits in play to create clarity and simplicity, the data structure is used consistently. That’s why companies find that implementing 3.0 is easier than they expect it will be. And it’s why 3.0 gets more value from each of its parts. It should be more stable and last longer than 2.1 did, but it relies on the metadata arriving sensibly and thoughtfully.

Before you decide, just go back to the opening and the “Lazy Daisy Walk” example. That clearly won’t work, right? Then admire the elegance of the new solution. It’s verbose, but clear. Supporting verbosity is not adding new data—it’s adding clarity and simplicity.

Resources

ONIX for Books Product Information Message: How to describe sets, series and multiple item products in ONIX 3 (PDF)

I’m a little reluctant to push this document too much. It was written in 2010 so it represents a “how-to” for ONIX 3.0 version 0—you won’t find sequence numbers and such in it. So don’t base your understanding of good ONIX 3.0 on it. Always look at the date and consider what has happened since. But there are examples and sample scripts, and they are clear. And it lays out how Collections works and explores the special cases and complications that may apply to your trading needs. The simple case above is mostly what trade publishers need to think about, but this is a primary document.

Graham Bell’s webinar on transitioning from ONIX 2.1 to 3.0

  • The section on Collections, which starts around the 47-minute mark, has excellent and detailed explanations—the whole webinar should be consulted.

BISG / BNC Joint Best Practices for North American Data (PDF)

  • The new version published in June 2015 is fully updated to support ONIX 3.0.

The full ONIX Manual and EDItEUR’s International Best Practices

  • Go here to find the ONIX manual—there are so many choices. These are the documents you should be updating regularly.