BNC BiblioShare is going gangbusters, with over 230,000 EANs in the system, over 12,000 Canadian author markers, and aggregators starting to experiment with the data. It seems like a good time to take stock.
One of the first things we found is that anyone who wants the data really wants (and I mean really, really wants) Canadian indicators, as many as they can get. I’m required at every opportunity to list these, so to review:
- Canadian authorship is shown using the Contributor composite’s country code;
- BISAC Regional Codes (code 11 in the Subject composite) can be used to draw the attention of a region’s book buyers or libraries;
- if you’re still using an old BISAC code list, well, you know: a number of Canada-specific codes were added 3 or 4 years ago, and you should update yearly.
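For anyone who wants to see what checking for the first two indicators might look like, here is a minimal sketch in Python. It assumes an ONIX 2.1 record using reference tag names with no XML namespace; the sample record, the `canadian_indicators` function name and the `PLACEHOLDER` region code are all made up for illustration and are not taken from BiblioShare or any real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical ONIX 2.1 Product fragment (reference tag names, no namespace).
# The SubjectCode value is a placeholder, not a real BISAC regional code.
SAMPLE = """<Product>
  <RecordReference>example-001</RecordReference>
  <Contributor>
    <ContributorRole>A01</ContributorRole>
    <PersonName>Jane Example</PersonName>
    <CountryCode>CA</CountryCode>
  </Contributor>
  <Subject>
    <SubjectSchemeIdentifier>11</SubjectSchemeIdentifier>
    <SubjectCode>PLACEHOLDER</SubjectCode>
  </Subject>
</Product>"""


def canadian_indicators(product_xml):
    """Report which Canadian indicators one Product record carries."""
    product = ET.fromstring(product_xml)

    # Canadian authorship: any Contributor composite with country code CA.
    has_ca_contributor = any(
        (c.findtext("CountryCode") or "").strip().upper() == "CA"
        for c in product.findall("Contributor")
    )

    # BISAC Regional Codes: Subject composites using scheme identifier 11.
    region_codes = [
        (s.findtext("SubjectCode") or "").strip()
        for s in product.findall("Subject")
        if (s.findtext("SubjectSchemeIdentifier") or "").strip() == "11"
    ]

    return {
        "canadian_contributor": has_ca_contributor,
        "bisac_region_codes": region_codes,
    }


if __name__ == "__main__":
    print(canadian_indicators(SAMPLE))
```

A real file would wrap many Product records in an ONIXMessage and might use short tags instead, so treat this as a starting point rather than a validator.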
The industry wants to promote your books, so help them. There have even been some feelers about, maybe, getting publishers to use Country of Publication more. Country of Manufacture can be in the mix too: with Code Issue 11 you can include it in ONIX 2.1 files using the familiar Other Text composite.
The other piece of feedback that we’re getting from aggregators is: Why is the data so inconsistent (actual adjectives have been standardized in the interest of simplicity and taste)? Even BookNet Canada’s own programmer has wondered, the programmer from the ACP’s Bookshelf (as experienced an ONIX producer as exists in Canada) has been caustic, and Library and Archives Canada has been close to un-librarian-like.
This is not a problem unique to Canada: BIC’s most recent annual report said the same thing and questioned the effectiveness of certifying publishers. It was discussed last year at the ONIX meetings at the London Book Fair and, if not for volcanoes, would have been again.
When it comes to e-pub metadata, the universally agreed descriptor is crap.
It’s a feature of BNC BiblioShare that it’s the publisher’s data and that, rather than changing it, we work with publishers to make it better.
Just to be clear, BNC’s aggregated dataset (the stuff we’ll be serving up over the web and issuing as aggregated files from our own database) will have some level of standardization in it, but we aren’t staffed or funded to do an intensive remediation of files. Actually, that’s how the industry got to where it is. Big Publishers submitted files and Big Retail’s programmers wrote scripts, publisher by publisher, to clean them up to match, as best they could, their individual standard. Bowker and other aggregators invested heavily in systems to compensate so that their output is as clean as they can make it. And it’s lovely! But the smaller the publisher, the less economic that model is, and everyone whose data has been changed knows it’s a mixed blessing: the fix can be its own problem.
So, I say again: It’s a feature of BNC BiblioShare that it’s the publisher’s data. The size of the file is irrelevant to us because all files are processed the same way; the only limit to getting in is that they have to pass a strict validation. But once that’s passed, we work with publishers to make it better. Each and every file has its own Detailed Report. It’s not perfect or the final word on data quality, but it’s very good. Mind you, it’s only good if publishers use it and think that these are things they should fix.
And this leads to a question for the Canadian publishing industry: What do you want from certification? The aggregators we’re working with seem to think that a publisher file that’s been “certified” should be usable without work on their part. And by that they mean: quality bibliographic information that really matches the book, full utilization of all the correct ONIX elements, no glitchy characters, Canadian identifiers up the wazoo, as much enhanced content as possible… They’re pretty demanding, given they are trying to provide publishers with free discoverability and support.
This seems like a no-brainer—of course a certified file should be usable! But I’m not sure. Does Canada want a hard-assed data certification system? What are the boundaries of failure to certify? What should that mean?
Here’s a chance to talk about it and comment below…