On July 28, 2021, BookNet Canada is hosting “Canadian Creators & Metadata in the Publishing Supply Chain: A Dialogue.” As the Canadian industry considers best practices for the collection, storage, and dissemination of creator identity information as a part of supply chain practices, the Canadian Bibliographic Committee (a national committee of the non-profit organization BookNet Canada) struck the Equity, Diversity & Inclusion in Metadata Working Group in September 2020 with the ultimate objective of advising the Canadian publishing industry in selecting the wisest course of action to communicate equity, diversity & inclusion information about books and creators.

In the interest of involving creators in that discussion, as primary stakeholders in the question of how creator identity information should be shared, the Working Group is hosting a webinar to share its work to date and to start a conversation with creators. For the full list of topics for discussion, more details, and to register, please click here.

To prepare for the webinar, BookNet Canada’s Bibliographic Manager Tom Richardson has offered this introduction to what we mean when we say “metadata” in the professional publishing industry.

I’ve been asked to supply context for this webinar about what book metadata is as well as how and why it’s distributed. In a nutshell, metadata is information about information, or data about data. Records have always been kept about books, and at its simplest, book metadata is a catalogue of book titles and authors: a listing of what’s on the cover or title page.

Starting in the 1990s, if a retailer wanted to sell books online, they literally keyed the book title and author name into their system using publishers' printed catalogues. By the end of that decade, if a publisher wanted a retailer to sell their books online, they needed to supply something the retailer could load into their system: digital records. Needless to say, there’s been a lot of development since then.

If you’ve filed taxes electronically you’ve done exactly what publishers do regularly: You’ve used software to create a digital file which is then sent and loaded directly into CRA’s database using your social insurance number to identify your CRA account. Publishers use software (either written in-house or bought) to create a file — typically in XML as that’s arguably an efficient way for two computers to talk to each other. This file distributes information about their books — the metadata — using ISBNs to identify each salable version of a book and is loaded to a retailer’s, library’s, wholesaler’s, or other end-user’s database.
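The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (`Book`, `ISBN`, `Title`, `Author`) are hypothetical and far simpler than a real trade metadata file, the ISBN is a placeholder, and the "retailer database" is just an in-memory dictionary.

```python
import xml.etree.ElementTree as ET


def build_record(isbn, title, author):
    """Publisher side: serialize one book's metadata as XML text.

    The element names are hypothetical, chosen for readability only.
    """
    book = ET.Element("Book")
    ET.SubElement(book, "ISBN").text = isbn
    ET.SubElement(book, "Title").text = title
    ET.SubElement(book, "Author").text = author
    return ET.tostring(book, encoding="unicode")


def load_record(xml_text, database):
    """Recipient side: parse the XML and store it under its ISBN."""
    book = ET.fromstring(xml_text)
    isbn = book.findtext("ISBN")
    database[isbn] = {
        "title": book.findtext("Title"),
        "author": book.findtext("Author"),
    }
    return isbn


# Placeholder values, not a real book.
retailer_db = {}
feed = build_record("9780000000002", "An Example Title", "A. N. Author")
load_record(feed, retailer_db)

# A later file for the same ISBN replaces the stored record.
update = build_record("9780000000002", "An Example Title (Revised)", "A. N. Author")
load_record(update, retailer_db)
```

Because the ISBN is the key, a second file for the same ISBN updates the existing entry rather than creating a duplicate, much as a tax filing is matched to an existing account by its identifying number.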

So there are systems to digitally trade information about individual ISBNs that represent books, their authors, and the book’s content. These systems favour big impersonal datasets.

Other things have changed, too: we now often find books digitally, maybe as much through social media as by searching retail datasets or online library catalogues. If you know a book exists, you expect to be able to find it easily. And mostly you can. And yes, brick-and-mortar bookstores are still a major source of discovery, but book ordering happens electronically, and booksellers’ knowledge of books comes to them electronically too.

And individuals use digital tools to search out content or creators that interest them or that they want to support. Things like movements to buy from BIPOC businesses and support BIPOC authors are examples of this type of digital searching. That too is a component of book discovery today and it’s part of what this webinar is looking at.

Canadian author example

I’m going to close by providing a quick example of where good things have come from businesses trading information that may be considered private: Identification of Canadian authors. This is what BookNet Canada started to promote around 2005 and started to see increasingly good uptake by 2010:

The bit in bold identifies an association of a country code with an author — that “CA” is used to designate a “Canadian author” in the Canadian book industry.

We’re asking for two things now as metadata for development, though we don’t have much data for the first one yet:

<ContributorPlace>
    <ContributorPlaceRelator>04</ContributorPlaceRelator>
    <CountryCode>CA</CountryCode>
    <RegionCode>CA-NT</RegionCode>
    <LocationName>Yellowknife</LocationName>
</ContributorPlace>
<ContributorPlace>
    <ContributorPlaceRelator>08</ContributorPlaceRelator>
    <CountryCode>CA</CountryCode>
</ContributorPlace>

Libraries and retailers are requesting information that will allow them to identify and promote local authors. The first group is coded for “currently resides in” (relator code 04) and provides a country, a province or state, and a city name: enough to create and monitor a list of local authors. For example, the code above designates that the author currently resides in Yellowknife, NWT, Canada, and is a Canadian author. The second group (relator code 08 with country code CA) is the Canadian author marker. Two things, two requests, and two data points, each defined by a code and one or more values.
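To show how a recipient might read those two data points, here is a minimal Python sketch. It assumes the two groups arrive inside a single parent element (a `Contributor` wrapper is added here so the fragment parses as one document), and the function name and the shape of the result are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# The snippet from above, wrapped in a parent element (an assumption made
# so that the two groups form one well-formed XML document).
CONTRIBUTOR_XML = """
<Contributor>
    <ContributorPlace>
        <ContributorPlaceRelator>04</ContributorPlaceRelator>
        <CountryCode>CA</CountryCode>
        <RegionCode>CA-NT</RegionCode>
        <LocationName>Yellowknife</LocationName>
    </ContributorPlace>
    <ContributorPlace>
        <ContributorPlaceRelator>08</ContributorPlaceRelator>
        <CountryCode>CA</CountryCode>
    </ContributorPlace>
</Contributor>
"""

RESIDES_IN = "04"       # "currently resides in", as described in the text
AUTHOR_MARKER = "08"    # with country CA, the Canadian author marker


def summarize_places(contributor_xml):
    """Return the residence and Canadian-author flag found in the XML."""
    root = ET.fromstring(contributor_xml.strip())
    summary = {"canadian_author": False, "residence": None}
    for place in root.iter("ContributorPlace"):
        relator = place.findtext("ContributorPlaceRelator")
        country = place.findtext("CountryCode")
        if relator == AUTHOR_MARKER and country == "CA":
            summary["canadian_author"] = True
        elif relator == RESIDES_IN:
            summary["residence"] = {
                "country": country,
                "region": place.findtext("RegionCode"),
                "city": place.findtext("LocationName"),
            }
    return summary


summary = summarize_places(CONTRIBUTOR_XML)
```

A library or retailer could filter on `summary["residence"]["region"]` to build a local-author list, and on `summary["canadian_author"]` to build a Canadian-author list, without ever guessing from the author's name.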

Trading that “CA” and storing it in databases is how we support the promotion of Canadian authors. It is the data that creates things like:

49th Shelf, a site built to promote Canadian authors using the dataset from BiblioShare, BookNet Canada’s database of book metadata
CataList, BookNet’s online catalogue service and order management tool
The Globe & Mail and other media bestseller lists

I saved this example for last because The Globe & Mail takes editorial authority over the metadata and confirms that it’s accurate. Because it’s important that it be accurate.

The publishers who know the data supply the data, which makes it largely accurate. No one assumes that the marker is perfect, so editors confirm it, but the data does help avoid the problem of systemic bias: for example, ignoring names that don’t look Canadian.
It streamlines decisions and provides a starting point. Its use is part of its accuracy because mistakes are corrected, and because book metadata is distributed regularly, the correction is disseminated.
There is no assumption of perfection but there is a cycle that creates accuracy.

That’s metadata and databases at work and what my colleagues want to talk to you about.