Keyword adoption in LGBT book metadata: A case study

We recently looked at LGBT book sales in another blog post and found that they have generally been increasing over the past seven years, according to print book sales tracked in BNC SalesData. Those findings were really interesting, especially the data showing that "with adult Fiction, sales are on the rise, even with declining availability of titles categorized Gay or Lesbian."

When that post referred to the "declining availability of titles categorized Gay or Lesbian," it meant BISAC subject codes, which are used to classify books at a high level. That made us think about the slightly different role of keywords and how they are being applied to LGBT titles: could keywords be more useful than BISAC codes for identifying books with LGBT themes or characters? Specifically, could the decline be due to publishers moving away from LGBT codes in favour of generic subject codes paired with more specific, LGBT-related keywords?

Thus began our quest to gain insight into keyword usage. Step one: start pulling data from BiblioShare.

Who's using LGBT BISAC codes?

We found 75 publishers that have used LGBT BISAC codes on a total of 3,243 titles. Admittedly, this is a small data set, but it's large enough for this initial look into keyword usage in the Canadian supply chain.

In case you need a refresher, the current LGBT BISAC codes are the following:

  • FIC005030 - FICTION / Erotica / Gay
  • FIC005040 - FICTION / Erotica / Lesbian
  • BIO031000 - BIOGRAPHY & AUTOBIOGRAPHY / LGBT
  • CGN009000 - COMICS & GRAPHIC NOVELS / LGBT
  • CGN004130 - COMICS & GRAPHIC NOVELS / Manga / LGBT
  • DRA017000 - DRAMA / LGBT
  • FIC068000 - FICTION / LGBT / General
  • FIC011000 - FICTION / LGBT / Gay
  • FIC018000 - FICTION / LGBT / Lesbian
  • FIC027300 - FICTION / Romance / LGBT / General
  • FIC027190 - FICTION / Romance / LGBT / Gay
  • FIC027210 - FICTION / Romance / LGBT / Lesbian
  • HUM024000 - HUMOR / Topic / LGBT
  • JUV060000 - JUVENILE FICTION / LGBT
  • JNF053080 - JUVENILE NONFICTION / LGBT
  • LCO016000 - LITERARY COLLECTIONS / LGBT
  • LIT004160 - LITERARY CRITICISM / LGBT
  • POE021000 - POETRY / LGBT
  • SOC064000 - SOCIAL SCIENCE / LGBT Studies / General
  • SOC012000 - SOCIAL SCIENCE / LGBT Studies / Gay Studies
  • SOC017000 - SOCIAL SCIENCE / LGBT Studies / Lesbian Studies
  • TRV026070 - TRAVEL / Special Interest / LGBT
  • YAF010140 - YOUNG ADULT FICTION / Comics & Graphic Novels / LGBT
  • YAF031000 - YOUNG ADULT FICTION / LGBT
  • YAF052040 - YOUNG ADULT FICTION / Romance / LGBT
  • YAN032000 - YOUNG ADULT NONFICTION / LGBT

Those keenly attuned to BISAC codes may notice that there are more LGBT categories available in the current 2016 code list than in the previous 2015 list.

Keywords used by LGBT books

Of the 3,243 titles that are classified as LGBT, 34% (1,103) have keywords assigned. This isn't too bad.

Pie chart: 34% have keywords, 66% don't have keywords.

Here we see the span of keywords being applied to LGBT titles. Of the 1,103 LGBT titles providing keywords to BiblioShare, there are 3,055 distinct keywords being used.

Keyword word cloud.

The 20 most popular keywords are fairly generic and unlikely to help discovery in any significant way:

  1. gay (342)
  2. LGBT (280)
  3. fiction (270)
  4. Audio (173)
  5. Book (173)
  6. CD (173)
  7. Audiobook (171)
  8. Lesbian (160)
  9. LGBTQ (147)
  10. romance (140)
  11. memoir (131)
  12. homosexuality (125)
  13. queer (120)
  14. biography (104)
  15. sexuality (104)
  16. Literature (99)
  17. non-fiction (95)
  18. love (95)
  19. family (81)
  20. glbt (79)
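For anyone who wants to run a similar tally on their own metadata, here is a minimal Python sketch. It assumes keywords arrive as a semicolon-delimited string per title; the sample records and function names are made up for illustration, and this is not necessarily how the numbers above were produced.

```python
from collections import Counter

# Hypothetical sample records: one semicolon-delimited keyword string per title.
titles = [
    {"isbn": "9780000000001", "keywords": "gay;fiction;romance"},
    {"isbn": "9780000000002", "keywords": "LGBT;memoir;Gay"},
    {"isbn": "9780000000003", "keywords": ""},
    {"isbn": "9780000000004", "keywords": "lgbt; queer ;fiction"},
]

def split_keywords(raw):
    """Split a semicolon-delimited keyword string, dropping blanks and stray spaces."""
    return [kw.strip() for kw in raw.split(";") if kw.strip()]

def keyword_counts(records, fold_case=True):
    """Count how many titles use each keyword (a keyword counts once per title)."""
    counts = Counter()
    for record in records:
        keywords = split_keywords(record["keywords"])
        if fold_case:
            # Treat "Gay" and "gay" as the same keyword.
            keywords = [kw.lower() for kw in keywords]
        counts.update(set(keywords))
    return counts

counts = keyword_counts(titles)
for keyword, n in counts.most_common(3):
    print(f"{keyword} ({n})")
```

Whether to fold case is a judgment call: folding merges variants such as "LGBT" and "lgbt", while leaving case alone shows exactly what publishers supplied.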

However, it's possible that the long tail of more than 3,000 other keywords does get more specific to each book and its contents, making those keywords a useful tool for discovery. (See our Amazon case study below for an example.)

We always like going a step further, so we broke down the percentage of titles in each of the LGBT BISAC categories that have keywords in their data. Look at those travel books!

Graph showing percentage of titles with keywords by BISAC category.

But, yes, as we all know, percentages can be misleading. For example, there are only four titles in Travel / Special Interest / LGBT. Still, 100% keyword population is good, right?

Well, not always. Here's the bibliographic data for one of those titles (displayed using our Chrome extension Biblio-o-matic). Check out those keywords: Audiobook; Audio; Book; CD; Biography; Memoir. Not so great!
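One way to keep percentages like these honest is to report them next to the title count and to check whether the keywords say anything about the content. The sketch below is only an illustration: the sample records are invented, and the set of "generic" format keywords is an assumption you would tune for your own data.

```python
from collections import defaultdict

# Hypothetical sample: (BISAC code, keyword string) pairs, one per title.
titles = [
    ("TRV026070", "Audiobook;Audio;Book;CD"),
    ("TRV026070", "travel;gay travel;europe"),
    ("FIC011000", ""),
    ("FIC011000", "gay;coming of age"),
    ("FIC011000", ""),
]

# Assumed list of format-only keywords that describe the product, not the content.
GENERIC = {"audiobook", "audio", "book", "cd"}

def coverage_by_category(records):
    """Return {code: (titles, titles with keywords, titles with a non-generic keyword)}."""
    totals = defaultdict(lambda: [0, 0, 0])
    for code, raw in records:
        keywords = {kw.strip().lower() for kw in raw.split(";") if kw.strip()}
        totals[code][0] += 1
        if keywords:
            totals[code][1] += 1
        if keywords - GENERIC:
            totals[code][2] += 1
    return {code: tuple(values) for code, values in totals.items()}

for code, (total, with_kw, descriptive) in coverage_by_category(titles).items():
    print(f"{code}: {with_kw}/{total} with keywords ({with_kw / total:.0%}), "
          f"{descriptive} with descriptive keywords")
```

Printing the raw counts beside the percentage makes a four-title category easy to spot, and the third number separates titles with real subject keywords from those carrying only format terms.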

What about books with LGBT themes that aren't coded with LGBT BISACs?

We're glad you asked. We did find a piece of the answer to our original question about titles that skip LGBT BISAC codes and apply LGBT-related keywords instead.

If we look for titles that don't use an LGBT BISAC code but do employ any of the top 20 LGBT keywords listed above, we come up with 147,296 titles.

More investigation would be required to determine if these books are indeed works that could be classified as LGBT. It's always possible that the keywords being used are not precise enough to really classify the content.
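To illustrate why precision matters here, the following Python sketch runs the same kind of filter twice: once with all of the top 20 keywords and once with a narrower set. The records are hypothetical, the BISAC set is only a subset of the list earlier in this post, and the narrower keyword set is our assumption about which terms actually name LGBT subject matter.

```python
# Subset of the LGBT BISAC codes listed earlier; extend with the full list as needed.
LGBT_BISAC = {"FIC068000", "FIC011000", "FIC018000", "BIO031000", "YAF031000"}

# The top 20 keywords from the list above, lowercased.
TOP_20 = {
    "gay", "lgbt", "fiction", "audio", "book", "cd", "audiobook", "lesbian",
    "lgbtq", "romance", "memoir", "homosexuality", "queer", "biography",
    "sexuality", "literature", "non-fiction", "love", "family", "glbt",
}

# Assumed narrower selection: only the terms that name LGBT subject matter.
LGBT_SPECIFIC = {"gay", "lgbt", "lesbian", "lgbtq", "homosexuality", "queer", "glbt"}

# Hypothetical records; the non-LGBT codes are used only as examples.
titles = [
    {"title": "Hypothetical Thriller", "bisac": ["FIC031000"], "keywords": "fiction;suspense"},
    {"title": "Hypothetical Romance", "bisac": ["FIC027000"], "keywords": "lesbian;romance"},
    {"title": "Hypothetical Memoir", "bisac": ["BIO031000"], "keywords": "gay;memoir"},
]

def keyword_only_matches(records, keyword_set):
    """Titles with no LGBT BISAC code but at least one keyword in keyword_set."""
    matches = []
    for record in records:
        if LGBT_BISAC.intersection(record["bisac"]):
            continue  # already classified as LGBT via BISAC
        keywords = {kw.strip().lower() for kw in record["keywords"].split(";") if kw.strip()}
        if keywords & keyword_set:
            matches.append(record["title"])
    return matches

broad = keyword_only_matches(titles, TOP_20)
narrow = keyword_only_matches(titles, LGBT_SPECIFIC)
print(f"Top-20 match: {len(broad)} titles; LGBT-specific match: {len(narrow)} titles")
```

In this toy data, the thriller is caught by the broad filter only because of the keyword "fiction", which is the kind of false positive that further investigation would need to weed out.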

An Amazon keywords case study

While doing this research, we also noticed that a book doesn't need a particular BISAC code in order to rank highly in the corresponding Amazon category.

In this example, The Price of Salt carries the BISAC code FICTION / Mystery & Detective.

But on Amazon, at the time of this search, it was the #1 ranked title in Gay & Lesbian > Literature & Fiction > Mysteries > Lesbian.

You may be wondering where Amazon gets that information from. They could be pulling keywords from the book's description, but they are likely also taking into account the keywords provided in the metadata, which are:

lesbian;queer;romance;lgbt;1950s;mystery;american;road trip;classic;new york;gay;coming of age;women;lgbtq;sexuality;homosexuality;lesbian fiction;20th century;suspense;america;patricia highsmith;lesbians;divorce;pulp;relationships;thriller;classics;literature;crime fiction;lesbianism;pulp fiction

You can sense the algorithm at play that got this book into that particular Amazon category (BISAC + keywords), which makes a strong case for adding keywords to your metadata.
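To see how much signal that keyword string carries, here is a small Python sketch that splits it and checks it against two term lists. Amazon's actual logic isn't public, so this is not their algorithm; the term lists are our own assumptions, chosen only to show which keywords point toward "Lesbian" and which toward "Mysteries."

```python
# The keyword string supplied in the metadata, as shown above.
raw = (
    "lesbian;queer;romance;lgbt;1950s;mystery;american;road trip;classic;"
    "new york;gay;coming of age;women;lgbtq;sexuality;homosexuality;"
    "lesbian fiction;20th century;suspense;america;patricia highsmith;"
    "lesbians;divorce;pulp;relationships;thriller;classics;literature;"
    "crime fiction;lesbianism;pulp fiction"
)

keywords = [kw.strip() for kw in raw.split(";") if kw.strip()]

# Assumed, illustrative term lists; not taken from Amazon or from BISAC.
LGBT_TERMS = {
    "lesbian", "queer", "lgbt", "gay", "lgbtq", "homosexuality",
    "lesbian fiction", "lesbians", "lesbianism",
}
MYSTERY_TERMS = {"mystery", "suspense", "thriller", "crime fiction"}

lgbt_hits = [kw for kw in keywords if kw in LGBT_TERMS]
mystery_hits = [kw for kw in keywords if kw in MYSTERY_TERMS]

print(f"{len(keywords)} keywords in total")
print("LGBT-related:", ", ".join(lgbt_hits))
print("Mystery-related:", ", ".join(mystery_hits))
```

The BISAC code only says "mystery," so in this sketch every LGBT signal comes from the keywords.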

Do you have questions about keywords, LGBT titles in the marketplace, or something else entirely? Send your inquiries to marketing@booknetcanada.ca and we'll do our best to answer them. We love digging into the data!