Exploring the Billions and Billions of Words in the HathiTrust Corpus with Bookworm: HathiTrust + Bookworm Project

An image that compares dancing trends over the whole corpus (similar to Google Ngram Viewer). From the HathiTrust + Bookworm project.

A comparison of "love" across US and UK works
This image shows a comparison across facets, such as the use of 'love' in US-published and UK-published works. Comparisons of this kind are what make Bookworm so powerful and novel.

Project Directors: J. Stephen Downie (Professor and Associate Dean for Research of Library & Information Science) and Erez Lieberman-Aiden (Assistant Professor of Genetics and Director of the Center for Genome Architecture, Baylor College of Medicine)

Grant Program: Digital Humanities Implementation Grants

Years: 2014-2016

This grant funded the enhancement and integration of the Bookworm analytical tool with the HathiTrust Digital Library, which holds 3.9 billion pages of digitized materials. Scholars would be able to build individual collections of materials to study and to discover new patterns of textual use across the corpus. The HathiTrust + Bookworm (HT+BW) Project provides scholars with new ways to explore trends within the massive HathiTrust corpus. Detailed exploration of metadata facets adds analytic value beyond what tools such as Google Ngram Viewer offer. The project enables scholars to explore personal worksets and aids the discovery of new works. It will also help the HathiTrust Research Center provide computational access to the HathiTrust corpus. Open-source improvements to the Bookworm code will increase its value to other large text projects.

For more information, visit the Project's website.
