Exploring the Billions and Billions of Words in the HathiTrust Corpus with Bookworm: HathiTrust + Bookworm Project
Project Directors: J. Stephen Downie (Professor and Associate Dean for Research of Library & Information Science) and Erez Lieberman-Aiden (Assistant Professor of Genetics and Director of the Center for Genome Architecture, Baylor College of Medicine)
Grant Program: Digital Humanities Implementation Grants
This grant funded the enhancement of the Bookworm analytical tool and its integration with the HathiTrust Digital Library, which holds 3.9 billion pages of digitized materials. The resulting HathiTrust + Bookworm (HT+BW) Project gives scholars new ways to explore trends within the massive HathiTrust corpus. Scholars can build individual collections of materials to study, explore personal worksets, discover new patterns of textual use across the corpus, and find new works. Detailed exploration of metadata facets adds analytic value beyond tools such as the Google Ngram Viewer. The project will also help the HathiTrust Research Center provide computational access to the HathiTrust corpus, and its open-source improvements to the Bookworm code will increase the tool's value to other large text projects.
For more information, visit the Project's website.