Publication . Other literature type . 2018

The Gutenberg English Poetry Corpus: Exemplary Quantitative Narrative Analyses

Jacobs, Arthur M.
Open Access
Published: 01 Jan 2018
Publisher: Freie Universität Berlin
Country: Germany

This paper describes a corpus of about 3,000 English literary texts, totaling about 250 million words, extracted from Project Gutenberg. The texts span a range of fiction and non-fiction genres and were written by more than 130 authors (e.g., Darwin, Dickens, Shakespeare). Quantitative narrative analysis (QNA) is used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC), which comprises over 100 poetic texts with around two million words from about 50 authors (e.g., Keats, Joyce, Wordsworth). Exemplary QNA studies show author similarities based on latent semantic analysis, significant topics for each author, and various text-analytic metrics, such as lexical diversity and sentiment, for George Eliot’s poem “How Lisa Loved the King” and James Joyce’s “Chamber Music.” The GEPC is particularly suited for research in Digital Humanities, Computational Stylistics, or Neurocognitive Poetics, e.g., as a training and test corpus for stimulus development and control in empirical studies.


410, 801, quantitative narrative analysis, digital literary studies, neurocognitive poetics, culturomics, language model, neuroaesthetics, affective-aesthetic processes, literary reading
