Publication . Preprint . Article . 2018

Explorations in an English Poetry Corpus: A Neurocognitive Poetics Perspective

Jacobs, Arthur M.
Open Access
Published: 06 Jan 2018

This paper describes a corpus of about 3000 English literary texts, totaling about 250 million words, extracted from Project Gutenberg. The texts span a range of genres from both fiction and non-fiction and were written by more than 130 authors (e.g., Darwin, Dickens, Shakespeare). Quantitative Narrative Analysis (QNA) is used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC), which comprises over 100 poetic texts with around 2 million words from about 50 authors (e.g., Keats, Joyce, Wordsworth). Some exemplary QNA studies show author similarities based on latent semantic analysis, significant topics for each author, and various text-analytic metrics, such as lexical diversity and sentiment, for George Eliot's poem 'How Lisa Loved the King' and James Joyce's 'Chamber Music'. The GEPC is particularly suited for research in Digital Humanities, Natural Language Processing, or Neurocognitive Poetics, e.g., as a training and test corpus, or for stimulus development and control.

27 pages, 4 figures


Computer Science - Computation and Language, Computation and Language (cs.CL), FOS: Computer and information sciences
