This paper describes a corpus of roughly 3,000 English literary texts, totalling about 250 million words, extracted from Project Gutenberg. The texts span a range of fiction and non-fiction genres and were written by more than 130 authors (e.g., Darwin, Dickens, Shakespeare). Quantitative Narrative Analysis (QNA) is used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC), which comprises over 100 poetic texts with around 2 million words by about 50 authors (e.g., Keats, Joyce, Wordsworth). Several exemplary QNA studies illustrate author similarities based on latent semantic analysis, significant topics for each author, and various text-analytic metrics, such as lexical diversity and sentiment, for George Eliot's poem 'How Lisa Loved the King' and James Joyce's 'Chamber Music'. The GEPC is particularly suited for research in Digital Humanities, Natural Language Processing, or Neurocognitive Poetics, for example as a training and test corpus or for stimulus development and control.
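As an illustration of the lexical-diversity metrics mentioned above, the following minimal sketch computes the type-token ratio (TTR) and a moving-average TTR (MATTR) in Python. The abstract does not specify which diversity measure, tokenizer, or window size the study uses, so the regex tokenizer, the helper names (`tokenize`, `type_token_ratio`, `moving_average_ttr`), and the default window of 50 tokens are all assumptions made for illustration.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens from a deliberately simple regex.

    Assumption: this is a hypothetical tokenizer for illustration only;
    it keeps internal apostrophes (e.g. "o'er") and splits on hyphens.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens: List[str]) -> float:
    """Number of distinct word types divided by the number of tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def moving_average_ttr(tokens: List[str], window: int = 50) -> float:
    """Mean TTR over every contiguous window of `window` tokens (MATTR).

    Assumption: the window size of 50 is an illustrative default, not a
    value taken from the paper. Texts no longer than the window fall back
    to the plain TTR.
    """
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    toy = tokenize("The cat sat on the mat")  # toy input, not corpus data
    print(type_token_ratio(toy))
    print(moving_average_ttr(toy, window=3))
```

For the toy sentence, `type_token_ratio` returns 5/6 (about 0.83), because "the" occurs twice among six tokens. MATTR is included because plain TTR tends to fall as texts get longer, which makes it awkward to compare a short lyric with a book-length poem; averaging over fixed-size windows reduces that length effect.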
27 pages, 4 figures