publication . Conference object . 2019

An annotated dataset of literary entities

David Bamman; Sejal Popat; Sheng Shen
Open Access
  • Published: 21 Jul 2019
  • Publisher: Association for Computational Linguistics
Abstract
We present a new dataset comprised of 210,532 tokens evenly drawn from 100 different English-language literary texts annotated for ACE entity categories (person, location, geo-political entity, facility, organization, and vehicle). These categories include non-named entities (such as “the boy”, “the kitchen”) and nested structure (such as [[the cook]’s sister]). In contrast to existing datasets built primarily on news (focused on geo-political entities and organizations), literary texts offer strikingly different distributions of entity categories, with much stronger emphasis on people and description of settings. We present empirical results demonstrating the p...
Subjects
free text keywords: Sister, Disparate impact, Natural language processing, Computer science, Artificial intelligence
Communities
Digital Humanities and Cultural Heritage