Journal of Biomedica...

Acquisition and evaluation of verb subcategorization resources for biomedicine

Authors: Laura Rimell; Thomas Lippincott; Karin Verspoor; Helen L. Johnson; Anna Korhonen

Abstract

Background: Biomedical natural language processing (NLP) applications that have access to detailed resources about the linguistic characteristics of biomedical language demonstrate improved performance on tasks such as relation extraction and syntactic or semantic parsing. Such applications are important for transforming the growing body of unstructured information buried in the biomedical literature into structured, actionable information. In this paper, we address the creation of linguistic resources that capture how individual biomedical verbs behave. We specifically consider verb subcategorization, or the tendency of verbs to "select" co-occurrence with particular phrase types, which influences the interpretation of verbs and the identification of verbal arguments in context. Only a limited number of biomedical resources currently contain information about subcategorization frames (SCFs), and these are the result of either labor-intensive manual collation or automatic methods that use tools adapted to a single biomedical subdomain. Either method may result in resources that lack coverage. Moreover, the quality of existing verb SCF resources for biomedicine is unknown, due to a lack of available gold standards for evaluation.

Results: This paper presents three new resources related to verb subcategorization frames in biomedicine, and four experiments making use of the new resources. We present the first biomedical SCF gold standards, capturing two different but widely used definitions of subcategorization, and a new SCF lexicon, BioCat, covering a large number of biomedical subdomains. We evaluate the SCF acquisition methodologies for BioCat with respect to the gold standards, and compare the results with the accuracy of the only previously existing automatically acquired SCF lexicon for biomedicine, the BioLexicon. Our results show that the BioLexicon has greater precision while BioCat has better coverage of SCFs. Finally, we explore the definition of subcategorization using these resources and its implications for biomedical NLP. All resources are made publicly available.

Conclusion: The SCF resources we have evaluated still show considerably lower accuracy than that reported with general English lexicons, demonstrating the need for domain- and subdomain-specific SCF acquisition tools for biomedicine. Our new gold standards reveal major differences when annotators use the different definitions. Moreover, evaluation of BioCat yields major differences in accuracy depending on the gold standard, demonstrating that the definition of subcategorization adopted will have a direct impact on perceived system accuracy for specific tasks.
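
The trade-off between precision and coverage described in the Results can be illustrated with a minimal sketch of type-level scoring, in which each (verb, frame) pair in an acquired lexicon is checked against a gold standard. This is not the authors' evaluation code: the function name, the frame labels, and the toy lexicons below are hypothetical and chosen only to show how one lexicon can be more precise while another covers more frames.

```python
from typing import Dict, Set, Tuple

# A lexicon maps each verb to the set of SCF labels recorded for it.
Lexicon = Dict[str, Set[str]]


def type_precision_recall(acquired: Lexicon, gold: Lexicon) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over (verb, frame) pairs.

    Only verbs that appear in the gold standard are scored.
    """
    gold_pairs = {(verb, frame) for verb, frames in gold.items() for frame in frames}
    acquired_pairs = {
        (verb, frame)
        for verb, frames in acquired.items()
        if verb in gold
        for frame in frames
    }
    correct = len(gold_pairs & acquired_pairs)
    precision = correct / len(acquired_pairs) if acquired_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data; the frame labels are illustrative, not taken from the paper.
gold = {"bind": {"NP", "NP-PP", "PP"}, "inhibit": {"NP"}}
precise_lexicon = {"bind": {"NP"}, "inhibit": {"NP"}}
broad_lexicon = {"bind": {"NP", "NP-PP", "PP", "S"}, "inhibit": {"NP", "PP"}}

for name, lexicon in [("precise", precise_lexicon), ("broad", broad_lexicon)]:
    p, r, f = type_precision_recall(lexicon, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

In this toy setting the first lexicon lists only correct frames but misses half of the gold frames, while the second recovers every gold frame at the cost of some spurious ones. Scoring the same lexicon against a gold standard built under a different definition of subcategorization would change both numbers, which is the effect the Conclusion describes.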

Subjects by Vocabulary

Microsoft Academic Graph classification: Phrase, Computer science, Context (language use), Verb, computer.software_genre, Semantics, Lexicon, Subcategorization, Parsing, business.industry, Relationship extraction, Artificial intelligence, business, computer, Natural language processing

Keywords

Biomedical Research, Abstracting and Indexing, Information Storage and Retrieval, Health Informatics, Lexical resources, Biomedical text processing, Natural Language Processing, Publications, Semantics, Computer Science Applications, Verb subcategorization
