publication . Preprint . 2017

Measurement Context Extraction from Text: Discovering Opportunities and Gaps in Earth Science

Hundman, Kyle; Mattmann, Chris A.
Open Access English
  • Published: 11 Oct 2017
We propose Marve, a system for extracting measurement values, units, and related words from natural language text. Marve uses conditional random fields (CRF) to identify measurement values and units, followed by a rule-based system to find related entities, descriptors, and modifiers within a sentence. Sentence tokens are represented by an undirected graphical model, and rules are based on part-of-speech and word dependency patterns connecting values and units to contextual words. Marve is unique in its focus on measurement context, and early experimentation demonstrates its ability to generate high-precision extractions with strong recall. We also discuss Marve's role in refining measurement requirements for NASA's proposed HyspIRI mission, a hyperspectral infrared imaging satellite that will study the world's ecosystems. In general, our work with HyspIRI demonstrates the value of semantic measurement extractions in characterizing quantitative discussion contained in large corpora of natural language text. These extractions accelerate broad, cross-cutting research and expose scientists to new algorithmic approaches and experimental nuances. They also facilitate identification of scientific opportunities enabled by HyspIRI, leading to more efficient scientific investment and research.
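The two-stage idea in the abstract (find a value and unit, then follow part-of-speech and dependency patterns to contextual words) can be illustrated with a minimal Python sketch. This is not Marve's implementation. It assumes a hand-annotated parse of one example sentence, a toy regex and unit list standing in for the CRF stage, and a single hypothetical rule ("a unit attached as `nmod` to a noun measures that noun"). The names `Token`, `find_measurements`, `related_words`, and `extract` are invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str   # coarse part-of-speech tag
    head: int  # index of the syntactic head, -1 for the root
    dep: str   # dependency label linking this token to its head


# Toy stand-in for the CRF stage: a number followed by a known unit.
NUMBER = re.compile(r"^\d+(\.\d+)?$")
UNITS = {"m", "km", "nm", "meters", "days"}  # illustrative list only


def find_measurements(tokens):
    """Return (value_index, unit_index) pairs for number + unit bigrams."""
    pairs = []
    for i in range(len(tokens) - 1):
        if NUMBER.match(tokens[i].text) and tokens[i + 1].text.lower() in UNITS:
            pairs.append((i, i + 1))
    return pairs


def related_words(tokens, unit_idx):
    """Hypothetical rule: if the unit is an nmod of a noun, that noun is the
    measured entity and its amod/compound children are descriptors."""
    unit = tokens[unit_idx]
    if unit.dep != "nmod" or unit.head < 0 or tokens[unit.head].pos != "NOUN":
        return None, []
    head_idx = unit.head
    descriptors = [
        t.text for t in tokens
        if t.head == head_idx and t.dep in ("amod", "compound")
    ]
    return tokens[head_idx].text, descriptors


def extract(tokens):
    """Combine both stages into one record per measurement."""
    records = []
    for value_idx, unit_idx in find_measurements(tokens):
        entity, descriptors = related_words(tokens, unit_idx)
        records.append({
            "value": tokens[value_idx].text,
            "unit": tokens[unit_idx].text,
            "entity": entity,
            "descriptors": descriptors,
        })
    return records


# Hand-annotated parse of: "The instrument has a spatial resolution of 60 meters"
SENTENCE = [
    Token("The", "DET", 1, "det"),
    Token("instrument", "NOUN", 2, "nsubj"),
    Token("has", "VERB", -1, "root"),
    Token("a", "DET", 5, "det"),
    Token("spatial", "ADJ", 5, "amod"),
    Token("resolution", "NOUN", 2, "obj"),
    Token("of", "ADP", 8, "case"),
    Token("60", "NUM", 8, "nummod"),
    Token("meters", "NOUN", 5, "nmod"),
]

if __name__ == "__main__":
    print(extract(SENTENCE))
```

For the example sentence, the sketch links the value "60" and unit "meters" to the entity "resolution" with the descriptor "spatial". A real system would obtain the tags and dependency edges from a parser and would need many more patterns than the single rule shown here.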
free text keywords: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Funded by
NSF | An open source framework for metadata exploration and discovery of Polar Data
  • Funder: National Science Foundation (NSF)
  • Project Code: 1348450
  • Funding stream: Directorate for Geosciences | Division of Polar Programs