publication . Conference object . 2019

From massive databases to Web of data: disambiguation and alignment of geographical entities in scientific texts

Pascal Cuxac; Alain Collignon; Stéphanie Gregorio; François Parmentier
  • Published: 09 Oct 2019
  • Publisher: HAL CCSD
  • Country: France
International audience. In this paper we present an automatic approach to disambiguating and aligning geographic entities. A method based on word embeddings, learned without supervision, resolves the ambiguity of polysemous terms. This in turn enables automatic alignment with external databases that expose a triplestore (BNF, Wikidata...). We then use semantic web technologies both to expose the data in a different way (data.istex) and to support complex queries that traditional search engines cannot answer. We discuss a concrete case based on the ISTEX database and propose a qualitative evaluation of the method.
Free text keywords: Web of Data, Linked Open Data, Automatic alignment, Disambiguation, Geographic entities, [SHS.INFO]Humanities and Social Sciences/Library and information sciences
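
The abstract does not detail how the embedding-based disambiguation works, so the following is only a minimal sketch of one common way such a step can be done: average the vectors of the words surrounding an ambiguous place name and pick the candidate entity whose description is closest by cosine similarity. The toy three-dimensional vectors, the `EMBEDDINGS` and `CANDIDATES` tables, and the function names are all assumptions made for illustration; they are not taken from the paper or from ISTEX.

```python
import math

# Toy "embeddings" invented for this example. In practice the vectors
# would come from a model trained on the corpus without supervision.
EMBEDDINGS = {
    "river":    [0.9, 0.1, 0.0],
    "basin":    [0.8, 0.2, 0.1],
    "sediment": [0.7, 0.3, 0.0],
    "capital":  [0.1, 0.9, 0.1],
    "city":     [0.0, 0.9, 0.2],
    "museum":   [0.1, 0.8, 0.3],
}

# Hypothetical candidate entities for the ambiguous name "Loire",
# each described by a few words that have a vector above.
CANDIDATES = {
    "Loire (river)": ["river"],
    "Loire (department)": ["capital", "city"],
}


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(context_words, candidates):
    """Return the candidate whose description best matches the context."""
    context = mean_vector(context_words)
    if context is None:
        return None  # no usable context, so no decision is made
    best_id, best_score = None, -1.0
    for candidate_id, description in candidates.items():
        vector = mean_vector(description)
        if vector is None:
            continue
        score = cosine(context, vector)
        if score > best_score:
            best_id, best_score = candidate_id, score
    return best_id


if __name__ == "__main__":
    # Context words taken from around a mention of "Loire" in a text.
    print(disambiguate(["sediment", "basin"], CANDIDATES))
    print(disambiguate(["city", "museum"], CANDIDATES))
```

Once a mention is resolved to a single candidate, its identifier could be matched against the corresponding entry in an external triplestore; that alignment step is not shown here.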
