publication . Conference object . 2020

Information Extraction Workflow for Digitised Entry-based Documents

Khemakhem, Mohamed; Gabay, Simon; Joyeux-Prunel, Béatrice; Romary, Laurent; Saint-Raymond, Léa; Rondeau Du Noyer, Lucie
Open Access English
  • Published: 26 May 2020
  • Publisher: HAL CCSD
  • Country: France
Abstract
International audience
Subjects
ACM Computing Classification System: ComputingMilieux_MISCELLANEOUS
free text keywords: Exhibition catalogue, Dictionary, Auction catalogue, TEI-XML, GROBID-Dictionaries, Catalogue d'Exposition, Dictionnaire, Catalogue de vente aux enchères, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], [SHS.HIST]Humanities and Social Sciences/History, [SHS.ART]Humanities and Social Sciences/Art and art history, [SHS.LITT]Humanities and Social Sciences/Literature

[1] Ian Gregory. Challenges and opportunities for digital history. Frontiers in digital humanities, 2014.

[2] Jose van Dijck. Big data, grand challenges: on digitization and humanities research. KWALON, 21:8–18.

[3] Mohamed Khemakhem, Luca Foppiano, and Laurent Romary. Automatic Extraction of TEI Structures in Digitized Lexical Resources using Conditional Random Fields. In electronic lexicography, eLex 2017, Leiden, Netherlands, September 2017. [OpenAIRE]

[4] James Pustejovsky and Amber Stubbs. Natural Language Annotation for Machine Learning: A guide to corpus-building for applications. O'Reilly Media, 2012.

[5] Mohamed Khemakhem, Axel Herold, and Laurent Romary. Enhancing Usability for Automatically Structuring Digitised Dictionaries. In GLOBALEX workshop at LREC 2018, Miyazaki, Japan, May 2018. [OpenAIRE]

[6] Mohamed Khemakhem, Carmen Brando, Laurent Romary, Frederique Melanie-Becquet, and Jean-Luc Pinol. Fueling Time Machine: Information Extraction from Retro-Digitised Address Directories. In JADH2018 "Leveraging Open Data", Tokyo, Japan, September 2018.

[7] David Lindemann, Mohamed Khemakhem, and Laurent Romary. Retrodigitizing and Automatically Structuring a Large Bibliography Collection. In European Association for Digital Humanities (EADH) Conference, Galway, Ireland, December 2018. [OpenAIRE]

[8] Lucie Rondeau Du Noyer, Simon Gabay, Mohamed Khemakhem, and Laurent Romary. Scaling up Automatic Structuring of Manuscript Sales Catalogues. In TEI 2019: Book of Abstracts, Graz, Austria, September 2019. [OpenAIRE]

[9] Simon Gabay, Lucie Rondeau Du Noyer, Mohamed Khemakhem, and Laurent Romary. Selling autograph manuscripts in 19th c. Paris: digitising the Revue des autographes. In AIUCD 2020: Proceedings of the Ninth Annual Conference, Milan, Italy, January 2020.
