Powered by OpenAIRE graph
Hyper Article en Ligne
Other literature type . 2017

Towards a IIIF-based corpus management platform

Authors: Daems, Joke; Chambers, Sally; Verbruggen, Christophe; Zere, Tecle

Abstract

The digital text platform is part of the Flemish contribution to DARIAH Belgium (DARIAH = Digital Research Infrastructure for the Arts and Humanities). The goal is to create a platform for the collaborative management and discovery of digitised textual collections that allows digital humanities researchers to prepare their corpora (consisting of, for example, digitised newspapers and books) for textual analysis. The platform will enable researchers to browse and search the digitised collections that they have compiled, cleaned, enriched and managed themselves. Once the relevant research sub-corpus has been compiled, data export tools using standardised open formats (such as XML, JSON, .csv and .txt) will enable researchers to export the sub-corpus for analysis with existing digital text analysis tools such as MALLET (http://mallet.cs.umass.edu/topics.php) for topic modelling, VOYANT (http://voyant-tools.org) for data visualisation or AntConc (http://www.laurenceanthony.net/software/antconc/) for concordance and textual analysis.

The platform has been conceived as part of a larger, modular virtual research environment service infrastructure (http://www.ghentcdh.ugent.be/projects/dariah-vl_vre.si). In a previous phase, possible frameworks and content management systems were tested, notably Islandora (a digital asset management system based on Fedora Commons and Drupal), but also MediaWiki and Omeka.

One of the main challenges for the envisaged new platform is integrating a wider variety of textual data streams (including a scan workflow). In addition, user-friendliness, scalability, adherence to standards and data interoperability are key issues to be addressed. The platform will build on the existing IIIF format, the International Image Interoperability Framework. This format is used by some of the most important libraries and cultural heritage institutions in the world and therefore provides access to enormous collections of digital objects. As the name suggests, IIIF is mainly focused on displaying and annotating images. However, we fully endorse the IIIF community's vision of developing an overarching interoperability framework for other data types, including all kinds of textual data. Benefits of the format include interoperability, the ease of sharing images and annotations without the need to exchange files, and support for multilingual data.

In the months leading up to the conference, we will evaluate the existing IIIF-powered digital libraries and research projects and how they deal with practices of co-creation, data cleaning and enrichment of (structural) metadata. OCR improvement will become vital, as digital textual analysis can only be performed well on high-quality textual data. A related challenge will be combining the various input formats and converting them to the different output formats required for analysis.

In our poster, we will present a summary of our experiences with, and a technical assessment of, our previous Islandora installation, in addition to our survey of the existing corpus management solutions. By way of conclusion, we will introduce the envisioned new version of the platform.
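The export step described in the abstract (compile a sub-corpus, then write it out in open formats such as JSON, .csv and .txt) can be illustrated with a minimal Python sketch. It is not the platform's implementation: the record fields (`id`, `type`, `year`, `text`), the sample documents and all function names are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical in-memory corpus; the field names and values are assumptions
# made for this sketch, not the platform's actual data model.
CORPUS = [
    {"id": "doc-1", "type": "newspaper", "year": 1914, "text": "First sample page."},
    {"id": "doc-2", "type": "book", "year": 1920, "text": "Second sample page."},
    {"id": "doc-3", "type": "newspaper", "year": 1918, "text": "Third sample page."},
]

FIELDS = ["id", "type", "year", "text"]


def select_subcorpus(corpus, doc_type, first_year, last_year):
    """Return the documents of one type within an inclusive year range."""
    return [
        doc for doc in corpus
        if doc["type"] == doc_type and first_year <= doc["year"] <= last_year
    ]


def to_json(docs):
    """Serialise the sub-corpus as a JSON array of records."""
    return json.dumps(docs, ensure_ascii=False, indent=2)


def to_csv(docs):
    """Serialise the sub-corpus as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(docs)
    return buffer.getvalue()


def to_plain_text(docs):
    """Map a file name to the raw text of each document (one .txt per document)."""
    return {f"{doc['id']}.txt": doc["text"] for doc in docs}


if __name__ == "__main__":
    subcorpus = select_subcorpus(CORPUS, "newspaper", 1914, 1918)
    print(to_json(subcorpus))
    print(to_csv(subcorpus))
    print(to_plain_text(subcorpus))
```

The exporters return strings or a name-to-text mapping rather than writing files, so the same selection can feed any of the three formats. Which format a given analysis tool expects should be checked against that tool's own documentation.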

Countries
Belgium, France
Keywords

[INFO.INFO-TT] Computer Science [cs]/Document and Text Processing, [SHS] Humanities and Social Sciences, Digital Humanities, Text analysis, DARIAH, DARIAH-BE, Corpus management

  • Impact by BIP!
    citations
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Related to Research communities
Digital Humanities and Cultural Heritage