Powered by OpenAIRE graph

The Karjala database – challenges and solutions for digitizing heterogeneous, old genealogical documents for internet use

Authors: Saarti, Jarmo; Ropponen, Jari; Soivanen, Satu

Abstract

The Karjala database contains digitized demographic data from the parish registers of the regions ceded to the Soviet Union in 1944. The objectives of the digitization project have been to promote access to digitized records for scientific research and genealogy and to encourage research on the people of the ceded Karelia region. The main sources for the database have been catechetical lists, lists of children, and registers of vital statistics (registers of births, marriages, migrations and deaths) from the period 1681–1949 that are available in the Digital Archives of the National Archives of Finland. The database holds about 10.3 million entries, but only data older than 100 years is published openly on the Internet: according to decisions by the Finnish data protection authorities, the Personal Data Act applies to personal registers less than 100 years old. The digitization process is still ongoing; an estimated 1.2 million entries remain to be processed. The database is available to users via https://katiha.mamk.fi/. At present, about 6.5 million file entries are available on the Internet, each presenting data about one individual, e.g. names, dates of birth and death, cause of death, age, gender, marital status, occupation, residence, migration, and parish. The Karjala database can be used for diverse research purposes, and it improves access to church records that are sometimes very difficult to read. Information in the database can be utilized in historical research, medical genetics, social sciences, family history and onomastics, for example to clarify family structures, migratory patterns or child mortality. The database also offers excellent opportunities for interdisciplinary research.
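The 100-year publication rule described above can be illustrated with a minimal Python sketch. The record layout, the field names and the choice to measure age from the date of the register event are assumptions made for illustration only; they are not taken from the Karjala database schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# From the abstract: only data older than 100 years is published openly.
EMBARGO_YEARS = 100


@dataclass
class Entry:
    """Hypothetical record for one individual (illustrative fields only)."""
    name: str
    parish: str
    event_date: Optional[date]  # date of the register event; None if unreadable


def years_before(day: date, years: int) -> date:
    """Return the same calendar day `years` earlier (29 Feb falls back to 28 Feb)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def is_publishable(entry: Entry, today: date) -> bool:
    """True if the entry is strictly older than the embargo period."""
    if entry.event_date is None:
        # Assumption: entries without a usable date are withheld.
        return False
    return entry.event_date < years_before(today, EMBARGO_YEARS)
```

A call such as `is_publishable(entry, date.today())` would then decide whether a single entry belongs in the openly published subset.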
Our presentation will describe the management of the digitization process for old, handwritten documents that consist of unstructured data and contain varied linguistic material: several languages from a historical period in which nations, states and languages were still evolving, different calendars, varying spelling rules, etc. We will also introduce our plans to use text recognition technology so that handwritten documents such as those in the Karjala database can be incorporated into the international READ project network (http://read.transkribus.eu/network/). We will discuss the challenges encountered with this type of heterogeneous data and the possibilities for more defined and structured data management that could enable automated use of the database. Finally, we will describe the different phases in the evolution of the database, emphasizing its linkage with internet technologies, e.g. how they have either hindered or enabled the digitization project.
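One concrete instance of the "different calendars" problem is that a date recorded in the Julian calendar has to be normalized before it can be compared with Gregorian dates. The following Python sketch uses the standard Julian Day Number arithmetic for that conversion. It is not the project's own method, and the function name is hypothetical; which registers actually used which calendar would have to be established from the sources.

```python
from datetime import date

# Julian Day Number of the proleptic Gregorian date 0001-01-01,
# which Python's date type treats as ordinal 1.
_JDN_OF_ORDINAL_ONE = 1721426


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian-calendar date to the equivalent Gregorian date."""
    # Shift the year so that it starts in March; this places the
    # leap day at the end of the shifted year.
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    # Julian Day Number for a Julian-calendar date.
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return date.fromordinal(jdn - _JDN_OF_ORDINAL_ONE + 1)
```

Storing the converted date next to the date as written, rather than in place of it, keeps the transcription faithful to the source while still allowing records to be sorted and compared.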

Country
France
Keywords

Karelia, digitization, [INFO.INFO-DL] Computer Science [cs]/Digital Libraries [cs.DL], genealogical documents, Finland, handwriting
