The Blog

Our blog articles are written by members of the project team. The topics cover a broad spectrum: reports on conferences we attended, reflections on current challenges and themes, internal results and progress reports, case studies from the digital humanities, and news about recent publications from the project. Blog posts are mostly written in English, but occasionally in the author's own language. We are a multilingual team, after all! We wish you an enjoyable read.

 

NewsEye and the National Library of Finland

As the NewsEye project will come to an end in January 2022, we are looking back on what we have accomplished since May 2018. Within our final blog series, we are spotlighting the partners and people who have contributed to the project over its 45-month lifespan.

The National Library of Finland is a cultural heritage organization that is open to all and provides nationwide services to citizens, scientific communities and other societal operators. It is part of the University of Helsinki. The Library’s mission is to secure the availability of cultural heritage published in Finland or relating to Finland, and to transmit and produce information content for research, studies, citizens and society. The Library develops its services in cooperation with libraries, archives, museums and other organizations. Its Research Library and its Digital Humanities Research Unit were partners in the NewsEye project.

Three people from the Digital Humanities Unit were most closely involved in the project. Juha Rautiainen’s strong expertise in IT, data and digital libraries was essential to our participation. Kimmo Kettunen’s versatile knowledge of and experience in linguistics and machine learning helped us in the implementation. Minna Kaukonen is a specialist in project management for digital library development. The National Library participated in two work packages: data collection and preservation and dissemination. Our experts were also involved in writing and evaluating deliverables and in disseminating the results in various presentations.

The Finnish newspaper materials in the project comprised 10 titles, with coverage extending up to 1918. Three of them are still published today, which is remarkable and reflects the central position of newspapers in Finnish society. These three titles include the biggest Finnish-language newspaper and the biggest Swedish-language newspaper. All the newspapers in the project were chosen for their significance and history, in cooperation with Finnish digital humanities researchers. The timeframe could not extend beyond the 1910s due to copyright restrictions. Half of the chosen pages were in Swedish, our other official language. Swedish is prevalent in the older titles because of our common history with Sweden, and it remained so even while Finland was a grand duchy under Russian rule from 1809 to 1917.

The cooperation between computer scientists, digital humanities researchers and library professionals was valuable and productive. One of the areas developed was text recognition, which is a key factor in the quality and usability of digitized historical resources. For us, the most important result of the NewsEye project was the improved text recognition of newspapers, and with it the possibility of improving search results in our digital library, digi.nationallibrary.fi. In the course of the project, optical character recognition (OCR) evolved into automatic text recognition (ATR). The Transkribus platform that was used was originally developed for handwritten text but has been successfully extended to process printed text as well. Because the historical Finnish newspapers contain a considerable amount of Gothic type, the critical improvement is that Transkribus now recognizes it better than the software we used previously.

We have decided to re-process all the older newspaper titles from 1771–1914, about 2.5 million pages altogether, including the titles in the NewsEye project. Compared to the earlier optical character recognition results, the quality of ATR is considerably better: about 10 percentage points on average. We are investigating the possibility of also processing newspapers published after 1914. In this extended process, we are collaborating with the READ-COOP cooperative, which uses artificial intelligence to improve the usability of historical resources. The National Library of Finland is a member of READ-COOP.

The participation of the National Library of Finland in NewsEye has proven to be a remarkable success: it has improved the quality of the newspapers in our digital library, strengthened our networks, and opened future paths for research together with the consortium.

To learn more about Finnish newspaper history, please read the earlier NewsEye blog post: https://www.newseye.eu/blog/news/building-a-bilingual-nation/