Blog

Blog posts are written by project team members. Topics range from conferences we attend and musings on relevant current affairs to internal project findings and news. More succinct content can be found in our Digital Humanities case studies and project-related publications. Posts are mainly in English but will from time to time appear in the language a project team member prefers, since we are a multilingual bunch! Happy reading!

 

NewsEye and the Austrian National Library (ONB): An Interview

As the NewsEye project comes to an end in January 2022, we are looking back on what we have accomplished since May 2018. In our final blog series, we are spotlighting the partners and people who have contributed to the project over its 45-month lifespan.

Can you present the Austrian National Library (ONB) and its role in the NewsEye project?

The Austrian National Library (ONB) is the central academic library of the Republic of Austria, with a history dating back to the 14th century. As the successor of the court library of the Habsburg Empire, it holds historical collections that are among the most important worldwide.

A large part of the historical book collection has been digitised; more than 600,000 books with more than 200 million pages are available online. For almost 20 years, ONB has also been active in newspaper digitisation. ONB’s newspaper portal ANNO (Austrian Newspapers Online) gives full-text access to around 1,500 historical newspapers from the 16th century to the early 1950s, with more than 25 million pages... With its extent and its temporal and geographical spread, it can serve as a perfect basis for quantitative analysis of image and text data in the humanities. As such, ANNO formed a foundation for the material required for the analysis tools in NewsEye.

ONB has been involved in several European projects about improving digitisation of historical texts, among them IMPACT and Europeana Newspapers.

What have you learned from working on the project, and how does it relate to your institution?

NewsEye relates to the key strategic aims of ONB, in particular in the domain of Digital Humanities. Providing digitised collections as data for research is a core goal of ONB. With ONB Labs, ONB provides an infrastructure for data reuse by researchers and for creative experimentation. ONB is constantly working on improving the quality of its data, which is why it was important for us to be involved in NewsEye. The results in improving the quality of OCR (Optical Character Recognition) and in the emerging technologies of ATR (Automatic Text Recognition) were very promising, and the collaboration between collection experts from the library and computer scientists in the project was very fruitful.

The other key area for ONB is advancing the state of the art in article separation. ONB is currently investigating how the NewsEye article separation workflow could be integrated into its own workflows in order to provide access to parts of the newspaper collection at article level (though this is definitely a long-term goal which might require additional projects). Our collaboration with the different DH groups in the project was also very fruitful, as it taught us a lot about the requirements and priorities of DH researchers. This will definitely inform the improvement of our services for Digital Humanities. A survey on the services and usability of our newspaper portal ANNO, carried out by NewsEye, will help us improve the service in future iterations. We collaborated particularly closely with the DH group in Innsbruck, and we will use educational material developed in the project in our activities with schools.

What have you created during the project?

The dissemination of project results and sustainability planning were of particular relevance for ONB as leader of Work Package 7 (which focused on Demonstration, Dissemination, Outreach and Exploitation). ONB was responsible for the external communication of the project throughout the first three years, including via social media channels. Among other activities, ONB created the NewsEye logo and corporate identity (CI), the NewsEye website with its blog and publications section, the social media channels, an infographic and leaflets, and was responsible for the NewsEye policy brief. ONB also organised the first user workshop and supported the National Library of France with the NewsEye Conference.

How will your results be sustained after the end of the project?

We are currently working with the University of La Rochelle (ULR) on transferring knowledge, software code and data from the NewsEye project into ONB Labs. The first step is to integrate the NewsEye platform into ONB Labs, providing access to ONB data via the platform. In the future, we will investigate whether the whole NewsEye data pipeline (OCR, article separation, named entities, search index) can be applied to the ANNO corpus at ONB. Code will be available on our public GitLab platform. We will also continue developing new features of the platform together with ULR after the end of the project.

How were your viewpoints and priorities shaped by your status as a national library? Who were the members of your team?

We were interested in the use cases for our newspaper data provided by the DH groups, in augmenting existing newspaper data using, e.g., NER (Named Entity Recognition), in the complexity of processing large data corpora as shown by the CS departments, and in learning from the other libraries how they tackle the same problems. Our team members included Tonica Hunter, Simon Mayer, Christoph Steindl, Georg Petz and Max Kaiser.