Blog

Blog posts are written by project team members. Topics range from conferences we attend and musings on relevant current affairs to internal project findings and news. More succinct content can be found in our Digital Humanities case studies or project-related publications. Blog posts will mainly be posted in English but will from time to time appear in the language of the project team member’s preference, since we are a multilingual bunch! Happy reading!

 

Taking Stock of NewsEye with the University of Rostock

As the NewsEye project will come to an end in January 2022, we are looking back on what we have accomplished since May 2018. Within our final blog series, we are spotlighting the partners and people who have contributed to the project over its 45-month lifespan.

Located close to the Baltic Sea in the northern German state of Mecklenburg-Western Pomerania, the University of Rostock celebrated the 600th anniversary of its founding in 2019. With a student body of about 14,000 (as of 2021), the university boasts an array of faculties and a wide range of study programmes. Professor Roger Labahn, who led Rostock’s involvement in the NewsEye project, has witnessed many changes at the university since obtaining a doctorate in Discrete Mathematics there in 1987, when the city was still part of the German Democratic Republic.

The first decade or so of his career focused on mathematical theory; around the year 2000, he began to develop more of an interest in Applied Mathematics (a field now closely related to Computer Science). Along with Labahn, three doctoral students contributed to the university’s participation in the project: Max Weidemann, Johannes Michael and Bastian Laasch. All of them are or have been members of the university’s Computational Intelligence Technology Lab (CITlab), which focuses on Artificial Intelligence and Machine Learning technology.

Rostock’s involvement in the NewsEye project mainly took the form of a work package on the theme of ‘Text Recognition and Article Separation’. Text Recognition, in this case, refers to Optical Character Recognition (OCR), which the Cambridge Dictionary defines as ‘the process by which an electronic device recognises printed or written letters or numbers’. In the context of the NewsEye project, OCR was used to convert printed newspaper text into machine-readable, searchable text. The concept of Article Separation is slightly more self-evident: it refers to dividing newspaper pages into the individual articles they contain.

In practice, however, this process posed some challenges, as the newspaper collections used in the project varied by country (France, Austria and Finland), language (French, German, Swedish, Finnish and English) and time period (roughly 1850 to 1950). As a result, page layouts were not always consistent, which made it more difficult to automate the recognition of individual articles. Nonetheless, the Rostock team made many advances in article separation techniques. If you are interested in exploring these tools, they are available on the NewsEye GitHub page (https://github.com/NewsEye/Article-Separation).

Rostock’s participation in the NewsEye project has served to boost the already sizable international reputation of both the university and CITlab, following the success of the Recognition and Enrichment of Archival Documents (READ) project (which included another NewsEye partner, the University of Innsbruck). To learn more about the University of Rostock’s involvement in the NewsEye project, listen to the NewsEye podcast episode (in German) in which Roger Labahn talks with Martin Gasteiner (University of Vienna): „KI unter genauer Überwachung durch die Mathematik?“ (‘AI under close supervision by mathematics?’).