Computer Science and Digital Humanities: The Role of the University of Helsinki in the NewsEye Project

As the NewsEye project will come to an end in January 2022, we are looking back on what we have accomplished since May 2018. Within our final blog series, we are spotlighting the partners and people who have contributed to the project over its 45-month lifespan.

We joined the NewsEye project with three subteams from the University of Helsinki: computer scientists, historians and librarians (who were mentioned in the previous blog post about the National Library of Finland). In this post we reflect on the project experiences of the computer scientists and historians.

It’s in the DNA of computer science to collaborate with other fields. As a scientific discipline, we draw motivation and research problems from domains that offer application potential for novel computational solutions. The analysis of historical newspapers is clearly such a domain, and it was natural to start cross-disciplinary collaboration both within the University of Helsinki and with other project partners.

Collaboration and integrated interdisciplinarity have been the key concepts for digital humanities (DH) at the University of Helsinki for the past decade. Unlike in computer science, this entails a fundamental change in the working habits of historians, who have customarily worked and published on their own. The required cultural change from individual scholarship to larger research collaboration readily creates challenges of several kinds. New questions arise about the everyday life of researchers, starting with communication methods and best practices with respect to publications.

The collaboration in NewsEye has been fun and fruitful – but at times also challenging. The best part was when you learned something yourself, based on the complementary (if not even contradictory) views of scientists from other fields. Take, for instance, the hermeneutic research tradition and its interpretative emphasis, which is in strong opposition to the quantitative approach favored by computer and data scientists.

In the NewsEye project, the computer science team developed algorithms and software for the analysis of historical newspaper collections and for reporting about the results. The analysis involved methods for so-called topic modeling, on the one hand, and automation of the analysis process on the other hand. For reporting analysis results, we developed natural language generation tools that automatically produce textual reports for the user. We also collaborated with historians on a number of case studies.

Reflecting on NewsEye from a historian's perspective, collaboration with computer scientists always entails a difficulty in assessing how methods from natural language processing can be adapted to research questions that interest historians. Often, the answer is simply that they cannot be adapted without laborious testing and adjustment. For all of the studies we collaborated on, a considerable part of the reporting included a discussion on how to interpret the results of algorithmic methods in light of, and together with, other historical evidence. In our experience, there are two steps to making natural language processing methods relevant for inquiry in the humanities. First, the methods cannot be used as off-the-shelf applications, but require testing for appropriateness in order to produce purposeful results. Second, for the results to be interpreted in a meaningful way, it has to be clear from the very start that the results of an algorithmic analysis cannot stand alone, but need to be studied in conjunction with other knowledge.

The main impact of the NewsEye project on the work of the historians at the University of Helsinki has been that it has provided us with several different case studies that have improved both our working methods and the quality of historical newspaper data. Beyond the case studies themselves, this can be considered an investment in the future, because historians will build on this work as they continue trying to understand the development of public discourse in nineteenth-century Europe.

The scientific results are available in a number of publications and the code in GitHub repositories, so those interested can build on our work.

Fig. 1. Clusters (Marjanen, Jani, Jussi Kurunmäki, Lidia Pivovarova, and Elaine Zosa. "The expansion of isms, 1820–1917: Data-driven analysis of political language in digitized newspaper collections". Journal of Data Mining and Digital Humanities, HistoInformatics (December 2020)).

COVID-19 influenced our work in a couple of ways. A good amount of travel time and funding was saved when project meetings and conferences were moved online. At the same time, communication between partners became less fluid due to the extra friction created by video meetings and other online communication. Luckily, the project had a strong start before COVID-19 with physical meetings, so project staff had become acquainted with each other before the lockdowns. In Helsinki, we had set up an arrangement with weekly visits by one of the historians to the computer science group to facilitate more immediate interaction across disciplinary borders, but this type of collaboration came to a halt in February 2020.

Our team members included the following people:

  • Computer Science: Dr. Mark Granroth-Wilding, Leo Leppänen, Elaine Zosa, Dr. Lidia Pivovarova, Jari Avikainen, and several MSc students

  • History: Jani Marjanen, Ruben Ros, Simon Hengchen, Mikko Tolonen, and MA students