The blog

NewsEye blog posts are written by members of our project team. Topics include the conferences we attend, reflections on relevant current issues, news and progress from our project, and shorter pieces shared as part of our digital humanities case studies or our project-related publications. Posts are published mainly in English, but are occasionally offered in the preferred language of the team member concerned, since our small troupe speaks several languages! Happy reading!

Taking the Lead: La Rochelle University and the NewsEye Project

As the NewsEye project comes to an end in January 2022, we are looking back on what we have accomplished since May 2018. In our final blog series, we spotlight the partners and people who have contributed to the project over its 45-month lifespan.

The Role of La Rochelle University in the NewsEye Project: How Did It Start?

With advances in digitization and in algorithms based on machine learning and deep learning, research projects devoted to analysing and enriching digitized newspapers have multiplied. NewsEye gave us the opportunity to work in an interdisciplinary way with computer scientists, humanities researchers, and librarians from different countries, universities, and national libraries.

NewsEye grew out of combining the existing bilateral collaborations of La Rochelle University (ULR) in response to a highly relevant Horizon 2020 call for proposals. ULR was already collaborating with the University of Helsinki on NLP and data mining, with the National Library of France on digital libraries, and with the University of Innsbruck on digital history, with plans to analyse a smaller collection of Austrian historical newspapers. The topic 'European cultural heritage, access and analysis for a richer interpretation of the past' (CULT-COOP-09-2017) of the 2017 Horizon 2020 call offered the opportunity to join forces and build a large project proposal, supported by seed funding from the French National Research Agency (MRSEI programme of ANR). We then set about refining the project's goals and extending the consortium to bring in the necessary expertise. We were able to meet twice to prepare the proposal. This considerably improved its quality and helped kick-start the project, because it strengthened our mutual understanding and aligned our expectations, which were already very high on the first day of the project.

What has been produced for NewsEye at La Rochelle University?

One of La Rochelle University's main accomplishments was the development of the NewsEye platform (https://platform.newseye.eu), a prototype interface for digitized newspapers created as a proof of concept. Once the newspaper articles and their full text have been extracted, they are uploaded to the NewsEye platform, which presents the textual content as data that the university can mine and analyze to extract relevant insights.

La Rochelle University focused on providing a platform for finer-grained document analysis. To that end, we extracted locations, person names, and organizations (a task known as named entity recognition), as well as events and the stances (opinions) expressed toward entities. One of the main challenges in the project was that the text collections were very noisy, both because the output of optical character recognition (OCR) was imperfect and because layout analysis and article segmentation produced imperfect results.

After more than a year of fruitful online meetings with colleagues from the project, we were able to produce meaningful results. Our methods for named entity recognition and linking (NER, NEL) obtained state-of-the-art results and showed that it is possible to build methods that are robust to OCR noise and that mitigate the errors caused by spelling variation in historical newspapers. The models were also evaluated in international competitions, where they took first place on most leaderboards. For instance, when our colleagues from the Impresso project organized the CLEF HIPE 2020 competition on named entity recognition and linking in historical newspapers, our approaches performed best in all three languages, with 50 first places and 2 second places out of 52 leaderboards, among 13 participating teams. We also developed stance detection systems to enrich each extracted named entity with the stance (opinion) of the article's author toward it. Once entities were extracted and linked to knowledge bases, La Rochelle University detected events in the newspaper articles. Taken together, these steps can be seen as building blocks of historical knowledge, from which historians and humanities researchers can form their ideas about the past and future. If you are interested in exploring these tools, you can find them in the NewsEye repositories on GitHub (https://github.com/NewsEye/Named-Entity-Recognition and https://github.com/NewsEye/event-detection). Even though the project is now ending, we plan to keep improving our results and to find new use cases in upcoming projects.

We hope to push the field of digital humanities further in La Rochelle, in particular around the access to and analysis of large collections of digitized historical documents. Several research projects have already started, supported by public and/or private funding. In this context, we intend to further develop the results of NewsEye and to put them to use. Notably, funding has already been secured for several years of further development of the NewsEye platform and the accompanying semantic enrichment tools (NER, NEL, stance and event detection). We are thus happy to welcome interdisciplinary collaborations and look forward to grounding this line of work in the upcoming digital humanities center of La Rochelle University and in EU-CONEXUS, the European University to which it belongs (together with another NewsEye partner, the University of Rostock).

Presenting the team

Multiple people from this university contributed to the project on different topics and with various levels and forms of involvement: Guillaume Bernard, Emanuela Boros, Mickaël Coustaty, Antoine Doucet, Cyril Faucher, Esteban Frossard, Petra Gomez, Ahmed Hamdi, Axel Jean-Caurant, José Moreno, Nhu Khoa Nguyen, Thi Tuyet Hai Nguyen, Elvys Linhares Pontes, Christophe Rigaud, Nicolas Sidère and Cyrille Suire.

This video, which was created by Cédric Rochereul of La Rochelle University, presents the project through the words of collaborators from different countries, institutions and professional backgrounds, including Axel Jean-Caurant and Antoine Doucet.