Research Publications

Browse the articles published by the NewsEye project on this page. All of these publications are also available on our Zenodo page and archived in the European Commission's OpenAIRE repository.



Marjanen, Jani, Vaara, Ville, Kanner, Antti, Roivainen, Hege, Mäkelä, Eetu, Lahti, Leo, & Tolonen, Mikko. (2019). A National Public Sphere? Analyzing the Language, Location, and Form of Newspapers in Finland, 1771–1917. Journal of European Periodical Studies 4.1 (summer 2019), 55–78.

Pontes, E. L., Huet, S., Torres-Moreno, J.-M., da Silva, T. G., & Linhares, A. C. (2020). A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming. Journal of Computación y Sistemas: Vol. 24, No. 2, 2020.

Pfanzelter, E., Oberbichler, S., Marjanen, J., Langlais, P.-C., & Hechl, S. (2021). Digital interfaces of historical newspapers: opportunities, restrictions and recommendations. Journal of Data Mining and Digital Humanities, In press, HistoInformatics.

Marjanen, J., Kurunmäki, J., Pivovarova, L., & Zosa, E. (2020). The expansion of isms, 1820–1917: Data-driven analysis of political language in digitized newspaper collections. Journal of Data Mining and Digital Humanities, 2020, HistoInformatics.

Nguyen, Thi-Tuyet-Hai, Jatowt, Adam, Coustaty, Mickael, & Doucet, Antoine. (2021). Survey of Post-OCR Processing Approaches. ACM Computing Surveys, 1, 1 (March 2020), 36.


Book Chapter/Section

Nguyen, T.-T.-H., Coustaty, M., Doucet, A., Jatowt, A., & Nguyen, N.-V. (2018). Adaptive Edit-Distance and Regression Approach for Post-OCR Text Correction. Maturity and Innovation in Digital Libraries, 278–289.

Mutuvi, S., Doucet, A., Odeo, M., & Jatowt, A. (2018). Evaluating the Impact of OCR Errors on Topic Modeling. Maturity and Innovation in Digital Libraries, 3–14.

Michael, Johannes, Weidemann, Max, Laasch, Bastian, & Labahn, Roger. (2021). ICPR 2020 Competition on Text Block Segmentation on a NewsEye Dataset. Lecture Notes in Computer Science (LNCS, volume 12668). Springer.


Avikainen, J. (2019). A Method for Wavelet-Based Time Series Analysis of Historical Newspapers.

Hechl, S. P. (2020). ‘Wir dürfen wieder Österreicher sein!’ Die Rolle der Tagespresse in österreichischen Nation-Building-Prozessen 1945–1948 – eine quantitative Analyse ausgewählter digitaler Zeitungskorpora samt Vorschlägen zur didaktischen Umsetzung.

Conference Papers

22nd International Academic Mindtrek Conference, 10th - 11th October 2018
Alhalaseh, Rola, Munezero, Myriam, Leinonen, Miika, Leppänen, Leo, Avikainen, Jari, & Toivonen, Hannu. (2018). Towards Data-Driven Generation of Visualizations for Automatically Generated News Articles. ACM, Association for Computing Machinery. 

ACM/IEEE Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019
Sumikawa, Y., Jatowt, A., Doucet, A., & Moreux, J.-P. (2019). Large Scale Analysis of Semantic and Temporal Aspects in Cultural Heritage Collection's Search.

Hamdi, A., Jean-Caurant, A., Sidere, N., Coustaty, M., & Doucet, A. (2019). An Analysis of the Performance of Named Entity Recognition over OCRed Documents.

Nguyen, T.-T.-H., Jatowt, A., Coustaty, M., Nguyen, N.-V., & Doucet, A. (2019). Deep Statistical Analysis of OCR Errors for Effective Post-OCR Processing.

IFLA WLIC Conference, Athens, Greece, 24th-30th August 2019
Rautiainen, J. (2019). Opening Digitized Newspapers for Different User Groups - Successes and Challenges. Zenodo. 

Recent Advances in Natural Language Processing (RANLP), Bulgaria, 2-4 September 2019
Zosa, E., & Granroth-Wilding, M. (2019). Multilingual Dynamic Topic Model. Zenodo. 

15th International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20-25th September 2019
Michael, J., Labahn, R., Gruning, T., & Zollner, J. (2019). Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition.

Nguyen, T. T. H., Jatowt, A., Coustaty, M., Nguyen, N. V., & Doucet, A. (2019). Post-OCR Error Detection by Generating Plausible Candidates.

Rigaud, C., Doucet, A., Coustaty, M., & Moreux, J.-P. (2019). ICDAR 2019 Competition on Post-OCR Text Correction

Language Technology for Digital Historical Archives (Workshop collocated with RANLP 2019) (LT-DHA 2019), Varna, Bulgaria, 5th September 2019
Pivovarova, L., Marjanen, J., & Zosa, E. (2019). Word Clustering for Historical Newspapers Analysis. 

HistoInformatics2019 - the 5th International Workshop on Computational History (HistoInformatics2019), Oslo, Norway, 12th September 2019
Marjanen, J., Pivovarova, L., Zosa, E., & Kurunmäki, J. (2019). Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings.

21st International Conference on Asia-Pacific Digital Libraries (ICADL 2019), Kuala Lumpur, Malaysia, 4th-7th November 2019
Linhares Pontes, E., Hamdi, A., Sidere, N., & Doucet, A. (2019). Impact of OCR Quality on Named Entity Linking. Published in Digital Libraries at the Crossroads of Digital Information for the Future, Springer LNCS, pp. 102-115 (978-3-030-34057-5).

Digital Humanities in the Nordic Countries (DHN), Riga, Latvia, 17th - 20th March 2020
Kettunen, Kimmo, & La Mela, Matti. (2020). Digging Deeper into the Finnish Parliamentary Protocols – Using a Lexical Semantic Tagger for Studying Meaning Change of Everyman's Rights (allemansrätten). Zenodo.

Zosa, E., Hengchen, S., Marjanen, J., Pivovarova, L., & Tolonen, M. (2020). Disappearing Discourses: Avoiding anachronisms and teleology with data-driven methods in studying digital newspaper collections. Zenodo.

Ros, Ruben, & Oberbichler, Sarah. (2020). The Helsinki Digital Humanities Hackathon: Two Perspectives on Multidisciplinary Historical Newspapers Research in a Hackathon Context. Zenodo. 

10th Temporal Web Analytics Workshop (TempWeb), Taipei, 20th April 2020
Martinc, Matej, Montariol, Syrielle, Zosa, Elaine, & Pivovarova, Lidia. (2020). Capturing Evolution in Word Usage: Just Add More Clusters?

12th Edition of the Language Resources and Evaluation Conference (LREC 2020), Marseilles, France, 13th-15th May 2020
Frossard, Esteban, Coustaty, Mickael, Jatowt, Adam, & Hengchen, Simon. (2020). Dataset for Temporal Analysis of English-French Cognates. Zenodo. 

Mutuvi, Stephen, Doucet, Antoine, Lejeune, Gael, & Odeo, Moses. (2020). A Dataset for Multi-lingual Epidemiological Event Extraction. Zenodo. 

Zosa, E., Granroth-Wilding, M., & Pivovarova, L. (2020). A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval.

ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020), Wuhan, Hubei, P. R. China, 1st-5th August 2020
Pontes, E. L., Doucet, A., & Moreno, J. G. (2020). Linking Named Entities across Languages using Multilingual Word Embeddings.

Nguyen, T.-T.-H., Jatowt, A., Nguyen, N.-V., Doucet, A., & Coustaty, M. (2020). Neural Machine Translation with BERT for Post-OCR Error Detection and Correction.

2020 European Conference on Information Retrieval (ECIR 2020), Lisbon, Portugal, 14th-17th April 2020
Pivovarova, L., Jean-Caurant, A., Avikainen, J., Alnajjar, K., Granroth-Wilding, M., Leppänen, L., Zosa, E., & Toivonen, H. (2020). Personal Research Assistant for Online Exploration of Historical News.

Digital Humanities 2020 (DH 2020), Ottawa, Canada, 20th-25th July 2020
Doucet, A., Gasteiner, M., Granroth-Wilding, M., Kaiser, M., Kaukonen, M., Labahn, R., Moreux, J.-P., Muehlberger, G., Pfanzelter, E., Therenty, M.-E., Toivonen, H., & Tolonen, M. (2020). NewsEye: A digital investigator for historical newspapers.

14th International Conference on Data Analytics in Logistics (ICDAL 2020), Dubai, United Arab Emirates, 17th-18th December 2020
Huynh, V.-N., Hamdi, A., & Doucet, A. (2020). When to use OCR post-correction for named entity recognition?

Conference and Labs of the Evaluation Forum (CLEF 2020), online
Boros, E., Pontes, E. L., Cabrera-Diego, L. A., Hamdi, A., Moreno, J. G., Sidère, N., & Doucet, A. (2020). Robust Named Entity Recognition and Linking on Historical Multilingual Documents.

Conference on Computational Natural Language Learning (CoNLL), online, 19th-20th November 2020
Boros, E., Hamdi, A., Pontes, E. L., Cabrera-Diego, L. A., Moreno, J. G., Sidère, N., & Doucet, A. (2020). Alleviating Digitization Errors in Named Entity Recognition for Historical Documents.

Digital Humanities in the Nordic Countries (DHN), 20th-23rd October 2020

Klaus, Barbara. (2020). Can Umlauts Ruin Your Research in Digitized Newspaper Collections? A NewsEye Case Study on 'The Dark Sides of War' (1914–1918). Presented at the Digital Humanities in the Nordic Countries (DHN), Zenodo. 

28th International Conference on Computational Linguistics (COLING'2020), online, 8th-13th December 2020
Mutuvi, S., Boros, E., Doucet, A., Lejeune, G., Jatowt, A., & Odeo, M. (2021). Multilingual Epidemiological Text Classification: A Comparative Study.

European Conference on Information Retrieval (ECIR 2021), Lucca, Italy, 28th March - 1st April 2021
Boros, E., Moreno, J. G., & Doucet, A. (2021). Event Detection with Entity Markers.

Thirteenth Text Analysis Conference (TAC 2020), Evaluation: August 2020 - January 2021. Workshop: February 22-23, 2021

Boros, Emanuela, & Doucet, Antoine. (2021). Transformer-based Methods for Recognizing Ultra Fine-grained Entities (RUFES). Presented at the Thirteenth Text Analysis Conference (TAC 2020).

European Chapter of the Association for Computational Linguistics, BSNLP 2021 workshop, Online, 20th April 2021

Piskorski, Jakub, Babych, Bogdan, Kancheva, Zara, Kanishcheva, Olga, Lebedeva, Maria, Marcinczuk, Michał, … Yangarber, Roman. (2021). Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages. Presented at the European Chapter of the Association for Computational Linguistics, BSNLP 2021 workshop (EACL 2021, BSNLP 2021), online: Zenodo.

North American Association for Computational Linguistics (NAACL), Online, June 6-11, 2021

Montariol, Syrielle, Martinc, Matej, & Pivovarova, Lidia. (2021). Scalable and Interpretable Semantic Change Detection. Presented at the North American Association for Computational Linguistics (NAACL), Online: Zenodo. 

The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), Online, July 11-15, 2021

Hamdi, Ahmed, Linhares Pontes, Elvys, Boros, Emanuela, Tuyet Hai Nguyen, Thi, Hackl, Günter, Moreno, Jose G., & Doucet, Antoine. (2021). A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers. Presented at the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), Online: Zenodo.

23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), Online, May 31st - June 2nd, 2021

Leppänen, Leo, & Toivonen, Hannu. (2021). A Baseline Document Planning Method for Automated Journalism. Presented at the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), Online: Zenodo.


Kanner, Antti, Mäkelä, Eetu, Marjanen, Jani, Tolonen, Mikko, Oberbichler, Sarah, Duong, Quan, Pivovarova, Lidia, Ali, Dilawar, Verstockt, Steven, Ollion, Étienne, Shen, Rubing, Arnold, Matthias, Brown, David, Adam, Raven, Balasubramanian, Saranya, Charvat, Vera Maria, Füllsack, Manfred, Kleinert, Jörn, Misera, Hanna, … Lomazow, Steven. (2021). The Book of Abstracts for What's Past is Prologue: The NewsEye International Conference. What's Past is Prologue: The NewsEye International Conference - Towards a future of interdisciplinary collaboration between Cultural Heritage, Digital Humanities, Computer Science and Data S. Zenodo.

NewsEye Consortium. (2020). NewsEye Policy Brief. Zenodo.

Oberbichler, S. (2020). Using LDA and Jensen-Shannon Distance (JSD) to group similar newspaper articles (Version v1.1) [Computer software]. Zenodo.