Research Publications

Check out the papers published by the NewsEye project on this page. All publications are also featured on our Zenodo page and archived in the European Commission OpenAIRE repository.


Book Chapter/Section

Nguyen, T.-T.-H., Coustaty, M., Doucet, A., Jatowt, A., & Nguyen, N.-V. (2018). Adaptive Edit-Distance and Regression Approach for Post-OCR Text Correction. Maturity and Innovation in Digital Libraries, 278–289. doi:10.1007/978-3-030-04257-8_29 

Mutuvi, S., Doucet, A., Odeo, M., & Jatowt, A. (2018). Evaluating the Impact of OCR Errors on Topic Modeling. Maturity and Innovation in Digital Libraries, 3–14. doi:10.1007/978-3-030-04257-8_1


Avikainen, J. (2019). A Method for Wavelet-Based Time Series Analysis of Historical Newspapers.

Conference Papers

22nd International Academic Mindtrek Conference (10th - 11th October 2018)

Alhalaseh, R., Munezero, M., Leinonen, M., Leppänen, L., Avikainen, J., & Toivonen, H. (2018). Towards Data-Driven Generation of Visualizations for Automatically Generated News Articles. ACM, Association for Computing Machinery.

ACM/IEEE Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019

Sumikawa, Y., Jatowt, A., Doucet, A., & Moreux, J.-P. (2019). Large Scale Analysis of Semantic and Temporal Aspects in Cultural Heritage Collection's Search.

Hamdi, A., Jean-Caurant, A., Sidere, N., Coustaty, M., & Doucet, A. (2019). An Analysis of the Performance of Named Entity Recognition over OCRed Documents.

Nguyen, T.-T.-H., Jatowt, A., Coustaty, M., Nguyen, N.-V., & Doucet, A. (2019). Deep Statistical Analysis of OCR Errors for Effective Post-OCR Processing.

IFLA WLIC Conference, Athens, Greece, 24th - 30th August 2019

Rautiainen, J. (2019). Opening Digitized Newspapers for Different User Groups - Successes and Challenges. Zenodo. 

Recent Advances in Natural Language Processing (RANLP), Bulgaria, 2-4 September 2019

Zosa, E., & Granroth-Wilding, M. (2019). Multilingual Dynamic Topic Model. Zenodo. 

15th International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20th - 25th September 2019

Michael, J., Labahn, R., Grüning, T., & Zöllner, J. (2019). Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition.

Nguyen, T. T. H., Jatowt, A., Coustaty, M., Nguyen, N. V., & Doucet, A. (2019). Post-OCR Error Detection by Generating Plausible Candidates.

Rigaud, C., Doucet, A., Coustaty, M., & Moreux, J.-P. (2019). ICDAR 2019 Competition on Post-OCR Text Correction.

Language Technology for Digital Historical Archives (LT-DHA 2019), workshop collocated with RANLP 2019, Varna, Bulgaria, 5th September 2019

Pivovarova, L., Marjanen, J., & Zosa, E. (2019). Word Clustering for Historical Newspapers Analysis.

HistoInformatics2019 - the 5th International Workshop on Computational History, Oslo, Norway, 12th September 2019

Marjanen, J., Pivovarova, L., Zosa, E., & Kurunmäki, J. (2019). Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings.

21st International Conference on Asia-Pacific Digital Libraries (ICADL 2019), Kuala Lumpur, Malaysia, 4-7 November 2019

Linhares Pontes, E., Hamdi, A., Sidere, N., & Doucet, A. (2019). Impact of OCR Quality on Named Entity Linking. Published in Digital Libraries at the Crossroads of Digital Information for the Future, Springer LNCS, pp. 102-115 (978-3-030-34057-5).

Digital Humanities in the Nordic Countries (DHN), Riga, Latvia, 17th - 20th March 2020

Kettunen, K., & La Mela, M. (2020). Digging Deeper into the Finnish Parliamentary Protocols – Using a Lexical Semantic Tagger for Studying Meaning Change of Everyman's Rights (allemansrätten). Zenodo.

Zosa, E., Hengchen, S., Marjanen, J., Pivovarova, L., & Tolonen, M. (2020). Disappearing Discourses: Avoiding anachronisms and teleology with data-driven methods in studying digital newspaper collections. Zenodo.