Research Publications

Check out the papers published by the NewsEye project on this page. All publications are also featured on our Zenodo page and archived in the European Commission OpenAIRE repository.

 

Journals

Marjanen, Jani, Vaara, Ville, Kanner, Antti, Roivainen, Hege, Mäkelä, Eetu, Lahti, Leo, & Tolonen, Mikko. (2019). A National Public Sphere? Analyzing the Language, Location, and Form of Newspapers in Finland, 1771–1917. Journal of European Periodical Studies 4.1 (summer 2019), 55–78. https://doi.org/10.5281/zenodo.3697749

Book Chapter/Section

Nguyen, T.-T.-H., Coustaty, M., Doucet, A., Jatowt, A., & Nguyen, N.-V. (2018). Adaptive Edit-Distance and Regression Approach for Post-OCR Text Correction. Maturity and Innovation in Digital Libraries, 278–289. doi:10.1007/978-3-030-04257-8_29 

Mutuvi, S., Doucet, A., Odeo, M., & Jatowt, A. (2018). Evaluating the Impact of OCR Errors on Topic Modeling. Maturity and Innovation in Digital Libraries, 3–14. doi:10.1007/978-3-030-04257-8_1

Theses

Avikainen, J. (2019). A Method for Wavelet-Based Time Series Analysis of Historical Newspapers. https://doi.org/10.5281/ZENODO.3628263

Conference Papers

22nd International Academic Mindtrek Conference (10th - 11th October 2018)

Alhalaseh, Rola, Munezero, Myriam, Leinonen, Miika, Leppänen, Leo, Avikainen, Jari, & Toivonen, Hannu. (2018). Towards Data-Driven Generation of Visualizations for Automatically Generated News Articles. ACM, Association for Computing Machinery. http://doi.org/10.1145/3275116.3275131 

ACM/IEEE Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019

Sumikawa, Y., Jatowt, A., Doucet, A., & Moreux, J.-P. (2019). Large Scale Analysis of Semantic and Temporal Aspects in Cultural Heritage Collection's Search. https://doi.org/10.5281/ZENODO.3243336

Hamdi, A., Jean-Caurant, A., Sidere, N., Coustaty, M., & Doucet, A. (2019). An Analysis of the Performance of Named Entity Recognition over OCRed Documents. https://doi.org/10.5281/ZENODO.3243343

Nguyen, T.-T.-H., Jatowt, A., Coustaty, M., Nguyen, N.-V., & Doucet, A. (2019). Deep Statistical Analysis of OCR Errors for Effective Post-OCR Processing. https://doi.org/10.5281/ZENODO.3245169

IFLA WLIC Conference, Athens, Greece, 24th-30th August 2019

Rautiainen, J. (2019). Opening Digitized Newspapers for Different User Groups - Successes and Challenges. Zenodo. https://doi.org/10.5281/ZENODO.3403158 

Recent Advances in Natural Language Processing (RANLP), Bulgaria, 2-4 September 2019

Zosa, E., & Granroth-Wilding, M. (2019). Multilingual Dynamic Topic Model. Zenodo. https://doi.org/10.5281/ZENODO.3402877 

15th International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20th-25th September 2019

Michael, J., Labahn, R., Gruning, T., & Zollner, J. (2019). Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition. https://doi.org/10.5281/ZENODO.3362980

Nguyen, T. T. H., Jatowt, A., Coustaty, M., Nguyen, N. V., & Doucet, A. (2019). Post-OCR Error Detection by Generating Plausible Candidates. https://doi.org/10.5281/ZENODO.3381148

Rigaud, C., Doucet, A., Coustaty, M., & Moreux, J.-P. (2019). ICDAR 2019 Competition on Post-OCR Text Correction.

Language Technology for Digital Historical Archives (Workshop collocated with RANLP 2019) (LT-DHA 2019), Varna, Bulgaria, 5th September 2019

Pivovarova, L., Marjanen, J., & Zosa, E. (2019). Word Clustering for Historical Newspapers Analysis. https://doi.org/10.5281/ZENODO.3402939 

https://doi.org/10.5281/ZENODO.3459116

HistoInformatics2019 - the 5th International Workshop on Computational History (HistoInformatics2019), Oslo, Norway, 12th September 2019

Marjanen, J., Pivovarova, L., Zosa, E., & Kurunmaki, J. (2019). Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings. https://doi.org/10.5281/ZENODO.3403174

21st International Conference on Asia-Pacific Digital Libraries (ICADL 2019), Kuala Lumpur, Malaysia, 4-7 November 2019

Linhares Pontes, E., Hamdi, A., Sidere, N., & Doucet, A. (2019). Impact of OCR Quality on Named Entity Linking. https://doi.org/10.5281/ZENODO.3529179 (also published in Digital Libraries at the Crossroads of Digital Information for the Future, Springer LNCS, pp. 102-115 (978-3-030-34057-5))

Digital Humanities in the Nordic Countries (DHN), Riga, Latvia, 17th - 20th March 2020

Kettunen, Kimmo, & La Mela, Matti. (2020). Digging Deeper into the Finnish Parliamentary Protocols – Using a Lexical Semantic Tagger for Studying Meaning Change of Everyman's Rights (allemansrätten). Zenodo. http://doi.org/10.5281/zenodo.3676372

Zosa, E., Hengchen, S., Marjanen, J., Pivovarova, L., & Tolonen, M. (2020). Disappearing Discourses: Avoiding anachronisms and teleology with data-driven methods in studying digital newspaper collections. Zenodo. https://doi.org/10.5281/ZENODO.3631614

Ros, Ruben, & Oberbichler, Sarah. (2020). The Helsinki Digital Humanities Hackathon: Two Perspectives on Multidisciplinary Historical Newspapers Research in a Hackathon Context. Zenodo. http://doi.org/10.5281/zenodo.3689228 

10th Temporal Web Analytics Workshop (TempWeb), Taipei, 20th April 2020

Martinc, Matej, Montariol, Syrielle, Zosa, Elaine, & Pivovarova, Lidia. (2020). Capturing Evolution in Word Usage: Just Add More Clusters? http://doi.org/10.1145/3366424.3382186

12th Edition of the Language Resources and Evaluation Conference (LREC 2020), Marseilles, France, 13th-15th May 2020

Frossard, Esteban, Coustaty, Mickael, Jatowt, Adam, & Hengchen, Simon. (2020). Dataset for Temporal Analysis of English-French Cognates. Zenodo. http://doi.org/10.5281/zenodo.3693651 

Mutuvi, Stephen, Doucet, Antoine, Lejeune, Gael, & Odeo, Moses. (2020). A Dataset for Multi-lingual Epidemiological Event Extraction. Zenodo. http://doi.org/10.5281/zenodo.3693647