Research Publications

Browse the articles published by the NewsEye project on this page. All of these publications are also available on our Zenodo page and archived in the European Commission's OpenAIRE repository.

 

Journals

Marjanen, Jani, Vaara, Ville, Kanner, Antti, Roivainen, Hege, Mäkelä, Eetu, Lahti, Leo, & Tolonen, Mikko. (2019). A National Public Sphere? Analyzing the Language, Location, and Form of Newspapers in Finland, 1771–1917. Journal of European Periodical Studies 4.1 (summer 2019), 55–78. http://doi.org/10.5281/zenodo.3697749

Pontes, E. L., Huet, S., Torres-Moreno, J.-M., da Silva, T. G., & Linhares, A. C. (2020). A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming. Journal of Computación y Sistemas: Vol. 24, No. 2, 2020. https://doi.org/10.5281/ZENODO.3759286

Pfanzelter, E., Oberbichler, S., Marjanen, J., Langlais, P.-C., & Hechl, S. (2021). Digital interfaces of historical newspapers: opportunities, restrictions and recommendations. Journal of Data Mining and Digital Humanities, Episciences.org, In press, HistoInformatics. https://doi.org/10.5281/ZENODO.4446818

Marjanen, J., Kurunmäki, J., Pivovarova, L., & Zosa, E. (2020). The expansion of isms, 1820–1917: Data-driven analysis of political language in digitized newspaper collections. Journal of Data Mining and Digital Humanities, Episciences.org, 2020, HistoInformatics. https://doi.org/10.5281/ZENODO.4447025

Book Chapter/Section

Nguyen, T.-T.-H., Coustaty, M., Doucet, A., Jatowt, A., & Nguyen, N.-V. (2018). Adaptive Edit-Distance and Regression Approach for Post-OCR Text Correction. Maturity and Innovation in Digital Libraries, 278–289. https://doi.org/10.1007/978-3-030-04257-8_29

Mutuvi, S., Doucet, A., Odeo, M., & Jatowt, A. (2018). Evaluating the Impact of OCR Errors on Topic Modeling. Maturity and Innovation in Digital Libraries, 3–14. https://doi.org/10.1007/978-3-030-04257-8_1

Theses

Avikainen, J. (2019). A Method for Wavelet-Based Time Series Analysis of Historical Newspapers. https://doi.org/10.5281/ZENODO.3628263

Hechl, S. P. (2020). ‘Wir dürfen wieder Österreicher sein!’ Die Rolle der Tagespresse in österreichischen Nation-Building-Prozessen 1945–1948 – eine quantitative Analyse ausgewählter digitaler Zeitungskorpora samt Vorschlägen zur didaktischen Umsetzung. https://doi.org/10.5281/ZENODO.4468295

Conference Papers

22nd International Academic Mindtrek Conference, 10th - 11th October 2018
Alhalaseh, Rola, Munezero, Myriam, Leinonen, Miika, Leppänen, Leo, Avikainen, Jari, & Toivonen, Hannu. (2018). Towards Data-Driven Generation of Visualizations for Automatically Generated News Articles. ACM, Association for Computing Machinery. http://doi.org/10.1145/3275116.3275131 
 

ACM/IEEE Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019
Sumikawa, Y., Jatowt, A., Doucet, A., & Moreux, J.-P. (2019). Large Scale Analysis of Semantic and Temporal Aspects in Cultural Heritage Collection's Search. https://doi.org/10.5281/ZENODO.3243336

Hamdi, A., Jean-Caurant, A., Sidere, N., Coustaty, M., & Doucet, A. (2019). An Analysis of the Performance of Named Entity Recognition over OCRed Documents. https://doi.org/10.5281/ZENODO.3243343

Nguyen, T.-T.-H., Jatowt, A., Coustaty, M., Nguyen, N.-V., & Doucet, A. (2019). Deep Statistical Analysis of OCR Errors for Effective Post-OCR Processing. https://doi.org/10.5281/ZENODO.3245169
 

IFLA WLIC Conference, Athens, Greece, 24th-30th August 2019
Rautiainen, J. (2019). Opening Digitized Newspapers for Different User Groups - Successes and Challenges. Zenodo. https://doi.org/10.5281/ZENODO.3403158 
 

Recent Advances in Natural Language Processing (RANLP), Bulgaria, 2-4 September 2019
Zosa, E., & Granroth-Wilding, M. (2019). Multilingual Dynamic Topic Model. Zenodo. https://doi.org/10.5281/ZENODO.3402877 
 

15th International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20-25th September 2019
Michael, J., Labahn, R., Gruning, T., & Zollner, J. (2019). Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition. https://doi.org/10.5281/ZENODO.3362980

Nguyen, T. T. H., Jatowt, A., Coustaty, M., Nguyen, N. V., & Doucet, A. (2019). Post-OCR Error Detection by Generating Plausible Candidates. https://doi.org/10.5281/ZENODO.3381148

Rigaud, C., Doucet, A., Coustaty, M., & Moreux, J.-P. (2019). ICDAR 2019 Competition on Post-OCR Text Correction. https://zenodo.org/record/3459116
 

Language Technology for Digital Historical Archives (Workshop collocated with RANLP 2019) (LT-DHA 2019), Varna, Bulgaria, 5th September 2019
Pivovarova, L., Marjanen, J., & Zosa, E. (2019). Word Clustering for Historical Newspapers Analysis. https://doi.org/10.5281/ZENODO.3402939 
 

HistoInformatics2019 - the 5th International Workshop on Computational History (HistoInformatics2019), Oslo, Norway, 12th September 2019
Marjanen, J., Pivovarova, L., Zosa, E., & Kurunmäki, J. (2019). Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings. https://doi.org/10.5281/ZENODO.3403174
 

21st International Conference on Asia-Pacific Digital Libraries (ICADL 2019), Kuala Lumpur, Malaysia, 4th-7th November 2019
Linhares Pontes, E., Hamdi, A., Sidere, N., & Doucet, A. (2019). Impact of OCR Quality on Named Entity Linking. https://doi.org/10.5281/ZENODO.3529179 (also published in Digital Libraries at the Crossroads of Digital Information for the Future, Springer LNCS, pp. 102-115 (978-3-030-34057-5))
 

Digital Humanities in the Nordic Countries (DHN), Riga, Latvia, 17th - 20th March 2020
Kettunen, Kimmo, & La Mela, Matti. (2020). Digging Deeper into the Finnish Parliamentary Protocols – Using a Lexical Semantic Tagger for Studying Meaning Change of Everyman's Rights (allemansrätten). Zenodo. http://doi.org/10.5281/zenodo.3676372

Zosa, E., Hengchen, S., Marjanen, J., Pivovarova, L., & Tolonen, M. (2020). Disappearing Discourses: Avoiding anachronisms and teleology with data-driven methods in studying digital newspaper collections. Zenodo. https://doi.org/10.5281/ZENODO.3631614

Ros, Ruben, & Oberbichler, Sarah. (2020). The Helsinki Digital Humanities Hackathon: Two Perspectives on Multidisciplinary Historical Newspapers Research in a Hackathon Context. Zenodo. http://doi.org/10.5281/zenodo.3689228 
 

10th Temporal Web Analytics Workshop (TempWeb), Taipei, 20th April 2020
Martinc, Matej, Montariol, Syrielle, Zosa, Elaine, & Pivovarova, Lidia. (2020). Capturing Evolution in Word Usage: Just Add More Clusters? http://doi.org/10.1145/3366424.3382186
 

12th Edition of the Language Resources and Evaluation Conference (LREC 2020), Marseilles, France, 13th-15th May 2020
Frossard, Esteban, Coustaty, Mickael, Jatowt, Adam, & Hengchen, Simon. (2020). Dataset for Temporal Analysis of English-French Cognates. Zenodo. http://doi.org/10.5281/zenodo.3693651 

Mutuvi, Stephen, Doucet, Antoine, Lejeune, Gael, & Odeo, Moses. (2020). A Dataset for Multi-lingual Epidemiological Event Extraction. Zenodo. http://doi.org/10.5281/zenodo.3693647 

Zosa, E., Granroth-Wilding, M., & Pivovarova, L. (2020). A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval. https://doi.org/10.5281/ZENODO.3751036
 

ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020), Wuhan, Hubei, P. R. China, 1st-5th August 2020
Pontes, E. L., Doucet, A., & Moreno, J. G. (2020). Linking Named Entities across Languages using Multilingual Word Embeddings. https://doi.org/10.5281/ZENODO.3759437

Nguyen, T.-T.-H., Jatowt, A., Nguyen, N.-V., Doucet, A., & Coustaty, M. (2020). Neural Machine Translation with BERT for Post-OCR Error Detection and Correction. https://doi.org/10.5281/ZENODO.3759447
 

2020 European Conference on Information Retrieval (ECIR 2020), Lisbon, Portugal, 14th-17th April 2020
Pivovarova, L., Jean-Caurant, A., Avikainen, J., Alnajjar, K., Granroth-Wilding, M., Leppanen, L., Zosa, E., & Toivonen, H. (2020). Personal Research Assistant for Online Exploration of Historical News. https://doi.org/10.5281/ZENODO.3759614
 

Digital Humanities 2020 (DH 2020), Ottawa, Canada, 20th-25th July 2020
Doucet, A., Gasteiner, M., Granroth-Wilding, M., Kaiser, M., Kaukonen, M., Labahn, R., Moreux, J.-P., Muehlberger, G., Pfanzelter, E., Therenty, M.-E., Toivonen, H., & Tolonen, M. (2020). NewsEye: A digital investigator for historical newspapers. https://doi.org/10.5281/ZENODO.3895269
 

14th International Conference on Data Analytics in Logistics (ICDAL 2020), Dubai, United Arab Emirates, 17th-18th December 2020
Huynh, V.-N., Hamdi, A., & Doucet, A. (2020). When to use OCR post-correction for named entity recognition? https://doi.org/10.5281/ZENODO.4008952
 

Conference and Labs of the Evaluation Forum (CLEF 2020), online
Boros, E., Pontes, E. L., Cabrera-Diego, L. A., Hamdi, A., Moreno, J. G., Sidère, N., & Doucet, A. (2020). Robust Named Entity Recognition and Linking on Historical Multilingual Documents. https://doi.org/10.5281/ZENODO.4068075
 

Conference on Computational Natural Language Learning (CoNLL), online, 19th-20th November 2020
Boros, E., Hamdi, A., Pontes, E. L., Adrian Cabrera-Diego, L., Moreno, J. G., Sidere, N., & Doucet, A. (2020). Alleviating Digitization Errors in Named Entity Recognition for Historical Documents. https://doi.org/10.5281/ZENODO.4475989
 

28th International Conference on Computational Linguistics (COLING'2020), online, 8th-13th December 2020
Mutuvi, S., Boros, E., Doucet, A., Lejeune, G., Jatowt, A., & Odeo, M. (2021). Multilingual Epidemiological Text Classification: A Comparative Study. https://doi.org/10.5281/ZENODO.4476039
 

European Conference on Information Retrieval (ECIR 2021), Lucca, Italy, 28th March - 1st April 2021
Boros, E., Moreno, J. G., & Doucet, A. (2021). Event Detection with Entity Markers. https://doi.org/10.5281/ZENODO.4476151

Others

NewsEye Consortium. (2020). NewsEye Policy Brief. Zenodo. https://doi.org/10.5281/ZENODO.4291895

Oberbichler, S. (2020). Using LDA and Jensen-Shannon Distance (JSD) to group similar newspaper articles (Version v1.1) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.3887193