Research Publications

Browse the articles published by the NewsEye project on this page. All of these publications are also available on our Zenodo page and archived in the European Commission's OpenAIRE repository.

 

Journals

Marjanen, Jani, Vaara, Ville, Kanner, Antti, Roivainen, Hege, Mäkelä, Eetu, Lahti, Leo, & Tolonen, Mikko. (2019). A National Public Sphere? Analyzing the Language, Location, and Form of Newspapers in Finland, 1771–1917. Journal of European Periodical Studies 4.1 (summer 2019), 55–78. http://doi.org/10.5281/zenodo.3697749

Pontes, E. L., Huet, S., Torres-Moreno, J.-M., da Silva, T. G., & Linhares, A. C. (2020). A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming. Journal of Computación y Sistemas: Vol. 24, No. 2, 2020. https://doi.org/10.5281/ZENODO.3759286

Pfanzelter, E., Oberbichler, S., Marjanen, J., Langlais, P.-C., & Hechl, S. (2021). Digital interfaces of historical newspapers: opportunities, restrictions and recommendations. Journal of Data Mining and Digital Humanities, Episciences.org, In press, HistoInformatics. https://doi.org/10.5281/ZENODO.4446818

Marjanen, J., Kurunmäki, J., Pivovarova, L., & Zosa, E. (2020). The expansion of isms, 1820–1917: Data-driven analysis of political language in digitized newspaper collections. Journal of Data Mining and Digital Humanities, Episciences.org, 2020, HistoInformatics. https://doi.org/10.5281/ZENODO.4447025

Nguyen, Thi-Tuyet-Hai, Jatowt, Adam, Coustaty, Mickael, & Doucet, Antoine. (2021). Survey of Post-OCR Processing Approaches. ACM Computing Surveys, 1, 1 (March 2020), 36. http://doi.org/10.5281/zenodo.4635569

 

Book Chapter/Section

Nguyen, T.-T.-H., Coustaty, M., Doucet, A., Jatowt, A., & Nguyen, N.-V. (2018). Adaptive Edit-Distance and Regression Approach for Post-OCR Text Correction. Maturity and Innovation in Digital Libraries, 278–289. https://doi.org/10.1007/978-3-030-04257-8_29

Mutuvi, S., Doucet, A., Odeo, M., & Jatowt, A. (2018). Evaluating the Impact of OCR Errors on Topic Modeling. Maturity and Innovation in Digital Libraries, 3–14. https://doi.org/10.1007/978-3-030-04257-8_1

Michael, Johannes, Weidemann, Max, Laasch, Bastian, & Labahn, Roger. (2021). ICPR 2020 Competition on Text Block Segmentation on a NewsEye Dataset. Lecture Notes in Computer Science (LNCS, volume 12668). Springer. http://doi.org/10.5281/zenodo.4555751

Theses

Avikainen, J. (2019). A Method for Wavelet-Based Time Series Analysis of Historical Newspapers. https://doi.org/10.5281/ZENODO.3628263

Hechl, S. P. (2020). ‘Wir dürfen wieder Österreicher sein!’ Die Rolle der Tagespresse in österreichischen Nation-Building-Prozessen 1945–1948 – eine quantitative Analyse ausgewählter digitaler Zeitungskorpora samt Vorschlägen zur didaktischen Umsetzung. https://doi.org/10.5281/ZENODO.4468295

Conference Papers

22nd International Academic Mindtrek Conference, 10th - 11th October 2018
Alhalaseh, Rola, Munezero, Myriam, Leinonen, Miika, Leppänen, Leo, Avikainen, Jari, & Toivonen, Hannu. (2018). Towards Data-Driven Generation of Visualizations for Automatically Generated News Articles. ACM, Association for Computing Machinery. http://doi.org/10.1145/3275116.3275131 
 

ACM/IEEE Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019
Sumikawa, Y., Jatowt, A., Doucet, A., & Moreux, J.-P. (2019). Large Scale Analysis of Semantic and Temporal Aspects in Cultural Heritage Collection's Search. https://doi.org/10.5281/ZENODO.3243336

Hamdi, A., Jean-Caurant, A., Sidere, N., Coustaty, M., & Doucet, A. (2019). An Analysis of the Performance of Named Entity Recognition over OCRed Documents. https://doi.org/10.5281/ZENODO.3243343

Nguyen, T.-T.-H., Jatowt, A., Coustaty, M., Nguyen, N.-V., & Doucet, A. (2019). Deep Statistical Analysis of OCR Errors for Effective Post-OCR Processing. https://doi.org/10.5281/ZENODO.3245169
 

IFLA WLIC Conference, Athens, Greece, 24th-30th August 2019
Rautiainen, J. (2019). Opening Digitized Newspapers for Different User Groups - Successes and Challenges. Zenodo. https://doi.org/10.5281/ZENODO.3403158 
 

Recent Advances in Natural Language Processing (RANLP), Bulgaria, 2-4 September 2019
Zosa, E., & Granroth-Wilding, M. (2019). Multilingual Dynamic Topic Model. Zenodo. https://doi.org/10.5281/ZENODO.3402877 
 

15th International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20-25th September 2019
Michael, J., Labahn, R., Gruning, T., & Zollner, J. (2019). Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition. https://doi.org/10.5281/ZENODO.3362980

Nguyen, T. T. H., Jatowt, A., Coustaty, M., Nguyen, N. V., & Doucet, A. (2019). Post-OCR Error Detection by Generating Plausible Candidates. https://doi.org/10.5281/ZENODO.3381148

Rigaud, C., Doucet, A., Coustaty, M., & Moreux, J.-P. (2019). ICDAR 2019 Competition on Post-OCR Text Correction. https://zenodo.org/record/3459116
 

Language Technology for Digital Historical Archives (Workshop collocated with RANLP 2019) (LT-DHA 2019), Varna Bulgaria, 5th September 2019
Pivovarova, L., Marjanen, J., & Zosa, E. (2019). Word Clustering for Historical Newspapers Analysis. https://doi.org/10.5281/ZENODO.3402939 
 

HistoInformatics2019 - the 5th International Workshop on Computational History (HistoInformatics2019), Oslo, Norway, 12th September 2019
Marjanen, J., Pivovarova, L., Zosa, E., & Kurunmaki, J. (2019). Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings. https://doi.org/10.5281/ZENODO.3403174
 

21st International Conference on Asia-Pacific Digital Libraries (ICADL 2019), Kuala Lumpur, Malaysia, 4th-7th November 2019
Linhares Pontes, E., Hamdi, A., Sidere, N., & Doucet, A. (2019). Impact of OCR Quality on Named Entity Linking. https://doi.org/10.5281/ZENODO.3529179 (also published in Digital Libraries at the Crossroads of Digital Information for the Future, Springer LNCS, pp. 102-115 (978-3-030-34057-5))
 

Digital Humanities in the Nordic Countries (DHN), Riga, Latvia, 17th - 20th March 2020
Kettunen, Kimmo, & La Mela, Matti. (2020). Digging Deeper into the Finnish Parliamentary Protocols – Using a Lexical Semantic Tagger for Studying Meaning Change of Everyman's Rights (allemansrätten). Zenodo. http://doi.org/10.5281/zenodo.3676372

Zosa, E., Hengchen, S., Marjanen, J., Pivovarova, L., & Tolonen, M. (2020). Disappearing Discourses: Avoiding anachronisms and teleology with data-driven methods in studying digital newspaper collections. Zenodo. https://doi.org/10.5281/ZENODO.3631614

Ros, Ruben, & Oberbichler, Sarah. (2020). The Helsinki Digital Humanities Hackathon: Two Perspectives on Multidisciplinary Historical Newspapers Research in a Hackathon Context. Zenodo. http://doi.org/10.5281/zenodo.3689228 
 

10th Temporal Web Analytics Workshop (TempWeb), Taipei, 20th April 2020
Martinc, Matej, Montariol, Syrielle, Zosa, Elaine, & Pivovarova, Lidia. (2020). Capturing Evolution in Word Usage: Just Add More Clusters?. http://doi.org/10.1145/3366424.3382186 
 

12th Edition Language Resources and Evaluation Conference. (LREC 2020), Marseilles, France, 13th-15th May 2020
Frossard, Esteban, Coustaty, Mickael, Jatowt, Adam, & Hengchen, Simon. (2020). Dataset for Temporal Analysis of English-French Cognates. Zenodo. http://doi.org/10.5281/zenodo.3693651 

Mutuvi, Stephen, Doucet, Antoine, Lejeune, Gael, & Odeo, Moses. (2020). A Dataset for Multi-lingual Epidemiological Event Extraction. Zenodo. http://doi.org/10.5281/zenodo.3693647 

Zosa, E., Granroth-Wilding, M., & Pivovarova, L. (2020). A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval. https://doi.org/10.5281/ZENODO.3751036
 

ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020), Wuhan, Hubei, P. R. China, 1st-5th August 2020
Pontes, E. L., Doucet, A., & Moreno, J. G. (2020). Linking Named Entities across Languages using Multilingual Word Embeddings. https://doi.org/10.5281/ZENODO.3759437

Nguyen, T.-T.-H., Jatowt, A., Nguyen, N.-V., Doucet, A., & Coustaty, M. (2020). Neural Machine Translation with BERT for Post-OCR Error Detection and Correction. https://doi.org/10.5281/ZENODO.3759447
 

2020 European Conference on Information Retrieval (ECIR 2020), Lisbon, Portugal, 14th-17th April 2020
Pivovarova, L., Jean-Caurant, A., Avikainen, J., Alnajjar, K., Granroth-Wilding, M., Leppanen, L., Zosa, E., & Toivonen, H. (2020). Personal Research Assistant for Online Exploration of Historical News. https://doi.org/10.5281/ZENODO.3759614
 

Digital Humanities 2020 (DH 2020), Ottawa, Canada, 20th-25th July 2020
Doucet, A., Gasteiner, M., Granroth-Wilding, M., Kaiser, M., Kaukonen, M., Labahn, R., Moreux, J.-P., Muehlberger, G., Pfanzelter, E., Therenty, M.-E., Toivonen, H., & Tolonen, M. (2020). NewsEye: A digital investigator for historical newspapers. https://doi.org/10.5281/ZENODO.3895269
 

14. International Conference on Data Analytics in Logistics (ICDAL 2020), Dubai, United Arab Emirates, 17th-18th December 2020
Huynh, V.-N., Hamdi, A., & Doucet, A. (2020). When to use OCR post-correction for named entity recognition? https://doi.org/10.5281/ZENODO.4008952
 

Conference and Labs of the Evaluation Forum (CLEF 2020), online
Boros, E., Pontes, E. L., Cabrera-Diego, L. A., Hamdi, A., Moreno, J. G., Sidère, N., & Doucet, A. (2020). Robust Named Entity Recognition and Linking on Historical Multilingual Documents. https://doi.org/10.5281/ZENODO.4068075
 

Conference on Computational Natural Language Learning (CoNLL), online, 19th-20th November 2020
Boros, E., Hamdi, A., Pontes, E. L., Adrian Cabrera-Diego, L., Moreno, J. G., Sidere, N., & Doucet, A. (2020). Alleviating Digitization Errors in Named Entity Recognition for Historical Documents. https://doi.org/10.5281/ZENODO.4475989

Digital Humanities in the Nordic Countries (DHN), 20th-23rd October 2020

Klaus, Barbara. (2020). Can Umlauts Ruin Your Research in Digitized Newspaper Collections? A NewsEye Case Study on 'The Dark Sides of War' (1914–1918). Presented at the Digital Humanities in the Nordic Countries (DHN), Zenodo. http://doi.org/10.5281/zenodo.4686731 

28th International Conference on Computational Linguistics (COLING'2020), online, 8th-13th December 2020
Mutuvi, S., Boros, E., Doucet, A., Lejeune, G., Jatowt, A., & Odeo, M. (2021). Multilingual Epidemiological Text Classification: A Comparative Study. https://doi.org/10.5281/ZENODO.4476039
 

European Conference on Information Retrieval (ECIR 2021), Lucca, Italy, 28th March - 1st April 2021
Boros, E., Moreno, J. G., & Doucet, A. (2021). Event Detection with Entity Markers. https://doi.org/10.5281/ZENODO.4476151

Thirteenth Text Analysis Conference (TAC 2020), Evaluation: August 2020 - January 2021. Workshop: February 22-23, 2021

Boros, Emanuela, & Doucet, Antoine. (2021). Transformer-based Methods for Recognizing Ultra Fine-grained Entities (RUFES). Presented at the Thirteenth Text Analysis Conference (TAC 2020). http://doi.org/10.5281/zenodo.4555788

European Chapter of the Association for Computational Linguistics, BSNLP 2021 workshop, Online, 20th April 2021

Piskorski, Jakub, Babych, Bogdan, Kancheva, Zara, Kanishcheva, Olga, Lebedeva, Maria, Marcinczuk, Michał, … Yangarber, Roman. (2021). Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages. Presented at the European Chapter of the Association for Computational Linguistics, BSNLP 2021 workshop (EACL 2021, BSNLP 2021), online: Zenodo. http://doi.org/10.5281/zenodo.4635585

North American Association for Computational Linguistics (NAACL), Online, June 6-11, 2021

Montariol, Syrielle, Martinc, Matej, & Pivovarova, Lidia. (2021). Scalable and Interpretable Semantic Change Detection. Presented at the North American Association for Computational Linguistics (NAACL), Online: Zenodo. http://doi.org/10.5281/zenodo.4680154 

The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), Online, July 11-15, 2021

Hamdi, Ahmed, Linhares Pontes, Elvys, Boros, Emanuela, Tuyet Hai Nguyen, Thi, Hackl, Günter, Moreno, Jose G., & Doucet, Antoine. (2021). A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers. Presented at the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), Online: Zenodo. http://doi.org/10.5281/zenodo.4694466

23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), Online, May 31st - June 2nd, 2021

Leppanen, Leo, & Toivonen, Hannu. (2021). A Baseline Document Planning Method for Automated Journalism. Presented at the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), Online: Zenodo. http://doi.org/10.5281/zenodo.4694493

Others

Kanner, Antti, Mäkelä, Eetu, Marjanen, Jani, Tolonen, Mikko, Oberbichler, Sarah, Duong, Quan, Pivovarova, Lidia, Ali, Dilawar, Verstockt, Steven, Ollion, Étienne, Shen, Rubing, Arnold, Matthias, Brown, David, Adam, Raven, Balasubramanian, Saranya, Charvat, Vera Maria, Füllsack, Manfred, Kleinert, Jörn, Misera, Hanna, … Lomazow, Steven. (2021). The Book of Abstracts for What's Past is Prologue: The NewsEye International Conference. What's Past is Prologue: The NewsEye International Conference - Towards a future of interdisciplinary collaboration between Cultural Heritage, Digital Humanities, Computer Science and Data S. Zenodo. https://doi.org/10.5281/zenodo.5167375.

NewsEye Consortium. (2020). NewsEye Policy Brief. Zenodo. https://doi.org/10.5281/ZENODO.4291895

Oberbichler, S. (2020). Using LDA and Jensen-Shannon Distance (JSD) to group similar newspaper articles (Version v1.1) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.3887193