Hengchen, S., Ros, R., Marjanen, J., & Tolonen, M. (2021). A data-driven approach to studying changing vocabularies in historical newspaper collections. Digital Scholarship in the Humanities, 36(Supplement_2), ii109–ii126. https://doi.org/10.1093/llc/fqab032
Linhares Pontes, E., Cabrera-Diego, L. A., Moreno, J. G., Boros, E., Hamdi, A., Doucet, A., Sidere, N., & Coustaty, M. (2021). MELHISSA: A multilingual entity linking architecture for historical press articles. International Journal on Digital Libraries. https://doi.org/10.1007/s00799-021-00319-6
Linhares Pontes, E., Huet, S., Torres Moreno, J. M., Silva, T. G. da, & Linhares, A. C. (2020). A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming. Computación y Sistemas, 24(2). https://doi.org/10.13053/cys-24-2-3335
Marjanen, J., Kurunmäki, J., Pivovarova, L., & Zosa, E. (2020). The expansion of isms, 1820-1917: Data-driven analysis of political language in digitized newspaper collections. Journal of Data Mining & Digital Humanities, HistoInformatics, 6159. https://doi.org/10.46298/jdmdh.6159
Marjanen, J., Vaara, V., Kanner, A., Roivainen, H., Mäkelä, E., Lahti, L., & Tolonen, M. (2019). A National Public Sphere? Analyzing the Language, Location, and Form of Newspapers in Finland, 1771–1917. Journal of European Periodical Studies, 4(1), 54–77. https://doi.org/10.21825/jeps.v4i1.10483
Nguyen, T. T. H., Jatowt, A., Coustaty, M., & Doucet, A. (2021). Survey of Post-OCR Processing Approaches. ACM Computing Surveys, 54(6), 1–37. https://doi.org/10.1145/3453476
Oberbichler, S., Boroş, E., Doucet, A., Marjanen, J., Pfanzelter, E., Rautiainen, J., Toivonen, H., & Tolonen, M. (2022). Integrated interdisciplinary workflows for research on historical newspapers: Perspectives from humanities scholars, computer scientists, and librarians. Journal of the Association for Information Science and Technology, 73(2), 225–239. https://doi.org/10.1002/asi.24565
Oberbichler, S., Hechl, S., & Pfanzelter, E. (2020). Als eine andere Epidemie die Welt in Atem hielt: Die Spanische Grippe 1918/19 in der österreichischen Presse. Tiroler Chronist: Fachblatt von und für Chronisten in Nord-, Süd- und Osttirol, 154, 15–22.
Oberbichler, S., & Pfanzelter, E. (n.d.). Topic-specific corpus building: A step towards a representative newspaper corpus on the topic of return migration using text mining methods. Journal of Digital History, jdh001. https://journalofdigitalhistory.org/en/article/4yxHGiqXYRbX
Pfanzelter, E., Oberbichler, S., Marjanen, J., Langlais, P.-C., & Hechl, S. (2021). Digital interfaces of historical newspapers: Opportunities, restrictions and recommendations. Journal of Data Mining & Digital Humanities, HistoInformatics, 6121. https://doi.org/10.46298/jdmdh.6121
Conference Papers
Alhalaseh, R., Munezero, M., Leinonen, M., Leppänen, L., Avikainen, J., & Toivonen, H. (2018). Towards Data-Driven Generation of Visualizations for Automatically Generated News Articles. Proceedings of the 22nd International Academic Mindtrek Conference, 100–109. https://doi.org/10.1145/3275116.3275131
Bernard, Guillaume, Suire, Cyrille, Faucher, Cyril, & Doucet, Antoine. (2022a). A Comprehensive Extraction of Relevant Real-World-Event Qualifiers for Semantic Search Engines. Linking Theory and Practice of Digital Libraries, 12866, 153–164. https://doi.org/10.5281/ZENODO.5900757
Bernard, Guillaume, Suire, Cyrille, Faucher, Cyril, & Doucet, Antoine. (2022b). Event Related Document Retrieval with Multilingual Real World Event Representation. Proceedings of the 20th International Semantic Web Conference, 2980. https://doi.org/10.5281/ZENODO.5900742
Boros, E., Hamdi, A., Linhares Pontes, E., Cabrera-Diego, L. A., Moreno, J. G., Sidere, N., & Doucet, A. (2020). Alleviating Digitization Errors in Named Entity Recognition for Historical Documents. Proceedings of the 24th Conference on Computational Natural Language Learning, 431–441. https://doi.org/10.18653/v1/2020.conll-1.35
Boros, E., Hamdi, A., Linhares Pontes, E., Cabrera-Diego, L. A., Moreno, J. G., Sidère, N., & Doucet, A. (2021, April 15). Atténuer les erreurs de numérisation dans la reconnaissance d’entités nommées pour les documents historiques. Conférence en Recherche d’Informations et Applications - CORIA 2021, French Information Retrieval Conference, Grenoble, France. https://doi.org/10.24348/CORIA.2021.MINI_24
Boros, E., Moreno, J. G., & Doucet, A. (2021a, December 14). Exploring Entities in Event Detection as Question Answering. Proceedings of the 44th European Conference on Information Retrieval. 44th European Conference on Information Retrieval, Stavanger, Norway.
Boros, E., Moreno, J. G., & Doucet, A. (2021b). Event Detection with Entity Markers. In D. Hiemstra, M.-F. Moens, J. Mothe, R. Perego, M. Potthast, & F. Sebastiani (Eds.), Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021) (Vol. 12657, pp. 233–240). Springer International Publishing. https://doi.org/10.1007/978-3-030-72240-1_20
Boros, Emanuela, Linhares Pontes, Elvys, Cabrera-Diego, Luis Adrián, Hamdi, Ahmed, Moreno, Jose G., Sidère, Nicolas, & Doucet, Antoine. (2020). Robust Named Entity Recognition and Linking on Historical Multilingual Documents. Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, 2696, 1–17. https://doi.org/10.5281/ZENODO.4059652
Cabrera-Diego, Luis Adrián, Moreno, Jose G., & Doucet, Antoine. (2021a, April 12). Simple ways to improve NER in every language using markup. Proceedings of the 2nd International Workshop on Cross-Lingual Event-Centric Open Analytics Co-Located with the 30th The Web Conference (WWW 2021). 2nd International Workshop on Cross-lingual Event-centric Open Analytics co-located with the 30th The Web Conference (WWW 2021), Ljubljana, Slovenia. https://doi.org/10.5281/ZENODO.4680998
Cabrera-Diego, Luis Adrián, Moreno, Jose G., & Doucet, Antoine. (2021b). Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems. Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, 98–104. https://doi.org/10.5281/ZENODO.4730478
Doucet, Antoine, Gasteiner, Martin, Granroth-Wilding, Mark, Kaiser, Max, Kaukonen, Minna, Labahn, Roger, Moreux, Jean-Philippe, Muehlberger, Guenter, Pfanzelter, Eva, Therenty, Marie-Eve, Toivonen, Hannu, & Tolonen, Mikko. (2020, July 23). NewsEye: A digital investigator for historical newspapers. 15th Annual International Conference of the Alliance of Digital Humanities Organizations, DH 2020, Ottawa, Canada. https://doi.org/10.5281/ZENODO.3895269
Duong, Quan, Hämäläinen, Mika, & Hengchen, Simon. (2020). An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish. Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), 240–248. https://doi.org/10.5281/ZENODO.4242890
Duong, Quan, Pivovarova, Lidia, & Zosa, Elaine. (2021). Benchmarks for Unsupervised Discourse Change Detection. Proceedings of the 6th International Workshop on Computational History (HistoInformatics 2021), 2981. https://doi.org/10.5281/ZENODO.5548067
Frossard, E., Coustaty, M., Doucet, A., Jatowt, A., & Hengchen, S. (2020). Dataset for Temporal Analysis of English-French Cognates. Proceedings of the 12th Language Resources and Evaluation Conference, 855–859. https://doi.org/10.5281/ZENODO.3693651
Gutehrlé, Nicolas, Harlamov, Oleg, Karimi, Farimah, Wei, Haoyu, Jean-Caurant, Axel, & Pivovarova, Lidia. (2021, September 30). SpaceWars: A Web Interface for Exploring the Spatio-temporal Dimensions of WWI Newspaper Reporting. Proceedings of the 6th International Workshop on Computational History (HistoInformatics 2021). 6th International Workshop on Computational History, Aachen, Germany. https://doi.org/10.5281/ZENODO.5566462
Hamdi, A., Carel, E., Joseph, A., Coustaty, M., & Doucet, A. (2021). Information Extraction from Invoices. In J. Lladós, D. Lopresti, & S. Uchida (Eds.), Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR 2021) (Vol. 12822, pp. 699–714). Springer International Publishing. https://doi.org/10.1007/978-3-030-86331-9_45
Hamdi, A., Jean-Caurant, A., Sidere, N., Coustaty, M., & Doucet, A. (2019). An Analysis of the Performance of Named Entity Recognition over OCRed Documents. 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 333–334. https://doi.org/10.1109/JCDL.2019.00057
Hamdi, A., Jean-Caurant, A., Sidère, N., Coustaty, M., & Doucet, A. (2020). Assessing and Minimizing the Impact of OCR Quality on Named Entity Recognition. In M. Hall, T. Merčun, T. Risse, & F. Duchateau (Eds.), Proceedings of the 24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020 (Vol. 12246, pp. 87–101). Springer International Publishing. https://doi.org/10.1007/978-3-030-54956-5_7
Hamdi, A., Linhares Pontes, E., Boros, E., Nguyen, T. T. H., Hackl, G., Moreno, J. G., & Doucet, A. (2021). A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2328–2334. https://doi.org/10.1145/3404835.3463255
Huynh, V.-N., Hamdi, A., & Doucet, A. (2020). When to Use OCR Post-correction for Named Entity Recognition? In E. Ishita, N. L. S. Pang, & L. Zhou (Eds.), Proceedings of the 22nd International Conference on Asia-Pacific Digital Libraries (ICADL 2020) (Vol. 12504, pp. 33–42). Springer International Publishing. https://doi.org/10.1007/978-3-030-64452-9_3
Kettunen, K., & La Mela, M. (2020). Digging Deeper into the Finnish Parliamentary Protocols – Using a Lexical Semantic Tagger for Studying Meaning Change of Everyman’s Rights (allemansrätten). Proceedings of the Digital Humanities in the Nordic Countries 5th Conference, 63–80. https://doi.org/10.5281/ZENODO.3676371
Klaus, Barbara. (2020). Can Umlauts Ruin Your Research in Digitized Newspaper Collections? A NewsEye Case Study on “The Dark Sides of War” (1914–1918). Proceedings of the Digital Humanities in the Nordic Countries 5th Conference, 2612, 267–274. https://doi.org/10.5281/ZENODO.4686731
Kutuzov, A., & Pivovarova, L. (2021). Three-part diachronic semantic change dataset for Russian. Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021, 7–13. https://doi.org/10.18653/v1/2021.lchange-1.2
Leppänen, Leo, & Toivonen, Hannu. (2021). A Baseline Document Planning Method for Automated Journalism. Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), 101–111. https://doi.org/10.5281/ZENODO.5562428
Linhares Pontes, E., Cabrera-Diego, L. A., Moreno, J. G., Boros, E., Hamdi, A., Sidère, N., Coustaty, M., & Doucet, A. (2020). Entity Linking for Historical Documents: Challenges and Solutions. In E. Ishita, N. L. S. Pang, & L. Zhou (Eds.), Proceedings of the 22nd International Conference on Asia-Pacific Digital Libraries (ICADL 2020) (Vol. 12504, pp. 215–231). Springer International Publishing. https://doi.org/10.1007/978-3-030-64452-9_19
Linhares Pontes, E., Hamdi, A., Sidere, N., & Doucet, A. (2019). Impact of OCR Quality on Named Entity Linking. In A. Jatowt, A. Maeda, & S. Y. Syn (Eds.), Digital Libraries at the Crossroads of Digital Information for the Future (Vol. 11853, pp. 102–115). Springer International Publishing. https://doi.org/10.1007/978-3-030-34058-2_11
Linhares Pontes, E., Moreno, J. G., & Doucet, A. (2020). Linking Named Entities across Languages using Multilingual Word Embeddings. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 329–332. https://doi.org/10.1145/3383583.3398597
Marjanen, Jani, Pivovarova, Lidia, Zosa, Elaine, & Kurunmäki, Jussi. (2019). Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings. Proceedings of the 5th International Workshop on Computational History (HistoInformatics 2019), Co-Located with the 23rd International Conference on Theory and Practice of Digital Libraries (TPDL 2019), 21–29. https://doi.org/10.5281/ZENODO.3689466
Marjanen, Jani, Zosa, Elaine, Hengchen, Simon, Pivovarova, Lidia, & Tolonen, Mikko. (2021). Topic modelling discourse dynamics in historical newspapers. Proceedings of the 5th Conference Digital Humanities in the Nordic Countries (DHN 2020), 63–77. https://doi.org/10.5281/ZENODO.5648114
Martinc, M., Montariol, S., Zosa, E., & Pivovarova, L. (2020). Capturing Evolution in Word Usage: Just Add More Clusters? Companion Proceedings of the Web Conference 2020, 343–349. https://doi.org/10.1145/3366424.3382186
Michael, J., Labahn, R., Gruning, T., & Zollner, J. (2019). Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition. 2019 International Conference on Document Analysis and Recognition (ICDAR), 1286–1293. https://doi.org/10.1109/ICDAR.2019.00208
Michael, J., Weidemann, M., Laasch, B., & Labahn, R. (2021). ICPR 2020 Competition on Text Block Segmentation on a NewsEye Dataset. In A. Del Bimbo, R. Cucchiara, S. Sclaroff, G. M. Farinella, T. Mei, M. Bertini, H. J. Escalante, & R. Vezzani (Eds.), Pattern Recognition. ICPR International Workshops and Challenges (Vol. 12668, pp. 405–418). Springer International Publishing. https://doi.org/10.1007/978-3-030-68793-9_30
Montariol, S., Martinc, M., & Pivovarova, L. (2021). Scalable and Interpretable Semantic Change Detection. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 4642–4652. https://doi.org/10.18653/v1/2021.naacl-main.369
Mutuvi, S., Boros, E., Doucet, A., Jatowt, A., Lejeune, G., & Odeo, M. (2020). Multilingual Epidemiological Text Classification: A Comparative Study. Proceedings of the 28th International Conference on Computational Linguistics, 6172–6183. https://doi.org/10.18653/v1/2020.coling-main.543
Mutuvi, S., Boros, E., Doucet, A., Lejeune, G., Jatowt, A., & Odeo, M. (2021a, April 15). Étude comparative de méthodes de classification multilingue appliquées à l’épidémiologie. COnférence en Recherche d’Informations et Applications - CORIA 2021, French Information Retrieval Conference, Grenoble, France. https://doi.org/10.5281/ZENODO.4734471
Mutuvi, S., Boros, E., Doucet, A., Lejeune, G., Jatowt, A., & Odeo, M. (2021b). Multilingual Epidemic Event Extraction. In H.-R. Ke, C. S. Lee, & K. Sugiyama (Eds.), Proceedings of the 23rd International Conference on Asian Digital Libraries (Vol. 13133, pp. 139–156). Springer International Publishing. https://doi.org/10.1007/978-3-030-91669-5_12
Mutuvi, S., Boros, E., Doucet, A., Lejeune, G., Jatowt, A., & Odeo, M. (2021c). Token-Level Multilingual Epidemic Dataset for Event Extraction. In G. Berget, M. M. Hall, D. Brenn, & S. Kumpulainen (Eds.), Proceedings of the 25th International Conference on Theory and Practice of Digital Libraries (Vol. 12866, pp. 55–59). Springer International Publishing. https://doi.org/10.1007/978-3-030-86324-1_6
Mutuvi, S., Doucet, A., Lejeune, G., & Odeo, M. (2020). A Dataset for Multi-lingual Epidemiological Event Extraction. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 4139–4144. https://doi.org/10.5281/ZENODO.3693647
Mutuvi, S., Doucet, A., Odeo, M., & Jatowt, A. (2018). Evaluating the Impact of OCR Errors on Topic Modeling. In M. Dobreva, A. Hinze, & M. Žumer (Eds.), Proceedings of the 20th International Conference on Asia-Pacific Digital Libraries (Vol. 11279, pp. 3–14). Springer International Publishing. https://doi.org/10.1007/978-3-030-04257-8_1
Nguyen, N. K., Boroş, E., Lejeune, G., & Doucet, A. (2020). Impact Analysis of Document Digitization on Event Extraction. Proceedings of the 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020), 2735, 17–28. https://doi.org/10.5281/ZENODO.4734268
Nguyen, N. K., Boros, E., Lejeune, G., Doucet, A., & Delahaut, T. (2021). L3i_LBPAM at the FinSim-2 task: Learning Financial Semantic Similarities with Siamese Transformers. Companion Proceedings of the Web Conference 2021, 302–306. https://doi.org/10.1145/3442442.3451384
Nguyen, T. T. H., Jatowt, A., Nguyen, N.-V., Coustaty, M., & Doucet, A. (2020). Neural Machine Translation with BERT for Post-OCR Error Detection and Correction. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 333–336. https://doi.org/10.1145/3383583.3398605
Nguyen, T.-T.-H., Coustaty, M., Doucet, A., Jatowt, A., & Nguyen, N.-V. (2018). Adaptive Edit-Distance and Regression Approach for Post-OCR Text Correction. In M. Dobreva, A. Hinze, & M. Žumer (Eds.), Proceedings of the 20th International Conference on Asia-Pacific Digital Libraries (Vol. 11279, pp. 278–289). Springer International Publishing. https://doi.org/10.1007/978-3-030-04257-8_29
Nguyen, T.-T.-H., Jatowt, A., Coustaty, M., Nguyen, N.-V., & Doucet, A. (2019a). Deep Statistical Analysis of OCR Errors for Effective Post-OCR Processing. 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 29–38. https://doi.org/10.1109/JCDL.2019.00015
Nguyen, T.-T.-H., Jatowt, A., Coustaty, M., Nguyen, N.-V., & Doucet, A. (2019b). Post-OCR Error Detection by Generating Plausible Candidates. 2019 International Conference on Document Analysis and Recognition (ICDAR), 876–881. https://doi.org/10.1109/ICDAR.2019.00145
Piskorski, Jakub, Babych, Bogdan, Kancheva, Zara, Kanishcheva, Olga, Lebedeva, Maria, Marcińczuk, Michał, Nakov, Preslav, Osenova, Petya, Pivovarova, Lidia, Pollak, Senja, Přibáň, Pavel, Radev, Ivaylo, Robnik-Šikonja, Marko, Starko, Vasyl, Steinberger, Josef, & Yangarber, Roman. (2021). Slav-NER: The 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages. Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, 122–133. https://doi.org/10.5281/ZENODO.4635585
Pivovarova, L., Jean-Caurant, A., Avikainen, J., Alnajjar, K., Granroth-Wilding, M., Leppänen, L., Zosa, E., & Toivonen, H. (2020). Personal Research Assistant for Online Exploration of Historical News. In J. M. Jose, E. Yilmaz, J. Magalhães, P. Castells, N. Ferro, M. J. Silva, & F. Martins (Eds.), Proceedings of the 42nd European Conference on IR Research (Vol. 12036, pp. 481–485). Springer International Publishing. https://doi.org/10.1007/978-3-030-45442-5_62
Pivovarova, Lidia, & Zosa, Elaine. (2022, January 25). Visual Topic Modelling for NewsImage Task at MediaEval 2021. Working Notes Proceedings of the MediaEval 2021 Workshop. MediaEval 2021 Workshop. https://doi.org/10.5281/ZENODO.5900719
Rautiainen, J. (2019, July 15). Opening Digitized Newspapers for Different User Groups—Successes and Challenges. IFLA World Library and Information Congress 2019, Athens, Greece. https://doi.org/10.5281/ZENODO.3403158
Rigaud, C., Doucet, A., Coustaty, M., & Moreux, J.-P. (2019). ICDAR 2019 Competition on Post-OCR Text Correction. 2019 International Conference on Document Analysis and Recognition (ICDAR), 1588–1593. https://doi.org/10.1109/ICDAR.2019.00255
Ros, R., & Oberbichler, S. (2020). The Helsinki Digital Humanities Hackathon: Two Perspectives on Multidisciplinary Historical Newspapers Research in a Hackathon Context. Proceedings of the Twin Talks 2 and 3 Workshops at DHN 2020 and DH 2020, 66–74. https://doi.org/10.5281/ZENODO.3689228
Sumikawa, Y., Jatowt, A., Doucet, A., & Moreux, J.-P. (2019). Large Scale Analysis of Semantic and Temporal Aspects in Cultural Heritage Collection’s Search. 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 77–86. https://doi.org/10.1109/JCDL.2019.00021
Pivovarova, L., Marjanen, J., & Zosa, E. (2019). Word Clustering for Historical Newspapers Analysis. Proceedings of the Workshop on Language Technology for Digital Historical Archives - with a Special Focus on Central-, (South-)Eastern Europe, Middle East and North Africa, 3–10. https://doi.org/10.26615/978-954-452-059-5_002
Zosa, E., Granroth-Wilding, M., & Pivovarova, L. (2020). A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval. Proceedings of the Workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020), 32–37. https://doi.org/10.5281/ZENODO.3751036
Zosa, E., & Granroth-Wilding, M. (2019). Multilingual Dynamic Topic Model. Proceedings - Natural Language Processing in a Deep Learning World, 1388–1396. https://doi.org/10.26615/978-954-452-056-4_159
Zosa, Elaine, Hengchen, Simon, Marjanen, Jani, Pivovarova, Lidia, & Tolonen, Mikko. (2020, March 1). Disappearing Discourses: Avoiding anachronisms and teleology with data-driven methods in studying digital newspaper collections. DHN 2020. https://doi.org/10.5281/ZENODO.3631613
Zosa, Elaine, Mutuvi, Stephen, Granroth-Wilding, Mark, & Doucet, Antoine. (2022, January 25). Evaluating the Robustness of Embedding-Based Topic Models to OCR Noise. Proceedings of the 23rd International Conference on Asian Digital Libraries 2021. 23rd International Conference on Asian Digital Libraries 2021, Online. https://doi.org/10.5281/ZENODO.5900730
Zosa, Elaine, Shekhar, Ravi, Karan, Mladen, & Purver, Matthew. (2021). Not All Comments are Equal: Insights into Comment Moderation from a Topic-Aware Model. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), 1652–1662. https://doi.org/10.5281/ZENODO.5562465
Boros, Emanuela, Nguyen, Nhu Khoa, Lejeune, Gaël, Coustaty, Mickael, & Doucet, Antoine. (2021). Transformer-based Methods with #Entities for Detecting Emergency Events on Social Media. TREC Incident Streams 2021. https://zenodo.org/record/6334513
Cabrera-Diego, Luis Adrián, Boros, Emanuela, & Doucet, Antoine. (2021). Elastic Embedded Background Linking for News Articles with Keywords, Entities and Events. TREC Incident Streams 2021. https://zenodo.org/record/6334523
Zosa, Elaine, Pivovarova, Lidia, Boggia, Michele, & Ivanova, Sardana. (2022). Multilingual Topic Labelling of News Topics using Ontological Mapping. 44th European Conference on Information Retrieval (ECIR 2022). https://zenodo.org/record/6334491
Boros, Emanuela, Gonzalez-Gallardo, Carlos-Emiliano, Moreno, Jose G., & Doucet, Antoine. (2022). L3i at SemEval-2022 Task 11: Straightforward Additional Context for Multilingual Named Entity Recognition. https://zenodo.org/record/6369947
Zosa, Elaine, Boros, Emanuela, Koloski, Boshko, & Pivovarova, Lidia. (2022). EMBEDDIA at SemEval-2022 Task 8: Investigating Sentence, Image, and Knowledge Graph Representations for Multilingual News Article Similarity. https://zenodo.org/record/6369944
Datasets
Bernard, G., Doucet, A., Faucher, C., & Suire, C. (2021). Event representation on Wikidata and Wikipedia with, and without the analysis of vernacular languages (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.4733507
Frossard, E., Coustaty, M., Doucet, A., Jatowt, A., & Hengchen, S. (2020). Data for “Dataset for Temporal Analysis of English-French Cognates” [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.3688086
Hamdi, A., Linhares Pontes, E., Boros, E., Nguyen, T. T. H., Hackl, G., Moreno, J. G., & Doucet, A. (2021). Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers (V1.0) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.4573312
Hamdi, Ahmed, Jean-Caurant, Axel, Sidere, Nicolas, Coustaty, Mickaël, & Doucet, Antoine. (2020). Benchmark for the evaluation of named entity recognition over ancient documents [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.3877553
Hengchen, Simon, Ros, Ruben, Marjanen, Jani, & Tolonen, Mikko. (2019). Models for “A data-driven approach to studying changing vocabularies in historical newspaper collections” (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.3585027
Linhares Pontes, E., Hamdi, A., Sidere, N., & Doucet, A. (2019). Benchmark for the evaluation of Named Entity Linking over ancient documents (0.1) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.3490332
Michael, Johannes, Weidemann, Max, Laasch, Bastian, & Labahn, Roger. (2020). Dataset of ICPR 2020 Competition on Text Block Segmentation on a NewsEye Dataset (1.0) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.4943581
Muehlberger, Guenter, & Hackl, Guenter. (2020). NewsEye / READ OCR training dataset from French Newspapers (18th, 19th, early 20th C.) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.4293601
Muehlberger, Guenter, & Hackl, Guenter. (2021a). NewsEye / READ OCR training dataset from Finnish Newspapers (18th, 19th, early 20th C.) (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.4599471
Muehlberger, Guenter, & Hackl, Guenter. (2021b). NewsEye / READ OCR training dataset from Swedish Newspapers (18th, 19th, early 20th C.) (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.4599623
Muehlberger, Guenter, & Hackl, Guenter. (2021c). NewsEye / READ AS training dataset from Austrian Newspapers (19th, early 20th C.) (Version 2) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.4693412
Muehlberger, Guenter, & Hackl, Guenter. (2021d). NewsEye / READ AS training dataset from French Newspapers (19th, early 20th C.) (Version 3) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.4600635
Muehlberger, Guenter, & Hackl, Guenter. (2021e). NewsEye / READ AS training dataset from Finnish Newspapers (19th C.) (Version 2) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.4600745
Mutuvi, S., Doucet, A., Lejeune, G., & Odeo, M. (2020). Data for “A Dataset for Multi-lingual Epidemiological Event Extraction” [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.3709616
Rigaud, C., Doucet, A., Coustaty, M., & Moreux, J.-P. (2019). Dataset of ICDAR 2019 Competition on Post-OCR Text Correction [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.3515402
Hechl, Stefan Patrick. (2021). “Wir dürfen wieder Österreicher sein!” Die Rolle der Tagespresse in österreichischen Nation-Building-Prozessen 1945–1948 – eine quantitative Analyse ausgewählter digitaler Zeitungskorpora samt Vorschlägen zur didaktischen Umsetzung [Universität Innsbruck]. https://zenodo.org/record/4468295
Hamdi, A., Linhares Pontes, E., & Doucet, A. (2021). Annotation Guidelines for Named Entity Recognition, Entity Linking and Stance Detection (v3.1). Zenodo. https://zenodo.org/record/4574198
Kanner, Antti, Mäkelä, Eetu, Marjanen, Jani, Tolonen, Mikko, Oberbichler, Sarah, Duong, Quan, Pivovarova, Lidia, Ali, Dilawar, Verstockt, Steven, Ollion, Étienne, Shen, Rubing, Arnold, Matthias, Brown, David, Adam, Raven, Balasubramanian, Saranya, Charvat, Vera Maria, Füllsack, Manfred, Kleinert, Jörn, Misera, Hanna, … Lomazow, Steven. (2021). The Book of Abstracts for What’s Past is Prologue: The NewsEye International Conference. Zenodo. https://doi.org/10.5281/ZENODO.5167375
Oberbichler, S. (2020). Using LDA and Jensen-Shannon Distance (JSD) to group similar newspaper articles (v1.0). Zenodo. https://zenodo.org/record/3876063