Research Publications

The NewsEye project's publications are listed on this page. All publications are also available on our Zenodo page and archived in the European Commission's OpenAIRE system.

 

Journals

Marjanen, Jani, Vaara, Ville, Kanner, Antti, Roivainen, Hege, Mäkelä, Eetu, Lahti, Leo, & Tolonen, Mikko. (2019). A National Public Sphere? Analyzing the Language, Location, and Form of Newspapers in Finland, 1771–1917. Journal of European Periodical Studies 4.1 (summer 2019), 55–78. http://doi.org/10.5281/zenodo.3697749

Pontes, E. L., Huet, S., Torres-Moreno, J.-M., da Silva, T. G., & Linhares, A. C. (2020). A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming. Journal of Computación y Sistemas: Vol. 24, No. 2, 2020. https://doi.org/10.5281/ZENODO.3759286

Pfanzelter, E., Oberbichler, S., Marjanen, J., Langlais, P.-C., & Hechl, S. (2021). Digital interfaces of historical newspapers: opportunities, restrictions and recommendations. Journal of Data Mining and Digital Humanities, Episciences.org, In press, HistoInformatics. https://doi.org/10.5281/ZENODO.4446818

Marjanen, J., Kurunmäki, J., Pivovarova, L., & Zosa, E. (2020). The expansion of isms, 1820–1917: Data-driven analysis of political language in digitized newspaper collections. Journal of Data Mining and Digital Humanities, Episciences.org, 2020, HistoInformatics. https://doi.org/10.5281/ZENODO.4447025

Book Chapter/Section

Nguyen, T.-T.-H., Coustaty, M., Doucet, A., Jatowt, A., & Nguyen, N.-V. (2018). Adaptive Edit-Distance and Regression Approach for Post-OCR Text Correction. Maturity and Innovation in Digital Libraries, 278–289. https://doi.org/10.1007/978-3-030-04257-8_29

Mutuvi, S., Doucet, A., Odeo, M., & Jatowt, A. (2018). Evaluating the Impact of OCR Errors on Topic Modeling. Maturity and Innovation in Digital Libraries, 3–14. https://doi.org/10.1007/978-3-030-04257-8_1

Theses

Avikainen, J. (2019). A Method for Wavelet-Based Time Series Analysis of Historical Newspapers. https://doi.org/10.5281/ZENODO.3628263

Hechl, S. P. (2020). ‘Wir dürfen wieder Österreicher sein!’ Die Rolle der Tagespresse in österreichischen Nation-Building-Prozessen 1945–1948 – eine quantitative Analyse ausgewählter digitaler Zeitungskorpora samt Vorschlägen zur didaktischen Umsetzung. https://doi.org/10.5281/ZENODO.4468295

Conference Papers

22nd International Academic Mindtrek Conference, 10th - 11th October 2018
Alhalaseh, Rola, Munezero, Myriam, Leinonen, Miika, Leppänen, Leo, Avikainen, Jari, & Toivonen, Hannu. (2018). Towards Data-Driven Generation of Visualizations for Automatically Generated News Articles. ACM, Association for Computing Machinery. http://doi.org/10.1145/3275116.3275131 
 

ACM/IEEE Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019
Sumikawa, Y., Jatowt, A., Doucet, A., & Moreux, J.-P. (2019). Large Scale Analysis of Semantic and Temporal Aspects in Cultural Heritage Collection's Search. https://doi.org/10.5281/ZENODO.3243336

Hamdi, A., Jean-Caurant, A., Sidere, N., Coustaty, M., & Doucet, A. (2019). An Analysis of the Performance of Named Entity Recognition over OCRed Documents. https://doi.org/10.5281/ZENODO.3243343

Nguyen, T.-T.-H., Jatowt, A., Coustaty, M., Nguyen, N.-V., & Doucet, A. (2019). Deep Statistical Analysis of OCR Errors for Effective Post-OCR Processing. https://doi.org/10.5281/ZENODO.3245169
 

IFLA WLIC Conference, Athens, Greece, 24th-30th August 2019
Rautiainen, J. (2019). Opening Digitized Newspapers for Different User Groups - Successes and Challenges. Zenodo. https://doi.org/10.5281/ZENODO.3403158 
 

Recent Advances in Natural Language Processing (RANLP), Bulgaria, 2-4 September 2019
Zosa, E., & Granroth-Wilding, M. (2019). Multilingual Dynamic Topic Model. Zenodo. https://doi.org/10.5281/ZENODO.3402877 
 

15th International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20-25th September 2019
Michael, J., Labahn, R., Grüning, T., & Zöllner, J. (2019). Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition. https://doi.org/10.5281/ZENODO.3362980

Nguyen, T. T. H., Jatowt, A., Coustaty, M., Nguyen, N. V., & Doucet, A. (2019). Post-OCR Error Detection by Generating Plausible Candidates. https://doi.org/10.5281/ZENODO.3381148

Rigaud, C., Doucet, A., Coustaty, M., & Moreux, J.-P. (2019). ICDAR 2019 Competition on Post-OCR Text Correction. https://zenodo.org/record/3459116
 

Language Technology for Digital Historical Archives (LT-DHA 2019), workshop collocated with RANLP 2019, Varna, Bulgaria, 5th September 2019
Pivovarova, L., Marjanen, J., & Zosa, E. (2019). Word Clustering for Historical Newspapers Analysis. https://doi.org/10.5281/ZENODO.3402939 
 

HistoInformatics2019 - the 5th International Workshop on Computational History (HistoInformatics2019), Oslo, Norway, 12th September 2019
Marjanen, J., Pivovarova, L., Zosa, E., & Kurunmäki, J. (2019). Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings. https://doi.org/10.5281/ZENODO.3403174
 

21st International Conference on Asia-Pacific Digital Libraries (ICADL 2019), Kuala Lumpur, Malaysia, 4th-7th November 2019
Linhares Pontes, E., Hamdi, A., Sidere, N., & Doucet, A. (2019). Impact of OCR Quality on Named Entity Linking. https://doi.org/10.5281/ZENODO.3529179 (also published in Digital Libraries at the Crossroads of Digital Information for the Future, Springer LNCS, pp. 102-115 (978-3-030-34057-5))
 

Digital Humanities in the Nordic Countries (DHN), Riga, Latvia, 17th - 20th March 2020
Kettunen, Kimmo, & La Mela, Matti. (2020). Digging Deeper into the Finnish Parliamentary Protocols – Using a Lexical Semantic Tagger for Studying Meaning Change of Everyman's Rights (allemansrätten). Zenodo. http://doi.org/10.5281/zenodo.3676372

Zosa, E., Hengchen, S., Marjanen, J., Pivovarova, L., & Tolonen, M. (2020). Disappearing Discourses: Avoiding anachronisms and teleology with data-driven methods in studying digital newspaper collections. Zenodo. https://doi.org/10.5281/ZENODO.3631614

Ros, Ruben, & Oberbichler, Sarah. (2020). The Helsinki Digital Humanities Hackathon: Two Perspectives on Multidisciplinary Historical Newspapers Research in a Hackathon Context. Zenodo. http://doi.org/10.5281/zenodo.3689228 
 

10th Temporal Web Analytics Workshop (TempWeb), Taipei, 20th April 2020
Martinc, Matej, Montariol, Syrielle, Zosa, Elaine, & Pivovarova, Lidia. (2020). Capturing Evolution in Word Usage: Just Add More Clusters?. http://doi.org/10.1145/3366424.3382186 
 

12th Language Resources and Evaluation Conference (LREC 2020), Marseilles, France, 13th-15th May 2020
Frossard, Esteban, Coustaty, Mickael, Jatowt, Adam, & Hengchen, Simon. (2020). Dataset for Temporal Analysis of English-French Cognates. Zenodo. http://doi.org/10.5281/zenodo.3693651 

Mutuvi, Stephen, Doucet, Antoine, Lejeune, Gael, & Odeo, Moses. (2020). A Dataset for Multi-lingual Epidemiological Event Extraction. Zenodo. http://doi.org/10.5281/zenodo.3693647 

Zosa, E., Granroth-Wilding, M., & Pivovarova, L. (2020). A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval. https://doi.org/10.5281/ZENODO.3751036
 

ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020), Wuhan, Hubei, P. R. China, 1st-5th August 2020
Pontes, E. L., Doucet, A., & Moreno, J. G. (2020). Linking Named Entities across Languages using Multilingual Word Embeddings. https://doi.org/10.5281/ZENODO.3759437

Nguyen, T.-T.-H., Jatowt, A., Nguyen, N.-V., Doucet, A., & Coustaty, M. (2020). Neural Machine Translation with BERT for Post-OCR Error Detection and Correction. https://doi.org/10.5281/ZENODO.3759447
 

2020 European Conference on Information Retrieval (ECIR 2020), Lisbon, Portugal, 14th-17th April 2020
Pivovarova, L., Jean-Caurant, A., Avikainen, J., Alnajjar, K., Granroth-Wilding, M., Leppänen, L., Zosa, E., & Toivonen, H. (2020). Personal Research Assistant for Online Exploration of Historical News. https://doi.org/10.5281/ZENODO.3759614
 

Digital Humanities 2020 (DH 2020), Ottawa, Canada, 20th-25th July 2020
Doucet, A., Gasteiner, M., Granroth-Wilding, M., Kaiser, M., Kaukonen, M., Labahn, R., Moreux, J.-P., Muehlberger, G., Pfanzelter, E., Therenty, M.-E., Toivonen, H., & Tolonen, M. (2020). NewsEye: A digital investigator for historical newspapers. https://doi.org/10.5281/ZENODO.3895269
 

14th International Conference on Data Analytics in Logistics (ICDAL 2020), Dubai, United Arab Emirates, 17th-18th December 2020
Huynh, V.-N., Hamdi, A., & Doucet, A. (2020). When to use OCR post-correction for named entity recognition? https://doi.org/10.5281/ZENODO.4008952
 

Conference and Labs of the Evaluation Forum (CLEF 2020), online
Boros, E., Pontes, E. L., Cabrera-Diego, L. A., Hamdi, A., Moreno, J. G., Sidère, N., & Doucet, A. (2020). Robust Named Entity Recognition and Linking on Historical Multilingual Documents. https://doi.org/10.5281/ZENODO.4068075
 

Conference on Computational Natural Language Learning (CoNLL), online, 19th-20th November 2020
Boros, E., Hamdi, A., Pontes, E. L., Cabrera-Diego, L. A., Moreno, J. G., Sidere, N., & Doucet, A. (2020). Alleviating Digitization Errors in Named Entity Recognition for Historical Documents. https://doi.org/10.5281/ZENODO.4475989
 

28th International Conference on Computational Linguistics (COLING'2020), online, 8th-13th December 2020
Mutuvi, S., Boros, E., Doucet, A., Lejeune, G., Jatowt, A., & Odeo, M. (2021). Multilingual Epidemiological Text Classification: A Comparative Study. https://doi.org/10.5281/ZENODO.4476039
 

European Conference on Information Retrieval (ECIR 2021), Lucca, Italy, 28th March - 1st April 2021
Boros, E., Moreno, J. G., & Doucet, A. (2021). Event Detection with Entity Markers. https://doi.org/10.5281/ZENODO.4476151

Others

NewsEye Consortium. (2020). NewsEye Policy Brief. Zenodo. https://doi.org/10.5281/ZENODO.4291895

Oberbichler, S. (2020). Using LDA and Jensen-Shannon Distance (JSD) to group similar newspaper articles (Version v1.1) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.3887193