Goals

Text Recognition and Article Separation: NewsEye will address two major obstacles facing current research projects that deal with historical newspapers. One is that, in many cases, conventional Optical Character Recognition (OCR) does not provide satisfactory results. The other is that text recognition results are mostly available at the newspaper page level only, instead of at the more appropriate article level.

Multilingual and Uncertainty-aware Semantic Text Enrichment: While named entity recognition (NER) and named entity linking (NEL) are very active research areas, their results are still very weak when applied to historical data. The main reason is that most of the models require linguistic analysis, which is not robust to noisy text recognition output.

Dynamic Text Analysis: Tools for exploring large sets of historical newspapers are scarce, in particular tools with advanced capabilities for discovering and expressing the historical trends, topics and viewpoints suggested by large-scale analysis.

Personal Research Assistant: The ambition is to fundamentally change how professional researchers, family historians and the interested public can access, explore and analyse the store of information made available by historical newspaper archives.

Digital Humanities (DH): Digital surrogates of newspapers can potentially be accessed at any time from anywhere in the world, which allows new research approaches. This has opened up digital newspaper collections to user groups other than professional researchers and lay historians. So far, this is probably the biggest achievement of digital collections within libraries and archives.