Bibliothèque nationale de France


The Bibliothèque nationale de France (BnF) is one of the largest public and research libraries in the world. Throughout its history it has collected and conserved the national heritage entrusted to its care, in whatever form, for the use of all researchers, students and professionals. Today, its heritage collections encompass all areas of culture and knowledge in a great variety of languages, and illustrate the library’s encyclopedic nature.

BnF offers access to its digital library Gallica, the result of the library’s commitment to digitizing selected items from its collections. Gallica currently contains over 4 million digitized documents in French and other languages: manuscripts, sound materials and music scores, books, images and newspaper issues. They cover all domains of knowledge, with a specific focus on literature and history. Together with these collections, all in the public domain, Gallica gives access to digitized documents belonging to French partner libraries, as well as to a set of copyrighted documents made available in collaboration with the French Publishers Association, some publishers and e-retailers (more than 280 partners).

BnF has extensive experience in mass digitization and digitization processes, especially OCR and text enrichment (named entity recognition, topic modeling...). In 2011 it created data.bnf.fr, a service that aggregates data from both the catalogues and the digital library, provides links to internal and external sources, and uses semantic web technologies to link BnF resources on the web of data. Back in 2005, BnF launched the SPAR project (Scalable Preservation and Archiving Repository), which set up a complete digital repository that addresses the sustainability and security of digital data.

BnF plays a key role in the governance of Europeana and is one of its major contributors. The institution has also taken part in several of the projects that have enlarged and enriched Europeana's content, for example with manuscripts (Europeana Regia, as coordinator), materials related to WWI (Europeana Collections 1914-1918 and Europeana Awareness), newspapers (Europeana Newspapers), and more recently audiovisual and sound materials (Europeana Sounds). It also participates in two centers of competence: the Open Preservation Foundation (OPF) and IMPACT for digitization.

Along with 16 partners, BnF has contributed to improving the processes applied to the newspaper collections digitized within the framework of the Europeana Newspapers project (CIP ICT-PSP). To facilitate access to a major multilingual newspaper collection of 18 million pages, BnF provided 1.1 million pages for OCR, and 1 million more were structured at article level (Optical Layout Recognition, OLR). Together with the IT lab of Paris VI University (Lip6), BnF has monitored the recognition and extraction of named entities in French. The ALTO Editorial Board, in which BnF is involved, has worked on the description of named entities in order to integrate them into version 3 of the ALTO XML format. All these enrichments had to be added to the digital items, hence the creation of a dedicated profile for newspaper digitization, on which BnF worked within the framework of this project: ENMAP (Europeana Newspapers METS/ALTO profile). This profile is currently used in BnF's newspaper mass digitization program.

BnF is involved in several national and international research programs and welcomes researchers in residence. The Digitization team is currently committed to CORPUS, a four-year internal program (2016-2019) that aims to provide a better service to researchers for the extraction and analysis of large-scale corpora. This year (2017), the team is working on heritage newspaper content, in close collaboration with researchers from the CELSA/Gripic Paris lab (Humanities and Social Sciences/Media). BnF has also developed prototype tools for the extraction and analysis of corpora, whose efficiency has been tested and validated by researchers. The objective now is to scale these tools up, with constant attention to standards and norms, so that they can become part of the Library's overall service offering and its policy towards users.

Within NewsEye, BnF will contribute mainly to WP7 on demonstration, dissemination, outreach and exploitation. Notably, it will take advantage of its key position within several networks and centers of competence to plan and ensure the sustainability of the project.