Blog

How digitalization changes history (and maybe also historical research)

by Jani Marjanen, postdoctoral researcher at the University of Helsinki.

According to a now worn-out saying, "history is written by the winners".

An inside joke among historians modifies this trope: "history is written by historians commissioned by the winners".

Perhaps there is some truth to both of these claims, but in this text I would like to advance a slightly different, though related, claim: changes in how we manage our historical records create structural shifts that direct the way we write history. We are living through such a rupture right now with the massive digitization projects being carried out all over the world (though predominantly in its richer parts).

The digitized collections we now have encompass almost 50 per cent of all books published in Britain in the eighteenth century, all newspapers published in Finland before the year 1929, every single book ever published in Norway, and many other collections throughout the world. Every European country has produced digitized historical collections. On top of this, the HathiTrust holds millions of digitized books, many of them scanned by Google. We have particularly good digital access to historical newspapers and parliamentary records from many countries, two types of collections that seem exceptionally well suited for long-term datasets.

~ ~ ~ ~

These projects have an effect on how history is written. Take the digitized newspapers in Finland as an example: the first batch was published as early as 2004. Dissertations in history written before 2004 that deal with the nineteenth century (from which the bulk of the newspapers come) usually do not refer to historical newspapers unless they are specifically about newspapers. Now, in the 2010s, any dissertation dealing with nineteenth-century Finland will cite newspapers even when it does not focus on them. The simple reason is that material that was once very laborious to use has suddenly become the best available historical source material for the period.

Any new large-scale historical dataset changes history, not because the winners decide what history is written about, but because the choices of what to digitize direct research by providing materials that are, comparatively speaking, easier to use. Massive amounts of historical sources remain undigitized; they are obviously still used by historians, but they have become comparatively more difficult to access.

This is not the first time such a reorganization of historical data has happened. In the late nineteenth and early twentieth centuries, we saw the establishment of national collections of books, newspapers and other archived materials. Shaping these collections, sometimes as bound series or carefully curated selections, sent a signal of what was important, but it also concretely made a selection of material more accessible.

If you go to any European capital and visit a national library or a national archive, there is a high likelihood that the building is from the second half of the nineteenth century. This is when the organization of knowledge according to national institutions was at its strongest.

During the course of the twentieth century, organizations challenging a national portrayal of history emerged. Archives of the working class, women, ethnic minorities, sexual minorities and businesses, among other groups and stakeholders in society, gained prominence.

In professional history, the period from the 1970s to the 1990s was characterized by challenging an old nationalist narrative of history; the 2000s were largely about finding modes for writing transnational, global and postcolonial history.

The digitization of historical records in the 2010s does not necessarily follow the same path, as it seems to return to the collections established in the late nineteenth and early twentieth centuries.

Quite naturally, big digitization projects tend to emphasize the collections that are perceived as most important, assumed to be most interesting to the public, and/or easy to digitize. These collections are very often closely related to the national collections that were started in the nineteenth century.

While we should be thrilled by the abundance of digitized historical datasets, it is worth promoting a certain pluralism in what gets digitized, and implementing practices in the study of digitized sources that hinder a re-nationalization of history writing. The most obvious of those practices, though not the easiest to implement, is setting up research projects so that they include comparative elements, as in the NewsEye and Impresso projects. (Having said this, it is important to underline that this does not mean the history of the nation should be disregarded.)

We need to remember that the digitization of historical collections is at the heart of deciding what is important in history, and that the choices made are inherently political and direct historical research.

~ ~ ~ ~

If digitization changes history, it also has direct consequences for how historians work. I will raise three points that I think may help promote a more pluralist take on history.

First, historical research must start working much more closely with memory organizations, that is, libraries, archives and museums.

One way to ensure that the digitization of new materials does not end up being a conservative effort is to tie research projects to the digitization efforts themselves.

This is a concrete way for memory organizations to get input from historians (and other scholars), but perhaps more importantly it means that historians get a better picture of how collections are made, what they include and how they guide research.

For this to be realized successfully, new funding instruments are needed. Currently, I do not know of any instrument that runs long enough to include funds for digitization and for a research project conducted partly alongside, but mostly after, the digitization work.

The NewsEye project does this by including humanists, computer scientists and libraries, thus bringing together data providers, data science and the interpretation of data, but because of the time frame of EU-funded projects it primarily relies on already digitized material.

Second, dealing with large-scale digitized collections creates a demand for humanists to work in groups.

This is in a sense trivial, but to make real use of large-scale digital datasets, historians need to collaborate with at least computer scientists and computational linguists.

In the Helsinki Computational History Group, led by Mikko Tolonen and in which I work, we are four historians, four linguists and three computer scientists (depending a bit on how you count). In the NewsEye project the disciplinary range of researchers is even greater.

Working in such groups has many effects, but I would say one of the major ones is that it challenges the way historians write historical narratives. Instead of big books written by one lone historian, we are moving much more towards argument-driven articles without the sense of an unraveling, often chronological, historical narrative. Working with big textual data also requires a different type of discussion of the materials and methods used, in order to better document a research process that covers data harmonization and analysis. A discussion of data also invites a critical examination of the often national framing of the historical collections used.

Third, the availability of large digital datasets opens the door to reassessing classic interpretations in history.

One important feature of many digital collections (especially those produced by publicly funded organizations) is that they are available to practically everyone.

This means we get a lot of amateurs studying them; we get research groups from other fields approaching historical themes just because the data is there; and we get historians playing around with new methods that they do not quite understand, but playing around nonetheless, because they can.

The openness of the data allows for new interventions from people who have not spent long careers building up domain knowledge. Sometimes poor domain knowledge leads to silly or anachronistic research questions, but in general better access to material opens up the field in a positive way, producing new perspectives and, ultimately, history in the plural. Digitization strips many gatekeepers of historical inquiry of their power.

This, it seems, also brings the big picture to the fore. People who come from outside the field, or who use methods designed for the analysis of big data, also tend to focus on big questions.

In our group in Helsinki we have gone back to using huge datasets to discuss Jürgen Habermas's classic The Structural Transformation of the Public Sphere (orig. Strukturwandel der Öffentlichkeit) from 1962. Similarly, a group led by Tim Hitchcock has used digitized court cases to revisit Norbert Elias's classic The Civilizing Process (orig. Über den Prozeß der Zivilisation) from 1939.

In the NewsEye project the starting points are also big questions rather than detailed ones. The humanities groups are approaching themes of migration, gender, journalism and nationalism on the basis of large-scale datasets. Although more detailed research cases are being developed within the project, here too big questions are seen as a way to assess the promise of digital history. A crucial aspect of this is that the big cases are easily conceived (though not easily conducted) as comparative or transnational projects.