
The British Library's been digitising newspapers…

There were many things I had been meaning to write about today, but this particular piece of news has swept all of those aside.

The British Library, the national library of the UK, is a not unimpressive institution.  Indeed, it’s so not unimpressive that it holds the title of the world’s largest library (by total number of items). According to its website, it holds over 150 million items in almost all the world’s languages, has everything from drawings and scores to manuscripts and books, original cylinder recordings to stamps, and requires some 625 km of shelving, which grows by 12 km every year.

Heaven, in other words.

BUT! This post is not meant to wax lyrical about its many holdings, but instead focuses on one recent piece of news.  The BL has digitised up to 4 million pages’ (and about 300 years’) worth of newspapers, which are now available and searchable online. And they’re not finished - oh, no :)  According to Engadget, the BL and the company with whom they’re doing this, Brightsolid, are planning to digitise some 40 million pages over the coming decade.

Which is kinda cool.

According to them, you can search through the collection for:

News Articles - read about national events, as well as issues of local and regional importance. News articles are your window into daily life in historical Britain.

Family Notices - search for your family’s birth, marriage and death notices plus related announcements including engagements, anniversaries, birthdays and congratulations.

Letters - read letters to the editor written by the newspaper’s readers, including illuminating contemporary debates, aspirations and anxieties.

Obituaries - view a wealth of contemporary information on the lives of notable individuals and ancestors.

Advertisements - these include classifieds, shipping notices and appointments.

Illustrations - see photographs, engravings, graphics, maps and editorial cartoons.

Of course, it also means many people can have enormous amounts of fun trying to trace their families back - for myself, I quite hope to find out that I have several regrettable forbears who got up to all sorts of No Good*.  We shall see :)

The best thing, though, is that it’s _accessible_. OK, not as accessible as I would like: while searching is free, you still have to pay to get your hands on the material.  But it’s really quite reasonable!  Contrast, for example: buying access to a single paywalled journal paper can cost on the order of US$30.  The BL is charging 80 squids (GBP) for 12 months of unfettered access, 30 squids for 30 days/3000 credits (that works out to several hundred pages) or a mere 7 squids for 2 days and 500 credits**.
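For anyone who wants to poke at those numbers, here is a rough Python sketch of the arithmetic. The package prices and credit allowances are the ones quoted above; the number of credits a page costs is NOT something stated in this post, so `credits_per_page` is a hypothetical parameter you would need to fill in from the site itself.

```python
# Package prices (GBP) and credit allowances as quoted in the post.
PACKAGES = {
    "30 days": {"price_gbp": 30.0, "credits": 3000},
    "2 days": {"price_gbp": 7.0, "credits": 500},
}


def price_per_credit(price_gbp, credits):
    """Price of a single credit, in GBP."""
    return price_gbp / credits


def pages_affordable(credits, credits_per_page):
    """Whole pages a credit allowance covers.

    credits_per_page is an ASSUMPTION supplied by the caller;
    the post does not say how many credits a page costs.
    """
    return credits // credits_per_page


def cost_per_page(price_gbp, credits, credits_per_page):
    """Effective price of one page, in GBP, for a given package."""
    return price_per_credit(price_gbp, credits) * credits_per_page


if __name__ == "__main__":
    hypothetical_credits_per_page = 5  # illustrative value only
    for name, pkg in PACKAGES.items():
        per_credit = price_per_credit(pkg["price_gbp"], pkg["credits"])
        pages = pages_affordable(pkg["credits"], hypothetical_credits_per_page)
        print(f"{name}: {per_credit:.3f} GBP per credit, {pages} pages")
```

The one thing this does show without any guesswork is that a credit is cheaper in the 30-day package (0.01 GBP) than in the 2-day one (0.014 GBP).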

That’s a lot of information.  And I wouldn’t be at all surprised to see some wonderful research and learnings come out of it.  Just think of what’s happened with other publicly available and accessible datasets, such as the SDSS (think Galaxy Zoo)***.

With such projects, not only do the answers we can gain expand as more people are able to work on them, but so do the possible questions that can be asked: a positive feedback loop of knowledge.

Stunning stuff :)

But I’m still sad there’s a charge attached, as it puts the archive out of the reach of many people both inside and outside the UK (30 GBP is an awful lot of money for, say, a school student or a cash-strapped third world researcher).


  • Exhaustive coverage of crime and punishment – from infamous murder trials to heart-rending stories of men, women and children transported to Australia for the most minor thefts (in one case, seven years transportation for the theft of seven cups and five saucers);
  • Eyewitness accounts of social transformation – newspaper reports, commentary and letters to the editor on topics ranging from the railway mania of the mid-19th century to the extraordinary expansion of the temperance movement;
  • Illustrations and advertisements – the aspirations and anxieties of the time laid bare in searchable ads and classifieds, peddling everything from the latest fashion to miracle cures for baldness and venereal disease.

More about the newspaper digitisation.

More facts about the Library.


* So far, I have a forbear, C. K. Whitcroft, who won the R.A.C. Tourist Trophy in a Riley car in 1932. Another forbear struck people who declined to give them their share of the money (1858), one was involved in some sort of heroism in 1921, another was a detective in 1902, and there are some 41 other pages of results.  Heh :P

** Cost of pages is between 0.05 and 0.21 squids per page.

*** It’s worth noting that the data from projects such as the SDSS is FREE.  Which is important. And yes, I’ve been reading Michael Nielsen’s latest book, Reinventing Discovery.  Stay tuned for a review soon!