Text-mining and open access

October 18, 2012

There is an excellent opinion piece in the latest edition of Research Fortnight by Professor Doug Kell on text-mining and open access. Since the article will be behind a paywall for many readers (the irony…), I thought I would summarize the argument and post a few quotes here.

The argument goes like this:

  • New research findings are being added to the literature so quickly that no one can read them all, let alone assimilate and make sense of them. The only solution is to use text-mining.
  • There are clear benefits for researchers, business and policy-makers in text-mining the scientific literature. For example, a recent report from JISC concludes that “there is clear potential for significant productivity gains, with benefit both to the sector and to the wider economy”.
  • But for text-mining to be effective, access to the full text is needed. Abstracts are not enough, and embargo periods are a problem when new research has to be interpreted rapidly.

And here are some key paragraphs from the article:

The PubMed database records two new peer-reviewed papers in the life sciences every minute. Across all the sciences, the number is five.

Such is the rate at which scholarly papers are produced that only computers can read them all. As a result, text-mining techniques are infiltrating every field of research, from genomics to the social sciences and humanities. Historians are using text mining to analyse court records from the Old Bailey. Business has been mining newswires since the 1980s to acquire competitive intelligence and today companies use text mining, including of social media, to discover what customers think of their products and services.

[…]

To get the most from text mining requires open access to the literature. And it requires it as soon after publication as possible. In the life sciences, six months—the maximum embargo allowed in Research Councils UK’s policy on ‘green’ open access—is a very long time.

This is one reason why the research councils’ policy on open access announced this July made the ‘gold’ model the preferred route. Pursuing gold open access will help the UK to get ahead of the curve in exploiting the opportunities, including text mining, that come from open access.


One Response to “Text-mining and open access”

  1. Andrew Miller Says:

    Hi. Thanks for interesting post. Is the available corpus of OA and mine-able material sufficient to test this hypothesis now, do you think?

    Disclosure: I’m a publisher for Elsevier.

