Search engine for open access research articles


In short, Open Access (OA) means that publicly funded scholarly research is provided to the public free of charge. However, millions of publicly funded OA resources are stored across thousands of university websites, also known as Institutional Repositories (IRs). Searching across all of these IRs one by one is impossible. This post walks you through the series of events that have led to both problems and opportunities in relation to OA or, more broadly, scholarly communication. At the end of the post we will introduce you to a different kind of search engine for open access research articles.

Problems in a nutshell

  1. OA resources are stored across more than 4000 Institutional Repositories (IR) worldwide (, 2015), and visiting each IR in turn is impossible. This may well be the reason why academic staff and students turn to commercial search engines to access OA resources.
  2. Conventional reading methods alone are no longer sufficient and we need machine assistance. Academics are experiencing information overload (Cantoni and Danowski, 2015). The amount of information available to us exceeds our ability to process it (Bergamaschi, Guerra and Leiba, 2010), which causes stress and breaks our concentration (Vilar, 2015). While it is imperative that academic staff and students have access to the entire global corpus of OA resources, no human could ever read all of it.

Why is research important?

It is estimated that 26% of activity in the Australian economy ($330 billion per year) results from the direct and flow-on impact of research and scientific advancement (Australian Academy of Science, 2016), so naturally government agencies around the world allocate billions of dollars to research each year. Examples of these agencies are the Australian Research Council (ARC) and the National Health and Medical Research Council (NHMRC), which collectively allocated around $19 billion to Australian research projects between 2000 and 2014 (, 2016) (, 2015). Streams of this public funding go towards projects and programs within universities, and this leads to, amongst other things, journal and book publications, which are the cornerstone of scholarly communication (Brown and Boulderstone, 2008).

A bit of history …

Over the last two decades, universities have experienced what is known as the serials crisis (Tinerella, 1999) – a period of extremely high and rapidly increasing costs for accessing scholarly communication (O’Donnell et al., 2015). Causes of the serials crisis include price hikes by publishers, library budget cuts and inflation (Kumar and Sanjaya, 2015), all of which are compounded by differential pricing for different continents and the effects of currency exchange (Borchert and Ives, 2007).

The Open Access (OA) movement, which traces its origins back to at least the 1960s, now fosters the publishing of digital scholarly communication for immediate and free worldwide access (Open Access, broad readership, high impact – Springer, 2013).

OA solves the pricing crisis for scholarly journals. It also solves what I’ve called the permission crisis. OA also serves library interests in other, indirect ways. Librarians want to help users find the information they need, regardless of the budget-enforced limits on the library’s own collection (Suber, 2012).


OA has specific benefits for the academic community, and its growth continues to accelerate (Open Access, broad readership, high impact – Springer, 2013). It was again called for in December 2001 via the Budapest Open Access Initiative (Suber, 2012), an initiative designed to further accelerate research and enrich education (Budapest Open Access Initiative, 2002). Since its inception, the initiative has been fueled by the internet and digital technologies, which have tremendously decreased the costs associated with both the production and circulation of research literature (Open Access Status of Journal Articles from ERC Funded Projects, 2012).

What’s next

It is evident that OA is no longer a new or experimental model; it has emerged and matured into a fully developed alternative to traditional subscription publishing (Open Access, broad readership, high impact – Springer, 2013). In turn, the number of open access mandates and policies adopted by universities, research institutions and research funders has grown by 500% during the last decade (The Registry of Open Access Repository Mandates and Policies, 2016).

This windfall of openly available resources means that academics are now working with enormous amounts of information in high-velocity digital environments, which can be challenging (Sacco et al., 2015). Evidently, OA has created an auspicious problem of its own: the ongoing exponential increase in the volume of openly available scholarly research publications, which feeds directly into information overload.

Michel et al. (2010) demonstrated the effectiveness of contemporary technology against information overload by performing computational analysis on a corpus of digitised texts containing about 5 million books (4% of all books ever printed). The shift from conventional reading to computational analysis of digital content produced extraordinary findings, which led to the publication of their multidisciplinary paper in the journal Science. According to Michel et al., if a human tried to read only the English-language entries from the year 2000 alone, at the reasonable pace of 200 words/min, without interruptions for food or sleep, it would take them about 80 years (Michel et al., 2010).
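To get a feel for that figure, here is a rough back-of-the-envelope sketch in Python. It only uses the two numbers quoted above (200 words/min and about 80 years); the corpus size it prints is derived from those numbers and is my own assumption, not a figure taken from the paper. The function names are made up for illustration.

```python
# Non-stop reading: no breaks for food or sleep, as in the scenario above.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def reading_years(word_count: float, words_per_minute: float = 200) -> float:
    """Years needed to read `word_count` words without interruption."""
    return word_count / words_per_minute / MINUTES_PER_YEAR


def implied_word_count(years: float, words_per_minute: float = 200) -> float:
    """Word count implied by a given non-stop reading time (a derived figure)."""
    return years * MINUTES_PER_YEAR * words_per_minute


if __name__ == "__main__":
    words = implied_word_count(80)  # assumption: back-derived from "about 80 years"
    print(f"Implied corpus size: {words:,.0f} words")
    print(f"Reading time at 200 words/min: {reading_years(words):.1f} years")
```

In other words, 80 years of non-stop reading at that pace corresponds to a corpus in the order of billions of words, which is why machine assistance is needed.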

A recent harvest of .edu, .gov, .ac and .org university websites, which I performed, produced around 16,000,000 papers. Taking a cue from a TED Talk given by Michel et al., I pruned these down to a clean set of records (keeping only the papers with near-perfect metadata: dates, abstracts, etc.) and then presented them in a Bookworm (the software which inspired the Google Books Ngram Viewer).
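The pruning step can be pictured roughly like the sketch below. This is not the actual harvesting code; the field names (`title`, `date`, `abstract`, `url`) and the rule "every field must be present and non-blank" are assumptions I am using purely for illustration.

```python
# Hypothetical field names; a real harvest may use different metadata keys.
REQUIRED_FIELDS = ("title", "date", "abstract", "url")


def has_clean_metadata(record: dict) -> bool:
    """True when every required field is present and not blank."""
    return all(str(record.get(field) or "").strip() for field in REQUIRED_FIELDS)


def prune(records: list) -> list:
    """Keep only the records whose metadata is complete."""
    return [record for record in records if has_clean_metadata(record)]


if __name__ == "__main__":
    harvested = [
        {"title": "Paper A", "date": "2014-05-01", "abstract": "Some text.", "url": "https://example.org/a"},
        {"title": "Paper B", "date": "", "abstract": "No date here.", "url": "https://example.org/b"},
        {"title": "Paper C", "date": "2013-01-15", "url": "https://example.org/c"},  # abstract missing
    ]
    clean = prune(harvested)
    print(f"Kept {len(clean)} of {len(harvested)} records")
```

A strict filter like this throws away a lot of papers, but it keeps the charts trustworthy, because every remaining paper can be placed on a timeline.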

The result is a website which allows anyone to dive into the billions of words stored in millions of OA research papers.

That’s not all …

Users are able to chart frequencies and plot the usage of words over time, and this Bookworm also allows users to click on the graph, at which point links to the matching OA research papers are provided. Millions of research papers are now just a few clicks away and, because the Bookworm only harvests university IRs, there are no publisher paywalls, pop-ups or advertising.
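For the curious, the idea behind such a chart can be sketched in a few lines of Python. This is a simplified illustration and not how the Bookworm software itself is implemented; the record fields (`year`, `text`, `url`) and both function names are assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def yearly_frequency(papers: list, word: str) -> dict:
    """Relative frequency of `word` per year: occurrences / total words that year."""
    hits, totals = Counter(), Counter()
    for paper in papers:
        tokens = tokenize(paper["text"])
        totals[paper["year"]] += len(tokens)
        hits[paper["year"]] += tokens.count(word.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


def papers_for(papers: list, word: str, year: int) -> list:
    """URLs of the papers from `year` that contain `word` (the 'click on the graph' step)."""
    return [p["url"] for p in papers
            if p["year"] == year and word.lower() in tokenize(p["text"])]


if __name__ == "__main__":
    corpus = [
        {"year": 2014, "text": "Open access and open data", "url": "https://example.org/1"},
        {"year": 2015, "text": "Closed science is fading", "url": "https://example.org/2"},
    ]
    print(yearly_frequency(corpus, "open"))
    print(papers_for(corpus, "open", 2014))
```

Dividing by the total number of words per year matters: without it, a word would appear to become more popular simply because more papers are published each year.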

NB. The site is still under development, so please forgive (among many other things) the fact that it is not a mobile-first, responsive web page. I will be making significant changes to the front-end interface and will also perform another harvest, which will provide a fresh set of research papers right up to January 2016. As you can appreciate, harvesting the global corpus of research texts and presenting it to you is quite a laborious and costly task. In addition, I am planning on building APIs and much, much more, but this is all I have to share with you today.


  • Ngram – is a term used to describe a format for storing contiguous (side by side, separated only by a space) sequences of words. For example, a 1-gram (unigram) would be a single word like “kindergarten”, a 2-gram (bigram) would be two contiguous words like “child care”, and a 3-gram (trigram) would be three contiguous words like “day care facility” (, 2016).
  • Open Access – is a kind of access or availability which can be applied broadly to any digital content. However, the Budapest Open Access Initiative only calls for open access to a certain kind of scientific and scholarly literature, and goes on to articulate that open access means free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers (Budapest Open Access Initiative, 2002).
  • Scholarly communication – is the process or system through which academics, scholars and researchers share, evaluate, publish and disseminate their research findings so that they are available to the wider academic community for future use (, 2016).
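To make the Ngram definition above concrete, here is a minimal Python sketch that splits a phrase into its n-grams. Splitting on whitespace is a simplifying assumption; real tools also deal with punctuation, casing and so on.

```python
def ngrams(text: str, n: int) -> list:
    """Return all contiguous n-word sequences in `text`, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.split()  # assumption: words are separated by whitespace
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    phrase = "day care facility"
    print(ngrams(phrase, 1))  # unigrams
    print(ngrams(phrase, 2))  # bigrams
    print(ngrams(phrase, 3))  # trigrams
```

So the phrase “day care facility” contains three 1-grams, two 2-grams (“day care” and “care facility”) and one 3-gram.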


  • ARC & NHMRC, Expert Panel, (2013). Expert panel on the ARC & NHMRC’s open access policies. [online] Available at: [Accessed 22 Jan. 2016].
  •, (2015). Fact Sheet—Open Access | Australian Research Council. [online] Available at: [Accessed 13 Oct. 2015].
  •, (2016). Scholarly Communication | Association of Research Libraries® | ARL®. [online] Available at: [Accessed 22 Jan. 2016].
  • Australian Academy of Science. (2016). The Importance of Advanced Physical, Mathematical and Biological Sciences to the Australian Economy. [online] Australian Academy of Science. Available at: [Accessed 22 Jan. 2016].
  • Bergamaschi, S., Guerra, F. and Leiba, B. (2010). Guest Editors’ Introduction: Information Overload. IEEE Internet Computing, 14(6), pp.10-13.
  •, (2016). Google Ngram Viewer. [online] Available at: [Accessed 22 Jan. 2016].
  • Bookworm, (2015). Why Bookworm?. [online] Available at: [Accessed 18 Jun. 2015].
  •, (2016). Bookworm. [online] Available at: [Accessed 20 Jan. 2016].
  • Borchert, C. and Ives, G. (2007). Mile-high views. New York: Haworth Information Press.
  • Brown, D. and Boulderstone, R. (2008). The impact of electronic publishing. München: Saur.
  • Budapest Open Access Initiative. (2002). Budapest, Hungary.
  • Cantoni, L. and Danowski, J. (2015). Communication and Technology. Handbooks of Communication Science [HoCS], series eds. Schulz, P. J. and Cobley, P. Berlin: de Gruyter Mouton.
  • Collaboration On the Edge of a New Paradigm. (2015). [film] Alfred Birkegaard Hansted & Katja Gry Birkegaard Carlsen.
  • Deloitte, (2015). The paradigm shift Redefining education. [online] Available at: [Accessed 14 May 2015].
  • Griffin, S. (2013). New Roles for Libraries in Supporting Data-Intensive Research and Advancing Scholarly Communication. International Journal of Humanities and Arts Computing, 7(supplement), pp.59-71.
  • Jaguszewski, J. and Williams, K. (2013). New Roles for New Times: Transforming Liaison Roles in Research Libraries. [online] Washington: Association of Research Libraries. Available at: [Accessed 20 Jan. 2016].
  • Kingsley, D. (2012). The changing nature of scholarly communication.
  • Kumar, D. and Sanjaya, M. (2015). The Serials Crisis. Paris: UNESCO, pp.44-67.
  • Lewis, D. (2012). The Inevitability of Open Access. College & Research Libraries, 73(5), p.504.
  • Michel, J., Shen, Y., Aiden, A., Veres, A., Gray, M., Pickett, J., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M. and Aiden, E. (2010). Quantitative Analysis of Culture Using Millions of Digitized Books. Science, 331(6014), pp.176-182.
  •, (2015). Research funding statistics and data | National Health and Medical Research Council. [online] Available at: [Accessed 22 Jan. 2016].
  • O’Donnell, D., Hobma, H., Cowan, S., Ayers, G., Bay, J., Swanepoel, M., Merkley, W., Devine, K., Dering, E. and Genee, I. (2015). Aligning Open Access Publication with Research and Teaching Missions of the Public University: The Case of The Lethbridge Journal Incubator (If ‘if’s and ‘and’s were pots and pans). The Journal of Electronic Publishing, 18(3).
  • Open Access Status of Journal Articles from ERC Funded Projects. (2012). [online] Brussels: European Research Council. Available at: [Accessed 21 Jan. 2016].
  • Open Access, broad readership, high impact – Springer. (2013). 1st ed. [ebook] Springer. Available at: [Accessed 20 Jan. 2016].
  •, (2015). Registered Data Providers. [online] Available at: [Accessed 5 Aug. 2015].
  • Sacco, K., Richmond, S., Parme, S. and Wilkes, K. (2015). Supporting digital humanities for knowledge acquisition in modern libraries.
  • Suber, P. (2012). Budapest Open Access Initiative, FAQ. [online] Available at: [Accessed 22 Jan. 2016].
  • Taylor, M. (2016). The world needs One Repo – BioMed Central blog. [online] BioMed Central blog. Available at: [Accessed 24 Jan. 2016].
  • The Registry of Open Access Repository Mandates and Policies, (2016). Welcome to ROARMAP. [online] Available at: [Accessed 20 Jan. 2016].
  • Tinerella, V. (1999). The Crisis in Scholarly Publishing and the Role of the Academic Library. [online] Available at: [Accessed 3 Feb. 2016].
  • Vilar, P. (2015). Information behaviour of scholars. Libellarium: journal for the research of writing, books, and cultural heritage institutions, 7(1), p.17.


  1. Great idea. I’m publicizing this article through the Open Access Tracking Project.

    I’m about to do the same for itself. But since xyz doesn’t have an “about” page with a short description of the tool, I’ll have to clip a description from this article. I recommend creating an “about” page for the tool. Good luck!

  2. Hi Peter,
    Thanks for your advice. I have created an about page at

    Thanks for adding this to the Open Access Tracking Project, much appreciated.
    If you know of anyone who is able to further promote or publicize this free product I would love to meet them and collaborate.

