Universities' own electronic repositories yet to impact on Open Access
More and more academic institutions are creating their own digital repositories, but a recent survey I carried out¹ suggests they have had little effect on publishing practices so far.
Two years ago, the Massachusetts Institute of Technology launched DSpace, an institutional repository (IR) intended to capture MIT's entire intellectual output in a stable electronic archive. The ultimate goal is to link DSpace with similar archives at other research institutions to create a seamless worldwide network, in which multiple databases can be searched as if they were a single entity and specialized collections can be built from content spread across many institutions.
DSpace, a $2.4 million joint venture with the technology firm Hewlett-Packard of Palo Alto, California, USA, is one of the largest examples of a trend among universities and other academic institutions to create their own digital repositories. Such repositories will be critical in helping institutions to manage, exploit and archive for posterity the rapidly growing number of digital documents and data they generate. But what impact might they have on the ways in which researchers publish their work? That is a question I addressed recently in a report, 'Pathfinder Research on Web-based Repositories', commissioned by the UK-based organization Publisher and Library/Learning Solutions (PALS)¹.
Repositories have been touted as a fast track to open access by enthusiasts such as Stevan Harnad² of the University of Southampton, UK, and by the Scholarly Publishing & Academic Resources Coalition (SPARC), a Washington-based advocacy group that promotes alternative publishing models. They argue that, to broaden access immediately, all authors need to do is self-archive their e-prints (preprints, or reprints of their published papers) in such repositories³.
Because IRs such as DSpace and the University of California's eScholarship archive are designed mainly to archive the output of a particular academic centre, their mission differs from that of subject-based archives (such as the arXiv physics repository), which are designed to serve communities in specific disciplines. Like journals, IRs are designed to be cumulative and perpetual, forming a collection of record.
As well as collecting and organizing content, most repositories also provide publishing tools. A powerful feature is that the software used to build the various archives usually complies with standards developed by the Open Archives Initiative (OAI) for describing documents and other digital objects. These standards are the glue that allows distributed archives to work together and to be searched as if they were one, either through specialist search engines called OAI service providers or through general search engines such as Google.
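The interoperability described above rests on the OAI Protocol for Metadata Harvesting (OAI-PMH): a service provider sends simple HTTP requests such as `verb=ListRecords` to each repository and receives XML records, typically in simple Dublin Core (`oai_dc`), which it can merge into a single searchable index. The Python sketch below is a minimal illustration under stated assumptions: the repository addresses and the `sample_response` helper are hypothetical, no network request is made, and a real harvester would also need to fetch over HTTP and follow resumption tokens for large result sets.

```python
"""Sketch: how an OAI service provider could harvest and merge repository metadata.

Assumptions (illustrative only): the repository base URLs are hypothetical, and
sample_response() fabricates a tiny reply instead of fetching one over HTTP.
"""
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 and by its simple Dublin Core format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str) -> str:
    """Build an OAI-PMH ListRecords request for Dublin Core (oai_dc) metadata."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})


def sample_response(identifier: str, title: str) -> str:
    """Return a made-up, single-record ListRecords reply (hypothetical stand-in for HTTP)."""
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords><record>'
        f"<header><identifier>{identifier}</identifier></header>"
        '<metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f"<dc:title>{title}</dc:title></oai_dc:dc></metadata>"
        "</record></ListRecords></OAI-PMH>"
    )


def parse_records(xml_text: str) -> list:
    """Extract the identifier and Dublin Core title of each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        title = dc.findtext("dc:title", default="", namespaces=NS)
        records.append({"id": ident, "title": title})
    return records


def search(index: list, term: str) -> list:
    """Case-insensitive title search over records merged from many archives."""
    return [r["id"] for r in index if term.lower() in r["title"].lower()]


if __name__ == "__main__":
    # Harvest two hypothetical repositories into one merged index.
    index = parse_records(sample_response("oai:repo-a.example.org:1", "Preprints in physics"))
    index += parse_records(sample_response("oai:repo-b.example.org:7", "Working papers in economics"))
    print(list_records_url("http://repo-a.example.org/oai"))
    print(search(index, "papers"))  # → ['oai:repo-b.example.org:7']
```

Keeping the harvested records in one list is what lets a single query span several institutions' archives, which is the "searched as if they were one" property; a production service provider would add persistent storage and incremental harvesting by date.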
Repositories usually allow open access to all their content, although some categories of material, such as doctoral theses or the full text of books written by local authors, are sometimes restricted to users on campus. Because repositories collect, store and disseminate the research, teaching and other outputs of the institution, they form part of the process of scholarly communication. In addition, most emphasize the long-term preservation of digital materials, an important and as yet unresolved problem facing digital scholarly publishing.
From a broad perspective, repositories are therefore seen as part of the digital infrastructure of the modern university, offering a set of services for managing and disseminating digital materials created by the institution and its members⁴. Those who share the SPARC perspective also see repositories as a means of reforming scholarly communication, and scholarly publishing in particular. SPARC further argues that institutions will enhance their prestige by making their research outputs more visible.
Publishers and many authors tend to equate 'open access' with new open-access journals that charge authors a publication fee rather than charging libraries for subscriptions. However, there is another route to open access for articles, whereby authors archive their e-prints on a website. Previously this would have been a personal or departmental website or, more usefully, a subject-based repository such as arXiv or RePEc (Research Papers in Economics). Frustratingly for open-access advocates, the success of the early subject-based repositories (arXiv holds over 250,000 e-prints and offers 100% coverage of the published literature in some areas of physics) has not been repeated in other disciplines. Chemists, and especially biomedical researchers, seem to fear the clinical or social consequences of publishing non-peer-reviewed preprints, and scholars in the humanities are often concerned about the risk of plagiarism.
Advocates place high hopes in IRs as a means of changing this situation and encouraging more self-archiving. They point out that such repositories have credibility because they are backed by prestigious institutions. The proximity of these institutions to the author (they are usually the author's employer) should, they argue, help persuade academics from many disciplines to adopt self-archiving.
There is little evidence so far that this is happening, however. In my survey of 45 IRs, I found an average of just 1256 documents per archive, most of them theses, dissertations or grey literature such as technical reports and working papers. The University of Virginia's archive was the largest, with 21,000 documents, although 14,000 of these came from a collection of digital photographs. Many archives contained no records at all, and I sportingly omitted these from the calculation of the overall average. Only about one-fifth of the documents archived in these repositories were e-prints, and most of those were preprints.
Generally, online self-archiving has flourished only in disciplines that already had a paper-based preprint culture. Most repositories mainly covered the physical sciences, mathematics, computer science and economics, with a small number of other subjects, including linguistics, philosophy and some social sciences. None of the repositories studied in the survey had content from the medical or clinical sciences, and only one archive mentioned chemistry.
MIT's DSpace allows digital objects in any format, from papers to audio and video, to be uploaded and stored, although it 'supports' (that is, guarantees to preserve in the long term) only a limited range of non-proprietary digital formats. The DSpace team at MIT devoted significant effort, both before and after the launch in November 2002, to marketing the service to MIT faculty, but at the end of 2003 most of the roughly 3000 documents on DSpace appeared to have originated from pre-existing collections of grey literature.
Organizers of repositories report that a key obstacle is securing the engagement and participation of faculty. One unexplained exception appears to be the Australian National University, which has some 2000 documents deposited in its archive without a sustained advocacy campaign⁵.
What do publishers think of all this? Proponents of self-archiving frequently cite the findings of the RoMEO (Rights MEtadata for Open archiving) study⁶ to claim that 55% of journal publishers explicitly permit self-archiving, and that many more will agree if asked. In fact, the study showed that only 19% explicitly permitted web archiving of the final refereed, edited and published version of the paper (the 'postprint'); the rest of the 55% were publishers who permitted authors to archive the preprint. This pattern was reflected in the survey of publishers' opinions I conducted¹: responding publishers were generally relaxed about preprints but much more concerned about the impact of postprint archiving on their businesses. Interestingly, some 12% of respondents currently permit posting but expect to restrict archiving in IRs or e-print archives in future. A majority of publishers in the survey thought that repositories would have an impact on publishing, but most considered that this impact would be broadly neutral, with negative effects offset by positive ones such as building goodwill with particular communities. Publishers' greatest concerns were copyright, the proliferation of multiple versions of articles, and the impact on subscriptions and the journal business model.
One publisher that is actively engaging with repositories is the not-for-profit Oxford University Press (OUP), which has agreed to provide the Oxford University Library's repository with free access to published articles written by Oxford academics. Martin Richardson, OUP's journals director, has pointed to the benefits of exploring the key technical, economic and cultural issues surrounding the creation of IRs, but it is not clear how these will benefit OUP directly.
However, there are indications that repositories may play a much more important role in open access in the future. One is growing usage of archived documents: at the Australian National University, for instance, the 2000 documents in the e-print repository have been downloaded 220,000 times, with the most popular document downloaded 1765 times. Perhaps the most interesting development is the growing adoption of institutional policies requiring faculty to deposit their publications in the local archive. For example, Queensland University of Technology (QUT) has introduced a policy requiring all research outputs to be deposited in the QUT repository, except where 'commercialisation or individual royalty payment or revenue for the author or QUT' is expected⁷. Clearly, the widespread adoption of similar policies would transform the status of IRs, but organizational and cultural barriers to such policies remain at many universities.
My prediction is that institutional repositories will become an important part of the scholarly landscape over the next decade, but mainly for the publishing, long-term archiving and stewardship of educational materials and datasets. They are likely to coexist happily with publishers, and their impact on open access will be limited.
Publishing Consultant, Bristol, UK
1. Ware, M. Pathfinder Research on Web-based Repositories (PALS (Publisher and Library/Learning Solutions), 2004). Available from the PALS website, http://www.palsgroup.org.uk.
2. See Nature 410, 1024–1025 (2001).
3. Crow, R. The Case for Institutional Repositories: A SPARC Position Paper (SPARC, 2002). Available from the SPARC website, http://www.arl.org/sparc/IR/ir.html.
4. Lynch, C. Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age. ARL Bimonthly Report 226 (February 2003). Available from the ARL website, http://www.arl.org/newsltr/226/ir.html.
5. Steele, C. OAI: A 'Down Under' Perspective. Paper presented at the CERN Workshop on Innovations in Scholarly Communication: Implementing the Benefits of OAI, 12–14 February 2004. See http://agenda.cern.ch/fullAgenda.php?ida=a035925.
6. See, for example, Gadd, E., Oppenheim, C. & Probets, S. The Intellectual Property Rights Issues Facing Self-archiving: Key Findings of the RoMEO Project. D-Lib Magazine 9(9) (September 2003). Available at http://www.dlib.org/dlib/september03/gadd/09gadd.html.
7. See http://www.qut.edu.au/admin/mopp/F/F_01_03.html.
This article is largely based on a research report commissioned and funded by PALS and is published with permission.
The Publisher and Library/Learning Solutions (PALS) working group is an ongoing collaboration between UK publishers (represented by the Association of Learned and Professional Society Publishers and the Publishers Association) and the further and higher education sector (represented by the UK Joint Information Systems Committee, JISC). PALS will organize a conference on institutional repositories in London on 24 June 2004. For further information see http://www.palsgroup.org.uk/palsconference04.