I was invited recently to deliver a one-day course on Institutional Repositories and Metadata for a group of academic librarians in Ireland. The course was based on the CILIP course of the same name and tailored to the needs of ANLTC (Academic and National Libraries Training Consortium). During discussions with colleagues in ANLTC – all experienced information professionals – one issue came up repeatedly: the difficulty in describing items on institutional repositories in a consistent way.
Typically an institutional repository contains digital resources such as preprints, machine-readable theses, manuscripts, old maps and drawings. It may also contain data sets generated by research at the institution. The repository represents the intellectual and research output of an institution, so it is not only an internal resource but also a showcase. Ideally a repository is designed in such a way that academics and researchers can deposit and tag their own material quickly and easily, and make it available to the scholarly community. In practice, it often falls to the library and information staff to create the metadata.
One of the issues is a lack of expertise, time or inclination on the part of researchers to index or tag their material adequately. Although there has been a lot of talk about automatic indexing, it is not yet a practical reality. Notable experiments, in the United States for instance, have attempted to develop applications to do this, but so far they need further work. So we are back to the tried and tested approach of library and information professionals deploying their skills to make the resources in institutional repositories retrievable.
Core information skills are required to do this. For instance, how do you ensure that similar items are consistently indexed? Or which vocabulary do you use to describe the content of a resource on the repository? There are various approaches:
- The use of a classification scheme such as Library of Congress Classification or Dewey Decimal Classification (DDC), or perhaps a specialist scheme or controlled vocabulary for your subject domain, such as MeSH (Medical Subject Headings).
- If there is no classification system that suits the purpose, an institution might devise its own. This is a path taken by many organisations, particularly where they have a large collection of specialist material. For many institutional repositories, however, this is not an option because of the range of subjects covered, each with its own specialist vocabulary. This is where information professionals, with their knowledge of taxonomies, are able to select and apply existing ones. Part of their skill is also in understanding the users. For example, a research student looking for previous relevant work at their institution might search by author or keyword. A lecturer might be seeking materials to illustrate a lecture and will have different priorities. A prospective student might want to browse to get an impression of the profile of the institution, and a funder might be looking for evidence of performance among the staff and the research units that it funds.
- A third approach is to prepare an abstract or to index the item so that it can be ‘discovered’ by a federated search service or by an individual query. Although free text searching, which looks for words or phrases as they appear in the text of a document, is a very powerful search tool, it has limitations. Some terms are ambiguous because they mean different things in different contexts. Other terms have several synonyms, and a searcher may think of only one of them when different items use different ones. Without search aids such as drop-down lists of approved terms, there is no way to guarantee that the search results will be relevant or complete. Another problem arises with digitised images, where free text searching is not always available and which need to be explicitly indexed.
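The synonym problem described in the last point can be shown with a small sketch in Python. Everything in it is an assumption made for illustration: the three item titles, the tiny vocabulary and the function names are hypothetical, and a real repository would rely on an indexer assigning terms from an established scheme rather than on simple string matching.

```python
# Hypothetical controlled vocabulary: every variant maps to one preferred term.
PREFERRED_TERMS = {
    "heart attack": "myocardial infarction",
    "cardiac infarction": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}

# Hypothetical repository items (identifier -> title).
ITEMS = {
    "item-1": "Outcomes after heart attack in rural patients",
    "item-2": "Biomarkers for myocardial infarction",
    "item-3": "Cardiac infarction: a historical review",
}


def free_text_search(query, items):
    """Return ids of items whose title contains the query words as typed."""
    wanted = query.lower()
    return sorted(item_id for item_id, title in items.items()
                  if wanted in title.lower())


def build_index(items, vocabulary):
    """Assign each item the preferred term for any variant found in its title.

    This stands in for the indexer's judgement; it is only string matching.
    """
    index = {}
    for item_id, title in items.items():
        lowered = title.lower()
        for variant, preferred in vocabulary.items():
            if variant in lowered:
                index.setdefault(preferred, set()).add(item_id)
    return index


def controlled_search(query, index, vocabulary):
    """Translate the query to its preferred term, then look up the index."""
    preferred = vocabulary.get(query.lower())
    return sorted(index.get(preferred, set()))


INDEX = build_index(ITEMS, PREFERRED_TERMS)
free_hits = free_text_search("heart attack", ITEMS)
controlled_hits = controlled_search("heart attack", INDEX, PREFERRED_TERMS)
print(free_hits)
print(controlled_hits)
```

In this sketch the free text search finds only the one title that happens to use the searcher's wording, while the search through the controlled vocabulary retrieves all three items because each was indexed under the same preferred term.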
Skilled information professionals will continue to be needed for some time to come: people with a range of subject knowledge, an understanding of how users seek information, and the skills to describe information resources by cataloguing them and preparing summaries. I would go further and suggest that they are also needed to define the standards and manage the systems that make resources available in institutional repositories.