Installation and Customization Experience of Metadata Harvester System: Case from University of Ruhuna

The University of Ruhuna is a thriving modern Sri Lankan university with strengths in both research and teaching. The library system of the University of Ruhuna contributes substantially to the university by providing access to scholarly information through journals (online and printed), books, e-repositories and other means. The library network of the university spends a substantial amount of money annually on subscriptions to printed and electronic journals. Access to paid scholarly literature is limited by the type of subscription to the database or journal. There is a renewed trend of searching for open-access (OA) knowledge, which breaks price barriers and gives the academic community better access to current knowledge. Open access repositories allow scholarly materials to be searched and downloaded free of charge. A major drawback of using OA knowledge has been the extra time information seekers spend sorting and filtering OA materials out of the search results obtained. Finding and searching different open access repositories separately has also been a difficult and time-consuming task. A unified interface for searching different OA repositories worldwide can help readers by reducing the time and effort needed to search for scholarly materials across OA repositories. The Harvester System of University of Ruhuna (HaSURu) was deployed to fill the gap of a unified interface that can connect a number of OA archives together and search OA information over the internet. HaSURu keeps OA information from OAI-registered open-access archives worldwide. OAI is an organization that promotes better dissemination of knowledge among researchers. There are 1944 OA repositories worldwide registered under OAI as data providers. The present study aimed at acquiring all the OA repository information from OAI and providing access through a unified interface using an OAI-PMH driven metadata harvester.
This paper discusses the process of installing and customizing PKP OHS Harvester2 as an OA information finder at the University of Ruhuna. Information seekers in Sri Lanka will find this new service a better source of access to OA repositories worldwide for their future research work.

Corresponding author: Assistant Librarian, Main Library, University of Ruhuna, Sri Lanka. E-mail: kusala@lib.ruh.ac.lk
Senior Assistant Librarian, Main Library, University of Ruhuna, Sri Lanka. E-mail: nimal@lib.ruh.ac.lk
Journal of the University Librarians Association of Sri Lanka, Vol. 17, Issue 1, January 2013


Introduction
A university, as defined by the Oxford Dictionary (Oxford University press, 2007, p. 3445), is "A corporation of teachers and students formed for the purpose of giving and receiving instructions in a fixed range of subjects at a level beyond that provided at a school. Later, an institution of higher education, offering courses and research facilities in mainly non-vocational subjects and having acknowledged powers and privileges, esp. that of conferring degrees." In other words, a university is 'a high-level educational institution in which students study for degrees and academic research is done' (Oxford University press, 2007, p. 1415).
Academic institutes therefore conduct a massive amount of research in various disciplines.
Scholars engaged in research need access to relevant academic knowledge (literature) to conduct their studies successfully.
The prime responsibility of an academic library is to provide researchers (Brandt, 2007, pp. 365-396) with effective, efficient and reliable sources of scholarly literature across a wide spectrum of subjects. Libraries subscribe to electronic and printed journals from various disciplines in order to provide access to current knowledge. The library system of the University of Ruhuna (UOR) currently subscribes to an array of printed and electronic journals. In addition, the university receives journals as gifts, and there are avenues to exchange journals to fulfill the literature requirements of academics.
Journals are the primary means of sharing scholarly knowledge among academics (Chan, Gray, & Kahn, 2012, p. 7). According to the library statistics, the University of Ruhuna spends a substantial amount of money from the acquisition budget on subscriptions to online and printed journals annually. Depending on the budget allocation, the university can access only a limited collection of journal articles under paid subscriptions to databases and journals. Paid subscriptions to databases are very expensive, and there are also hidden limitations behind technical walls and license restrictions (Charles & Bailey, 2006). Even though there are a number of means of access to current knowledge through journals, the present paid subscriptions to databases and individual journals are not sufficient to cover all subject disciplines of the university. It is therefore essential to find supplementary means to fill the lacuna in the information supply of the university.

Open Access to Research
There is a renewed focus on browsing for Open Access (OA) scholarly literature among academics (Chan, Gray, & Kahn, 2012, p. 5) in universities. Access limitations resulting from the copyright, patents, and licensing of commercial publications (Chan, Gray, & Kahn, 2012, pp. 5-10) also tend to shift readers towards OA materials. A study conducted by Swan & Brown (2005) shows that most scholars tend to use Google Scholar to search for the academic information they need. According to the European Commission, open access is "free access to the publicly funded academic knowledge that includes actual research publications, research data, and variety of other digital media and objects" (2008). The contents of open-access journals are free to download and use with proper acknowledgement (Charles & Bailey, 2006) of the original authors.

Requirement of a Unified Searching Interface
Search results produced by Google or other online search engines may contain both free and fee-based information sources (Jacsó, 2005). Information seekers have to spend extra time and resources sorting and filtering the desired OA information from among miscellaneous, noisy records. To avoid this problem, information seekers have to find OA information sources and search each of them separately for the desired OA information (Liu, Kurt, Zubair, & Nelson, 2001). Finding information about different types of OA resources and searching them separately is a difficult and time-consuming task (Donaldson & Nelson, 2011). A unified searching interface can address this issue by amalgamating different OA sources and providing a single interface for search queries (Liu, Kurt, Zubair, & Nelson, 2001). This enables information seekers to reach their desired information effectively and efficiently.
The aim of the present project is to design a unified interface for searching and accessing open-access repositories registered with OAI (Open archives initiatives, 2002). OAI provides a comprehensive list of open access archives in the world.

Open Archives Initiative (OAI)
OAI is an organization that promotes open access to research in the world. OAI maintains various projects. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a protocol designed to facilitate interoperability of information repositories through metadata exchange (Open Archives Initiative, n.d). Users can access data providers through OAI-PMH to download (harvest) their metadata. Data providers administer systems that support OAI-PMH as a means of exposing metadata to harvesters (Lagoze & Sompel, 2008). Lagoze & Sompel (2008) described OAI-PMH as an "application-independent interoperability framework based on metadata harvesting from metadata providers". OAI-PMH is a low-barrier mechanism for repository interoperability that provides a set of six verbs (requests or services) that are invoked within the Hyper Text Transfer Protocol (HTTP) (Open Archives Initiative, n.d).
Harvesters use these verbs to request and fetch metadata from data providers. Metadata harvesters are designed to incorporate different archives into a single database and let users search all of these archives at the same time through a unified searching interface (Liu, Kurt, Zubair, & Nelson, 2001; Public knowledge project, 2012). Lagoze & Sompel (2008) explain a harvester as a 'client application that issues OAI-PMH requests and collects metadata from other online repositories worldwide'.
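To make the request model concrete, the following minimal Python sketch shows how a harvester might compose OAI-PMH requests as plain HTTP GET URLs. The six verb names are those defined by OAI-PMH version 2.0; the function name build_request and the base URL repository.example.org are hypothetical and used only for illustration, and the sketch does not reproduce how PKP Harvester2 issues its requests.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH version 2.0.
OAI_PMH_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for a single OAI-PMH request."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb and its arguments travel as ordinary query parameters.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical data provider address (an assumption, not a real archive).
BASE_URL = "http://repository.example.org/oai"

# Ask the data provider to describe itself.
identify_url = build_request(BASE_URL, "Identify")

# Ask for all records in unqualified Dublin Core.
records_url = build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# Ask only for records changed since a given date ("from" is a Python
# keyword, so it is passed through a dictionary).
incremental_url = build_request(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", **{"from": "2013-01-01"}
)

print(identify_url)
print(records_url)
print(incremental_url)
```

The data provider answers each such request with an XML document, which the harvester then parses and stores in its own database.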

Metadata
Metadata are 'structured data about data', or descriptive information about an object or resource, whether physical or electronic (Greenberg, 2010, p. 3611). Metadata are available in various standards (Greenberg, 2010, p. 3612). Most digital libraries and repositories use Dublin Core (DC) as their main metadata format (NISO, 2004). DC is a simple form of metadata organized under 15 headings (NISO, 2004). The contents hosted in an archive are a combination of digital material and metadata. Digital materials can be born digital or digitized and exist in different formats (documents, music, images, video, etc.). Metadata carry all the bibliographic information of the digital materials hosted in the archive (William, 2000, p. 14). Each digital item hosted by a data provider (repository) is accompanied by metadata (Greenberg, 2010, p. 3610).
A harvester application can access external data providers (data sources) using OAI-PMH and download only the metadata of the digital materials hosted in the archive.
Universities, research institutes and database vendors around the world produce their own Digital Libraries (DL), Institutional Repositories (IR), Digital Archives (DA), Open Journal Systems (OJS), etc. to collect, store and disseminate scholarly materials in different file formats (William, 2000, p. 70). Most of these DLs, IRs and DAs are publicly available through the internet (McCray & Gallagher, 2001, p. 51) and offer their content for sharing among researchers (Chan, Gray, & Kahn, 2012).
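As an illustration of what a harvested record looks like, the short Python sketch below reads one Dublin Core record in the oai_dc XML form and collects its values under the 15 DC element names. The sample record, including its title, author names and identifier, is invented for this example, and parse_dublin_core is a hypothetical helper rather than part of any harvester software.

```python
import xml.etree.ElementTree as ET

# XML namespace of the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Invented sample record (an assumption for illustration only).
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A sample open-access article</dc:title>
  <dc:creator>Perera, A.</dc:creator>
  <dc:creator>Silva, B.</dc:creator>
  <dc:subject>Digital libraries</dc:subject>
  <dc:date>2012-06-01</dc:date>
  <dc:identifier>http://repository.example.org/handle/123/45</dc:identifier>
</oai_dc:dc>
"""


def parse_dublin_core(xml_text):
    """Map each Dublin Core element present in the record to its values."""
    root = ET.fromstring(xml_text)
    record = {}
    for name in DC_ELEMENTS:
        # DC elements are repeatable, so every value is kept in a list.
        values = [
            element.text.strip()
            for element in root.findall(f"{{{DC_NS}}}{name}")
            if element.text
        ]
        if values:
            record[name] = values
    return record


record = parse_dublin_core(SAMPLE_RECORD)
for name, values in record.items():
    print(name, values)
```

Only the bibliographic description is transferred in this way; the digital object itself stays with the data provider and is reached through the identifier link.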

Materials and Methods
An open source software platform named Open Harvester System (OHS), developed by the Public Knowledge Project (PKP), was used to deploy the Harvester System for the University of Ruhuna (HaSURu). The following steps were carried out in the deployment process.

Installation of the Harvester System
The present study used the latest version (OHS-2.3.2) of the PKP Open Harvester System, installed on Ubuntu Linux 12.04.1 LTS (amd64). The OHS system uses MySQL 5.5.28 as the relational database, PHP5 5.3.10, and Apache2 2.2.22 as the HTTP server. An HP ProLiant ML350 G6 server was used to install the base system of the harvester. The server is configured with redundant power supplies and network connections.
The redundant power supplies were connected to two separate uninterruptible power supply (UPS) systems. Redundant network connections were configured separately for LAN/WAN access and for local backup management (Figure 01). Having separate connections for LAN access and remote backup storage access increases the effective throughput of the links. Since the read/write (I/O) ratio is high in the harvester, a total storage of 1TB was configured as RAID 0. RAID 0 provides striping only, with zero redundancy. Mirroring of the storage space was not configured since separate backups were maintained on another local server. Scheduled updating and indexing scripts keep the system up to date with the frequent changes at the data providers, which enhances searching efficiency. The required files and directories were write-enabled (including config.inc.php, public, cache, cache/t_cache, cache/t_config, cache/t_compile, cache/_db). The client, connection and database character sets were set to Unicode (UTF-8) for better encoding support in different locales, and MD5 was used as the password encryption algorithm.

Customization
Since the OHS is a common platform, it has to be modified according to the requirements of the institute. The initial interface, security and access policies were considered in the customization step. The browsing content of the system was organized under four access points: Author, Title, Subject and Date. [...] several data providers in the harvester. Harvesting of all added archives can be initiated by issuing the following command at the Linux shell (root terminal).

# php [dir]/tools/harvest.php all useLastSets
New data providers can be added after the metadata of the initially defined archives have been harvested. After new data providers are defined in each update, the harvesting process can be initiated with the following command.

# php [dir]/tools/harvest.php all from=last skipExistingEntries
This command updates only the information of the newly added data providers. If the first command is used instead, it updates all the records from the beginning, including the existing entries, and takes a long time to complete the metadata base. Because it relies on long-lived transactions, it can also congest the internet connection under low bandwidth.
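The difference between the two modes can be sketched in a few lines of Python. This is only a conceptual illustration under the assumption that records are keyed by a unique identifier; merge_harvest and the sample identifiers are hypothetical, and the code is not the logic implemented in the OHS harvest.php tool.

```python
def merge_harvest(index, harvested, skip_existing=True):
    """Store harvested records in the local index.

    Returns the number of records written. With skip_existing=True,
    records that are already in the index are left untouched.
    """
    written = 0
    for record in harvested:
        identifier = record["identifier"]
        if skip_existing and identifier in index:
            continue  # already harvested earlier, nothing to do
        index[identifier] = record
        written += 1
    return written


# Hypothetical local index holding one previously harvested record.
index = {"oai:a:1": {"identifier": "oai:a:1", "title": "Old record"}}

# Hypothetical batch returned by the data providers.
batch = [
    {"identifier": "oai:a:1", "title": "Old record"},
    {"identifier": "oai:b:7", "title": "New record"},
]

incremental = merge_harvest(dict(index), batch, skip_existing=True)
full = merge_harvest(dict(index), batch, skip_existing=False)
print(incremental, full)  # prints: 1 2
```

With only two records the saving is trivial, but the amount of rewriting in the full mode grows with the total size of the metadata base, while the incremental mode grows only with the number of new records.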
An indexed list of all the data providers added to the harvester (Figure 03) can be obtained with:

# php [dir]/tools/harvest.php list
Since the output (Figure 03) shows the list of data providers with the archive ID and the number of records harvested from each site, the administrator can easily pick a newly defined data provider out of the list and run further updates on it.

Updating harvester base system
The harvester system can be updated from time to time as PKP releases new versions. There are various ways to perform this task. Following commands can be used in [...]
In another view, it provides information on individual archives (Figure 08). In this interface the user can find the name of the desired archive (data provider), all the contents harvested from that data provider, and the link to the original record.
The harvester also provides summarized details about all the data providers in the harvester (Figure 09). [...] OAI-registered archives in the Harvester. This interface provides details about the archive name, the number of materials hosted, the total number of data providers, and the number of pages ahead.

Intended Expansions to the System
The HaSURu system can be hosted on a centralized server at the University of Ruhuna, and all other university libraries can link to this service via the WWW as Software as a Service (SaaS), following the cloud computing (library) concept. As a second step of the implementation, data providers registered on OpenDOAR can be added to the harvester.
OpenDOAR listed 2227 open-access repositories (on 1st May, 2013) from Asia, Africa, Australia, the Caribbean, Central America, Europe, North America, Oceania and South America, which can be added to the HaSURu database.
OAI keeps a record of all the open access archives in the world that have registered with it. These repositories contain information from a broad array of subjects and multidisciplinary knowledge. According to the Open Archives Initiative (n.d), the 'Open Archives Initiative (OAI) develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content' and to increase the availability of scholarly communication. The fundamental technological framework and standards being developed to support this work are independent and open up access to a range of digital materials. OAI maintains a list of 1944 data providers that have registered with and been confirmed by OAI in order to disseminate scholarly knowledge among researchers (Open archives initiatives, 2002). The community can make use of these OA information sources with proper acknowledgment.
Harvester2 is an open source metadata harvester and aggregator developed by the Public Knowledge Project (Public Knowledge project, 2013), which aims at expanding and improving access to global research. Harvester2 (Public knowledge project, 2010) was designed as a flexible tool for fetching, storing, indexing and searching data from different types of information sources (Liu, Kurt, Zubair, & Nelson, 2001). Harvester2 supports multiple versions of the harvesting protocol (OAI versions 1 and 2), multiple metadata standards (Dublin Core, MODS, MARC), and multiple languages, with an emphasis on performance and simplicity of use (Public knowledge project, 2010). Among the different types of harvesters, PKP Harvester2 offers easy installation and management of the base system. It can be further customized by designing new plugins and patches for the base system (Public knowledge project, 2010).

Figure 03: Output of the list command displaying the archive ID, name and number of records harvested from each data provider

Figure 04: Output of the check command showing the existing and available product versions

Figure 07: Index of total content in HaSURu