
The ARROW project team at Monash University provided written answers to the working party’s questions as part of their presentation. These are recorded verbatim below, with additional comments from the discussions.

1.  What is the history of the institutional repository programme or project?

The ARROW project funding bid was created by a team led by Monash University staff members Cathrine Harboe-Ree and Andrew Treloar in August 2003 in response to the call by the Department of Education, Science and Training (DEST) for applications for Systemic Infrastructure Initiative funding. In October 2003 ARROW was granted AU$3.66 million over three years. Apart from Monash University, the other members of the consortium are the University of New South Wales, Swinburne University of Technology and the National Library of Australia. In the same process funding was allocated to three related projects:

Meta Access Management System (MAMS)
The MAMS project is developing middleware for user authentication and authorisation for access to resources across institutional boundaries, and corresponding fine grained access control mechanisms for managing access to content.

Australian Partnership for Sustainable Repositories (APSR)
Whereas ARROW can be characterised as establishing greenfield repositories, APSR is looking at ways of ensuring the sustainability of existing collections of digital objects, and the preservation strategies appropriate to repositories of digital objects.

Australian Digital Theses program (ADT)
The ADT program is redeveloping the metadata harvesting and resource discovery services which support the discovery of theses produced in Australian Universities.

Collectively these projects have been christened by DEST as the Federated Repositories of Digital Objects (FRODO) projects. Areas of common interest across these four projects include metadata and middleware, and the related standards.

2.  What were the project’s objectives and expected outcomes?

The ARROW project was summarised in the funding bid as follows:

The ARROW project (ARROW) will identify and test a software solution or solutions to support best-practice institutional digital repositories comprising e-prints, digital theses and electronic publishing. A wide range of digital content types will be managed in these repositories. The NLA will develop a repository and associated metadata to support independent scholars (those not associated with institutions). A complementary activity of ARROW is the development and testing of national resource discovery services (developed by the NLA) using metadata harvested from the institutional repositories, and the exposing of metadata to provide services via protocols and toolkits. Initially ARROW will be tested in the four partner institutions, prior to it being offered more widely across the higher-education sector. The solution will be open-standards based, or will support open standards, and will facilitate interoperability within and between participating institutions.

These objectives have shaped the following ARROW services profile:

[Figure: ARROW Services]

An essential component of ARROW will be the integration of research output management tools (initially Research Master, which is used by all three universities in the consortium). This integration is intended to allow universities to capture publications (in any category) with associated metadata once and then allow multiple use, for example, for reporting to DEST, for academics to use for promotional purposes, and so on. The Library’s involvement in the capture process should also help to improve the quality of metadata.

3.  What is the content scope (eg document types and formats)?

The ARROW project is developing an institutional repository facility with the intention of accommodating a wide variety of digital objects. This means we are designing ARROW to allow declaration of a variety of content models with associated metadata schemas and validation rules.
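As an illustration only, a declared content model with an associated metadata schema and validation rules might be sketched as below. The class, field names and rules are hypothetical and are not taken from ARROW, VITAL or Fedora.

```python
from dataclasses import dataclass, field


@dataclass
class ContentModel:
    """Hypothetical declaration of a class of digital objects."""
    name: str
    metadata_schema: str                      # label for the schema in use
    required_fields: set = field(default_factory=set)
    allowed_mime_types: set = field(default_factory=set)

    def validate(self, metadata: dict, mime_type: str) -> list:
        """Return a list of human-readable problems; empty means valid."""
        errors = []
        # Rule 1: every required metadata field must be present and non-empty.
        for name in sorted(self.required_fields):
            if not metadata.get(name):
                errors.append(f"missing metadata field: {name}")
        # Rule 2: the file type must be one the model accepts.
        if mime_type not in self.allowed_mime_types:
            errors.append(f"unsupported file type: {mime_type}")
        return errors


# Assumed example model for digital theses.
thesis_model = ContentModel(
    name="thesis",
    metadata_schema="example-thesis-schema",
    required_fields={"title", "author", "date"},
    allowed_mime_types={"application/pdf"},
)

print(thesis_model.validate({"title": "A study", "author": "A. Student"}, "image/tiff"))
```

Keeping the rules in the declaration rather than in the ingest code is one way a single repository could handle theses, images and journals without separate software for each.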

The initial priorities are the management of digital theses, research papers, images and electronic journals.

4.  What were the main issues and challenges that the project encountered?

Policy settings
While the software has been in development, the project leaders in each university have been acting as advocates for the repositories, explaining their purpose and potential. There is as much work in this area as in the software development, and quite long lead times must be allowed. For example, all three universities have recently mandated electronic submission of doctoral theses. Monash University has also established memoranda of understanding with two faculties, while Swinburne has identified and is working with project champions.

Metadata
The project initially envisaged a single metadata schema to apply to all content. This has been modified to support multiple schemata, ensuring that appropriate metadata for various classes of digital objects is accommodated.

Persistent Identifiers
ARROW agonised over the type and format of its persistent identifier regime. Handles have been chosen as the persistent identifier scheme, with no branding or other meaning embedded in the handle identifiers. This will allow content to be moved between institutions as time passes without a need to relabel objects as institutions come and go. For the time being the preferred form of citation of an object in an ARROW repository includes the resolver address, so that a web browser can navigate to the item without needing a plug-in to resolve handles.
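The citation form described above can be sketched as follows. This is a minimal illustration, assuming the public Handle System proxy at hdl.handle.net as the resolver (the source does not name the resolver ARROW uses) and a made-up handle value.

```python
from urllib.parse import quote

# Assumption: the public Handle System proxy; ARROW's actual resolver
# address is not given in the text.
RESOLVER = "https://hdl.handle.net/"


def citation_url(handle: str, resolver: str = RESOLVER) -> str:
    """Turn an opaque handle (prefix/suffix) into a browser-navigable URL."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {handle!r}")
    # Percent-encode anything unsafe in a URL path, keeping "/" separators.
    return resolver.rstrip("/") + "/" + quote(handle, safe="/")


# Hypothetical handle: nothing in it identifies an institution.
print(citation_url("1234.5/6789"))
```

Because the handle itself carries no institutional branding, only the resolver's lookup table would need updating if the object moved to another repository.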

Copyright
Gaining copyright clearance for material to be held in the repositories is a huge challenge. ARROW has made use of the JISC RoMEO findings, and Swinburne is currently expanding this database. ARROW is also interested in working with Creative Commons to find solutions to the copyright dilemma.

5.  What were lessons learned and things that you would do differently?

The lead time for the project to reach the point where it could establish detailed requirements was longer than anticipated.

The iterative process allowed for in the agreement with VTLS, and the management of the project effort through a bank of development points rather than rigid specifications, have served us well.

Adoption of Fedora as our preferred ARROW storage layer has required us to undertake more fundamental design work around content modelling and use cases than had been anticipated. The absence of established practice around content models has meant developing our ideas from scratch.

6.  What repository software do you use and how did you choose it?

ARROW relies on a mix of open source and proprietary software as shown in Figure 2 below.

[Figure 2: ARROW Software]

6.1  Storage layer

Following an internal review of open source repository software, ARROW chose Fedora as the storage layer for the ARROW repositories. We felt the flexibility of Fedora best supports our requirement to manage a wide variety of digital objects consistent with our building an institutional repository rather than a discipline based or “class of objects” based repository. Fedora supports the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) which is fundamental to populating the ARROW Resource Discovery Service established by the National Library of Australia. Fedora is still under active development and ARROW has taken the opportunity to become a founding member of the Fedora Development Consortium, whereby we can learn from others and influence the evolution of the Fedora software.
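To illustrate what OAI-PMH harvesting involves, the sketch below builds a `ListRecords` request and extracts identifiers and titles from a response. The repository address and the sample record are invented for the example; only the verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request; from_date allows incremental harvests."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    return records


# Invented sample response, trimmed to the elements used above.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.edu:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

print(list_records_url("https://repo.example.edu/oai"))
print(parse_records(SAMPLE))
```

A discovery service would typically repeat the request with a `from` date to collect only records added or changed since its last harvest.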

6.2  Content Workflow and Management Layer

The focus of the Fedora developments to date has been storage-layer related rather than workflow and search related.

In January 2004 VTLS Inc announced VITAL, an image management system built over Fedora. The VITAL software is being developed in a partnership between VTLS and ARROW to provide the workflow and management utilities, and the search facilities for the ARROW repositories. The software is being developed to support generic institutional repository features such as declaring a content model then working with objects conforming to that model. ARROW is funding the development of a series of open source web services that sit between the Fedora software and the VITAL software. VITAL is being developed with VTLS resources.

Open Journal Systems (OJS) software from the Public Knowledge Project at the University of British Columbia has been adopted to support open journal publishing. OJS provides its own storage layer; however, ARROW is working with Rutgers University to adapt software they have developed to migrate OJS content to Fedora for long-term archiving.

6.3  Search and Exposure

The VITAL Access Portal is a web client for searching the ARROW repositories. Search is also facilitated by the exposure of ARROW content for harvesting by Google and other search engines.

Dublin Core metadata can be searched through the National Library’s ARROW Resource Discovery Service, and SRU/SRW access is in development to permit users to search the native metadata of collections of objects stored in ARROW.
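For readers unfamiliar with SRU, a search is an ordinary HTTP GET carrying a CQL query. The sketch below builds such a request using the standard SRU parameter names; the endpoint address and the index name in the query are assumptions, since ARROW's SRU service is described only as being in development.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_search_url(base_url, cql_query, max_records=10, version="1.1"):
    """Build an SRU searchRetrieve request for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and index name.
url = sru_search_url("https://repo.example.edu/sru", 'dc.title = "digital theses"')
print(url)

# Decoding the URL recovers the original CQL query unchanged.
print(parse_qs(urlsplit(url).query)["query"][0])
```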

7.  How much effort is involved in installing and configuring the software?

VTLS Inc provides all installation and maintenance services for the ARROW repositories.

ARROW has commissioned utilities from VTLS to allow declaration of content models and other configuration operations, with the intention of making it easy to operate the ARROW repositories without recourse to the Fedora utilities.

8.  What process do researchers use to deposit their research outputs?

Assisted submission

Until Shibboleth can be incorporated to support authentication and authorisation of users for write and update access to the repository, all end user submissions are to be assisted. At present all ingest through VITAL 1.3 requires gathering of materials by email or other means, and manual ingest by ARROW project staff.

With VITAL 2.0, to be released in June or July depending on when Fedora 2.1 is released, end users will be provided with a web client they can use to send files and metadata for review and validation by a trusted intermediary who has write and update access to the repository.

For theses the intermediary will be the librarian responsible for cataloguing theses.

For research publications a variety of practices apply in participating universities, but in each case staff from the area responsible for reporting research outputs to DEST will be the trusted intermediaries.

VITAL 2.0 will also include a generic batch ingest tool supporting a variety of scenarios, including matching digital object files with related metadata files, or ingesting all files in a directory.
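One of the batch scenarios mentioned, matching object files with related metadata files, can be sketched as below. This is not the VITAL tool; it assumes a simple hypothetical convention in which `thesis01.pdf` is described by `thesis01.xml` in the same directory.

```python
import tempfile
from pathlib import Path


def pair_objects_with_metadata(directory, metadata_suffix=".xml"):
    """Pair each object file with a same-named metadata file, if present.

    Returns (paired, unmatched): a list of (object, metadata) path pairs
    and a list of object files that have no metadata file.
    """
    paired, unmatched = [], []
    for path in sorted(Path(directory).iterdir()):
        # Skip sub-directories and the metadata files themselves.
        if not path.is_file() or path.suffix == metadata_suffix:
            continue
        meta = path.with_suffix(metadata_suffix)
        if meta.is_file():
            paired.append((path, meta))
        else:
            unmatched.append(path)
    return paired, unmatched


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("thesis01.pdf", "thesis01.xml", "image07.tif"):
        Path(tmp, name).write_text("placeholder")
    paired, unmatched = pair_objects_with_metadata(tmp)
    print([(obj.name, meta.name) for obj, meta in paired])
    print([obj.name for obj in unmatched])
```

Reporting unmatched files separately, rather than silently ingesting them, leaves the decision with the trusted intermediary who reviews the batch.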

9.  How are you recording and quality-checking the metadata records?

This will be the responsibility of the trusted intermediaries mentioned above. JHOVE validation (from Harvard University) is being incorporated as part of a future release of VITAL. How this will work in the case of direct deposit by end users is not yet obvious.

10.  What are the main ongoing operational issues that you need to manage?

This is unknown as yet. The ARROW project is only just reaching the stage where the software is sufficiently developed to commence operations. We expect the individual institutional issues, especially capture of the annual published output, to be the biggest issue into the future, rather than the software.

11.  How active is the repository (eg new articles or searches per week)?

This is unknown as yet. The ARROW project is only just reaching the stage where the software is sufficiently developed to commence operations. Monash has backed up content ready to load, including 600+ retrospectively digitised theses, 600+ working papers and several hundred digital images.

12.  What are your plans for the repository’s future development?

The project has a further eighteen months to run, and software development to deliver all functionality required to meet the present objectives is planned through to May 2006.

Under the ARROW funding agreement the project is obliged, subject to a feasibility study, to offer the ARROW solution to other Australian universities beginning in mid 2005.

There are several bids for funding from the 2005 allocation of Systemic Infrastructure Initiative funds which propose building on the ARROW functionality, including such areas as support for e-research, large data sets, and learning objects.

Crucial to ARROW’s success will be the achievement of deposit of a wide variety of digital objects into the repositories and the ability to create reports from this material. The ARROW Content Committee is working on strategies to achieve this.

13.  What advice can you offer to those starting a new repository initiative?

Policy and planning must drive institutional repositories. Repositories are also a relatively new field, with few established practices and standards. It will therefore be advantageous to any organisation starting a new repository to be actively involved in the forums where standards are developed and the issues around interoperability and federation of repositories are resolved.

ShareAlike Licence

Page last modified on 26 November 2006, at 06:34 PM