CUWL Resource Discovery Exploratory Task Force
Final Report | August 14, 2009
This report outlines a number of the issues identified by the CUWL Resource Discovery Exploratory Task Force (RDETF) that should be considered as we work to improve access to our library resources. The nature of data hosting and interface design has undergone significant change in recent years, leading us to a vision characterized by flexibility and control. The pursuit of this vision is not only feasible, but also strategically imperative in order to provide broad, dynamic and relevant searching of disparate collections.
Our vision can be characterized as follows:
- Decouple the resource discovery interface from the integrated library system (ILS).
- Decouple the resource discovery interface from the underlying data.
- Pursue searching of harvested data rather than federated searching.
- Reduce the number of discovery interfaces.
- Improve our discovery interface(s) through user-centered design practices.
Our vision can be understood visually as follows:

Goals:
The following goals are derived from user focus group data, published and unpublished reports, and internal (task force) discussions.
Provide a simple and intuitive search and retrieval experience.
- Allow natural language searching.
- Include 'did you mean' functionality.
- Distinguish between peer-reviewed and non-peer-reviewed sources.
- Allow patrons to quickly drill down within the results of a search.
- Offer recommended resources based on a patron’s search.
- Provide access to online help that is integrated into the interface.
- Allow for various sorting of results including relevancy ranked order.
- Integrate with citation management tools.
Allow patrons to find an array of resources and formats, including analog and digital items that we own or license through a single or reduced number of integrated search interface(s).
- Link to full text documents and snippet views (e.g. Google Books).
- Distinguish between sources immediately available online and those that might take some time to access.
- Distinguish between analog sources available immediately within our campus libraries and those that would require time to access.
Allow for intuitive and efficient administration of resources and the user interface.
- Allow UW Libraries to have control over the search interface and our data.
- Allow UW Libraries to host or provide access to multiple types (formats) of data.
Task Force Recommendations:
- The UW Madison Shared Development Group (SDG) should continue to develop UW Forward. UW Forward is a unified search interface for discovering UW System library data. For an in-depth explanation of UW Forward, please see: http://disco.library.wisc.edu/
- UW Libraries should be encouraged to provide access to UW Forward 1.0 from their public Web site as soon as the public beta is available (expected by 11/2009). CUWL should charge the USCC with gathering user and staff feedback and formally assessing UW Forward in early 2010.
- UW Libraries should continue to make improvements to locally managed resource discovery layers in line with user expectations as identified by user focus groups and other usability data.
- UW Libraries should pursue data hosting services for harvested and licensed article data. Individual campuses should be encouraged to experiment with data hosting services as they become available.
- UW Libraries should continue to analyze the functionality and integration of core library services, such as Universal Borrowing, with locally developed and vendor supplied discovery layers.
- CUWL should begin to address the need for individual UW System Libraries to experiment with possible system-wide solutions for resource discovery. One System, One Library should not preclude individual campuses from piloting solutions and then sharing their experiences with all other UW Libraries.
Discussion:
The UW Madison resource discovery report available at http://staff.library.wisc.edu/rdetf/RDETF-final-report.pdf (6/2008) recommends that we decouple the discovery interface from the integrated library system (ILS) and present our data in a way that is aligned with user behaviors and expectations. The CUWL Task Force agrees with this recommendation and feels that the underlying ILS should not tie us to a specific discovery interface.
Over the last twenty years, libraries have increased the number of individual silos of content that we provide our students and faculty. We’ve gone from providing our catalog plus ten or twenty vendor-supplied discipline-specific literature databases to hundreds of individual sources of content – all with their own resource discovery interface. During this same period, our users’ experience of resource discovery on the open Web has been the exact opposite of this trend. Web search engines have created single massive indexes of harvested content that can be accessed via a single search from a single interface.
Federated search systems (broadcasting searches across multiple, distributed data sources) arose out of a real need to simplify the user experience. In practice, there are three major drawbacks to federated searching. First, the number of live connections that can be sustained simultaneously is limited. Second, the slowest performing target defines the best performance of the overall search. Finally, in order to effectively perform collation, de-duplication, sorting, and relevancy ranking, federated search services need to limit the number of records displayed to a small initial result set received from each remote source.
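The second and third drawbacks can be illustrated with a small simulation. This is only a sketch under stated assumptions: the target names, delays, and canned hits are hypothetical, and `time.sleep` stands in for real network and remote processing time. It shows that the merged result cannot return before the slowest target answers, and that only a small initial set is taken from each source.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical remote targets: name -> (simulated delay in seconds, canned hits).
TARGETS = {
    "catalog": (0.05, ["Catalog record A", "Catalog record B", "Catalog record C"]),
    "article_db": (0.10, ["Article X", "Article Y", "Article Z"]),
    "slow_db": (0.30, ["Report 1", "Report 2", "Report 3"]),
}


def search_target(name, query, per_source_limit):
    """Simulate one live connection to a remote source."""
    delay, hits = TARGETS[name]
    time.sleep(delay)  # stands in for network and remote processing time
    # Only a small initial result set is taken from each source.
    return [(name, hit) for hit in hits[:per_source_limit]]


def federated_search(query, per_source_limit=2, max_connections=3):
    """Broadcast the query to every target and merge the partial results."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        futures = [
            pool.submit(search_target, name, query, per_source_limit)
            for name in TARGETS
        ]
        # Waiting on every future means the slowest target sets the pace.
        merged = [hit for future in futures for hit in future.result()]
    elapsed = time.perf_counter() - start
    return merged, elapsed


if __name__ == "__main__":
    results, elapsed = federated_search("wetlands")
    print(f"{len(results)} merged results in {elapsed:.2f}s")
```

Lowering `max_connections` below the number of targets in this sketch makes some searches queue behind others, which mirrors the first drawback: a cap on simultaneous live connections.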
We recommend improving the user experience of federated searching, as we move towards integrated searching of pre-indexed harvested content. At the same time, we recommend reducing the number of interfaces to our content, including the licensed content we purchase.
Just as we recommend the decoupling of the discovery interface from the integrated library system (ILS), we also believe that the discovery interface should be independent of any specific silo of data. By making the interface independent, we are better positioned to reduce the number of interfaces while improving their overall design and functionality.
One primary means of achieving this independence is to pursue the use of application program interfaces (APIs) whenever possible. An API is a standardized way for a website or service to talk to another website or service and pull data (e.g. search results) back into the environment of choice. It is through the use of APIs that we can access pre-harvested silos of content, which allows us to create a more integrated interface for searching and retrieving both locally and externally managed content.
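As a rough illustration of what pulling search results through an API can look like, the sketch below builds a request URL and parses a JSON response. The endpoint, parameter names, and response shape are all hypothetical and are not taken from any product discussed in this report; a canned response is used so the example runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; a real service would publish its own URL and parameters.
BASE_URL = "https://discovery.example.edu/api/search"


def build_search_url(query, rows=10):
    """Encode a search request as a URL the remote service could answer."""
    return BASE_URL + "?" + urlencode({"q": query, "rows": rows, "format": "json"})


def parse_results(payload):
    """Pull out the fields a local interface would display from a JSON response."""
    data = json.loads(payload)
    return [(item["title"], item["source"]) for item in data["results"]]


# Canned response standing in for what such a service might return (assumed shape).
SAMPLE_RESPONSE = json.dumps({
    "total": 2,
    "results": [
        {"title": "Wetland ecology", "source": "catalog"},
        {"title": "Restoring prairie wetlands", "source": "licensed articles"},
    ],
})

if __name__ == "__main__":
    print(build_search_url("wetland restoration"))
    for title, source in parse_results(SAMPLE_RESPONSE):
        print(f"{title} [{source}]")
```

Because the local interface only depends on the request and response format, the same display code could in principle sit in front of either locally or externally managed content.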
Even as we improve federated searching and pursue a model of searching pre-indexed harvested data, non-federated and non-harvested searching of external silos of content (i.e. searching individual databases of content) will continue since not every resource can be searched using federated services, nor does every resource provide access to their content using an API.
One resource discovery solution that aligns with our vision is the UW Forward project. The UW Forward project originated in the spring of 2009 within the UW Madison Shared Development Group (SDG) and has focused on development of an integrated discovery layer to search UW System data (bibliographic, digital collections, repository data, etc.). Using the widely adopted Solr/Lucene indexing environment along with code based on the University of Virginia's Blacklight project, UW Forward consists of a flexible, faceted interface searching across a harvested index of all UW System catalogs, representative Minds@UW repository data and two collections from the UW Digital Collections Center (Ecology and Natural Resources Collection and the State of Wisconsin Collection). While Forward is currently a full-featured interface that searches locally harvested and managed data, future development would likely involve extending the common interface to externally harvested and managed data (primarily licensed content) through the use of APIs.
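The idea of a faceted interface over a single harvested index can be sketched in a few lines. This is a plain-Python stand-in, not Solr/Lucene or Blacklight code, and the records, field names, and facet field are invented for illustration. Records harvested from several sources are normalized to one shape, indexed once, and then searched and counted by facet from that one index.

```python
from collections import Counter, defaultdict

# Hypothetical harvested records from several sources, normalized to one shape.
RECORDS = [
    {"id": 1, "title": "Wisconsin wetland ecology", "source": "catalog", "format": "Book"},
    {"id": 2, "title": "Wetland restoration thesis", "source": "repository", "format": "Thesis"},
    {"id": 3, "title": "Wisconsin state maps", "source": "digital collections", "format": "Map"},
    {"id": 4, "title": "Ecology of northern lakes", "source": "catalog", "format": "Book"},
]


def build_index(records):
    """Build a simple inverted index: title term -> set of record ids."""
    index = defaultdict(set)
    for record in records:
        for term in record["title"].lower().split():
            index[term].add(record["id"])
    return index


def search(index, records, query, facet_field="format"):
    """Return records matching every query term, plus facet counts for the hits."""
    terms = query.lower().split()
    if not terms:
        return [], Counter()
    matching_ids = set.intersection(*(index.get(term, set()) for term in terms))
    hits = [record for record in records if record["id"] in matching_ids]
    facets = Counter(record[facet_field] for record in hits)
    return hits, facets


if __name__ == "__main__":
    index = build_index(RECORDS)
    hits, facets = search(index, RECORDS, "wetland")
    for record in hits:
        print(record["title"], "-", record["source"])
    print(dict(facets))
```

Since every source lives in the same index, the search touches no remote system at query time, which is the main contrast with the federated model discussed earlier.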
Due to the system-wide scope of UW Forward, issues related to the integration of services and delivery of content (universal borrowing, licensed content access, etc.) present challenges. It is worth noting, however, that these challenges are not unique to this project and will need to be addressed with any new consortial resource discovery solution.
Overall, we believe that this project shows great promise and it should be encouraged through the remainder of this calendar year. At that point, we should be better positioned to determine the long-term feasibility and cost of UW Forward and to decide whether to proceed with this locally managed resource discovery solution.
To sum up, we are not recommending a particular resource discovery solution at this time. UW Forward shows great promise, but it is too early to know whether this product will ultimately become the UW System resource discovery solution of choice. Likewise, we cannot recommend any of the commercial products currently available. Some commercial products show promise, but like UW Forward, it is too early to know whether these products will meet our objectives. The resource discovery market is in great flux as new products are being introduced almost monthly. Therefore, we recommend that CUWL support the continued development of UW Forward and closely monitor other resource discovery solutions.
Brief Product Update:
Section 6 of the UW Madison Resource Discovery Exploratory Task Force Final Report provided a brief scan of several resource discovery products. Here’s an updated assessment, concentrating on those products that would enable access to harvested vendor data.
- WorldCat Local (OCLC): WorldCat Local (WCL) is a localized, branded implementation of the WorldCat.org database. It provides access to WorldCat bibliographic holdings and numerous hosted article-level databases via a rich, faceted interface. Results are presented in a tiered fashion with home library holdings listed first followed by holdings of consortium members as defined by the customer. Customization of the interface and function of WCL is limited. The University of Washington library catalog at http://www.lib.washington.edu/ is an example of a WorldCat Local implementation.
WorldCat Local strengths:
- The user interface is developed and supported by OCLC and thus requires minimal development or maintenance effort.
- Consortial view directs users to the most accessible records.
- Integrated view of different works, expressions, manifestations and items (FRBR).
- Links to book covers, reviews, Google Books, etc.
WorldCat Local weaknesses:
- The WorldCat database does not contain all UW collections. Local collections of various types would either need to be exported to WCL or searched through a different interface. Likely negative consequences of this include an increase in ILL requests for owned items and patron misperceptions or dissatisfaction with collections in the UW System.
- Many of the features that patrons currently depend on are not available in WorldCat Local, or function in less than optimal ways.
- Searching is geared towards public libraries, and advanced searching options are somewhat limited.
- There is currently no means of limiting searches to specific branches or locations.
- Limited ability to integrate with other systems and services such as local course reserves, subject guides, new book lists, etc.
- The procedures for ordering and processing of materials may need to change so that the information gets entered into OCLC WorldCat sooner than it does now. Otherwise, on-order and in-process materials will not display to our patrons.
- UB integration within our consortium is not yet fully developed.
- Matching UW Library holdings with the WorldCat Local database and working out the ensuing issues would be a very labor-intensive project. It is likely worth undertaking, but the effort involved should be noted.
- VuFind/Blacklight including UW Forward (Open Source): Both VuFind and Blacklight are Solr/Lucene-based harvesters. UW Madison installed both VuFind and Blacklight with a selection of UW Madison catalog data. We found that VuFind would take significant reworking of the holdings data to work with the UW Madison catalog and is in a programming environment that is not supported on the Madison campus. UW Madison chose to end its VuFind development work.
Blacklight, on the other hand, is a more open solution using a computing environment in which we have significant staff expertise. While it requires extensive templating of the user interface, the UW Forward project is being built using Blacklight and has now ingested and matched on OCLC number all UW System OPAC data as well as several UW collections and Minds@UW. Using an API to connect to a vendor-harvested database and intermixing these results with the results from UW System databases has not yet been done but shows great promise. UB functionality has not yet been programmed but is underway. The UW Forward discovery interface can be accessed at http://disco.library.wisc.edu/forward. An online presentation and demo of UW Forward is available at http://disco.library.wisc.edu/.
- Summon (Serials Solutions): Summon is a Solr/Lucene-based vendor product that harvests licensed article-level data. This central store of harvested data can be searched using the Summon interface or through locally developed interfaces via a published API (not yet tested). It currently contains content from over 100 providers and includes 400+ databases, 6,000 publishers, 50,000+ journal titles, and 500,000,000 indexed items, with peer-review designation provided by Ulrich's. New publishers are joining the Summon service weekly, including Gale, ProQuest, Ingenta, ISI and LexisNexis. Serials Solutions does not have an agreement with EBSCO because EBSCO is not a publisher; Summon gets its content directly from the publisher. Summon is also designed to host OPAC and other types of data.
Beta versions of Summon are now available at a few universities. Dartmouth's implementation is publicly available at: http://www.dartmouth.edu/~library/home/find/summon/
The vendor stated they are not ready to harvest our consortium library catalogs and provide UB functionality, but they are ready to harvest campuses individually. Summon shows great promise either as a host for licensed article level content only or for all UW System data (OPACs, digital content, etc.). However, the product is very new and pricing information for our consortium is not yet available. We are still working on a price quote for UW La Crosse.
- Primo and Primo Central (ExLibris): Primo Central was just announced and has no release date as of July 2009, but it would be a competitor to Summon, which would be very promising given our current use of MetaLib. A Primo implementation at the UW consortium level would be similar to what is currently being done with the UW Forward project. Primo allows for the ingestion and indexing of data in a Solr/Lucene database for more integrated retrieval. Integration of display and consortium services such as UB would likely require development effort to implement. ExLibris has stated that an API can be used to access our harvested data to integrate it into the resource discovery interface of our choice. We can also access a Primo database harvested and hosted by another institution. An example of Primo (not Primo Central) including data in addition to the library catalog is at Vanderbilt University at http://discoverlibrary.vanderbilt.edu/.
All the work that UW developers have done to index the UW System catalog data and UW collections for the UW Forward project would have to be redone in this environment. But since it is Solr/Lucene-based, we have now gained a lot of expertise in setting up the input processing needed. De-duping and UB functionality would have to be built into this product as it would for any other harvester.
UW System librarians have expressed great interest in the Summon system's ability to limit to peer-reviewed content. Whether Primo Central will offer peer-reviewed filtering has not been stated.
- Endeca (Endeca Technologies): Endeca’s Information Access Platform (IAP) is a commercial resource discovery layer used by several libraries and many large companies. It is designed to provide quick and intuitive public access to data. Major strengths include an excellent Solr/Lucene-based indexer and the availability of user interface templates from NCSU and the University of Florida. Endeca has a high cost relative to the open source Solr/Lucene-based solutions now available. To implement Endeca, UW staff would need to build and configure any non-library catalog data input streams. At this time, no vendor data has been harvested using this system, so we would have to negotiate these agreements ourselves. Given the price and work involved, there are other more cost-effective solutions available.
- EBSCO Discovery Service and Integrated Search (EBSCO): Announced in April 2009, version 1 of this product is planned for December 2009. EBSCO states it “already has 410 content participants signed up for ‘local’ access – this is in addition to the availability of 300+ locally loaded databases via EBSCOhost. The comprehensive collection will include journals from databases, e-journals from publishers, and non-journal content, including but not limited to: WorldCat (130 million books, videos and music CDs), NewsBank, Readex, Alexander Street Press, etc.” We are not yet able to view this product or compare it to other systems.
- eXtensible Catalog (Open Source): This project, funded by the Andrew W. Mellon Foundation, is working to create a set of toolkits that allow libraries to harvest their library data and make it available in a more easily accessible way. It will also allow the data to be accessed via other content or learning management systems. The OAI (Open Archives Initiative) toolkit, which allows for OAI harvesting of catalog data, and the NCIP (NISO Circulation Interchange Protocol) toolkit, for real-time circulation status retrieval, have been released, while several others are in development. This project merits watching and is likely to be very useful in the future, but at this time it is not a good match with our current computing environment.
- AquaBrowser (Bowker) and Encore and Enterprise Portal Solution (SirsiDynix): While these products provide competent resource discovery interfaces, given the computing environments and existing vendors in use by UW System Libraries, we see no advantage in choosing one of them over the several other good choices available.
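Several of the product notes above mention matching records on OCLC number and de-duplicating harvested catalog data. The sketch below shows one simple way such matching could work. It is not the UW Forward implementation: the records and field names are invented, and a production system would also have to deal with records that lack an OCLC number or carry conflicting metadata.

```python
# Hypothetical catalog records from different campuses; field names are assumptions.
CAMPUS_RECORDS = [
    {"oclc": "12345", "title": "Wetland ecology", "campus": "Madison"},
    {"oclc": "12345", "title": "Wetland ecology", "campus": "La Crosse"},
    {"oclc": "67890", "title": "Prairie restoration", "campus": "Eau Claire"},
    {"oclc": None, "title": "Local pamphlet", "campus": "Whitewater"},
]


def merge_by_oclc(records):
    """Collapse records sharing an OCLC number into one entry listing all holders."""
    merged = {}      # OCLC number -> merged entry, in first-seen order
    unmatched = []   # records without an OCLC number are kept as separate entries
    for record in records:
        key = record["oclc"]
        if key is None:
            unmatched.append(
                {"oclc": None, "title": record["title"], "campuses": [record["campus"]]}
            )
            continue
        entry = merged.setdefault(
            key, {"oclc": key, "title": record["title"], "campuses": []}
        )
        if record["campus"] not in entry["campuses"]:
            entry["campuses"].append(record["campus"])
    return list(merged.values()) + unmatched


if __name__ == "__main__":
    for entry in merge_by_oclc(CAMPUS_RECORDS):
        print(entry["title"], "held at", ", ".join(entry["campuses"]))
```

Keeping the list of holding campuses on the merged entry is what would let a discovery layer show one result per title while still pointing patrons to every campus that holds it.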
Report Submitted to CUWL on July 24, 2009
Amended on August 14, 2009 - NOTE: The only change to the original report is the wording of the WorldCat Local description, strengths, and weaknesses under the ‘Brief Product Update’ section on pages 4 and 5.
Task Force Members:
Barb Bren (Whitewater)
Sue Dentinger (Madison)
Bill Doering (La Crosse)
Steve Frye (Chair, Madison)
Eric Jennings (Eau Claire)
Lisa Jewel (UWSA)
Mitch Lundquist (Madison)
Valerie Malzacher (CUWL Liaison, River Falls)


