How OPACs Suck, Part 1: Relevance Rank (Or the Lack of It)
I recently wrote about NCSU adding a search engine to its online catalog. But after talking to librarians who asked me, “So what did they get for doing that?” I realized I need to back-pedal and explain how a search engine makes an online catalog easier to use (or, as Andrew Pace puts it, "Why OPACs Suck").
Cream Rising to the Top
I'll start today with relevance ranking—the building block of search, found in any search engine, from Google to Amazon to Internet Movie Database to little old Librarians' Internet Index.
At MPOW (My Place Of Work), as we say on the blogs, we're evaluating new search engines. Every product I've looked at offers relevance ranking, and every search-engine vendor tells me, bells and whistles aside, relevance ranking works pretty much the same everywhere.
By default, when a user conducts a search in a search engine—say, a search for the term million—the search engine should return best matches first. That's relevance ranking: the cream of the search results rising to the top. We're so used to this we don't even think twice when Google's first page of hits for the term million returns satisfying results.
OPAC Suckitude
But compare that same search in your typical online catalog. Today I picked two dozen online catalogs from around the country and conducted keyword searches for the term million. Call me picky, but the first page of hits—often the first or second hits—for those catalog searches should not include:
- Hog heaven: the story of the Harley-Davidson empire
- The rock from Mars: a detective story on two planets / Kathy Sawyer
- The Johnstown Flood
- Mosby's 2006 drug consult for nurses
- Hotel Rwanda
- Teens cook dessert
An OPAC that Got Game
You don't have to be a rocket scientist to see these catalogs aren't using relevance ranking. But you shouldn't have to be a rocket scientist to use a library catalog in the first place. Compare those results with the same search for million in the NCSU library catalog, powered by the Endeca search engine. Here are the first seven hits:
- 12 million black voices
- Million man march
- million dollar directory
- Black religion after the Million Man March
- Le Million
- Million dollar prospecting techniques
- Green groatsvvorth of vvit, bought with a million of repentance
So How Do You Make Cream, Anyway?
Relevance ranking is actually fairly simple technology. It's primarily determined by the magic of something every search-engine vendor will talk your ears off about: TF/IDF.
TF, for term frequency, measures the importance of the term in the item you're retrieving, whether you're searching a full-text book in Google or a catalog record. The more the term million shows up in the document—think of a catalog record for the book Million Little Pieces—the more important the term million is to the document.
IDF, for inverse document frequency, measures the importance of the word in the database you're searching. The fewer times the term million shows up in the entire database, the more important, or unique, it is.
Put TF and IDF together—the importance of a term in a document, and the uniqueness of the same term in an entire database—and you have basic relevance ranking. If the word million shows up several times in a catalog record, and it's not that common in the database, the item should rise to the top, as such items do in the Endeca-powered NCSU catalog.
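For the curious, here is a minimal sketch of that arithmetic in Python. It assumes the textbook formulas (TF as the term's share of the words in a record, IDF as the log of total records over records containing the term), which is not necessarily what Endeca or any other vendor computes. The record keys and the note text tacked onto the Hog Heaven record are made up for illustration.

```python
import math

# Hypothetical mini-catalog: each record is flattened to one string of text.
# The "worth a million dollars" note is invented for this example.
records = {
    "le_million": "le million",
    "million_man_march": "million man march",
    "hog_heaven": "hog heaven the story of the harley davidson empire "
                  "a company worth a million dollars",
    "johnstown_flood": "the johnstown flood",
    "teens_cook": "teens cook dessert",
}

def tf_idf_scores(term, records):
    """Score each matching record for one term with plain TF x IDF."""
    tokens = {rid: text.lower().split() for rid, text in records.items()}
    doc_freq = sum(1 for toks in tokens.values() if term in toks)
    if doc_freq == 0:
        return {}
    # IDF: the rarer the term is across the whole database, the higher this is.
    idf = math.log(len(records) / doc_freq)
    scores = {}
    for rid, toks in tokens.items():
        count = toks.count(term)
        if count:
            # TF: the share of this record's words that are the search term.
            scores[rid] = (count / len(toks)) * idf
    return scores

def rank(term, records):
    """Return matching record ids, best score first."""
    scores = tf_idf_scores(term, records)
    return sorted(scores, key=scores.get, reverse=True)

print(rank("million", records))
```

In this toy data the two short titles outrank the record where million is buried in a long note, which is the "cream rising" effect in miniature.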
The users who complain that your online catalog is hard to search aren't stupid; they are simply pointing out the obvious. Relevance ranking is just one of many basic search-engine functionalities missing from online catalogs. NCSU worked around it by adding a search engine on top of its catalog database. But the interesting questions are: Why don't online catalog vendors offer true search in the first place? and Why don't we demand it? Save the time of the reader!
36 Comments
Hey, my catalog proudly proclaims that its results are 'System Sorted.' What could be clearer than that? And thank heavens the vendor doesn't seem to make it possible to customize that wording, or we might come up with something less Platonically ideal than 'System Sorted.'
Seriously tho, it still escapes me why more ILS vendors haven't implemented term frequency -- especially TF based on MARC (e.g. a 'million' in a 245 $a should rank higher than a 'million' in one of the 500 notes fields).
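A rough sketch of what MARC-aware term frequency could look like, in Python. The weights are hypothetical numbers picked only to show the idea, subfields such as $a are ignored, and the note text is a placeholder. Real values would need tuning and testing.

```python
# Hypothetical weight per MARC tag: a hit in the title (245) counts for more
# than a hit in a subject heading (650), contents note (505), or general note (500).
FIELD_WEIGHTS = {"245": 5.0, "650": 3.0, "505": 1.5, "500": 1.0}

# Simplified records: one string per tag, subfields ignored.
records = [
    {"id": "a", "fields": {"245": "Million man march"}},
    {"id": "b", "fields": {"245": "Hotel Rwanda",
                           "500": "Placeholder note that mentions a million"}},
    {"id": "c", "fields": {"245": "Teens cook dessert"}},
]

def field_weighted_score(term, record, weights=FIELD_WEIGHTS, default=0.5):
    """Sum term occurrences, each multiplied by the weight of its field."""
    term = term.lower()
    score = 0.0
    for tag, text in record["fields"].items():
        hits = text.lower().split().count(term)
        score += hits * weights.get(tag, default)
    return score

def search(term, records):
    """Return ids of matching records, highest weighted score first."""
    scored = [(field_weighted_score(term, r), r["id"]) for r in records]
    hits = [(s, rid) for s, rid in scored if s > 0]
    hits.sort(key=lambda pair: -pair[0])
    return [rid for _, rid in hits]

print(search("million", records))
```

Both records match on a single occurrence, but the title hit outranks the note hit because of the tag weights alone.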
Vendors meet our expectations. (System sorted, indeed!) I'm trying to change our expectations, one librarian at a time. :)
The following observations are from three librarians so I'm using a collective 'I.'
While I don't disagree that OPACs need work, I do question if your test is revealing what you think it is. You did a general keyword search within the entire record, right? Your results would have been considerably different if your search was keyword within title fields. With a search term like 'million' you don't really know if someone is looking for the word in the title, subtitle, author, subject, or wherever. While you don't specify, it looks like you want the 245 to have more weight than other tags. You said that you will be addressing tag weighting later but that seems to be the crux of your complaints against OPAC search results and it would be interesting to see if a committee of librarians could actually agree on weighting.
An argument could be made that the search results at NCSU's catalog present an appearance of relevancy, rather than a true representation of a searcher's intent. Yet, if searchers are incapable of narrowing their keyword searches even by the title field, is it going to be necessary to build it into the engine? Why are those works the titles of which have 'million' in the first five words of the title proper more relevant than works which have the word 'million' in a subtitle or in an essay title listed in a contents note or in a scope note? Why are those works that have 'million' within the first few words of their titles any more relevant than those that had it as the eleventh or fifteenth word? Why would a record that had 'million' in the title proper be any more relevant than a record that had 'million' in the title of an essay listed in a contents note? Authors notoriously choose titles proper that are for marketing (catchy, clever, puzzling) and that often lack real subject terms.
Regarding term frequency, why would a catalog record be more relevant if it had the word 'million' more than once in the catalog record, and less relevant if it had the word 'million' just once?
The number of times a word appears in a catalog record does not, in my mind, signify much, if anything at all. I just don't think that catalog records provide a whole lot of scope for relevance ranking.
In terms of why titles are more important, stick around for my discussion about field weighting, and I'll explain. But yes, the title is important. Google thinks so, and for that matter so does III--I just poked at a test installation of their ILS with relevance ranking and a Neanderthal but at least extant spell-checker.
You don't even have to wait for your pre-historic ILS vendor to do it. Just grab the nearest techie, point your left hand at the OPAC, your right hand at http://aspell.sourceforge.net/, give them a flask of strong coffee, and shout 'GO! GO! GO!'
Joking aside, it's doable and your patrons will sing your praises for doing it.
p.s. it's not just III, the latest version of the SirsiDynix HIP OPAC also has a spellchecker
Personally, if I run a search that gets 10000 results, I'd much rather the OPAC at least had a stab at presenting the most likely hits on the first page. Whether it's done by some hyper-complex relevance ranking or if it's simply sorting the results by popularity (e.g. number of loans) is irrelevant from the end user's point-of-view.
Let's face it -- if I type 'million' into your OPAC, I'm more likely to be looking for 'Million Dollar Baby' than 'India's Unfinished Agenda: Equality and Justice for 200 Million Victims of The Caste System, Serial No. 109-102, October 6, 2005, 109-1 Hearing' (no offence intended).
Any relevance ranking on MARC would need to be customisable, but I'd expect a vendor would be able to make a fair stab at a default config without resorting to death-by-a-committee. Relevance ranking already appears in cross-database searching tools, and I see no reason why it doesn't have a place in the OPAC.
To my mind, relevance ranking and faceted searching are much more useful to our users than us just telling them to pick better choices of keywords.
Some of our users are always going to choose poor keywords -- is it too much to ask that we simply accept that and try to help them by using a 40 year old tried & tested technology? Or perhaps, should we insist that they attend an intensive 3 week course of 'Effective OPAC Searching Strategies for Dummies' before we give them a library card?
I'm writing as me now so expect a dramatic rise in the babble factor. I'm not trying to be a gadfly, honest, and Karen your blog is one of my daily 'must reads' and I'm just a systems guy so what do I know about how users search anyway but ...
I'm still a bit skeptical about relevancy ranking. I think it depends a lot on the type of user and collection. In a public library, sure, the patron might be looking for Million Dollar Baby. What assumptions can you make about a user of an academic library? Perhaps here someone is looking for information about the Million Man March which is in an essay in a collection. If contents notes are further down the hierarchy of ranking, the book might not pop to the top. As one of my cataloger colleagues pointed out, titles of scholarly works are not reliable indicators of relevancy. Personally, I'm more concerned with the problem of searching and 'granularity mismatch', as you mentioned, Karen.
What I would like to see are more usability studies done on how users actually search and whether relevance ranking is a help or a hindrance. We just went through an OPAC makeover and none of it was based on usability. While I don't discount the experiences of public services librarians, I'd really like to see how users are searching. Let's face it, if users are entering single word search terms then relevance ranking isn't likely to be much of a help. That said I would really like to see OPAC relevance ranking that could be adjusted for a target audience. Our OPAC creates logs that include the types of searches and the terms searched, and I've been scrolling through a couple of months of them wondering what kind of analysis would tell me something about user search behavior.
Dave mentioned spell checking. We are using the product Lucien from a company named Jaunter. It uses the Google spell checker.
So, in conclusion, I don't think we are that far apart in thinking. Relevance ranking is a tool to aid the user in searching but I want to make sure that it is really an aid and not an application of the 'we are the professionals and we know what you need' approach.
There are a few considerations related to searching metadata collections that come into play, and I will be discussing these. But the idea that relevance ranking hasn't proved its worth is simply unfounded. Also, as Dave points out, I never said RR was the only tool, but I will stick by my conviction that it is significant.
Ask Andrew Pace if relevance ranking is important. When he added Endeca to the NCSU catalog, the focus was on faceting. But he gained a number of other user-oriented functionalities that improve usability for his users.
You say 'We just went through an OPAC makeover and none of it was based on usability.' I'm wondering if you consider usability itself overrated, and why?
As far as my opinion about relevance ranking, I'm not hostile. In fact, I'm for anything that improves search results and plan to read the rest of your series to find out more. My questioning was more the result of never having run across any studies (of course I haven't directly looked lately).
I wonder if I should do a whole piece on usability...
One of the articles I found was by Andrew Pace, one by Walt Crawford, and as much as I respect and enjoy both of them as writers and thinkers, the articles in this case represent their THINKING on the subject, not research. One of the other articles looks like it might be reporting research, but it was in Portuguese. Two of the other articles may be research, but the full text wasn't available in Lib Lit, so I can't tell for sure. The final article studies OPAC transaction logs in light of web based search engines, and discusses user behavior. It doesn't study the question at hand.
Is there really enough data in a MARC record for relevance ranking to work? When you're ranking 10 (or 100) bib records, each of which has the same related LC subject heading, and each of which has your keyword once in the title, how do you decide which is more relevant?
I don't think TF and IDF cut it in this situation. I don't see how IDF has any relevance to a single word search, like in the cited example, 'million' anyhow. Unless I'm missing something, IDF only works when you're comparing two words to each other, and the rarer word gets the higher score.
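To make that concrete, here is a small Python sketch under the textbook definition idf = log(N / df), using raw term counts for TF. The four "records" are invented strings. With one search word, IDF multiplies every matching record by the same number, so the order does not change. With two words it starts to matter.

```python
import math

# Invented records, each flattened to a string of words.
docs = [
    "the million man march",                # 0
    "the story of the flood and the town",  # 1
    "katrina and the city",                 # 2
    "million the million",                  # 3
]

def idf(term, docs):
    """Textbook inverse document frequency; 0.0 if the term never occurs."""
    df = sum(1 for d in docs if term in d.split())
    return math.log(len(docs) / df) if df else 0.0

def order(query, docs, use_idf=True):
    """Return doc indexes sorted by score (ties broken by index)."""
    terms = query.split()
    scored = []
    for i, d in enumerate(docs):
        toks = d.split()
        score = sum(
            toks.count(t) * (idf(t, docs) if use_idf else 1.0) for t in terms
        )
        scored.append((score, i))
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]

# One word: the same order with or without IDF.
print(order("million", docs, use_idf=True), order("million", docs, use_idf=False))

# Two words: without IDF the record stuffed with 'the' wins; with IDF the
# record containing the rare word 'katrina' comes first.
print(order("the katrina", docs, use_idf=False)[0], order("the katrina", docs, use_idf=True)[0])
```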
And shouldn't the discussion of relevance ranking in the OPAC have stressed location of the keyword? Surely if the keyword occurs in the title or in the LC subject heading, it should be scored higher than if it occurs in a note or description, or somewhere else? Or not? This is exactly the kind of question we need research to answer, not blind hypothesizing. Excuse the rant, I guess you said you'd address the issue of field weighting in Part II.
It seems to me that relevance ranking has been demonstrated to work (sort of, more or less) in large full-text document collections, but that doesn't automatically translate to OPACs and their collections of MARC records. And frankly, TF and IDF don't work all that well by themselves even in full-text databases. Surely most people remember the early days of Lycos, Web Crawler, InfoSeek, Alta Vista and the like. How well I remember when I used to teach Search Engine classes back in the late 90's, and I'd tell people that if you got 3 items that looked even remotely related to your intended subject out of the first 10 hits, that was pretty much par for the course.
Google changed all that with its PageRank algorithm, which took links into account, putting human selection back into the equation. Dave Pattern (comment above) hinted at this with his suggestion that the number of loans should be factored into the relevance algorithm, which at least provides a possible way to tap into user selection again, similar to Google's PageRank. Another possible factor would be the number of holdings in WorldCat (how many librarians chose to purchase the title) which might be a fairly powerful ranking selector, but again, WHERE's THE RESEARCH!
Yes, I agree that OPACs suck, and that they should incorporate relevance (if proved relevant!) but where's the research on relevance ranking algorithms applied to collections of MARC records? It is my (limited) understanding that today's search engine algorithms and relevance ranking techniques such as TF and IDF are based on linguistics research that was done decades ago. But this research was done on collections of full text documents, not bibliographic records. Where's the similar research for MARC records? Until we can point to research to support our arguments, we're all just blowing in the wind.
Will
I will say this. We want research? Look at the difference layering a search engine over an OPAC made for NCSU. It's not academic research, it's street research. But I would ask anyone asking 'where's the beef' to look under the pickle. The pickle, to mix metaphors, is not blowing in the wind. You can see the pickle right up there, in the difference between NCSU's catalog and other catalogs. You can also see it when you search other metadata-dependent portals, such as IMDB. Is this entirely due to relevance ranking? As I've said repeatedly (sounding like one of those scientists being uber-cautious about global warming...), there are a number of factors at work. Does RR play a crucial role? Yes.
Frankly, from managing a metadata directory that has had relevance ranking since 2002, I will say that relevance works. In fact, I wonder, Will, if IDF isn't more important than TF in a metadata directory. It's moot (it's not as if search engines have switches to turn off one or the other), but an argument could be made for IDF's crucial role.
My last piece in this series will be about the inherent problems of any metadata directory, and yes, Will, I agree this is an enormous problem. I have my own ideas for how to address this. I even have my own Rube Goldberg flowchart that I developed for MPOW and trotted around to various findability gurus for their input and nod of approval. Stay tuned.
Keep those comments coming!
'Tried: (new and orleans and katrina) no records found... Tried: (new or orleans or katrina) 20237 results found. Sorted by Relevance'
None of these 20237 results is on my topic! Unless the user knows to look for the message I quoted, they will waste a lot of time with no chance of success. What kind of 'relevance' has it sorted by?
I agree with Will in that we need lots of research, but turning library catalog records over to Google isn't going to work any better than what we have now, maybe even worse. I think in the future the labor of creating library catalog records will be more automated and less labor intensive, but some kind of human intervention in identifying appropriate vocabulary to describe a book or film or whatever will be needed for a long time to come.
I would like to add that you can't judge the capability of OPAC design by comparing searches in the OPACs. Most decent library systems require staff time and technical knowledge to configure and take advantage of many capabilities. Many library staffs are lacking in this kind of time and expertise. We know that we could be doing a better job ourselves, but we are about 5 librarians short, based on our consultant's report.
Of course, there are library systems of all kinds.
Many are incapable of doing what makes research easy. But research will never be easy, for heaven's sakes. Good relevancy ranking alone will not give us the books we need without some time and thought. The system is not the judge of the book; the researcher has to make that judgement.
Here at NCSU, our hope is to put our best first effort at relevance ranking on the table (how could it be worse than 'system sorted'?), and then actually do the research to see how well it's working. When we went live with our Endeca catalog, we created a relevance algorithm based on what seemed like common sense and initial testing. Now that we're starting to catch our breath, we're conducting comparative usability testing between Endeca and our old OPAC interface as well as studying the relevance of sample searches issued in both the Endeca interface and the old OPAC interface. We're keenly interested in whether the new interface makes things easier, and in how much it improves the relevance of the top results. Hopefully we will have results we can share soon.
I do have some interesting anecdotal evidence from usability tests this week. I've watched undergraduates with experience in our old catalog shy away from general keyword searching as if it were a 3-headed beast. They've been struggling to find LC subject terms that are *sort of* related to their topic b/c they know from experience that Keyword Anywhere in the library catalog gives them way too many results that just aren't relevant. One student illustrated his point using the 'declaration of independence' - his own (unprompted) example. He entered that phrase in our old system (no quotes) as a general keyword search and pointed out entries on the first page like:
- Pregnancy and power : a short history of reproductive politics in America / Rickie Solinger.
- Gibraltar : British or Spanish? / Peter Gold.
Of the first 10 results, 1 was actually about the declaration of independence. If this is what we hand students, they will just stop using our tools.
Both TF/IDF and tag weighting have been mentioned as relevance ranking methods. Both are supported within Endeca, and we're stressing phrase matching and tag weighting over TF/IDF for now. By exposing subject terminology through refinement lists, we're hoping to help students narrow broad single-word searches to more relevant results. There is no way for a catalog to auto-magically know what users are looking for when they type 'million' into a general keyword search. But we should be able to have a much better idea when they enter something like 'declaration of independence'. Try that puppy out (no quotes) in the Keyword Anywhere search in the NCSU Endeca catalog and look at all the cream that rises to the top! And who knew there was a 'constitutional history' subject heading? In the end, it seems like all we can do is put forward the best algorithm we can come up with and try to provide some reasonable suggestions. Even if it's not exactly what the user is looking for, at least they can see a sensible connection between the word they entered and the results they got, rather than something that appears completely random. Random-seeming results will not build faith in our search tools among the user community.
A quick note on spell-checking - in a recent usability test I watched a user completely fail to obtain results b/c he misspelled words like 'spanish', 'architecture', and 'ancient'. Yes, as a college student he should have been able to spell those words. That doesn't change the fact that he misspelled them. When tested in our Endeca catalog, all of those misspellings would have been automatically corrected, or at least suggested. I agree; our online catalogs need spell checking.
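A "did you mean" suggestion of this kind can be sketched in a few lines of Python with the standard library's difflib. This is not how Endeca does it, only an illustration. The vocabulary list stands in for the words actually indexed in a catalog, and the misspellings and cutoff value are hypothetical.

```python
import difflib

# Stand-in for the set of words indexed in the catalog (an assumption).
VOCABULARY = ["spanish", "spain", "architecture", "ancient", "million"]

def suggest(word, vocabulary=VOCABULARY, max_suggestions=3, cutoff=0.75):
    """Return close matches from the vocabulary, or [] if the word is known."""
    word = word.lower()
    if word in vocabulary:
        return []  # spelled the way the catalog knows it; nothing to suggest
    return difflib.get_close_matches(
        word, vocabulary, n=max_suggestions, cutoff=cutoff
    )

# Hypothetical misspellings of the three words mentioned above.
for typed in ["spansh", "architechture", "anchient"]:
    print(typed, "->", suggest(typed))
```

Checking against the catalog's own vocabulary, rather than a general dictionary, means a suggestion is always a word that will actually return results.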
Also, in a public library catalog, a search for 'million' is pretty meaningful these days--whether you're talking '...dollar baby' or '...little pieces.' Plus I'm sure you know that most keyword searches in search engines (I didn't say OPACs) are one, two, and three-word searches, with 'one' the lead search. 'Declaration of independence' I could buy, but what are your search logs showing?
LII will be a better test for subject faceting, imho, because our primary 'subjects' are not the Klingon-esque LCSH but our local thesaurus developed by public service librarians and already usability tested. I believe in faceting. I think it's like vitamins. You really can't go wrong. It just isn't quite enough.
But I'm sort of digressing. I wanted to comment on a couple of things in an earlier post. Enhancing a catalog with search functionality available for the last several decades isn't 'turning it over to Google.' My goodness. How did we get there from talking about TF/IDF? Also, I always challenge myself to think about how we can present content with the least real-time human intervention. 'Save the time of the reader.' Is it possible to eliminate intervention? No. But it's a good goal, a North Star that keeps us in mind of the user who needs our services whenever, wherever--and also frees us to do other things (training, guides) that can serve as the rowboat across what Donald Norman called the Gulf of Usability? Heck yeah.
Boy, do I have anecdotes from our usability testing. Like out of a pool of subjects, some including the creme de la creme of librarianship, the only one who read the help file was the 20-something soccer mom.
Re spell-check--we gots data. We gots GOOD data. Stand by for that post!
I will probably first do a post that lists a couple dozen key functionalities of basic search engines. Clearly people want to hear about this. I'm ready to provide it.
And sure, LCSH subject terminology isn't necessarily natural language. My point is simply that providing some type of subject terminology is a start - I'm pretty sure research has proven that visual recognition is easier than thinking of a keyword from memory.
Libraries have always preferred to add metadata to improve access - FRBR is a recent example - and I think we're right to do so most of the time.
In this case we have to start providing user interfaces that match user expectations. Relevancy ranking is just the tip of the iceberg - no more phrase searching by default, no more remembering that 'before' is a proximity operator and you need to put it in quotes if you want to search for that word.
I think it's also time to start exploring the addition of user behavior to relevancy rankings:
If the last 10 people searching for harry potter picked out a record for the most recent edition, the most recent edition should come up first.
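As a rough Python sketch of that idea: remember the last ten records picked for each query, and move the most-picked records to the front of whatever order the engine already produced. The class name, record ids, and window size are all hypothetical, and a real system would have to worry about privacy and stale data.

```python
from collections import Counter, defaultdict, deque

class ClickBoost:
    """Re-rank results by how often each record was picked recently."""

    def __init__(self, window=10):
        # For each query, keep only the most recent `window` picks.
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def record_pick(self, query, record_id):
        self.recent[query.lower()].append(record_id)

    def rerank(self, query, results):
        picks = Counter(self.recent.get(query.lower(), ()))
        # sorted() is stable, so records with equal pick counts keep the
        # order the search engine gave them.
        return sorted(results, key=lambda rid: -picks[rid])

boost = ClickBoost(window=10)
for _ in range(10):
    boost.record_pick("harry potter", "hp_newest_edition")

engine_order = ["hp_first_edition", "hp_second_edition", "hp_newest_edition"]
print(boost.rerank("harry potter", engine_order))
```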
There is never any reason not to explore a better way.
I'm very curious about the leverage that can be gotten from fields I hadn't even considered. It will be interesting when we start testing the first faceted engine on our dev server later this month.
I still want to have an SLA bakeoff... or maybe I am proud of it because I just signed the invoice for the coding that made it happen ;)
'Relevance' - in terms of matching words especially in titles - is a pretty good start. It gives people confidence that the catalog contains content that applies to their topic. It's 'transparent' -- obvious why these results are found, another confidence-building element.
Alphabetic by author, Chronological by date, there are lots of choices. The trick is to find something that starts off reasonably useful and transparent, and provide good tools to go on from there.
Apparently-random (system sort) is the *worst* option. Most OPACs end up with some kind of random order, and that's a shame.
First two hits were:
A million little pieces / James Frey.
Million dollar baby [videorecording] / directed by Clint Eastwood ;
Which I guess for a public library (in July 2006) is pretty good.
Our OPAC sorts by "popularity," a number made up of copies, holds, media types, circ, and "activity dates."
At customers' request, we also added a Title sort, mostly because it brings the different media types for the same "work" together. Popularity didn't do this, so that the movie, talking book, large print, etc. were all over the place. Patrons didn't like this.
Thanks for the thoughtful article.





