Interview with Marshall Breeding, Director for Innovative Technologies, Jean and Alexander Heard Library, Vanderbilt University, and Executive Director of the Vanderbilt Television News Archive

Transcribed from a telephone interview by Megan Macken, Assistant Director, Visual Resources Collection, University of Chicago, April 2008

Your latest article in Computers in Libraries, “Content, Community, and Visibility: A Winning Combination,” discusses the type of content that will draw in users: locally digitized photos, audio, and video. But it seems that not many catalogs include this type of content yet. What’s holding them back?

Well, I think that we’re just now at the beginning of this. The current traditional OPAC is one that just has the inventory of what’s in the ILS, you know, the books, the DVDs, the microfilm, and all of that. So that interface really hasn’t been an appropriate place to put item-specific information related to the contents of digital collections. You might have a collection-level record, as opposed to having an online catalog record for each item inside these collections. But I think that’s what’s changing now. We’re really thinking quite differently now about what the scope of a library interface ought to be. So you look at the major ones that are out there now, things like Primo and Encore and AquaBrowser. Each of them makes a stab at trying to expand the scope of what is subsumed in the library interface. I can probably speak more for Primo, since that’s the one we’ve elected to go with here at Vanderbilt. That’s really a key feature of it: the ability to expand the scope into all the other things that a library considers part of its collections, at a very granular level.

We’ve started off by putting in all the records for our Vanderbilt Television News Archive. These are records that point to full-motion video clips of the newscasts in that collection. It’s kind of unique to Vanderbilt, but I think a good example. We have 850,000 video clips as part of our news archive that are used mostly by the external world. A lot of schools of journalism and others know about our archive and use it, but it’s not used so much on campus. So by putting the records from TV/News directly into our interface, we think we’ll get a lot better exposure to our local users. So we’re mixing in our bibliographic records from our ILS and from TV/News as the first proof of concept, expanding the scope to include bibliographic and other kinds of metadata records. And it’s been very interesting. We get great exposure of the TV/News material in search results that we wouldn’t have gotten otherwise. And this is just to start off with: we did our ILS inventory and TV/News just to show that the interface works well for searching different kinds of collections in a single index.

Next we have in mind to put in a lot of images from other collections that we have. We have all of the photographic archives from special collections. We’ve got a special art collection containing pictures of artworks from the early part of the century. We’ve got a database of images of religious iconography. All those kinds of things we plan to pour into this interface. I think that’s part of what I’m talking about: the ability to put into the mix all of these audio and video and sound clips that will better represent what libraries are these days. I forgot to mention we have this thing called the Global Music Archive, where we have clips of East African traditional music recorded in the field. We have MP3s of them in a little separate database, but we plan to put those into our Primo-based interface as well. So we have an agenda, and hopefully, once we get all this done, the first place our users begin searching will include things far beyond our book collection: video, sight, and sound.
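To make the mixing concrete, here is a minimal sketch of what normalizing records from different collections into one shared index schema can look like. The field names and sample records are hypothetical, and Primo’s actual normalization pipeline is far richer; this only illustrates the idea of one index built over many kinds of metadata.

```python
# Minimal sketch: normalizing records from different collections into one
# shared schema before indexing. Field names and sample records are
# hypothetical; a real pipeline (e.g., Primo's normalization rules) is richer.

def normalize(record, source):
    """Map a source-specific record onto a minimal common schema."""
    if source == "ils":          # MARC-derived bibliographic record
        return {
            "id": f"ils:{record['001']}",
            "title": record["245"],
            "creator": record.get("100", ""),
            "type": "book",
            "text": " ".join([record["245"], record.get("520", "")]),
        }
    if source == "tvnews":       # TV news clip: mostly a free-text abstract
        return {
            "id": f"tvnews:{record['clip_id']}",
            "title": record["headline"],
            "creator": record.get("reporter", ""),
            "type": "video",
            "text": record["abstract"],
        }
    raise ValueError(f"unknown source: {source}")

index = [
    normalize({"001": "b1001", "245": "Television news and public opinion",
               "520": "A study of broadcast journalism."}, "ils"),
    normalize({"clip_id": "1968-123", "headline": "Apollo 8 launch coverage",
               "reporter": "Walter Cronkite",
               "abstract": "Evening news report on the Apollo 8 launch."},
              "tvnews"),
]

# One keyword search now spans both collections.
hits = [r for r in index if "news" in r["text"].lower()]
for r in hits:
    print(r["type"], "-", r["title"])
```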

How do we know our data will “fit” in the next-generation catalog?

Well, the metadata is what it is. They’re all different. That’s part of the challenge, though. We’re not going to turn them all into MARC records. If you have to do a lot with the metadata to pour it into one of these next-generation interfaces, then the threshold of entry is simply going to be too high. That’s why I thought the TV/News experiment was a good one: we can take the very detailed MARC records from our book collections, from the ILS data (it’s more than books, but you know what I’m talking about, all those nuanced MARC fields), and mix that structured data into a single index search, the Primo indexes, together with our TV/News metadata, which is the opposite end of the spectrum. It’s mostly an abstract. We have a few structured fields that describe the network it was recorded from, the date, the time, maybe the reporters, but hardly any structured data beyond that, and certainly no authority control.

So I think the challenge here is to have a search and retrieval environment that does OK with apples and oranges, one that is able to actually produce search results when the quality and depth of the metadata aren’t equal. My observation with Primo is that it’s done OK with that. TV/News results show up where you would expect them to. They do show up; they’re not underprivileged in the search results. The tricky part is the faceted navigation. It wants to be able to pick out facets to help the user narrow the search, and those are drawn out of the structured fields. So TV/News is kind of under-represented in the facets that show up in the interface, but at least it’s there. In an ideal world you would have rich metadata for everything, but in the real world you have the metadata that you have, and you’re certainly going to have a different kind of metadata for contents from image collections, video collections, or anything else. So you just have to compensate with the technology to ensure that, regardless of all those nuanced metadata issues, it does OK.
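The facet behavior described here is easy to picture in code: facet counts can only be drawn from records that carry the structured field, while plain keyword retrieval still reaches everything. A toy sketch with hypothetical field names, not Primo’s internals:

```python
from collections import Counter

# Toy result set: rich MARC-derived records carry a "subject" field,
# abstract-only TV/News records do not.
results = [
    {"title": "A history of broadcast journalism", "subject": "Journalism"},
    {"title": "Television and politics", "subject": "Political science"},
    {"title": "NBC nightly news, 1974-03-12 (clip)"},  # no structured subject
]

# Facets are built only from records that have the structured field...
subject_facet = Counter(r["subject"] for r in results if "subject" in r)
print(subject_facet)   # Counter({'Journalism': 1, 'Political science': 1})

# ...but every record, however thin its metadata, stays in the result list.
print(len(results), "results")
```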

You mentioned in your presentation, “Introduction to Next Generation Library Interfaces” in Wheeling, IL, that we are “moving to a post-metadata world.” How do you envision the future role of metadata librarians?

Well, there’s still going to be a lot there. This is an “in addition to” and not an “instead of” kind of thing. In my next column in Computers in Libraries, which isn’t out yet (I think it’s for the May issue), I talk about the issue of Deep Search, which is really related to what I was talking about in that presentation, first starting off with book content, but I think it will expand to the rich media collections as well.

It’s an era where we’re going to have, and already have, millions of books digitized through things like the Google Library Print Project, the Google Publisher Project, the efforts of the Internet Archive, and the European mass-digitization efforts. People are digitizing books by the millions. We’re not too far from the time when the full text of all the books, or most all the books, is going to be available for searching. So I think libraries need to have search technologies that are capable of doing that, and need to address the question of how to get at the full text, especially in the Google digitization environment, where the world doesn’t necessarily get access to all the full text. But I think all that’s important so that one can search for a phrase and know what book it’s in, going far beyond the access points that even the best catalogers would be able to put in a MARC record.

I did an experiment in preparation for this article, or for a talk or something, where I used Google Book Search. First I’d look for a book and find it, then find a phrase that it exposed somewhere in the book that I thought wouldn’t be in any of the subject headings or anything like that. Then I did a new search for that phrase, and sure enough, you can search across Google Book Search and find books based on phrases that exist in them. I think I did another search for something like RFID technology, and it came up with a few hundred books that talked about that. A book might have a chapter on it, or a few paragraphs in a chapter, but none of the books were entirely about that, so they weren’t cataloged that way.

So it kind of shows the power possible in full-text searching that goes beyond what you could do when you’re just cataloging the book with metadata. That all by itself would probably be a lousy way to search; in a way it’s too much. It errs on the side of retrieving too much, whereas the traditional cataloging approach might err on the side of retrieving too little. I think a blend of the two is what we need, where you’re combining the power of the deep-searchable full text with the kind of thoughtful selection of access points a librarian would make as she or he catalogs the book. They complement each other.
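One way to read that blend is as a simple fusion of two signals: a full-text score that errs toward recall and an access-point score that errs toward precision. The sketch below is illustrative only; the weights and field names are invented, and real relevance ranking would use tf-idf or similar rather than raw term counts.

```python
def blended_score(query, record, w_fulltext=0.4, w_access=0.6):
    """Combine deep full-text matching with cataloger-chosen access points.

    Illustrative only: weights and fields are invented for this sketch.
    """
    terms = query.lower().split()
    fulltext = record.get("fulltext", "").lower()
    access = " ".join(record.get("subjects", []) +
                      [record.get("title", "")]).lower()

    # Full text retrieves broadly (a phrase buried in one chapter still hits);
    # access points reward books that are *about* the topic.
    ft_hits = sum(fulltext.count(t) for t in terms)
    ap_hits = sum(access.count(t) for t in terms)
    return w_fulltext * ft_hits + w_access * ap_hits

book_about = {"title": "RFID technology in libraries",
              "subjects": ["RFID", "Library automation"],
              "fulltext": "... rfid ... rfid ... rfid ..."}
book_mentions = {"title": "Managing the modern library",
                 "subjects": ["Library administration"],
                 "fulltext": "... one chapter discusses rfid tags briefly ..."}

for b in (book_about, book_mentions):
    print(b["title"], round(blended_score("RFID", b), 2))
```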

The way I see the world right now is that the commercial world is pretty far ahead of us in the deep search arena. In Google Book Search you can search the full text of millions of books. In Amazon you can’t search across the full text of books, but larger and larger portions of the books you find in Amazon have the ability to search within the book, so that once you find a book, you can search its full text and find a given piece of information to see if you want that book. As far as I know, no library-based search service, no library catalog, no next-generation search interface today offers either level of deep search: search within a book, or full text across all known books. So I think that’s a future we’ve got to be thinking about if we want to maintain a good level of relevancy when it comes to search and retrieval of books and of library collections going forward. I see that as an important part of what we should expect from next-generation interfaces, and it’s pretty far beyond what I’m seeing in the current ones.

And what about the equivalent for rich media? Do you use automated video-description technology at the Vanderbilt Television News Archive?

In the same way that the technology is here and now to digitize books, digital video and audio are not too different. For the Television Archive we basically use a manual process to describe the news segments: we have somebody watch them, then write abstracts and pull out the participants and reporters and all that, and it’s a very costly and time-consuming process. But the state of the research is much different from that.

I’ve had some dealings with the Informedia lab, which is part of the computer science department at Carnegie Mellon University, and that’s their specialty: automatic video description. They’re able to do for video what scanning does for books. They’re able to come up with a lot of deep text information that describes the video. They’re able to take the closed-caption track and just get the text out of that. They’re able to take the audio track, just as audio, and do speech-to-text. And then they do OCR on words that appear on the screen. That’s three points of information for a video clip, and any one of them by itself is extremely weak. Closed captioning is really dirty; if you’ve ever paid close attention to it you’d see that words are misspelled all over the place, and they miss things. It serves a function, but it isn’t like a full transcription of the video. The speech-to-text isn’t so great either: you can train on one voice and do pretty good speech-to-text, but in things like news, it’s the many different voices that the technology doesn’t do so well with. It’s getting better. Then the OCR: how often is there really text you can scrape off the screen and make part of the search and retrieval? So separately they’re pretty crummy, but together it’s amazing how well they work. I’ve seen the software that the Informedia folks use for that, and it’s really amazing. It works. Without any human hands touching the video, they describe it and put it in their index, and you can find stuff at an amazingly granular level.
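The weak-alone, strong-together point can be shown with a deliberately simple fusion: index the union of the three noisy text tracks, so a term missed or mangled in one track can still be caught in another. The clip data below is made up, and Informedia’s real pipeline is of course far more sophisticated.

```python
# Three noisy, independently weak text tracks for one hypothetical news clip.
clip = {
    "id": "1991-0117-cbs-01",
    "captions": "presdient bush adressed the nation tonight",  # dirty captions
    "asr":      "president bush addressed the nation tonight", # imperfect ASR
    "ocr":      "LIVE FROM THE WHITE HOUSE",                   # on-screen text
}

def fused_terms(clip):
    """Union of terms from all three tracks; an error in one track is
    often covered by a correct hit in another."""
    terms = set()
    for track in ("captions", "asr", "ocr"):
        terms.update(clip[track].lower().split())
    return terms

index = {clip["id"]: fused_terms(clip)}

def search(word):
    return [cid for cid, terms in index.items() if word.lower() in terms]

print(search("president"))  # found via ASR even though captions misspell it
print(search("white"))      # found only via on-screen OCR
```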

Video is more than the spoken word; it’s the visual object. So the next level of research they’re doing has to do with putting object recognition into the search and retrieval. The last time I was there, and that’s been a year and a half, two years ago, they were pretty far along with that. You could pull up a piece of video from a clip, and if it has people in it you could take your mouse, drag a square around somebody’s face, then click a button to say, find more of this. And it works, using facial recognition technology to do search and retrieval on large libraries of video.

A more generic kind of object recognition model is part of what they’re doing as well: you take your mouse and drag it around an airplane and you say, go find more like that airplane, maybe even that model of airplane. That kind of thing. So, with the same kind of degree of difference as between the MARC record and the full text of a book, the next thing in audio and video, I think, is the ability to search the actual digital object and the contents therein, far beyond what a human would do in writing a metadata record about it. When it comes to still images, I know less about that, but the ability to do search and retrieval based on object recognition, facial recognition, color, hues, and all these different kinds of things, I think that will be part of the search and retrieval of the future. The problem we face is that these collections are so big. When you have tens of thousands or hundreds of thousands of images, how long would it take a human to write metadata about each of those? It just takes longer than we have. So the more that we can leverage automatic description to at least get a first cut at search and retrieval . . . I think [this] will be important as the universe of these objects gets bigger and bigger.
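For still images, one of the oldest first-cut techniques along the lines of the color and hue matching mentioned here is comparing color histograms. Below is a minimal sketch, assuming the Pillow imaging library is available; it is a generic illustration of content-based retrieval, not Informedia’s method.

```python
from PIL import Image  # Pillow; assumed available for this sketch

def color_histogram(path, size=(64, 64)):
    """Normalized RGB histogram: a crude 'fingerprint' of an image's hues."""
    img = Image.open(path).convert("RGB").resize(size)
    hist = img.histogram()          # 768 counts: 256 each for R, G, B
    total = sum(hist)
    return [h / total for h in hist]

def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def find_similar(query_path, collection_paths, top=5):
    """First cut at retrieval over an unannotated collection:
    rank images purely by color likeness, no human metadata required."""
    q = color_histogram(query_path)
    ranked = sorted(collection_paths,
                    key=lambda p: similarity(q, color_histogram(p)),
                    reverse=True)
    return ranked[:top]
```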

I noticed on your CV that your academic background is very different from your professional experience. Could you tell me how you became involved with library technology?

Well, as you can see, I’ve never had a computer class and didn’t go to library school. But rather, I’ve had a long career of doing technology in libraries. I was lucky enough to begin working in the Vanderbilt library about the time when libraries started adopting PCs and computing and all that kind of thing, so I got in at the ground floor of it. Back in those days library schools weren’t teaching technology, so anyone who had an aptitude for technology… I was in the right place at the right time. By virtue of my interest and a lot of lucky opportunities, I’ve been able to keep pushing ahead on this track of automation and technology and libraries, and just kind of made that my niche. Obviously, I had a lot of interest and passion for both technology and for libraries, the combination of the two. It’s just been a continual kind of career development path that I’ve followed for more than twenty years now.

Do you have any advice about how to keep up with the latest technological trends in libraries?

Well, I’ve got my slant on that, which I make available through my Library Technology Guides Web site. Anything that I follow I try to keep on my Web site so others can follow it as well. You just kind of have to keep your feelers out and look at a lot of different things. You know, I’m lucky enough to go to a lot of library conferences where I can hear people talking about what’s the buzz. You can get a certain amount through the blogosphere, though it has a certain slant to it, and then by being in close contact with the people who are involved in the companies and other development efforts. So it’s just a matter of being attentive. I know the niche of things I’m interested in, so I know the people and organizations involved in doing that. It’s just kind of being in constant information-gathering mode, always interested to hear things I haven’t heard about before, and when I do, trying to dig in a little deeper and find out what it is. So, no particular formula; just try to be attentive.

Any additional thoughts you’d like to share with art librarians?

A similar kind of thing. Our library hosted a regional meeting of art librarians, and I gave them a talk on the art content in the Television News Archive. So my point is, there are a lot of collections that aren’t art-specific but that have a lot of interesting content, so the more we build these deeper libraries of buried content, the more benefit there will be for each of the specific disciplines as well, in ways you wouldn’t have thought of.

Hear more from Marshall Breeding: listen to an interview by Tom Peters.
