XML Server and web services
Web services are applications that run on remote servers and that you can access over a network. Unlike a web page, these apps reply with data in a form (like XML, or RSS, etc.) that can be used in other applications. This makes it possible to use that data in new ways, perhaps combining data from several sources or presenting it to patrons in new contexts.
Some of the best known web services include those from Amazon.com, Google, Yahoo, and Flickr (links go to API documentation). There are many more; please name any interesting ones. BBC News is creating BBC Backstage with a set of web services, including one that will integrate geolocation data. Larry D. Larsen at the Poynter Institute has some ideas about how that might work out.
I spoke about many of the possibilities of this technology in my IUG 2005 presentation on III’s XML Server. David Walker of CSU San Marcos has done some of the best work with III’s XML Server I’ve seen so far, but I’d like to start a conversation on how these might affect your library. What happens when the data is freed from the box and you can use it where you need it?
Technorati Tags: api, apis, flickr, geolocation, google, libdev, libraries, library, patrons, web services, xml, yahoo
July 5th, 2005 at 2:05 pm
I would really like a more web-service-style approach for the XML server. Right now it just seems more like a “data export” type set-up and doesn’t really work as well as it could. A few tweaks, though, and I think III could have a killer service/product on their hands.
I agree with your post that the schema is a bit lacking. I wouldn’t want MARCXML, though, as that has similar pitfalls. I would really like more named elements so that it’s easier to extract the data. The schemas I’ve seen just seem to loop through the data and throw in the MARC tag number and the value. It would be much nicer, at least for people like me who don’t know MARC, to have elements similar to Amazon’s or the like. The MODS and Dublin Core transformations are more toward what I would like. Perhaps a switch in the URL could select among alternate schemas.
Is there work being done on a good library catalog/webservice schema, not just record export? Perhaps there should be.
July 6th, 2005 at 10:04 am
I would actually lobby in favor of MARC-XML as the default schema for the III XML Server.
I can completely appreciate the fact that it is a bit difficult to understand, and has other disadvantages. But there are a couple of key advantages to MARC-XML:
(1) It is the most direct mapping of MARC (ISO 2709) to XML, and thus the only lossless schema. If I need to make sure that all my data remains intact, then MARC-XML is the only schema that can guarantee that.
(2) It is trivial to convert MARC to MARC-XML, and thus it is both the fastest and the easiest way to get MARC data into XML. That makes it more likely that vendors, such as Innovative, will be able to implement it.
(3) Once the data is in MARC-XML, you can easily convert it to more user-friendly schemas, like MODS, Dublin Core, or RSS using the XSLT style sheets provided by the Library of Congress. In that way, you can build off of work that has already been done by others.
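The trade-off David describes is easy to see in code. Here is a minimal sketch (Python rather than PHP, with an invented record) of what working with MARC-XML looks like: in the Library of Congress “slim” schema, every field is a generic `datafield`/`subfield` element keyed by MARC tag and subfield code, rather than a named element like `<title>`.

```python
import xml.etree.ElementTree as ET

# A tiny MARC-XML record (hypothetical data) in the LoC "slim" schema.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Free culture :</subfield>
    <subfield code="b">how big media uses technology to lock down culture</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Lessig, Lawrence.</subfield>
  </datafield>
</record>"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

def get_subfield(record, tag, code):
    """Pull one subfield value out of a MARC-XML record by tag and code."""
    path = f'marc:datafield[@tag="{tag}"]/marc:subfield[@code="{code}"]'
    node = record.find(path, NS)
    return node.text if node is not None else None

record = ET.fromstring(MARCXML)
print(get_subfield(record, "245", "a"))  # title proper (MARC 245$a)
print(get_subfield(record, "100", "a"))  # main entry / author (MARC 100$a)
```

Nothing is lost in this representation, but the caller has to know that 245$a means “title” — which is exactly why named-element schemas like MODS feel friendlier to non-MARC folks.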
July 6th, 2005 at 10:23 am
The best work I’ve seen for a web-services architecture for bibliographic databases is SRU/SRW from the Library of Congress.
In fact, I recently put together a demo SRU/SRW service. It works with any Innopac system, just enter the URL.
Shrew: SRU Service for Innovative Catalogs.
This is, in my mind, how Innovative should improve its own XML Server. It has a couple of advantages:
(1) It allows you to export the records in a couple of different schemas, based on your needs — MARC-XML by default, MODS, or DC. All of those are standard schemas.
(2) The query string values (or SOAP requests if you set it up that way) are also standards-based. In that way, any SRU/SRW-aware client system can consume the information from your catalog — it is, in essence, the next generation of Z39.50, which had many of the same ideas.
(3) It’s extensible, allowing III or other vendors to add system-specific values without having to invent a wholly different set of query string (or SOAP) values.
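To make the second point concrete: an SRU searchRetrieve request really is just a standards-based query string. A sketch in Python (the base URL is hypothetical; the parameter names come from the SRU spec):

```python
from urllib.parse import urlencode

def sru_query_url(base_url, query, schema="marcxml", start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL.

    The parameter names (version, operation, query, recordSchema,
    startRecord, maximumRecords) are defined by the SRU standard;
    only base_url is site-specific.
    """
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,                # a CQL query
        "recordSchema": schema,        # e.g. marcxml, mods, dc
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return base_url + "?" + urlencode(params)

# Request MODS records for a title search against a hypothetical endpoint:
url = sru_query_url("http://example.edu/sru", 'dc.title = "open access"',
                    schema="mods")
print(url)
```

Because every SRU server answers the same query string, a client written against one catalog works against any other — the Z39.50 promise, without the binary protocol.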
July 6th, 2005 at 10:28 am
Let me try that again
SRU/SRW from the Library of Congress.
Shrew: SRU Service for Innovative Catalogs.
July 6th, 2005 at 10:30 am
I’ll second ebyran’s wish that XML Server worked more like the web services named above. I’m hugely impressed by some of the applications people are building with the rich APIs Amazon and Flickr offer and I’d love to have that access to the data in my ILS.
At the same time, I think David has a point in his advocacy of MARC-XML. The smallest change III would need to make to XML Server to make it useful to me is to output MARC-XML.
In the long run, I’d like to see XML Server expanded to enable access to all the data in the ILS. And it should be a two-way service. I’d like access to patron info AND the ability to create or change patron records via this interface. The Bursar Interface needs this sort of functionality too.
July 6th, 2005 at 9:59 pm
I agree with all the points regarding MARC-XML. I wouldn’t say it’s any less user-friendly. It is probably the ideal for those who want loss-less output as you describe. I would also prefer that everything be available for output. Having it based on a standard would also make it much easier to collaborate with people.
I suppose my main problem is that I don’t want lossless, I want a very specific data-set. Right now the III system seems like it would let me exclude lots of things, but since the nodes are named the same it’s more of an all-or-nothing affair. If there’s a way to have it return some of the variable fields and not others, please share. The documentation leaves something to be desired.
One of the things I love about the Amazon API is the Response Group setting, where you can choose either groups of data or the specific data you want returned. Since what I build is not hosted on the same server, it would be nice not to have gigantic XML files to process when all I need are a few nodes.
My named element suggestion was based on the fact that it would allow me to do this with their current set-up. Now if they would implement MARC-XML and some nice response group options, well then I would be just as happy or more so.
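Server-side, a response-group mechanism like the one described could be as simple as filtering the output tree before sending it. A sketch in Python with invented group and field names (Amazon’s actual groups are different; this just shows the shape of the idea):

```python
import xml.etree.ElementTree as ET

# Hypothetical response groups: each maps a name to the fields it returns.
RESPONSE_GROUPS = {
    "Small": {"title", "author"},
    "Holdings": {"title", "location", "status"},
}

def filter_record(record, group):
    """Return a copy of the record containing only the group's fields."""
    wanted = RESPONSE_GROUPS[group]
    out = ET.Element(record.tag)
    for child in record:
        if child.tag in wanted:
            out.append(child)
    return out

record = ET.fromstring(
    "<record><title>Free Culture</title><author>Lessig</author>"
    "<location>Main</location><status>Available</status></record>"
)
small = filter_record(record, "Small")
print(ET.tostring(small, encoding="unicode"))
```

The client asks for `Small` and gets two nodes instead of the whole record — exactly the bandwidth savings that matter when the consumer lives on a different server.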
July 13th, 2005 at 3:45 pm
Digressing from the logical technical discussion…
Lock-in is a very real vendor strategy. To export data, especially in lossless, standards-based bulk, is a giant red flag for some vendors.
What happens when the data is freed from the box and you can use it where you need it? Burger King. Have it your way.
Of course, the situation is infinitely more complex…
Hopefully you understand my point about logical viewpoints. Our viewpoint is data freedom. Some vendors’ viewpoint is lock-in, and thus continued revenue.
July 14th, 2005 at 5:41 pm
Brad, I can see you’re playing devil’s advocate. The following is targeted at vendors…
It used to be true that scarcity = value (or increased prices, anyway). The New York Times endorses that model in the way they put their content behind an auth-wall.
But as we look at commercial successes in the post dot-bomb era, we see more examples of how openness = value.
Google is one example, but I like Flickr the best. It’s a photo sharing site for the “all electronic, no need for prints” crowd. They’ve made a lot of news and gotten a lot of customers by making it very easy for people to use Flickr-hosted content elsewhere.
A presentation at ETech2005 spoke directly to this. I wasn’t there, but the program description and online conversation made me wish I was.
If Flickr were an ILS, I would be incredibly pleased with it. It does what it does well — the cataloging, classification, storage, and presentation of media objects (photos). I really like their “OPAC” interface, but Flickr has won me over because their API is so rich. I could export all my data and walk away, but I won’t because I’m happy to let them manage it for me. Instead, I can present my Flickr-hosted content elsewhere and re-use it in ways Flickr never imagined. Flickr Colr Pickr is one outstanding example.
July 15th, 2005 at 12:34 pm
Allow me to also play the devil’s (or at least the cynic’s) advocate.
I would suggest that, much like Flickr, the Innopac actually does do what it does well. Like any ILS, the Innopac is really just an inventory management system. And a pretty good inventory management system it is — so good, in fact, that our IT department here ditched their old system in favor of using our Innopac system to check-out laptops and multimedia equipment.
The problem is that we have long mistaken these systems for a search and retrieval tool. That is really only a secondary (at best) feature of any ILS, and so it should come as no surprise to us that the Innopac leaves much, much to be desired in that area. A good web services API, the ability to index and search on any field, and unlimited user licenses — all without having to pay an arm and a leg for it. Those would be the features of a good search and retrieval tool.
We have inventory management systems.
July 25th, 2005 at 9:15 am
[…] There’s an interesting discussion going at LibDev about what our ILSs are. It all started with a discussion of what role XML and web services could/should play with ILS/catalogs, but a comment reminded us that vendors’ decisions about adding new features to products that have been around for 20 or 30 years sometimes edge towards lock-in. I replied offering Flickr as an example of a vendor that’s been successful in part because of their open APIs. […]
July 25th, 2005 at 10:02 am
A relevant quote found elsewhere:
That was Dick Miller and his colleagues at the Lane Medical Library, Stanford University Medical Center. They weren’t speaking specifically to this issue, but it seems to work. I found it at Panlibus.
(note, I’m conveniently ignoring the role of MARC in all this, since my real concern is about open access to any available metadata, no matter how it’s structured.)
November 16th, 2005 at 11:07 pm
[…] In this prototype, I’m using XML access to our catalog to fetch the top 150 results for a keyword search, aggregate the subject headings and authors, and display it all for the user. The data is live, so go get clicky on it. Also, try this version that displays the clusters in a more tag-like way. I’m not sure which view I like better, so I’m experimenting with both. […]
November 17th, 2005 at 10:06 pm
Casey Bisson Does It Again and Presents Exhibit B
Displaying Clustered Search Results: “A big point in my NEASIS&T presentation Tuesday was how new technologies like XML and web services allow us to separate the tools that manage and store our data from the tools that display and manipulate it…”
November 22nd, 2005 at 4:23 pm
[…] Two problems: I haven’t encountered CDATA in my XML yet, but I do hope to develop a better solution than the one offered here when I do. The other is that SimpleXML chokes on illegal characters, an unfortunately common occurrence in documents coming from III’s XML Server. […]
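For anyone hitting the same illegal-character problem: one workaround is to scrub the raw output before handing it to the parser. Here is a sketch in Python (the same regex idea ports to PHP’s SimpleXML); the allowed ranges come from the XML 1.0 spec’s Char production:

```python
import re

# XML 1.0 forbids most C0 control characters, so strict parsers reject
# documents that contain them. This pattern matches anything outside
# the legal ranges (tab, LF, CR, and the normal character planes).
ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd]"
)

def scrub(xml_text):
    """Remove characters that are not legal in an XML 1.0 document."""
    return ILLEGAL_XML_CHARS.sub("", xml_text)

# Simulated dirty output: a stray backspace control character (\x08).
dirty = "<title>Free\x08 Culture</title>"
print(scrub(dirty))  # <title>Free Culture</title>
```

Scrubbing loses the offending bytes, of course, but since they were never legal XML to begin with, there is usually nothing meaningful to lose.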
November 30th, 2005 at 10:59 am
[…] AADL’s John Blyberg is doing some great stuff with III’s XML Server, and his XMLOPAC PHP class is just what we need to start making use of the product (and cut through the bad XML it outputs). I’ve started re-writing the work I did previously, and I’m taking good advantage of the get_opac_data() function in that class. […]
November 30th, 2005 at 3:39 pm
[…] Think about this for a moment: Our ILSs are inventory management systems, but our OPACs are (supposed to be) search and retrieval systems. The difference is obvious from here, but our vendors continue to operate as though you can’t have one without the other. […]