Converting the III XML Server into an SRU/SRW Server

I realized the other day that you could use a very simple XSLT transformation to convert the III XML Server output into an SRU/SRW implementation. Rather than having to deal with that mess of tags and data known as the III XML Record format, you can apply different XSLT style sheets to get the records into MARC-XML or Dublin Core, turning the III XML Server into something truly useful.

DEMO:

Shrew 0.2 for III XML Server

XSLT files:

III to MARC-XML (SRU/SRW)

III to Dublin Core (SRU/SRW)

I still need to clean up a few things, but will put updated versions on the site as I go.


BACKGROUND

I recently put together a little proof-of-concept system called Shrew to explore some thoughts about how we could improve the III XML Server. Shrew version 0.1 exports data out of any Innovative catalog using a screen scraping method, and offers it to you in an SRU implementation.

It’s a fun little system, and took all of an afternoon to put together. But it is not a scalable solution for a production environment.

So yesterday I wrote a new version of Shrew that interfaces directly with the III XML Server. Using a very simple XSLT transformation, you can convert the III XML Server output into an SRU implementation. You can apply different XSLT style sheets to get the records in MARC-XML or Dublin Core. It’s quick and simple, so simple, in fact, that I’m not sure why I didn’t think of this before.

THE PROBLEM THIS SOLVES:

There are essentially two problems with the current Innovative XML Server.

(1) It uses proprietary schemas.

The URL syntax and the search result information are entirely unique to the XML Server, and the records are in Innovative’s own III XML format.

(2) The III Record schema is horribly designed.

If the schema were simply proprietary, that would be disappointing, but not altogether problematic, considering that Amazon and other web services also use proprietary formats. But the III Record schema is, frankly, unworkable.

The schema uses no unique element names or attributes for defining MARC data. Every MARC field is simply contained in a <varfield> tag. When processing the document, to actually determine what MARC data you are looking at, you have to access the value of the <marctag> child element contained within each VARFIELD.

What that means is that, if you want to access the value of, say, the ISBN of the record, you have to iterate over the entire set of VARFIELDs, checking each one to see if the MARCTAG child element contains a value of ‘020,’ and then also iterate over the subfields, which also have no unique element names or attributes, to see if you are at ‘a’ to grab the actual number itself.
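To make that concrete, here is a minimal Python sketch of the lookup. The XML fragment is made up for illustration: VARFIELD and MARCTAG are the names used above, but SUBFIELD, CODE and DATA are hypothetical stand-ins for the real subfield elements, and the ISBN is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: VARFIELD and MARCTAG come from the post;
# RECORD, SUBFIELD, CODE and DATA are invented here for illustration.
SAMPLE = """
<RECORD>
  <VARFIELD>
    <MARCTAG>245</MARCTAG>
    <SUBFIELD><CODE>a</CODE><DATA>Java programming</DATA></SUBFIELD>
  </VARFIELD>
  <VARFIELD>
    <MARCTAG>020</MARCTAG>
    <SUBFIELD><CODE>a</CODE><DATA>0123456789</DATA></SUBFIELD>
    <SUBFIELD><CODE>c</CODE><DATA>$44.95</DATA></SUBFIELD>
  </VARFIELD>
</RECORD>
"""


def find_subfield(record, tag, code):
    """Return the first value for tag/code, scanning every field in turn."""
    for varfield in record.iter("VARFIELD"):
        # The MARC tag is element content, so each field must be inspected.
        if varfield.findtext("MARCTAG") != tag:
            continue
        for subfield in varfield.iter("SUBFIELD"):
            # Same story one level down: the subfield code is content too.
            if subfield.findtext("CODE") == code:
                return subfield.findtext("DATA")
    return None


record = ET.fromstring(SAMPLE)
isbn = find_subfield(record, "020", "a")
print(isbn)
```

Every value you want costs another pass like this one, because nothing in the element names or attributes lets you address a field directly.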

If you need to access six or seven different values from a result set of ten records, you’ll have to perform 60 to 70 iterations over the XML document. Not only is that horribly inefficient and resource-intensive, but it also prevents you from doing simple things like comparing two field values at the same time, or using other common XSLT functions. Clearly, whoever designed this schema at Innovative had no sense whatsoever of its practical use or implementation.

HOW TO USE SHREW 0.2

When your application queries the XML server, instead of trying to manipulate its output directly, you would first transform it into SRU/MARC-XML or SRU/Dublin Core using one of the above XSLT style sheets. Once the data is in this new format, you can easily manipulate it in any way you want. You can output the response directly to an SRU / SRW-aware client, or transform it into HTML using existing MARC-XML and Dublin Core tools.
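As a rough illustration of what that transformation buys you, the sketch below does the same kind of mapping in plain Python rather than XSLT (a deliberate substitution, only to keep the example free of third-party libraries). The input fragment is hypothetical: VARFIELD and MARCTAG are the names mentioned earlier, while RECORD, SUBFIELD, CODE and DATA are invented. The output is simplified MARC-XML: it uses the real MARC-XML namespace and the datafield/subfield elements, but skips the leader, control fields and indicators.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # the MARC-XML namespace
ET.register_namespace("marc", MARC_NS)

# Hypothetical III-style input; only VARFIELD and MARCTAG come from the post.
III_RECORD = """
<RECORD>
  <VARFIELD>
    <MARCTAG>020</MARCTAG>
    <SUBFIELD><CODE>a</CODE><DATA>0123456789</DATA></SUBFIELD>
  </VARFIELD>
  <VARFIELD>
    <MARCTAG>245</MARCTAG>
    <SUBFIELD><CODE>a</CODE><DATA>Java programming</DATA></SUBFIELD>
  </VARFIELD>
</RECORD>
"""


def to_marcxml(iii_record):
    """Rebuild one III-style record as a simplified MARC-XML record."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    for varfield in iii_record.iter("VARFIELD"):
        # Promote the MARC tag from element content to an attribute.
        datafield = ET.SubElement(
            record, f"{{{MARC_NS}}}datafield",
            {"tag": varfield.findtext("MARCTAG", default="")})
        for sub in varfield.iter("SUBFIELD"):
            # Likewise, the subfield code becomes an attribute.
            subfield = ET.SubElement(
                datafield, f"{{{MARC_NS}}}subfield",
                {"code": sub.findtext("CODE", default="")})
            subfield.text = sub.findtext("DATA")
    return record


marc = to_marcxml(ET.fromstring(III_RECORD))
ns = {"marc": MARC_NS}

# With tags and codes as attributes, each value is one path expression.
isbn = marc.findtext(
    "marc:datafield[@tag='020']/marc:subfield[@code='a']", namespaces=ns)
title = marc.findtext(
    "marc:datafield[@tag='245']/marc:subfield[@code='a']", namespaces=ns)
print(isbn, title)
```

Once the tag and subfield code are attributes, the ISBN and the title can each be addressed directly and compared side by side, with no hand-written loop over every field.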

Ah, the power of standards!

In that way, you can write a little class in your programming language of choice that queries the III XML server and does this initial transformation. From your web application code you can just call that object, send it a CQL / SRU query and then deal with the response as if it were always in MARC-XML or Dublin Core.

Here’s a basic idea of how that would look in C#:

// Arguments: query, first record, number of records, sort order.
InnopacXml objSearch = new InnopacXml();
var xmlResults = objSearch.GetRecords( "Java programming", 1, 10, "relevance" );

“xmlResults” in this case would be an SRU / MARC-XML document for the first 10 records in the result set (starting with record number one) for the query “Java programming,” sorted by relevance.
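For anyone working outside C#, here is one way such a wrapper might be laid out in Python. This is only a sketch of the shape of the class, not the actual Shrew code: the fetch and transform callables, and the stand-in functions that replace them here, are hypothetical, so the example runs without an XML Server or an XSLT processor. A real version would have fetch call the XML Server and transform apply one of the style sheets.

```python
from typing import Callable


class InnopacXml:
    """Hypothetical Python counterpart of the C# wrapper shown above."""

    def __init__(self,
                 fetch: Callable[[str, int, int, str], str],
                 transform: Callable[[str], str]):
        # fetch: queries the XML Server and returns its raw XML as text.
        # transform: converts that text to SRU / MARC-XML or Dublin Core.
        self._fetch = fetch
        self._transform = transform

    def get_records(self, query, start=1, maximum=10, sort="relevance"):
        """Fetch one page of results and return it already transformed."""
        if start < 1 or maximum < 1:
            raise ValueError("start and maximum must be positive")
        raw = self._fetch(query, start, maximum, sort)
        return self._transform(raw)


# Stand-ins so the sketch runs on its own; neither reflects the real
# XML Server output or the real style sheets.
def fake_fetch(query, start, maximum, sort):
    return f"<iii query='{query}' start='{start}' max='{maximum}' sort='{sort}'/>"


def fake_transform(raw):
    return f"<searchRetrieveResponse>{raw}</searchRetrieveResponse>"


search = InnopacXml(fake_fetch, fake_transform)
xml_results = search.get_records("Java programming", 1, 10, "relevance")
print(xml_results)
```

Passing the fetch and transform steps in as functions keeps the calling code unaware of the III format entirely, which is the point of the wrapper: the application only ever sees the transformed response.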

I think it would be great if we could flesh this whole idea out, making open source class libraries for C#, Java, PHP, etc., so that other libraries looking at the III XML server could easily pick this up and use it. And, who knows — maybe Innovative will simply fold all this back into the XML server itself, and make it SRU/SRW compliant to begin with.



One Response to “Converting the III XML Server into an SRU/SRW Server”

  • 1
    dwalker Says:

    Just got our III XML Server back up and running today, and started using the above style sheets.

    Boy is it fast!

    I was testing the XSLT previously against Plymouth State and Michigan State’s catalogs, but firewalls and network latency slowed it down a bit.

    When using it against our own catalog, it’s sub-second fast. The initial XSLT transformation of the III output to SRU / MARC XML or Dublin Core is not even noticeable.

    I think this really removes any problems I had in using the III XML Server. It would be nice if Innovative actually made the XML Server SRU and MARC-XML compliant from the beginning. But this work-around is so straightforward, and so clearly not a hack, that I’m as happy as can be with the whole purchase now.