How to create a Firefox Search Plugin – OpenWorldCat

Taking a break from talk of bigger ideas, here’s a simple thing you can do to add value and make it easier to search your offerings. I’m one of those people who prefer not to install a whole toolbar, though I may be prone to installing a search plugin or extension; I try to keep my UI free of clutter. I created a plugin for my library back in May since it was so easy, and since there has been some recent interest I thought I’d share how it’s done. If you can code a form in HTML, or even just read form code off a page, then you can create a plugin. For this example I will create one for OpenWorldCat through Google, since I currently find it a pain in the ass to search it otherwise.

First let’s take a look at the code you can use in a webpage to search OpenWorldCat via Google:


<form action="http://www.google.com/search" name="f">
<input maxlength="256" size="40" name="q" value="" />
<input type="hidden" name="ie" value="UTF-8" />
<input type="hidden" name="oe" value="UTF-8" />
<input type="hidden" name="hl" value="en" />
<input type="hidden" name="domains" value="worldcatlibraries.org" />
<input type="hidden" name="sitesearch" value="worldcatlibraries.org" />
<input type="submit" value="Search Google" name="btnG" />
</form>

I took most of this code, I believe, from Google and the OCLC demo. Now that we know what form inputs are used, we can start to create the source file for Firefox. In this example I am going to name it worldcat.src, though you can name it what you wish. Here’s what the final source will look like; I’ll talk about the various parts afterwards:


# Mozilla+ search plugin for Google Open Worldcat Keyword Search (http://oclc.org)
# Author: Ryan Eby spam@spam.org
# Created: 28 December 2005
# Last Modified: 28 December 2005
#
# Country: US
# Language: en
# Category: 5 (Reference)
#
# Known issues: None.
#

<search
version="7.1"
name="OpenWorldCat - Google"
description="Search OpenWorldCat via Google"
action="http://www.google.com/search"
searchForm="http://www.google.com/search"
method="GET" >

<input name="q" user>

<input name="ie" value="UTF-8" />
<input name="oe" value="UTF-8" />
<input name="hl" value="en" />
<input name="domains" value="worldcatlibraries.org" />
<input name="sitesearch" value="worldcatlibraries.org" />

<interpret
# Dummy section added to prevent spurious links parsing
browserResultType="result"
resultListStart="</body>"
resultListEnd="</html>"

>
</search>

<BROWSER
update="http://url/to/worldcat.src"
updateIcon="http://url/to/worldcat.png"
updateCheckDays="10"
>

Header Information

The first thing you will notice is the header information. This should include information on who wrote it, when it was created, when it was last modified, etc. This information is actually more for developer use than anything and helps others if they decide to modify it.

Search Information

This is where we include information on how we are going to search. First we specify the version of Netscape that the plugin is tested against; the documentation suggests 7.1. We then choose a name and description and set up the actual form. As you can see, I used the same action that the original form has, and the same URL for searchForm since that will at least still take you to a Google search box. The rest of the inputs are the same as our form except that I removed the type="hidden". You’ll also note that the name="q" input has user added to the end. This is what tells Mozilla that this is the input the user will be typing into.
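
To make the mapping concrete, here’s a purely hypothetical example (the catalog URL and field names are made up; substitute whatever your own search form actually uses). Say your catalog’s form submits a text box named query and a hidden input named index with value keyword to http://catalog.example.edu/search. The plugin’s search section would simply mirror that:

<search
version="7.1"
name="Example Catalog"
description="Search the Example Catalog"
action="http://catalog.example.edu/search"
searchForm="http://catalog.example.edu/search"
method="GET" >

<input name="query" user>
<input name="index" value="keyword" />
</search>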

Interpret

You can do some advanced things with the result set that comes back. In this case I just want the normal results page, and I found the above works well (I’m no longer certain where I originally found it). You can find out more about the interpret options in the Mycroft documentation.
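
If you did want to parse the results page itself, the Mycroft docs describe attributes that mark where the result list and each individual hit begin and end. A hypothetical, untested block might look like the following; the start/end markers here are placeholders, not Google’s actual markup:

<interpret
browserResultType="result"
resultListStart="<!-- begin results -->"
resultListEnd="<!-- end results -->"
resultItemStart="<p class=result>"
resultItemEnd="</p>"
>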

Browser

As you can probably tell this is the area where you specify where your src file and your graphic are located. It also specifies how often it should check for an update. Depending on how often you are tweaking the code you may wish to change this value.

How to Offer It

Now that you have a src file you probably want to offer it for installation. You may also want to provide an image; it should be named the same as the source file, be 16×16 pixels, and be a jpg, gif or png. Upload both of these files to a directory on a webserver, then include the following code in an HTML page, replacing the URLs with the correct paths to your files.


<script type="text/javascript">
<!--
function errorMsg()
{
alert("Netscape 6 or Mozilla is needed to install a sherlock plugin");
}
function addEngine(name,ext,cat,type)
{

// only browsers that expose window.sidebar.addSearchEngine can install the plugin
if ((typeof window.sidebar == "object") && (typeof window.sidebar.addSearchEngine == "function"))
{
//cat="Web";
//cat=prompt('In what category should this engine be installed?','Web')
window.sidebar.addSearchEngine(
"http://localhost.localdomain/plugins directory/"+name+".src",
"http://localhost.localdomain/plugins directory/"+name+"."+ext, name, cat );
}
else
{
errorMsg();
}
}

//-->
</script>

The code above will allow you to link to as many plugins as you want. You do so by adding the following to your html:

<a href="javascript:addEngine('worldcat','png','Reference',0)">OpenWorldCat Plugin</a>

As you can probably tell, the information you pass to addEngine is the name you used for your files, the extension you used on your image and the category of the plugin. In this case I chose Reference. More information on installation can be found at the Mycroft site.

You should end up with something like this. Of course the page I created is bare. Give it a whirl and comment if you have problems.


Personalized Search

In reading The Search, one of the topics that comes up is the goal of the perfect search. To achieve it, the thinking goes, you need personalized search, so that queries can be put into context and the likely “intent” of the searcher inferred. A person who searches for coffee a lot and a person whose searches are mostly computer related probably mean different things by “java”. Right now the searcher is required to add that context to the query if they want good results, but the goal is to move most of this to the backend.

Since I use Google a lot I went ahead and turned on their personalized search. While this is just the beginning, and doesn’t do the above, the interface is quite interesting and may give some ideas of what could be done in the OPAC. I’m not a huge fan of the whole saved-records set-up, email, etc. that some OPACs and online databases have, but at least it’s something. I don’t remember when I turned on the search in Google but it appears to have been sometime in December.

Personalized Search: Trends

Google Personal Search Trends

As you can see above, Google breaks down my searches by date, time and frequency. You can click on the dates as well and see the top searches for that month, day or hour. While this looks like it would be more cool factor than useful, I’ve already found it worthwhile. I had a search the other day where I went through many pages of results because I just couldn’t find the right keyword. I eventually found a page that was worthwhile, though I forgot to bookmark it. No problem, as Google now tells me which results I clicked on.

Personal Search: Bookmarks

Google Personal Search Bookmarks

If the trends didn’t catch your fancy, you may be interested in the bookmark feature. When you click on searches you can see your history and then “star” things you want to save. You can also apply labels to them. Those with Gmail will already be familiar with this interface. I can only presume that bookmarks will be integrated with Gmail sometime in the future.

One more note: it also tells me what my failed searches were. An interesting tidbit of data and something I may use in the future to help me refine searches.

Uses of the Data for OPACs

Setting aside the many privacy issues, what could this data be useful for if similar things were done in an OPAC environment? The first thing I can think of is suggestions. When you know which searches work and which ones fail, building smart suggestions for users becomes much more useful. Right now I’ve seen various implementations of suggested searches based on such things as whether there are results at all. That’s good in its own right, as it prevents suggestions that return nothing, but it would be even better if you could bump up suggestions that have a high success ratio. This is easier to measure if your library has holds or similar actions to count; if it doesn’t, it may be difficult to track “success” beyond whether a specific item was viewed.
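
As a rough sketch of that ranking idea (nothing OPAC-specific here; the “success” signal is whatever action your system can actually log: a hold placed, an item record viewed, and so on):

// Toy sketch: rank candidate search suggestions by how often searches
// for that term ended in a "success" (a hold placed, an item viewed, etc.).
function rankSuggestions(searchLog) {
  // searchLog is an array of { term: "...", success: true/false } entries
  var stats = {};
  for (var i = 0; i < searchLog.length; i++) {
    var entry = searchLog[i];
    if (!stats[entry.term]) {
      stats[entry.term] = { total: 0, successes: 0 };
    }
    stats[entry.term].total++;
    if (entry.success) {
      stats[entry.term].successes++;
    }
  }
  var ranked = [];
  for (var term in stats) {
    ranked.push({ term: term, ratio: stats[term].successes / stats[term].total });
  }
  // highest success ratio first
  ranked.sort(function (a, b) { return b.ratio - a.ratio; });
  return ranked;
}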

Another possibility is a smart citation list. The ability to keep and tag citations, as well as the searches that found them, adds another powerful layer to research. Some of this is already possible using external services, but since most OPACs have some sort of “mark record” functionality it would be nice to have a more versatile system. Again, a nice API would be helpful here so that this information can be moved in and out of the system. The ability to organize research by project, go back to “good” searches if not enough information ended up being available, and see which searches came up empty would all help in this process.

There are probably a hundred other things that could be done on the server side that the patron never sees but these are just a few that can help on the public face of things. If you don’t mind your searches being logged to your account then I recommend giving it a whirl.


LiveSearch and Clustered Displays

I’ve written on my blog multiple times about “LiveSearch”, or dynamically loaded results. While heavy on the server, it provides an interesting feedback mechanism: you can instantly refine your search and possibly find things that might otherwise have been lost or just taken too much time to get to.

Thom Hickey has recently created some mockups of LiveSearch with OCLC records, including subject headings, etc. Here are the two so far:

What you should be able to see is an index to all the records that a large public library holds in WorldCat. We’ve extracted all the 5-word phrases from authors, titles, statement of responsibility, and subject fields. It’s a bit of a trick to get the right phrase from the right manifestation from the right work to display. We get the speed by loading all the information into memory in several flat files, and generating the screens from those.
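
Just to illustrate the kind of phrase extraction Hickey describes (a toy sketch of my own, not OCLC’s code), pulling every consecutive five-word phrase out of a field is straightforward:

// Toy sketch: extract every consecutive 5-word phrase from a text field.
function fiveWordPhrases(field) {
  var words = field.split(/\s+/).filter(function (w) { return w.length > 0; });
  var phrases = [];
  for (var i = 0; i + 5 <= words.length; i++) {
    phrases.push(words.slice(i, i + 5).join(" "));
  }
  return phrases;
}

// fiveWordPhrases("the old man and the sea hemingway")
// => ["the old man and the", "old man and the sea", "man and the sea hemingway"]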

As cool as those are (I really like the clean results), what actually caught my attention in the post was a screenshot of a clustered LiveSearch.

OCLC Live Search Clustered

Casey recently posted here regarding clustered results and I think we may see much more of it in the future. With the increasing amount of information there are going to have to be more ways of drilling down and refining the content. I’m not just talking about the generic OPAC advanced search, but more informative things such as which subjects are included, editions, popularity, relationships, etc. Limiting to books or articles is nice, but hints and clues such as keywords and subjects add the touch some people need to really make things relevant.

As I’ve written here before though, you have to be careful not to make things overly complicated. The results screen should probably avoid looking like a space shuttle cockpit. What do you think should be included in a results screen?




Can you be trusted with Library 2.0?

If you haven’t been following along, there has been an open discussion lately regarding ILS user rights and where OPACs are headed. In my recent post I wondered whether an open architecture might allow the faster-paced changes that are needed in a technology- and information-centric world. Since then Talis has responded again to John Blyberg. I’m certain he will post a better reply since he has more first-hand experience, but I thought I’d address a few things.

First, I’ve read Talis’ whitepaper on Library 2.0 and have to say it has a lot of great ideas and states exactly what it should. In fact it says pretty much everything I would like to, but in language that is much easier to sell to executives. For example:

Rather than being hidden in catalogues with a single web interface, stored in proprietary databases only visible via a project’s web site, or accessible only to users of certain machines physically connected to particular networks, Library 2.0 resources should be more widely exposed. They should be available to the wider web, visible to search engines such as Google, and harvestable into new applications and services built by the library, and by third parties.

Some bits are a bit confusing, though, and I don’t think I fully understand what they are attempting to do. Their recent responses to John Blyberg’s posts make me even more wary. For example:

Looking at the issue from John’s end of the telescope it sounds so obvious and simple. Imagine looking at it from a support analyst’s point of view. From her end of the telescope she can see [in Talis’ case] potentially 100+ Johns hacking away on their systems every day – a thought to drive you straight toward the caffeine in the morning!

So the whitepaper gives the impression that libraries should be able to build their own services, while the responses suggest it would be a support nightmare for libraries to have access to the ILS data. Further reading of both the whitepaper and the various posts makes me think that their idea of Library 2.0 is an information stream and possibly even a central datastore. The image I get is WorldCat with an API: you purchase streams in and out and build your services off that. I think such a thing might work for some, but many would find giving up that level of control over their data and offerings hard to swallow.

I also disagree with the support argument: as I’ve pointed out in a recent post, if the architecture is done right so that “hacking” isn’t required, then support can actually become easier, as others have found. I also think the OPAC created by the 100+ Johns would likely be much better than any vendor-supplied solution. With the amount of resources and integration required by many these days, a single solution just doesn’t work. What happens when an academic library needs to integrate with software that only exists at that university? While it’s nice that Talis has plans to include things like RSS, IM and the like, how easy will it be for the library to add things that aren’t “hot technologies”?

It is a revamping of the whole architecture to get those nice to haves, and make it easy to add so much more.

It’s good to see the architecture is being thought about. However, I get the impression that the “easy to add” will be for vendors and their “approved partners”. The whitepaper had quite a few references to third-party partners and providers but didn’t often count libraries among them.

Any SELECT which causes a substantial slow down in the performance of your ILS is dangerous for the reputation of your Library. Are you sure that all the bespoke work you do against your database (eg RSS feeds) is scalable when used by the majority of your customers?

While I agree that such things could cause performance problems, having things more open could help a library scale as they want. I also think the reputation of the library is harmed when they have to answer patrons with “we’re waiting for the vendor”. Our university mail system recently had quite a few hiccups that caused some hate, and it went on for what many believed to be too long. The status message? “Waiting for patch from vendor”. Granted, some things require core updates, but others shouldn’t.

The message I’m starting to get from these conversations is that librarians can’t be trusted with Library 2.0: things will be different this time, don’t worry about it. I’ll be interested to see how it all turns out and how some of the open source ILS systems handle similar questions. What role should libraries have in their OPAC?




Hiding Complexity in Library 2.0

One of the benefits of a more open architecture for the OPAC, as discussed earlier, is that it allows you to have as simple or as complex an interface as you need. If your patrons don’t need a specific feature then you can turn it off. It also allows those with other requirements (such as privacy) to tweak it to meet their guidelines. Some libraries may have no problem storing data for personalized searches while others may have a strict privacy guideline forbidding such things. In general extensibility allows you to be more flexible.

Right now, however, it is a bit difficult to do such a task. You can tweak templates, but any large changes or integration often take lots of hacks and are not very elegant. The usability of such a system can also decrease as you try to hack on this feature or that. This matters as more and more libraries look into ways of making the OPAC easier to use and, more importantly, information easier to find. A recent quote caught my eye and it may be useful to keep in mind when discussing features for your OPAC and how to implement them.

“Google has the functionality of a really complicated Swiss Army knife, but the home page is our way of approaching it closed. It’s simple, it’s elegant, you can slip it in your pocket, but it’s got the great doodad when you need it. A lot of our competitors are like a Swiss Army knife open–and that can be intimidating and occasionally harmful.” (via Column Two)

Since there are likely some Google haters in the crowd, here’s a better takeaway from the article:

In a 2002 poll, the Consumer Electronics Association discovered that 87% of people said ease of use is the most important thing when it comes to new technologies.

In the ramp-up and discussion around “Library 2.0” the feature list keeps getting longer and the ideas ever more complex. Can you keep your OPAC from becoming a confusing mess? Can you even implement these features if you wanted to? When someone visits your OPAC are they eased into what’s available or hit head on by the thousand options they have? Can they find a book without knowing boolean constructs?

All of these questions are important, as people will demand these features but also demand that they not get in the way. Is there really a reason to have a button showing the MARC record front and center? Can you tell who the OPAC was designed for? With a good URL structure such things could be relegated to the background for advanced users.

In WordPress, if you put /feed/ at the end of any URL you get the feed for that item, be it author, category, item (comments) or site. With a good architecture an OPAC could do similar things: /marc/ showing the MARC record, /feed/ giving RSS/Atom, /xml/ giving MARC XML or another XML syntax, and so on. This lets you expose what you want (add links on the page) or just leave it there for those “in the know” (librarians who want MARC). This is another area where WordPress gives a good example of how it can be done. And it’s extensible, so you could have /dc/ return a Dublin Core version of the page if you wanted. There’s already a WordPress blog that’s COinS-PMH compliant; how long would it take to make your OPAC the same (server-side)?
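
Just as a sketch of that routing idea (the record paths and formats here are made up, not any particular OPAC’s URLs), the dispatch could be as simple as peeling the last path segment off and picking a representation:

// Toy sketch: map a record URL's trailing segment to an output format,
// e.g. /record/b1234567/marc/ for MARC, /record/b1234567/feed/ for Atom.
function pickRepresentation(path) {
  var formats = {
    "marc": "text/plain",            // raw MARC view for the librarians
    "xml": "application/xml",        // MARC XML or another XML syntax
    "feed": "application/atom+xml",  // RSS/Atom for the record
    "dc": "application/xml"          // Dublin Core, if you extend it
  };
  var parts = path.replace(/\/+$/, "").split("/");
  var last = parts[parts.length - 1];
  return formats[last] || "text/html"; // default: the normal record display
}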

Does your OPAC allow you the flexibility to integrate, add/remove features or otherwise change as patron demand changes? Shouldn’t it?




ILS Architecture: Open vs Turnkey

If you haven’t already read Blyberg’s ILS Bill of Rights I suggest you do so. Also check out the Talis reply and Blyberg’s response. Here’s a brief overview of the points, though the commentary on the posts is worth reading:

  1. Open, read-only, direct access to the database
  2. A full-blown, W3C standards-based API to all read-write functions
  3. The option to run the ILS on hardware of our choosing, on servers that we administer
  4. High security standards

Anyone who has been following the various OPAC hacks and problems knows why these are important. Currently many are forced either to mirror their entire dataset (intensive) or at least cache certain information so that queries are easier to run. The drawbacks of this approach are obvious, and having points 1 and 2 would help alleviate some of it (caching may still be important for performance). It will be interesting to see if any of the OSS ILS people respond to the points. I’ve been very impressed by the openness of Talis lately; they are certainly getting their name out if nothing else.

Since I told you all to subscribe to Blyberg’s blog I wasn’t going to post about this, but an unrelated post over at Photo Matt (of WordPress fame) got me thinking:

All that said, hard-core developers often need flexibility in the system to expand WordPress to things we’ve never even imagined, and that’s where our plugin system comes in. While we often say no to new options, we rarely ever shoot down a suggested extension to our plugin API. The beauty of this is it allows for near-infinite flexibility in how you interact with the program (there are some amazing plugins out there) while still keeping the core light, clean, stable, and fast. It also makes support relatively painless: “Does it work when you deactivate the plugin?”

Would it be better for ILS vendors to offer a simple turnkey system with options added as extensions? They could go as far as selling their own extensions (ones with support) while also allowing others to build on the system if they wish. Add RSS feeds yourself? Then it’s your support problem. Buy their RSS extension, and you get support. As Matt states, this helps with support issues while allowing near endless functionality.

With this architecture in mind it may be possible for vendors to have a solid lightweight core ILS while allowing those that need more to add the features they need (what their budget allows). They could also then work on enhancing the API hooks as people need them for future releases.

From the Talis reply it almost sounds like a turnkey vs. open dispute. I see that some libraries will want something they can just plug in and go, but that is far from ideal for the many that actually have the IT support to do more. Could plugin/extension hooks be a possible solution?

I should note that I’m more concerned with the OPAC side of things than the full ILS.




PHP5 Class for III XMLOPAC

Update: A new version of the class with more data and some bug fixes is available.

I guarantee I will be using this a lot, and it will actually let me try out some things in my free time that my coding skillz wouldn’t have allowed before. Created and released by John Blyberg.

The upshot is that I threw together a PHP5 class that’ll take a bib number and return a simple array of data from the bib record. It seems pretty portable so I decided to make it available. Even though it’s pretty basic now, I can easily see the scope of this class growing.

PHP5 class to interface with III’s XMLOPAC

Subscribe to his blog! Lots of good things there and more sure to come, especially if you use the III system. I’d feel bad reposting things here all the time.


BBC Archive Catalogue

In a recent post here I mentioned a thread on XML4Lib that discussed standards in libraries. It has evolved into a discussion of the various non-library standards and their use in the library catalog. If you haven’t been reading it, I highly recommend it. The thread has brought up many ideas, problems and debates about the various technologies. It hit home today when I saw the following:

Ever wondered what’s in that archive? Who looks after it? It turns out there’s a huge database that’s been carefully tended by a gang of crack BBC librarians for decades. Nearly a million programmes are catalogued, with descriptions, contributor details and annotations drawn from a wonderfully detailed controlled vocabulary.

Controlled vocabulary? Librarians? Catalogue? Sounds familiar. Are we looking at another generic OPAC? Not really:

…has oodles of Ajax, and tags, and RDF, and FOAF, and Sparklines, and Microformats, and just about everything else we can fit in.

Now things are getting interesting. They’re quickly prototyping interfaces and it looks to be interesting (screenshots at the sites). I can’t wait to see what it looks like once they get it to beta quality. I’ll be really interested in the use of FOAF for contributors and what that might allow them to do. Obviously this is not the same as a general library collection, but it will be interesting to watch nonetheless, especially with the debate on XML4Lib over the utility of RDF.


COinS-PMH and Microformats

Update: A Firefox sidebar that you can use to try COinS-PMH out is now available at dchud’s blog.

I recently posted here regarding standards and libraries, specifically the need for lightweight APIs/formats for use in various projects. I also mentioned an article over at darcus blog regarding light vs complex, and there is even a bet that lightweight will win over heavyweight. While that can be debated, there is definitely a place for lightweight implementations.

An example would be COinS, or more specifically COinS-PMH. I have to take a second to admit that I’m playing catch-up here and probably don’t understand all the background and possibilities, but what I’ve seen I definitely like. There are days I feel like there’s too much history with libraries to ever catch up on, but hopefully I’m catching the important bits. For reference, here are the applicable specs/overviews:

An oversimplification is that COinS is a way of embedding OpenURLs within HTML, and COinS-PMH is a simplified OAI-PMH protocol/set-up that allows the easy extraction of metadata about an object marked up with COinS. In other words it turns an HTML page into a sort of simple repository. That may not sound like much, but the idea becomes more fruitful when it’s used at sites like CiteULike, journal pages, etc.
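
For the unfamiliar, a COinS citation is just a span with class Z3988 whose title attribute carries an OpenURL ContextObject in key/value form. A hand-rolled, abbreviated example for a book (the exact fields you include will vary) might look like:

<span class="Z3988"
      title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.btitle=The+Search&amp;rft.au=Battelle%2C+John&amp;rft.date=2005"></span>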

If you haven’t already looked into microformats, I recommend doing so. Microformats, in the simplest terms, are standardized markup: you mark things up with specific tags, class names, etc., and that allows people to write parsers to extract the information. I’ve also written about this before on my blog. The power of this becomes apparent as more people adopt it. To quote Mark Pilgrim (who is writing some of the parsers for microformats):

Imagine having your own private database of every person you’ve ever stumbled across online, and being able to download their vCards into your address book. And every event, which you can download into iCal/Sunbird/Outlook. Plus a list of all the Creative Commons-licensed content you’ve ever read, which you can repurpose — legally, according to the terms of the license. Now imagine searching such a database. And subscribing to your search results as a syndicated feed.
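
The vCards Pilgrim mentions come from hCard markup, which is nothing more exotic than agreed-upon class names on ordinary HTML. A tiny hand-made example (the person and URLs are invented):

<div class="vcard">
  <a class="url fn" href="http://example.org/~jdoe">Jane Doe</a>,
  <span class="org">Example University Libraries</span>,
  <a class="email" href="mailto:jdoe@example.org">jdoe@example.org</a>
</div>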

Let’s apply this idea to COinS-PMH. I’m doing some research and browsing sites like CiteULike, journals, my library’s OPAC, etc. I have a Greasemonkey script/bookmarklet that adds a button beside any citation that is marked up properly. When I hit the button it adds the citation to a datastore I set up (or someone else did) that I can then search or browse later to find what I really want. Since I have actual metadata about the objects, I can integrate links to ILL forms, EndNote, etc. While some of this is already possible (saving to EndNote, OpenURL buttons, etc.), this creates a standardized markup that lets me extend it in any way I see fit.
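
As a rough sketch of what that Greasemonkey script could do (my own toy example, not an existing script; saveCitation is a placeholder for whatever datastore you set up), it only has to find the COinS spans on the page and hang a button off each one:

// Toy Greasemonkey-style sketch: put a "Save citation" button beside every
// COinS span on the page. The span's title attribute holds the OpenURL data.
function saveCitation(contextObject) {
  alert("Would save: " + contextObject); // placeholder: POST to your own datastore here
}

function addButton(span) {
  var button = document.createElement("button");
  button.appendChild(document.createTextNode("Save citation"));
  button.onclick = function () { saveCitation(span.getAttribute("title")); };
  span.parentNode.insertBefore(button, span.nextSibling);
}

var spans = document.getElementsByTagName("span");
for (var i = 0; i < spans.length; i++) {
  if (spans[i].className.indexOf("Z3988") != -1) {
    addButton(spans[i]);
  }
}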

I see this having some potential and I look forward to seeing where it leads. Hopefully more scripts and examples become available. Until then you may wish to look at the following posts:

Also, if your library is in the OCLC OpenURL Resolver Registry you can get a bookmarklet or greasemonkey script for use with your library that detects COinS. I found one for our library’s ILL request form. Pretty spiffy.