July 2009 Archives

Why you should care about APIs

Chances are pretty good that you have not really given much thought to Application Programming Interfaces (or APIs) before.  And this is for good reason -- your field of study probably has little to do with computer technology.  But I am hoping you will spend a couple of minutes reading this post to see why APIs actually do matter to you and how they could make your life easier and more productive.

Today, we released Sente 5.7.9.  There were a number of small changes in this release, but the primary reason we had to get this out is that Google changed the structure of the HTML in their Scholar pages, and these changes broke our reference detector code (the code that puts the little target icons next to each reference).  So, for the last couple of days, Sente users have been unable to get references from Google Scholar pages.

Sente 5.7.9 fixes this problem, but the underlying problem is the lack of a stable API to Google Scholar (and to many other resources).

So, what exactly is an API?  It is an interface for use not by people, but by programs.  A user interface, or UI, is designed to be used by a person. An API is designed to be used by other programs.

An example might help.  If you use Apple's Mail program and you have a MobileMe account, you can read your email either through MobileMe's web-based user interface or in Mail.  The Mail application uses several APIs at MobileMe to get and send mail (e.g., SMTP, IMAP, POP).  If Mail could not use these APIs, it would be forced to try to extract mail messages from the web interface, which would be very difficult, slow and error-prone.  And every time someone at MobileMe tweaked the way the web page looked, the Mail application would have to be modified to handle the changes.

This is where we are with respect to Google Scholar and hundreds of other data sources.  Our targeted browsing code looks at the structure of each web page that it gets and tries to find references embedded within the page.  This feature can be very useful, but it is not nearly as useful as it could be if each site supported some standard APIs for searching and retrieving reference data.  And every time one of the supported sites changes its HTML, we have to modify Sente to handle the change.
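To make the fragility concrete, here is a minimal sketch in Python.  It is not Sente's detector code: the HTML snippets, the class names and the JSON response are all made up for illustration.  It shows a scraper that depends on one particular page layout, and what happens when that layout changes while the data stays the same.

```python
import json
from html.parser import HTMLParser


class TitleScraper(HTMLParser):
    """Collects the text inside every <tag class="css_class"> element."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.inside = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and dict(attrs).get("class") == self.css_class:
            self.inside = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.titles[-1] += data


def scrape_titles(html):
    # The scraper is hard-wired to one hypothetical layout: <h3 class="title">.
    scraper = TitleScraper("h3", "title")
    scraper.feed(html)
    return [title.strip() for title in scraper.titles]


def titles_from_api(payload):
    # A hypothetical API response: the field names are part of a contract,
    # so they stay put even when the web page is redesigned.
    return [ref["title"] for ref in json.loads(payload)["references"]]


# Hypothetical pages: same reference, before and after a redesign.
OLD_PAGE = '<div><h3 class="title">On Reference Managers</h3></div>'
NEW_PAGE = '<div><h4 class="result-heading">On Reference Managers</h4></div>'
API_RESPONSE = '{"references": [{"title": "On Reference Managers"}]}'

print(scrape_titles(OLD_PAGE))        # ['On Reference Managers']
print(scrape_titles(NEW_PAGE))        # [] -- the redesign broke the scraper
print(titles_from_api(API_RESPONSE))  # ['On Reference Managers']
```

Nothing about the reference changed between the two pages, only the markup around it.  That is the kind of change that forces a new release of a scraper, and that an API is meant to shield programs from.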

Contrast this with PubMed.  PubMed has supported a well-defined, relatively stable API for accessing reference data (and other types of data) for many years now.  This means that Sente can do more things, more quickly and more reliably, with PubMed than we can with most other data sources.  And our users in the bio-medical sciences benefit directly from this.
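For the curious, here is roughly what talking to PubMed looks like.  This is a sketch in Python, not Sente's code.  It assumes the search endpoint and parameter names of NCBI's public E-utilities service (`esearch.fcgi` with `db`, `term`, `retmax` and `retmode`), and the sample response is an abbreviated, hand-written stand-in with placeholder IDs rather than real output.  It makes no network request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Search endpoint of NCBI's E-utilities service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, max_results=20):
    """Build a PubMed search request; no HTML is involved at any point."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": max_results,
        "retmode": "xml",
    }
    return ESEARCH_URL + "?" + urlencode(params)


def parse_id_list(xml_text):
    """Pull the matching record IDs out of a search response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall("./IdList/Id")]


# Abbreviated, hand-written sample response; the IDs are placeholders.
SAMPLE_RESPONSE = """
<eSearchResult>
  <Count>2</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>11111111</Id>
    <Id>22222222</Id>
  </IdList>
</eSearchResult>
"""

print(build_search_url("reference management", max_results=5))
print(parse_id_list(SAMPLE_RESPONSE))  # ['11111111', '22222222']
```

The program asks for exactly the data it wants and gets it back in a documented structure, so a redesign of the PubMed web site does not affect it.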

People in the humanities often accuse developers of academic reference managers like Sente of caring more about biology than about French literature or medieval art.  This is not the case.  The problem is that the data sources in the humanities (and many other fields) are far, far behind the data sources in biology.

Instead of treating citation data (titles, author names, abstracts, publication details, etc.) as a little advertisement that will attract readers to their publication, many publishers in the humanities (and other fields) treat citation data as an asset that they can license to services like Thomson and EBSCOhost, who in turn sell access to institutions.  I think it is fair to say that the primary motivation for this way of doing things is not to provide the best support possible for academic research.  (Note that I am not talking about free access to the full text of all articles; just to the basic citation data about each reference.)

Now, you might be thinking that I have confused two issues -- APIs and the licensing of reference data -- and, to some extent, I have.  But they are related.  Google provides an API for regular Google searches.  But not for Google Scholar.  Why?  Because the academic publishers would not agree to it; they were afraid of losing their income from licensing their data to resellers.  Thus, we are reduced to trying to keep up with each little change in the HTML that Google makes to their Scholar search results page.

So, back to the original question: why should you care about APIs?  Because the lack of support for open, stable APIs to basic reference data in many fields is holding back everyone in those fields.  We will continue to find ways to make it easier to search the literature in all fields of academic study, but we would be more successful if the publishers actually wanted this to work.

Many of you have, or will have, significant influence on one or more of the academic publishers in your fields.  You may be on the editorial board for a journal.  You decide which journals you submit your manuscripts to.  I think it is time for everyone to use whatever influence they have to push publishers in all academic fields to follow the lead of publishers in the bio-medical sciences.  It is in everyone's best interest (even the publishers') that there be ready access, through both user and programming interfaces, to all of the basic citation data in all academic fields.

Sente 6: Synchronized Libraries, Part Two

In my previous post on synchronized libraries I talked about the needs that we were trying to address in our design of this new feature.  In this post, I thought I would let you see synchronized libraries in action.  I will be demonstrating synchronized libraries using a pre-release version of Sente 6.

The following video demonstrates the use of synchronized libraries on one computer.  The mechanism is exactly the same when the copies are on different computers, but this was the best way to show the updates as they happen.


In the real world, performance will be a bit slower than in the video because you will be further from the servers than we are, but updates will still propagate in just a few seconds.  The biggest difference will be seen when first synchronizing a large library and when propagating PDFs, which are much larger than typical reference updates; these will take quite a bit longer than in the video.  But other than when you first create a synchronized library, the volume of data going back and forth will be relatively modest, so you should still be pleased with the performance.

Synchronized libraries are working in Sente 6, Preview 3, which was recently distributed to our small group of testers.  We will be making one or more public previews available in the coming weeks, so everyone will be able to kick the tires shortly.

Please let us know what you think -- we want to make sure that our design for synchronized libraries works well for as many people as possible.

Michael

Sente 6: Synchronized Libraries, Part One

The ability to maintain identical, synchronized copies of a single library on multiple computers is one of the most important features in Sente 6.  This is the first of two posts discussing this new feature.

Most people care about synchronization because they work on two or more computers and they want all their data on each at all times.  Additions and edits on any computer should be automatically reflected on all others.  PDFs obtained on one computer should be automatically available on all others.  Ideally, one should not have to think about the synchronization process at all, and it should be possible to make changes on any copy of the library at any time. Furthermore, it should be possible to make changes even when one's computer is off-line and changes should be propagated the next time the computer is on-line.

Synchronized libraries in Sente 6 will do this.
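For readers who like to see the idea in code, here is a toy model in Python of the behavior described above: edits made while off-line are queued and then propagated the next time the computer is on-line.  This is emphatically not how Sente 6 is implemented.  The `Hub` and `Replica` classes are hypothetical names invented for this sketch, and conflicts are settled by a naive "last change wins" rule.

```python
class Hub:
    """Hypothetical stand-in for a sync server: an ordered log of changes."""

    def __init__(self):
        self.log = []


class Replica:
    """One copy of a library, modeled as a dict of reference id -> title."""

    def __init__(self, hub):
        self.hub = hub
        self.data = {}
        self.pending = []  # local edits not yet sent to the hub
        self.seen = 0      # how many hub log entries have been applied here
        self.online = True

    def edit(self, ref_id, title):
        # Edits always succeed locally, whether on-line or not.
        self.data[ref_id] = title
        self.pending.append((ref_id, title))
        if self.online:
            self.sync()

    def sync(self):
        # 1. Pull changes made elsewhere since the last sync.
        for ref_id, title in self.hub.log[self.seen:]:
            self.data[ref_id] = title
        # 2. Push queued local edits, re-applying them locally so that
        #    they win over older remote changes (naive last-change-wins).
        for ref_id, title in self.pending:
            self.data[ref_id] = title
        self.hub.log.extend(self.pending)
        self.pending.clear()
        self.seen = len(self.hub.log)


hub = Hub()
laptop, desktop = Replica(hub), Replica(hub)

laptop.online = False
laptop.edit("ref-1", "Draft title")   # queued: the laptop is off-line
desktop.edit("ref-2", "Other paper")  # sent to the hub immediately

laptop.online = True
laptop.sync()    # pulls ref-2, pushes the queued ref-1
desktop.sync()   # pulls ref-1

print(laptop.data == desktop.data)  # True
```

A real implementation has to deal with much more than this (attachments, simultaneous edits to the same reference, access restrictions), but the basic shape is the same: changes are recorded locally first and reconciled later.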

In addition, some users want to share their library with colleagues.  They might want to share their library with their entire lab or department.  Or maybe they are part of a distributed collaboration and everyone wants access to the same library.  These Sente users typically want to be able to give some other users limited (or full) edit capability, and read-only access to others. They may also need to be able to restrict access to attachments for some users, while granting full access to others. And, of course, they want this ability to scale to any number of synchronized copies of a library.

Synchronized libraries in Sente 6 will also do this.

One other benefit of synchronization in Sente 6 is that once it is turned on, one can restore a complete, up-to-date copy of the library, including attachments, on any computer at any time.  This means that some people will use synchronization even if they have only one computer, just to know that they will not lose their library, even if they lose their computer.

We think synchronized libraries in Sente 6 address all the major needs that we have heard from our users over the past few years. If you can think of something that we have forgotten, please let us know.

In my next post, I will say a little more about how synchronization works in Sente 6.

Sente 6 is currently being distributed only as a private preview. The official release is scheduled for September 2009. Anyone who purchases a Sente 5 license now will receive a free upgrade to Sente 6 when it is released.