Chances are pretty good that you have not really given much thought to Application Programming Interfaces (or APIs) before. And this is for good reason -- your field of study probably has little to do with computer technology. But I am hoping you will spend a couple of minutes reading this post to see why APIs actually do matter to you and how they could make your life easier and more productive.
Today, we released Sente 5.7.9. There were a number of small changes in this release, but the primary reason we had to get it out is that Google changed the structure of the HTML in its Scholar pages, and these changes broke our reference detector code (the code that puts the little target icons next to each reference). So, for the last couple of days, Sente users have been unable to get references from Google Scholar pages.
Sente 5.7.9 fixes the immediate breakage, but the underlying problem is the lack of a stable API to Google Scholar (and to many other resources).
So, what exactly is an API? It is an interface for use not by people, but by programs. A user interface, or UI, is designed to be used by a person. An API is designed to be used by other programs.
An example might help. If you use Apple's Mail program and you have a MobileMe account, you can read your email either through MobileMe's web-based user interface or in Mail. The Mail application uses several APIs at MobileMe to get and send mail (e.g., SMTP, IMAP, POP). If Mail could not use these APIs, it would be forced to try to extract mail messages from the web interface, which would be very difficult, slow and error-prone. And every time someone at MobileMe tweaked the way the web page looked, the Mail application would have to be modified to handle the changes.
This is where we are with respect to Google Scholar and hundreds of other data sources. Our targeted browsing code looks at the structure of each web page that it gets and tries to find references embedded within the page. This feature can be very useful, but it is not nearly as useful as it could be if each site supported some standard APIs for searching and retrieving reference data. And every time one of the supported sites changes its HTML, we have to modify Sente to handle the change.
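To make the fragility concrete, here is a minimal Python sketch of page scraping. It is not Sente's actual detector code, and the markup, the `ref-title` class name, and the `scrape_titles` helper are all hypothetical, invented purely for illustration. It shows how a scraper tied to one page layout silently finds nothing once the site renames a tag and a class.

```python
from html.parser import HTMLParser


class TitleScraper(HTMLParser):
    """Collects the text found inside elements with a given tag and class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self._inside = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # Start capturing only when both the tag and the class match.
        if tag == self.tag and dict(attrs).get("class") == self.css_class:
            self._inside = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.titles[-1] += data


def scrape_titles(html):
    # Hypothetical layout: titles live in <h3 class="ref-title">.
    scraper = TitleScraper("h3", "ref-title")
    scraper.feed(html)
    scraper.close()
    return [title.strip() for title in scraper.titles]


# The page as the scraper was written to expect it (made-up markup).
OLD_PAGE = '<div><h3 class="ref-title">On Medieval Art</h3></div>'
# The same content after a purely cosmetic redesign (also made up).
NEW_PAGE = '<div><h4 class="result-heading">On Medieval Art</h4></div>'

old_titles = scrape_titles(OLD_PAGE)
new_titles = scrape_titles(NEW_PAGE)
print(old_titles)
print(new_titles)
```

The reference is still on the redesigned page, but the scraper no longer sees it, and nothing raises an error to say so. That is roughly the situation a site redesign puts us in.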
Contrast this with PubMed. PubMed has supported a well-defined, relatively stable API for accessing reference data (and other types of data) for many years now. This means that Sente can do more things, more quickly and more reliably, with PubMed than we can with most other data sources. And our users in the bio-medical sciences benefit directly from this.
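For comparison, here is a rough Python sketch of what talking to a documented interface can look like, using the ESearch endpoint of NCBI's public E-utilities. This is an illustration only, not how Sente itself is implemented. The helper names `build_search_url` and `extract_ids` are my own, and the sample response is a trimmed, made-up stand-in for the kind of JSON the service returns, so that the sketch runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, max_results=20):
    """Build a PubMed search request from named, documented parameters."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": max_results,
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


def extract_ids(response_text):
    """Read the list of matching record IDs out of a JSON search response."""
    payload = json.loads(response_text)
    return payload["esearchresult"]["idlist"]


# Illustrative response with invented IDs; a real response has more fields.
SAMPLE_RESPONSE = (
    '{"esearchresult": {"count": "2", "idlist": ["11111111", "22222222"]}}'
)

url = build_search_url("reference managers", max_results=5)
ids = extract_ids(SAMPLE_RESPONSE)
print(url)
print(ids)
```

Nothing here depends on how a web page happens to look. The request is built from named parameters and the answer arrives as structured data, so a visual redesign of the PubMed site does not affect code like this.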
People in the humanities often accuse developers of academic reference managers like Sente of caring more about biology than about French literature, or medieval art. This is not the case. The problem is that the data sources in the humanities (and many other fields) are far, far behind the data sources in biology.
Instead of treating citation data (titles, author names, abstracts, publication details, etc.) as a little advertisement that will attract readers to their publication, many publishers in the humanities (and other fields) treat citation data as an asset that they can license to services like Thomson and EBSCOhost, who in turn sell access to institutions. I think it is fair to say that the primary motivation for this way of doing things is not to provide the best support possible for academic research. (Note that I am not talking about free access to the full text of all articles; just to the basic citation data about each reference.)
Now, you might be thinking that I have confused two issues -- APIs and the licensing of reference data -- and, to some extent, I have. But they are related. Google provides an API for regular Google searches, but not for Google Scholar. Why? Because the academic publishers would not agree to it, fearing the loss of their income from licensing their data to resellers. Thus, we are reduced to trying to keep up with each little change that Google makes to the HTML of its Scholar search results page.
So, back to the original question: why should you care about APIs? Because the lack of support for open, stable APIs to basic reference data in many fields is holding back everyone in those fields. We will continue to find ways to make it easier to search the literature in all fields of academic study, but we would be more successful if the publishers actually wanted this to work.
Many of you have, or will have, significant influence on one or more of the academic publishers in your fields. You may be on the editorial board of a journal. You decide which journals you submit your manuscripts to. I think it is time for everyone to use whatever influence they have to push publishers in all academic fields to follow the lead of publishers in the bio-medical sciences. It is in everyone's best interest (even the publishers') that there be ready access, through both user and programming interfaces, to all of the basic citation data in all academic fields.