Improve OAI resumptionToken implementation
Assignee: Dan Gillean
Target version: Release 2.2.0
Currently, a call to list identifiers can be expressed in OAI like so:
With most actively used AtoM sites, there will be more than 100 results returned, so the response is truncated after the first 100 and includes a resumptionToken that a harvester can use to continue the request. Currently, this output looks like so:
It has been noted in the User Forum that manually copying the resumptionToken and inserting it into the URL for the subsequent request will fail. Instead, the user must manually encode special characters for use in a URL, like so:
Looking at the OAI standard, it notes the following:
Before including a resumptionToken in the URL of a subsequent request, a harvester must encode any special characters in it.
However, the Digital Library Federation's Best Practices for OAI Data Provider Implementations and Shareable Metadata notes that:
resumptionTokens in the response should not be URL encoded. This is different from an OAI request, in which resumptionTokens MUST be URL encoded. It is a best practice not to use characters in resumptionTokens that require URL encoding.
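Under the OAI rule quoted above, the encoding burden sits with the harvester. A minimal Python sketch of that step, assuming a hypothetical base URL (`https://example.org/oai`) and reusing one of AtoM's tokens quoted later in this thread, whose trailing `==` padding is exactly the kind of character that has to be percent-encoded:

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the real OAI base URL of your AtoM site.
BASE_URL = "https://example.org/oai"

def next_page_url(base_url: str, token: str) -> str:
    """Build the follow-up OAI-PMH request for a resumptionToken.

    urlencode() percent-encodes reserved characters in each value,
    so '=', '+' and '/' from a standard base64 token become %3D, %2B, %2F.
    """
    query = urlencode({"verb": "ListIdentifiers", "resumptionToken": token})
    return f"{base_url}?{query}"

# A token in AtoM's current format (taken from comment #3 below).
token = ("eyJmcm9tIjoiIiwidW50aWwiOiIiLCJjdXJzb3IiOjE2MDAsIm1ldGFkYXRhUHJlZml4"
         "Ijoib2FpX2RjIiwic2V0Ijoib2FpOnZpcnR1YWw6dG9wLWxldmVsLXJlY29yZHMifQ==")
print(next_page_url(BASE_URL, token))
```

This is the manual step forum users were doing by hand; the DLF recommendation is to avoid needing it in the first place.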
All other examples I have seen of resumption tokens differ from AtoM's current implementation. Because of this, we should improve the way AtoM generates resumption tokens so that URL encoding is not required.
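One possible approach, sketched here in Python and not necessarily what AtoM adopted, is to keep serializing the paging state as JSON but encode it with the URL-safe base64 alphabet from RFC 4648 (`-` and `_` instead of `+` and `/`) and strip the `=` padding. The function names are hypothetical; the state keys are the ones found in AtoM's current tokens (see comment #3). An opaque server-side key, like the DLF example below, would be another option.

```python
import base64
import json
from urllib.parse import quote

def encode_token(state: dict) -> str:
    """Serialize paging state into a token that needs no URL encoding.

    urlsafe_b64encode uses '-' and '_' instead of '+' and '/'; stripping
    the trailing '=' padding leaves only characters from [A-Za-z0-9_-].
    """
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

def decode_token(token: str) -> dict:
    """Reverse encode_token(), restoring the padding base64 expects."""
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

state = {"from": "", "until": "", "cursor": 1600,
         "metadataPrefix": "oai_dc", "set": "oai:virtual:top-level-records"}
token = encode_token(state)
assert quote(token, safe="") == token  # nothing left to percent-encode
assert decode_token(token) == state    # round-trips without loss
```

Because the padding is recomputed from the token length on decode, dropping it loses no information.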
Some examples from other sites:
Example listed in the DLF Best Practice guidelines, linked above:
<resumptionToken expirationDate="2005-07-26T16:57:24Z" completeListSize="31979" cursor="4">lr42e519f4d1e58</resumptionToken>
This resumptionToken indicates when it will expire, how many incomplete lists have been returned, and what the complete number of records is for the ListRecords request. As stated above it is a best practice to include both these attributes in a resumptionToken.
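For reference, a minimal Python sketch that builds an element with the same attributes as the DLF example, using only the standard library. The helper name is hypothetical. Note that the OAI-PMH 2.0 specification defines `cursor` as a count of list elements already returned, starting at 0.

```python
import xml.etree.ElementTree as ET

def resumption_token_element(token: str, expires: str,
                             total: int, cursor: int) -> ET.Element:
    """Build an OAI-PMH <resumptionToken> with its optional attributes."""
    el = ET.Element("resumptionToken", {
        "expirationDate": expires,       # UTC datetime, ISO 8601 with 'Z'
        "completeListSize": str(total),  # size of the complete result list
        "cursor": str(cursor),           # elements returned so far (from 0)
    })
    el.text = token                      # the opaque token itself
    return el

el = resumption_token_element("lr42e519f4d1e58", "2005-07-26T16:57:24Z", 31979, 4)
print(ET.tostring(el, encoding="unicode"))
```

The DLF token value is alphanumeric, so it can be pasted into a follow-up URL unchanged.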
Another DSpace example (pretty much the same as above): http://repositorio-tematico.up.pt/oaiextended/request?verb=ListRecords&metadataPrefix=oai_dc&set=rap
#2 Updated by Dan Gillean about 7 years ago
- Status changed from QA/Review to Feedback
- Target version set to Release 2.2.0
One thing I've noticed:
The first resumptionToken seems to work perfectly. However, the same resumption token is issued in the next request - meaning the harvester probably won't be able to page past the first 2 sets of results. Shouldn't it issue a different resumptionToken each time, so the harvester can continue on to the next batch of a truncated set?
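The failure mode described here (a server reissuing the same token) can be guarded against on the harvester side. A minimal Python sketch, assuming a hypothetical `fetch_page` callback that performs one request and returns that page's records plus the next token:

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical callback: takes a resumptionToken (None for the first request)
# and returns (records on this page, next resumptionToken or None/"").
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]

def harvest(fetch_page: FetchPage) -> Iterator[str]:
    """Follow resumptionTokens until the list is complete.

    Raises if the server hands back a token it already issued, which would
    otherwise make the harvester request the same page forever.
    """
    seen = set()
    token = None
    while True:
        records, token = fetch_page(token)
        yield from records
        if not token:  # an empty or absent token marks the last page
            return
        if token in seen:
            raise RuntimeError(f"resumptionToken repeated: {token}")
        seen.add(token)
```

For example, with an in-memory stand-in for the server, `list(harvest({None: (["a"], "t1"), "t1": (["b"], "")}.__getitem__))` yields `["a", "b"]`.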
#3 Updated by Mark Triggs about 7 years ago
It definitely should give a different resumption token on each page or it's just not going to work :) Are you sure they're exactly the same? I got tricked a couple of times because they're only different by one character. For example, here are two consecutive ones I saw:
eyJmcm9tIjoiIiwidW50aWwiOiIiLCJjdXJzb3IiOjE2MDAsIm1ldGFkYXRhUHJlZml4Ijoib2FpX2RjIiwic2V0Ijoib2FpOnZpcnR1YWw6dG9wLWxldmVsLXJlY29yZHMifQ==
eyJmcm9tIjoiIiwidW50aWwiOiIiLCJjdXJzb3IiOjE3MDAsIm1ldGFkYXRhUHJlZml4Ijoib2FpX2RjIiwic2V0Ijoib2FpOnZpcnR1YWw6dG9wLWxldmVsLXJlY29yZHMifQ==
                                           ^
At first blush these seem identical, except for the character I marked with a '^'. The nature of the tokens is that they mostly change by only one character from page to page (e.g. a cursor of 1600 becomes 1700), so the base64-encoded version usually changes by only one character as well. I wasted an embarrassingly long time "debugging" this ;)
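The two tokens above are standard base64-encoded JSON, so a short Python sketch can decode them and locate the differing character instead of comparing by eye:

```python
import base64
import json

# The two consecutive tokens quoted above.
tok_a = ("eyJmcm9tIjoiIiwidW50aWwiOiIiLCJjdXJzb3IiOjE2MDAsIm1ldGFkYXRhUHJlZml4"
         "Ijoib2FpX2RjIiwic2V0Ijoib2FpOnZpcnR1YWw6dG9wLWxldmVsLXJlY29yZHMifQ==")
tok_b = ("eyJmcm9tIjoiIiwidW50aWwiOiIiLCJjdXJzb3IiOjE3MDAsIm1ldGFkYXRhUHJlZml4"
         "Ijoib2FpX2RjIiwic2V0Ijoib2FpOnZpcnR1YWw6dG9wLWxldmVsLXJlY29yZHMifQ==")

def decode_token(token: str) -> dict:
    """Decode a standard-base64 token back into its JSON paging state."""
    return json.loads(base64.b64decode(token))

# Zero-based positions where the two tokens differ.
diff = [i for i, (x, y) in enumerate(zip(tok_a, tok_b)) if x != y]
print(diff)                                                          # → [43]
print(decode_token(tok_a)["cursor"], decode_token(tok_b)["cursor"])  # → 1600 1700
```

Index 43 is the 44th character, the one marked with '^'. Whether a one-byte change alters one or two base64 characters depends on where that byte falls in its 3-byte group; here the changed digit is the last byte of a group and only its low bits differ, so exactly one character changes.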
#4 Updated by Dan Gillean about 7 years ago
- Status changed from Feedback to Verified
You're totally right, Mark. I should have kept going with the tests before updating the ticket. It works. I've iterated through 4 consecutive pages and checked that each one returns records different from those previously returned, and that the cursor position increments in the returned header.
Thanks, this is great!
I'll update the new docs with this functional example, and remove the warning I had there about URL encoding etc.
#5 Updated by Dan Gillean about 7 years ago
Very simple docs update here, to correct the resumptionToken example and remove the warning: