Enhance import matching behaviors: add a digital object checksum column to CSV import/export templates and use it as the first match check when importing updates
|Assignee:|Dan Gillean|
|% Done:| |
|Category:|Import/Export|
|Estimated time:|12.00 hours|
|Target version:|Release 2.4.0|
|Google Code Legacy ID:| |
|Tested version:| |
Since release 2.0, AtoM has used each digital object's SHA256 checksum to generate its URL (e.g. http://example.com/uploads/r/my-repository/0/9/9/0956e36f27cdb64cbfee7a92a262a140e9794d66e7ad846a3dc6df0c9d52017e/my-image.jpg).
While this produces long URLs, it also gives us a convenient way to tell whether a digital object has been updated: if the checksum has changed, it is a new digital object.
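As a minimal sketch of that change check, here is a Python illustration (AtoM itself is written in PHP and computes its checksums internally). It hashes a file with SHA-256 and compares the result against a previously stored checksum. The function names are hypothetical and not part of AtoM.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so large masters do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(stored_checksum: str, path: str) -> bool:
    """True if the file at `path` no longer matches the stored checksum,
    i.e. it should be treated as a new digital object."""
    return sha256_of_file(path) != stored_checksum.strip().lower()
```

Because the digest depends only on the file's bytes, an identical re-upload yields the same checksum, while any edit to the file yields a different one.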
This feature makes a few changes. First, a new digital object checksum column will be included in CSV imports and exports, including the sample templates found in lib/task/import/example.
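A rough sketch of how an importer might read the new column, in Python for illustration. The header name `digitalObjectChecksum` is an assumption made for this example; the actual name is whatever the templates in lib/task/import/example define. The other columns in the sample data are illustrative. A blank cell is treated as "no checksum supplied".

```python
import csv
import io
from typing import List, Optional

# ASSUMPTION: hypothetical header name; check the real CSV templates.
CHECKSUM_COLUMN = "digitalObjectChecksum"


def read_checksum(row: dict) -> Optional[str]:
    """Return the normalized (stripped, lowercase) checksum from a CSV row,
    or None if the column is missing or empty."""
    value = (row.get(CHECKSUM_COLUMN) or "").strip().lower()
    return value or None


def parse_rows(csv_text: str) -> List[dict]:
    """Parse CSV text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


SAMPLE = (
    "legacyId,title,digitalObjectPath,digitalObjectChecksum\n"
    "1,First,/data/a.jpg,0956E36F27CDB64CBFEE7A92A262A140E9794D66E7AD846A3DC6DF0C9D52017E\n"
    "2,Second,/data/b.jpg,\n"
)
```

Normalizing case and whitespace on read means a checksum copied from an export and pasted back into a spreadsheet still compares equal to the stored value.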
Second, when an import updates existing descriptions, we will use this checksum to avoid downloading a digital object and regenerating its derivatives if it hasn't changed since the last import.
When no checksum value is included during import, or the checksum does not match the existing one, the digital object will be assumed to be new, and therefore part of the update. In this case, the original digital object will be removed, and the new path to the associated digital object will be used to create and attach a new digital object (including generating new derivatives, etc.).
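The keep-or-replace rule described above can be sketched as follows. This is a Python illustration under stated assumptions, not AtoM's implementation. `plan_digital_object_update` and `apply_plan` are hypothetical names. The `remove_existing` and `attach_new` callbacks stand in for whatever AtoM does to delete a digital object and to fetch, attach, and derive a new one.

```python
from enum import Enum
from typing import Callable, Optional


class DigitalObjectAction(Enum):
    KEEP = "keep"        # checksum unchanged: no download, no new derivatives
    REPLACE = "replace"  # checksum missing or different: remove and recreate


def plan_digital_object_update(existing_checksum: Optional[str],
                               incoming_checksum: Optional[str]) -> DigitalObjectAction:
    """Decide what an update import should do with a description's digital object."""
    # No checksum in the CSV row: assume the object is new (part of the update).
    if not incoming_checksum:
        return DigitalObjectAction.REPLACE
    # No existing object to compare against: attach the incoming one.
    if existing_checksum is None:
        return DigitalObjectAction.REPLACE
    if incoming_checksum.strip().lower() == existing_checksum.strip().lower():
        return DigitalObjectAction.KEEP
    return DigitalObjectAction.REPLACE


def apply_plan(action: DigitalObjectAction,
               new_path: str,
               remove_existing: Callable[[], None],
               attach_new: Callable[[str], None]) -> None:
    """Carry out the plan using caller-supplied (hypothetical) hooks.
    remove_existing must tolerate the case where there is nothing to remove."""
    if action is DigitalObjectAction.KEEP:
        return
    remove_existing()
    attach_new(new_path)  # would also regenerate derivatives
```

Keeping the decision separate from the side effects makes the matching rule easy to test on its own, and the expensive download and derivative steps only run on the `REPLACE` branch.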