Add ability for digital object load task to discard master DO after derivative creation
|Assignee:||Michelle Curran||% Done:|
|Category:||CLI tools||Estimated time:||20.00 hours|
|Target version:||Release 2.5.0|
|Google Code Legacy ID:||Tested version:|
Background and summary
The digital object load task is described in our documentation here:
The task expects a CSV as input, with 2 columns: a filepath, and either an information_object_id or an identifier.
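As a rough illustration of the input described above, the following Python sketch parses such a CSV. The header names `filepath`, `information_object_id` and `identifier` are taken from the wording of this ticket and are assumptions here (the documentation is the authority on the exact names). `read_load_rows` is a hypothetical helper, not part of AtoM.

```python
import csv
import io

# Assumed names for the second column, per the ticket text.
TARGET_COLUMNS = ("information_object_id", "identifier")


def read_load_rows(csv_text, path_column="filepath"):
    """Parse a two-column load CSV into a list of dicts (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    if path_column not in headers:
        raise ValueError("missing column: " + path_column)
    # Use whichever of the two target columns is present.
    target = next((c for c in TARGET_COLUMNS if c in headers), None)
    if target is None:
        raise ValueError("need one of: " + ", ".join(TARGET_COLUMNS))
    return [
        {
            "path": row[path_column].strip(),
            "target_column": target,
            "target": row[target].strip(),
        }
        for row in reader
    ]
```

Each returned row carries the path (or URL) to load and the value used to find the description it attaches to.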
Though undocumented, previous testing has shown that putting an HTTP URL in the filepath column makes the DO load task behave much like linking an external object by URL through the user interface. When external HTTP URIs are passed to AtoM as digital objects to load, AtoM generally will:
- Follow the URI to fetch the object
- Make a copy in a tmp directory
- Process the temp copy to generate the derivatives
- Discard the temp copy
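The four steps above can be sketched in Python as follows. This is only an illustration of the flow, not AtoM's actual code: `process_external_object` and the `make_derivatives` callback are hypothetical, and the `fetch` parameter exists only so the download step can be swapped out.

```python
import os
import shutil
import tempfile
import urllib.request
from urllib.parse import urlparse


def process_external_object(uri, make_derivatives, fetch=urllib.request.urlopen):
    """Fetch a URI, build derivatives from a temp copy, then discard the copy."""
    tmp_dir = tempfile.mkdtemp(prefix="do_load_")
    name = os.path.basename(urlparse(uri).path) or "object"
    tmp_path = os.path.join(tmp_dir, name)
    try:
        # Steps 1 and 2: follow the URI and write a temporary copy.
        with fetch(uri) as response, open(tmp_path, "wb") as out:
            shutil.copyfileobj(response, out)
        # Step 3: generate derivatives from the temp copy (hypothetical callback).
        derivatives = make_derivatives(tmp_path)
    finally:
        # Step 4: discard the temp copy, even if derivative creation failed.
        shutil.rmtree(tmp_dir, ignore_errors=True)
    # Only the URI and the derivatives are kept, never the master file.
    return {"uri": uri, "derivatives": derivatives}
```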
Currently there is no way to replicate this behavior with local file paths, that is, to load an object from a local path without keeping the master digital object. Users who store master objects in a separate local repository might not want to maintain 2 copies of every digital object. This task will allow such users to load digital objects from a local file path without storing the master DO at the end of the process. Instead, we will store the filepath, just as we store the URI for externally linked objects.
- Add a new option to the DO load task (e.g. --link-source) that makes the task treat a local file like an external DO linked via URI (i.e. store the path to the source file in the database, and don't copy the source "master" file to the uploads directory)
- When used, local derivatives should still be generated and saved in the uploads directory as usual.
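A minimal Python sketch of the requested behavior, under stated assumptions: `load_local_object`, its `link_source` parameter (mirroring the `--link-source` name, which this ticket offers only as an example) and the `make_derivatives` callback are all hypothetical and do not correspond to existing AtoM code.

```python
import os
import shutil


def load_local_object(source_path, uploads_dir, make_derivatives, link_source=False):
    """Load a local file, optionally keeping only a reference to the master."""
    os.makedirs(uploads_dir, exist_ok=True)
    if link_source:
        # Proposed behavior: record the path, do not copy the master.
        master_ref = os.path.abspath(source_path)
    else:
        # Current behavior: copy the master into the uploads directory.
        master_ref = shutil.copy2(source_path, uploads_dir)
    # Derivatives are written to the uploads directory in both cases.
    derivatives = make_derivatives(source_path, uploads_dir)
    return {"master": master_ref, "linked": link_source, "derivatives": derivatives}
```

With `link_source=True` the uploads directory ends up holding only the derivatives, while the returned record still points at the original file.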