Feature #13034

Overhaul digitalobject:load task and clarify behavior

Added by Dan Gillean over 2 years ago. Updated over 2 years ago.

Status: New
Start date: 05/16/2019
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Sponsored: No
Tested version: 2.5

Description

The digital object load task is described in our documentation here:

There is some logic that will determine different outcomes based on the input, and on what is found in AtoM, here:

The task expects a CSV as input, with 2 columns: a filepath, and either an information_object_id or an identifier. It loads the entire CSV into memory so that it can check whether more than one row in the CSV relates to a single description. If so, the import is handled differently.
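As a rough illustration of the grouping step described above (not the task's actual code, which is not written in Python), the sketch below reads a whole CSV into memory and groups file paths by target description. The header names `filepath`, `information_object_id` and `identifier` follow the column descriptions in this ticket; the function name `group_rows_by_target` and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_target(csv_text):
    """Read every row and group file paths by the target description key.

    Hypothetical helper: the key is taken from information_object_id if
    that column is present and non-empty, otherwise from identifier.
    """
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        key = row.get("information_object_id") or row.get("identifier")
        groups[key].append(row["filepath"])
    return dict(groups)


# Made-up sample data: two rows point at F1, one row points at F2.
sample = "filepath,identifier\n/a/1.jpg,F1\n/a/2.jpg,F1\n/a/3.jpg,F2\n"
groups = group_rows_by_target(sample)

# Targets with more than one row are the ones handled differently.
multi_row_targets = [key for key, paths in groups.items() if len(paths) > 1]
```

Here `multi_row_targets` holds the descriptions that would fall under the "multiple CSV rows" cases listed below.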

Because the outcome depends on 1) some of the task options, 2) the number of rows per information object (IO) in the spreadsheet, and 3) whether or not a digital object (DO) is already attached to the target IO, we can sometimes get inconsistent results. For example:

CASE 1a: Single CSV row; correct path; no existing DO on target IO

  • Attach image directly to target

CASE 1b: Single CSV row; incorrect path; no existing DO on target IO

  • Report issue and skip

CASE 1c: Single CSV row; correct path; existing DO on target IO

  • Report issue and skip

CASE 2a: multiple CSV rows; correct paths; no existing DO on target

  • Create stub child descriptions for all rows under target IO and attach directly to child stubs
  • Nothing is attached to the original target IO

CASE 2b: multiple CSV rows; incorrect paths; no existing DO on target

  • Skip and report in console
  • Nothing is attached to the original target IO and no child DOs are created (see Note 1 below)

CASE 2c: multiple CSV rows; correct paths; existing DO on target

  • Ignore parent
  • Create stub child descriptions for all rows under target IO and attach directly to child stubs
  • If child stubs already exist, they are ignored - i.e. if you run the task twice, you get duplicate stubs

Etc. There are more cases, but as you can see, the behavior differs from case to case and becomes unclear to the end user.
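The six cases listed above can be summarized as a small lookup table. This is only a restatement of the ticket's cases for readability, not the task's real logic; the names `OUTCOMES` and `expected_outcome` are hypothetical, and combinations the ticket does not list are reported as such rather than guessed.

```python
# Keys: (multiple_rows, paths_correct, existing_do_on_target).
# Values paraphrase the outcomes listed in this ticket.
OUTCOMES = {
    (False, True, False): "attach to target",                         # CASE 1a
    (False, False, False): "report and skip",                         # CASE 1b
    (False, True, True): "report and skip",                           # CASE 1c
    (True, True, False): "create child stubs and attach to them",     # CASE 2a
    (True, False, False): "report and skip",                          # CASE 2b
    (True, True, True): "ignore parent; create child stubs and attach to them",  # CASE 2c
}


def expected_outcome(multiple_rows, paths_correct, existing_do):
    """Look up the outcome for one combination of inputs."""
    key = (multiple_rows, paths_correct, existing_do)
    return OUTCOMES.get(key, "not described in this ticket")
```

For example, `expected_outcome(True, True, False)` corresponds to CASE 2a, while a combination such as a single row with an incorrect path and an existing DO is not covered by the list.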

Additionally, the code could be significantly optimized.

Wishlist enhancement

This wishlist ticket is to overhaul and improve the DO task - its code, its options, and its error handling. Things to improve include:

  • Add support for a slug column in the import CSV so users don't need SQL to determine the objectID
  • Add task options to better control error handling - e.g. skipping vs erroring out behavior
  • Optimize code
  • Overhaul task documentation
  • Consider logging options (so console log can be saved to an output source for later review)
  • ???
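To make the "skipping vs erroring out" idea concrete, here is a minimal sketch of how such an option might behave. Everything in it is an assumption for illustration: the `on_error` parameter, the `LoadError` class and the `check_paths` function do not exist in the task, and Python is used only as a neutral notation.

```python
import logging
import os

logger = logging.getLogger("digitalobject_load_sketch")


class LoadError(Exception):
    """Hypothetical error raised when the task should stop on a bad row."""


def check_paths(paths, on_error="skip", exists=os.path.isfile):
    """Return the usable paths; skip or abort on missing files.

    on_error="skip" logs a warning and continues with the next path.
    on_error="abort" raises LoadError at the first missing file.
    """
    if on_error not in ("skip", "abort"):
        raise ValueError("on_error must be 'skip' or 'abort'")
    usable = []
    for path in paths:
        if exists(path):
            usable.append(path)
        elif on_error == "abort":
            raise LoadError(f"File not found: {path}")
        else:
            logger.warning("Skipping missing file: %s", path)
    return usable
```

With Python's standard `logging` module, attaching a `logging.FileHandler` to the logger would send the same messages to a file for later review, which is one way the logging wishlist item could be approached.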

Note 1

  • As of 2019-05-16, this is not the current behavior. Right now, the stub IOs are created first, then the incorrect path leads to the file being skipped and the task moving on to the next row. We are currently changing this so that unused stub IOs will not be created if there is no corresponding DO to attach.

History

#1 Updated by Dan Gillean over 2 years ago

  • Description updated
