Feature #12020

Add command-line task for deaccession record CSV import

Added by Dan Gillean about 1 year ago. Updated 19 days ago.

Status:VerifiedStart date:03/06/2018
Priority:MediumDue date:
Assignee:-% Done:

100%

Category:CSV import
Target version:Release 2.5.0
Google Code Legacy ID: Tested version:2.5
Sponsored:Yes Requires documentation:No

Description

AtoM supports the ability to create de-accession records and associate them with an accession record. However, while an accession record CSV import option currently exists, there is no way to import deaccession records.

This feature will add a new command-line task to AtoM, giving system administrators the ability to import deaccessions data into AtoM. Multiple rows of data (aka multiple deaccession records) can target the same accession record. This current feature ticket does not include time to add support for this task to the user interface.

The expected CSV will have 7 columns, corresponding to various fields available in the Deaccession record template. These include:

  • accessionNumber: expects the accession number of an existing accession record in AtoM as input. If no match is found for an existing accession, the console will provide a warning, the row will be skipped, and the task will continue.
  • deaccessionNumber: an identifier for the deaccession. Free text, will support symbols and typographical marks such as dashes and slashes
  • date: expects a date value in ISO-8601 format (YYYY-MM-DD)
  • scope: expects one of the controlled terms from the "Scope" field in the AtoM deaccession record template. English options include "Whole" and "Part"
  • description: Free-text. Identify what materials are being deaccessioned.
  • extent: Free-text. Identify the number of units and the unit of measurement for the amount of records being deaccessioned.
  • reason: Free-text. Provide a reason why the records are being deaccessioned
  • culture*: Expects a 2-letter ISO 639-1 language code as input (e.g.: en, fr, es, pt, etc)

More than 1 row of data can be associated with the same accession record. To prevent accidental exact duplicates, by default AtoM will skip any rows where all data is identical to a row preceding it, and will report the skipped record in the console log. If you are intentionally importing duplicate deaccession records, you can use the --ignore-duplicates option.

If you wish a summary of warnings reported in the console log, you can use the --error-log option - it takes a path to a new text file as input, and will copy all console warnings to this log file. Acceptable file extensions for the log file are .txt or .log.

Basic syntax of the command:

php symfony csv:deaccession-import lib/task/import/example/example_deaccessions.csv

Full help text for the new CLI task:

php symfony help csv:deaccession-import

Usage:
 symfony csv:deaccession-import [--application[="..."]] [--env="..."] [--connection="..."] [--rows-until-update[="..."]] [--skip-rows[="..."]] [--error-log[="..."]] [--ignore-duplicates] filename
Arguments:
 filename             The input file (csv format).
Options:
 --application        The application name (default: qubit)
 --env                The environment (default: cli)
 --connection         The connection name (default: propel)
 --rows-until-update  Output total rows imported every n rows.
 --skip-rows          Skip n rows before importing.
 --error-log          File to log errors to.
 --ignore-duplicates  Load all records from csv, including duplicates.
Description:
 Import CSV deaccession data

An example of the CSV template has been attached to this issue for reference - you can also find a copy in AtoM, at:

lib/task/import/example/example_deaccessions.csv

example_deaccessions.csv Magnifier - Example AtoM Deaccession records CSV import template (277 Bytes) Dan Gillean, 03/06/2018 02:38 PM

deaccession-test.csv Magnifier - Test file, used in test described below in note-21 (602 Bytes) Dan Gillean, 06/19/2018 02:56 PM

Dan_test_CSV_file.png (166 KB) Steve Breker, 06/19/2018 04:10 PM

History

#2 Updated by Steve Breker about 1 year ago

Ready for CR.

https://github.com/artefactual/atom/pull/674

Will need this for PAS so will need to pick to stable/2.4.x once CR, and QA are complete.

#3 Updated by Steve Breker about 1 year ago

  • Status changed from New to Code Review
  • Assignee changed from Steve Breker to Nick Wilkinson

#4 Updated by Nick Wilkinson about 1 year ago

  • Assignee changed from Nick Wilkinson to Mike Cantelon

Hi Mike, can you please take a look for CR?

#5 Updated by Mike Cantelon about 1 year ago

  • Status changed from Code Review to Feedback
  • Assignee changed from Mike Cantelon to Steve Breker

Other than a couple of super minor nitpicks, looks good!

#6 Updated by Steve Breker about 1 year ago

  • Assignee changed from Steve Breker to Mike Cantelon

Suggested changes made.

#7 Updated by Mike Cantelon about 1 year ago

  • Assignee changed from Mike Cantelon to Steve Breker

Awesome... looks good to me!

#8 Updated by Steve Breker about 1 year ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from Steve Breker to Nick Wilkinson

Merged to qa/2.5.x. Ready for QA.

#9 Updated by Nick Wilkinson about 1 year ago

  • Assignee changed from Nick Wilkinson to Dan Gillean

Hi Dan, passing to you for QA.

#10 Updated by Dan Gillean about 1 year ago

  • Status changed from QA/Review to Feedback
  • Assignee changed from Dan Gillean to Steve Breker

Hi Steve,

One piece of feedback so far: we are missing what I'd consider to be a key field on the import template - the deaccession number. In the user interface, you can assign a specific deaccession number to a deaccession record. It looks like right now, on import the accessionNumber column is being used to both match against the parent Accession, and as the deaccession number. However, it would be better if these were 2 different fields.

I will keep testing trying to break some of the other fields/options to look for edge cases etc.

Sidenote: I think I've found a regression related to accession records in general in 2.5 - I will keep testing, and file a separate ticket for that.

#11 Updated by Dan Gillean about 1 year ago

Found an issue with i18n stuff. I'm not sure what's up with our fixtures and i18n, but they don't seem to be working as expected.

Looking here:

It appears that "Partie" is a fixture translation for "Part" in the Deaccession scope. When I go to an accession, flip the UI to French, and create a deaccession record however, I see "Part" and "Whole" displaying. I recall that there are some issues with fixtures and translations and upgrading, but this is the 2.5 vagrant box - I've never manually updated it.

If I try to include a deaccession record in the import CSV that has a culture value of "fr" I get an error that halts the import on the scope of my fr row, whether I add "Part" or "Partie" - so neither the default en term, nor the supplied fr term in the fixtures works. This means that it will be impossible to import a deaccession record with Scope data in any culture other than English.

#12 Updated by Dan Gillean about 1 year ago

I tried the following so I could get a log file with the error described above:

php symfony csv:deaccession-import --ignore-duplicates --error-log='/vagrant/deaccession-errors.txt' /vagrant/example_deaccessions.csv

I also tried it with double-quotes on the path to the new error log txt file. In both cases, I didn't end up with an error log. This suggests that if you get an error that aborts the import, you don't get an error log - which sort of defeats the purpose of the log, maybe? unless i'm doing something wrong?

#13 Updated by Dan Gillean about 1 year ago

Note, the related error I mentioned might be caused by the work Mike C is doing on the accessions counter - for now, until told otherwise, I've added a note about it there. For reference:

#14 Updated by Steve Breker about 1 year ago

  • Status changed from Feedback to Code Review
  • Assignee changed from Steve Breker to Nick Wilkinson

I have created a new PR containing a fix for the issue that Dan reports above in Note 10.

https://github.com/artefactual/atom/pull/680

This PR includes:

- new deaccession number column
- fix for one error message
- correct case on 'AND' clause in query.

The issues Dan reported in note 11 are not related to this change.

RE: Note 12: only messages that are the result of error handling issues in the import process will be logged to the file. These include:

"Skipping. Match not found for accession number: ". $accessionIdentifier;
"Skipping duplicate deaccession: ". $self->rowStatusVars['accessionNumber'];

#15 Updated by Dan Gillean about 1 year ago

Ok, the issue described in note-11 is likely due to #9541 - issues with fixtures and translations in upgraded environments. I was testing with a copy of the Demo Data sqldump loaded. If I purge and test on a clean instance, then the correct terms display. This also allowed my fr record to import correctly, and the log file to be generated.

Will retest when Steve's most recent update (to address the feedback in note-11) has been code reviewed and pushed.

#16 Updated by Nick Wilkinson about 1 year ago

  • Assignee changed from Nick Wilkinson to Mike Cantelon

Hi Mike, passing to you for CR.

#17 Updated by Mike Cantelon about 1 year ago

  • Status changed from Code Review to Feedback
  • Assignee changed from Mike Cantelon to Steve Breker

Looks good to me!

#18 Updated by Steve Breker about 1 year ago

Merged PR from Note 14 to qa/2.5.x.

Ready for QA.

#19 Updated by Steve Breker about 1 year ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from Steve Breker to Dan Gillean

#20 Updated by Dan Gillean 11 months ago

  • Description updated (diff)

Update description to include the deaccessionNumber column

#21 Updated by Dan Gillean 11 months ago

  • File deaccession-test.csvMagnifier added
  • Status changed from QA/Review to Feedback
  • Assignee changed from Dan Gillean to Steve Breker

I imported the attached test CSV, using the demo data as a base, and after first importing the example_accessions.csv file.

I imported it with the following command:

php symfony csv:deaccession-import --error-log='/vagrant/deaccession-errors1.txt' /vagrant/deaccession-test.csv

This time, though I had 2 deaccessions going to the same accession twice in the test file, I got no warning or skipped row - and still no error message. So... it seems like maybe the previous check about duplicate rows (which could be gotten around via the --ignore-duplicates flag, is no longer in place?

Otherwise, everything did in fact import as expected :)

#22 Updated by Steve Breker 11 months ago

So the issue is: what constitutes a dupe?

The code checks all of the following fields - if any do not match, it is not a dupe.

deaccessionNumber
date
scope
description
extent
reason
culture

Using this criteria, I don't see any dupes in your input file. Screen shot of your input file is attached. I based this matching logic off the comment at the top "by default AtoM will skip any rows where all data is identical to a row preceding it".

Does this matching criteria need to change?

In addition, for dupe checking, the importer is checking for a deaccession record matching the above criteria on any accession record in the system - dupe checking is not restricted to the accession that is implicated in the accessionNumber column, but perhaps it should. What do you think?

#23 Updated by Dan Gillean 11 months ago

  • Status changed from Feedback to QA/Review

Oh, okay - sorry, I think I misunderstood what we were calling a duplicate - I had assumed this meant more than one deaccession record to the same accession record.

I can't really think of why someone would want to import multiple deaccession records with the exact same deaccession number, but it could happen i guess, so this is a fine check. I'll re-test quickly with this in mind, and will update my test csv for that specific use case.

#24 Updated by Dan Gillean 11 months ago

  • Status changed from QA/Review to Verified
  • Assignee deleted (Dan Gillean)
  • % Done changed from 50 to 100
  • Tested version 2.5 added

Looks good! Thanks Steve.

#25 Updated by Steve Breker 11 months ago

picked to the PAS client repo as a custom.

Test loads on PAS server will verify that this is working there.

#26 Updated by Dan Gillean 19 days ago

  • Requires documentation changed from Yes to No

Also available in: Atom PDF