Add command-line task for deaccession record CSV import
|Target version:|Release 2.5.0|
|Tested version:|2.5|
AtoM supports creating deaccession records and associating them with an accession record. However, while a CSV import option exists for accession records, there is currently no way to import deaccession records.
This feature will add a new command-line task to AtoM, giving system administrators the ability to import deaccession data into AtoM. Multiple rows of data (i.e. multiple deaccession records) can target the same accession record. This feature ticket does not include time to add user interface support for this task.
The expected CSV will have eight columns, corresponding to fields available in the deaccession record template. These include:
- accessionNumber: expects the accession number of an existing accession record in AtoM as input. If no match is found for an existing accession, the console will print a warning, the row will be skipped, and the task will continue.
- deaccessionNumber: an identifier for the deaccession. Free text; supports symbols and typographical marks such as dashes and slashes.
- date: expects a date value in ISO 8601 format (YYYY-MM-DD).
- scope: expects one of the controlled terms from the "Scope" field in the AtoM deaccession record template. English options include "Whole" and "Part".
- description: free text. Identify what materials are being deaccessioned.
- extent: free text. Identify the number of units and the unit of measurement for the amount of records being deaccessioned.
- reason: free text. Provide a reason why the records are being deaccessioned.
- culture: expects a two-letter ISO 639-1 language code as input (e.g. en, fr, es, pt).
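To make the column layout concrete, here is a sketch that builds a one-row import CSV matching the template above. All data values are invented for illustration; only the column names come from the ticket.

```python
import csv
import io

# Column headers from the deaccession CSV template described above.
HEADERS = [
    "accessionNumber", "deaccessionNumber", "date", "scope",
    "description", "extent", "reason", "culture",
]

# One invented example row; accessionNumber must match an existing
# accession record in AtoM or the row will be skipped with a warning.
row = {
    "accessionNumber": "2019-001",
    "deaccessionNumber": "D2019-001-01",
    "date": "2019-03-15",
    "scope": "Part",
    "description": "Duplicate photographs removed from series 2",
    "extent": "1 box",
    "reason": "Duplicates of material already held",
    "culture": "en",
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=HEADERS)
writer.writeheader()
writer.writerow(row)
print(buf.getvalue())
```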
More than one row of data can be associated with the same accession record. To prevent accidental exact duplicates, by default AtoM will skip any row where all data is identical to a preceding row, and will report the skipped record in the console log. If you are intentionally importing duplicate deaccession records, you can use the --ignore-duplicates option.
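The default skip-exact-duplicates behaviour can be sketched as follows. This is a minimal illustration of the rule described above, not AtoM's actual code:

```python
def filter_duplicates(rows, ignore_duplicates=False):
    """Skip rows whose values all match an earlier row, unless
    ignore_duplicates is set (a sketch of the behaviour, not AtoM's code)."""
    seen = set()
    kept, skipped = [], []
    for row in rows:
        key = tuple(row)
        if not ignore_duplicates and key in seen:
            skipped.append(row)  # would be reported in the console log
            continue
        seen.add(key)
        kept.append(row)
    return kept, skipped

rows = [
    ["2019-001", "D-01", "2019-03-15", "Part"],
    ["2019-001", "D-01", "2019-03-15", "Part"],  # exact duplicate: skipped
    ["2019-001", "D-02", "2019-04-01", "Whole"],  # same accession, new data: kept
]
kept, skipped = filter_duplicates(rows)
print(len(kept), len(skipped))  # → 2 1
```

With ignore_duplicates=True, all three rows would be kept.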
If you want a summary of the warnings reported in the console log, you can use the --error-log option. It takes a path to a new text file as input and will copy all console warnings to this log file. Acceptable file extensions for the log file are .txt and .log.
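A simple sketch of the extension restriction on the --error-log path, assuming the check is a plain suffix comparison (the ticket only states which extensions are accepted, not how they are validated):

```python
import os

def validate_error_log_path(path):
    """Accept only .txt or .log extensions for the error log file,
    per the behaviour described above (illustrative sketch only)."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in (".txt", ".log"):
        raise ValueError("Unsupported error log extension: %s" % (ext or "(none)"))
    return path

print(validate_error_log_path("/vagrant/deaccession-errors.txt"))
```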
Basic syntax of the command:
php symfony csv:deaccession-import lib/task/import/example/example_deaccessions.csv
Full help text for the new CLI task:
php symfony help csv:deaccession-import

Usage:
 symfony csv:deaccession-import [--application[="..."]] [--env="..."] [--connection="..."] [--rows-until-update[="..."]] [--skip-rows[="..."]] [--error-log[="..."]] [--ignore-duplicates] filename

Arguments:
 filename             The input file (csv format).

Options:
 --application        The application name (default: qubit)
 --env                The environment (default: cli)
 --connection         The connection name (default: propel)
 --rows-until-update  Output total rows imported every n rows.
 --skip-rows          Skip n rows before importing.
 --error-log          File to log errors to.
 --ignore-duplicates  Load all records from csv, including duplicates.

Description:
 Import CSV deaccession data
An example of the CSV template has been attached to this issue for reference - you can also find a copy in AtoM, at:
#10 Updated by Dan Gillean about 2 years ago
- Status changed from QA/Review to Feedback
- Assignee changed from Dan Gillean to Steve Breker
One piece of feedback so far: we are missing what I'd consider to be a key field on the import template - the deaccession number. In the user interface, you can assign a specific deaccession number to a deaccession record. It looks like right now, on import, the accessionNumber column is being used both to match against the parent accession and as the deaccession number. It would be better if these were two different fields.
I will keep testing, trying to break some of the other fields/options to look for edge cases, etc.
Sidenote: I think I've found a regression related to accession records in general in 2.5 - I will keep testing, and file a separate ticket for that.
#11 Updated by Dan Gillean about 2 years ago
Found an issue with i18n stuff. I'm not sure what's up with our fixtures and i18n, but they don't seem to be working as expected.
It appears that "Partie" is a fixture translation for "Part" in the deaccession scope. However, when I go to an accession, flip the UI to French, and create a deaccession record, I see "Part" and "Whole" displaying. I recall that there are some issues with fixtures, translations, and upgrading, but this is the 2.5 vagrant box - I've never manually updated it.
If I try to include a deaccession record in the import CSV that has a culture value of "fr" I get an error that halts the import on the scope of my fr row, whether I add "Part" or "Partie" - so neither the default en term, nor the supplied fr term in the fixtures works. This means that it will be impossible to import a deaccession record with Scope data in any culture other than English.
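The matching behaviour Dan expected here can be sketched as a culture-aware lookup with a fallback to the default culture, so an "fr" row could supply either "Partie" or "Part". This is purely illustrative; the term table is hypothetical (only the "Partie"/"Part" pairing comes from the ticket) and it is not AtoM's actual implementation:

```python
# Hypothetical scope term table keyed by culture. Only the fr
# translation of "Part" is taken from the ticket; the rest is assumed.
SCOPE_TERMS = {
    "en": {"whole": "Whole", "part": "Part"},
    "fr": {"part": "Partie"},  # other fr translations omitted
}

def match_scope(value, culture, fallback_culture="en"):
    """Try the row's culture first, then fall back to the default culture.
    Returns the internal term key, or None if nothing matches."""
    for c in (culture, fallback_culture):
        for key, label in SCOPE_TERMS.get(c, {}).items():
            if label.lower() == value.lower():
                return key
    return None

print(match_scope("Partie", "fr"))  # matches via the fr translation
print(match_scope("Part", "fr"))    # matches via the en fallback
```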
#12 Updated by Dan Gillean about 2 years ago
I tried the following so I could get a log file with the error described above:
php symfony csv:deaccession-import --ignore-duplicates --error-log='/vagrant/deaccession-errors.txt' /vagrant/example_deaccessions.csv
I also tried it with double quotes on the path to the new error log text file. In both cases, I didn't end up with an error log. This suggests that if you get an error that aborts the import, you don't get an error log, which sort of defeats the purpose of the log, maybe? Unless I'm doing something wrong?
#13 Updated by Dan Gillean about 2 years ago
Note, the related error I mentioned might be caused by the work Mike C is doing on the accessions counter - for now, until told otherwise, I've added a note about it there. For reference:
#14 Updated by Steve Breker about 2 years ago
- Status changed from Feedback to Code Review
- Assignee changed from Steve Breker to Nick Wilkinson
I have created a new PR containing a fix for the issue that Dan reports above in Note 10.
This PR includes:
- a new deaccession number column
- a fix for one error message
- corrected case on an 'AND' clause in a query
The issues Dan reported in note 11 are not related to this change.
RE: Note 12: only messages that are the result of error handling issues in the import process will be logged to the file. These include:
"Skipping. Match not found for accession number: ". $accessionIdentifier;
"Skipping duplicate deaccession: ". $self->rowStatusVars['accessionNumber'];
#15 Updated by Dan Gillean about 2 years ago
Ok, the issue described in note-11 is likely due to #9541 - issues with fixtures and translations in upgraded environments. I was testing with a copy of the Demo Data sqldump loaded. If I purge and test on a clean instance, then the correct terms display. This also allowed my fr record to import correctly, and the log file to be generated.
Will retest when Steve's most recent update (to address the feedback in note-11) has been code reviewed and pushed.
#21 Updated by Dan Gillean almost 2 years ago
- File deaccession-test.csv added
- Status changed from QA/Review to Feedback
- Assignee changed from Dan Gillean to Steve Breker
I imported the attached test CSV, using the demo data as a base, and after first importing the example_accessions.csv file.
I imported it with the following command:
php symfony csv:deaccession-import --error-log='/vagrant/deaccession-errors1.txt' /vagrant/deaccession-test.csv
This time, though my test file sent two deaccessions to the same accession twice, I got no warning or skipped row - and still no error message. So... it seems like the previous check for duplicate rows (which could be bypassed via the --ignore-duplicates flag) is no longer in place?
Otherwise, everything did in fact import as expected :)
#22 Updated by Steve Breker almost 2 years ago
- File Dan_test_CSV_file.png added
- Assignee changed from Steve Breker to Dan Gillean
So the issue is: what constitutes a dupe?
The code checks all of the following fields - if any do not match, it is not a dupe.
Using these criteria, I don't see any dupes in your input file. A screenshot of your input file is attached. I based this matching logic on the comment at the top: "by default AtoM will skip any rows where all data is identical to a row preceding it".
Does this matching criteria need to change?
In addition, for dupe checking, the importer checks for a deaccession record matching the above criteria on any accession record in the system - dupe checking is not restricted to the accession identified in the accessionNumber column, but perhaps it should be. What do you think?
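The two variants Steve describes (dupe checking across all accessions vs. restricted to the row's own accession) can be sketched as follows. Field names follow the CSV template; the function is illustrative, not AtoM's implementation:

```python
def is_duplicate(candidate, existing, same_accession_only=False):
    """Return True if `candidate` matches an already-imported deaccession
    on every template field. With same_accession_only, only records under
    the same accession are compared (the alternative raised above)."""
    fields = ["deaccessionNumber", "date", "scope",
              "description", "extent", "reason"]
    for record in existing:
        if same_accession_only and record["accessionNumber"] != candidate["accessionNumber"]:
            continue
        if all(record[f] == candidate[f] for f in fields):
            return True
    return False

existing = [{
    "accessionNumber": "2019-001", "deaccessionNumber": "D-01",
    "date": "2019-03-15", "scope": "Part", "description": "x",
    "extent": "1 box", "reason": "y",
}]
# Same field values, but under a different accession:
candidate = dict(existing[0], accessionNumber="2019-002")
print(is_duplicate(candidate, existing))                           # → True
print(is_duplicate(candidate, existing, same_accession_only=True))  # → False
```

Under the current (unrestricted) behaviour the candidate counts as a dupe; under the accession-scoped variant it would import.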
#23 Updated by Dan Gillean almost 2 years ago
- Status changed from Feedback to QA/Review
Oh, okay - sorry, I think I misunderstood what we were calling a duplicate - I had assumed this meant more than one deaccession record to the same accession record.
I can't really think of why someone would want to import multiple deaccession records with the exact same deaccession number, but it could happen, I guess, so this is a fine check. I'll re-test quickly with this in mind and will update my test CSV for that specific use case.
#26 Updated by Dan Gillean about 1 year ago
- Requires documentation changed from Yes to No
Documentation added to 2.5 in https://github.com/artefactual/atom-docs/commit/350678ee23762495dad9e8009f4278301d06d906
I have also updated the wiki to add a copy of the deaccession CSV template: