Bug #9459
Term duplication in CSV import
| Status: | Verified | Start date: | 02/19/2016 |
|---|---|---|---|
| Priority: | Medium | Due date: | |
| Assignee: | José Raddaoui Marín | % Done: | 0% |
| Category: | CSV import | | |
| Target version: | Release 2.3.0 | | |
| Google Code Legacy ID: | | Tested version: | 2.0.0, 2.0.1, 2.1, 2.1.1, 2.1.2, 2.2, 2.3 |
| Sponsored: | Yes | Requires documentation: | |
Description
We build a mapping of term IDs and names for some of the taxonomies, and this mapping is updated each time a new term is added. The problem is that the names loaded into the mapping are always the English ones (when the term has an 'en' name), while the terms added during the import are in the CSV row's culture:
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/task/import/csvImportTask.class.php#L141
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/QubitFlatfileImport.class.php#L1534
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/QubitFlatfileImport.class.php#L1476
During the import, AtoM tries to match each term name against the names in the mapping and, if none matches, creates a new term:
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/task/import/csvImportTask.class.php#L486
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/task/import/csvImportTask.class.php#L497
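The mismatch described above can be reproduced with a small, language-neutral model. This is a Python sketch, not AtoM's PHP code: `TermStore`, `build_mapping_pre_fix` and `get_or_create_pre_fix` are hypothetical names, and falling back to another culture's name when a term has no 'en' name is an assumption of the sketch.

```python
from itertools import count


class TermStore:
    """Hypothetical stand-in for one taxonomy's terms: term id -> {culture: name}."""

    def __init__(self):
        self._next_id = count(1)
        self.terms = {}

    def add(self, names):
        term_id = next(self._next_id)
        self.terms[term_id] = dict(names)
        return term_id


def build_mapping_pre_fix(store):
    """Map each term id to a single name: the English one when it exists.

    Falling back to another culture's name is an assumption of this sketch.
    """
    return {
        term_id: names.get("en", next(iter(names.values())))
        for term_id, names in store.terms.items()
    }


def get_or_create_pre_fix(store, mapping, culture, name):
    """Match the row's term name against the mapped names, ignoring culture."""
    for term_id, mapped_name in mapping.items():
        if mapped_name == name:
            return term_id
    # No match: create a new term in the row's culture and record it in the mapping.
    term_id = store.add({culture: name})
    mapping[term_id] = name
    return term_id


if __name__ == "__main__":
    store = TermStore()
    draft = store.add({"en": "Draft", "fr": "Ébauche"})
    mapping = build_mapping_pre_fix(store)

    # A French CSV row uses the existing French label, but the mapping only holds "Draft".
    term_id = get_or_create_pre_fix(store, mapping, "fr", "Ébauche")
    print(term_id == draft, len(store.terms))  # False 2
```

The French row misses the existing term because the mapping only holds its English name, so a second, French-only term is created.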
In other cases AtoM matches terms without consulting the mapping, but when the culture parameter is not passed (as in the following case), the same issue occurs:
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/task/import/csvImportTask.class.php#L1002
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/QubitFlatfileImport.class.php#L1295
For the level of description, however, AtoM uses the application's current culture, which avoids these duplicates as long as the import rows are in that same culture:
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/task/import/csvImportTask.class.php#L1068
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/model/QubitInformationObject.php#L1988
To fix this, we need to build the mapping with the term names in all cultures and change how terms are matched against it, and also change the other cases to use the current row's culture when creating or fetching a term.
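A minimal sketch of the proposed fix, again in Python rather than AtoM's PHP. It assumes the mapping can be indexed as `{culture: {name: term_id}}`; `TermStore`, `build_mapping` and `get_or_create` are hypothetical names, and matching names by exact string equality is an assumption (the real code may normalize case or whitespace).

```python
from collections import defaultdict
from itertools import count


class TermStore:
    """Hypothetical stand-in for one taxonomy's terms: term id -> {culture: name}."""

    def __init__(self):
        self._next_id = count(1)
        self.terms = {}

    def add(self, names):
        term_id = next(self._next_id)
        self.terms[term_id] = dict(names)
        return term_id


def build_mapping(store):
    """Index every term by culture and name: {culture: {name: term_id}}."""
    mapping = defaultdict(dict)
    for term_id, names in store.terms.items():
        for culture, name in names.items():
            mapping[culture][name] = term_id
    return mapping


def get_or_create(store, mapping, culture, name):
    """Look the name up only among names in the CSV row's culture."""
    term_id = mapping[culture].get(name)
    if term_id is None:
        # Genuinely new in this culture: create it and keep the mapping current.
        term_id = store.add({culture: name})
        mapping[culture][name] = term_id
    return term_id
```

Indexing by culture first means a row only matches names in its own culture, so an existing French label is reused instead of duplicated, while identical strings in two different cultures stay independent. The same idea applies to the code paths that fetch terms directly: pass the row's culture rather than omitting the culture parameter.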
History
#2 Updated by José Raddaoui Marín about 5 years ago
- Status changed from In progress to Code Review
- Assignee changed from José Raddaoui Marín to Nick Wilkinson
Ready for code review in PR 281
#3 Updated by José Raddaoui Marín about 5 years ago
Notes for testing:
I don't think the description is clear from a non-developer's point of view. You can find more information, along with a sample CSV file that reproduces the issue, in the related client ticket.
The issue affects all CSV import types: terms in a culture other than English always create a new term record, producing many duplicates. After this fix, AtoM will check for matches in the same culture as the CSV row.
Here is a list of the taxonomies affected that should be tested:
*IO CSV import:*
- QubitTaxonomy::DESCRIPTION_STATUS_ID
- QubitTaxonomy::PUBLICATION_STATUS_ID
- QubitTaxonomy::DESCRIPTION_DETAIL_LEVEL_ID
- QubitTaxonomy::NOTE_TYPE_ID
- QubitTaxonomy::RAD_NOTE_ID
- QubitTaxonomy::RAD_TITLE_NOTE_ID
- QubitTaxonomy::MATERIAL_TYPE_ID
- QubitTaxonomy::RIGHT_ACT_ID
- QubitTaxonomy::COPYRIGHT_STATUS_ID
- QubitTaxonomy::PHYSICAL_OBJECT_TYPE_ID
- QubitTaxonomy::EVENT_TYPE_ID
- QubitTaxonomy::PLACE_ID
*Accessions CSV import:*
- QubitTaxonomy::ACCESSION_ACQUISITION_TYPE_ID
- QubitTaxonomy::ACCESSION_RESOURCE_TYPE_ID
- QubitTaxonomy::ACCESSION_PROCESSING_STATUS_ID
- QubitTaxonomy::ACCESSION_PROCESSING_PRIORITY_ID
- QubitTaxonomy::EVENT_TYPE_ID
- QubitTaxonomy::PLACE_ID
*Actors CSV import:*
- QubitTaxonomy::NOTE_TYPE_ID
- QubitTaxonomy::ACTOR_ENTITY_TYPE_ID
- QubitTaxonomy::ACTOR_RELATION_TYPE_ID
- QubitTaxonomy::DESCRIPTION_STATUS_ID
- QubitTaxonomy::DESCRIPTION_DETAIL_LEVEL_ID
*Events CSV import:*
- QubitTaxonomy::EVENT_TYPE_ID
There are also a couple of places in AtoM where part of the modified code is used, so it would be nice if we could check that the following is working as before:
*Information objects EAD XML export:*
- QubitTaxonomy::RAD_NOTE_ID
- QubitTaxonomy::RAD_TITLE_NOTE_ID
- QubitTaxonomy::DACS_NOTE_ID
*Information objects CSV export:*
- QubitTaxonomy::NOTE_TYPE_ID
- QubitTaxonomy::RAD_NOTE_ID
- QubitTaxonomy::RAD_TITLE_NOTE_ID
- QubitTaxonomy::LEVEL_OF_DESCRIPTION_ID
- QubitTaxonomy::DESCRIPTION_DETAIL_LEVEL_ID
- QubitTaxonomy::DESCRIPTION_STATUS_ID
- QubitTaxonomy::EVENT_TYPE_ID
- QubitTaxonomy::PHYSICAL_OBJECT_TYPE_ID
#4 Updated by Nick Wilkinson about 5 years ago
- Assignee changed from Nick Wilkinson to Mike Cantelon
Hi Mike, assigning to you for Code Review.
#5 Updated by Mike Cantelon about 5 years ago
- Status changed from Code Review to Feedback
- Assignee changed from Mike Cantelon to José Raddaoui Marín
Looks good to me... nice work!
#6 Updated by José Raddaoui Marín about 5 years ago
- Status changed from Feedback to QA/Review
- Assignee changed from José Raddaoui Marín to Dan Gillean
Merged in qa/2.3.x
#7 Updated by José Raddaoui Marín about 5 years ago
Fixed a small regression in 0bd2114
#8 Updated by Sara Allain almost 5 years ago
- Assignee changed from Dan Gillean to José Raddaoui Marín
Tested this as per comment 3 above, with the following results for imports:
IO CSV import:
All imported as expected.
Accessions CSV import:
All imported as expected.
Actors CSV import:
All imported as expected.
Events CSV import:
Import unsuccessful. Filed #9649 to investigate and marked it as a blocker for this issue in the meantime, since this can't be tested until the Events CSV import is working.
And the following results for exports:
Information objects EAD XML export:
All exported as expected.
Information objects CSV export:
French-language fields did not export. Fields with translations exported English results only. Taxonomy terms were exported in English. Filed #9663 to investigate.
#9 Updated by Sara Allain almost 5 years ago
- Blocked by Bug #9649: Events CSV imports will not show in user interface without further manual edits added
#10 Updated by Jesús García Crespo almost 5 years ago
- Status changed from QA/Review to Feedback
#11 Updated by José Raddaoui Marín almost 5 years ago
- Status changed from Feedback to Verified
It looks like we have created individual tickets for the feedback in update 8.
#12 Updated by Dan Gillean over 3 years ago
- Blocked by deleted (Bug #9649: Events CSV imports will not show in user interface without further manual edits)