Bug #9459

Term duplication in CSV import

Added by José Raddaoui Marín about 5 years ago. Updated almost 5 years ago.

Status: Verified
Start date: 02/19/2016
Priority: Medium
Due date:
Assignee: José Raddaoui Marín
% Done: 0%
Category: CSV import
Target version: Release 2.3.0
Google Code Legacy ID:
Tested version: 2.0.0, 2.0.1, 2.1, 2.1.1, 2.1.2, 2.2, 2.3
Sponsored: Yes
Requires documentation:

Description

We create a mapping of term ids and names for some of the taxonomies and update it each time a new term is added. The problem is that the terms loaded into that mapping always use their English name (if they have an 'en' name), while new terms are added under their name in the CSV row's culture:

https://github.com/artefactual/atom/blob/stable/2.2.x/lib/task/import/csvImportTask.class.php#L141
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/QubitFlatfileImport.class.php#L1534
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/QubitFlatfileImport.class.php#L1476

On import, AtoM tries to match the term name against the ones in the mapping and, if it doesn't match any of them, creates a new term:

https://github.com/artefactual/atom/blob/stable/2.2.x/lib/task/import/csvImportTask.class.php#L486
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/task/import/csvImportTask.class.php#L497
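
The failure mode can be illustrated with a minimal sketch, written in Python rather than AtoM's PHP. The data layout and the names `build_name_map` and `match_or_create` are hypothetical and do not mirror `csvImportTask.class.php`. The only behaviour taken from the description is that the mapping prefers the English name while lookups use the name in the row's culture.

```python
# Hypothetical term store: id -> {culture: name}. This is not AtoM's schema.
terms = {
    1: {"en": "Creation", "fr": "Création"},
    2: {"en": "Accumulation", "fr": "Accumulation"},
}


def build_name_map(terms):
    """Map one name per term to its id, preferring the 'en' name."""
    name_map = {}
    for term_id, names in terms.items():
        # Fall back to any available name when there is no English one.
        name = names.get("en") or next(iter(names.values()))
        name_map[name] = term_id
    return name_map


def match_or_create(terms, name_map, name, culture):
    """Return the id for `name`, creating a new term when nothing matches."""
    if name in name_map:
        return name_map[name]
    new_id = max(terms) + 1
    terms[new_id] = {culture: name}
    # The map is updated with the name in the row's culture.
    name_map[name] = new_id
    return new_id


name_map = build_name_map(terms)
# A French row misses term 1, because only "Creation" is in the map.
duplicate_id = match_or_create(terms, name_map, "Création", "fr")
```

In this sketch the French row ends up with a second record for a term that already has a French translation. A term whose French and English names are spelled identically still matches, which is why the duplication only shows up for some values.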

In other cases AtoM tries to match the terms without looking at that mapping, but when the culture parameter isn't passed (as in the following case), the same issue occurs:

https://github.com/artefactual/atom/blob/stable/2.2.x/lib/task/import/csvImportTask.class.php#L1002
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/QubitFlatfileImport.class.php#L1295

For the level of description, however, AtoM uses the current culture of the application, which avoids those duplicates as long as the import rows are in that same culture:

https://github.com/artefactual/atom/blob/stable/2.2.x/lib/task/import/csvImportTask.class.php#L1068
https://github.com/artefactual/atom/blob/stable/2.2.x/lib/model/QubitInformationObject.php#L1988

To fix this we need to build the mapping with the term names in all cultures and change how terms are checked against it. The other cases also need to use the current row's culture when creating or fetching a term.
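
One way to picture the proposed fix is a mapping keyed by culture and name together. This is a sketch in Python under stated assumptions, not the change made in AtoM's PHP code: `build_culture_map`, `fetch_or_create` and the in-memory term store are hypothetical, and matching here is exact and case-sensitive, which the real code may not be.

```python
def build_culture_map(terms):
    """Map (culture, name) pairs to term ids, covering every translation."""
    culture_map = {}
    for term_id, names in terms.items():
        for culture, name in names.items():
            culture_map[(culture, name)] = term_id
    return culture_map


def fetch_or_create(terms, culture_map, name, culture):
    """Look the term up in the row's culture; create it only on a miss."""
    key = (culture, name)
    if key in culture_map:
        return culture_map[key]
    new_id = max(terms, default=0) + 1
    terms[new_id] = {culture: name}
    culture_map[key] = new_id
    return new_id


# Hypothetical term store: id -> {culture: name}. This is not AtoM's schema.
terms = {1: {"en": "Creation", "fr": "Création"}}
culture_map = build_culture_map(terms)

# A French row now matches the existing term through its French name.
existing_id = fetch_or_create(terms, culture_map, "Création", "fr")
# A name with no match in that culture still creates a new term.
new_id = fetch_or_create(terms, culture_map, "Donation", "fr")
```

Keying on the pair rather than on the name alone also keeps apart two terms that share a spelling across cultures.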

History

#2 Updated by José Raddaoui Marín about 5 years ago

  • Status changed from In progress to Code Review
  • Assignee changed from José Raddaoui Marín to Nick Wilkinson

Ready for code review in PR 281

#3 Updated by José Raddaoui Marín about 5 years ago

Notes for testing:

I don't think the description is clear from a non-developer point of view. You can find more information in the related client ticket, along with a sample CSV file to reproduce the issue.

The issue affects all CSV import types: a term in a culture other than English always creates a new term record, which produces a lot of duplicates. After this fix, AtoM will check for matches in the same culture as the CSV row.

Here is a list of the taxonomies affected that should be tested:

*IO CSV import:*

QubitTaxonomy::DESCRIPTION_STATUS_ID
QubitTaxonomy::PUBLICATION_STATUS_ID
QubitTaxonomy::DESCRIPTION_DETAIL_LEVEL_ID
QubitTaxonomy::NOTE_TYPE_ID
QubitTaxonomy::RAD_NOTE_ID
QubitTaxonomy::RAD_TITLE_NOTE_ID
QubitTaxonomy::MATERIAL_TYPE_ID
QubitTaxonomy::RIGHT_ACT_ID
QubitTaxonomy::COPYRIGHT_STATUS_ID
QubitTaxonomy::PHYSICAL_OBJECT_TYPE_ID
QubitTaxonomy::EVENT_TYPE_ID
QubitTaxonomy::PLACE_ID

*Accessions CSV import:*

QubitTaxonomy::ACCESSION_ACQUISITION_TYPE_ID
QubitTaxonomy::ACCESSION_RESOURCE_TYPE_ID
QubitTaxonomy::ACCESSION_PROCESSING_STATUS_ID
QubitTaxonomy::ACCESSION_PROCESSING_PRIORITY_ID
QubitTaxonomy::EVENT_TYPE_ID
QubitTaxonomy::PLACE_ID

*Actors CSV import:*

QubitTaxonomy::NOTE_TYPE_ID
QubitTaxonomy::ACTOR_ENTITY_TYPE_ID
QubitTaxonomy::ACTOR_RELATION_TYPE_ID
QubitTaxonomy::DESCRIPTION_STATUS_ID
QubitTaxonomy::DESCRIPTION_DETAIL_LEVEL_ID

*Events CSV import:*

QubitTaxonomy::EVENT_TYPE_ID

There are also a couple of places in AtoM where part of the modified code is used, so it would be nice if we could check that the following is working as before:

*Information objects EAD XML export:*

QubitTaxonomy::RAD_NOTE_ID
QubitTaxonomy::RAD_TITLE_NOTE_ID
QubitTaxonomy::DACS_NOTE_ID

*Information objects CSV export:*

QubitTaxonomy::NOTE_TYPE_ID
QubitTaxonomy::RAD_NOTE_ID
QubitTaxonomy::RAD_TITLE_NOTE_ID
QubitTaxonomy::LEVEL_OF_DESCRIPTION_ID
QubitTaxonomy::DESCRIPTION_DETAIL_LEVEL_ID
QubitTaxonomy::DESCRIPTION_STATUS_ID
QubitTaxonomy::EVENT_TYPE_ID
QubitTaxonomy::PHYSICAL_OBJECT_TYPE_ID

#4 Updated by Nick Wilkinson about 5 years ago

  • Assignee changed from Nick Wilkinson to Mike Cantelon

Hi Mike, assigning to you for Code Review.

#5 Updated by Mike Cantelon about 5 years ago

  • Status changed from Code Review to Feedback
  • Assignee changed from Mike Cantelon to José Raddaoui Marín

Looks good to me... nice work!

#6 Updated by José Raddaoui Marín about 5 years ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from José Raddaoui Marín to Dan Gillean

Merged in qa/2.3.x

#7 Updated by José Raddaoui Marín about 5 years ago

Fixed a small regression in 0bd2114

#8 Updated by Sara Allain almost 5 years ago

  • Assignee changed from Dan Gillean to José Raddaoui Marín

Tested this as per comment 3 above, with the following results for imports:

IO CSV import:
All imported as expected.

Accessions CSV import:
All imported as expected.

Actors CSV import:
All imported as expected.

Events CSV import:
Import unsuccessful - filed #9649 to investigate; have marked it as a blocker for this issue in the meantime as this can't be tested until the Events CSV import is working.

And the following results for exports:

Information objects EAD XML export:
All exported as expected.

Information objects CSV export:
French-language fields did not export. Fields with translations exported English results only. Taxonomy terms were exported in English. Filed #9663 to investigate.

#9 Updated by Sara Allain almost 5 years ago

  • Blocked by added (Bug #9649: Events CSV imports will not show in user interface without further manual edits)

#10 Updated by Jesús García Crespo almost 5 years ago

  • Status changed from QA/Review to Feedback

#11 Updated by José Raddaoui Marín almost 5 years ago

  • Status changed from Feedback to Verified

It looks like we have created individual tickets for the feedback in update 8.

#12 Updated by Dan Gillean over 3 years ago

  • Blocked by deleted (Bug #9649: Events CSV imports will not show in user interface without further manual edits)
