Task #13352

Reduce resource save/indexing count on description's CSV import

Added by José Raddaoui Marín 4 months ago. Updated 3 months ago.

Status: Verified
Start date: 06/13/2020
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: CSV import
Target version: Release 2.6.0
Google Code Legacy ID:
Tested version: 2.5, 2.6
Sponsored: No
Requires documentation:

Description

With the MySQL general log enabled, run the CSV import task with the ISAD example over the demo data as follows:

php symfony csv:import --index --source-name=test lib/task/import/example/isad/example_information_objects_isad.csv

The following log is generated:

https://gist.github.com/jraddaoui/f287b75386ececc0a877c888f5d3abaf

Searching for ...

Query    SELECT
         io.*,
         obj.created_at,
         obj.updated_at,
         slug.slug,
         pubstat.status_id as publication_status_id,
         do.id as digital_object_id,
         do.media_type_id as media_type_id,
         do.usage_id as usage_id,
         do.name as filename
       FROM information_object io
       JOIN object obj
         ON io.id = obj.id
       JOIN slug slug
         ON io.id = slug.object_id
       JOIN status pubstat
         ON io.id = pubstat.object_id
       LEFT JOIN digital_object do
         ON io.id = do.object_id
       WHERE io.id = 

... finds 39 matches in the log, while the imported file contains only 2 descriptions. This query runs exactly once each time a description is indexed, in:

https://github.com/artefactual/atom/blob/qa/2.6.x/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObjectPdo.class.php#L136-L155

In this example, the first description is indexed 23 times and its child 16 times. Even when the --index flag is set, we should avoid indexing the resource until its final save and, if possible, reduce the number of saves too.

PS. It's even worse with the RAD example: 35 times for the top-level description and 28 for the children.
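The goal stated above (index a resource once, on its final save, even when --index is set) can be illustrated with a minimal Python sketch. `Resource`, `SearchIndex`, `import_row_naive`, `import_row_deferred` and the `FIELDS` names are hypothetical stand-ins, not AtoM's PHP classes or import code; the sketch only counts how many times each resource would be indexed under each strategy.

```python
# Hypothetical sketch: Resource, SearchIndex, the import_row_* functions and
# FIELDS are illustrative names, not AtoM's actual PHP classes or import code.

FIELDS = ("title", "level_of_description", "scope_and_content")


class SearchIndex:
    """Stand-in for the search index that counts indexing calls per resource."""

    def __init__(self):
        self.index_counts = {}

    def index(self, resource):
        self.index_counts[resource.id] = self.index_counts.get(resource.id, 0) + 1


class Resource:
    def __init__(self, resource_id, search_index):
        self.id = resource_id
        self.fields = {}
        self.search_index = search_index

    def save(self, index=True):
        # Writing to the database would happen here; omitted in this sketch.
        if index:
            self.search_index.index(self)


def import_row_naive(row, search_index):
    """Every intermediate save also re-indexes the resource."""
    resource = Resource(row["id"], search_index)
    for field in FIELDS:
        resource.fields[field] = row[field]
        resource.save(index=True)
    return resource


def import_row_deferred(row, search_index, index_enabled):
    """Intermediate saves skip indexing; only the final save indexes, if enabled."""
    resource = Resource(row["id"], search_index)
    last = len(FIELDS) - 1
    for position, field in enumerate(FIELDS):
        resource.fields[field] = row[field]
        resource.save(index=index_enabled and position == last)
    return resource
```

With two rows, the naive version indexes each description three times (once per save), while the deferred version indexes each one once, and not at all when indexing is disabled. Reducing the number of saves as well would mean setting all fields before a single save, which depends on how the real import builds related objects.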

History

#1 Updated by José Raddaoui Marín 4 months ago

  • Status changed from New to In progress
  • Assignee set to José Raddaoui Marín

#2 Updated by José Raddaoui Marín 4 months ago

  • Description updated (diff)

#3 Updated by José Raddaoui Marín 4 months ago

  • Status changed from In progress to Code Review
  • Target version set to Release 2.6.0

#4 Updated by José Raddaoui Marín 4 months ago

  • Status changed from Code Review to QA/Review
  • Assignee deleted (José Raddaoui Marín)

Merged in qa/2.6.x.

We should verify that resources are indexed when the CSV import is run with the --index option. Verifying that indexing happens only once per IO requires enabling and checking the ES or MySQL logs. These changes also affect the CSV import of other entities, so we should test the CSV imports for accessions, actors, IOs and repositories. To be tested alongside #13355 and #13354.
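To check the once-per-IO requirement against the MySQL general log, the occurrences of the indexing query quoted in the description can be counted per description ID. This is a minimal Python sketch, assuming the logged statement text includes the ID right after `WHERE io.id =`; `count_index_queries` is a hypothetical helper, not part of AtoM.

```python
import re
from collections import Counter

# Assumption: the indexing query from arElasticSearchInformationObjectPdo
# appears in the MySQL general log (possibly spread over several lines)
# with the description ID written after "WHERE io.id =".
INDEX_QUERY = re.compile(
    r"FROM information_object io\s+JOIN object obj.*?WHERE io\.id = (\d+)",
    re.DOTALL,
)


def count_index_queries(log_text):
    """Return a Counter mapping each description ID to how often it was indexed."""
    return Counter(int(match.group(1)) for match in INDEX_QUERY.finditer(log_text))
```

For example, `count_index_queries(open(log_path).read())`, where `log_path` points at the general log file, returns the count per ID; after the fix, each imported description should appear exactly once.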

#5 Updated by Steve Breker 4 months ago

INFO OBJS
---------
php symfony csv:import --index lib/task/import/example/isad/example_information_objects_isad.csv

IO indexing query count: 2 for entire example CSV
Searching for items successful - IO browse successful.

ACTORS
------
php symfony csv:authority-import --index lib/task/import/example/authority_records/example_authority_records.csv

Actor indexing query count: 2 per item in example CSV
(counting: do.media_type_id as media_type_id )
Searching for actors successful - Actor browse successful.

ACCESSIONS
----------
php symfony csv:accession-import --index lib/task/import/example/example_accessions.csv

Accession indexing query count: 1 per item in example CSV
(counting: JOIN slug slug ON acc.id = slug.object_id )

Searching for accessions successful in accessions search box - Accession browse successful.

REPOSITORIES
------------
Repo indexing query count: 2 per item in example CSV
(counting: SELECT * FROM repository_i18n WHERE id )

Searching for repositories successful - Repo browse successful.

#6 Updated by Dan Gillean 3 months ago

  • Status changed from QA/Review to Verified
