Bug #13581

ES index update jobs frequently run when not needed.

Added by Dan Gillean 6 months ago.

Status: New
Start date: 11/08/2021
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: Job scheduling
Target version: -
Google Code Legacy ID:
Tested version: 2.6, 2.7
Sponsored: No
Requires documentation: No

Description

In issue #13386 we began looking at ways to remove unnecessary related-entity fields from the AtoM search index. We hope this will reduce noise in the search results and give a minor performance boost (~10-15% faster) in reindexing operations.

However, that ticket describes only half of the current problem. Editing a record also frequently triggers an Elasticsearch update job that is not required.

For example, editing a repository or actor triggers a job to reindex all related descriptions. This job is often unnecessary, because only the authorized form of name (plus actor history, entity type, and dates) is displayed on related records. Similarly, editing a parent description triggers a job to update all descendants. If the edit was to a notes field or the scope and content statement, reindexing the descendants accomplishes nothing: only creator names, identifiers, and repository names are inherited at lower levels (dates of creation would also be worth updating, for date range calculation).

Ideally, we would limit the execution of this job so that it only runs when related fields are affected, and not every time a related record is edited.
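The check described above could be sketched roughly as follows. This is an illustration only, written in Python rather than AtoM's own PHP, and every name in it (`PROPAGATED_FIELDS`, `changed_fields`, `needs_related_reindex`, and all field keys) is hypothetical. The field lists are taken from the examples in this ticket and would need to be verified against what AtoM actually indexes on related records.

```python
from typing import Any, Dict, FrozenSet, Mapping, Set

# Hypothetical mapping: for each edited entity type, the fields whose values
# are copied into the search documents of related records. The lists mirror
# the examples in this ticket; real AtoM field names will differ.
PROPAGATED_FIELDS: Dict[str, FrozenSet[str]] = {
    "actor": frozenset(
        {"authorized_form_of_name", "history", "entity_type", "dates_of_existence"}
    ),
    "repository": frozenset({"authorized_form_of_name"}),
    "description": frozenset(
        {"identifier", "creators", "repository", "dates_of_creation"}
    ),
}


def changed_fields(before: Mapping[str, Any], after: Mapping[str, Any]) -> Set[str]:
    """Return the names of fields whose values differ between two snapshots."""
    keys = set(before) | set(after)
    return {key for key in keys if before.get(key) != after.get(key)}


def needs_related_reindex(
    entity_type: str, before: Mapping[str, Any], after: Mapping[str, Any]
) -> bool:
    """Decide whether saving this edit should queue a related-records update job."""
    propagated = PROPAGATED_FIELDS.get(entity_type)
    if propagated is None:
        # Unknown entity type: keep the current behaviour and always reindex.
        return True
    # Queue the job only if at least one propagated field actually changed.
    return bool(changed_fields(before, after) & propagated)
```

Under these assumptions, an edit that touches only the scope and content statement of a parent description would return `False` and queue no job, while a change to an actor's authorized form of name would return `True`. Falling back to `True` for unknown entity types is a deliberately conservative choice: it risks an unnecessary job rather than a stale index.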

To reproduce

Case 1

  • Find or create a description with multiple descendants
  • Enter edit mode on the parent description
  • Update the scope and content statement
  • Save
  • Navigate to the Jobs page

Case 2

  • Find or create a repository record linked to multiple descriptions
  • Update the Reproduction services field in the Services information area
  • Save
  • Navigate to the jobs page

Case 3

  • Find or create an authority record linked to multiple descriptions
  • Enter edit mode
  • Update the Legal status field in the Description information area
  • Save
  • Navigate to the jobs page

Resulting error

  • Search index update jobs are triggered for related entities even though they are not needed
  • On an active AtoM site with many users and many jobs running, critical updates can end up queued behind these unnecessary jobs, and overall system resource usage increases

Expected outcome

  • Search index update jobs are only triggered when a relevant field is updated in the related record.

Related issues

Related to Access to Memory (AtoM) - Feature #13386: Remove unnecessary data from Elasticsearch index (Verified, 06/21/2019)

History

#1 Updated by Dan Gillean 6 months ago

  • Related to Feature #13386: Remove unnecessary data from Elasticsearch index added
