Task #13273

Use Elasticsearch's "update by query API" to update related resources

Added by José Raddaoui Marín over 1 year ago. Updated over 1 year ago.

Status: New
Start date: 03/13/2020
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: Performance / scalability
Target version: -
Google Code Legacy ID:
Tested version:
Sponsored: No
Requires documentation:

Description

The update by query API was an experimental feature in Elasticsearch 2.x, but it is no longer marked as experimental in 5.x and later versions:

https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-update-by-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/docs-update-by-query.html

With some care in how the affected fields are indexed, this could be used to update all the related resources (descendants, IOs related to a repository, etc.) in one operation instead of updating them one by one.
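
As a rough illustration of the idea, the sketch below builds the JSON body for a `POST /<index>/_update_by_query` request that would set one or more top-level fields on every document matching a term query. It is not taken from the AtoM codebase: the helper name `build_update_by_query`, the `ancestors` and `publicationStatusId` field names, and the ids are assumptions made for the example. The `source` script key is the one used by 5.6 and later; earlier 5.x releases are believed to expect `inline` instead, so that should be checked against the target version.

```python
import json


def build_update_by_query(field, value, updates):
    """Build the JSON body for POST /<index>/_update_by_query.

    Hypothetical helper: `field` is the indexed field that links a document
    to the resource being changed, and `updates` maps top-level field names
    to their new values.
    """
    # One Painless assignment per updated field. Values are passed as params
    # so the script text does not change from one call to the next.
    assignments = [
        "ctx._source.{0} = params.{0}".format(name) for name in sorted(updates)
    ]
    return {
        # Select every document whose `field` contains `value`.
        "query": {"term": {field: value}},
        "script": {
            "lang": "painless",
            "source": "; ".join(assignments),
            "params": dict(updates),
        },
    }


# Assumed field names and ids, for illustration only.
body = build_update_by_query("ancestors", 4521, {"publicationStatusId": 160})
print(json.dumps(body, indent=2))
```

The body could then be sent with any HTTP or Elasticsearch client. Whether this works for a given case depends on the point made above about indexing: the linking field has to be indexed in a way that a term query can match it.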


Related issues

Related to Access to Memory (AtoM) - Feature #13096: Remove unnecessary repository and actor data from informa... Verified 06/21/2019
Related to Access to Memory (AtoM) - Bug #13274: Publication status update job can stall when trying to up... New 03/17/2020
Related to Access to Memory (AtoM) - Task #13252: Look into use of partial ES update in repository bulk upd... New 01/30/2020
Related to Access to Memory (AtoM) - Feature #13386: Remove unnecessary data from Elasticsearch index and redu... New 06/21/2019

History

#1 Updated by José Raddaoui Marín over 1 year ago

  • Related to Feature #13096: Remove unnecessary repository and actor data from information object Elasticsearch index added

#2 Updated by José Raddaoui Marín over 1 year ago

An example that could be used in the update publication status job:

https://gist.github.com/jraddaoui/ab41bd527248ec52bec34bb457a748c9

#3 Updated by Dan Gillean over 1 year ago

  • Related to Bug #13274: Publication status update job can stall when trying to update large hierarchies added

#4 Updated by José Raddaoui Marín over 1 year ago

  • Related to Task #13252: Look into use of partial ES update in repository bulk updating code added

#5 Updated by José Raddaoui Marín over 1 year ago

  • Description updated (diff)

#6 Updated by Dan Gillean 3 months ago

  • Related to Feature #13386: Remove unnecessary data from Elasticsearch index and reduce unnecessary re-index operations added
