Bug #10029

Elasticsearch document removals are not bulked

Added by Jesús García Crespo about 4 years ago. Updated almost 3 years ago.

Status: Verified
Start date: 06/15/2016
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: Search / Browse
Estimated time: 8.00 hours
Target version: Release 2.4.0
Google Code Legacy ID:
Tested version:
Sponsored: No
Requires documentation: No

Description

First, make sure that Elasticsearch bulk mode is enabled (it is enabled by default).

Write a script where you:
  1. Create an information object A
  2. Delete information object A

The object is properly deleted from the database, but the Elasticsearch document is not. A deletion is attempted, but it fails and the exception is ignored.

The root cause is that the index operation is put in the batch, which is not flushed until the script ends (when the interpreter invokes __destruct()). The deletion, however, is invoked directly, so it fails because it tries to delete a document that has not been indexed yet.
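The ordering problem can be reproduced without AtoM or Elasticsearch. The following is a minimal sketch, assuming a hypothetical FakeIndex class that stands in for the search plugin: adds are queued, deletes go straight to the index, and the failed delete is swallowed. None of these names come from the AtoM code base.

```php
<?php
// Hypothetical stand-in for the search index; not the AtoM implementation.
class FakeIndex
{
    public $docs = array();      // documents actually stored in the index
    private $batch = array();    // documents queued for a later bulk request

    // Indexing is deferred: the document only enters the batch.
    public function addDocument($id, array $body)
    {
        $this->batch[$id] = $body;
    }

    // Deletion goes straight to the index, bypassing the batch.
    public function delete($id)
    {
        if (!isset($this->docs[$id])) {
            throw new RuntimeException("Document $id not found");
        }
        unset($this->docs[$id]);
    }

    // Applies every queued document to the index.
    public function flushBatch()
    {
        foreach ($this->batch as $id => $body) {
            $this->docs[$id] = $body;
        }
        $this->batch = array();
    }
}

$index = new FakeIndex();
$index->addDocument(42, array('title' => 'Buddytown'));

try {
    $index->delete(42);          // fails: 42 is still only in the batch
} catch (RuntimeException $e) {
    // The exception is ignored, as described in the report.
}

$index->flushBatch();            // the queued add is applied afterwards

$orphaned = isset($index->docs[42]);
print $orphaned ? "Document 42 is orphaned in the index\n" : "Index is clean\n";
```

In this sketch the document ends up in the index even though the object was deleted, which matches the behaviour reported here.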

Temporary solutions:
  1. Disable bulk mode
  2. Flush the batch manually before deletion

Preferred solution: update arElasticSearchPlugin::delete() to bulk the delete operation. Our current bulk implementation is based on documents; we should use actions instead.
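One way to picture an action-based batch is sketched below. It is only an illustration under stated assumptions: the ActionBatchIndex class, its queue() helper and the batch size of 500 are hypothetical and are not taken from arElasticSearchPlugin. The point is that index and delete operations share one ordered queue, so a delete is always applied after the add that precedes it.

```php
<?php
// Hypothetical action-based batch; class name and batch size are assumptions.
class ActionBatchIndex
{
    public $docs = array();       // documents actually stored in the index
    private $actions = array();   // ordered queue of pending actions
    private $batchSize;

    public function __construct($batchSize = 500)
    {
        $this->batchSize = $batchSize;
    }

    public function addDocument($id, array $body)
    {
        $this->queue(array('op' => 'index', 'id' => $id, 'body' => $body));
    }

    public function delete($id)
    {
        $this->queue(array('op' => 'delete', 'id' => $id));
    }

    // Both operations pass through here, so either one can trigger a flush.
    private function queue(array $action)
    {
        $this->actions[] = $action;
        if (count($this->actions) >= $this->batchSize) {
            $this->flushBatch();
        }
    }

    // Applies the queued actions in the order they were issued.
    public function flushBatch()
    {
        foreach ($this->actions as $action) {
            if ('index' === $action['op']) {
                $this->docs[$action['id']] = $action['body'];
            } else {
                unset($this->docs[$action['id']]);
            }
        }
        $this->actions = array();
    }
}

$index = new ActionBatchIndex();
$index->addDocument(42, array('title' => 'Buddytown'));
$index->delete(42);
$index->flushBatch();

print count($index->docs) . " documents left in the index\n";
```

Because the size check lives in the shared queue() helper, queued deletes cannot sit waiting for enough adds to arrive before a flush happens.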

History

#1 Updated by Mike Cantelon over 3 years ago

  • Assignee set to Mike Cantelon

#2 Updated by Mike Cantelon over 3 years ago

  • Status changed from New to Code Review
  • Assignee changed from Mike Cantelon to Nick Wilkinson

#3 Updated by Nick Wilkinson over 3 years ago

  • Assignee changed from Nick Wilkinson to Steve Breker

Hi Steve, assigning to you for CR.

#4 Updated by Steve Breker over 3 years ago

  • Status changed from Code Review to Feedback
  • Assignee changed from Steve Breker to Mike Cantelon

Hi Mike

Looks good - just wondering if flushBatch() should be called from delete() as it is called from addDocument()?

#5 Updated by Mike Cantelon over 3 years ago

Yeah, that's a good point... the queued deletes could sit there indefinitely until enough adds are queued to trigger a flush. I'll fix that. :D

#6 Updated by Mike Cantelon over 3 years ago

  • Assignee changed from Mike Cantelon to Steve Breker

Okay, I've added flushing to delete.

#7 Updated by Steve Breker over 3 years ago

  • Assignee changed from Steve Breker to Mike Cantelon

CR complete. Looks great!

#8 Updated by Mike Cantelon over 3 years ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from Mike Cantelon to Dan Gillean

If you want to test this, Dan, here's how I do it.

1. php symfony search:populate --demo

2. Import/create a few sample records

3. php symfony search:populate

4. Make sure you have Elasticsearch Head installed on your Vagrant box (or something that'll let you list what QubitInformationObject documents exist in AtoM's Elasticsearch index)

5. Verify that QubitInformationObject documents exist corresponding to what you've imported/created

6. php symfony tools:run test.php # test.php should contain script below

<?php

$i = new QubitInformationObject;
$i->parentId = 1;
$i->title = 'Buddytown';
$i->setStatus(array('typeId' => QubitTerm::STATUS_TYPE_PUBLICATION_ID, 'statusId' => sfConfig::get('app_defaultPubStatus')));
$i->save();

print "Created ". $i->id ."...\n";

$i->delete();

print "Deleted.\n";

7. Note the ID of the information object that was created then deleted

8. Verify, using Elasticsearch Head or whatever, that the corresponding Elasticsearch QubitInformationObject document has also been deleted

#9 Updated by Dan Gillean about 3 years ago

  • Assignee deleted (Dan Gillean)

#10 Updated by Nick Wilkinson almost 3 years ago

  • Assignee set to José Raddaoui Marín

Hi Radda, further to the email I sent out, assigning this to you.

#11 Updated by José Raddaoui Marín almost 3 years ago

  • Status changed from QA/Review to Verified
  • Assignee deleted (José Raddaoui Marín)

#12 Updated by José Raddaoui Marín almost 3 years ago

  • Requires documentation set to No
