Bug #8490

search:populate task sometimes crashes with error "1"

Added by Mike Gale almost 7 years ago. Updated almost 7 years ago.

Status:VerifiedStart date:05/28/2015
Priority:MediumDue date:
Assignee:Mike Gale% Done:

0%

Category:CLI tools
Target version:Release 2.2.0
Google Code Legacy ID: Tested version:2.1, 2.1.1, 2.1.2, 2.2
Sponsored:No Requires documentation:

Description

To reproduce:
1. Run php symfony search:populate
2. Happen to get lucky and have a search:populate session where it happens
3. An exception will be thrown and the task exits. The only error message displayed is "1".

Expected result: For that not to happen...

This one is really tricky since 90% of the time I've seen search:populate run it doesn't happen. But once in a blue moon it does. I believe this issue is because we're telling ES to flush its documents when it's already flushing, so it sends back an exception 'already flushing' (which is safe to ignore, according to my googling on it).

I made a custom search:populate script for Hector and David H that has extra error logging, and they both produced a similar result:

Elastica\Exception\PartialShardFailureException
string(2211) "#0 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Request.php(168): Elastica\Transport\Http->exec(Object(Elastica\Request), Array)
#1 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Client.php(592): Elastica\Request->send()
#2 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Index.php(444): Elastica\Client->request('atom/_flush', 'POST', Array, Array)
#3 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Index.php(415): Elastica\Index->request('_flush', 'POST', Array, Array)
#4 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/arElasticSearchPlugin.class.php(333): Elastica\Index->flush()
#5 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(69): arElasticSearchPlugin->addDocument(Array, 'QubitInformatio...')
#6 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(81): arElasticSearchInformationObject->recursivelyAddInformationObjects('476455', '258441', Array)
#7 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(81): arElasticSearchInformationObject->recursivelyAddInformationObjects('470013', '258441', Array)
#8 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(42): arElasticSearchInformationObject->recursivelyAddInformationObjects(1, '258441')
#9 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/arElasticSearchPlugin.class.php(279): arElasticSearchInformationObject->populate()
#10 /usr/share/nginx/atom/lib/task/search/arPopulateTask.class.php(53): arElasticSearchPlugin->populate()
#11 /usr/share/nginx/atom/vendor/symfony/lib/task/sfBaseTask.class.php(68): arSearchPopulateTask->execute(Array, Array)
#12 /usr/share/nginx/atom/vendor/symfony/lib/task/sfTask.class.php(97): sfBaseTask->doRun(Object(sfCommandManager), NULL)
#13 /usr/share/nginx/atom/vendor/symfony/lib/command/sfSymfonyCommandApplication.class.php(76): sfTask->runFromCLI(Object(sfCommandManager), NULL)
#14 /usr/share/nginx/atom/vendor/symfony/lib/command/cli.php(20): sfSymfonyCommandApplication->run()
#15 /usr/share/nginx/atom/symfony(14): include('/usr/share/ngin...')
#16 {main}"

There's also a corresponding error in the elasticsearch.log:

[2015-05-27 20:44:46,508][DEBUG][action.admin.indices.flush] [Layla Miller] [atom_d2][0], node[3UtGvQsQQ1WRVpoJXZJoWA], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.flush.FlushRequest@55c1addd]
org.elasticsearch.index.engine.FlushNotAllowedEngineException: [atom_d2][0] already flushing...
at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:828)
at org.elasticsearch.index.shard.service.InternalIndexShard.flush(InternalIndexShard.java:627)
at org.elasticsearch.action.admin.indices.flush.TransportFlushAction.shardOperation(TransportFlushAction.java:110)
at org.elasticsearch.action.admin.indices.flush.TransportFlushAction.shardOperation(TransportFlushAction.java:50)
at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$1.run(TransportBroadcastOperationAction.java:171)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

flush-fix.patch Magnifier (992 Bytes) Mike Gale, 05/29/2015 12:56 AM

0001-Call-refresh-instead-of-flush-in-ES-8490.patch Magnifier (1.32 KB) Mike Gale, 06/25/2015 07:53 PM

History

#5 Updated by Mike Gale almost 7 years ago

A temporary solution I made for the sysadmins. It just eats any exception coming from index->flush() and ignores it.

#6 Updated by David Hume almost 7 years ago

Doubling batch_size from 500 to 1000 in config/search.yml did not seem to help, 3rd attempt that followed died about the same place as the 2nd attempt (first with Mike's custom search:populate script)

 
    [InformationObject] Third annual ball of the Vancouver Firemen's Benefit Association inserted (1938.8s) (107067/258441)
    [InformationObject] Fourth annual ball inserted (1938.82s) (107068/258441)
    [InformationObject] Fifth annual ball inserted (1938.83s) (107069/258441)
    [InformationObject] Fifth annual ball inserted (1938.85s) (107070/258441)
    [InformationObject] Souvenir of the Vancouver Fire Department inserted (1938.87s) (107071/258441)

root@cvan011:/usr/share/nginx/atom#
root@cvan011:/usr/share/nginx/atom#  cat /tmp/es_err.log

Elastica\Exception\PartialShardFailureException
string(2004) "#0 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Request.php(168): Elastica\Transport\Http->exec(Object(Elastica\Request), Array)
#1 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Client.php(592): Elastica\Request->send()
#2 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Index.php(444): Elastica\Client->request('atom/_flush', 'POST', Array, Array)
#3 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Index.php(415): Elastica\Index->request('_flush', 'POST', Array, Array)
#4 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/arElasticSearchPlugin.class.php(333): Elastica\Index->flush()
#5 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(69): arElasticSearchPlugin->addDocument(Array, 'QubitInformatio...')
#6 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(81): arElasticSearchInformationObject->recursivelyAddInformationObjects('1619759', '258441', Array)
#7 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(42): arElasticSearchInformationObject->recursivelyAddInformationObjects(1, '258441')
#8 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/arElasticSearchPlugin.class.php(279): arElasticSearchInformationObject->populate()
#9 /usr/share/nginx/atom/lib/task/search/arPopulateTask.class.php(53): arElasticSearchPlugin->populate()
#10 /usr/share/nginx/atom/vendor/symfony/lib/task/sfBaseTask.class.php(68): arSearchPopulateTask->execute(Array, Array)
#11 /usr/share/nginx/atom/vendor/symfony/lib/task/sfTask.class.php(97): sfBaseTask->doRun(Object(sfCommandManager), NULL)
#12 /usr/share/nginx/atom/vendor/symfony/lib/command/sfSymfonyCommandApplication.class.php(76): sfTask->runFromCLI(Object(sfCommandManager), NULL)
#13 /usr/share/nginx/atom/vendor/symfony/lib/command/cli.php(20): sfSymfonyCommandApplication->run()
#14 /usr/share/nginx/atom/symfony(14): include('/usr/share/ngin...')
#15 {main}" 

Will now put batch_size back and try Mike's flush-fix patch...

#7 Updated by David Hume almost 7 years ago

Successful completion of new base index with discussed modification
to plugins/arElasticSearchPlugin/lib/arElasticSearchPlugin.class.php

[InformationObject] TEST 2015-03-09 inserted (6045.05s) (258421/258441)
[InformationObject] TEST_AtoM_Index_2015-03-12 inserted (6045.19s) (258422/258441)
[InformationObject] TEST 2015-04-28 inserted (6045.31s) (258423/258441)
[InformationObject] Sectional and other maps inserted (6045.43s) (258424/258441)
Index populated with 276369 documents in 6045.44 seconds.

#8 Updated by Mike Gale almost 7 years ago

  • Priority changed from High to Medium

Setting to medium priority, as the sysadmins have a patch that fixes this issue in the mean time. Would like to discuss possible long term solutions with Jesús when he returns.

#9 Updated by Mike Gale almost 7 years ago

  • Status changed from In progress to Feedback
  • Assignee changed from Mike Gale to Jesús García Crespo
  • Target version changed from Release 2.1.2 to Release 2.2.0

#10 Updated by Mike Gale almost 7 years ago

  • Status changed from Feedback to In progress
  • Assignee changed from Jesús García Crespo to Mike Cantelon

#11 Updated by Mike Gale almost 7 years ago

  • Assignee changed from Mike Cantelon to Mike Gale

oops wrong mike

#12 Updated by Mike Gale almost 7 years ago

#13 Updated by Jesús García Crespo almost 7 years ago

  • Status changed from Code Review to Feedback
  • Assignee changed from Jesús García Crespo to Mike Gale

LGTM!

#14 Updated by Mike Gale almost 7 years ago

  • Status changed from Feedback to Verified

merged, I tested a search:populate and it seems fine

Also available in: Atom PDF