Bug #8490
search:populate task sometimes crashes with error "1"
Status: | Verified | Start date: | 05/28/2015 | |
---|---|---|---|---|
Priority: | Medium | Due date: | ||
Assignee: | Mike Gale | % Done: | 0% | |
Category: | CLI tools | |||
Target version: | Release 2.2.0 | |||
Google Code Legacy ID: | Tested version: | 2.1, 2.1.1, 2.1.2, 2.2 | ||
Sponsored: | No | Requires documentation: |
Description
To reproduce:
1. Run php symfony search:populate
2. Happen to get lucky and have a search:populate session where it happens
3. An exception will be thrown and the task exits. The only error message displayed is "1".
Expected result: For that not to happen...
This one is really tricky since 90% of the time I've seen search:populate run it doesn't happen. But once in a blue moon it does. I believe this issue is because we're telling ES to flush its documents when it's already flushing, so it sends back an exception 'already flushing' (which is safe to ignore, according to my googling on it).
I made a custom search:populate script for Hector and David H that has extra error logging, and they both produced a similar result:
Elastica\Exception\PartialShardFailureException
string(2211) "#0 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Request.php(168): Elastica\Transport\Http->exec(Object(Elastica\Request), Array)
#1 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Client.php(592): Elastica\Request->send()
#2 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Index.php(444): Elastica\Client->request('atom/_flush', 'POST', Array, Array)
#3 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Index.php(415): Elastica\Index->request('_flush', 'POST', Array, Array)
#4 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/arElasticSearchPlugin.class.php(333): Elastica\Index->flush()
#5 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(69): arElasticSearchPlugin->addDocument(Array, 'QubitInformatio...')
#6 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(81): arElasticSearchInformationObject->recursivelyAddInformationObjects('476455', '258441', Array)
#7 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(81): arElasticSearchInformationObject->recursivelyAddInformationObjects('470013', '258441', Array)
#8 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(42): arElasticSearchInformationObject->recursivelyAddInformationObjects(1, '258441')
#9 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/arElasticSearchPlugin.class.php(279): arElasticSearchInformationObject->populate()
#10 /usr/share/nginx/atom/lib/task/search/arPopulateTask.class.php(53): arElasticSearchPlugin->populate()
#11 /usr/share/nginx/atom/vendor/symfony/lib/task/sfBaseTask.class.php(68): arSearchPopulateTask->execute(Array, Array)
#12 /usr/share/nginx/atom/vendor/symfony/lib/task/sfTask.class.php(97): sfBaseTask->doRun(Object(sfCommandManager), NULL)
#13 /usr/share/nginx/atom/vendor/symfony/lib/command/sfSymfonyCommandApplication.class.php(76): sfTask->runFromCLI(Object(sfCommandManager), NULL)
#14 /usr/share/nginx/atom/vendor/symfony/lib/command/cli.php(20): sfSymfonyCommandApplication->run()
#15 /usr/share/nginx/atom/symfony(14): include('/usr/share/ngin...')
#16 {main}"
There's also a corresponding error in the elasticsearch.log:
[2015-05-27 20:44:46,508][DEBUG][action.admin.indices.flush] [Layla Miller] [atom_d2][0], node[3UtGvQsQQ1WRVpoJXZJoWA], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.flush.FlushRequest@55c1addd]
org.elasticsearch.index.engine.FlushNotAllowedEngineException: [atom_d2][0] already flushing...
at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:828)
at org.elasticsearch.index.shard.service.InternalIndexShard.flush(InternalIndexShard.java:627)
at org.elasticsearch.action.admin.indices.flush.TransportFlushAction.shardOperation(TransportFlushAction.java:110)
at org.elasticsearch.action.admin.indices.flush.TransportFlushAction.shardOperation(TransportFlushAction.java:50)
at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$1.run(TransportBroadcastOperationAction.java:171)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
History
#5 Updated by Mike Gale almost 7 years ago
- File flush-fix.patch
added
A temporary solution I made for the sysadmins. It just eats any exception coming from index->flush() and ignores it.
#6 Updated by David Hume almost 7 years ago
Doubling batch_size from 500 to 1000 in config/search.yml did not seem to help, 3rd attempt that followed died about the same place as the 2nd attempt (first with Mike's custom search:populate script)
[InformationObject] Third annual ball of the Vancouver Firemen's Benefit Association inserted (1938.8s) (107067/258441) [InformationObject] Fourth annual ball inserted (1938.82s) (107068/258441) [InformationObject] Fifth annual ball inserted (1938.83s) (107069/258441) [InformationObject] Fifth annual ball inserted (1938.85s) (107070/258441) [InformationObject] Souvenir of the Vancouver Fire Department inserted (1938.87s) (107071/258441) root@cvan011:/usr/share/nginx/atom# root@cvan011:/usr/share/nginx/atom# cat /tmp/es_err.log Elastica\Exception\PartialShardFailureException string(2004) "#0 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Request.php(168): Elastica\Transport\Http->exec(Object(Elastica\Request), Array) #1 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Client.php(592): Elastica\Request->send() #2 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Index.php(444): Elastica\Client->request('atom/_flush', 'POST', Array, Array) #3 /usr/share/nginx/atom/vendor/Elastica/lib/Elastica/Index.php(415): Elastica\Index->request('_flush', 'POST', Array, Array) #4 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/arElasticSearchPlugin.class.php(333): Elastica\Index->flush() #5 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(69): arElasticSearchPlugin->addDocument(Array, 'QubitInformatio...') #6 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(81): arElasticSearchInformationObject->recursivelyAddInformationObjects('1619759', '258441', Array) #7 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObject.class.php(42): arElasticSearchInformationObject->recursivelyAddInformationObjects(1, '258441') #8 /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/arElasticSearchPlugin.class.php(279): arElasticSearchInformationObject->populate() #9 /usr/share/nginx/atom/lib/task/search/arPopulateTask.class.php(53): arElasticSearchPlugin->populate() #10 /usr/share/nginx/atom/vendor/symfony/lib/task/sfBaseTask.class.php(68): arSearchPopulateTask->execute(Array, Array) #11 /usr/share/nginx/atom/vendor/symfony/lib/task/sfTask.class.php(97): sfBaseTask->doRun(Object(sfCommandManager), NULL) #12 /usr/share/nginx/atom/vendor/symfony/lib/command/sfSymfonyCommandApplication.class.php(76): sfTask->runFromCLI(Object(sfCommandManager), NULL) #13 /usr/share/nginx/atom/vendor/symfony/lib/command/cli.php(20): sfSymfonyCommandApplication->run() #14 /usr/share/nginx/atom/symfony(14): include('/usr/share/ngin...') #15 {main}"
Will now put batch_size back and try Mike's flush-fix patch...
#7 Updated by David Hume almost 7 years ago
Successful completion of new base index with discussed modification
to plugins/arElasticSearchPlugin/lib/arElasticSearchPlugin.class.php
[InformationObject] TEST 2015-03-09 inserted (6045.05s) (258421/258441)
[InformationObject] TEST_AtoM_Index_2015-03-12 inserted (6045.19s) (258422/258441)
[InformationObject] TEST 2015-04-28 inserted (6045.31s) (258423/258441)
[InformationObject] Sectional and other maps inserted (6045.43s) (258424/258441)
Index populated with 276369 documents in 6045.44 seconds.
#8 Updated by Mike Gale almost 7 years ago
- Priority changed from High to Medium
Setting to medium priority, as the sysadmins have a patch that fixes this issue in the mean time. Would like to discuss possible long term solutions with Jesús when he returns.
#9 Updated by Mike Gale almost 7 years ago
- Status changed from In progress to Feedback
- Assignee changed from Mike Gale to Jesús García Crespo
- Target version changed from Release 2.1.2 to Release 2.2.0
#10 Updated by Mike Gale almost 7 years ago
- Status changed from Feedback to In progress
- Assignee changed from Jesús García Crespo to Mike Cantelon
#11 Updated by Mike Gale almost 7 years ago
- Assignee changed from Mike Cantelon to Mike Gale
oops wrong mike
#12 Updated by Mike Gale almost 7 years ago
- File 0001-Call-refresh-instead-of-flush-in-ES-8490.patch
added
- Status changed from In progress to Code Review
- Assignee changed from Mike Gale to Jesús García Crespo
#13 Updated by Jesús García Crespo almost 7 years ago
- Status changed from Code Review to Feedback
- Assignee changed from Jesús García Crespo to Mike Gale
LGTM!
#14 Updated by Mike Gale almost 7 years ago
- Status changed from Feedback to Verified
merged, I tested a search:populate and it seems fine