Task #13504

Improve stability and error reporting of AtoM job scheduler

Added by Dan Gillean 5 months ago. Updated about 1 month ago.

Status: QA/Review
Start date: 05/03/2021
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: Job scheduling
Target version: Release 2.7.0
Google Code Legacy ID:
Tested version:
Sponsored: No
Requires documentation:

Description

This is a general roundup ticket for work that can improve the error reporting and overall stability of the atom-worker managed by the Gearman job scheduler. Problems include:

  • Job scheduler needs manual restart after deployment
  • atom-worker error logging is not verbose enough for troubleshooting in many cases
  • Job details page output is often truncated before the errors appear

Possible solutions include:

  • Logging and displaying errors separately from standard output logs, so they are not lost
  • Using PHP 7's Throwable interface (which covers both Error and Exception) to better consolidate error output information
  • Changes in base systemd configuration
  • etc...
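
As a rough illustration of the first two ideas above, here is a minimal sketch in Python rather than AtoM's PHP. It is not taken from the AtoM codebase or from the PRs linked in this ticket. The names run_job, failing_job, output_log and error_log are hypothetical. The sketch assumes a job is a callable that writes progress to an output log, and it uses a broad exception catch as a loose analogue of catching Throwable in PHP 7.

```python
import io
import traceback


def run_job(job, output_log, error_log):
    """Run job(output_log) and record any failure in error_log only.

    Returns True if the job finished, False if it raised.
    """
    try:
        job(output_log)
        return True
    except Exception as exc:  # broad catch, loosely analogous to PHP 7's Throwable
        # Errors go to their own log, so truncating the output log
        # cannot hide them.
        error_log.write(f"{type(exc).__name__}: {exc}\n")
        error_log.write(traceback.format_exc())
        return False


def failing_job(output_log):
    # Hypothetical job: writes some progress, then fails.
    output_log.write("processing record 1\n")
    raise ValueError("record 2 is malformed")


# In-memory streams stand in for real log files here.
output_log = io.StringIO()
error_log = io.StringIO()
succeeded = run_job(failing_job, output_log, error_log)
```

After this runs, output_log holds only the progress line, while error_log holds the error summary and full traceback. A job details page could then show the error log in full even when the standard output is long.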

History

#1 Updated by Mike Cantelon 5 months ago

  • Status changed from New to Code Review
  • Assignee deleted (Mike Cantelon)

PR for CR: https://github.com/artefactual/atom/pull/1291 (for adding more job worker logging and catching more errors)

#2 Updated by Mike Cantelon 5 months ago

  • Status changed from Code Review to QA/Review

#3 Updated by Mike Cantelon 5 months ago

PR merged into qa/2.x for QA.

#4 Updated by Mike Cantelon 4 months ago

  • Status changed from QA/Review to Code Review

PR (to log worker termination) for CR: https://github.com/artefactual/atom/pull/1295

#5 Updated by Mike Cantelon 4 months ago

  • Status changed from Code Review to QA/Review

PR merged into qa/2.x for QA.

#7 Updated by David Juhasz about 1 month ago

Melanie added additional job logging data with commit https://github.com/artefactual/atom/commit/dea6e65189ef3d6aa760265aced28605e09812de (qa/2.x)
