Bug #13598

Job scheduler doesn't always clean up temporary directories

Added by David Juhasz 4 months ago. Updated about 1 month ago.

Status: QA/Review
Start date: 01/12/2022
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: Job scheduling
Target version: Release 2.7.0
Google Code Legacy ID:
Tested version: 2.7
Sponsored: No
Requires documentation:

Description

Several AtoM asynchronous jobs create a temporary working directory in /tmp to stage files before adding them to a zip archive for delivery to the user. These include the clipboard export jobs (archival description, authority record and archival institution), the Physical storage report, and the CSV validator. Several of these jobs (e.g. the archival institution clipboard export, the Physical storage report, and the CSV validator) don't delete the working directory when the job ends. The temporary files then take up disk space unnecessarily until the server is rebooted or the /tmp directories are deleted manually.

Additionally, when many of these working directories accumulate in /tmp, a new job's directory name can collide with an existing one. Because the existing directory may still hold files from an earlier job, extra, unrelated files are then included in the zip package delivered to the user.
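As a sketch of how the name collision could be avoided (written in Python purely for illustration; `create_job_workdir` and the `atom-job-` prefix are hypothetical names, not AtoM code), a job can ask the operating system for a directory that is guaranteed not to exist yet, rather than computing a name that a leftover directory might already use.

```python
import os
import tempfile


def create_job_workdir(prefix: str = "atom-job-") -> str:
    # Hypothetical helper. tempfile.mkdtemp() creates the directory itself and
    # retries with a new random name if one already exists, so two jobs never
    # share (and pollute) the same working directory.
    return tempfile.mkdtemp(prefix=prefix)


if __name__ == "__main__":
    first = create_job_workdir()
    second = create_job_workdir()
    print(first != second)     # True: each call gets its own directory
    print(os.listdir(second))  # []: a freshly created directory is empty
    os.rmdir(first)
    os.rmdir(second)
```

This only prevents the collision; it does not by itself remove the directory when the job ends.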

To reproduce

Note: terminal access to the AtoM server is required to list the contents of /tmp.

  1. Log in to AtoM as an admin user
  2. Go to "Manage > Physical storage"
  3. Click "Export storage report"
  4. Click "Export"
  5. Click the "Job management" link
  6. Wait until the job completes (refresh the page until "Job status" is Complete)
  7. Open an SSH terminal to the AtoM server
  8. Run ls /tmp

Resulting error

A directory named with an MD5 hash (e.g. 269a15765b521abef1ddf25276378610) and containing a holdings.csv file remains in /tmp.

Expected result

The temporary directory should be deleted when the job completes. Ideally, this should be implemented so that every job that creates a /tmp directory automatically deletes it on job completion.
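A minimal sketch of the expected behaviour, in Python for illustration only: `run_export_job`, the `downloads_dir` location and the `atom-job-` prefix are hypothetical and not AtoM's actual implementation. The staging directory is a context-managed temporary directory, so it is removed as soon as the job finishes, while the zip is written to a separate directory and stays available for download.

```python
import os
import tempfile
import zipfile


def run_export_job(rows: list[str], downloads_dir: str, zip_name: str) -> tuple[str, str]:
    """Hypothetical export job: stage a CSV in a temporary directory and zip it.

    Returns (zip_path, staging_dir). The staging directory is deleted when the
    `with` block exits, whether the job succeeds or raises; the zip is written
    to downloads_dir, outside the staging directory, so it survives cleanup.
    """
    os.makedirs(downloads_dir, exist_ok=True)
    zip_path = os.path.join(downloads_dir, zip_name)
    with tempfile.TemporaryDirectory(prefix="atom-job-") as staging_dir:
        csv_path = os.path.join(staging_dir, "holdings.csv")
        with open(csv_path, "w", encoding="utf-8", newline="") as fh:
            fh.write("\n".join(rows) + "\n")
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            # arcname stores just the file name, not the temporary path
            zf.write(csv_path, arcname="holdings.csv")
    return zip_path, staging_dir
```

Tying cleanup to the job's own scope means each job type gets it without extra configuration, and a job that fails part-way also leaves nothing behind in /tmp.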

History

#1 Updated by Dan Gillean 4 months ago

This process should not be automatically set to run as soon as a job completes - otherwise that defeats part of the purpose of handling jobs asynchronously. A user needs time to return to the Jobs page and download the results of any job before the related directory and file is deleted.

I think it'd be better to implement something similar to the clipboard expiry setting, so an admin can set a number of days after which older directories are automatically deleted.
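The expiry approach suggested here could look roughly like the following Python sketch, run periodically (e.g. from cron). `purge_old_job_dirs`, the directory prefix and the `max_age_days` value are hypothetical; in AtoM the age would presumably come from an admin setting similar to the clipboard expiry setting. The sketch assumes job directories are created with a recognisable prefix, which the current MD5-named directories do not have; restricting the sweep to that prefix keeps it from deleting other programs' files in /tmp.

```python
import os
import shutil
import tempfile
import time


def purge_old_job_dirs(base_dir: str, prefix: str, max_age_days: float) -> list[str]:
    """Delete prefixed job directories in base_dir older than max_age_days.

    Age is taken from the directory's modification time, which changes when
    entries inside it are added or removed. Returns the removed paths.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    # Materialise the listing first so we don't delete while iterating.
    with os.scandir(base_dir) as it:
        entries = list(it)
    for entry in entries:
        if not entry.name.startswith(prefix) or not entry.is_dir(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            shutil.rmtree(entry.path)
            removed.append(entry.path)
    return removed


if __name__ == "__main__":
    demo = tempfile.mkdtemp()
    os.mkdir(os.path.join(demo, "atom-job-123"))
    print(purge_old_job_dirs(demo, "atom-job-", 7))  # []: the directory is too recent
    shutil.rmtree(demo)
```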

#2 Updated by David Juhasz about 1 month ago

Dan Gillean wrote:

This process should not be automatically set to run as soon as a job completes - otherwise that defeats part of the purpose of handling jobs asynchronously. A user needs time to return to the Jobs page and download the results of any job before the related directory and file is deleted.

To clarify, deleting the temporary directory will not delete the zip file that the user downloads; it only deletes the temporary working directory used to create the final zip. For example, deleting the temporary directory will not affect the user's ability to download the storage report.

#3 Updated by David Juhasz about 1 month ago

  • Status changed from New to In progress
  • Assignee set to David Juhasz

#4 Updated by David Juhasz about 1 month ago

  • Status changed from In progress to Code Review

#5 Updated by José Raddaoui Marín about 1 month ago

  • Status changed from Code Review to Feedback

#6 Updated by David Juhasz about 1 month ago

  • Status changed from Feedback to QA/Review
  • Assignee deleted (David Juhasz)
