Schedule removal of old data from the "access_log" table
|Target version:|Release 2.7.0|
|Google Code Legacy ID:| |
|Tested version:|2.7|
A row is added to the "access_log" table every time an AtoM resource (e.g. archival description, authority record, archival institution) is loaded, and this data is used to populate the "Popular this week" feature on the AtoM homepage.
However, AtoM currently has no mechanism for removing rows from the access_log table, so the table can quickly grow very large on a high-traffic AtoM site. The "Popular this week" feature only needs usage data from the previous seven days, so any data older than seven days is unnecessary.
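To illustrate why rows older than seven days can be dropped without affecting "Popular this week", here is a minimal Python sketch using an in-memory SQLite database. The table layout (`id`, `object_id`, `access_date`) and the `popular_this_week` helper are simplified, hypothetical stand-ins, not AtoM's actual schema or PHP code.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical, simplified access_log table; the real AtoM table may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log (id INTEGER PRIMARY KEY, object_id INTEGER, access_date TEXT)"
)

now = datetime(2021, 10, 27, 12, 0, 0)
visits = [
    (1, now - timedelta(days=1)),
    (1, now - timedelta(days=2)),
    (2, now - timedelta(days=3)),
    (2, now - timedelta(days=30)),  # outside the seven-day window
    (3, now - timedelta(days=90)),  # outside the seven-day window
]
# Store timestamps as "YYYY-MM-DD HH:MM:SS" so string comparison matches time order.
conn.executemany(
    "INSERT INTO access_log (object_id, access_date) VALUES (?, ?)",
    [(obj, ts.isoformat(sep=" ")) for obj, ts in visits],
)

cutoff = (now - timedelta(days=7)).isoformat(sep=" ")


def popular_this_week(conn, cutoff, limit=10):
    """Hypothetical ranking: most-visited objects since the cutoff."""
    return conn.execute(
        "SELECT object_id, COUNT(*) AS hits FROM access_log "
        "WHERE access_date >= ? GROUP BY object_id "
        "ORDER BY hits DESC, object_id LIMIT ?",
        (cutoff, limit),
    ).fetchall()


before = popular_this_week(conn, cutoff)
# Expire everything older than the seven-day window.
deleted = conn.execute("DELETE FROM access_log WHERE access_date < ?", (cutoff,)).rowcount
after = popular_this_week(conn, cutoff)
print(before, deleted, after)
```

Because the weekly ranking only reads rows inside the seven-day window, deleting older rows leaves its result unchanged while shrinking the table.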
Provide an automated way to remove obsolete data from the "access_log" table.
#4 Updated by David Juhasz about 1 month ago
- Subject changed from Schedule removal of old data from the access_log table to Schedule removal of old data from the "access_log" table
I considered three possible solutions to expire "access_log" data after seven days:
1) Expire old data when the "Popular this week" widget is loaded (in the DefaultPopularComponent class)
2) Expire old data when adding a new "access_log" row (in the QubitAccessLogObserver class)
3) [Chosen] Expire old data with a CLI script (e.g. tools:expire-data)
- With Option 1, if a client removes the "Popular this week" widget from their home page (several of our clients do), data will still be added to the access_log table but never used (it only populates "Popular this week"), AND it will never be expired
- Option 2 would run the expiration code every time an archival description, authority record, repository, or function page is loaded (see: https://github.com/artefactual/atom/commit/caf01bb3a94e86ed74b630d66fc9cdcb0baaea4b), which seems excessive
- Options 1 and 2 would both delete "access_log" data automatically, possibly without site administrators being aware of the change (e.g. when they upgrade to Release 2.7.0); if the "access_log" data is being used for purposes other than the "Popular this week" feature, this deletion could be problematic
- Option 3 requires running the expiration script explicitly, either manually or via a scheduler. This ensures that AtoM site administrators choose whether expiration happens, and on what schedule (e.g. daily, weekly, monthly)
#6 Updated by David Juhasz about 1 month ago
- Status changed from Code Review to QA/Review
- Assignee deleted (
Change merged to qa/2.x: https://github.com/artefactual/atom/commit/a1e7e078a102346704343eb8a0b138fe31eb6ae3
#8 Updated by David Juhasz about 1 month ago
I found a bug when deleting multiple resource types at the same time: the first calculated "expiry date" is used for all subsequent types.
1) Expire multiple resource types without using the "--older-than" option. E.g.
symfony tools:expire-data clipboard,job,access_log
The first "expiry date" calculated will be used for all resource types. E.g.
>> expire-data Used app_clipboard_save_max_age setting to set expiry date of 2021-10-20.
Are you sure you want to delete saved clipboards older than 2021-10-20 (y/N)? y
>> expire-data 0 saved clipboards deleted.
Are you sure you want to delete jobs (and any related files) older than 2021-10-20 (y/N)? y
>> expire-data 0 jobs (and any related files) deleted.
Are you sure you want to delete access logs older than 2021-10-20 (y/N)?
The expiry date for each resource type should be calculated independently based on the rules for that resource type. E.g. the default expiry date for access_log should be 7 days before the current date.
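The difference between the buggy and expected behaviour can be sketched in Python as computing the expiry date once before the loop versus computing it per resource type. This is a hypothetical illustration, not the actual `tools:expire-data` code: the function names and the per-type maximum ages are placeholders (only the seven-day `access_log` default comes from this issue), and it assumes an explicit `--older-than` date applies to every requested type.

```python
from datetime import date, timedelta

# Placeholder max ages in days. Only access_log's 7-day default comes from
# this issue; the clipboard and job values are hypothetical.
MAX_AGE_DAYS = {"clipboard": 30, "job": 60, "access_log": 7}


def expiry_dates_buggy(types, today, older_than=None):
    """Reproduces the bug: the first computed date is reused for later types."""
    expiry = older_than
    result = {}
    for t in types:
        if expiry is None:
            # Only runs for the first type; later types inherit this value.
            expiry = today - timedelta(days=MAX_AGE_DAYS[t])
        result[t] = expiry
    return result


def expiry_dates_fixed(types, today, older_than=None):
    """Each type gets its own default unless an explicit date is given."""
    result = {}
    for t in types:
        if older_than is not None:
            result[t] = older_than  # assumed to apply to every type
        else:
            result[t] = today - timedelta(days=MAX_AGE_DAYS[t])
    return result


types = ["clipboard", "job", "access_log"]
today = date(2021, 11, 19)
print(expiry_dates_buggy(types, today))
print(expiry_dates_fixed(types, today))
```

With these placeholder values, the buggy version applies the clipboard date (2021-10-20) to all three types, while the fixed version yields 2021-10-20, 2021-09-20 and 2021-11-12 respectively.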
#9 Updated by Steve Breker about 1 month ago
Relevant docs section re: tools:expire-data:
#11 Updated by David Juhasz about 1 month ago
#12 Updated by Dan Gillean about 1 month ago
- Status changed from QA/Review to Verified
- Target version set to Release 2.7.0
- % Done changed from 0 to 100
- Requires documentation deleted (
Documentation updated in https://github.com/artefactual/atom-docs/commit/d5f7386b7e4bcbbbeb16907c8eeac6777ab16cbe