Reduce memory usage on descriptions CSV import
|Target version:||Release 2.6.0|
|Google Code Legacy ID:|| |
|Tested version:||2.5, 2.6|
In large CSV imports, memory usage keeps growing until the process ends. We have seen similar cases when working with related resources and the ORM. The problem is worse when a CSV import running as a background job in the AtoM worker exhausts the system memory, because the job is retried when the worker restarts.
Until we find a better solution, we should clear the model classes on each description creation, as we've done in other issues recently.
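To illustrate why clearing on each description creation bounds the growth, here is a minimal sketch in Python rather than AtoM's PHP. `ModelCache` and `import_rows` are hypothetical stand-ins for an ORM's per-class instance cache and the import loop; they are not AtoM or Propel APIs.

```python
class ModelCache:
    """Hypothetical stand-in for an ORM's per-class instance cache."""

    def __init__(self):
        self._instances = {}

    def get_or_create(self, key, factory):
        # Keep a reference to every object created, like an identity map.
        if key not in self._instances:
            self._instances[key] = factory()
        return self._instances[key]

    def clear(self):
        # Drop all cached references so the objects can be freed.
        self._instances.clear()

    def __len__(self):
        return len(self._instances)


def import_rows(rows, cache, clear_each_row):
    """Create one object per row; return the largest cache size observed."""
    peak = 0
    for index, row in enumerate(rows):
        cache.get_or_create(index, lambda: dict(row))
        peak = max(peak, len(cache))
        if clear_each_row:
            cache.clear()
    return peak


# Illustrative rows only, not real import data.
rows = [{"title": f"Description {i}"} for i in range(1000)]
peak_without_clear = import_rows(rows, ModelCache(), clear_each_row=False)
peak_with_clear = import_rows(rows, ModelCache(), clear_each_row=True)
print(peak_without_clear, peak_with_clear)
```

Without clearing, the cache holds one entry per imported row; with clearing, it never holds more than one. The trade-off is that anything reused across rows has to be loaded again.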
#3 Updated by José Raddaoui Marín almost 2 years ago
- Status changed from Code Review to QA/Review
- Assignee deleted (José Raddaoui Marín)
#5 Updated by Miguel Angel Medinilla Luque almost 2 years ago
Tested with a big CSV file (3MB) and a large database (io table > 500k records) with memory-profiler (https://pypi.org/project/memory-profiler/).
- AtoM 2.5 VM: 4 vCPUs, 15GB memory, Ubuntu Bionic
- AtoM 2.6 VM: 2 vCPUs, 7GB memory, Ubuntu Bionic
To install memory-profiler:
sudo pip install -U memory-profiler matplotlib tk
To run the memory-profiler with the csv import:
sudo su - www-data -s /bin/bash
cd /usr/share/nginx/atom
mprof run --include-children php symfony csv:import --index --source-name=XX /home/artefactual/XXXXXXXX_CSV.csv
mprof plot --output atom_memory.png --backend Agg
Running the above commands creates a PNG file: /usr/share/nginx/atom/atom_memory.png
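Besides the plot, the peak can be read from the raw samples. This is a minimal sketch, assuming `mprof run` writes its samples to a text file with lines of the form `MEM <MiB> <timestamp>`; the sample lines below are made up for illustration and are not taken from these tests.

```python
def read_mprof_samples(lines):
    """Return (timestamp, MiB) pairs from 'MEM <MiB> <timestamp>' lines."""
    samples = []
    for line in lines:
        parts = line.split()
        # Skip anything that is not a memory sample (e.g. the CMDLINE header).
        if len(parts) == 3 and parts[0] == "MEM":
            samples.append((float(parts[2]), float(parts[1])))
    return samples


def peak_memory(samples):
    """Largest memory value, in MiB, across all samples."""
    return max(mem for _, mem in samples)


# Made-up lines in the assumed format, for illustration only.
example = [
    "CMDLINE php symfony csv:import",
    "MEM 40.500000 1000.0000",
    "MEM 78.250000 1000.1000",
    "MEM 60.000000 1000.2000",
]
samples = read_mprof_samples(example)
peak = peak_memory(samples)
print(peak)
```

For a real run, the lines would come from the data file that `mprof run` leaves in the working directory, e.g. `read_mprof_samples(open(path))`, where `path` points to that file.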
AtoM 2.5 memory test
The maximum memory usage is 120MB, but the graph shows memory usage growing steadily throughout the import, albeit with a low slope.
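The slope can be put into numbers with a least-squares fit over the (time, memory) samples. This is only a sketch: `growth_slope` is a hypothetical helper, and the sample values are invented for illustration rather than read from the graph.

```python
def growth_slope(samples):
    """Least-squares slope of memory over time, in MiB per second.

    `samples` is a list of (seconds, MiB) pairs with at least two
    distinct timestamps.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var


# Invented samples: memory grows by 1 MiB every 10 seconds.
samples = [(0.0, 100.0), (10.0, 101.0), (20.0, 102.0), (30.0, 103.0)]
slope = growth_slope(samples)
print(round(slope, 3))
```

A slope near zero means flat usage; comparing the slope between two runs gives a number to put next to the visual comparison of the graphs.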
AtoM 2.6 memory test
There are 2 different sections in this graph:
a) Memory usage is almost constant, with a lower slope than in the 2.5 graph, and the maximum memory usage is ~80MB.
b) Memory usage rises to ~470MB and then remains constant at that level. This section corresponds to the nested set rebuild.
If the nested set rebuild section is excluded, memory usage during CSV import is lower in AtoM 2.6 than in 2.5 (80MB vs 120MB in this big test).
This is a very large test database (io table > 500k records), but I think the memory usage when rebuilding the nested set should be reviewed.