Task #13355

Reduce memory usage on descriptions CSV import

Added by José Raddaoui Marín 4 months ago. Updated 3 months ago.

Status: Verified
Start date: 06/15/2020
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: CSV import
Target version: Release 2.6.0
Google Code Legacy ID:
Tested version: 2.5, 2.6
Sponsored: No
Requires documentation:

Description

In large CSV imports the memory usage keeps growing until the process ends. We have seen similar cases when working with related resources and the ORM. The problem is worse when a CSV import running as a background job in the AtoM worker exhausts the system memory, as the job is retried when the worker restarts.

Until we find a better solution, we should clear the model classes on each description creation, as we've done in other recent issues.
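A minimal, self-contained sketch of the general pattern (not AtoM's actual implementation; the Model class and the clearCache() name are illustrative assumptions): ORM-style model classes that keep every created instance in a static cache will grow memory steadily over a long import loop unless those caches are cleared periodically.

<?php

// Illustration only: a model class whose static cache holds a reference to
// every instance ever created, which is what keeps memory growing during a
// long-running import.
class Model
{
    public static $instances = array();

    public $data;

    public function __construct($data)
    {
        $this->data = $data;
        self::$instances[] = $this;
    }

    // The fix: drop the cached references so PHP's garbage collector can
    // reclaim the objects.
    public static function clearCache()
    {
        self::$instances = array();
    }
}

for ($row = 0; $row < 100000; $row++) {
    // Simulate creating one description per CSV row.
    new Model(str_repeat('x', 1024));

    if ($row % 1000 === 0) {
        // Without this call, memory usage keeps climbing until the process ends.
        Model::clearCache();
        echo memory_get_usage(true) . PHP_EOL;
    }
}

In the import task, the equivalent would be clearing the static caches of the relevant model classes after each description is created.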

atom2.6_memory.png (36.6 KB) Miguel Angel Medinilla Luque, 07/10/2020 08:39 AM

atom2.5_memory.png (39.1 KB) Miguel Angel Medinilla Luque, 07/10/2020 08:39 AM


Related issues

Related to Access to Memory (AtoM) - Task #13271: Investigate options to reduce memory usage working with related models. New 03/09/2020
Related to Access to Memory (AtoM) - Task #13384: Reduce memory usage on nested set build task New 07/16/2020

History

#1 Updated by José Raddaoui Marín 4 months ago

  • Related to Task #13271: Investigate options to reduce memory usage working with related models added

#2 Updated by José Raddaoui Marín 4 months ago

  • Status changed from In progress to Code Review
  • Target version set to Release 2.6.0

#3 Updated by José Raddaoui Marín 4 months ago

  • Status changed from Code Review to QA/Review
  • Assignee deleted (José Raddaoui Marín)

Merged in qa/2.6.x.

To test this change, a big CSV file is required, imported over a considerably large database (at least the IOs table), while monitoring the memory usage. To be tested alongside #13354 and #13352.

#4 Updated by Dan Gillean 3 months ago

  • Status changed from QA/Review to Verified

#5 Updated by Miguel Angel Medinilla Luque 3 months ago

Tested with a big CSV file (3 MB) and a large database (io table > 500k records) using memory-profiler (https://pypi.org/project/memory-profiler/).

  • AtoM 2.5 VM: 4 vCPUs, 15 GB memory, Ubuntu Bionic
  • AtoM 2.6 VM: 2 vCPUs, 7 GB memory, Ubuntu Bionic

To install memory-profiler:

sudo pip install -U memory-profiler matplotlib tk

To run the memory-profiler with the csv import:

# Run the import as the web server user from the AtoM root
sudo su - www-data -s /bin/bash
cd /usr/share/nginx/atom
# Profile the import process and its child processes
mprof run --include-children php symfony csv:import --index --source-name=XX /home/artefactual/XXXXXXXX_CSV.csv
# Plot the recorded profile to a PNG without needing a display
mprof plot --output atom_memory.png --backend Agg

Running the above commands creates a PNG file: /usr/share/nginx/atom/atom_memory.png

AtoM 2.5 memory test

The max memory usage is 120 MB, but the graph shows that memory usage grows steadily, albeit with a low slope.

AtoM 2.6 memory test

There are two distinct sections in this graph:

a) Memory usage is almost constant, with a lower slope than the 2.5 graph, and the max memory usage is ~80 MB.
b) Memory usage increases to ~470 MB (but then remains constant). This section corresponds to rebuilding the nested set.

Conclusions

For the CSV import, memory usage in AtoM 2.6 is lower than in 2.5 (80 MB vs. 120 MB in this large test) if the nested set rebuild section is excluded.

This is a very large test database (io table > 500k records), but I think the memory usage when rebuilding the nested set should be reviewed.

#6 Updated by José Raddaoui Marín 3 months ago

  • Related to Task #13384: Reduce memory usage on nested set build task added
