Investigate memory usage and garbage collection issues in finding aid generation
Tested version: 2.4
From a user forum report, 2019-08-02: https://groups.google.com/d/msg/ica-atom-users/1J8L0_Jy8HQ/4ZNCGIldFgAJ
Hello, when I generate the Finding aid report the following error happens:
"Exception in thread" main "java.lang.OutOfMemoryError: GC overhead limit exceeded"
The file “php3wjUT2” is 143 MB and approximately 1,761,547 lines.
"arFindingAidJob": Running: java -jar '/var/www/html/atom/atom-2.4.1/lib/task/pdf/saxon9he.jar' -s: '/ tmp / phpnAm9Fl' -xsl: '/ var /www/html/atom/atom-2.4.1/lib/task/pdf/ead-pdf-inventory-summary.xsl '-o:' / tmp / php3wjUT2 '2> & 1
"arFindingAidJob": Running: fop -r -q -fo '/ tmp / php3wjUT2' -pdf '/var/www/html/atom/atom-2.4.1/downloads/legislative-language-assembly -general-3.pdf '2> & 1
"arFindingAidJob": Converting the EAD FO to PDF has failed.
I split the php3wjUT2 file into three parts of about 50 MB each, ran the command line below on each part, and the PDF was generated successfully.
fop -r -q -fo '/tmp/php3wjUT2' -pdf '/var/www/html/atom/atom-2.4.1/downloads/legislative-assembly-of-blade-general-3.pdf' 2>&1
CPUs: 4 @ 2600 MHz
Free disk: 34G
Are there any adjustments I can make to generate the report without splitting it?
Reading up on the GC overhead limit error: https://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#par_gc.oom
"The parallel collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line."
The file being generated was large (143 MB), but the user also had 16 GB of memory, which in theory should be enough. If more than 98% of the time is being spent in garbage collection, we should examine finding aid generation for memory leaks and garbage collection issues to see whether it can be optimized, so that the only solutions aren't disabling UseGCOverheadLimit and/or throwing more memory at the problem.
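To distinguish a too-small heap from an actual leak, a first diagnostic pass could watch GC activity on the running process; `<pid>` and the sampling interval below are placeholders, and the dump flags are standard HotSpot options, not something AtoM sets today.

```shell
# Sketch: <pid> is the java process running saxon9he.jar (or fop) on this host.
# jstat -gcutil prints heap-region occupancy and cumulative GC times every 5 s;
# FGC/FGCT climbing steadily while the old generation (O) sits near 100% is the
# "98% of time in GC, <2% recovered" pattern the Oracle doc describes.
jstat -gcutil <pid> 5000

# Capturing a heap dump on OOM leaves an artifact to inspect in a profiler,
# which is how a genuine leak in the XSLT/FO pipeline would show up:
java -Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp \
  -jar /var/www/html/atom/atom-2.4.1/lib/task/pdf/saxon9he.jar ...
```

If the dump shows live objects scaling with the number of descriptions rather than plateauing, that would point at the generation code rather than at JVM tuning.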