Task #13133

Investigate memory usage and garbage collection issues in finding aid generation

Added by Dan Gillean about 2 years ago.

Status: New
Start date: 08/02/2019
Priority: Medium
Due date:
Assignee: -
% Done:
Category: Finding aids
Target version: -
Google Code Legacy ID:
Tested version: 2.4
Sponsored: No
Requires documentation:


From a user forum report, 2019-08-02: https://groups.google.com/d/msg/ica-atom-users/1J8L0_Jy8HQ/4ZNCGIldFgAJ

Hello, when I generate the finding aid report, the following error occurs:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

The file "php3wjUT2" is 143 MB and has approximately 1,761,547 lines.

"arFindingAidJob": Running: java -jar '/var/www/html/atom/atom-2.4.1/lib/task/pdf/saxon9he.jar' -s:'/tmp/phpnAm9Fl' -xsl:'/var/www/html/atom/atom-2.4.1/lib/task/pdf/ead-pdf-inventory-summary.xsl' -o:'/tmp/php3wjUT2' 2>&1

"arFindingAidJob": Running: fop -r -q -fo '/tmp/php3wjUT2' -pdf '/var/www/html/atom/atom-2.4.1/downloads/legislative-language-assembly-general-3.pdf' 2>&1

"arFindingAidJob": Converting the EAD FO to PDF has failed.

I split the php3wjUT2 file into three parts of about 50 MB each, ran the command below, and the PDF was generated successfully.

fop -r -q -fo '/tmp/php3wjUT2' -pdf '/var/www/html/atom/atom-2.4.1/downloads/legislative-assembly-of-blade-general-3.pdf' 2>&1

Server settings:
OS: Linux (CentOS)
Memory: 16 GB
CPUs: 4 (2600 MHz)
Free disk: 34 GB

Are there any adjustments I can make to generate the report without splitting it?

Reading up on the GC overhead limit error: https://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#par_gc.oom

"The parallel collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line."
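To make the quoted rule concrete, here is a minimal Java sketch that restates its two thresholds as a predicate. The class and method names are hypothetical, and this is a simplified model only: the real check is internal to the JVM and may take further conditions into account.

```java
/**
 * Simplified model of the parallel collector's "GC overhead limit" rule,
 * restating only the two percentages from the Oracle documentation quoted above.
 * Hypothetical class: the real check is implemented inside the JVM.
 */
public final class GcOverheadModel {
    // Thresholds taken from the quoted documentation.
    static final double MAX_GC_TIME_FRACTION = 0.98;        // "more than 98% of the total time"
    static final double MIN_HEAP_RECOVERED_FRACTION = 0.02; // "less than 2% of the heap"

    /**
     * @param gcTimeFraction        share of total time spent in GC (0.0 to 1.0)
     * @param heapRecoveredFraction share of the heap recovered by GC (0.0 to 1.0)
     * @return true when both conditions for the OutOfMemoryError hold
     */
    static boolean overheadLimitExceeded(double gcTimeFraction, double heapRecoveredFraction) {
        return gcTimeFraction > MAX_GC_TIME_FRACTION
                && heapRecoveredFraction < MIN_HEAP_RECOVERED_FRACTION;
    }

    public static void main(String[] args) {
        // Nearly all time in GC and almost nothing reclaimed: the error would be thrown.
        System.out.println(overheadLimitExceeded(0.99, 0.01)); // true
        // Heavy GC, but a useful amount of heap reclaimed: no error.
        System.out.println(overheadLimitExceeded(0.99, 0.10)); // false
    }
}
```

Both conditions must hold at once, which is why a heap that is merely busy does not trigger the error; it is the combination of constant collection and almost no reclaimed memory that does.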

See also: https://www.jvmhost.com/articles/what-is-java-lang-outofmemoryerror-gc-overhead-limit-exceeded/
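Because AtoM launches FOP through the fop shell script rather than calling java directly, JVM options such as a larger -Xmx or the -XX:-UseGCOverheadLimit flag from the quote above need another route to the JVM. A minimal Java sketch, assuming the stock Apache FOP launcher script, which reads extra JVM options from the FOP_OPTS environment variable: it builds the same fop command as in the log but does not run it. The class name, the output path /tmp/out.pdf, and the -Xmx4g value are illustrative assumptions, not tested settings.

```java
/**
 * Hypothetical helper that prepares the fop invocation seen in the log,
 * passing JVM options via FOP_OPTS (assumes the stock fop launcher script).
 */
public final class FopLauncher {
    static ProcessBuilder fopCommand(String foFile, String pdfFile, String jvmOptions) {
        // Same arguments as the logged command; each one is a separate argv entry,
        // so no shell quoting is needed.
        ProcessBuilder pb = new ProcessBuilder("fop", "-r", "-q", "-fo", foFile, "-pdf", pdfFile);
        // Assumption: the fop script forwards FOP_OPTS to the java command line.
        pb.environment().put("FOP_OPTS", jvmOptions);
        // Merge stderr into stdout, like "2>&1" in the logged shell command.
        pb.redirectErrorStream(true);
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = fopCommand("/tmp/php3wjUT2", "/tmp/out.pdf",
                "-Xmx4g -XX:-UseGCOverheadLimit");
        System.out.println(pb.command());
        System.out.println(pb.environment().get("FOP_OPTS"));
        // pb.start() would actually run FOP; it is deliberately not called here.
    }
}
```

Either option only treats the symptom, which is the point of this issue: they are useful for confirming that the failure is heap-bound, not as a substitute for reducing memory use during generation.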

The intermediate XSL-FO file passed to FOP was large (143 MB), but the server also had 16 GB of memory, which in theory should be enough. If more than 98% of the time is being spent in garbage collection, we should examine finding aid generation for memory leaks and garbage collection issues and see whether it can be optimized, so that disabling UseGCOverheadLimit and/or allocating more memory are not the only solutions.
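As a possible starting point for that investigation, the following Java sketch reads two figures from the standard java.lang.management API: the JVM's maximum heap, which is set by -Xmx or a JVM default rather than by the machine's total RAM, and the share of uptime spent in garbage collection so far. The class name is hypothetical, and the figures describe only the JVM the code runs in, so applying it to FOP or Saxon would mean running it inside that process. The GC time sum is also approximate when collectors run concurrently with the application.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical diagnostic: heap ceiling and approximate GC time share of the current JVM. */
public final class GcReport {
    /** Approximate fraction of JVM uptime spent in garbage collection so far. */
    static double gcTimeFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // milliseconds; -1 if not reported
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis > 0 ? (double) gcMillis / uptimeMillis : 0.0;
    }

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long usedMegabytes = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.printf("max heap: %d MB%n", maxHeapMegabytes());
        System.out.printf("used heap: %d MB%n", usedMegabytes);
        System.out.printf("time in GC: %.1f%%%n", 100 * gcTimeFraction());
    }
}
```

A low max heap figure on a 16 GB server would point to JVM configuration; a high GC share with a generous ceiling would point back at how much data generation keeps alive.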
