Bug #11662

FITS XML output not stored if NLNZ Metadata Extractor fails

Added by Tim Hutchinson about 4 years ago. Updated about 4 years ago.

Status: New
Start date: 10/30/2017
Priority: Medium
Due date:
Assignee: -
% Done:

Target version: -
Google Code Legacy ID:
Pull Request:
Sponsored: No
Requires documentation:


If NLNZ Metadata Extractor returns an error, the error message appears to be written to standard output (not only to error output). The captured FITS output is then not valid XML, so the full FITS output is not added to the METS file.

Command: characterizeFile_v0.0 "/var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/extractPackagesChoice/Word5-87383bfa-b113-4178-a7ab-1c1d53dbecb5/objects/RANDOM08.DOC" "82b4ad76-2f54-4c4f-a02f-1c88a952012d" "87383bfa-b113-4178-a7ab-1c1d53dbecb5"


2017-10-30 21:21:34,523 ERROR [NLNZ Metadata Extractor] MetadataExtractor:133 - NLNZ Metadata Extractor error while harvesting file: java.lang.NegativeArraySizeException

<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="0.8.4" timestamp="10/30/17 9:21 PM">


edu.harvard.hul.ois.fits.exceptions.FitsToolException: NLNZ Metadata Extractor error while harvesting file RANDOM08.DOC
at edu.harvard.hul.ois.fits.tools.nlnz.MetadataExtractor.extractInfo(MetadataExtractor.java:134)
at edu.harvard.hul.ois.fits.tools.ToolBase.run(ToolBase.java:275)
at java.lang.Thread.run(Thread.java:748)

XML output for command "FITS" (183f6d5f-3a8e-4e5a-a6bc-948b9bfca176) was not valid XML; not saving to database
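
For illustration only, here is a minimal Python sketch of one possible workaround: drop any text that precedes the start of the XML in the captured standard output before checking validity. This is not Archivematica's or FITS's actual code. The function name `extract_fits_xml` and the shortened sample string are hypothetical, and the sketch assumes the stray text appears only before the root element.

```python
import xml.etree.ElementTree as ET


def extract_fits_xml(stdout, root_tag="fits"):
    """Hypothetical helper: parse the XML in `stdout`, ignoring leading log lines.

    Returns the root element, or None if no well-formed XML can be recovered.
    """
    # Locate the earliest plausible start of the document: an XML
    # declaration or the opening of the expected root element.
    positions = [stdout.find("<?xml"), stdout.find("<" + root_tag)]
    starts = [p for p in positions if p != -1]
    if not starts:
        return None
    try:
        return ET.fromstring(stdout[min(starts):])
    except ET.ParseError:
        # Stray text inside or after the document is not handled here.
        return None


# Shortened stand-in for the output above: a log line followed by XML.
polluted = (
    "2017-10-30 21:21:34,523 ERROR [NLNZ Metadata Extractor] "
    "MetadataExtractor:133 - NLNZ Metadata Extractor error while "
    "harvesting file: java.lang.NegativeArraySizeException\n"
    '<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" '
    'version="0.8.4"><identification/></fits>'
)

root = extract_fits_xml(polluted)
print(root.tag if root is not None else "not valid XML")
```

Parsing `polluted` directly with `ET.fromstring` raises `ParseError`, which matches the "not valid XML" message above. Stripping the prefix only hides the symptom; the underlying cause would still be the tool logging to standard output.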


#1 Updated by Tim Hutchinson about 4 years ago

The extra text before the start of the XML output seems to be an issue with the NLNZ tool, or else with FITS. It came up for me with Word 5.x for DOS files; I was also able to reproduce it with the MP3 file (from Internet Archive) linked in a similar report here:

#2 Updated by Tim Hutchinson about 4 years ago

It turns out this duplicates #8647
