JHOVE reporting validation failure when MediaConch reports success
Assignee: Sara Allain
Target version: Release 1.7.0
Testing with /acceptance-tests/preforma/when-normalized-all-valid, I created a second validation rule to validate .mkv files with JHOVE, in addition to the rule that validates them with MediaConch. After normalization, the preservation derivatives pass validation with MediaConch but not with JHOVE. JHOVE's outcome detail says "Well-Formed and valid," but the task has a non-zero exit code, and the eventOutcome for JHOVE is "fail."
This is a bit of an edge case, since we expect people to use MediaConch for .mkv validation moving forward. Still, it would be useful to know whether the non-zero exit code comes from something within Archivematica rather than from a failure of JHOVE itself.
#1 Updated by Joel Dunham over 4 years ago
- Status changed from New to Feedback
- Assignee changed from Joel Dunham to Nick Wilkinson
I need some higher-level guidance on this one since it goes beyond a software issue and is more about desired digital preservation behaviour. See below.
I believe what is happening here is expected JHOVE validation command behaviour. If you look at the "Validate using JHOVE" FPR command, you will see a comment that states:
    # JHOVE returns "bytestream" for unrecognized file formats.
    # That can include unrecognized or malformed PDFs, JPEG2000s, etc.
    # Since we're whitelisting the formats we're passing in,
    # "bytestream" indicates that the format is not in fact well-formed
    # regardless of what the status reads.
In the case that Sara describes, JHOVE validation of an .mkv file (a preservation derivative from a normalized .mov or .mpeg) returns very little output (compared to running JHOVE on a file it recognizes) and, crucially, identifies the format as "bytestream":
    $ jhove -h xml mov-not-conforms-NYULibraries_QTv210-60dc89a6-47a1-4eac-960b-847b82e636e4.mkv
    <?xml version="1.0" encoding="UTF-8"?>
    <jhove xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://hul.harvard.edu/ois/xml/ns/jhove" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/jhove http://hul.harvard.edu/ois/xml/xsd/jhove/1.6/jhove.xsd" name="Jhove" release="1.6" date="2011-01-04">
      <date>2017-08-07T23:25:48+00:00</date>
      <repInfo uri="mov-not-conforms-NYULibraries_QTv210-60dc89a6-47a1-4eac-960b-847b82e636e4.mkv">
        <reportingModule release="1.3" date="2007-04-10">BYTESTREAM</reportingModule>
        <lastModified>2017-08-07T23:25:42+00:00</lastModified>
        <size>9387281</size>
        <format>bytestream</format>
        <status>Well-Formed and valid</status>
        <mimeType>application/octet-stream</mimeType>
      </repInfo>
    </jhove>
As a result, the JHOVE FPR command interprets the JHOVE XML output as indicating failure.
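For anyone following along, here is a minimal sketch of the kind of check the comment describes. It is not the actual FPR command: the function name interpret_jhove_report and the simple pass/fail outcome are my assumptions, and the real command may distinguish more cases. It only illustrates why a status of "Well-Formed and valid" still ends up as a failure when the format is "bytestream".

```python
import xml.etree.ElementTree as ET

# Namespace used by the JHOVE XML output quoted in this ticket.
JHOVE_NS = {"j": "http://hul.harvard.edu/ois/xml/ns/jhove"}


def interpret_jhove_report(xml_text):
    """Hypothetical helper: return (outcome, format, status) for one report.

    Assumption: "bytestream" always means failure, whatever the status says,
    because only whitelisted formats are passed to JHOVE in the first place.
    """
    root = ET.fromstring(xml_text)
    fmt = root.findtext("j:repInfo/j:format", default="", namespaces=JHOVE_NS).strip()
    status = root.findtext("j:repInfo/j:status", default="", namespaces=JHOVE_NS).strip()

    if fmt.lower() == "bytestream":
        # JHOVE fell back to its generic module: the file was not recognized.
        outcome = "fail"
    elif status == "Well-Formed and valid":
        outcome = "pass"
    else:
        outcome = "fail"
    return outcome, fmt, status


# Trimmed version of the report above (only the elements the check reads).
SAMPLE_REPORT = """
<jhove xmlns="http://hul.harvard.edu/ois/xml/ns/jhove" name="Jhove" release="1.6">
  <repInfo uri="example.mkv">
    <format>bytestream</format>
    <status>Well-Formed and valid</status>
  </repInfo>
</jhove>
"""

if __name__ == "__main__":
    print(interpret_jhove_report(SAMPLE_REPORT))
```

A script built this way would exit non-zero on a "fail" outcome, which would match the behaviour Sara observed, though I have not confirmed that this is exactly how the exit code is set.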
My question: Is the JHOVE FPR command behaving correctly? Is it doing what we want it to do?
Note that JHOVE returns the same type of output on both .mkv files derived (via ffmpeg) from the .mov/.mpeg files in /acceptance-tests/preforma/when-normalized-all-valid. MediaConch, on the other hand, recognizes the mp4-to-mkv one (mp4-test-0476c13c-7905-4ef9-9954-c2b40bda90d5.mkv) as invalid and the mov-to-mkv one (mov-not-conforms-NYULibraries_QTv210-60dc89a6-47a1-4eac-960b-847b82e636e4.mkv) as valid.
#4 Updated by Sara Allain about 4 years ago
Using the Archivematica sample data Multimedia folder, further investigation shows that JHOVE returns the format "bytestream" for the following formats:
The only format that was properly identified and validated was WAV.
Essentially, I think we're applying JHOVE validation to the wrong formats. I think the solution here is to check which formats are being validated with JHOVE and confirm that JHOVE can recognize each of them. We could also create some more specific JHOVE commands, for example one that uses the AIFF module.
Note: as a result of this testing, I replaced the AIFF file in archivematica-sampledata/Multimedia because it was invalid.
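One way to do that check in bulk is sketched below. It runs the same jhove -h xml command quoted earlier in this ticket over a list of files and reports the ones JHOVE identifies as "bytestream". The helper names (run_jhove, reported_format, unrecognized_files) are made up for illustration, and it assumes jhove is on the PATH.

```python
import subprocess
import xml.etree.ElementTree as ET

# Namespace used by the JHOVE XML output quoted in this ticket.
JHOVE_NS = {"j": "http://hul.harvard.edu/ois/xml/ns/jhove"}


def run_jhove(path):
    """Run `jhove -h xml <path>` and return the XML report as text."""
    result = subprocess.run(
        ["jhove", "-h", "xml", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def reported_format(xml_text):
    """Return the <format> value from a JHOVE XML report ('' if absent)."""
    root = ET.fromstring(xml_text)
    return root.findtext(
        "j:repInfo/j:format", default="", namespaces=JHOVE_NS
    ).strip()


def unrecognized_files(paths, runner=run_jhove):
    """Return the paths that JHOVE reports as 'bytestream'.

    `runner` is injectable so the logic can be tried without JHOVE installed.
    """
    return [
        path
        for path in paths
        if reported_format(runner(path)).lower() == "bytestream"
    ]
```

Any format whose sample files all show up in the result of unrecognized_files is a candidate for removal from the JHOVE validation rules, or for a more specific command.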