Bug #11117

JHOVE reporting validation failure when MediaConch reports success

Added by Sarah Romkey over 4 years ago. Updated about 4 years ago.

Status: Feedback
Start date: 04/26/2017
Priority: Low
Due date:
Assignee: Sara Allain
% Done: 0%
Category: -
Target version: Release 1.7.0
Google Code Legacy ID:
Pull Request:
Sponsored: No
Requires documentation:

Description

Testing with /acceptance-tests/preforma/when-normalized-all-valid, I created a second validation rule to validate mkv files with JHOVE, in addition to the rule that validates them with MediaConch. After normalization, the preservation derivatives pass validation with MediaConch but not with JHOVE. JHOVE's outcome detail says "Well formed and valid," but the task has a non-zero exit code, and the eventOutcome for JHOVE is "fail."

This is a bit of an edge case, since we expect people to use MediaConch for mkv validation going forward, but it would be useful to know whether the non-zero exit code comes from something within Archivematica rather than from a failure of JHOVE itself.

History

#1 Updated by Joel Dunham over 4 years ago

  • Status changed from New to Feedback
  • Assignee changed from Joel Dunham to Nick Wilkinson

Hi Nick,

I need some higher-level guidance on this one since it goes beyond a software issue and is more about desired digital preservation behaviour. See below.

I believe what is happening here is expected JHOVE validation command behaviour. If you look at the "Validate using JHOVE" FPR command, you will see a comment that states:

# JHOVE returns "bytestream" for unrecognized file formats.
# That can include unrecognized or malformed PDFs, JPEG2000s, etc.
# Since we're whitelisting the formats we're passing in,
# "bytestream" indicates that the format is not in fact well-formed
# regardless of what the status reads.

In the case that Sarah describes, JHOVE validation of an .mkv file (a preservation derivative from a normalized .mov or .mpeg) returns very little output (compared to running JHOVE on a file it recognizes) and, crucially, identifies the format as bytestream:

$ jhove -h xml mov-not-conforms-NYULibraries_QTv210-60dc89a6-47a1-4eac-960b-847b82e636e4.mkv
<?xml version="1.0" encoding="UTF-8"?>
<jhove xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://hul.harvard.edu/ois/xml/ns/jhove" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/jhove http://hul.harvard.edu/ois/xml/xsd/jhove/1.6/jhove.xsd" name="Jhove" release="1.6" date="2011-01-04">
 <date>2017-08-07T23:25:48+00:00</date>
 <repInfo uri="mov-not-conforms-NYULibraries_QTv210-60dc89a6-47a1-4eac-960b-847b82e636e4.mkv">
  <reportingModule release="1.3" date="2007-04-10">BYTESTREAM</reportingModule>
  <lastModified>2017-08-07T23:25:42+00:00</lastModified>
  <size>9387281</size>
  <format>bytestream</format>
  <status>Well-Formed and valid</status>
  <mimeType>application/octet-stream</mimeType>
 </repInfo>
</jhove>

As a result, the JHOVE FPR command interprets the JHOVE XML output as indicating failure.
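
To make that logic concrete, here is a minimal sketch of the kind of check the comment describes. It is not the actual FPR command: the function name `interpret_jhove_output`, the `SAMPLE_REPORT` constant (abridged from the output above, with a placeholder `uri`), and the simple pass/fail outcome are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <jhove> root element in the report above.
JHOVE_NS = {"j": "http://hul.harvard.edu/ois/xml/ns/jhove"}

# Abridged copy of the report shown above; "example.mkv" is a placeholder.
SAMPLE_REPORT = """<jhove xmlns="http://hul.harvard.edu/ois/xml/ns/jhove" name="Jhove" release="1.6" date="2011-01-04">
 <repInfo uri="example.mkv">
  <reportingModule release="1.3" date="2007-04-10">BYTESTREAM</reportingModule>
  <format>bytestream</format>
  <status>Well-Formed and valid</status>
  <mimeType>application/octet-stream</mimeType>
 </repInfo>
</jhove>"""


def interpret_jhove_output(xml_text):
    """Return (outcome, format, status) for a JHOVE XML report.

    Hypothetical helper: "bytestream" is treated as a failure regardless
    of the status text, mirroring the comment in the FPR command.
    """
    root = ET.fromstring(xml_text)
    rep_info = root.find("j:repInfo", JHOVE_NS)
    if rep_info is None:
        return ("fail", "", "")

    fmt = rep_info.findtext("j:format", default="", namespaces=JHOVE_NS).strip()
    status = rep_info.findtext("j:status", default="", namespaces=JHOVE_NS).strip()

    # JHOVE fell back to its BYTESTREAM module, so the file was not
    # recognized as the whitelisted format it was expected to be.
    if fmt.lower() == "bytestream":
        return ("fail", fmt, status)

    outcome = "pass" if status == "Well-Formed and valid" else "fail"
    return (outcome, fmt, status)


if __name__ == "__main__":
    print(interpret_jhove_output(SAMPLE_REPORT))
```

With the report above, this sketch yields a "fail" outcome even though the status text reads "Well-Formed and valid", which matches the behaviour Sarah observed.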

My question: Is the JHOVE FPR command behaving correctly? Is it doing what we want it to do?

Note that JHOVE returns the same type of output on both .mkv files derived (via ffmpeg) from the .mov/.mpeg files in /acceptance-tests/preforma/when-normalized-all-valid. MediaConch, on the other hand, recognizes the mp4-to-mkv one (mp4-test-0476c13c-7905-4ef9-9954-c2b40bda90d5.mkv) as invalid and the mov-to-mkv one (mov-not-conforms-NYULibraries_QTv210-60dc89a6-47a1-4eac-960b-847b82e636e4.mkv) as valid.

#2 Updated by Nick Wilkinson over 4 years ago

  • Assignee changed from Nick Wilkinson to Sara Allain

Hi Sara, are you able to provide Joel with some advice on this?

#3 Updated by Sara Allain about 4 years ago

Further testing suggests that this might be an issue with JHOVE itself. Next step is to download JHOVE locally and do some testing.

#4 Updated by Sara Allain about 4 years ago

Using the Archivematica sample data Multimedia folder, further investigation shows that JHOVE returns the format "bytestream" for the following formats:

  • MPEG
  • MP3
  • WMV
  • WMA
  • MOV

The only format that was properly identified and validated was WAV.

Essentially, I think we're applying JHOVE validation to the wrong formats. I think the solution is to check which formats are being validated with JHOVE and confirm that JHOVE can recognize each of them. We could also create some more specific JHOVE commands, e.g. one using the AIFF module.

Note: as a result of this testing, I replaced the AIFF file in archivematica-sampledata/Multimedia data because it was invalid.
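
The format check proposed above could be scripted roughly as follows. This is only a sketch: it assumes `jhove` is on the PATH and is invoked as `jhove -h xml <file>` (as in comment #1), and the helper names `run_jhove`, `reported_format`, and `unrecognized_extensions` are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

JHOVE_NS = {"j": "http://hul.harvard.edu/ois/xml/ns/jhove"}


def run_jhove(path):
    """Run `jhove -h xml <path>` and return the raw XML report (bytes)."""
    result = subprocess.run(
        ["jhove", "-h", "xml", str(path)],
        capture_output=True,
        check=False,
    )
    return result.stdout


def reported_format(xml_report):
    """Return the lower-cased <format> value from a JHOVE XML report."""
    root = ET.fromstring(xml_report)
    fmt = root.findtext("j:repInfo/j:format", default="", namespaces=JHOVE_NS)
    return fmt.strip().lower()


def unrecognized_extensions(paths, runner=run_jhove):
    """Group files that JHOVE reports as "bytestream" by file extension.

    `runner` is injectable so the grouping can be tried without JHOVE
    installed.
    """
    unrecognized = defaultdict(list)
    for path in map(Path, paths):
        if reported_format(runner(path)) == "bytestream":
            unrecognized[path.suffix.lower()].append(path.name)
    return dict(unrecognized)
```

Running `unrecognized_extensions` over the sample data folder would list the extensions for which a JHOVE validation rule is probably misapplied.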
