Task #6815

METS format info only from transfer file ID

Added by Holly Becker over 7 years ago. Updated almost 5 years ago.

Status: New
Start date: 06/11/2014
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Google Code Legacy ID:
Requires documentation:
Sponsored: No

Description

Only the file identification run during transfer should be in the METS file, not the file identification run during ingest. This brings up the question of why file identification is run during ingest.

Investigate why this is the case, and update or remove as needed.


Related issues

Related to Archivematica - Task #5674: Format ID available in both Transfer and Ingest (Verified, 09/25/2013)

History

#1 Updated by Misty De Meo over 7 years ago

"This brings up the question of why file identification is run during ingest."

My understanding is that file ID is run in ingest in order to allow reidentification in the event that the IDs from transfer weren't satisfactory, e.g. to switch tools. In that case, if file ID is rerun, it should probably invalidate the IDs from transfer; only the most recent set of IDs should be included in the METS.

#2 Updated by Evelyn McLellan over 7 years ago

I brought this up with Holly because I understand the main purpose of the Identify file format micro-service during transfer to be to capture and record file format identification for preservation purposes. As a user, I would expect this to be the micro-service that produces the output that is written to the METS file. If that is not always the case, I think we need to make that very explicit in our documentation.

#3 Updated by Sarah Romkey almost 5 years ago

I think the most important use case for having identification information recorded from the Ingest stage is if your material was backlogged for some time, and PRONOM has since been updated. Possibly files that couldn't be identified in transfer can now be identified correctly.

It's also conceivable that someone will get different output from Fido vs. Siegfried and want to re-identify for that reason.

My suggestion for desired behaviour would be:

- use the data from Transfer if the information gathered in Ingest isn't any different
- use the data from Ingest if the information is different from that in Transfer
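
The two rules above could be sketched roughly as follows. This is only an illustration of the suggested behaviour, not Archivematica's actual code: the `FormatId` record, its fields, and the `choose_for_mets` function are hypothetical names, and comparing on the PRONOM identifier alone is an assumption about what "different" means.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatId:
    """One identification result for a file (hypothetical record, not Archivematica's schema)."""
    puid: Optional[str]  # PRONOM identifier such as "fmt/43", or None if unidentified
    tool: str            # e.g. "Fido" or "Siegfried"
    stage: str           # "transfer" or "ingest"


def choose_for_mets(transfer: FormatId, ingest: Optional[FormatId]) -> FormatId:
    """Pick which identification result would be written to the METS file."""
    # Ingest identification was not run, or it reported the same format:
    # keep the result recorded during transfer.
    if ingest is None or ingest.puid == transfer.puid:
        return transfer
    # Ingest reported something different (for example, PRONOM was updated
    # while the material sat in backlog): use the ingest result.
    return ingest


# Example: a file unidentified at transfer, identified at ingest.
at_transfer = FormatId(puid=None, tool="Fido", stage="transfer")
at_ingest = FormatId(puid="fmt/43", tool="Siegfried", stage="ingest")
chosen = choose_for_mets(at_transfer, at_ingest)
print(chosen.stage)
```

Taken literally, the rule also prefers the ingest result when ingest fails to identify a file that transfer did identify; whether that is desirable would need to be decided separately.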
