Bug #6870

Sanitized files cause normalization.csv to fail

Added by Sarah Romkey over 3 years ago. Updated over 2 years ago.

Status: Verified
Start date: 06/19/2014
Priority: Low
Due date:
Assignee: Sarah Romkey
% Done: 90%
Category: Normalization
Target version: Release 1.4.0
Google Code Legacy ID:
Pull Request:
Sponsored: No
Requires documentation:

Description

From the user forum:

"I also did some further testing relating to another variation - in this case the issue ticket indicates that this wasn't part of the core requirements, but it appears that both the file names under manualNormalization and all the entries in normalization.csv need to be manually sanitized (I tested with spaces, but presumably this applies generally). On the other hand the original workflow (without the csv file, in a case where there are no conflicts in the filenames), manual normalization works without doing the sanitization first - so I had hoped that you would just need to sanitize the entries in the csv. More specifically:
1) normalized files unsanitized, no csv (no conflict in names) => works
2) normalized files unsanitized, csv supplied with no sanitization => fails
3) normalized files unsanitized, csv supplied with sanitization of columns 2 and 3 => fails
4) normalized files unsanitized, csv supplied with sanitization of all columns => fails
5) normalized files sanitized, csv supplied with sanitization of columns 2 and 3 => fails
6) normalized files sanitized, csv supplied with sanitization of all columns => works"

While option 6 is a workaround, it would be preferable, as noted in the forum post, to have to sanitize only the csv manually, rather than both the files and the csv.

History

#1 Updated by Tim Hutchinson about 3 years ago

I'm wondering if the pull request posted to #4922 (not re-opened) will actually address this issue.
https://github.com/artefactual/archivematica/pull/97

From the pull request: "Fix normalization.csv parsing to match on the unsanitized name in all cases, instead of the sanitized name."

#2 Updated by Sarah Romkey about 3 years ago

  • Status changed from New to Feedback
  • Target version set to Release 1.3.1

Holly, could you take a look at PR 97 and see if it fixes this issue?

#3 Updated by Holly Becker about 3 years ago

After the fix from https://github.com/artefactual/archivematica/pull/97 goes in (assuming the original file on disk is always unsanitized) the behavior should be as follows:

1) normalized files unsanitized, no csv (no conflict in names) => works
2) normalized files unsanitized, csv supplied with no sanitization => works
3) normalized files unsanitized, csv supplied with sanitization of columns 2 and 3 => expected fail (names in csv do not match file on disk)
4) normalized files unsanitized, csv supplied with sanitization of all columns => expected fail (names in csv do not match file on disk, manually normalized files in csv not recognized)
5) normalized files sanitized, csv supplied with sanitization of columns 2 and 3 => works
6) normalized files sanitized, csv supplied with sanitization of all columns => expected fail (name of original file in csv does not match file on disk, manually normalized files only partially recognized)

In short, after the fix, the filenames in normalization.csv must match whatever is on disk when the Transfer is given to Archivematica.
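
The rule above can be checked before starting a Transfer. The following is a minimal sketch, not Archivematica code: the function name unmatched_rows, the example paths, and the assumption that normalization.csv has no header row are all illustrative. It flags every csv entry that does not exactly match a name on disk.

```python
import csv
import io


def unmatched_rows(csv_text, files_on_disk):
    """Return (row_number, column_number, name) for each non-empty csv entry
    that does not exactly match a name in files_on_disk.

    Hypothetical helper for illustration only. Assumes the csv has no header
    row and that every non-empty cell is a file name or path.
    """
    on_disk = set(files_on_disk)
    problems = []
    rows = csv.reader(io.StringIO(csv_text))
    for row_number, row in enumerate(rows, start=1):
        for column_number, name in enumerate(row, start=1):
            name = name.strip()
            # Exact comparison: no sanitization is applied to either side.
            if name and name not in on_disk:
                problems.append((row_number, column_number, name))
    return problems


# Illustrative names only; the directory layout is an assumption.
files = [
    "my file.doc",
    "manualNormalization/access/my file.pdf",
    "manualNormalization/preservation/my file.odt",
]

# Case 2: files and csv both unsanitized, so every entry matches.
case_2 = (
    "my file.doc,"
    "manualNormalization/access/my file.pdf,"
    "manualNormalization/preservation/my file.odt\n"
)

# Case 3: columns 2 and 3 sanitized in the csv but not on disk.
case_3 = (
    "my file.doc,"
    "manualNormalization/access/my_file.pdf,"
    "manualNormalization/preservation/my_file.odt\n"
)

print(unmatched_rows(case_2, files))
print(unmatched_rows(case_3, files))
```

An empty result corresponds to the "works" cases in the list above; any reported entry corresponds to an "expected fail" because a name in the csv does not match a file on disk.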

The fix is currently in code review.

#4 Updated by Justin Simpson about 3 years ago

  • Status changed from Feedback to Code Review
  • Target version changed from Release 1.3.1 to Release 1.4.0

As part of code review, it became clear that this fix does have an impact on how the METS file will look. Since this is a change in behaviour, and not just a bug fix, I am bumping this to the 1.4.0 release.

#5 Updated by Misty De Meo over 2 years ago

  • Status changed from Code Review to QA/Review
  • Assignee changed from Holly Becker to Sarah Romkey
  • % Done changed from 0 to 90

This was merged.

#6 Updated by Sarah Romkey over 2 years ago

  • Status changed from QA/Review to Verified

Verified that this works now as described by Holly in the comment above.
