Bug #13245

Checksum validation fails when importing large digital objects

Added by José Raddaoui Marín about 1 month ago. Updated about 1 month ago.

Status: New
Start date: 01/17/2020
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: Digital object
Target version: -
Google Code Legacy ID:
Tested version: 2.5, 2.6
Sponsored: No
Requires documentation:

Description

Initially reported on the user forum:

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/ica-atom-users/7pcaU6ziXi8

Computing a checksum with the hash() function over the file contents held in memory may return a different result than computing it with the hash_file() function over the file path. The following script reproduces the issue:

<?php

$filePath = 'test_checksum.mkv';
$copyPath = 'test_checksum_copy.mkv';
$algorithm = 'sha256';  // Same behavior with md5

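// Read the whole file into memory, then write it back out as a copy.
$contents = file_get_contents($filePath);
file_put_contents($copyPath, $contents);

// Checksum the original and the copy by path, and the in-memory contents directly.
$pathChecksum = hash_file($algorithm, $filePath);
$copyChecksum = hash_file($algorithm, $copyPath);
$contentsChecksum = hash($algorithm, $contents);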

unlink($copyPath);

print("Path:    $pathChecksum\n");
print("Copy:    $copyChecksum\n");
print("Content: $contentsChecksum\n");

/*
Output with a small file (250 MB):

Path:    33b6d78fdce183a74aa63e2de1d373480a47041869911fbafd0fb390b3c5f0f7
Copy:    33b6d78fdce183a74aa63e2de1d373480a47041869911fbafd0fb390b3c5f0f7
Content: 33b6d78fdce183a74aa63e2de1d373480a47041869911fbafd0fb390b3c5f0f7

Output with a large file (5 GB):

Path:    ac8dff0fb4eab233b704bffde925b371a2bffb9ec4564299fa619c26a0efd552
Copy:    ac8dff0fb4eab233b704bffde925b371a2bffb9ec4564299fa619c26a0efd552
Content: 9aa3258480bc0838909ec120321247274a841caf431c1597fe3941991edb8ddb
*/
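One way to check whether the mismatch is tied to passing a single very large string to hash() is to compute the checksum incrementally with PHP's hash_init(), hash_update() and hash_final(). This is a minimal sketch, assuming a local readable file; hashFileInChunks() and its chunk sizes are hypothetical choices for illustration, not AtoM code, and it has not been run against the 5 GB file from the report.

```php
<?php

// Hypothetical helper (not AtoM code): compute a checksum incrementally so the
// whole file is never passed to hash() as one string.
function hashFileInChunks(string $algorithm, string $path, int $chunkSize = 8 * 1024 * 1024): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $context = hash_init($algorithm);
    try {
        // Feed the file to the hashing context one chunk at a time.
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Cannot read $path");
            }
            hash_update($context, $chunk);
        }
    } finally {
        fclose($handle);
    }

    return hash_final($context);
}

// Self-check on a small temporary file: the chunked digest should match hash_file().
$tmp = tempnam(sys_get_temp_dir(), 'chk');
file_put_contents($tmp, str_repeat('AtoM digital object ', 100000));

$chunked = hashFileInChunks('sha256', $tmp, 4096);
$whole = hash_file('sha256', $tmp);
unlink($tmp);

print("Chunked: $chunked\n");
print("Path:    $whole\n");
```

If the chunked digest matches hash_file() on the large file while hash() over the full string does not, that would point at the single-call hash() path rather than at the copy itself.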

This causes an error when the checksum is validated after the master copy is copied to the uploads folder (see the code links below). Even while the file contents are held in memory (something we should avoid, see #13236), we should use the same hashing function both to compute the original checksum and to validate the copied file.
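A minimal sketch of that approach, assuming the validation only needs to compare the master file with its copy: both checksums are computed with hash_file() over the file paths, so the same function is used on both sides and neither file is loaded into a PHP string. copyAndValidate() and its parameters are hypothetical names, not the code in QubitDigitalObject or QubitAsset.

```php
<?php

// Hypothetical helper (not the actual AtoM code): copy a master file and validate
// the copy with hash_file() on BOTH paths, so one hashing function is used throughout.
function copyAndValidate(string $sourcePath, string $destinationPath, string $algorithm = 'sha256'): string
{
    // Checksum of the original, computed from the file path, not from contents in memory.
    $expected = hash_file($algorithm, $sourcePath);
    if ($expected === false) {
        throw new RuntimeException("Cannot hash $sourcePath");
    }

    // copy() works at the stream level instead of reading the file into a PHP string.
    if (!copy($sourcePath, $destinationPath)) {
        throw new RuntimeException("Cannot copy $sourcePath to $destinationPath");
    }

    // Validate the copy with the same function and algorithm; discard it on mismatch.
    $actual = hash_file($algorithm, $destinationPath);
    if ($actual === false || !hash_equals($expected, $actual)) {
        unlink($destinationPath);
        throw new RuntimeException("Checksum mismatch for $destinationPath");
    }

    return $actual;
}

// Usage on temporary files, cleaned up afterwards.
$source = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($source, str_repeat('master copy ', 50000));
$destination = $source . '.copy';

$checksum = copyAndValidate($source, $destination);
print("Validated: $checksum\n");

unlink($destination);
unlink($source);
```

hash_equals() is used here for a strict string comparison; a plain === would work equally well, since the checksum is not a secret.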

https://github.com/artefactual/atom/blob/stable/2.5.x/lib/model/QubitDigitalObject.php#L3234
https://github.com/artefactual/atom/blob/stable/2.5.x/lib/QubitAsset.class.php#L121-L140


Related issues

Related to Access to Memory (AtoM) - Bug #13236: Investigate memory usage in PHP during digital object upload (New, 01/09/2020)

History

#1 Updated by José Raddaoui Marín about 1 month ago

  • Related to Bug #13236: Investigate memory usage in PHP during digital object upload added
