Bug #791

Need to log error and notify user if FFMPEG is not able to successfully convert a file

Added by Joseph Perry about 10 years ago. Updated over 7 years ago.

Status: Verified
Start date:
Priority: Critical
Due date:
Assignee: Evelyn McLellan
% Done: 0%
Category: -
Target version: Release 0.6
Google Code Legacy ID: archivematica-136
Pull Request:
Sponsored:
Requires documentation:

Description

I managed to normalize the MultimediaSIP test folder Evelyn gave me.
Originally it was occasionally "crapping out" on FFMPEG.

When the system ran out of space, it couldn't log the problem to the log
file because there was no space left to do so.

Creating the bagit.zip also requires almost twice as much space
(the original plus the compressed copy).
This is why the zip is occasionally left behind as a .zip.templx or
something similar.

FYI: MultimediaSIP is 231 MB uncompressed and 187 MB when zipped by BagIt.
Currently I don't think any of our builds have more than 350 MB free.
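To make the numbers above concrete: while the zip is being written, the original and the compressed copy sit on disk together, so the peak footprint is roughly their sum. A minimal sketch of that arithmetic, assuming both copies must fit in the free space at the same time; the function names are hypothetical, not Archivematica code:

```python
MB = 1024 * 1024


def peak_bag_footprint(original_bytes: int, zipped_bytes: int) -> int:
    """Peak disk use while bagging: the original and the zip coexist until cleanup."""
    return original_bytes + zipped_bytes


def bag_fits(original_bytes: int, zipped_bytes: int, free_bytes: int) -> bool:
    # Hypothetical check: assumes both copies must fit in the free space at once.
    return peak_bag_footprint(original_bytes, zipped_bytes) <= free_bytes


# MultimediaSIP figures from this report.
print(peak_bag_footprint(231 * MB, 187 * MB) // MB)  # 418
print(bag_fits(231 * MB, 187 * MB, 350 * MB))        # False
```

With 418 MB needed against at most 350 MB free, the bagging step runs out of space before the zip is finished.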

The biggest problem I see is that no good notification is sent to the
user when this happens. The rm cleanup at the end of prepareAIP makes it look
as though the system didn't run out of space.

[g] Legacy categories: Ingest

History

#1 Updated by Peter Van Garderen about 10 years ago

  • Priority changed from Medium to Critical
  • Target version set to Release 0.6

Nice job solving and reporting this mystery, Joseph. I'm relieved to hear it's not an
unpredictable FFmpeg issue.

So here are my proposed solutions:
1) Review the sample files used for the demo and trim them down to short, small multimedia files
to make the most of the 3.7 GB total we have on the USB key (Peter & Evelyn).
2) Let's get rid of the extra 200 MB FAT32 partition and add that space back to a
single partition for all of Archivematica. Also, let's stretch that partition to the
maximum 3.7 GB available on our USBs (Austin). If the user wants to bring in outside
files, they can just plug in a second USB. I've just tested that and it works
great. The second option is to mount a shared folder. Both are legitimate options and
win us some precious space. Note: both of these options for external data import should
be documented at http://archivematica.org/usb; I think we can copy most of the
existing shared folder docs from the virtual appliance page (Austin? Evelyn?)
3) Let's discuss eliminating the practice of including normalized access copies in
the BagIt AIP. This is a larger issue for Archivematica that applies to production
versions as well. I think we need to reconsider this design, as storage demands will
add up significantly. Making this change will also help simplify the workflow
needed to enable multiple single-file uploads of the access copies (i.e. uploading
each individual file in the SIP to Qubit). (Peter)

[g] Labels added: Priority-Critical, Milestone-Release-0.6, Component-Ubuntu
[g] Labels removed: Priority-Medium

#2 Updated by Peter Van Garderen about 10 years ago

#3 Updated by Evelyn McLellan about 10 years ago

Joseph,

That's great news! I'm pretty sure the worst size offenders were the avi and mov
files normalized to mxf. I substituted a much smaller avi file at the end of the day
and can have another look at the mov file. I'll review tomorrow morning.

#4 Updated by Anonymous about 10 years ago

I'm seeing the same sort of problem when working with the demo as a virtual appliance. In my case, I'm working with a SIP containing Access and Filemaker Pro databases, which are all fairly large (in the tens of MBs).

As for Peter's point 3 above, I think this should be a configurable option, perhaps based on the normalization configurations; there are some cases where we will want to retain normalized copies in the AIP as well.

#5 Updated by Anonymous about 10 years ago

Also, part of the reason this is a critical bug is that when the SIP folder is moved into the prepareAIP folder, the cleanup process runs even if the bagging process fails. As a result, the SIP gets completely erased at that point. IIRC it's possible to recover the DC metadata from the ingestLogs folder, but it took some digging to figure this out.
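The fix Mark's comment points toward is to make cleanup conditional on bagging having succeeded. A minimal sketch, assuming the bagging step is an external command that exits non-zero on failure; bag_then_cleanup and the command passed to it are hypothetical placeholders, not prepareAIP's actual invocation:

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def bag_then_cleanup(sip_dir: Path, bag_cmd: list[str]) -> bool:
    """Run the bagging command and delete the SIP only if it exited with status 0."""
    result = subprocess.run(bag_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Bagging failed: keep the SIP (and its DC metadata) so nothing is lost.
        print(f"Bagging failed (exit {result.returncode}): {result.stderr.strip()}",
              file=sys.stderr)
        return False
    # The bag was created successfully, so the working copy can be removed.
    shutil.rmtree(sip_dir)
    return True


if __name__ == "__main__":
    sip = Path(tempfile.mkdtemp())
    (sip / "dc.xml").write_text("<dc/>")  # hypothetical metadata file
    failing_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(bag_then_cleanup(sip, failing_cmd), sip.exists())  # False True
```

Checking the exit status rather than the existence of the zip also catches the case where a partial temporary file is left behind.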

#6 Updated by Peter Van Garderen about 10 years ago

  • Subject set to Need to log error and notify user if FFMPEG is not able to successfully convert a file

Hi Mark,

1) AFAIK, the virtual appliance has a default storage capacity and amount of RAM, which your larger Access & Filemaker databases are presumably running into. Austin, can you remind us what those are? Those values should be configurable in your virtualization software (VirtualBox, VMPlayer).
2) As for my point (3) above, we have implemented that in release 0.6, but to be clear, it refers only to the normalized access copy. The normalized preservation copy is always stored inside the AIP alongside the original file. I've created /p/archivematica/issues/detail?id=161 to capture the request to make including the access copy in the AIP a configurable option.
3) Getting back to the root problem for this ticket: I think it can be generalized to say that we need notification whenever a normalization fails, whether FFMPEG, OpenOffice, ImageMagick, etc. is doing the normalization. However, the error reporting for each tool is likely to be different, so we probably need a separate ticket for each normalization tool? For now I've just renamed this issue (/p/archivematica/issues/detail?id=136) to address it for FFMPEG.
4) Re: Mark's comment 5: the workflow instructions are to make a copy of the SIP at the 1-receiveSIP step so that the user can recover with an unprocessed SIP in the event of downstream errors like this. Mark, would that have solved your issue?
5) Finally, a new issue that Mark's report raises is that we currently do not have a preservation plan and normalization tool for file-based databases like MS-Access & Filemaker. Mark, we're open to suggestions. I've started a separate /p/archivematica/issues/detail?id=162 for this; please add any further comments there.
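Although each tool reports errors differently, most of them (ffmpeg included) signal a failed conversion through a non-zero exit status, which gives a common hook for point 3. A minimal, tool-agnostic sketch, assuming that exit-status convention; run_normalization and the notify callback are hypothetical names standing in for however Archivematica would alert the user:

```python
import logging
import subprocess
import sys
from typing import Callable

log = logging.getLogger("normalize")


def run_normalization(cmd: list[str], notify: Callable[[str], None]) -> bool:
    """Run a normalization command; on failure, log the error and notify the user."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        msg = f"Normalization tool not found: {cmd[0]}"
        log.error(msg)
        notify(msg)
        return False
    if result.returncode != 0:
        # ffmpeg writes its diagnostics to stderr; keep the last few lines.
        tail = "\n".join(result.stderr.strip().splitlines()[-5:])
        msg = f"{cmd[0]} exited with status {result.returncode}: {tail}"
        log.error(msg)
        notify(msg)
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    messages: list[str] = []
    failing_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_normalization(failing_cmd, messages.append), len(messages))  # False 1
```

In practice cmd would be an actual ffmpeg invocation such as ["ffmpeg", "-i", "input.avi", "output.mxf"]; tool-specific stderr parsing could be layered on top of this shared exit-status check.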

[g] Labels added: Component-Ingest
[g] Labels removed: Component-Ubuntu

#7 Updated by Anonymous about 10 years ago

Peter,

Re: your point 3: I'm not seeing this on normalization per se, but rather on the creation of the actual BagIt zip for the AIP. I can tell because the file left behind is named something like "originalsipdirname-uuid.zip.biltemp".
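The leftover temporary file is a useful signal that bagging died partway through. A small sketch for spotting such files, assuming they keep the ".zip.biltemp" suffix Mark reports (Joseph saw a different suffix, so it is a parameter); which directory to scan is an assumption left to the caller, and leftover_partial_bags is a hypothetical helper:

```python
import tempfile
from pathlib import Path


def leftover_partial_bags(directory: Path, suffix: str = ".zip.biltemp") -> list[Path]:
    """Return files whose names end with the temporary-zip suffix, newest first."""
    hits = [p for p in directory.iterdir() if p.is_file() and p.name.endswith(suffix)]
    return sorted(hits, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    d = Path(tempfile.mkdtemp())
    (d / "mysip-1234.zip.biltemp").touch()
    (d / "other.zip").touch()
    print([p.name for p in leftover_partial_bags(d)])  # ['mysip-1234.zip.biltemp']
```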

Re: point 4: It solves it in part, with the exception of "losing" the DC metadata. If there were instructions on how to recover it (even along the lines of "run grep -r (or ls -lt to find recent ingest projects) in the ingestLogs directory"), I'd probably be happy for the time being. My point is that I've figured out how to recover it, but it should be documented for people who are less command-line savvy.
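The recovery recipe Mark describes (list recent logs, then search them) can be sketched without the command line. This assumes the logs are plain-text files somewhere under an ingestLogs directory whose location you supply; the search string and the sample log content in the demo are invented for illustration, since the real log format isn't described here, and both helper names are hypothetical:

```python
import tempfile
from pathlib import Path


def recent_logs(log_dir: Path) -> list[Path]:
    """Roughly `ls -lt`, but recursive: files under log_dir, newest first."""
    files = [p for p in log_dir.rglob("*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


def grep_logs(log_dir: Path, needle: str) -> list[tuple[Path, int, str]]:
    """Roughly `grep -rn needle log_dir`: (file, line number, line) per match."""
    matches = []
    for path in recent_logs(log_dir):
        with path.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line:
                    matches.append((path, lineno, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    logs = Path(tempfile.mkdtemp())
    (logs / "sip1.log").write_text("start\ndc:title Example\n")  # invented sample
    for path, n, line in grep_logs(logs, "dc:title"):
        print(path.name, n, line)  # sip1.log 2 dc:title Example
```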

#8 Updated by Austin Trask about 10 years ago

The default VirtualBox images are 3.7 GB (to fit on a USB key) with 500 MB of free space, which will quickly become an issue with large file sets. I will work on an updated VirtualBox image with 10 GB+ of free space.

#9 Updated by Austin Trask about 10 years ago

Mark,

I've uploaded a VirtualBox image with 13 GB of total space. Let me know if you're testing with an alternate virtualization platform and I will upload images as needed.

It is available here:
http://archivematica.org/download/archivematica-0.6-100614-vbox.tgz (1.1GB)
md5sum: 65bb2d0a3b1d88071b11033ffe38c37d
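Running md5sum on the downloaded file and comparing against the published hash confirms the 1.1 GB download arrived intact. The same check in Python, reading in chunks so the image is never loaded into memory at once; the helper names and the local filename are assumptions:

```python
import hashlib
import tempfile
from pathlib import Path

EXPECTED_MD5 = "65bb2d0a3b1d88071b11033ffe38c37d"  # published with the image above


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, 1 MiB at a time."""
    h = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    return md5_of(path) == expected.lower()


if __name__ == "__main__":
    sample = Path(tempfile.mkdtemp()) / "sample.bin"
    sample.write_bytes(b"abc")
    print(md5_of(sample))  # 900150983cd24fb0d6963f7d28e17f72
```

For the real download, call verify(Path("archivematica-0.6-100614-vbox.tgz")) from the directory it was saved to.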

#10 Updated by Anonymous about 10 years ago

Austin, thanks for the new VirtualBox image. It should be sufficient for me now.

#11 Updated by Joseph Perry almost 10 years ago

Created /p/archivematica/issues/detail?id=175 to separately address the free-space problems with bagging and converting.

My thoughts on addressing the out-of-space issue while normalizing:
I don't think we can automate a way for the OS to detect that the system is out of space and take appropriate action.
As an alternative, I suggest we use a script to check how much free space there is on the system drive and, if it is below a set threshold, delete the normalized file (not the original) and then log the error.

#12 Updated by Joseph Perry almost 10 years ago

#This little bit of Python code gets the free space on the drive, in bytes.
#This will be useful.

import os

hd = os.statvfs("/")
# f_bavail is counted in fragments of f_frsize bytes (not f_bsize).
freeSpace = hd.f_frsize * hd.f_bavail

#This can be used in conjunction with freeSpace to delete the file if space on the disk is too low.
os.remove('path/file.extension')

#Once the file is deleted, there should be room for the problem to be logged.

#13 Updated by Joseph Perry over 9 years ago

Implemented a means to check the free space once the normalized and access copies have been created. It deletes the file if there isn't enough space and returns a non-zero exit status.

This needs to be tested.
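The behaviour described above (check space, delete the normalized copy if too little is left, log, return non-zero) might look like the following sketch. The threshold value and function name are hypothetical; it checks the filesystem that holds the normalized file rather than "/" as in the earlier snippet, which matters if output is written to a different mount. shutil.disk_usage reports the same f_bavail * f_frsize figure:

```python
import logging
import os
import shutil
import tempfile

log = logging.getLogger("normalize")

MIN_FREE_BYTES = 100 * 1024 * 1024  # hypothetical threshold; tune per deployment


def check_space_after_normalization(normalized_path: str,
                                    min_free: int = MIN_FREE_BYTES) -> int:
    """Return 0 if enough space remains; otherwise delete the normalized copy,
    log the problem, and return 1. The original file is never touched."""
    target_dir = os.path.dirname(os.path.abspath(normalized_path))
    free = shutil.disk_usage(target_dir).free
    if free >= min_free:
        return 0
    os.remove(normalized_path)  # free room first so the error can be logged
    log.error("Only %d bytes free after normalizing; removed %s", free, normalized_path)
    return 1


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    path = os.path.join(tempfile.mkdtemp(), "access-copy.mp4")
    open(path, "wb").close()
    print(check_space_after_normalization(path))  # 0 on a disk with >= 100 MiB free
```

Deleting before logging follows the reasoning in the earlier comment: once the file is gone, there should be room for the log write.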

[g] New owner: epmclellan

#14 Updated by Evelyn McLellan over 9 years ago

  • Status changed from New to Verified
