Bug #10226
MCP server error when normalization tool's stderr output contains non-unicode characters
Status: | QA/Review | Start date: | 08/22/2016 | |
---|---|---|---|---|
Priority: | Medium | Due date: | ||
Assignee: | Sarah Romkey | % Done: | 0% | |
Category: | - | |||
Target version: | - | |||
Google Code Legacy ID: | Pull Request: | https://github.com/artefactual/archivematica/pull/497 | ||
Sponsored: | No | Requires documentation: |
Description
Ref. #10212 notes 3 and 5
archivematica does not check/sanitize the stderr/stdout from the normalization tool before attempting to save to the Tasks table, breaking when it contains non UTF-8 characters
History
#1 Updated by Hector Akamine almost 6 years ago
- Subject changed from MCP server error when normalization tool stderr output contains non-unicode characters to MCP server error when normalization tool's stderr output contains non-unicode characters
#5 Updated by Hector Akamine almost 6 years ago
- Assignee set to Nick Wilkinson
Hi Nick, this is related to #10212 (MoMA, this problem potentially breaks their automation workflow). Could you please assign a developer to estimate how much effort would take to fix it?
#6 Updated by Nick Wilkinson almost 6 years ago
- Assignee changed from Nick Wilkinson to Jesús García Crespo
#7 Updated by Hector Akamine almost 6 years ago
I uploaded the .wav file mentioned in https://projects.artefactual.com/issues/10212#note-5 to tamias:/home/artefactual/clients/moma/issue-10226/
#8 Updated by Jesús García Crespo almost 6 years ago
I was able to reproduce in stable/1.5.x:
ffprobe version 3.0-7:3.0-2~14.04 Copyright (c) 2007-2016 the FFmpeg developers built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.1) configuration: --arch=amd64 --enable-pthreads --enable-runtime-cpudetect --extra-version='7:3.0-2~14.04' --libdir=/usr/lib/x86_64-linux-gnu --disable-stripping --prefix=/usr --enable-bzlib --enable-libdc1394 --enable-frei0r --enable-gnutls --enable-libgsm --enable-libmp3lame --enable-librtmp --enable-libopencv --enable-libopenjpeg --enable-libpulse --enable-libschroedinger --enable-libspeex --enable-libtheora --enable-vaapi --enable-vdpau --enable-libvorbis --enable-libvpx --enable-zlib --enable-gpl --enable-postproc --enable-libcdio --enable-x11grab --enable-libx264 --shlibdir=/usr/lib/x86_64-linux-gnu --enable-shared --disable-static avcodec configuration: --arch=amd64 --enable-pthreads --enable-runtime-cpudetect --extra-version='7:3.0-2~14.04' --libdir=/usr/lib/x86_64-linux-gnu --disable-stripping --prefix=/usr --enable-bzlib --enable-libdc1394 --enable-frei0r --enable-gnutls --enable-libgsm --enable-libmp3lame --enable-librtmp --enable-libopencv --enable-libopenjpeg --enable-libpulse --enable-libschroedinger --enable-libspeex --enable-libtheora --enable-vaapi --enable-vdpau --enable-libvorbis --enable-libvpx --enable-zlib --enable-gpl --enable-postproc --enable-libcdio --enable-x11grab --enable-libx264 --shlibdir=/usr/lib/x86_64-linux-gnu --enable-shared --disable-static --enable-libopencore-amrnb --enable-version3 --enable-libopencore-amrwb --enable-version3 --enable-libvo-amrwbenc --enable-version3 libavutil 55. 17.103 / 55. 17.103 libavcodec 57. 24.102 / 57. 24.102 libavformat 57. 25.100 / 57. 25.100 libavdevice 57. 0.101 / 57. 0.101 libavfilter 6. 31.100 / 6. 31.100 libswscale 4. 0.100 / 4. 0.100 libswresample 2. 0.101 / 2. 0.101 libpostproc 54. 0.100 / 54. 0.100 Input #0, wav, from '/var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/extractPackagesChoice/Prueba005-46c72ea9-3b9f-41ae-bca0-1b1a35ec52ca/objects/(PaulStretch_massappeal).wav': Metadata: title : (PaulStretch_massappeal) NITR : NTKB? Duration: 00:28:32.24, bitrate: 1411 kb/s Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s [xml @ 0x2056e40] 3 invalid UTF-8 sequence(s) found in string 'NTKB?', replaced with '�'
ERROR 2016-08-24 21:34:22 archivematica.mcp.server:utils:wrapped:15: Uncaught exception Traceback (most recent call last): File "/vagrant/src/archivematica/src/MCPServer/lib/utils.py", line 13, in wrapped return fn(*args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/django_mysqlpool/__init__.py", line 10, in wrapper return f(*args, **kwargs) File "/vagrant/src/archivematica/src/MCPServer/lib/taskStandard.py", line 88, in performTask self.check_request_status(completed_job_request) File "/vagrant/src/archivematica/src/MCPServer/lib/taskStandard.py", line 97, in check_request_status self.linkTaskManager.taskCompletedCallBackFunction(self) File "/vagrant/src/archivematica/src/MCPServer/lib/linkTaskManagerFiles.py", line 144, in taskCompletedCallBackFunction databaseFunctions.logTaskCompletedSQL(task) File "/usr/lib/archivematica/archivematicaCommon/databaseFunctions.py", line 224, in logTaskCompletedSQL task.save() File "/usr/local/lib/python2.7/dist-packages/django/db/models/base.py", line 589, in save force_update=force_update, update_fields=update_fields) File "/usr/local/lib/python2.7/dist-packages/django/db/models/base.py", line 617, in save_base updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields) File "/usr/local/lib/python2.7/dist-packages/django/db/models/base.py", line 679, in _save_table forced_update) File "/usr/local/lib/python2.7/dist-packages/django/db/models/base.py", line 723, in _do_update return filtered._update(values) > 0 File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 600, in _update return query.get_compiler(self.db).execute_sql(CURSOR) File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 1005, in execute_sql cursor = super(SQLUpdateCompiler, self).execute_sql(result_type) File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 787, in execute_sql cursor.execute(sql, params) File "/usr/local/lib/python2.7/dist-packages/django/db/backends/utils.py", line 65, in execute return self.cursor.execute(sql, params) File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/base.py", line 129, in execute return self.cursor.execute(query, args) File "/usr/local/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 228, in execute if not self._defer_warnings: self._warning_check() File "/usr/local/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 127, in _warning_check warn(w[-1], self.Warning, 3) Warning: Invalid utf8 character string: 'F59D3F'
#9 Updated by Jesús García Crespo almost 6 years ago
- Assignee changed from Jesús García Crespo to Holly Becker
This solution is working for me but I don't now if it's the right thing to do.
diff --git a/src/archivematicaCommon/lib/databaseFunctions.py b/src/archivematicaCommon/lib/databaseFunctions.py index 61790a6..c7ef0aa 100755 --- a/src/archivematicaCommon/lib/databaseFunctions.py +++ b/src/archivematicaCommon/lib/databaseFunctions.py @@ -219,8 +219,8 @@ def logTaskCompletedSQL(task): task = Task.objects.get(taskuuid=taskUUID) task.endtime = getUTCDate() task.exitcode = exitCode - task.stdout = stdOut - task.stderror = stdError + task.stdout = deUnicode(stdOut, errors='ignore') + task.stderror = deUnicode(stdOut, errors='ignore') task.save() def logJobCreatedSQL(job): @@ -314,7 +314,7 @@ def getAccessionNumberFromTransfer(UUID): raise ValueError("No Transfer found for UUID: {}".format(UUID)) -def deUnicode(str): +def deUnicode(str, errors=None): """ Convert a unicode string into an str by encoding it using UTF-8. @@ -323,4 +323,7 @@ def deUnicode(str): """ if str == None: return None - return unicode(str).encode('utf-8') + kwargs = {} + if errors is not None: + kwargs['errors'] = errors + return unicode(str, **kwargs).encode('utf-8')
Thoughts?
#10 Updated by Holly Becker over 5 years ago
- Status changed from New to Feedback
- Assignee changed from Holly Becker to Jesús García Crespo
Encoding the stdout/stderr looks like a good idea. However, I'd suggest using archivematicaFunctions.unicodetoStr
instead of deUnicode. It looks like deUnicode is actually buggy (unicode(str_with_encoded_unicode)
fails), so replacing it entirely with unicodeToStr updated to handle errors would be great.
#11 Updated by Jesús García Crespo over 5 years ago
- Status changed from Feedback to In progress
I'll try that, thanks.
#12 Updated by Nick Wilkinson over 5 years ago
- Assignee changed from Jesús García Crespo to Holly Becker
Hey Holly, reassigning to you but let me know if you think this should stay with Jesús based on progress to date.
#14 Updated by Holly Becker over 5 years ago
- Status changed from In progress to QA/Review
- Assignee changed from Holly Becker to Nick Wilkinson
- Pull Request set to https://github.com/artefactual/archivematica/pull/497
This is the same bug as #9967, where the log output needs to handle (or ignore) problematic characters. This should be fixed with AMPR https://github.com/artefactual/archivematica/pull/497
Does it need to be backported to stable/1.5.x for hosted/support clients?
#15 Updated by Nick Wilkinson over 5 years ago
- Assignee changed from Nick Wilkinson to Sara Allain
Hi Sara, do you have advice for Holly about her question in note # 14?
#16 Updated by Sara Allain over 5 years ago
- Assignee changed from Sara Allain to Holly Becker
If this issue sometimes causes the MCP server to crash during transfer (as described in #9967) then it should probably be backported. Some of the Archivematica hosted clients haven't been upgraded yet, so they'd benefit from backporting, assuming it's not a huge amount of work.
I discussed quickly with Justin and he suggested backporting to stable/1.5.x for now - we can cut a 1.5.2 branch if the 1.6 release takes too long.
#17 Updated by Holly Becker over 5 years ago
- Assignee changed from Holly Becker to Nick Wilkinson
AMPR 497 merged and backported to stable/1.5.x
#18 Updated by Nick Wilkinson over 5 years ago
- Assignee changed from Nick Wilkinson to Sara Allain
#19 Updated by Sara Allain over 5 years ago
- Assignee changed from Sara Allain to Sarah Romkey
Sarah, passing to you for testing when you get a chance!