Bug #7171
Extent tags in EAD with ampersands in them crash XML export
Status: | Verified | Start date: | 08/29/2014 | |
---|---|---|---|---|
Priority: | Medium | Due date: | ||
Assignee: | Dan Gillean | % Done: | 0% | |
Category: | EAD | |||
Target version: | Release 2.1.0 | |||
Google Code Legacy ID: | Tested version: | 2.0.0, 2.0.1, 2.1 | ||
Sponsored: | No | Requires documentation: |
Description
If the extent tag has a & in it, it'll crap out half way through the export and just spew XML to stdout.
I presume other "bad" characters would crash this also. The solution was to call esc_specialchars() on the extent data before parsing it for display.
Related issues
History
#1 Updated by Mike Gale over 7 years ago
- Status changed from New to QA/Review
- Assignee changed from Mike Gale to Dan Gillean
Committed a fix to the general AtoM repo: 9937f0ed044565a25a996fffbeb4424c1ce0c0b8
#2 Updated by Dan Gillean over 7 years ago
- Status changed from QA/Review to New
- Assignee changed from Dan Gillean to Mike Gale
- Tested version 2.0.0, 2.0.1, 2.1 added
I've related this to a bunch of other issues - notably #6949 for the moment - but as you can see, this is affecting all of our metadata exchange formats. It would be nice to come up with a more global solution for escaping these and angle brackets ( < and > will also break export) and/or replacing them with encoding values so they still display properly, etc.
#3 Updated by Dan Gillean over 7 years ago
- Status changed from New to QA/Review
- Assignee changed from Mike Gale to Dan Gillean
oops, simultaneous update
#4 Updated by Dan Gillean over 7 years ago
- Status changed from QA/Review to Feedback
Welp,
In my local testing, the ampersand is not being escaped so much as just removed entirely. Tried to import the following:
<physdesc> <extent encodinganalog="3.1.5">22 pp of textual records & other stuff. </extent> </physdesc>
What I got, in both view and edit mode, was the extent with the ampersand completely removed. This is perhaps better than breaking the import entirely, but the user might not even notice that the character has been removed, making the sentence confusing and incomplete.
I don't know what's possible, but ideally, an ampersand would just be replaced with the html code for adding one, e.g.
&
Testing with brackets found that the opening < bracket was removed, while the closing bracket > was successfully imported.
Warnings were thrown in all cases despite the successful import - but these warnings aren't verbose or clear enough for an end-user to understand that special characters were removed.
Willing to let this in as is, but would like to see this improved in the future if possible.
#5 Updated by Dan Gillean over 7 years ago
Another quick update:
An unintended consequence of escaping this field is that it affects the nested tags we were using to create struction for multiple fields (such as extent) nested in the physdesc element - the <dt> and <dd> tags are also being escaped:
Haven't yet tried to roundtrip this file, but the results might be unpredictable.
#6 Updated by Mike Gale over 7 years ago
Hey Dan, to clarify this fix was for exporting EAD was being broken by &s. Not importing.
I think this may be a separate bug unrelated to my fix?
#7 Updated by Mike Gale over 7 years ago
- Status changed from Feedback to In progress
- Assignee changed from Dan Gillean to Mike Gale
Possible regression with the <>s being escaped for dd and dt tags in the extent field
#8 Updated by Dan Gillean over 7 years ago
- Status changed from In progress to Feedback
Not sure if this is caused by your recent fix on this issue as well or not, but strangely, <eadid> element tags are being escaped:
Don't see this happening with any other elements. Not sure why this one is being affected.
#9 Updated by Mike Gale over 7 years ago
I'm pretty sure I broke the eadid tag when I fixed #6949. It was dumb of me to escape the entire tag (which of course has escapable characters like < and > in it) instead of just the problem identifier. Now we can both say we did something silly this ticket, Dan :P
Working on a fix now.
#10 Updated by Mike Gale over 7 years ago
- File 7171-fix.patch
added
Attaching a GIT patch since my SSH keys aren't working on this new work VM I setup at home..
#11 Updated by Jesús García Crespo over 7 years ago
- Status changed from Feedback to QA/Review
- Assignee changed from Mike Gale to Dan Gillean
Ok, I've just applied your patch.
#12 Updated by Sarah Romkey over 7 years ago
- Status changed from QA/Review to Verified
Verified! No more crapping out.