Extent tags in EAD with ampersands in them crash XML export
Assignee: Dan Gillean
Target version: Release 2.1.0
Tested version: 2.0.0, 2.0.1, 2.1
If the extent tag has an & in it, the export will crap out halfway through and just spew XML to stdout.
I presume other "bad" characters would crash this also. The solution was to call esc_specialchars() on the extent data before parsing it for display.
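A minimal sketch of why the escaping matters, written in Python rather than in the project's own code. `escape_extent` is a hypothetical helper, and `xml.sax.saxutils.escape` stands in for the escaping call mentioned above; neither is taken from the actual fix.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_extent(value: str) -> str:
    """Hypothetical helper: escape &, < and > in extent text."""
    return escape(value)


raw = "22 pp of textual records & other stuff."

# Unescaped text makes the document ill-formed, so the parser rejects it.
try:
    ET.fromstring(f"<extent>{raw}</extent>")
    parsed_unescaped = True
except ET.ParseError:
    parsed_unescaped = False

# Escaped text parses, and the parser hands back the original string.
element = ET.fromstring(f"<extent>{escape_extent(raw)}</extent>")
print(parsed_unescaped, element.text)
```

The same reasoning applies to < and >, which `escape` also handles.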
#2 Updated by Dan Gillean over 7 years ago
- Status changed from QA/Review to New
- Assignee changed from Dan Gillean to Mike Gale
- Tested version 2.0.0, 2.0.1, 2.1 added
I've related this to a bunch of other issues (notably #6949 for the moment), but as you can see, this is affecting all of our metadata exchange formats. It would be nice to come up with a more global solution for escaping ampersands and angle brackets (< and > will also break export) and/or replacing them with their encoded values so they still display properly.
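One way a more global approach could look, sketched in Python as an illustration only: build the export through an XML library instead of concatenating strings, so every text value is escaped on serialization. `build_physdesc` and the sample extent string are hypothetical and are not part of the application's templates.

```python
import xml.etree.ElementTree as ET


def build_physdesc(extent: str) -> str:
    """Hypothetical export helper: the value is set as data, never pasted in as markup."""
    physdesc = ET.Element("physdesc")
    node = ET.SubElement(physdesc, "extent", {"encodinganalog": "3.1.5"})
    node.text = extent
    # The serializer escapes &, < and > in text content on its own.
    return ET.tostring(physdesc, encoding="unicode")


xml_out = build_physdesc("5 m of records & <oversize> maps")
print(xml_out)
```

Because the escaping happens in one place, individual fields no longer need their own fix.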
#4 Updated by Dan Gillean over 7 years ago
- Status changed from QA/Review to Feedback
In my local testing, the ampersand is not being escaped so much as just removed entirely. Tried to import the following:
<physdesc> <extent encodinganalog="3.1.5">22 pp of textual records & other stuff. </extent> </physdesc>
What I got, in both view and edit mode, was the extent with the ampersand completely removed. This is perhaps better than breaking the import entirely, but the user might not even notice that the character has been removed, making the sentence confusing and incomplete.
I don't know what's possible, but ideally, an ampersand would just be replaced with the HTML code for adding one, e.g. &amp;
Testing with brackets found that the opening < bracket was removed, while the closing bracket > was successfully imported.
Warnings were thrown in all cases despite the successful import - but these warnings aren't verbose or clear enough for an end-user to understand that special characters were removed.
Willing to let this in as is, but would like to see this improved in the future if possible.
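A rough sketch of the behaviour suggested above, assuming the import could pre-process the file before parsing: bare ampersands are rewritten as `&amp;` instead of being dropped. `escape_bare_ampersands` and its regular expression are illustrative assumptions, not the importer's actual code, and the sketch does not attempt to repair stray < characters.

```python
import re
import xml.etree.ElementTree as ET

# Matches an & that does not already start an entity or character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Hypothetical pre-import step: turn bare & into &amp;, leave existing references alone."""
    return BARE_AMP.sub("&amp;", xml_text)


source = (
    '<physdesc><extent encodinganalog="3.1.5">'
    "22 pp of textual records & other stuff."
    "</extent></physdesc>"
)
repaired = escape_bare_ampersands(source)
extent = ET.fromstring(repaired).find("extent").text
print(extent)
```

With this step the imported extent keeps its ampersand, so the sentence stays complete.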
#5 Updated by Dan Gillean over 7 years ago
Another quick update:
An unintended consequence of escaping this field is that it affects the nested tags we were using to create structure for multiple fields (such as extent) nested in the physdesc element - the <dt> and <dd> tags are also being escaped:
Haven't yet tried to roundtrip this file, but the results might be unpredictable.
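To illustrate why a roundtrip could behave differently, here is a small Python sketch (the sample values are made up) comparing what a parser sees when the nested tags are left as markup and when they have been escaped.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

inner = "<dt>Extent</dt><dd>22 pp</dd>"  # hypothetical nested content

as_markup = ET.fromstring(f"<physdesc>{inner}</physdesc>")
as_escaped = ET.fromstring(f"<physdesc>{escape(inner)}</physdesc>")

# With real markup the parser finds child elements and no direct text.
print(len(as_markup), as_markup.text)
# After escaping there are no children; the tags arrive as literal text.
print(len(as_escaped), as_escaped.text)
```

On re-import, the escaped version would therefore come back as one text string containing the tag characters rather than as separate fields.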
#8 Updated by Dan Gillean over 7 years ago
- Status changed from In progress to Feedback
Not sure if this is caused by your recent fix on this issue as well or not, but strangely, <eadid> element tags are being escaped:
Don't see this happening with any other elements. Not sure why this one is being affected.
#9 Updated by Mike Gale over 7 years ago
I'm pretty sure I broke the eadid tag when I fixed #6949. It was dumb of me to escape the entire tag (which of course has escapable characters like < and > in it) instead of just the problem identifier. Now we can both say we did something silly on this ticket, Dan :P
Working on a fix now.
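The difference described in this comment can be sketched in a few lines of Python. Both function names and the sample identifier are hypothetical; this is not the actual patch.

```python
from xml.sax.saxutils import escape


def eadid_whole_tag_escaped(identifier: str) -> str:
    # The mistake: the markup itself is escaped along with the value.
    return escape(f"<eadid>{identifier}</eadid>")


def eadid_value_escaped(identifier: str) -> str:
    # The intended behaviour: only the identifier text is escaped.
    return f"<eadid>{escape(identifier)}</eadid>"


print(eadid_whole_tag_escaped("F1 & F2"))
print(eadid_value_escaped("F1 & F2"))
```

Only the second form leaves a real eadid element in the exported document.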