Bug #8475
Nested EAD elements in <physdesc> should not be escaped
| Status: | Verified | Start date: | 05/22/2015 |
|---|---|---|---|
| Priority: | Medium | Due date: | |
| Assignee: | Dan Gillean | % Done: | 0% |
| Category: | EAD | | |
| Target version: | Release 2.2.0 | | |
| Google Code Legacy ID: | | Tested version: | 2.2 |
| Sponsored: | No | Requires documentation: | |
Description
Background
In 2.2, we are adding the ability for archivists to embed nested EAD elements inside the Extent and medium / Physical description field, to help reconcile the granularity of EAD with the big bucket approach of the content standards (like ISAD, RAD, and DACS).
Theoretically, users can embed the following elements inside the Extent and medium field:
<extent></extent> <dimensions></dimensions> <genreform></genreform> <physfacet></physfacet>
When saved, AtoM will hide the nested EAD elements from the display page. This allows more granular EAD during export, without sacrificing the presentation for end-users.
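As a rough illustration of the display-side behavior described above, here is a minimal Python sketch. It assumes that "hiding" means removing the four nested tags while keeping their text content; the function name `strip_nested_ead_tags` and the sample value are hypothetical and are not taken from the AtoM codebase.

```python
import re

# Element names come from the ticket; everything else here is hypothetical.
NESTED_EAD_TAGS = ("extent", "dimensions", "genreform", "physfacet")

# Matches opening and closing forms of the listed tags (no attributes).
_TAG_RE = re.compile(r"</?(?:%s)\s*>" % "|".join(NESTED_EAD_TAGS))


def strip_nested_ead_tags(value: str) -> str:
    """Remove the listed EAD tags but keep the text they wrap."""
    return _TAG_RE.sub("", value)


stored = "<extent>12 photographs</extent> ; <dimensions>20 x 25 cm</dimensions>"
print(strip_nested_ead_tags(stored))
```

Tags outside the list are left untouched, so this only affects the elements the ticket names.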
We have also added escaping to the field, to avoid export errors when special characters like unencoded ampersands are included.
Error encountered
Right now, if nested tags are present during export, they are escaped along with any other unencoded special characters.
On roundtrip, this means that the nesting no longer works. When re-imported, AtoM interprets the escaped content as plain text, resulting in something like this:
<physdesc encodinganalog="3.1.5"> Extent and medium, ISAD 3.1.5. Can include nested tags like:<lb/><lb/> &lt;dimensions&gt;Dimensions&lt;/dimensions&gt;<lb/><lb/> &lt;extent&gt;Extent&lt;/extent&gt;<lb/><lb/> &lt;physfacet&gt;Physical facet&lt;/physfacet&gt;<lb/><lb/> &lt;genreform&gt;Genre form&lt;/genreform&gt; </physdesc>
Expected behavior
AtoM should not escape the EAD elements listed above, or the <lb/> element used to preserve line-breaks.
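One way to picture the expected behavior is "escape everything, then restore the allowed tags". The Python sketch below illustrates that idea only. It is not the fix that was applied to AtoM, and the name `escape_physdesc` is hypothetical. It also assumes the allowed tags carry no attributes.

```python
import re
from xml.sax.saxutils import escape

# The four nested elements from the ticket, plus <lb/> for line breaks.
ALLOWED_TAGS = ("extent", "dimensions", "genreform", "physfacet", "lb")

# Escaped opening, closing, or self-closing forms of the allowed tags.
_ESCAPED_TAG_RE = re.compile(r"&lt;(/?)(%s)\s*(/?)&gt;" % "|".join(ALLOWED_TAGS))


def escape_physdesc(value: str) -> str:
    """Escape special characters, then restore the allowed EAD tags."""
    escaped = escape(value)  # & -> &amp;, < -> &lt;, > -> &gt;
    return _ESCAPED_TAG_RE.sub(r"<\1\2\3>", escaped)


print(escape_physdesc("<extent>2 boxes</extent> & <b>more</b><lb/>"))
```

In this sketch the unencoded ampersand and the unlisted `<b>` tag are still escaped, while `<extent>` and `<lb/>` pass through as markup.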
Related issues
History
#1 Updated by Mike Gale almost 7 years ago
- Blocked by Bug #8508: $options is sometimes undefined when rendering EAD, crashes AtoM added
#2 Updated by Mike Gale almost 7 years ago
- File 0001-Fix-physdesc-element-escaping-8475.patch added
- Status changed from New to Code Review
- Assignee changed from Mike Gale to José Raddaoui Marín
Boy do I feel silly after this one. I wasted a bunch of time because the XDebug module for PHP 'decorates' var_dump output with HTML, and it was doing its own escaping and confusing the heck out of me until I finally realized what was going on. The solution was also embarrassingly simple: stop parsing the node manually (I think this was leftover code from when we were reworking this field a bunch of times before), remove that code, and just use setExtentAndMedium with $nodeValue.
It seems to round trip well now.
#3 Updated by Mike Gale almost 7 years ago
Other note: the cause was that we were parsing the node manually (with saveXML()) instead of just using nodeValue. Apparently that means $doc->substituteEntities = true; doesn't apply, which is why the nested elements were not being 'unescaped' again on import.
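The difference between reading a node's text value and re-serializing the node can be shown with a small analogue. This sketch uses Python's `xml.etree.ElementTree` rather than PHP's DOM extension, so it only mirrors the general behavior noted above; the sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET

# An exported <physdesc> in which the nested tag was escaped on export.
exported = (
    '<physdesc encodinganalog="3.1.5">'
    "&lt;extent&gt;2 boxes&lt;/extent&gt; &amp; 1 folder"
    "</physdesc>"
)

node = ET.fromstring(exported)

# Text value: the parser has already decoded the entities.
text_value = node.text

# Re-serialized markup: special characters in the text are escaped again.
serialized = ET.tostring(node, encoding="unicode")

print(text_value)
print(serialized)
```

Here the text value holds the nested tag as literal markup, roughly what using nodeValue gives on import, while the serialized form keeps the entities escaped, which matches the symptom seen with saveXML().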
#4 Updated by Mike Gale almost 7 years ago
- File deleted (0001-Fix-physdesc-element-escaping-8475.patch)
#5 Updated by Mike Gale almost 7 years ago
#6 Updated by José Raddaoui Marín almost 7 years ago
- Status changed from Code Review to In progress
- Assignee changed from José Raddaoui Marín to Mike Gale
Looks good!
#7 Updated by Mike Gale almost 7 years ago
- Status changed from In progress to QA/Review
- Assignee changed from Mike Gale to Dan Gillean
#8 Updated by Dan Gillean almost 7 years ago
- Status changed from QA/Review to Verified
Interesting. So everything still looks escaped (as in the issue ticket description example above, where the <lb> elements are mysteriously not escaped, but everything else is)... but upon re-import, they are "unescaped", and display properly in both the view and edit templates.
Good enough for me!