Bug #8475

Nested EAD elements in <physdesc> should not be escaped

Added by Dan Gillean about 7 years ago. Updated almost 7 years ago.

Status: Verified
Start date: 05/22/2015
Priority: Medium
Due date:
Assignee: Dan Gillean
% Done: 0%
Category: EAD
Target version: Release 2.2.0
Google Code Legacy ID:
Tested version: 2.2
Sponsored: No
Requires documentation:

Description

Background

In 2.2, we are adding the ability for archivists to embed nested EAD elements inside the Extent and medium / Physical description field, to help reconcile the granularity of EAD with the "big bucket" approach of content standards such as ISAD, RAD, and DACS.

Theoretically, users can embed the following elements inside the Extent and medium field:

<extent></extent>
<dimensions></dimensions>
<genreform></genreform>
<physfacet></physfacet>

When saved, AtoM will hide the nested EAD elements from the display page. This allows more granular EAD during export, without sacrificing the presentation for end-users.

We have also added escaping to the field, to avoid export errors when special characters like unencoded ampersands are included.

Error encountered
Right now, if nested tags are present during export, they are escaped along with any other unencoded special characters.

On a round trip, this means the nesting no longer works: when the file is re-imported, AtoM interprets the escaped content as plain text, resulting in something like this:

<physdesc encodinganalog="3.1.5"> Extent and medium, ISAD 3.1.5. Can include
        nested tags like:&lt;lb/&gt;<lb/>
        &lt;dimensions&gt;Dimensions&lt;/dimensions&gt;&lt;lb/&gt;<lb/>
        &lt;extent&gt;Extent&lt;/extent&gt;&lt;lb/&gt;<lb/>
        &lt;physfacet&gt;Physical facet&lt;/physfacet&gt;&lt;lb/&gt;<lb/>
        &lt;genreform&gt;Genre form&lt;/genreform&gt;    
</physdesc>
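The symptom can be reproduced with any export step that escapes the whole field as plain text. The Python sketch below is only an analogue (AtoM itself is PHP, and its actual escaping code is not shown in this ticket); it uses the standard library's `xml.sax.saxutils.escape` as a stand-in, and the field value is made up for illustration. The nested `<extent>` tags come out as `&lt;extent&gt;`, just like the export above.

```python
from xml.sax.saxutils import escape

# Illustrative field value, as an archivist might enter it, with a nested EAD tag
# and an unencoded ampersand.
field = "Two boxes & a folder <extent>2 boxes</extent>"

# escape() encodes &, < and > everywhere, so the nested markup is flattened to text.
exported = "<physdesc>" + escape(field) + "</physdesc>"
print(exported)
# <physdesc>Two boxes &amp; a folder &lt;extent&gt;2 boxes&lt;/extent&gt;</physdesc>
```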

Expected behavior
AtoM should not escape the EAD elements listed above, or the <lb/> element used to preserve line-breaks.
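One way to get the expected behavior on export is to escape only the text between an allow-list of tags. The sketch below is a hypothetical Python illustration, not AtoM code and not the change that was eventually made (the history below shows the fix landed on the import side instead). `TAGS`, `ALLOWED_TAG` and `escape_physdesc` are invented names, and the regex assumes the nested elements are simple, non-self-closing tags with optional attributes, plus `<lb/>`.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical allow-list: the nested elements named in this ticket.
TAGS = "extent|dimensions|genreform|physfacet"

# Opening tags (optionally with attributes), closing tags, and <lb/> line breaks.
ALLOWED_TAG = re.compile(rf"<(?:{TAGS})(?:\s[^<>]*)?>|</(?:{TAGS})>|<lb\s*/>")


def escape_physdesc(value: str) -> str:
    """Escape &, < and > in plain text, but pass allow-listed tags through as markup."""
    parts = []
    last = 0
    for match in ALLOWED_TAG.finditer(value):
        # Escape the plain text that precedes this allowed tag.
        parts.append(escape(value[last:match.start()]))
        # Keep the allowed tag itself unescaped.
        parts.append(match.group(0))
        last = match.end()
    # Escape whatever text follows the last allowed tag.
    parts.append(escape(value[last:]))
    return "".join(parts)
```

For example, `escape_physdesc("A & B<lb/><extent>2 boxes</extent>")` returns `A &amp; B<lb/><extent>2 boxes</extent>`, while an unlisted tag such as `<b>` is escaped. A regex is a deliberate shortcut here: it does not check that tags are balanced, and attribute values inside allowed tags are passed through unescaped.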

0001-Fix-physdesc-element-escaping-8475.patch (1.7 KB) Mike Gale, 06/11/2015 01:29 AM


Related issues

Blocked by Access to Memory (AtoM) - Bug #8508: $options is sometimes undefined when rendering EAD, crash... Verified 06/02/2015

History

#1 Updated by Mike Gale almost 7 years ago

  • Blocked by Bug #8508: $options is sometimes undefined when rendering EAD, crashes AtoM added

#2 Updated by Mike Gale almost 7 years ago

  • File 0001-Fix-physdesc-element-escaping-8475.patch added
  • Status changed from New to Code Review
  • Assignee changed from Mike Gale to José Raddaoui Marín

Boy, do I feel silly after this one. I wasted a bunch of time because the Xdebug module for PHP 'decorates' var_dump output with HTML, and it was doing its own escaping and confusing the heck out of me until I finally realized what was going on. The solution was also embarrassingly simple: stop parsing the node manually (I think this was leftover code from when we were messing with this field a bunch of times before), remove that code, and just call setExtentAndMedium with $nodeValue.

It seems to round trip well now.

#3 Updated by Mike Gale almost 7 years ago

Other note: the cause was that we parsed the node manually (with saveXML()) instead of just using nodeValue. Apparently that means $doc->substituteEntities = true; doesn't apply, which is why the nested elements were not being 'unescaped' again on import.
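The difference can be illustrated outside PHP. In this Python `xml.dom.minidom` analogue (an assumption: it mirrors, rather than reproduces, PHP's DOM behavior), joining the element's text nodes plays the role of `$nodeValue`, which in PHP's DOM extension returns an element's text content, while calling `toxml()` on each child plays the role of `saveXML()`. The parser decodes `&lt;` into `<` in both cases, but re-serializing encodes it again, so only the text-content route gives back the nested tags.

```python
from xml.dom import minidom

# An exported <physdesc> whose nested tags were escaped on export.
xml = "<physdesc>Size: &lt;extent&gt;2 boxes&lt;/extent&gt;</physdesc>"
physdesc = minidom.parseString(xml).documentElement

# Analogue of reading nodeValue: the text data, with entities already decoded.
text_value = "".join(
    child.data for child in physdesc.childNodes if child.nodeType == child.TEXT_NODE
)

# Analogue of re-serializing the children, as saveXML() does: the decoded
# "<" and ">" characters are encoded again, so the tags stay escaped.
serialized = "".join(child.toxml() for child in physdesc.childNodes)

print(text_value)  # Size: <extent>2 boxes</extent>
print(serialized)  # Size: &lt;extent&gt;2 boxes&lt;/extent&gt;
```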

#4 Updated by Mike Gale almost 7 years ago

  • File deleted (0001-Fix-physdesc-element-escaping-8475.patch)

#6 Updated by José Raddaoui Marín almost 7 years ago

  • Status changed from Code Review to In progress
  • Assignee changed from José Raddaoui Marín to Mike Gale

Looks good!

#7 Updated by Mike Gale almost 7 years ago

  • Status changed from In progress to QA/Review
  • Assignee changed from Mike Gale to Dan Gillean

#8 Updated by Dan Gillean almost 7 years ago

  • Status changed from QA/Review to Verified

Interesting. So everything still looks escaped on export (as in the example in the ticket description above, where the <lb/> elements are mysteriously not escaped but everything else is), but upon re-import the elements are "unescaped" and display properly in both the view and edit templates.

Good enough for me!
