Bug #6949

Ampersand in identifier breaks EAD export

Added by Evelyn McLellan almost 8 years ago. Updated over 7 years ago.

Status: Verified
Start date: 07/08/2014
Priority: High
Due date:
Assignee: Mike Gale
% Done: 0%
Category: EAD
Target version: Release 2.1.0
Google Code Legacy ID:
Tested version: 2.0.0
Sponsored: No
Requires documentation:

Description

XML Parsing Error: not well-formed
Location: http://2x.test.artefactual.com/carleton-university-bursars-office-fonds;ead?sf_format=xml
Line Number 5, Column 254:

<eadid identifier="http://2x.test.artefactual.com/carleton-university-bursars-office-fonds" countrycode="CA" mainagencycode="CA-OCNCUA" url="http://2x.test.artefactual.com/carleton-university-bursars-office-fonds" encodinganalog="identifier">1995-13 & 1996-42 & 1998-38</eadid> <filedesc>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^
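
The parse failure above can be reproduced outside AtoM with any XML parser. The following is a minimal sketch in Python (not AtoM's own PHP code); the `is_well_formed` helper and the trimmed-down `<eadid>` strings are illustrative assumptions, with the attributes from the report omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Identifier text from the report, with and without escaped ampersands.
# The <eadid> attributes are left out here to keep the example short.
RAW = "<eadid>1995-13 & 1996-42 & 1998-38</eadid>"
ESCAPED = "<eadid>1995-13 &amp; 1996-42 &amp; 1998-38</eadid>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(RAW))      # the bare "&" is rejected by the parser
print(is_well_formed(ESCAPED))  # "&amp;" parses and decodes back to "&"
```

A bare `&` starts an entity reference in XML, so the parser expects a name and a semicolon to follow it; the escaped form is read back as the original identifier text.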


Related issues

Related to Access to Memory (AtoM) - Bug #7171: Extent tags in EAD with ampersands in them crash XML export Verified 08/29/2014
Related to Access to Memory (AtoM) - Bug #5984: Parsing error in DC export when institution name contains... Verified 11/16/2013
Related to Access to Memory (AtoM) - Bug #5588: Unencoded ampersand in ISDIAH record (authorized form of ... Verified 09/14/2013
Related to Access to Memory (AtoM) - Bug #7211: Ampersands in extent, title and identifier tags are strip... Won't fix 09/08/2014

History

#1 Updated by Zachary Howarth-Schueler almost 8 years ago

An example of an unreferenced entity (also an ampersand) in a note also breaking EAD export:
https://memoryns.ca/helen-creighton-fonds
The export runs, but produces "bad" XML.

#2 Updated by Dan Gillean almost 8 years ago

  • Tested version 2.0.0, 2.0.1 added

#3 Updated by Tim Hutchinson almost 8 years ago

See also #5984 and #5588 (re ISDIAH record in those cases). Similar issue for MODS and DC was fixed in #3809.

But the original report seems to be about identifier, not title. I was able to reproduce for identifier but not title.

#4 Updated by Dan Gillean over 7 years ago

  • Subject changed from Ampersand in title breaks EAD export to Ampersand in identifier breaks EAD export

Yup, the original report was triggered by an ampersand in the identifier, not the title. Changed the ticket title to reflect this. Note that changing the ampersands to HTML-encoded ampersands fixed the export.

#5 Updated by Mike Gale over 7 years ago

  • Status changed from New to QA/Review
  • Assignee changed from Jesús García Crespo to Dan Gillean

I pushed a fix for this: 8bb3ff980a23bde44404d2f40f53082c333812c5

#6 Updated by Dan Gillean over 7 years ago

  • Status changed from QA/Review to Feedback
  • Assignee changed from Dan Gillean to Mike Gale

See my notes on #7171 - same solution used here. If an ampersand is encountered, it is stripped out completely. The warning the user is given may not make this clear to them:

libxml error 68 on line 5 in input file: xmlParseEntityRef: no name

According to Mike, the fix should replace the ampersand with its HTML code equivalent, which does display correctly in the browser. However, during local testing, I found that the ampersand was stripped out entirely instead.

Will test again in a different environment to make sure it's not some strange issue with my local environment. Mike to look into it on Monday.
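
To illustrate the difference between the two behaviours discussed in this comment (replacing the ampersand versus stripping it), here is a rough Python sketch. It is not the code from the commit above, which is not shown in this ticket; the function names and the regular expression are assumptions made for illustration only.

```python
import re
import xml.etree.ElementTree as ET

# Matches an "&" that does not already start an entity or character
# reference such as "&amp;", "&#38;" or "&#x26;" (hypothetical pattern).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Replace each bare ampersand with "&amp;", leaving existing references alone."""
    return BARE_AMP.sub("&amp;", text)


def strip_ampersands(text: str) -> str:
    """Remove every ampersand; this loses information from the identifier."""
    return text.replace("&", "")


identifier = "1995-13 & 1996-42 & 1998-38"

escaped = escape_bare_ampersands(identifier)
element = ET.fromstring(f"<eadid>{escaped}</eadid>")
print(element.text == identifier)    # escaping round-trips the identifier

print(strip_ampersands(identifier))  # stripping silently changes the value
```

Escaping keeps the exported identifier identical to the stored one once the XML is parsed, whereas stripping changes the value without telling the user. Note that this sketch leaves named references such as `&nbsp;` untouched, and those would still need to be declared for an XML parser to accept them.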

#7 Updated by Dan Gillean over 7 years ago

  • Status changed from Feedback to Verified

Oops, I was testing on import - which is a different bug, filed and described in #7211. On export, I've re-tested, and this works.

#8 Updated by Misty De Meo over 7 years ago

  • Tested version 2.2 added

#9 Updated by Misty De Meo over 7 years ago

  • File dude.xml added

#10 Updated by Misty De Meo over 7 years ago

  • File deleted (dude.xml)

#11 Updated by Misty De Meo over 7 years ago

  • Tested version 2.0.0 added

Sorry, attached to the wrong issue.
