Bug #13204

An apostrophe in the repository code will cause EAD XML imports to fail

Added by Dan Gillean 12 months ago. Updated 11 months ago.

Status:VerifiedStart date:10/24/2019
Priority:MediumDue date:
Assignee:-% Done:

0%

Category:EAD
Target version:Release 2.6.0
Google Code Legacy ID: Tested version:2.5, 2.6
Sponsored:No Requires documentation:No

Description

Any time that an apostrophe appears in the @repocode value of a descriptions <unittitle> in an EAD XML file, AtoM will throw an error - the job scheduler will fail and will require a full job queue purge to re<start it successfully. This appears to be because AtoM is not escaping the apostrophe, and therefore interpreting it as a break, and what follows is unexpected. Consequently, when using the import-bulk task for example, you will see something like this:

vagrant$ php symfony import:bulk --index /vagrant/isad-ead-crosswalk-2.3-apostrophe-EXPORT.xml
Importing 1 files from /vagrant/isad-ead-crosswalk-2.3-apostrophe-EXPORT.xml (indexing is ENABLED) ...

PHP Parse error:  syntax error, unexpected 's' (T_STRING), expecting ')' in /usr/share/nginx/atom/lib/QubitXmlImport.class.php(671) : eval()'d code on line 1

To reproduce

  • Import an XML file that you know will import successfully. I'm attaching one here (isad-ead-crosswalk-2.3.xml)
  • Confirm that the import works
  • Navigate to the repository record
  • Add an aposotrophe in the repository identifier
  • Return to the description, and export it

Note that so far, AtoM has allowed us to add the aposotrophe to the repository identifier, and it has exported the description successfully. Now:

  • Purge your test environment to ensure there is no interference or pre-existing repositories
  • Import the exported file

I'm also attaching a version with an apostrophe added to the repository code, for testing (isad-ead-crosswalk-2.3-apostrophe.xml)

Resulting error

  • The file fails to import
  • The job scheduler gets caught in a loop and needs to be cleared and reset
  • The error message shown above is shown if the import is from the CLI

Expected result

Ideally, the values inside the double quotes for the @repocode attribute are escaped before processing

isad-ead-crosswalk-2.3.xml Magnifier - WORKING import example (11.1 KB) Dan Gillean, 10/24/2019 05:30 PM

isad-ead-crosswalk-2.3-apostrophe.xml Magnifier - FAILING import, with apostrophe added to repo code attribute in unittitle (11.1 KB) Dan Gillean, 10/24/2019 05:30 PM

History

#1 Updated by Dan Gillean 12 months ago

First reported in the AtoM user forum, 2019-10-23: https://groups.google.com/d/msg/ica-atom-users/Tduk3YXnFs4/5SmOeG5QAgAJ

Note as well:

The same repository identifier also appears in @mainagencycode attribute found in the <eadid>  element in the header of the XML document. However, I couldn't reproduce an error that crashed the job scheduler by adding an apostrophe here - AtoM's job scheduler will output a libxml warning that the code isn't formatted correctly, but it will proceed with the import. 

#2 Updated by Mike Cantelon 12 months ago

  • Status changed from New to Code Review

PR for CR: https://github.com/artefactual/atom/pull/990

With this PR's fix a libxml warning still gets shown. Apparently EAD's DTD defines the repocode attribute as an NMTOKEN and these, from what I read, aren't supposed to have apostrophes in them, but I tried a couple ways of putting in escaped apostrophes (&apos; and &#39;) and still got the warning so not sure how this can be done without a libxml warning.

#4 Updated by Mike Cantelon 11 months ago

  • Status changed from Code Review to QA/Review

Merged into qa/2.6.x.

#5 Updated by Dan Gillean 11 months ago

  • Target version set to Release 2.6.0
  • Requires documentation set to No

This is a fine solution - give a warning, but don't crash the whole import. This is an edgecase anyway and yes, people shouldn't be putting apostrophes in the field, but since the AtoM UI doesn't have any validation on the field or mention of this in the related tooltip, I think this is a great solve. Thanks!

#6 Updated by Dan Gillean 11 months ago

  • Status changed from QA/Review to Verified

Also available in: Atom PDF