Bug #8385

AtoM EAD is not compliant with DTD, causes DTD warnings on roundtrip

Added by Dan Gillean about 7 years ago. Updated almost 7 years ago.

Status: Verified
Start date: 05/04/2015
Priority: High
Due date:
Assignee: Mike Gale
% Done: 0%
Category: EAD
Target version: Release 2.2.0
Google Code Legacy ID:
Tested version: 2.2
Sponsored: No
Requires documentation: Yes

Description

As of AtoM 2.2, we validate our EAD imports against a local copy of the EAD DTD and spit out warnings where the EAD is non-compliant (see for example #2787).

However, with the many new features, and various EAD changes made in 2.2, no one was validating our EAD output before adding new code.

Now, if you roundtrip a file in AtoM, it will spit out a horrible bunch of warnings, depending on the file. One example screenshot is attached.

Examples:

  • The <origination> element is appearing illegally in the <archdesc> element. It should be moved within the <did> of each level to be valid.
  • The <bioghist> element contains a <date> for the dates of existence of the related actor, but <date> isn't allowed directly in <bioghist>. Instead, we would have to nest it, something like <bioghist><note><p><date>.
  • We are including empty <controlaccess></controlaccess> elements in the EAD on export, which we shouldn't: it causes a DTD error on roundtrip.
  • <processinfo> cannot contain <date> directly; if we want to use it, we need it to be <p><date>

There are likely more, but these are the constantly recurring ones. We should keep running our exports against an EAD validator until we're sure our EAD is compliant.
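The four cases listed above can also be checked mechanically between validator runs. The following is a minimal sketch, not AtoM code: the function name `find_known_dtd_issues` is hypothetical, it assumes the exported EAD uses no XML namespace, and it only looks for the problems named in this ticket rather than performing real DTD validation.

```python
import xml.etree.ElementTree as ET


def find_known_dtd_issues(ead_xml: str) -> list:
    """Report the recurring problems from this ticket (not a full DTD validation)."""
    root = ET.fromstring(ead_xml)
    issues = []
    for parent in root.iter():
        children = [child.tag for child in parent]
        # <origination> belongs inside <did>, not directly under <archdesc>.
        if parent.tag == "archdesc" and "origination" in children:
            issues.append("origination directly under archdesc")
        # <date> may not be a direct child of <bioghist> or <processinfo>.
        if parent.tag in ("bioghist", "processinfo") and "date" in children:
            issues.append(f"date directly under {parent.tag}")
        # <controlaccess> must contain at least one child element.
        if parent.tag == "controlaccess" and not children:
            issues.append("empty controlaccess")
    return issues
```

An empty list only means none of these known cases were found; the export still needs to be run against the DTD.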

dtd-errors-roundtrip.png - One example of the DTD errors encountered. (37.1 KB) Dan Gillean, 05/04/2015 03:03 PM

dtdissue.xml (16 KB) Mike Gale, 05/25/2015 05:27 PM


Related issues

Related to Access to Memory (AtoM) - Bug #8459: <date> information inside <bioghist> tags causes problems... Verified 05/14/2015
Related to Access to Memory (AtoM) - Bug #8458: Origination and bioghist elements don't always line up pr... Verified 05/14/2015
Related to Access to Memory (AtoM) - Bug #6221: EAD <processinfo> is dropped during import Verified 01/17/2014
Blocked by Access to Memory (AtoM) - Bug #8504: Move archivist notes from <titlestmt><author> to <process... Verified 06/01/2015

History

#1 Updated by Dan Gillean about 7 years ago

  • Assignee deleted (Mike Cantelon)

#2 Updated by Dan Gillean about 7 years ago

Another example:

libxml error 504 on line 58 in input file: Element controlaccess content does not follow the DTD, expecting (head? , (address | chronlist | list | note | table | blockquote | p | corpname | famname | geogname | name | occupation | persname | subject | genreform | function | title | controlaccess)+), got () 

This is occurring because, even if the user adds no access points, we are including an empty <controlaccess></controlaccess> element in the EAD output.
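To illustrate the intended behavior, here is a rough Python sketch that drops <controlaccess> elements with no child elements from an already exported file. It is an assumption-laden illustration only: the function name `drop_empty_controlaccess` is hypothetical, the EAD is assumed to use no XML namespace, and the real fix belongs in AtoM's export code so that the empty element is never written in the first place.

```python
import xml.etree.ElementTree as ET


def drop_empty_controlaccess(ead_xml: str) -> str:
    """Remove <controlaccess> elements that contain no child elements."""
    root = ET.fromstring(ead_xml)
    # Walk in reverse document order so descendants are handled before their
    # ancestors; a <controlaccess> that only wrapped empty ones is removed too.
    for parent in reversed(list(root.iter())):
        for child in list(parent):
            if child.tag == "controlaccess" and len(child) == 0:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Elements that hold at least one access point are left untouched.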

#3 Updated by Evelyn McLellan about 7 years ago

  • Assignee set to Mike Gale
  • Priority changed from Medium to High

#4 Updated by Evelyn McLellan about 7 years ago

  • Target version set to Release 2.2.0

#5 Updated by Dan Gillean about 7 years ago

  • Description updated

Added 2 more examples to the description.

#6 Updated by Mike Gale about 7 years ago

  • Related to Bug #8459: <date> information inside <bioghist> tags causes problems on roundtrip added

#7 Updated by Mike Gale about 7 years ago

  • Related to Bug #8458: Origination and bioghist elements don't always line up properly added

#8 Updated by Mike Gale about 7 years ago

  • Status changed from New to Code Review
  • Assignee changed from Mike Gale to José Raddaoui Marín

https://github.com/artefactual/atom/pull/167

Sorry about the diff in the first commit; it's a bit messy since a bunch of the EAD tags weren't indented properly. In the future I'll do those style fixes in a different commit.

#9 Updated by José Raddaoui Marín about 7 years ago

  • Status changed from Code Review to In progress
  • Assignee changed from José Raddaoui Marín to Mike Gale

It looks great, Mike!

#10 Updated by Mike Gale about 7 years ago

  • Status changed from In progress to QA/Review
  • Assignee changed from Mike Gale to Dan Gillean

#11 Updated by Mike Gale about 7 years ago

Merged into qa/2.2.x.

#12 Updated by Dan Gillean about 7 years ago

  • Status changed from QA/Review to Feedback
  • Assignee changed from Dan Gillean to Mike Gale

Looks like you might have missed the closing </p> tag in the <processinfo><p><date> string.

#13 Updated by Mike Gale about 7 years ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from Mike Gale to Dan Gillean

#14 Updated by Dan Gillean about 7 years ago

  • Status changed from QA/Review to Verified
  • Requires documentation set to Yes

I've now roundtripped descriptions with every field filled out in the ISAD and RAD templates, without errors. I will keep an eye on this as we continue functional release testing, but so far it's looking very good. Marking verified.

Documentation
  • Will require a quick review, and a few updates, to our EAD mapping in the data entry section (ISAD and RAD).

#15 Updated by Dan Gillean about 7 years ago

  • Related to Bug #6221: EAD <processinfo> is dropped during import added

#16 Updated by Mike Gale about 7 years ago

New one found:

libxml error 504 on line 7 in input file: Element titlestmt content does not follow the DTD, expecting (titleproper+ , subtitle* , author? , sponsor?), got (titleproper author author )
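The content model in that message reads like a regular expression over the sequence of child elements, which makes it easy to see why two <author> elements fail. A small sketch of that idea, with hypothetical names (`TITLESTMT_MODEL`, `titlestmt_is_valid`) and the assumption that the element has no XML namespace:

```python
import re
import xml.etree.ElementTree as ET

# titlestmt content model as quoted in the libxml message:
# (titleproper+ , subtitle* , author? , sponsor?)
TITLESTMT_MODEL = re.compile(r"(titleproper )+(subtitle )*(author )?(sponsor )?")


def titlestmt_is_valid(titlestmt_xml: str) -> bool:
    """Check the order and count of <titlestmt> children against the model."""
    element = ET.fromstring(titlestmt_xml)
    # Serialize child tags as "tag tag ... " so the regex can mirror the model.
    sequence = "".join(f"{child.tag} " for child in element)
    return TITLESTMT_MODEL.fullmatch(sequence) is not None
```

With the children reported above (titleproper author author) the check fails, because the model allows at most one <author>.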

#17 Updated by Mike Gale about 7 years ago

  • Status changed from Verified to In progress
  • Assignee changed from Dan Gillean to Mike Gale

#18 Updated by Mike Gale almost 7 years ago

  • Blocked by Bug #8504: Move archivist notes from <titlestmt><author> to <processinfo><p> added

#19 Updated by Dan Gillean almost 7 years ago

  • Status changed from In progress to Verified
