Bug #4845

Import EAD - Warning libxml error - Syntax of value for attribute id of bioghist not valid

Added by Jessica Bushey about 9 years ago. Updated almost 9 years ago.

Status: In progress
Start date: 03/26/2013
Priority: Medium
Due date:
Assignee: José Raddaoui Marín
% Done: 50%
Category: EAD
Estimated time: 15.00 hours
Target version: Release 1.4.0
Google Code Legacy ID:
Tested version:
Sponsored: No
Requires documentation:

Description

1. Create archival description with one creator (start and end date).
2. Save.
3. Export and save XML.
4. Delete record.
5. Import XML and get WARNING.
6. View archival description.

Error:
Duplication of Creation date.

Expected:
One Creation date and one creator.

See screen captures attached.
BB-screenshot.png is the archival description before export
BB-2-screenshot.png is the warning upon import.
BB-3-screenshot.png is the archival description after import
bunny-bugs;ead.xml is the exported XML
bunny-bugs2;ead.xml is the XML of the imported description

BB-screenshot.png (119 KB) Jessica Bushey, 03/26/2013 10:06 AM

BB-2-screenshot.png (25.1 KB) Jessica Bushey, 03/26/2013 10:07 AM

BB-3-screenshot.png (91 KB) Jessica Bushey, 03/26/2013 10:07 AM

bunny-bugs;ead.xml (3.01 KB) Jessica Bushey, 03/26/2013 10:07 AM

bunny-bugs2;ead.xml (3.07 KB) Jessica Bushey, 03/26/2013 10:08 AM

DougDrucker-2.png - Example before Export (112 KB) Jessica Bushey, 03/27/2013 01:25 PM

DougDrucker-3.png - Example after Import (125 KB) Jessica Bushey, 03/27/2013 01:25 PM

doug-drucker-2;ead.xml - EAD of Export (2.76 KB) Jessica Bushey, 03/27/2013 01:26 PM

doug-drucker-3;ead.xml - EAD after Import (notice duplicate <unitdate>) (2.92 KB) Jessica Bushey, 03/27/2013 01:26 PM


Related issues

Related to Access to Memory (AtoM) - Bug #4436: Rework how EAD imports/exports creation dates Verified 12/14/2012
Duplicated by Access to Memory (AtoM) - Bug #5842: Importing EAD results in duplicate creation dates Duplicate 10/21/2013
Blocks Access to Memory (AtoM) - Bug #4267: EAC and EAD export have URI for Authority Records Feedback

History

#1 Updated by José Raddaoui Marín about 9 years ago

About the warning:

Something must be wrong on my side: EAD import takes more than a minute for me, and I always get different warnings. I'll ask Sevein about it as soon as I can; maybe he knows what is wrong...

About the duplication of creation dates:

On the export, creation dates are exported twice:

Inside <did>:

<unitdate normal="2012/2013" encodinganalog="3.1.3">2012 - 2013</unitdate>

Inside <bioghist>:

<date type="creation" normal="20120000/20130000"/>

Both are read on import: the one inside the <bioghist> tag creates the event with the actor, etc., and the one from the <did> tag creates an event with only the dates.

Should I remove the <unitdate> export for creation events, or just ignore it on import?

#2 Updated by Jessica Bushey about 9 years ago

Radda,

I created an archival description with a creation date of 1900-1990.
I linked a creator who has an existence date of 1880-1990.
See screen capture: DougDrucker-2.png
Export EAD, see file:

I deleted the archival description.

I imported the EAD XML file and the result is two dates for the archival description. See screen capture: DougDrucker-3.png
In the EAD XML I can see that <unitdate> is duplicated. Maybe this is why we are getting two dates? See file:

#3 Updated by Jessica Bushey about 9 years ago

#4 Updated by Jessica Bushey about 9 years ago

I spent a number of hours trying to solve this problem today, reviewing EAD and EAC, but the trouble is that our <date type="creation" normal="XXXXXXX/XXXXXX"/> is critical to making the connection between creator and information object. But it is also the reason we are generating a second date upon import.
I'm not sure what to do...

#5 Updated by David Juhasz about 9 years ago

  • Estimated time set to 15.00

#6 Updated by José Raddaoui Marín about 9 years ago

About the warning:

I finally solved the problems I had with EAD import, and now I'm getting the same warning as Jessica.

This warning happens because we are using '/' inside the id attribute, which holds the URL. The ead.dtd file asks for SGML, and this standard has problems with the slash.
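A minimal sketch of why a URL fails as an id value, assuming the id attribute is declared with the ID type and therefore has to be a valid XML name. The regular expression is a simplified, ASCII-only approximation of the XML name rule, and `is_valid_xml_id` and the example URL are hypothetical names used only for illustration, not AtoM code.

```python
import re

# Simplified ASCII-only approximation of an XML Name: the first character is
# a letter, '_' or ':'; later characters may also be digits, '.' or '-'.
# The full XML rule allows many more Unicode characters.
_XML_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def is_valid_xml_id(value: str) -> bool:
    """Return True if value looks like a valid XML ID (ASCII subset only)."""
    return _XML_NAME_RE.fullmatch(value) is not None


# Hypothetical URL of the kind that ends up in the id attribute.
print(is_valid_xml_id("http://example.org/index.php/bugs-bunny"))  # slash is not allowed
print(is_valid_xml_id("bugs-bunny"))
```

Under this rule a value is also rejected if it begins with a digit, even when it contains no slash.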

What should I do about it?

About the duplication of creation dates:

As I said in the first update, the dates of creation events are exported twice. As Jessica said, <date type="creation" normal="XXXXXXX/XXXXXX"/> is critical to making the connection between creator and information object, but <unitdate normal="2012/2013" encodinganalog="3.1.3">2012 - 2013</unitdate> is not. On import, this element creates an event with only the dates, so, as far as I can see, we have two options:

- If it is completely necessary that the EAD XML contains <unitdate normal="2012/2013" encodinganalog="3.1.3">2012 - 2013</unitdate> for creation events, we can ignore this element on import to avoid duplication.

- If it's not completely necessary, I can remove <unitdate normal="2012/2013" encodinganalog="3.1.3">2012 - 2013</unitdate> for creation events from the export.
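The first option (ignoring <unitdate> on import when a creation date already exists inside <bioghist>) could look roughly like the sketch below. It is only an illustration under stated assumptions: the sample XML is a simplified fragment rather than a full EAD export, and `creation_events` is a hypothetical helper, not the actual AtoM importer.

```python
import xml.etree.ElementTree as ET

# Simplified fragment containing the two elements quoted in this ticket.
SAMPLE = """<archdesc>
  <did>
    <unitdate normal="2012/2013" encodinganalog="3.1.3">2012 - 2013</unitdate>
  </did>
  <bioghist>
    <date type="creation" normal="20120000/20130000"/>
  </bioghist>
</archdesc>"""


def creation_events(xml_text: str) -> list:
    """Collect creation events, preferring <bioghist> dates over <unitdate>."""
    root = ET.fromstring(xml_text)
    # Dates inside <bioghist> carry the link to the creator.
    events = [
        {"source": "bioghist", "normal": date.get("normal")}
        for bioghist in root.iter("bioghist")
        for date in bioghist.iter("date")
        if date.get("type") == "creation"
    ]
    # Fall back to <unitdate> only when no creator-linked date was found,
    # so the same creation date is never imported twice.
    if not events:
        events = [
            {"source": "did", "normal": unitdate.get("normal")}
            for did in root.iter("did")
            for unitdate in did.iter("unitdate")
        ]
    return events


print(creation_events(SAMPLE))
```

With this approach the export stays unchanged, and descriptions that have dates but no linked creator still get their creation date from <unitdate>.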

#7 Updated by Jessica Bushey about 9 years ago

Does this document help in regards to EAD and SGML?
http://www.loc.gov/ead/ag/agconc.html#sec1

#8 Updated by José Raddaoui Marín about 9 years ago

Yes, thanks Jessica. But it mostly describes how to modify the DTD file so that the attribute syntax is ignored. We are using an external DTD file (http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd), so I don't think that would be possible.

#9 Updated by David Juhasz almost 9 years ago

Radda, let's use the md5 hash of the URL to create a value that is very likely unique and get around the invalid characters problem. Based on <http://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions> the chance of a random md5 hash collision is very remote.

#10 Updated by David Juhasz almost 9 years ago

We need to prefix a string on to the beginning of the md5 hash, since IDs can't start with a number, but md5 hashes may begin with a number. Let's prefix the hash with "md5-", unless anyone else has a better suggestion.
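As a rough sketch of the proposal, the id could be derived as below. The function name `hashed_id` and the example URL are hypothetical; only the md5 hash of the URL and the "md5-" prefix come from this ticket.

```python
import hashlib


def hashed_id(url: str, prefix: str = "md5-") -> str:
    """Build an id from the md5 hex digest of the URL, with a letter-first prefix."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    # The digest is 32 hex characters and may start with a digit,
    # so the prefix guarantees the id starts with a letter.
    return prefix + digest


# Hypothetical URL; the result contains only letters, digits and '-'.
print(hashed_id("http://example.org/index.php/bugs-bunny"))
```

The same URL always produces the same id, so repeated exports of one record stay consistent.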

#11 Updated by Dan Gillean almost 9 years ago

I think that "md5-" is a great prefix; it is similar to the IDs we were generating to link containers to physical storage, in that the prefix doubles as an indication to end-users looking at the EAD where the number is derived from and/or what it's there for. Let's go with it.

#12 Updated by José Raddaoui Marín almost 9 years ago

  • Status changed from New to In progress
  • % Done changed from 0 to 50

Thanks David and Dan,

The warning is now fixed with the md5 hash, and the fix is merged. But I'll wait to mark this as QA/Review until the date duplication problem is solved in #5094.
