Bug #4309

XML roundtripping error when subject terms have been created on the fly

Added by Dan Gillean almost 10 years ago. Updated about 9 years ago.

Status: Verified
Start date:
Priority: Low
Due date:
Assignee: José Raddaoui Marín
% Done: 100%
Category: Import/Export
Target version: Release 1.4.0
Google Code Legacy ID: atom-2361
Tested version:
Sponsored: No
Requires documentation:

Description

To reproduce this error:

1) Create a new information object or navigate to an existing one with no access points.
2) From the edit information object screen, add new subject terms. Save the edits.
3) Export the information object in EAD XML and then delete the information object.
4) Delete the subject terms from the taxonomies (Manage > Taxonomies > Subjects).
5) Re-import the EAD XML file.

Resulting error:

See attached screenshots

Expected result:

EAD XML file should import with no errors or warnings.

NOTE: the import view screen lists warnings associated with the import, but all data, including the subject terms, seems to import properly.

Note as well that in the attached XML file screenshot, the subject term "Dancing" shows the successful importing of a term associated with a pre-established subject taxonomy, while the terms below ("Global Harmony" etc) show terms created on the fly, that have not been associated with an existing taxonomy.

[g] Legacy categories: Taxonomy / term, Import/Export

parsing-error-subjects.PNG (19.1 KB) Dan Gillean, 12/01/2012 05:30 AM

Artefactual-fonds-import-error-SUBJECTS.PNG (19.4 KB) Dan Gillean, 12/01/2012 05:30 AM

ImportWarning.png (23.7 KB) Jessica Bushey, 01/02/2013 04:21 PM

History

#1 Updated by Dan Gillean almost 10 years ago

Second screenshot:

#2 Updated by Jessica Bushey almost 10 years ago

  • Status changed from New to New
  • Priority set to Low
  • Target version set to Release 2.1.0

[g] Labels added: Priority-Low, Milestone-Release-2.0, Component-Thesauri, Component-Import-Export
[g] New owner: Dan Gillean

#3 Updated by Mike Cantelon over 9 years ago

  • Status changed from New to QA/Review

This should be fixed now.

#4 Updated by Jessica Bushey over 9 years ago

  • Category set to Import/Export
  • Status changed from QA/Review to Feedback
  • Assignee changed from Dan Gillean to Mike Cantelon
  • Sponsored set to No

I tested this bug and confirm that the original warnings are no longer occurring.
HOWEVER....
A new error occurs upon import (see below):

Error 518 on line 21. Element archdesc does not carry attribute level.

When I compared the two sets of EAD (import and export), there were no differences on line 21. The only change was an extra space in front of the closing </did> tag on line 25.
The information object and its subject terms imported fine, regardless of the error.

#5 Updated by Jessica Bushey over 9 years ago

When testing EAD import from MemoryBC in AtoM 1.3 - I got the following WARNING (see png attached)
This time the warning is related to line 62 in my EAD file:
<name source>XXXX</name>

#6 Updated by Jesús García Crespo over 9 years ago

  • Target version changed from Release 2.1.0 to Release 1.4.0

#7 Updated by Dan Gillean about 9 years ago

  • Assignee changed from Mike Cantelon to José Raddaoui Marín

Radda, the problem is a warning that is generated on import when records are exported with a source attribute for a name, but the source is left blank. See for example:

    <eventgrp>
        <event>
            <origination encodinganalog="3.2.1">
                <name source="">New Creator</name>
            </origination>
        </event>
    </eventgrp>

The @source attribute is defined in the EAD Tag Library as:
SOURCE -- The source of the controlled vocabulary term contained in the element.

However, our users tend not to be using controlled vocabs for authorities - they are either linking them to existing ISAAR records, or generating new ones on the fly. As such, until we revisit this in the future, I think you could remove the source attribute.
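For illustration only, the cleanup described above could look something like the following Python sketch. AtoM itself is written in PHP, so this is not the actual export code; the function name `strip_empty_source` is hypothetical, and the sketch simply drops any `source` attribute whose value is blank while leaving populated ones alone.

```python
import xml.etree.ElementTree as ET


def strip_empty_source(xml_text):
    """Return xml_text with blank source attributes removed.

    Hypothetical helper for illustration; not part of AtoM.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        value = elem.get("source")
        # Only remove the attribute when it is present but empty.
        if value is not None and value.strip() == "":
            del elem.attrib["source"]
    return ET.tostring(root, encoding="unicode")


fragment = (
    '<origination encodinganalog="3.2.1">'
    '<name source="">New Creator</name>'
    "</origination>"
)
print(strip_empty_source(fragment))
```

An attribute such as source="LCSH" is left untouched by this approach, so existing controlled-vocabulary data would not be lost.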

#8 Updated by Dan Gillean about 9 years ago

Note: if you do remove @SOURCE from the export, it should be done in such a way that the DTD is unaffected, so that if a user is importing a record with a valid source attribute, the Parser will not fail. Note as well that, as in the examples given at the top of the issue, this is affecting subject access points as well. See the screenshots for more details.

#9 Updated by José Raddaoui Marín about 9 years ago

Warnings in subject access points have already been fixed.

The warning regarding archdesc happens when the archival description doesn't have a level of description assigned. Should I add the "otherlevel" value for the level attribute, plus an otherlevel attribute, when this happens? In that case, what value do I use in the otherlevel attribute?

The warning about the source attribute on the name element also happens in persname, famname and corpname. This attribute exports the actor's dates of existence when that field is populated, and I was thinking of keeping it in that case, but then I realized that it's a free text field and we'll have the same problem that we have in #3850 (update 25). So I guess it would be better to remove it, if it's not completely necessary.

#10 Updated by Jessica Bushey about 9 years ago

Do not fix the warning on archdesc.
If the user does not have a level selected on import the warning is ok. Best practice is to have a level selected.
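As a rough illustration of the behaviour being accepted here (warn, but still import), the sketch below checks an EAD document for archdesc elements without a level attribute and collects a warning instead of raising an error. It is written in Python rather than AtoM's PHP, the function name `check_archdesc_level` is hypothetical, and the message text is copied from the warning quoted in comment 4.

```python
import xml.etree.ElementTree as ET


def check_archdesc_level(xml_text):
    """Collect warnings for archdesc elements lacking a level attribute.

    Hypothetical helper for illustration; the import itself is not blocked.
    """
    root = ET.fromstring(xml_text)
    warnings = []
    for archdesc in root.iter("archdesc"):
        # A missing or empty level attribute triggers a warning only.
        if not archdesc.get("level"):
            warnings.append("Element archdesc does not carry attribute level.")
    return warnings


for message in check_archdesc_level("<ead><archdesc><did/></archdesc></ead>"):
    print("WARNING:", message)
```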

#11 Updated by Dan Gillean about 9 years ago

Hi Radda,

Yes, as Jessica says: if a user does not include a level, I think it is fine for them to get a warning - the EAD still imports and works. Including a level is best practice in both our content standard (ISAD or RAD, etc) and our data structure standard (EAD), so while a user can choose to leave this out, they shouldn't complain about a warning on import. The tricky thing is that there is no way to set a level in the DC template - but the use case for a user creating records in DC, and then using the DC export (instead of the DC XML export), is rare - and if the issue ever does come up, we can always tell them to switch templates (to ISAD for example), apply a level (such as collection or item for DC records), and switch back to DC again after.

As for the other issue:

It took me a bit to understand what you meant, because I did not realize that we are using @source to pass the dates of existence! I don't think this is good practice at all, and I would like to change it.

As you've noted above, I've already been suggesting that we use the <event> wrapper around these dates. <origination> will not allow for a <date> tag within it, and neither will <name> contain a date. However, with everything wrapped in an <event>, we can then include a separate <date> element with a unique @type added, like this:


    <event>
        <origination encodinganalog="3.2.1">
            <name>New Creator</name>
        </origination>
        <date type="existence" normal="19500000/19990000">1950-1999</date>
    </event>

If it is needed or useful, this can then be wrapped in the <eventgrp> tag as well, but I don't think we need it for our uses. The @type "existence" is useful and appropriate, because the ISAAR standard refers to these dates as "Dates of Existence" - see ISAAR 5.2.1.
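A minimal sketch of how the structure proposed above might be generated, assuming Python's standard xml.etree.ElementTree rather than AtoM's actual PHP export templates. The function name `build_creator_event` and its parameters are hypothetical, and the output has not been validated against the EAD DTD. The date element is only emitted when dates of existence are supplied, and the normal attribute only when a normalized value is available.

```python
import xml.etree.ElementTree as ET


def build_creator_event(name, dates_of_existence=None, normal=None):
    """Build an <event> holding a creator name and optional existence dates.

    Hypothetical helper for illustration; not AtoM's export code.
    """
    event = ET.Element("event")
    origination = ET.SubElement(event, "origination", {"encodinganalog": "3.2.1"})
    ET.SubElement(origination, "name").text = name
    if dates_of_existence:
        attrs = {"type": "existence"}
        if normal:
            # Include normal only when a normalized date range is known.
            attrs["normal"] = normal
        ET.SubElement(event, "date", attrs).text = dates_of_existence
    return event


event = build_creator_event("New Creator", "1950-1999", "19500000/19990000")
print(ET.tostring(event, encoding="unicode"))
```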

This means that you should be able to remove @source. NOTE - the ideal behavior is that @source is included ONLY if there is data for it - is there a way to check for NULL or !=="" or something? If users download a controlled SKOS vocabulary from somewhere like the Library of Congress, and import them, there may be valuable data in the @source field, and it would be nice if we could include this. Ideally, if a user imports something with @source included, it will not trip up the xml parser.

#12 Updated by José Raddaoui Marín about 9 years ago

Hi Dan, a couple of things:

Dates of existence is a free text field and doesn't create start and end dates like creation dates do, so the date element will never have a normal attribute.

And I don't fully understand your last paragraph: should I remove @source or use other data in it? In that case, what data? I'll check for NULL or "" if so. Also, if somebody imports a SKOS vocabulary, the import takes a different path, so no @source will be lost, and it doesn't use the same XML parser either. The change will only affect EAD, and only this element.

#13 Updated by Dan Gillean about 9 years ago

Radda, you're right, if there's no normal for dates of existence, ignore this.

As for @source: I am basically saying, if there is nothing in the source attribute, it should not show up in the export. We should never export something that says source="". However, if someone does import a taxonomy that has a source attribute (example: <persname source="LoC">), this should roundtrip fine, and not cause errors or warnings. Is this possible?
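The rule stated here (never write source="", but keep a real value through a roundtrip) can be sketched as follows. This is a Python illustration under assumed names, not AtoM's PHP implementation: `make_access_point` and `roundtrip` are hypothetical helpers, and the roundtrip is simulated by serializing and re-parsing the element.

```python
import xml.etree.ElementTree as ET


def make_access_point(tag, text, source=None):
    """Create an access point element, adding source only when it has data.

    Hypothetical helper for illustration; not part of AtoM.
    """
    elem = ET.Element(tag)
    elem.text = text
    # None, "" and whitespace-only values all mean "no source".
    if source is not None and source.strip():
        elem.set("source", source)
    return elem


def roundtrip(elem):
    """Serialize an element to text and parse it back (export then import)."""
    return ET.fromstring(ET.tostring(elem, encoding="unicode"))


with_source = roundtrip(make_access_point("persname", "FirstName LastName", "LoC"))
without_source = roundtrip(make_access_point("name", "New Creator", ""))
print(with_source.attrib, without_source.attrib)
```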

#14 Updated by José Raddaoui Marín about 9 years ago

If we are removing dates of existence from the source attribute, there won't be any source attribute, unless we add other data to it. I'll check for nulls in that case and remove the attribute if so, but, what field data should we use? The field 'sources' in the control area of the actor?

And the same in the import: if we find '<persname source="LoC">', in what field should we add 'LoC'? The field 'sources' in the control area of the actor?

#15 Updated by Dan Gillean about 9 years ago

It's fine for the @source information to remain only in the EAD - we don't need to add this information anywhere, because it may be part of a taxonomy imported for general use - for example, if someone imported the Library of Congress Subject Headings, and then used one of the terms as an access point on an archival description, the export might say <subject source="LCSH">Dancing</subject>. However, this is mainly internal archival data - if it exists, I don't want us to lose it or have it break the application, but we do not need to add it into an archival description anywhere. So don't worry about that.

Ideally, the <name> elements will appear the same, even if they are exported differently in the EAD. I.e., if someone adds a creator, and goes to the ISAAR record and adds dates of existence, the archival description should still include under the name access points FirstName LastName (creation), and the archival description area should still have Name of Creator: FirstName LastName (dates). Nothing should be different.

If there is no data for sources, then the attribute should not appear - we don't have to go looking for data to add. The purpose of the @source attribute is to include "The source of the controlled vocabulary term contained in the element." (EAD Tag Library). If that information is not present - i.e., if the user has added the term themselves - then exclude it. If the information is present when someone imports a SKOS file, it should appear.

#16 Updated by José Raddaoui Marín about 9 years ago

Ok, we are talking about two completely different source attributes, each one with a different export/import process:

The one from <subject>:

This exports the source notes for the term (access point). It has already been fixed and it should roundtrip fine.

The one from <name>, <persname>:

This is the one exporting and importing the creator's dates of existence. From now on, should I remove it completely or roundtrip other data in it?

#17 Updated by Dan Gillean about 9 years ago

Yes, sorry it's confusing to have both in here, but the answers are all already on the ticket.

For subject, remove @source. But, if a user has @source on import, it should be able to roundtrip in our system without breaking anything. The information does not need to appear anywhere except the EAD.

For names, same thing. We don't need @source, but if someone has imported authority records and they have @source, it should not break anything when roundtripped. The data in there does not need to appear anywhere except the EAD if someone has included it.

To implement names, please use the model from the example in comment 11 above. That way, you don't need to use @source to pass the dates of existence, and we can remove it.

#18 Updated by José Raddaoui Marín about 9 years ago

  • Status changed from Feedback to QA/Review
  • % Done changed from 0 to 100

Applied in changeset atom|commit:21a7889f470240819ea46e65b527020ae37f1a81.

#19 Updated by Dan Gillean about 9 years ago

  • Status changed from QA/Review to Verified
