Bug #4997

Remove @encodinganalog from <language> tags and rework how EAD imports are determined for languages and scripts of description (and material)

Added by Dan Gillean about 9 years ago. Updated about 9 years ago.

Status: Verified
Start date: 04/24/2013
Priority: High
Due date:
Assignee: José Raddaoui Marín
% Done: 0%
Category: EAD
Target version: Release 1.4.0
Google Code Legacy ID:
Tested version:
Sponsored: No
Requires documentation:

Description

Step 1: remove the encodinganalog from the language tags

Currently the code in AtoM appears to rely on @encodinganalog attributes used within <language> elements to determine where the language belongs (language or script of material, OR language/script of description.)

EX:

<language langcode="eng" encodinganalog="Language of Description">English</language> 
OR
 <language scriptcode="Cari" encodinganalog="Script">Carian</language>

However, this is an incorrect use of @encodinganalog, which is defined as "A field or element in another descriptive encoding system to which an EAD element or attribute is comparable." (EAD Tag Library 2002). Generally, it is used to record the corresponding field in another standard, such as ISAD, MARC, or RAD. Users whose EAD does not use the attribute in the same peculiar way that we currently do will therefore have problems importing their data into AtoM.

Step 2: Rework how the PHP code reads these elements and determines what goes where

Each language element is already wrapped in a tag that determines whether it is a language of description or of material, so the @encodinganalog should not be necessary, and the PHP code should not be using these attributes in its switch statement.

Language of Material and Script of Material should appear wrapped in <langmaterial> tags, like so:

<langmaterial encodinganalog="3.4.3">
     <language langcode="eng">English</language>
     <language scriptcode="Latn">Latin</language>
LANG AND SCRIPT NOTES HERE
</langmaterial>

Language of Description and Script of Description should be similar, only wrapped in the <langusage> element:

<langusage>
     <language langcode="eng">English</language>
     <language scriptcode="Latn">Latin</language>
</langusage>

Thus it should be possible to use the parent tags to determine which language belongs where, instead of relying on information jammed into the @encodinganalog.
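
To illustrate the parent-tag approach, here is a minimal sketch in Python (not AtoM's PHP, and not the actual import code). The function name `classify_languages` and the result keys are hypothetical, and the sketch assumes an EAD 2002 document without XML namespaces.

```python
import xml.etree.ElementTree as ET

# Map each wrapper element to the field group it stands for.
# The group names are illustrative, not AtoM's internal names.
PARENT_TO_GROUP = {
    "langmaterial": "material",
    "langusage": "description",
}


def classify_languages(ead_xml):
    """Sort <language> codes into buckets using only the parent tag."""
    result = {
        "language_of_material": [],
        "script_of_material": [],
        "language_of_description": [],
        "script_of_description": [],
    }
    root = ET.fromstring(ead_xml)
    for parent_tag, group in PARENT_TO_GROUP.items():
        for parent in root.iter(parent_tag):
            for lang in parent.findall("language"):
                # A single <language> may carry langcode, scriptcode, or both.
                if lang.get("langcode"):
                    result["language_of_" + group].append(lang.get("langcode"))
                if lang.get("scriptcode"):
                    result["script_of_" + group].append(lang.get("scriptcode"))
    return result


sample = """
<ead>
  <eadheader><profiledesc><langusage>
    <language langcode="eng">English</language>
    <language scriptcode="Latn">Latin</language>
  </langusage></profiledesc></eadheader>
  <archdesc><did><langmaterial encodinganalog="3.4.3">
    <language langcode="fre">French</language>
  </langmaterial></did></archdesc>
</ead>
"""
print(classify_languages(sample))
```

No @encodinganalog value is consulted anywhere: the wrapper element alone decides the bucket, so EAD from other systems would be sorted the same way.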


Related issues

Related to Access to Memory (AtoM) - Bug #4996: Scriptcode for materials wrong EAD element Verified 04/24/2013
Related to Access to Memory (AtoM) - Bug #4990: Language and Script notes missing from EAD Verified 04/23/2013
Related to Access to Memory (AtoM) - Bug #4431: Language of description not roundtripping in EAD (RAD) Verified 12/12/2012
Blocked by Access to Memory (AtoM) - Bug #5012: ISO 639: remove country code if provided Verified 04/28/2013

History

#1 Updated by Dan Gillean about 9 years ago

  • Description updated (diff)

#2 Updated by Dan Gillean about 9 years ago

  • Description updated (diff)

#3 Updated by José Raddaoui Marín about 9 years ago

We have a little problem here. Currently AtoM is exporting three types of languages:

- Language of material

We are already using <langmaterial> inside the <did> element for exporting these languages, but they are not being imported yet.

<langmaterial encodinganalog="3.4.3">
  <language langcode="fre">French</language>
</langmaterial>

- Language of description

<langusage>
  <language langcode="eng" encodinganalog="Language Of Description">English</language>
</langusage>

- Source language / Export language

AtoM also exports the source culture of the archival description and the user's culture.

<langusage>
  <language langcode="eng" encodinganalog="Language">English</language>
</langusage>

So AtoM is relying on @encodinganalog attributes to distinguish the source language of the description from a language of description, NOT to distinguish language of material from language of description.

I'm going to fix the export for scripts of material, add the language and script notes to the export, and add the import for the whole <langmaterial> element. But what can we do to distinguish the source languages from the languages of description?

#4 Updated by David Juhasz about 9 years ago

From what I can tell from the EAD working group documents <http://www2.archivists.org/groups/technical-subcommittee-on-encoded-archival-description-ead>, the new version of EAD (scheduled for August 2013?) will allow the @xml:lang attribute on all tags, which will make it possible to encode mixed-language finding aids (hurray!) and will make mapping i18n data between AtoM and EAD much easier.

In the meantime, we've got a bit of a mess with the difference in the way AtoM handles languages/scripts and the way EAD 2002 does.

Here's what I suggest:

Language/script of material

This isn’t controversial as far as I can tell; we just need to get it importing properly. I'm not sure if we have a separate issue for encoding the "Language and script notes" field, but we should include it in the <langmaterial> element.

<langmaterial encodinganalog="3.4.3">
  <language langcode="eng"/>
  <language langcode="fre"/>
  <language scriptcode="Latn"/>
  Correspondence is predominantly English, with some French.
</langmaterial>
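
As an illustration of how such mixed content could be read on import, here is a minimal Python sketch (not AtoM's PHP). The function name `parse_langmaterial` is hypothetical, and the sketch assumes that any free text directly inside <langmaterial> is the "Language and script notes" value.

```python
import xml.etree.ElementTree as ET


def parse_langmaterial(xml_fragment):
    """Return (languages, scripts, note) from one <langmaterial> element."""
    elem = ET.fromstring(xml_fragment)
    languages, scripts = [], []
    # Free text can sit before the first child (elem.text) or after any
    # child (child.tail); text inside <language> itself is not a note.
    note_parts = [elem.text or ""]
    for child in elem:
        if child.tag == "language":
            if child.get("langcode"):
                languages.append(child.get("langcode"))
            if child.get("scriptcode"):
                scripts.append(child.get("scriptcode"))
        note_parts.append(child.tail or "")
    # Collapse the whitespace left behind by indentation.
    note = " ".join(" ".join(note_parts).split())
    return languages, scripts, note


fragment = """<langmaterial encodinganalog="3.4.3">
  <language langcode="eng"/>
  <language langcode="fre"/>
  <language scriptcode="Latn"/>
  Correspondence is predominantly English, with some French.
</langmaterial>"""
print(parse_langmaterial(fragment))
```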

Language/script of description

The problem here is that we have two definitions for the language(s) of the description:

  1. The i18n data schema (i.e. information_object_i18n.culture)
  2. The “Language(s)” list (a “property” row in the database)

The @encodinganalog attribute is defined for another purpose, and using it to distinguish definition 1 from definition 2 creates confusion. Instead, I think we should assume that the first (and only the first) <language> element is the language of the EAD-XML file (i.e. the current culture in AtoM when the EAD-XML is exported), and that all other languages are pulled from the “Language(s)” list (definition 2).

For example, assume a Fonds which has English and French descriptions in AtoM (It has ‘en’ and ‘fr’ i18n rows), and it has “Language(s)”: [English, Spanish, Russian] and “Script(s)”: [Latin, Cyrillic]. If my current culture in AtoM is English, and I export the Fonds as EAD then I propose the representation should be:

<langusage>
  <language langcode="eng"/>
  <language langcode="spa"/>
  <language langcode="rus"/>
  <language scriptcode="Latn"/>
  <language scriptcode="Cyrl"/>
</langusage>

Note that the ordering is only important for the first <language> element as this will set the “culture” of the description on import. Also notice that we are not including French in the list, even though the original description in AtoM includes a French translation. I think it’s important to respect the conscious choice of the archivist of what to include in the description, even if it doesn’t match the data. However, we must include the current culture when exporting the description, so we can use the correct culture when importing the EAD file into another instance of AtoM.

The <langusage> list should not contain duplicates.

In AtoM, if I now switch my culture to French and export the same Fonds, the EAD <langusage> representation should be:

<langusage>
  <language langcode="fre"/>
  <language langcode="eng"/>
  <language langcode="spa"/>
  <language langcode="rus"/>
  <language scriptcode="Latn"/>
  <language scriptcode="Cyrl"/>
</langusage>
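
The export rule proposed here (current culture first, then the “Language(s)” list, no duplicates, then scripts) can be sketched in Python as follows. This is an illustration only, not AtoM's PHP: `build_langusage` and the `CULTURE_TO_ISO639_2` table are hypothetical, and the sketch assumes the “Language(s)” list is stored as two-letter culture codes.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from AtoM culture codes to ISO 639-2 codes;
# a real export would need a complete table.
CULTURE_TO_ISO639_2 = {"en": "eng", "fr": "fre", "es": "spa", "ru": "rus"}


def build_langusage(current_culture, languages, scripts):
    """Build <langusage>: current culture first, then the lists, no duplicates."""
    codes = [CULTURE_TO_ISO639_2[current_culture]]
    for culture in languages:
        code = CULTURE_TO_ISO639_2[culture]
        if code not in codes:
            codes.append(code)
    langusage = ET.Element("langusage")
    for code in codes:
        ET.SubElement(langusage, "language", langcode=code)
    for script in dict.fromkeys(scripts):  # de-duplicate, keep order
        ET.SubElement(langusage, "language", scriptcode=script)
    return langusage


# Same Fonds exported under two different current cultures.
for culture in ("en", "fr"):
    element = build_langusage(culture, ["en", "es", "ru"], ["Latn", "Cyrl"])
    print(ET.tostring(element, encoding="unicode"))
```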

French would be the culture (and source_culture) of the description on import, and would be added to the “Language(s)” list in AtoM on import. Although including all of the <langusage><language> elements in the “Language(s)” list in AtoM means this data doesn’t roundtrip from AtoM to AtoM perfectly, I think this convention is necessary to support EAD imported from other (non-AtoM) sources.
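
The matching import convention can be sketched in Python as well (again an illustration, not AtoM's PHP). The function name `read_langusage`, the result keys, and the `ISO639_2_TO_CULTURE` table are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from ISO 639-2 codes back to AtoM culture codes.
ISO639_2_TO_CULTURE = {"eng": "en", "fre": "fr", "spa": "es", "rus": "ru"}


def read_langusage(xml_fragment):
    """First langcode sets the culture; every code joins the lists."""
    elem = ET.fromstring(xml_fragment)
    languages, scripts = [], []
    for child in elem.findall("language"):
        code = child.get("langcode")
        if code:
            culture = ISO639_2_TO_CULTURE[code]
            if culture not in languages:
                languages.append(culture)
        script = child.get("scriptcode")
        if script and script not in scripts:
            scripts.append(script)
    # Only the first language decides the culture of the imported description.
    culture = languages[0] if languages else None
    return {"culture": culture, "languages": languages, "scripts": scripts}


fragment = """<langusage>
  <language langcode="fre"/>
  <language langcode="eng"/>
  <language langcode="spa"/>
  <language langcode="rus"/>
  <language scriptcode="Latn"/>
  <language scriptcode="Cyrl"/>
</langusage>"""
print(read_langusage(fragment))
```

Because the first language also lands in the “Language(s)” list, an AtoM-to-AtoM roundtrip gains one entry, which is the trade-off described above.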

#5 Updated by José Raddaoui Marín about 9 years ago

This should be fixed. But there is still the problem with FreeBeer639, where some languages are imported, but not properly. Sevein is working on this.

#6 Updated by David Juhasz about 9 years ago

Awesome, thanks Radda! :D

#7 Updated by José Raddaoui Marín about 9 years ago

  • Status changed from New to QA/Review

#8 Updated by Dan Gillean about 9 years ago

  • Status changed from QA/Review to Verified
