Container and Physical location fields not exporting
Assignee: José Raddaoui Marín
Category: EAD
Estimated time: 3.00 hours
Target version: Release 1.4.0
Google Code Legacy ID: atom-1901
To reproduce this error:
1) Go to the show screen for an information object with attached physical storage
2) Export the record
There are no physloc fields in the export file.
There should be something like <physloc>Shelf 11, Aisle C10, Main Repository</physloc> in the export file.
Legacy categories: Import/Export, EAD
#7 Updated by Jessica Bushey over 7 years ago
The physical container is being exported, but only when it is attached at a lower level of a multi-level description. The physical container for the fonds is not recorded in the EAD export, but the physical container for the series (child level) is. On import, the physical container for the series is accurate and represented, but the one for the fonds level is not.
Below is the sample EAD.XML from the export for the series level physical container.
<physloc>cold storage, shelf 8A</physloc>
<container type="Folder">SPCA 01-A</container>
<unittitle encodinganalog="3.1.2">Client files</unittitle><unitid encodinganalog="3.1.1">01</unitid>
<unitdate datechar="creation" normal="1922/2010" encodinganalog="3.1.3">1922 - 2010</unitdate>
#15 Updated by José Raddaoui Marín over 7 years ago
AtoM is now exporting only <container>; <physloc> doesn't appear in the export.
I have tried with the ISAD and RAD templates, with fonds, series and files in a three-level hierarchy, with different physical locations, and with many other combinations, and for me <container> appears when it should.
Could you give me more information?
#16 Updated by Jessica Bushey over 7 years ago
The physical container information is round-tripping, BUT it produces a warning on import: "syntax for attribute type of container is not valid".
This is suggested:
We are using this:
<container type="box">archival storage<title>XX</title></container>
#18 Updated by José Raddaoui Marín over 7 years ago
This warning appears when the type attribute contains spaces. Initially it failed with "Cardboard box", "Hollinger box", "Filing cabinet" and "Map cabinet". But when importing, if the type does not match any existing term, a new term and type are created; so if a new type with spaces is created, it will trigger the warning too.
And if we use <container type="box">XX</container> instead of <container type="box">archival storage<title>XX</title></container>, we will lose the location in the import.
How should we export types with spaces?
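
To illustrate why the types with spaces fail: the warning is consistent with the type attribute being declared as a single name token (NMTOKEN), which cannot contain whitespace. Here is a minimal sketch of that check. The function name is_nmtoken is hypothetical, and the pattern is a simplified ASCII-only approximation of the XML rule, not what libxml actually runs.

```python
import re

# Simplified, ASCII-only approximation of an XML name token (NMTOKEN):
# one or more letters, digits, '.', '-', '_' or ':', and no whitespace.
# The real XML grammar also accepts many non-ASCII name characters.
NMTOKEN_RE = re.compile(r"[A-Za-z0-9._:\-]+")


def is_nmtoken(value: str) -> bool:
    """Return True if value looks like a single name token."""
    return NMTOKEN_RE.fullmatch(value) is not None


# Container types mentioned in this issue, plus two that validate.
for container_type in ["box", "Folder", "Hollinger box", "Map cabinet"]:
    print(container_type, "->", is_nmtoken(container_type))
```

Any value with a space is rejected by this check, which matches the four failing terms listed above.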
#19 Updated by Jessica Bushey over 7 years ago
<container type="Hollinger box">
Aisle 4, Shelf 5 <title>JFF-01</title>
We were getting: "Syntax value for attribute type of container is not valid".
So I changed it to lowerCamelCase:
Aisle 4, Shelf 5 <title>JFF-01</title>
No more warning!
#20 Updated by Dan Gillean over 7 years ago
My suggestion is to keep the fix for this issue as close to available EAD examples as possible. In most cases I have seen, only a @TYPE of "box" or "folder" has been used. Therefore, I think that we should keep our types simple (box, folder, shelf, etc.) and, if possible, use the @LABEL attribute to specify which kind of box.
Example: user selects Hollinger box from the container type drop down list in AtoM: EAD will be
<container type="box" label="hollinger"><title>[USER_DATA_FROM_CONTAINER_NAME_FIELD]</title></container>
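
A rough sketch of how a multi-word term from the drop-down could be split into @type and @label. This is only an assumption about the mapping (last word becomes the type, the rest becomes the label, both lowercased); split_container_type is a hypothetical helper, not an existing AtoM function.

```python
from typing import Optional, Tuple


def split_container_type(term: str) -> Tuple[str, Optional[str]]:
    """Split a container type term into (type, label).

    Assumed rule: the last word is the generic type, and any preceding
    words become the label. Single-word terms get no label.
    """
    words = term.strip().lower().split()
    if not words:
        raise ValueError("container type term is empty")
    if len(words) == 1:
        return words[0], None
    return words[-1], " ".join(words[:-1])


for term in ["Hollinger box", "Cardboard box", "Filing cabinet", "Folder"]:
    print(term, "->", split_container_type(term))
```

With this rule "Hollinger box" gives type "box" and label "hollinger", as in the example above. Terms that do not end in the generic type would need an explicit lookup table instead.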
Location information should not be captured inside of the <container> field. This data should be using the <physloc> EAD element. See: http://www.loc.gov/ead/tglib/elements/physloc.html
Here is an example from the physloc element:
<c02 level="file">
  <did>
    <physloc>112.I.8.1B-2</physloc>
    <container type="box">2</container>
    <unittitle><unitdate type="inclusive">December 1908-July 1917 </unitdate></unittitle>
  </did>
</c02>
#21 Updated by José Raddaoui Marín about 7 years ago
I've added the label attribute for those elements with spaces, but we have a problem with the location information. If we use <physloc>, the location and the container will be separated, which will make it difficult to join them in the import, because an object can have more than one physical storage.
#22 Updated by Dan Gillean about 7 years ago
Each <physloc> and <container> will be contained within the <did> of a <c> level description. It's true that a <physloc> may hold many containers, but that information will be repeated in the export/import within every <did> to which it applies. If there are multiple containers within the same <did> and only one <physloc>, it is reasonable to assume that each container belongs to the same physical location. I am not sure I see the problem in uniting them, but perhaps I don't understand the scripts that you are creating to manage the import. If we assume that any <container> listed in a <did> belongs to the <physloc> within the same <did>, is the import still challenged? And if so, can you clarify so I can better answer your question? Thanks.
#23 Updated by José Raddaoui Marín about 7 years ago
An information object can have more than one physical storage, and each physical storage has a location and a container. If we use <physloc> and <container>, there will be more than one of each inside the <did> element. I'm attaching an example with the work done so far.
#24 Updated by Dan Gillean about 7 years ago
What if we were to use the physical location provided in the interface (ie, the physloc element) as an id for the container?
<physloc>Almacén 32</physloc>
<container type="folder" id="Almacén 32"> <title>carpeta_A</title> </container>
<physloc>Almacén 32</physloc>
<container type="box" id="Almacén 32"> <title>house in a tree</title> </container>
<container type="box" label="cardboard"> <title>box 35</title> </container>
This would depend on 2 things:
1) That we can test and ensure that any information the user might enter into the Location field will not cause XML errors when round tripped - spaces, accents and other diacritical marks, special characters, etc.
2) If the Location field is not filled out (i.e., IF Location == "" or NULL, as in the "box 35" example above) then we do not add the ID field to the container.
I don't like this solution, but then again, the problem is with the existing interface, which does not allow for IDs to be used, for the Physical location to contain its own drop down (and thus have locations be re-used), and it confuses things by including "shelf" as a possible container in the controlled dropdown vocabulary (which rightly belongs to <physloc>). I hope in the future that we can find institutions interested in funding development for this part of AtoM, to address these problems throughout the application. In the meantime, let me know your thoughts on this proposed solution.
#25 Updated by José Raddaoui Marín about 7 years ago
Location is a free-text field, and it will trigger the same warning for the id attribute if, for example, it has spaces. It doesn't look like a good option to me. I can't see any wrapper for both elements, but maybe we can use the <ref> element inside <physloc> or <container> to reference the other. But I don't know, maybe I'm talking nonsense.
#26 Updated by Dan Gillean about 7 years ago
Radda, I have posted a question about this to the EAD List-Serv, to see if the broader community has suggestions about how we can approach this while still conforming to accepted best practice. I will update the issue again soon if/when I have some feedback. View the question on the list-serv here: http://listserv.loc.gov/cgi-bin/wa?A2=ind1304&L=ead&T=0&P=1551
If we don't get any good suggestions, Jessica and I will discuss and get back to you.
#27 Updated by Dan Gillean about 7 years ago
After discussing with people on the list-serv (see: http://listserv.loc.gov/cgi-bin/wa?A1=ind1304&L=ead&T=0, all headings listed under the title "Establishing a relationship between containers and physical location"), it seems the best option for adding a relationship between the two is the following:
We will add an @id to the <physloc>, and then put the same value in <container>/@parent. We were considering using the table ID from QubitRelation, but as this number only needs to be unique within the context of the EAD, there is no reason to reveal table IDs to end users via our EAD (security).
Instead, we will use a simple counter. My preference would be to use a number with several leading zeros - ie, 0001 not 1. So here is an example:
<physloc id="0001">Almacén 32</physloc>
<container type="folder" parent="0001">folder title here</container>
<physloc id="0002">Almacén 32</physloc>
<container type="box" parent="0002">house in a tree</container>
<container type="box" label="cardboard">box 35</container> <!--if there's no associated physloc, we should not add @parent -->
You will note that the IDs are different for the two identical physical locations. This is due to the limitations of our current GUI. There is no dropdown list for physical location and no way to reuse the same information, so a user would have to manually type out the same information (e.g., "Almacén 32") twice. Every @id must be unique in XML, so we would need a way to use the <physloc> element only once and point two <container> elements at it, but that's not really possible with the current interface.
So for now, just use a different ID for each location; if the user repeats the location, that's okay. It's already generating 2 different <physloc> elements, so adding a unique id to each should not be a problem.
Ideally, if there is no physical location information added, then the <container> should not have an empty @parent.
#28 Updated by José Raddaoui Marín about 7 years ago
Hi Dan, thanks a lot!
I made the changes and everything looks fine, just a little problem when importing. A warning appears for every id or parent attribute:
libxml error 502 on line 24 in input file: Syntax of value for attribute id of physloc is not valid
libxml error 502 on line 25 in input file: Syntax of value for attribute parent of container is not valid
The problem is that the ID has to be a valid NCName, which means, for example, that the first character can't be a digit. I tried 'id0001' instead of '0001' and the warning disappears.
Is 'id0001' OK? Any other suggestions?
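
For reference, a small sketch of the NCName constraint as I understand it. The is_ncname helper is hypothetical and the pattern is a simplified ASCII-only approximation (the real rule also allows many non-ASCII letters); it is only meant to show why '0001' is rejected and 'id0001' is accepted.

```python
import re

# Simplified, ASCII-only approximation of an XML NCName:
# starts with a letter or underscore, followed by letters, digits,
# '.', '-' or '_'. No colon and no whitespace are allowed.
NCNAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._\-]*")


def is_ncname(value: str) -> bool:
    """Return True if value looks like a valid NCName (ASCII subset)."""
    return NCNAME_RE.fullmatch(value) is not None


for candidate in ["0001", "id0001", "physloc0001", "Row 2"]:
    print(candidate, "->", is_ncname(candidate))
```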
#29 Updated by Dan Gillean about 7 years ago
I worry that our users will be confused by this random ID, if they ever look at the EAD closely, so maybe this is an opportunity to make the relationship explicit.
Why don't we use "physloc" as the prefix? For example, <physloc id='physloc0001'> and <container parent='physloc0001'>
It's a bit longer, but it makes the relationship clearer. Otherwise, id is fine.
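
To make the scheme concrete, here is a rough sketch of the export side, written in Python for illustration only (it is not the AtoM export template). The build_did function and the dictionary keys are hypothetical. It assumes one <physloc> per storage that has a location, a zero-padded counter with the "physloc" prefix, and no @parent when there is no location.

```python
import xml.etree.ElementTree as ET


def build_did(storages):
    """Build a <did> with linked <physloc>/<container> pairs.

    storages: list of dicts with 'type' and 'name', plus optional
    'label' and 'location' (all key names are illustrative).
    """
    did = ET.Element("did")
    counter = 0
    for storage in storages:
        container = ET.Element("container", {"type": storage["type"]})
        if storage.get("label"):
            container.set("label", storage["label"])
        # The container name goes directly into the element text.
        container.text = storage["name"]

        location = storage.get("location")
        if location:
            # One physloc per storage, each with its own unique id.
            counter += 1
            loc_id = f"physloc{counter:04d}"
            physloc = ET.SubElement(did, "physloc", {"id": loc_id})
            physloc.text = location
            container.set("parent", loc_id)
        # Without a location, no empty @parent is written.
        did.append(container)
    return did


did = build_did([
    {"type": "folder", "name": "carpeta_A", "location": "Almacén 32"},
    {"type": "box", "name": "house in a tree", "location": "Almacén 32"},
    {"type": "box", "label": "cardboard", "name": "box 35"},
])
print(ET.tostring(did, encoding="unicode"))
```

The repeated location gets two different ids here, which mirrors the limitation of the current interface rather than an ideal design.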
#30 Updated by Dan Gillean about 7 years ago
Also, please note the following (I put it in the example in comment #27, but forgot to point it out):
Although <title> is technically allowed in <container>, its use in this context is improper, as the Chair of the Technical Subcommittee on EAD reminded us via list-serv responses. <title> is "The formal name of a work, such as a monograph, serial, or painting, listed in a finding aid." Since the <container> element can contain CDATA, we don't need it here anyways; please remove the <title> element from the EAD in this fix.
#34 Updated by Jessica Bushey about 7 years ago
- Status changed from QA/Review to Verified
Here are two examples of the physical location container for Parent and Child descriptions:
EAD for Parent Physical Location:
<did><physloc id="physloc0001">Row 2, Shelf AA</physloc>
<container type="box" label="hollinger" parent="physloc0001">NLA-031</container>
<unittitle encodinganalog="3.1.2">Summer Day Fonds</unittitle>
EAD for Child Physical Location:
<c level="series"><did><physloc id="physloc0001">Row 2, Shelf AA</physloc>
<container type="box" label="hollinger" parent="physloc0001">NLA-032</container>
<unittitle encodinganalog="3.1.2">Photographs</unittitle>
<unitid encodinganalog="3.1.1">s01</unitid></did>
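
For anyone checking the round trip, here is a minimal sketch of how the import side could rejoin each container with its location through @parent. It is written in Python for illustration and is not the actual AtoM import code; pair_storage is a hypothetical name, and it assumes each @parent holds a single id.

```python
import xml.etree.ElementTree as ET


def pair_storage(did):
    """Return one dict per <container>, with its location resolved via @parent."""
    # Index every physloc in this <did> by its id attribute.
    locations = {
        physloc.get("id"): (physloc.text or "").strip()
        for physloc in did.findall("physloc")
        if physloc.get("id")
    }
    pairs = []
    for container in did.findall("container"):
        parent = container.get("parent")
        pairs.append({
            "type": container.get("type"),
            "label": container.get("label"),
            "name": (container.text or "").strip(),
            # Containers without @parent simply have no location.
            "location": locations.get(parent) if parent else None,
        })
    return pairs


sample = """<did>
  <physloc id="physloc0001">Row 2, Shelf AA</physloc>
  <container type="box" label="hollinger" parent="physloc0001">NLA-031</container>
  <container type="box" label="cardboard">box 35</container>
</did>"""

for storage in pair_storage(ET.fromstring(sample)):
    print(storage)
```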