Bug #2669

XML import not converting <p> and <br /> tags

Added by Anonymous about 13 years ago. Updated about 9 years ago.

Status: Verified
Start date: 04/01/2009
Priority: High
Due date:
Assignee: José Raddaoui Marín
% Done: 100%
Category: Import/Export
Target version: Release 1.4.0
Google Code Legacy ID: atom-719
Tested version:
Sponsored: No
Requires documentation:

Description

Google user: rada...@gmail.com

<p> tags embedded in some EAD tags (eg <bioghist> <scopecontent>) not
recognized. Can be problematic in case of long admin history.

[g] Legacy categories: Import/Export, EAD

ICA-AtoMDemoScreenGrab.jpg (258 KB) Anonymous, 12/01/2012 12:33 AM

QubitTrunkScreenGrab.jpg (339 KB) Anonymous, 12/01/2012 12:33 AM

USaskScreenGrab.jpg (305 KB) Anonymous, 12/01/2012 12:33 AM

Notes_Area_Example_CarriageReturns.png - Line breaks not preserved in the Notes Area fields (25.8 KB) Dan Gillean, 03/25/2013 04:05 PM

ScopeContent_Example_CarriageReturns.png - Scope and content line breaks missing in show screen after import (9.7 KB) Dan Gillean, 03/25/2013 04:06 PM

ScopeContent_Example_CarriageReturns_EAD.png - Line breaks in EAD export, but wrapped in a single paragraph tag (25.2 KB) Dan Gillean, 03/25/2013 04:06 PM

History

#1 Updated by Peter Van Garderen about 13 years ago

  • Target version set to Release 1.0.8

[g] Labels added: Milestone-Release-1.0.8

#2 Updated by Peter Van Garderen almost 13 years ago

[g] Labels added: Component-EAD

#3 Updated by Peter Van Garderen over 12 years ago

  • Subject set to XML import not converting <p> and <br /> tags

Will have to convert <p> to "\n\n", <br /> to "\n", and </p> to "" on import. Can
write a simple QubitHelper to do this.

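function convert_lb($string) {
    // Match each tag in both its literal and its entity-escaped form.
    // <br /> becomes a single line break.
    $replace1 = array("<br />", "&lt;br /&gt;");
    $string1 = str_replace($replace1, "\r\n", $string);
    // <p> becomes a blank line (paragraph separator).
    $replace2 = array("<p>", "&lt;p&gt;");
    $string2 = str_replace($replace2, "\r\n\r\n", $string1);
    // </p> is dropped.
    $replace3 = array("</p>", "&lt;/p&gt;");

    return str_replace($replace3, "", $string2);
}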

However, this helper cannot be included in the current XML import configuration. Will
split up and simplify XML import per standard in release 1.1 and include this
capability at that time.

As a workaround for release 1.0.8, will use nl2br to include <br /> tags to preserve
linebreaks and paragraphs in the content data.
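
A minimal sketch of what nl2br does to a field value, in plain PHP outside the Qubit codebase. The sample text is invented for illustration and this is not the code from the 1.0.8 workaround. With its default arguments, nl2br() inserts "<br />" before each newline and keeps the newline itself.

```php
<?php
// Illustrative only: the sample text is made up, not taken from a real record.
$history = "First paragraph.\n\nSecond paragraph.\nSame paragraph, next line.";

// nl2br() inserts "<br />" before every newline and leaves the newline in place.
$withBreaks = nl2br($history);

echo $withBreaks;
```

Because the original newlines are kept, a blank line between paragraphs ends up as two consecutive <br /> tags.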

As of 1.0.8, <p> and <br /> will (re)import into Qubit literally as long as the '<'
and '>' reserved characters are escaped properly ('&lt;' and '&gt;'). The Symfony
output escaping respects and displays these properly in the view and show templates.
This just means that tags will be embedded within the regular text content that is
stored in the database (not ideal).

#4 Updated by Peter Van Garderen over 12 years ago

  • Target version changed from Release 1.0.8 to Release 1.1

see r3241 for 1.0.8 workaround

[g] Labels added: Milestone-Release-1.1
[g] Labels removed: Milestone-Release-1.0.8

#5 Updated by Anonymous almost 12 years ago

  • Priority set to Low

[g] Labels added: Priority-Low

#6 Updated by Evelyn McLellan over 11 years ago

  • Target version set to Release 1.2

Moved to 1.2.

[g] Labels added: Milestone-Release-1.2

#7 Updated by Anonymous about 11 years ago

"Import EAD.XML removes line break Formatting" discovered by Jessica Bushey while testing Wu's patch for /p/qubit-toolkit/issues/detail?id=1132.

See Screenshots attached:
* ICA-AtoM Demo ScreenGrab shows accurate presentation of Context Area formatting. This screengrab was taken prior to deleting the townley.ead.xml file. After deletion I imported the townley.ead.xml file and the presentation was no longer accurate. See the following screengrabs for examples of inaccurate formatting.
* Qubit Trunk ScreenGrab shows presentation after importing townley.ead.xml file. Note: inaccurate presentation of Context Area formatting.
* USaskScreenGrab shows presentation after importing townley.ead.xml file. Note: inaccurate presentation of Context Area formatting.

[g] Labels added: Priority-High
[g] Labels removed: Priority-Low
[g] New owner: Jessica Bushey

#8 Updated by Anonymous about 11 years ago

#9 Updated by Tim Hutchinson about 11 years ago

A few things:

- the development thread indicates that export works as expected. This does not seem to be the case, since paragraphs and line breaks are not retained on export (testing in the demo site with the same record)
- the relevant EAD element for linebreak is <lb/>, not <br/>
- in USask testing, we made a change to retain <p>'s on import, so we may be able to contribute a patch later. However, this was not done with a QubitHelper as suggested above, so we'll need to review whether it's been done correctly. In any case, based on Peter's comments above, it doesn't seem too complicated.

#10 Updated by David Juhasz almost 11 years ago

[g] New owner: MJ Suhonos

#11 Updated by David Juhasz almost 11 years ago

  • Priority set to Medium

[g] Labels added: Priority-Medium

#12 Updated by David Juhasz almost 11 years ago

See /p/qubit-toolkit/issues/detail?id=2037 for the converse case.

#13 Updated by Tim Hutchinson almost 11 years ago

Just an update to comment #9 (third point) - the way we implemented this in SK was a bit of a hack, so it won't be possible to contribute a patch in this case :)

#14 Updated by Anonymous over 10 years ago

I'm trying to generate XML for import to ICA-AtoM which includes multiple values for description extents. According to instructions displayed when entering them manually, they should be separated with a linebreak.

I've tried separating the values in XML with <lb/>, <br/>, /n, and /n/n, and also tried using separate extent tags for each value - all to no avail: the values (including the text values of the separators) are all concatenated on the first line.

Have I missed something, or has anyone found a way round this?

Thanks,
Stephen Gadd

#15 Updated by David Juhasz over 10 years ago

  • Target version set to Release 1.3

Roll over to Release 1.3

[g] Labels added: Milestone-Release-1.3

#16 Updated by Anonymous about 10 years ago

  • Target version changed from Release 1.3 to Release 2.1.0

When exporting to EAD, ICA-AtoM does not translate line breaks in a field as <p> or <br> tags within a parent element; when importing, it does not translate <p> or <br> tags within a parent element as line breaks in field data.

Assessment: this can be a problem when dealing with long data fields (e.g. administrative histories); it will require the user to manually go in and add line breaks after importing a finding aid.

Recommendation: Implement line breaks in EAD import/export.
[CCAD-35]

[g] Labels added: Milestone-Release-2.0
[g] Labels removed: Milestone-Release-1.3

#17 Updated by Jesús García Crespo almost 10 years ago

[g] New owner: David Juhasz

#18 Updated by David Juhasz almost 10 years ago

Reassign to David's new account.

[g] New owner: David Juhasz

#19 Updated by Dan Gillean over 9 years ago

  • Start date set to 04/01/2009

#20 Updated by Dan Gillean over 9 years ago

  • Category set to Import/Export

#21 Updated by Dan Gillean over 9 years ago

Thought: what about using EAD tags to solve this problem? Encoding the AtoM fields so that every carriage return adds an <lb> tag? Not sure about the feasibility of this from a dev point of view, but it is worth noting that the EAD Tag Library includes a line break tag:

http://www.loc.gov/ead/tglib/elements/lb.html

#22 Updated by David Juhasz over 9 years ago

  • Target version changed from Release 2.1.0 to Release 1.4.0

#23 Updated by Dan Gillean about 9 years ago

  • Assignee changed from David Juhasz to José Raddaoui Marín
  • Priority changed from Medium to High
  • Sponsored set to No

#24 Updated by José Raddaoui Marín about 9 years ago

As Stephen says in update #16, EAD export does not translate line breaks in a field into <p> or <br>, so the export looks like this:

<note><p>actor_history

actor_history

- a
- b
- c</p></note>

As Dan recommended, I've replaced every '\n' with '<lb/>' in all fields in the EAD export, so now the export looks like this:

<note><p>actor_history<lb/><lb/>actor_history<lb/><lb/>- a<lb/>- b<lb/>- c</p></note>

The tricky part was the import: replacing the tags back without interfering with other imports. But I think I finally got it.
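
The round trip described above can be sketched as follows. This is a hedged illustration, not the code from the changeset: the function names newlines_to_lb and lb_to_newlines are hypothetical, and the sketch assumes the field value contains no other markup that needs XML escaping.

```php
<?php
// Hypothetical helpers, not the code from the changeset referenced in this issue.

// Export: turn each newline in a field value into an EAD <lb/> element.
function newlines_to_lb($text)
{
    // Normalise Windows and old Mac line endings first so each break yields one <lb/>.
    $text = str_replace(array("\r\n", "\r"), "\n", $text);

    return str_replace("\n", '<lb/>', $text);
}

// Import: turn <lb/> (with optional whitespace before the slash) back into a newline.
function lb_to_newlines($xmlText)
{
    return preg_replace('/<lb\s*\/>/', "\n", $xmlText);
}

// Same sample value as in the export example above.
$field = "actor_history\n\nactor_history\n\n- a\n- b\n- c";
$exported = newlines_to_lb($field);
$imported = lb_to_newlines($exported);
```

Here $exported matches the single-line <lb/> form shown in the export above (without the surrounding <note><p> wrapper), and $imported is identical to the original $field.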

#25 Updated by José Raddaoui Marín about 9 years ago

  • Status changed from New to QA/Review
  • % Done changed from 0 to 100

Applied in changeset atom|commit:afad854036c5924682ea4f6761da67469fe63218.

#26 Updated by Dan Gillean about 9 years ago

This is a broad error which must apply to every field in AtoM. Testing on a sample EAD record did not show any carriage returns being preserved when roundtripping. As it is, when you enter a number of carriage returns in the AtoM template, only one line of separation is preserved on the show screen regardless of how many breaks you insert when editing.

I've attached some screenshots, but I couldn't see line breaks being preserved anywhere during roundtripping. Interestingly, the line breaks were preserved in the EAD file (see screenshot), but as they were wrapped in a single <p> tag, they were not preserved when imported again.

#27 Updated by José Raddaoui Marín about 9 years ago

I'll talk tomorrow with Sevein, it looks like the changes I made are not merged in 1.x.

#28 Updated by Jesús García Crespo about 9 years ago

  • Status changed from Feedback to QA/Review

Applied in changeset atom|commit:2f53fbf2e8d8e82c7a6574b0d230862c6c9c456e.

#29 Updated by Jesús García Crespo about 9 years ago

Applied in changeset atom|commit:2f53fbf2e8d8e82c7a6574b0d230862c6c9c456e.

#30 Updated by Dan Gillean about 9 years ago

  • Status changed from QA/Review to Verified

EAD linebreak (<lb>) tags have been successfully introduced, and roundtrip without issue. When a roundtripped fonds is exported again, the linebreaks are preserved in the EAD as well.

NOTE: The AtoM display interface will not display multiple linebreaks in a saved record - any number of carriage returns is represented as a single line break. However, when a user edits the record, line breaks and spacing are preserved in the edit template. This means that a user can easily delete <lb> tags from the EAD through the GUI, in the edit template. AtoM's display will not represent multiple carriage returns, though paragraph separations (i.e. 1-2 linebreaks) will appear. If a user really wants several linebreaks to appear in the display, we have allowed logged-in users to use the HTML <br /> tag for added linebreaks - this will show up in the display as well.
