Convert existing AtoM data to Parsedown syntax
Category: CLI tools
Estimated time: 24.00 hours
Target version: Release 2.5.0
We added automatic escaping of HTML content for security purposes in issue #7647. To help users remove HTML content that had previously been entered through the edit templates, a command-line task was added in issue #8574. When a custom linking syntax was added to AtoM in issue #8410, we also updated the HTML scrub task (#9184) to automatically convert HTML-styled hyperlinks to the new custom format. We later extended it (#11207) to handle escaped HTML entities as well.
Now that support for Parsedown is coming in issue #12148, we will remove the previous custom formatting rules to prevent conflicts with Parsedown formatting. These include the custom linking syntax and the automatic conversion of lines that begin with an asterisk followed by a space into list items.
This may leave users who relied on the previous custom link syntax with broken or improperly displayed links. To help them update this content, we will change the existing HTML conversion CLI task so that it uses Parsedown formatting where applicable. We will also add a method that transforms the custom link syntax introduced in #8410 into the link format supported by Parsedown.
This method will be added as a separate command-line task, used specifically to convert the old custom link syntax to Parsedown syntax. We will also run this task as a migration script, so users who upgrade will not have to run it manually afterwards.
Development tasks: create command-line scripts to convert existing HTML or Markdown syntax to Parsedown syntax.
- Update the existing HTML conversion script to transform applicable HTML tags into Parsedown syntax.
- Create a new command-line task to convert the old AtoM custom link syntax to Parsedown syntax.
- Add the new CLI task as a migration, so that it runs automatically when users upgrade.
#8 Updated by José Raddaoui Marín over 1 year ago
The HTML conversion task has been modified. It now converts links, lists, definition lists, line breaks and paragraphs to Markdown instead of to plain text:
php symfony i18n:remove-html-tags
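The following is a minimal Python sketch of the kind of conversion described above. It uses only the standard-library `html.parser` module and is not AtoM's PHP implementation of `i18n:remove-html-tags`. Several choices in it are assumptions: the class and function names are hypothetical, `- ` and `1. ` are used as list markers, and `<br>` becomes a bare newline. Definition lists are left out because the mapping AtoM uses for them is not described here. Any tag that is not handled is dropped, and its text is kept.

```python
import re
from html.parser import HTMLParser


class HtmlToMarkdown(HTMLParser):
    """Illustrative HTML-to-Markdown converter (hypothetical, not AtoM's task).

    Converts <a href>, <ul>/<ol>/<li>, <br> and <p>; every other tag is
    dropped while its text content is kept. Nested list indentation,
    definition lists and Markdown escaping are deliberately not handled.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []         # Markdown fragments, joined at the end
        self.lists = []       # stack of [list tag, item counter]
        self.href = None      # href of the <a> element currently open
        self.link_text = []   # text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []
        elif tag in ("ul", "ol"):
            self.lists.append([tag, 0])
            self.out.append("\n")
        elif tag == "li" and self.lists:
            current = self.lists[-1]
            current[1] += 1
            # Assumed markers: "1. " for ordered items, "- " otherwise.
            marker = f"{current[1]}. " if current[0] == "ol" else "- "
            self.out.append("\n" + marker)
        elif tag == "br":
            # Assumption: a bare newline is an acceptable line break.
            self.out.append("\n")
        elif tag == "p":
            self.out.append("\n\n")

    def handle_endtag(self, tag):
        if tag == "a":
            text = "".join(self.link_text)
            # Emit [text](url) only when the anchor had a non-empty href.
            self.out.append(f"[{text}]({self.href})" if self.href else text)
            self.href = None
            self.link_text = []
        elif tag in ("ul", "ol") and self.lists:
            self.lists.pop()
            self.out.append("\n\n")
        elif tag == "p":
            self.out.append("\n\n")

    def handle_data(self, data):
        if self.href is not None:
            self.link_text.append(data)
        else:
            self.out.append(data)


def html_to_markdown(html: str) -> str:
    parser = HtmlToMarkdown()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    # Collapse runs of blank lines into a single paragraph break.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

For example, `html_to_markdown('<ol><li>a</li><li>b</li></ol>')` returns `"1. a\n2. b"`, and `html_to_markdown("some <strong>bold</strong> text")` returns `"some bold text"`.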
A new task has been added to convert the Redmine-style links to Markdown. It processes the same fields as the previous one:
php symfony i18n:custom-link-to-markdown
This last task is also executed during the upgrade process.
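The link conversion can be sketched as a single regular-expression substitution. This is a hypothetical Python illustration, not the PHP code behind `i18n:custom-link-to-markdown`. It assumes the legacy syntax has the Redmine/Textile shape `"link text":URL` and rewrites each match as a Markdown `[link text](URL)` link. The URL is matched lazily and followed by a lookahead, so trailing sentence punctuation such as a final period stays outside the link.

```python
import re

# ASSUMPTION: the legacy AtoM link syntax is "link text":URL (Redmine/Textile
# style). Adjust LEGACY_LINK if the real syntax differs.
LEGACY_LINK = re.compile(
    r'"(?P<text>[^"\n]+)":'      # quoted link text, then a colon
    r'(?P<url>[^\s"<>]+?)'       # URL, matched lazily...
    r'(?=[.,;:!?)]*(?:\s|$))'    # ...so trailing punctuation stays in the prose
)


def custom_links_to_markdown(text: str) -> str:
    """Rewrite every "text":URL link as a Markdown [text](URL) link."""
    return LEGACY_LINK.sub(lambda m: f"[{m['text']}]({m['url']})", text)
```

For example, `custom_links_to_markdown('See "AtoM":https://www.accesstomemory.org.')` returns `'See [AtoM](https://www.accesstomemory.org).'`. Text that has already been converted contains no `"text":URL` pairs, so running this sketch a second time leaves it unchanged. That property is useful when the same conversion runs both as an upgrade migration and as a manually invoked task.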
#10 Updated by José Raddaoui Marín about 1 year ago
I18n tables and columns processed by these tasks:
#13 Updated by Dan Gillean 4 months ago
- Status changed from QA/Review to Feedback
- Assignee set to José Raddaoui Marín
I was testing in the 16.04 2.5 VM, and the remove-html-tags task does not seem to be working.
My block of test HTML:
This is some <strong>bolded</strong> test, and this is some <em>italicized</em> text. <ul><li>this</li><li>is</li><li>a</li><li>LIST!</li></ul> <br /> <h2>this is a title!</h2> ----- This is an outdated <b>bold</b> tag, and an outdated <i>italics</i> HTML tag!
None of the tags were removed or replaced when running:
php symfony i18n:remove-html-tags
I added that test block to:
1) Scope and content in an IO
2) Finding aids in a repository record
Corinne has also tested this on the 18.04 test vagrant box and found the same thing.
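When re-testing, it can help to check a field value before and after running the task. The helper below is a hypothetical Python sketch, not part of AtoM. It lists which HTML tag names are still present in a string, using the standard-library `html.parser` module. Retrieving the actual field values (for example, scope and content from the information object i18n table) is not shown. The sample string is the test block quoted above.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag (including self-closing ones)."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def remaining_tags(text: str) -> list:
    """Return the sorted, de-duplicated HTML tag names found in text."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return sorted(collector.tags)


# The test block from the comment above, unchanged.
SAMPLE = (
    "This is some <strong>bolded</strong> test, and this is some "
    "<em>italicized</em> text. <ul><li>this</li><li>is</li><li>a</li>"
    "<li>LIST!</li></ul> <br /> <h2>this is a title!</h2> ----- This is an "
    "outdated <b>bold</b> tag, and an outdated <i>italics</i> HTML tag!"
)
```

On the unconverted sample, `remaining_tags(SAMPLE)` returns `['b', 'br', 'em', 'h2', 'i', 'li', 'strong', 'ul']`. After a successful run of the task, the converted value should return an empty list.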
#16 Updated by David Juhasz 4 months ago
For future reference, the PR Mike refers to in comment 15 is: https://github.com/artefactual/atom/commit/5a4dc4806026f73318239893047a8e5bd673a6df