Update HTML scrub script to replace HTML links with custom linking formatting used in AtoM
Assignee: Mike Gale
Target version: Release 2.2.1
Tested version: 2.3
With the 2.2 AtoM release, we introduced changes (described in #7647) that escape HTML content for security purposes. However, this change has had some unintended consequences. One of them affects the many users who had used HTML to add anchor links in editable text fields (such as the Finding aids field) in their descriptions: that markup is now escaped instead of being rendered as links.
In 2.3, via issue #8410, a custom markdown-like syntax was added to AtoM so that users can again add links that will not be escaped and that will not break the EAD export. After extensive regex testing, the format chosen was the one used by Redmine:
"anchor text here":http://your-link-here.example.com
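To illustrate how this syntax lets links survive while all other HTML stays escaped, here is a minimal sketch in Python. It is not AtoM's implementation (AtoM is written in PHP), and the regular expression below is a simplified assumption, not the one settled on in #8410. The names `LINK_PATTERN` and `render_links` are hypothetical.

```python
import html
import re

# Hypothetical pattern for the Redmine-style syntax: double-quoted display
# text, a colon, then an http(s) URL that runs to the next whitespace.
# This is NOT the regex AtoM adopted in #8410.
LINK_PATTERN = re.compile(r'"([^"]+)":(https?://\S+)')


def render_links(text):
    """Escape all HTML in text, but turn "label":url into <a> elements."""
    parts = []
    last = 0
    for match in LINK_PATTERN.finditer(text):
        # Text between links is escaped, so raw HTML never renders.
        parts.append(html.escape(text[last:match.start()]))
        label, url = match.group(1), match.group(2)
        # Label and URL are escaped too before being placed in the anchor.
        parts.append('<a href="{}">{}</a>'.format(html.escape(url), html.escape(label)))
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

For example, `render_links('See "AtoM":https://www.accesstomemory.org for details.')` returns `'See <a href="https://www.accesstomemory.org">AtoM</a> for details.'`, while `render_links('<b>x</b>')` returns the escaped `'&lt;b&gt;x&lt;/b&gt;'`. One limitation of this simplified pattern is that trailing punctuation directly after a URL (such as a full stop) would be captured as part of the link.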
At the same time, 2.3 will also include an HTML scrub CLI task (see ticket #8574) to help users remove legacy HTML that was added to descriptive fields. However, #8410 and #8574 were developed separately and were not integrated: in the first iteration of the HTML scrub task, a link with display text is replaced by the display text followed by the raw link in brackets.
This ticket will enhance the original CLI task from #8574 so that, when it encounters HTML links, it replaces the HTML with the custom markdown-like syntax, restoring the links to their originally intended appearance.
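The conversion described above can be sketched as follows. This is a minimal illustration in Python, not the scrub task's actual code (AtoM's CLI tasks are PHP). It uses Python's standard `html.parser` module, which may differ from the approach the real task takes, and the names `LinkScrubber` and `scrub_html` are hypothetical. It strips all tags, rewrites anchors that have display text into the `"text":url` form, and, as an assumed design choice, keeps just the bare URL when the display text is empty or identical to the URL.

```python
from html.parser import HTMLParser


class LinkScrubber(HTMLParser):
    """Drop all tags, rewriting <a href="url">text</a> as "text":url."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # scrubbed output fragments
        self.href = None   # href of the currently open <a>, if it has one
        self.label = []    # text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without a usable href (e.g. <a name="...">) are
            # treated as plain text.
            self.href = dict(attrs).get("href") or None
            self.label = []

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            label = "".join(self.label).strip()
            if label and label != self.href:
                self.out.append('"{}":{}'.format(label, self.href))
            else:
                # Display text is empty or just the URL: keep the bare URL.
                self.out.append(self.href)
            self.href = None
            self.label = []

    def handle_data(self, data):
        # Text inside a link goes to the label; everything else is output.
        (self.label if self.href is not None else self.out).append(data)


def scrub_html(value):
    """Return value with HTML removed and links converted to "text":url."""
    parser = LinkScrubber()
    parser.feed(value)
    parser.close()
    if parser.href is not None:
        # Unclosed <a>: keep its text rather than losing it.
        parser.out.extend(parser.label)
    return "".join(parser.out)
```

For example, `scrub_html('<p>See the <a href="http://example.com/fa.pdf">finding aid</a> for details.</p>')` returns `'See the "finding aid":http://example.com/fa.pdf for details.'`. This sketch does not handle display text that itself contains double quotes, and it does not insert whitespace where block-level tags such as `<p>` are removed.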
#3 Updated by Dan Gillean almost 5 years ago
- Requires documentation deleted (
Docs updated in 2.3 branch: https://github.com/artefactual/atom-docs/commit/1eb6ad84215fb2b96e3e955dca8353fdc23b2205