Enhance the HTML scrub script to apply to more entities and escaped HTML characters
|Assignee:||Dan Gillean|
|Target version:||Release 2.4.0|
In issue #8574 we introduced a command-line task that would help users remove HTML content, after we began automatically escaping HTML for security in release 2.2. We further enhanced that work by converting links to use AtoM's custom linking syntax in #9184. Previously, the task would only affect core information object fields.
This enhancement extends the scrub task in two ways. First, the task now covers additional entities: notes, rights, actors, and repositories. Second, escaped HTML characters, such as:
... will be converted back to the characters they represent, such as "ç".
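To illustrate the second change, the following is a minimal Python sketch of decoding escaped HTML characters. It is not the code of the AtoM task itself; the function name `decode_escaped_entities` and the sample string are hypothetical and chosen only for illustration. It relies on the `html.unescape` function from the Python standard library.

```python
import html


def decode_escaped_entities(text: str) -> str:
    """Replace HTML character references with the characters they represent.

    Handles named references (e.g. &ccedil;) as well as decimal and
    hexadecimal numeric references (e.g. &#231; and &#xE7;).
    """
    return html.unescape(text)


# Hypothetical sample value, standing in for a field stored with escaped HTML.
sample = "Fran&ccedil;ois &amp; fils"
decoded = decode_escaped_entities(sample)
print(decoded)
```

A scrub routine along these lines would apply the same decoding to each affected field of each affected entity, leaving already-decoded text unchanged.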
This work is the product of community collaboration, after one user (Clara Rosales) shared some modified code via the user forum, and another user (Darryl Friesen) further enhanced the shared code, incorporated it into the existing task, and with permission from Clara, shared this back with the public project.
Related pull request: https://github.com/artefactual/atom/pull/568
Related user forum thread: https://groups.google.com/d/msg/ica-atom-users/_xdBK0ucegg/RQnNM5DKBAAJ
#4 Updated by Dan Gillean almost 5 years ago
- Status changed from QA/Review to Verified
- Requires documentation deleted (
Looks good, thanks for the contribution, Darryl and Clara!
Docs updated in 2.4 branch: https://github.com/artefactual/atom-docs/commit/dec43a966634a08cc2be5784b807bd46c3645364