Feature #11207

Enhance the HTML scrub script to apply to more entities and escaped HTML characters

Added by Dan Gillean almost 5 years ago. Updated almost 5 years ago.

Status: Verified
Start date: 05/30/2017
Priority: Medium
Due date:
Assignee: Dan Gillean
% Done:
Category: CLI tools
Target version: Release 2.4.0
Google Code Legacy ID:
Tested version:
Sponsored: No
Requires documentation:


After we began automatically escaping HTML for security in release 2.2, we introduced a command-line task in issue #8574 to help users remove HTML content. We further enhanced that work in #9184 by converting links to AtoM's custom linking syntax. Previously, the task affected only core information object fields.

This enhancement extends the scrub task in two ways. First, the task now applies to more entities: notes, rights, actors, and repositories. Second, escaped HTML characters (character entities) are now converted back to the characters they represent, such as "ç".
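To illustrate the general idea of removing markup and converting escaped characters back to plain characters, here is a minimal Python sketch. It is not AtoM's implementation (AtoM's actual task is written in PHP), and the `scrub` function and `_TextExtractor` class are hypothetical names used only for illustration. The sketch assumes the standard library's `html.parser.HTMLParser`, which, when created with `convert_charrefs=True`, decodes character references such as `&ccedil;` or `&#233;` in text while the subclass simply ignores the tags.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects the text content of an HTML fragment."""

    def __init__(self) -> None:
        # convert_charrefs=True makes the parser decode entities such as
        # "&ccedil;" before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Only text nodes are kept; start and end tags are never recorded.
        self.parts.append(data)


def scrub(value: str) -> str:
    """Hypothetical scrub: drop tags and decode escaped characters."""
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()  # flush any buffered text at the end of the input
    return "".join(parser.parts)


print(scrub("<p>Fran&ccedil;ois &amp; Fils</p>"))  # François & Fils
```

Note that this sketch discards tags without inserting whitespace, so `<p>one</p><p>two</p>` becomes `onetwo`; a real scrub routine also has to decide how to handle block boundaries and links.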

This work is the product of community collaboration: one user (Clara Rosales) shared some modified code via the user forum, and another user (Darryl Friesen) further enhanced that code, incorporated it into the existing task, and, with Clara's permission, contributed it back to the public project.

Related pull request: https://github.com/artefactual/atom/pull/568

Related user forum thread: https://groups.google.com/d/msg/ica-atom-users/_xdBK0ucegg/RQnNM5DKBAAJ


#1 Updated by Nick Wilkinson almost 5 years ago

  • Assignee changed from Nick Wilkinson to Mike Gale

#2 Updated by Mike Gale almost 5 years ago

  • % Done changed from 0 to 50

Awaiting the PR author to implement two small suggestions I made. There are some style issues, but they are endemic to the task overall, so I'll go through and clean it all up.

#3 Updated by Mike Gale almost 5 years ago

  • Status changed from New to QA/Review
  • Assignee changed from Mike Gale to Dan Gillean

It's merged into qa/2.4.x.

#4 Updated by Dan Gillean almost 5 years ago

  • Status changed from QA/Review to Verified
  • Requires documentation deleted (Yes)

Looks good, thanks for the contribution, Darryl and Clara!

Docs updated in 2.4 branch: https://github.com/artefactual/atom-docs/commit/dec43a966634a08cc2be5784b807bd46c3645364
