Feature #12148

Add Markdown support to AtoM

Added by José Raddaoui Marín over 1 year ago. Updated 4 months ago.

Status: Verified
Start date: 03/13/2018
Priority: Medium
Due date:
Assignee: Dan Gillean
% Done: 0%
Category: CSS/HTML
Estimated time: 44.00 hours
Target version: Release 2.5.0
Google Code Legacy ID:
Tested version:
Sponsored: Yes
Requires documentation:

Description

This feature will let AtoM users add custom formatting, such as bold, italics, styled hyperlinks, lists, and more, to user-editable content (such as descriptions, authority records, and other AtoM entities). It will use the Parsedown (http://parsedown.org/) PHP Markdown library to support Markdown formatting in AtoM resources.

Parsedown makes use of GitHub flavored markdown syntax - more information on formatting can be found here:

(Note that some aspects of GitHub-flavored markdown are GitHub specific and may not work in AtoM. A full list of supported elements and examples will be added to the documentation when this feature is completed.)

Where previous support has been added for alternative styling methods (such as the custom linking syntax added in issue #8410), these will be removed and will no longer work going forward. Users will be provided with a command-line task to update existing syntax to the new supported format - for more information on this task, see issue #12149. Additionally, as per #7647, raw HTML will continue to be escaped in AtoM, so HTML strong, emphasis, etc. elements will not work, though the work in #12149 will update the HTML scrub task to convert compatible HTML elements to the new parsedown syntax.

Finally, a new setting will be added to Admin > Settings > Global that will allow an administrator to control whether or not markdown rendering is enabled. When disabled, any parsedown formatting added to AtoM will simply display as raw text content.
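As a rough illustration of the behavior described above, the following Python sketch escapes raw HTML in every case and converts Markdown only when a setting is on. It is not AtoM or Parsedown code: `render_field` and `markdown_enabled` are hypothetical names, and the two regular expressions cover only bold and italics, a tiny subset of what Parsedown supports.

```python
import html
import re


def render_field(text: str, markdown_enabled: bool) -> str:
    """Escape raw HTML, then optionally apply a toy subset of Markdown.

    Hypothetical helper for illustration only; a real deployment would
    delegate the conversion step to a full Markdown library.
    """
    # Raw HTML is always escaped, whether or not Markdown is enabled.
    escaped = html.escape(text, quote=False)
    if not markdown_enabled:
        # Setting disabled: Markdown markers are shown as plain text.
        return escaped
    # Handle bold first so "**" is not consumed by the italics rule.
    rendered = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
    rendered = re.sub(r"\*(.+?)\*", r"<em>\1</em>", rendered)
    return rendered


if __name__ == "__main__":
    sample = "**Fonds** with <b>raw HTML</b>"
    print(render_field(sample, markdown_enabled=True))
    print(render_field(sample, markdown_enabled=False))
```

With the setting on, the Markdown bold becomes a `<strong>` element while the literal `<b>` tag stays escaped. With the setting off, the asterisks are left in the output as typed.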

Deliverables

  • Integrate Parsedown into AtoM.
  • Add administrator setting to allow enabling or disabling markdown rendering functionality.
  • When markdown rendering is enabled, convert markdown to HTML for display in web user interface.
  • Ignore markdown characters in search queries.
  • Remove existing custom markdown code for lists and links, to avoid conflicts with Parsedown.

atom-markdown-test.csv (9.61 KB) Dan Gillean, 03/20/2019 02:33 PM


Related issues

Related to Access to Memory (AtoM) - Feature #12149: Convert existing AtoM data to Parsedown syntax Verified 03/13/2018
Related to Access to Memory (AtoM) - Bug #12661: Custom hyperlink in text fields not displaying properly Invalid 12/21/2018

History

#2 Updated by Dan Gillean over 1 year ago

  • Description updated (diff)
  • Category set to CSS/HTML
  • Sponsored changed from No to Yes
  • Requires documentation set to Yes

#3 Updated by Dan Gillean over 1 year ago

  • Related to Feature #12149: Convert existing AtoM data to Parsedown syntax added

#4 Updated by José Raddaoui Marín about 1 year ago

Some notes for testing/documenting:

The scope and content field in the search results was being truncated after 250 chars, causing rendering issues when the rendered Markdown was cut. To avoid this issue I've used the same expandable behavior we have in the index pages. I hope that's okay, because the only alternatives are to always display the entire value or to remove the Markdown syntax from the field on render.

The data is stored with Markdown in the ES index but, when the Markdown setting is enabled, the punctuation chars listed below are converted to spaces when the values are analyzed. This avoids issues when searching over Markdown data, like a description with the title "__test description__". For the same reason, and only when the Markdown setting is enabled, autocompletes using the ORM (all except the IO autocomplete) will search with two wildcards ("*query*") instead of one as before ("query*").

*_#->`+~:|^=
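To make the two behaviors above concrete, here is a minimal Python sketch. It only simulates the idea and is not the Elasticsearch or ORM code from this feature: `filter_markdown_chars` is a hypothetical name, and `fnmatch` stands in for the wildcard matching done by the autocompletes.

```python
from fnmatch import fnmatchcase

# The punctuation chars listed in this comment.
MARKDOWN_CHARS = "*_#->`+~:|^="

# Translation table mapping each of those chars to a single space.
_TO_SPACES = str.maketrans({ch: " " for ch in MARKDOWN_CHARS})


def filter_markdown_chars(value: str) -> str:
    """Replace every Markdown punctuation char with a space."""
    return value.translate(_TO_SPACES)


if __name__ == "__main__":
    title = "__test description__"

    # After filtering, the title yields clean search terms.
    print(filter_markdown_chars(title).split())

    # A single trailing wildcard misses titles that start with Markdown
    # markers; wildcards on both sides still match.
    print(fnmatchcase(title, "test*"))
    print(fnmatchcase(title, "*test*"))
```

Because "-" is in the list, a value such as an identifier containing hyphens would also be split into separate terms by this kind of filter.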

I'll add a couple of examples that are not present in the following link, but those tests are a good start for seeing what is possible with Markdown syntax after these changes:

http://parsedown.org/tests/

#5 Updated by José Raddaoui Marín about 1 year ago

Also, I forgot to mention that the search index needs to be populated each time the Markdown setting is changed, as happens with the language setting. I'll add a note about that in the settings page.

#6 Updated by David Juhasz about 1 year ago

  • Status changed from New to Feedback

Hi Radda,

While converting punctuation chars to spaces in search is probably fine for most AtoM fields (especially the ones that tend to have a lot of content, like "Scope & Content"), there are some fields where I think this will be problematic, especially:
  • Identifiers
  • Alternate identifiers
  • Slugs

I'm not sure if Dan can think of any other ones? Maybe physical location numbers? Accession numbers? Repository ids? Dates?

Can we be selective about which ES fields convert punctuation?

#7 Updated by Dan Gillean about 1 year ago

Interestingly, I added italics/emphasis to an identifier, but was still able to search and find matches on it, even when searching in quotations (e.g. exact search). I can test further for this, but I was pleasantly surprised not to find any initial issues with this, in regards to search. I didn't try a reference code search where only one part of the ref code was in italics or bolded or anything - I'll do that on the next round of QA to see how it affects things.

#8 Updated by José Raddaoui Marín about 1 year ago

Hi David,

That could be done, but it will require some work, especially if we want to do it in some i18n fields. The char filter that removes punctuation when the Markdown setting is on is being added to all analyzers, including the default one, so it's being used to analyze the query string too.
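A sketch of what that arrangement could look like as Elasticsearch analysis settings, written as a Python dict. This is an assumption for illustration, not the actual AtoM configuration: the filter name `strip_markdown_chars`, the analyzer names, and the choice of a `pattern_replace` char filter are all hypothetical.

```python
import re

# Character class covering the Markdown punctuation chars listed earlier.
MARKDOWN_CHAR_PATTERN = r"[*_#\->`+~:|^=]"


def build_analysis(markdown_enabled: bool) -> dict:
    """Build analysis settings where every analyzer shares one char filter.

    Analyzer and filter names are illustrative assumptions.
    """
    char_filters = ["strip_markdown_chars"] if markdown_enabled else []
    analysis = {
        "analyzer": {
            name: {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase"],
                # Attached to every analyzer, including "default", so the
                # same filtering applies to indexed values and query strings.
                "char_filter": list(char_filters),
            }
            for name in ("default", "autocomplete")
        }
    }
    if markdown_enabled:
        analysis["char_filter"] = {
            "strip_markdown_chars": {
                "type": "pattern_replace",
                "pattern": MARKDOWN_CHAR_PATTERN,
                "replacement": " ",
            }
        }
    return analysis


def apply_char_filter(text: str) -> str:
    """Local simulation of the char filter, for checking the pattern."""
    return re.sub(MARKDOWN_CHAR_PATTERN, " ", text)
```

Making the filter selective per field, as asked above, would mean defining separate analyzers with and without the char filter and assigning them field by field, which is where the extra work for i18n fields comes in.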

#9 Updated by David Juhasz about 1 year ago

Radda, let's see how Dan's testing goes with regards to the punctuation filter. If searching for reference codes and identifiers works fine with the markdown character filter on, then I can't see any other problems with removing those chars from the search text.

#10 Updated by Dan Gillean about 1 year ago

I had great success with searches for reference codes.

I modified various parent identifiers with different markdown - inline escaping in one, bold in another, italics in yet another - and was still able to get matching search results. The results were both in general global searches, and when putting the reference code in quotations and limiting the match to Reference code.

One thing I didn't test, which I'll check out in the next round when Radda merges the feature into qa/2.5.x for further testing:

Turning on the permissive slug generation settings, and adding markdown to a title. I suspect it will work fine, but might lead to an unexpected slug. However, users can always edit the slug manually. If it doesn't break anything, then I think it's acceptable to simply make a note of this in the documentation.

#11 Updated by David Juhasz about 1 year ago

I'm getting a Fatal Error when I try and do a CSV import via the web interface:

2018/05/23 21:06:54 [error] 1412#1412: *60 FastCGI sent in stderr: "PHP message: PHP Fatal error:  Uncaught Error: Class 'QubitMarkdown' not found in /usr/share/nginx/atom/lib/helper/QubitHelper.php:181
Stack trace:
#0 /usr/share/nginx/atom/apps/qubit/modules/informationobject/actions/indexAction.class.php(130): strip_markdown('Example fonds S...')
#1 /usr/share/nginx/atom/plugins/sfIsadPlugin/modules/sfIsadPlugin/actions/indexAction.class.php(32): InformationObjectIndexAction->execute(Object(sfWebRequest))
#2 /usr/share/nginx/atom/cache/qubit/prod/config/config_core_compile.yml.php(966): sfIsadPluginIndexAction->execute(Object(sfWebRequest))
#3 /usr/share/nginx/atom/cache/qubit/prod/config/config_core_compile.yml.php(961): sfExecutionFilter->executeAction(Object(sfIsadPluginIndexAction))
#4 /usr/share/nginx/atom/cache/qubit/prod/config/config_core_compile.yml.php(947): sfExecutionFilter->handleAction(Object(sfFilterChain), Object(sfIsadPluginIndexAction))
#5 /usr/share/nginx/atom/cache/qubit/prod/config/config_core_compile.yml.php(1045): sfExecutionFilter->execute(Object" 
while reading response header from upstream, client: 10.10.10.1, server: _, request: "GET /example-fonds-creation-and-accumulation-dates-all-fields HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.atom.sock:", host: "10.10.10.10", referrer: "http://10.10.10.10/informationobject/browse" 

I restarted nginx, php7.0-fpm, memcached and the atom-worker, and did a symfony cc, but the error is still occurring.

#12 Updated by David Juhasz about 1 year ago

Oops, ignore comment 11. I had two vagrant boxes running at the same time without knowing it, and the error I reported was from a week ago. I just retested with the right vagrant VM, and all is well.

#13 Updated by José Raddaoui Marín about 1 year ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from José Raddaoui Marín to Dan Gillean

#14 Updated by José Raddaoui Marín 11 months ago

A regression was found in the full-width tree-view title. It should already be fixed in qa/2.5.x.

#15 Updated by Dan Gillean 7 months ago

  • Related to Bug #12661: Custom hyperlink in text fields not displaying properly added

#16 Updated by Dan Gillean 4 months ago

  • File atom-markdown-test.csv added
  • Status changed from QA/Review to Feedback
  • Assignee changed from Dan Gillean to José Raddaoui Marín

Hey Radda,

I wonder if your fix for the full-width treeview title might have introduced a regression?

The CSV I'm attaching used to work as a demo of all supported Markdown fields, added in the scope and content.

When I test now, it breaks the full-width treeview - the treeview ends up being displayed in the body of the record!

I think we should ensure that no markdown a user adds breaks other parts of the page layout, if possible.

#17 Updated by José Raddaoui Marín 4 months ago

  • Status changed from Feedback to Code Review
  • Assignee changed from José Raddaoui Marín to Mike Cantelon

Good catch Dan!

It was more related to the FWTV JS code than to the Markdown feature. We need to be more careful from now on with our CSS and JS selectors to avoid picking elements created by the Markdown transformation, but it won't be easy to find them all. At least you found this one and it was an easy fix.

Ready for code review in: https://github.com/artefactual/atom/pull/859

#18 Updated by Mike Cantelon 4 months ago

  • Status changed from Code Review to Feedback
  • Assignee changed from Mike Cantelon to José Raddaoui Marín

Looks good to me! Thanks Radda!

#19 Updated by José Raddaoui Marín 4 months ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from José Raddaoui Marín to Dan Gillean

Thanks Mike, merged in qa/2.5.x.

#20 Updated by Dan Gillean 4 months ago

  • Status changed from QA/Review to Verified
  • Requires documentation deleted (Yes)
