Add Markdown support to AtoM
|Assignee:||Dan Gillean||% Done:|
|Category:||CSS/HTML||Estimated time:||44.00 hours|
|Target version:||Release 2.5.0|
|Google Code Legacy ID:||Tested version:|
This feature will allow AtoM users to apply custom formatting, such as bold, italics, styled hyperlinks, lists, and more, to user-editable content (such as descriptions, authority records, and other AtoM entities). It uses the Parsedown (http://parsedown.org/) PHP Markdown library to render the formatting in AtoM resources.
Parsedown makes use of GitHub flavored markdown syntax - more information on formatting can be found here:
(Note that some aspects of GitHub-flavored markdown are GitHub specific and may not work in AtoM. A full list of supported elements and examples will be added to the documentation when this feature is completed.)
Where previous support has been added for alternative styling methods (such as the custom linking syntax added in issue #8410), these will be removed and will no longer work going forward. Users will be provided with a command-line task to update existing syntax to the new supported format - for more information on this task, see issue #12149. Additionally, as per #7647, raw HTML will continue to be escaped in AtoM, so HTML strong, emphasis, etc. elements will not work, though the work in #12149 will update the HTML scrub task to convert compatible HTML elements to the new parsedown syntax.
Finally, a new setting will be added to Admin > Settings > Global that will allow an administrator to control whether or not markdown rendering is enabled. When disabled, any parsedown formatting added to AtoM will simply display as raw text content.
- Integrate Parsedown into AtoM.
- Add administrator setting to allow enabling or disabling markdown rendering functionality.
- When markdown rendering is enabled, convert markdown to HTML for display in web user interface.
- Ignore markdown characters in search queries.
- Remove existing custom markdown code for lists and links, to avoid conflicts with Parsedown.
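To make the intended behavior of the setting concrete, here is a minimal sketch in Python. It is not Parsedown and not AtoM code: the function name `render_field` and the two toy regex rules (bold and italics only) are assumptions standing in for the real library. It only illustrates the two rules described above: raw HTML is always escaped, and Markdown is converted only when the setting is on.

```python
import html
import re


def render_field(text: str, markdown_enabled: bool) -> str:
    """Return HTML for display (hypothetical helper, not an AtoM function)."""
    # Raw HTML is escaped regardless of the Markdown setting.
    escaped = html.escape(text, quote=False)
    if not markdown_enabled:
        # Setting disabled: Markdown characters are shown as plain text.
        return escaped
    # Toy stand-in for a Markdown library: bold first, then italics.
    out = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
    out = re.sub(r"\*(.+?)\*", r"<em>\1</em>", out)
    return out


if __name__ == "__main__":
    sample = "**bold** and <b>raw</b>"
    print(render_field(sample, True))
    print(render_field(sample, False))
```

With the setting on, the `**bold**` span becomes a `strong` element while the literal `<b>` tag stays escaped; with the setting off, the asterisks are displayed as typed.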
#4 Updated by José Raddaoui Marín 11 months ago
Some notes for testing/documenting:
The scope and content field in the search results was being truncated after 250 characters, which caused rendering issues when the cut fell in the middle of rendered Markdown. To avoid this, I've used the same expandable behavior we have in the index pages. I hope that's okay, because the only alternatives are to always display the entire value or to remove the Markdown syntax from the field on render.
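One way a fixed-length cut can go wrong is sketched below in Python. The helper names and the sample text are hypothetical, and the check only looks at `**` markers, so it is an illustration of the failure mode rather than the actual AtoM truncation code.

```python
def truncate(text: str, limit: int = 250) -> str:
    """Cut text at a fixed length, with no awareness of Markdown."""
    return text if len(text) <= limit else text[:limit]


def has_unbalanced_bold(text: str) -> bool:
    """True when a '**' marker was opened but never closed."""
    return text.count("**") % 2 == 1


if __name__ == "__main__":
    # 240 filler characters, then a bold span that straddles the 250 limit.
    value = "x" * 240 + " **important note** end"
    print(has_unbalanced_bold(value))            # full value
    print(has_unbalanced_bold(truncate(value)))  # cut value
```

The full value has matched markers, while the cut value ends inside the bold span, which is the kind of dangling markup that an expandable field avoids.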
The data is being stored with Markdown in the ES index but, when the Markdown setting is enabled, the following punctuation chars are converted to spaces when the values are analyzed, to avoid issues when searching over Markdown data like a description with the following title: "__test description__". For the same reason, in autocompletes using the ORM (all except the IO autocomplete) the search will be made with two wildcards ("*query*") instead of only one like before ("query*"), only when the Markdown setting is enabled.
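The effect of both changes can be sketched in Python as follows. The character set in `MARKDOWN_PUNCTUATION` is an assumption (the exact list used by AtoM is not reproduced here), the function names are hypothetical, and `fnmatch` is used only to mimic the `*` wildcard notation above, not the ORM query itself.

```python
import fnmatch
import re

# Assumed set of Markdown punctuation; the real list may differ.
MARKDOWN_PUNCTUATION = re.compile(r"[*_`#~]")


def analyze(value: str, markdown_enabled: bool) -> list[str]:
    """Rough imitation of analysis: optional char replacement, then tokenizing."""
    if markdown_enabled:
        value = MARKDOWN_PUNCTUATION.sub(" ", value)
    return value.lower().split()


def autocomplete_pattern(query: str, markdown_enabled: bool) -> str:
    """Two wildcards when Markdown is enabled, one trailing wildcard otherwise."""
    return f"*{query}*" if markdown_enabled else f"{query}*"


if __name__ == "__main__":
    title = "__test description__"
    print(analyze(title, True))
    print(analyze(title, False))
    # A prefix-only pattern misses a title that starts with Markdown characters.
    print(fnmatch.fnmatchcase(title, autocomplete_pattern("test", False)))
    print(fnmatch.fnmatchcase(title, autocomplete_pattern("test", True)))
```

Without the replacement, the tokens keep their underscores, so a search for "test" would not match; similarly, a prefix-only autocomplete pattern cannot match a stored value that begins with Markdown characters.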
I'll add a couple of examples that are not present in the following link, but those tests are a good start to know what is possible to do with Markdown syntax after these changes:
#6 Updated by David Juhasz 10 months ago
- Status changed from New to Feedback
Hi Radda,
While converting punctuation chars to spaces in search is probably fine for most AtoM fields (especially the ones that tend to have a lot of content, like "Scope & Content"), there are some fields where I think this will be problematic, especially:
- Alternate identifiers
I'm not sure if Dan can think of any other ones? Maybe physical location numbers? Accession numbers? Repository ids? Dates?
Can we be selective about which ES fields convert punctuation?
#7 Updated by Dan Gillean 10 months ago
Interestingly, I added italics/emphasis to an identifier, but was still able to search and find matches on it, even when searching in quotations (e.g. exact search). I can test further for this, but I was pleasantly surprised not to find any initial issues with this, in regards to search. I didn't try a reference code search where only one part of the ref code was in italics or bolded or anything - I'll do that on the next round of QA to see how it affects things.
#8 Updated by José Raddaoui Marín 10 months ago
That could be done, but it will require some work, especially if we want to do it in some i18n fields. The char filter that removes punctuation when the Markdown setting is on is being added to all analyzers, including the default one; therefore it's being used to analyze the query string too.
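As a rough illustration of what "added to all analyzers" means, the Python sketch below builds an Elasticsearch `analysis` settings dictionary. The `pattern_replace` char filter type is a standard Elasticsearch feature, but the filter name `strip_md`, the analyzer names, and the character pattern are assumptions, not the values used in AtoM.

```python
import copy

CHAR_FILTER_NAME = "strip_md"  # hypothetical name


def with_markdown_filter(analysis: dict, enabled: bool) -> dict:
    """Return a copy of the analysis settings with the char filter on every analyzer."""
    result = copy.deepcopy(analysis)
    if not enabled:
        return result
    result.setdefault("char_filter", {})[CHAR_FILTER_NAME] = {
        "type": "pattern_replace",
        "pattern": "[*_`#~]",  # assumed character set
        "replacement": " ",
    }
    # Attach the filter to every analyzer, including "default", which is
    # also the one applied to the query string.
    for analyzer in result.get("analyzer", {}).values():
        filters = analyzer.setdefault("char_filter", [])
        if CHAR_FILTER_NAME not in filters:
            filters.append(CHAR_FILTER_NAME)
    return result


if __name__ == "__main__":
    base = {
        "analyzer": {
            "default": {"type": "custom", "tokenizer": "standard", "filter": ["lowercase"]},
            "i18n_en": {"type": "custom", "tokenizer": "standard", "filter": ["lowercase"]},
        }
    }
    print(with_markdown_filter(base, True))
```

Being selective per field would mean leaving the filter off some analyzers and mapping the affected fields to those, which is where the extra work for i18n fields would come from.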
#9 Updated by David Juhasz 10 months ago
Radda, let's see how Dan's testing goes with regards to the punctuation filter. If searching for reference codes and identifiers works fine with the markdown character filter on, then I can't see any other problems with removing those chars from the search text.
#10 Updated by Dan Gillean 10 months ago
I had great success with searches for reference codes.
I modified various parent identifiers with different markdown - inline escaping in one, bold in another, italics in yet another - and was still able to get matching search results. The results were both in general global searches, and when putting the reference code in quotations and limiting the match to Reference code.
One thing I didn't test, which I'll check out in the next round when Radda merges the feature into qa/2.5.x for further testing:
Turning on the permissive slug generation settings, and adding markdown to a title. I suspect it will work fine, but might lead to an unexpected slug. However, users can always edit the slug manually. If it doesn't break anything, then I think it's acceptable to simply make a note of this in the documentation.
#11 Updated by David Juhasz 10 months ago
I'm getting a Fatal Error when I try and do a CSV import via the web interface:
2018/05/23 21:06:54 [error] 1412#1412: *60 FastCGI sent in stderr: "PHP message: PHP Fatal error: Uncaught Error: Class 'QubitMarkdown' not found in /usr/share/nginx/atom/lib/helper/QubitHelper.php:181 Stack trace: #0 /usr/share/nginx/atom/apps/qubit/modules/informationobject/actions/indexAction.class.php(130): strip_markdown('Example fonds S...') #1 /usr/share/nginx/atom/plugins/sfIsadPlugin/modules/sfIsadPlugin/actions/indexAction.class.php(32): InformationObjectIndexAction->execute(Object(sfWebRequest)) #2 /usr/share/nginx/atom/cache/qubit/prod/config/config_core_compile.yml.php(966): sfIsadPluginIndexAction->execute(Object(sfWebRequest)) #3 /usr/share/nginx/atom/cache/qubit/prod/config/config_core_compile.yml.php(961): sfExecutionFilter->executeAction(Object(sfIsadPluginIndexAction)) #4 /usr/share/nginx/atom/cache/qubit/prod/config/config_core_compile.yml.php(947): sfExecutionFilter->handleAction(Object(sfFilterChain), Object(sfIsadPluginIndexAction)) #5 /usr/share/nginx/atom/cache/qubit/prod/config/config_core_compile.yml.php(1045): sfExecutionFilter->execute(Object" while reading response header from upstream, client: 10.10.10.1, server: _, request: "GET /example-fonds-creation-and-accumulation-dates-all-fields HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.atom.sock:", host: "10.10.10.10", referrer: "http://10.10.10.10/informationobject/browse"
I restarted nginx, php7.0-fpm, memcached and the atom-worker, and did a symfony cc, but the error is still occurring.
#16 Updated by Dan Gillean 1 day ago
- File atom-markdown-test.csv added
- Status changed from QA/Review to Feedback
- Assignee changed from Dan Gillean to José Raddaoui Marín
I wonder if your fix for the full-width treeview title might have introduced a regression?
The CSV I'm attaching used to work as a demo of all supported Markdown fields, added in the scope and content.
When I test now, it breaks the full-width treeview - the treeview ends up being displayed in the body of the record!
I think we should ensure that no markdown a user adds breaks other parts of the page layout, if possible.
#17 Updated by José Raddaoui Marín about 17 hours ago
- Status changed from Feedback to Code Review
- Assignee changed from José Raddaoui Marín to Mike Cantelon
Good catch Dan!
It was more related to the FWTV JS code than to the Markdown feature. We need to be more careful from now on with our CSS and JS selectors to avoid picking elements created by the Markdown transformation, but it won't be easy to find them all. At least you found this one and it was an easy fix.
Ready for code review in: https://github.com/artefactual/atom/pull/859