Feature #11761

Update slug generation rules to include a more permissive option

Added by Dan Gillean over 1 year ago. Updated 24 days ago.

Status: Verified
Start date: 12/01/2017
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: Routing
Estimated time: 80.00 hours
Target version: Release 2.5.0
Google Code Legacy ID:
Tested version:
Sponsored: Yes
Requires documentation: No

Description

Currently, AtoM slugs only allow lowercase letters, digits, and dashes: when a new slug is generated from the title of a resource, all uppercase characters are lowercased and special characters are sanitized. This is described further in the AtoM documentation:

Generated slugs will only allow digits, letters, and dashes. Sequences of unaccepted characters (e.g. accented or special characters, etc.) are replaced with valid characters such as English alphabet equivalents or dashes. This conforms to general practice around slug creation - for example, it is “common practice to make the slug all lowercase, accented characters are usually replaced by letters from the English alphabet, punctuation marks are generally removed, and long page titles should also be truncated to keep the final URL to a reasonable length” (Wikipedia). In AtoM, slugs are truncated to a maximum of 250 characters.
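To illustrate the current rules quoted above, here is a minimal Python sketch. AtoM itself is written in PHP; `legacy_slugify` is a hypothetical name, and transliterating via Unicode NFKD decomposition followed by dropping non-ASCII characters is an assumption that may differ from what AtoM's slugify() actually does.

```python
import re
import unicodedata

MAX_SLUG_LENGTH = 250  # AtoM truncates slugs to 250 characters


def legacy_slugify(title: str) -> str:
    """Approximate the current rules: lowercase ASCII letters, digits, and dashes only."""
    # Assumed transliteration: decompose accented characters ("é" -> "e" + accent)
    # and drop everything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    lowered = ascii_text.lower()
    # Replace each run of unaccepted characters with a single dash.
    dashed = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Trim edge dashes, truncate, then trim any dash left at the cut point.
    slug = dashed.strip("-")[:MAX_SLUG_LENGTH]
    return slug.rstrip("-")
```

For example, `legacy_slugify("Fonds Élise Dupré: Letters, 1890-1910")` yields `fonds-elise-dupre-letters-1890-1910`: the accents, capitals, colon, and comma are all lost.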

However, these rules are more restrictive than necessary to generate valid URI path segments. For more information, see: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
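For comparison, RFC 3986 (section 3.3) defines a path segment as a sequence of `pchar`: unreserved characters (letters, digits, `-._~`), sub-delimiters (`!$&'()*+,;=`), `:` and `@`, plus percent-encoded octets. The sketch below checks whether a string can appear unencoded in a segment. The function names are hypothetical, percent-encoded triplets are deliberately not accepted, and the non-ASCII handling is a simplification of the IRI `ucschar` ranges in RFC 3987.

```python
import string

# RFC 3986 pchar, excluding percent-encoded triplets.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | {":", "@"}


def is_valid_segment_char(ch: str, allow_non_ascii: bool = True) -> bool:
    """Return True if `ch` may appear unencoded in a URI/IRI path segment."""
    code = ord(ch)
    if code < 0x80:
        return ch in PCHAR
    # Simplification: accept everything from U+00A0 upward when IRIs are allowed,
    # approximating RFC 3987 `ucschar` (which also excludes some ranges).
    return allow_non_ascii and code >= 0xA0


def is_valid_path_segment(segment: str, allow_non_ascii: bool = True) -> bool:
    """Return True if every character of `segment` is a valid segment character."""
    return all(is_valid_segment_char(ch, allow_non_ascii) for ch in segment)
```

The legacy alphabet (lowercase letters, digits, dashes) is a strict subset of what this check accepts; note in particular that `:` passes, while `/`, `?`, `#`, `{`, `}`, and spaces do not.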

This enhancement will add a new setting in Admin > Settings > Global that allows users to select more permissive rules for slug generation throughout the application. The default setting will preserve AtoM's current behavior for legacy users. When enabled, this setting will allow any valid URI path segment character to appear in a slug, including UTF-8 glyphs for special characters. Restricted IRI characters (/?#{}) and literal spaces will be replaced with dashes ('-'), as they are currently.
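A minimal sketch of the proposed permissive rules, under stated assumptions: `permissive_slugify` is hypothetical; every ASCII character outside RFC 3986 `pchar` is treated as restricted (this covers `/?#{}` and spaces, and also characters such as `<`, `>`, `%`, and `|` that are not valid unencoded either); runs of restricted characters collapse to a single dash and edge dashes are trimmed, mirroring the legacy rules; case and non-ASCII characters are preserved.

```python
import string

MAX_SLUG_LENGTH = 250
# RFC 3986 pchar characters (percent-encoded triplets excluded).
PCHAR = set(string.ascii_letters + string.digits + "-._~" + "!$&'()*+,;=" + ":@")


def _allowed(ch: str) -> bool:
    """Assumed rule: keep pchar ASCII and non-ASCII glyphs, reject whitespace."""
    if ch.isspace():  # literal spaces, tabs, NBSP, etc.
        return False
    if ord(ch) < 0x80:
        return ch in PCHAR
    return ord(ch) >= 0xA0  # simplification of RFC 3987 ucschar


def permissive_slugify(title: str) -> str:
    """Sketch of the permissive rules: case-preserving, UTF-8 friendly."""
    out = []
    for ch in title:
        if _allowed(ch):
            out.append(ch)
        elif not out or out[-1] != "-":
            # Collapse a run of restricted characters into one dash.
            out.append("-")
    slug = "".join(out).strip("-")[:MAX_SLUG_LENGTH]
    return slug.rstrip("-")
```

For example, `permissive_slugify("Fonds Élise Dupré: Letters/Diaries #2")` returns `Fonds-Élise-Dupré:-Letters-Diaries-2`, keeping the capitals, accents, and colon that the legacy rules discard.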

Development Notes:

- Affects all object types, but not computed random slugs (e.g. those for terms or users).
- Slugs are currently restricted to a small subset of valid characters in slugify().
- The colon ":" is important for the sponsoring client and must be included in the allowed characters.
- Valid URI path segment characters should NOT be percent-encoded.
- Existing logic ensures that slugs are not duplicated; this must be maintained.
- Uppercase characters are currently forced to lowercase in slugify(). When the setting is enabled, this will change so that capitalization is factored into the uniqueness of a slug.
- The slug-generation CLI task must check and respect the new setting.
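Because the permissive setting preserves case, the duplicate check has to compare slugs case-sensitively; otherwise `Letters` and `letters` would still collide. Below is a hypothetical sketch of suffix-based deduplication that keeps the result within the 250-character limit. The `make_unique` name and the `-N` suffix scheme are assumptions, not necessarily what AtoM does.

```python
MAX_SLUG_LENGTH = 250


def make_unique(slug: str, existing: set, case_sensitive: bool = True) -> str:
    """Append "-1", "-2", ... until `slug` no longer collides with `existing`."""

    def key(s: str) -> str:
        # Legacy behaviour effectively ignores case; the permissive setting must not.
        return s if case_sensitive else s.lower()

    taken = {key(s) for s in existing}
    candidate = slug
    n = 1
    while key(candidate) in taken:
        suffix = f"-{n}"
        # Truncate the base so base + suffix stays within the length limit.
        candidate = slug[: MAX_SLUG_LENGTH - len(suffix)] + suffix
        n += 1
    return candidate
```

Comparing in application code alone may not be enough: the database lookup would also need to be case-sensitive, and MySQL's default collations, for instance, compare strings case-insensitively.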

Development tasks

  • Add code to allow any valid URI path segment character to appear in a slug
    • Replace reserved characters and literal spaces with a dash character
  • Make slugs case-sensitive in URLs
  • Add configuration in Admin > Settings > Global to allow a choice between using the old and new slug generation logic
    • Add a migration to ensure that the default setting uses the old/current slug generation rules for upgrading users
    • Present a warning when the setting is changed about the need to update the Nginx configuration
  • Update AtoM's propel:generate-slugs CLI task to check and respect the new setting
  • Add unit tests to validate slug generation output

History

#2 Updated by Dan Gillean about 1 month ago

  • Status changed from New to Verified

#3 Updated by Dan Gillean 24 days ago

  • Requires documentation changed from Yes to No
