Feature #8687
Improve ES analyzers to work better with diacritics
Status: | Verified | Start date: | 06/03/2015 |
---|---|---|---|
Priority: | Medium | Due date: | |
Assignee: | Dan Gillean | % Done: | 0% |
Category: | Search / Browse | | |
Target version: | Release 2.3.0 | | |
Google Code Legacy ID: | | Tested version: | |
Sponsored: | Yes | Requires documentation: | |
Description
1. Searches return different results depending on whether accents/diacritics are used.
For example, searches for "évaluation" and "evaluation", "hôtel" and "hotel", or "déjà" and "deja" do not return the same results.
Would it be possible to change the catalogue so that it ignores diacritical marks/accents when searching?
2. When we search for a word like "évaluation", the results do not include all the descriptions containing "l'évaluation" or "d'évaluation".
Would it be possible to correct this so that entering "évaluation" in the search box finds all the results (including descriptions with "l'évaluation" and "d'évaluation")?
This is a problem for all words beginning with a, e, i, o, u, or y, since in French they are often preceded by "l'" or "d'". The apostrophe (') should be treated as a separator between two words.
Proposed solutions:
- Implement the ASCII folding token filter for French to ignore diacritics in searches to address case #1
- Implement the Elision token filter to address case #2
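As a rough illustration of the two proposed filters, the sketch below builds an Elasticsearch-style analysis settings dict and emulates the effect of the filters in plain Python. `asciifolding` and `elision` are built-in Elasticsearch token filter types; the names `french_elision` and `french_folded`, the article list, and the helper functions are hypothetical and are not taken from the AtoM code. The emulation only approximates what Elasticsearch does.

```python
import re
import unicodedata

# Hypothetical analysis settings; filter and analyzer names are illustrative only.
ANALYSIS_SETTINGS = {
    "analysis": {
        "filter": {
            "french_elision": {
                "type": "elision",
                "articles": ["l", "d", "m", "t", "qu", "n", "s", "j"],
            },
        },
        "analyzer": {
            "french_folded": {
                "type": "custom",
                "tokenizer": "standard",
                # Lowercase first so the elision match does not depend on case.
                "filter": ["lowercase", "french_elision", "asciifolding"],
            },
        },
    },
}

ARTICLES = ANALYSIS_SETTINGS["analysis"]["filter"]["french_elision"]["articles"]


def strip_elision(token, articles=ARTICLES):
    """Drop a leading elided article such as l' or d' (rough emulation of elision)."""
    head, sep, tail = token.partition("'")
    if sep and tail and head in articles:
        return tail
    return token


def fold_diacritics(token):
    """Remove combining marks (rough emulation of asciifolding for accented letters)."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Tokenize, then apply lowercase, elision, and folding in that order."""
    # Compose characters and normalize the typographic apostrophe before tokenizing.
    text = unicodedata.normalize("NFC", text).replace("\u2019", "'")
    # Keep words joined by an apostrophe as a single token, e.g. l'évaluation.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
    return [fold_diacritics(strip_elision(tok.lower())) for tok in tokens]


if __name__ == "__main__":
    for phrase in ["L'évaluation", "evaluation", "d'hôtel déjà"]:
        print(phrase, "->", analyze(phrase))
```

With this chain, "l'évaluation", "évaluation", and "evaluation" all reduce to the same token, which is the behaviour requested in cases #1 and #2. The same analyzer would need to run at both index time and search time for queries and documents to match.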
Related issues
History
#2 Updated by José Raddaoui Marín almost 7 years ago
- File deleted (french_token_filters.png)
#3 Updated by José Raddaoui Marín almost 7 years ago
- File deleted (french_analyzer.png)
#4 Updated by José Raddaoui Marín almost 7 years ago
- File deleted (i18n_fr_fields_mapping.png)
#5 Updated by José Raddaoui Marín almost 7 years ago
- Status changed from New to Code Review
- Assignee changed from José Raddaoui Marín to Mike Gale
#6 Updated by Mike Gale almost 7 years ago
- Status changed from Code Review to Feedback
- Assignee changed from Mike Gale to José Raddaoui Marín
#7 Updated by Mike Gale almost 7 years ago
Looks good
#8 Updated by José Raddaoui Marín almost 7 years ago
- Status changed from Feedback to QA/Review
- Assignee changed from José Raddaoui Marín to Dan Gillean
Merged in qa/2.3.x
The search index needs to be rebuilt.
#9 Updated by José Raddaoui Marín almost 7 years ago
- Related to Bug #8676: Elasticsearch analyzers not working over 'multi_field' type fields added
#10 Updated by José Raddaoui Marín almost 7 years ago
I've added the asciifolding filter to the ES default analyzer so that it also works with non-i18n fields:
https://github.com/artefactual/atom/commit/e510528327112720ab7de4781d1bba9dc3ce0185
The search index needs to be rebuilt again.
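For readers unfamiliar with how a default analyzer is declared, here is a minimal sketch of index settings in which an analyzer named `default` includes `asciifolding`. Elasticsearch applies an analyzer registered under the name `default` to text fields that do not specify their own. The tokenizer and filter list shown here are assumptions for illustration; the linked commit is the authoritative source for what AtoM actually configures.

```python
import json

# Assumed shape of the settings; not copied from the AtoM commit.
DEFAULT_ANALYZER_SETTINGS = {
    "analysis": {
        "analyzer": {
            "default": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding"],
            },
        },
    },
}

if __name__ == "__main__":
    # Print the JSON body that would be supplied as index settings.
    print(json.dumps(DEFAULT_ANALYZER_SETTINGS, indent=2))
```

Because analyzers are applied when documents are indexed, a change like this only affects existing records after the index has been rebuilt.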
#11 Updated by Dan Gillean over 6 years ago
- Status changed from QA/Review to Verified
- Sponsored changed from No to Yes
- Requires documentation deleted (Yes)