Feature #1889

Add underscores and periods as tokenizers to value list in ElasticSearch

Added by Evelyn McLellan over 8 years ago. Updated over 4 years ago.

Status: New
Start date:
Priority: Low
Due date:
Assignee: Mike Cantelon
% Done: 0%
Category: Index/Search
Target version: -
Google Code Legacy ID: archivematica-1234
Pull Request:
Sponsored: No
Requires documentation:

Description

History

#1 Updated by Mike Cantelon over 8 years ago

Yeah, mentioning to users that they can do wildcard searches might be good rather than changing the tokenizer.

If a user enters "*master*" or "Members_Master2009.xls" now they'd end up seeing the entry for Members_Master2009.xls. If we change the tokenizer so it splits words up by underscores, etc., we might lose the ability to search for "Members_Master2009.xls" precisely.

I've posted a question online to see if anyone has tokenizer advice on how we could tokenize "Members_Master_2009.xls" into "members", "master", and "Members_Master_2009.xls".
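To make the desired behaviour concrete, here is a minimal Python sketch of the tokenization being asked for: split on underscores and periods, lowercase the parts, and also keep the original string so exact-match searches still work. This is not Elasticsearch code, and `tokenize_filename` is a hypothetical name used only for illustration.

```python
import re


def tokenize_filename(name):
    """Return the original name followed by its lowercased parts.

    Parts are produced by splitting on underscores and periods.
    Hypothetical helper for illustration; not part of Elasticsearch.
    """
    parts = [p.lower() for p in re.split(r"[_.]+", name) if p]
    tokens = [name]  # keep the whole name so exact searches still match
    for part in parts:
        if part not in tokens:
            tokens.append(part)
    return tokens


print(tokenize_filename("Members_Master_2009.xls"))
# ['Members_Master_2009.xls', 'members', 'master', '2009', 'xls']
```

Note that the sketch also emits "2009" and "xls" as tokens; whether those are wanted in the index is a separate decision.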

#2 Updated by Mike Cantelon over 8 years ago

  • Sponsored set to No

Bump to 1.1?

#3 Updated by Mike Cantelon over 8 years ago

  • Target version changed from Release 0.10-beta to Release 1.0.0

#4 Updated by Evelyn McLellan about 8 years ago

  • Category set to Index/Search

#5 Updated by Mike Cantelon almost 8 years ago

Updating analysis then testing...

# Close the index; analysis settings can only be changed on a closed index.
curl -XPOST 'http://192.168.1.70:9200/aips/_close'
# Define the default analyzer and the custom filter it uses.
# "filter" sits next to "analyzer" under "analysis", not inside it.
curl -XPUT 'http://192.168.1.70:9200/aips/_settings' -d '{
  "index": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": ["preserve_hyphens_filter", "lowercase", "stop"]
        }
      },
      "filter": {
        "preserve_hyphens_filter": {
          "type": "word_delimiter",
          "generate_word_parts": false,
          "catenate_words": true
        }
      }
    }
  }
}'
# Reopen the index, then check how a sample string is analyzed.
curl -XPOST 'http://192.168.1.70:9200/aips/_open'
curl -XGET 'http://192.168.1.70:9200/aips/_analyze?field=msg&pretty=1' -d "Run to the hills and rock-around-town."
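Hand-written JSON like the settings body above is easy to mis-nest. As a rough sketch, the same body can be built as a Python dict and checked before it is sent. It assumes custom filters are defined under "analysis" alongside "analyzer". The helper names `build_analysis_settings` and `undefined_filters` are hypothetical, and the built-in filter list only covers the two built-ins used in this ticket.

```python
import json


def build_analysis_settings():
    """Build the index settings body as a dict (hypothetical helper)."""
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "default": {
                        "tokenizer": "standard",
                        "filter": ["preserve_hyphens_filter", "lowercase", "stop"],
                    }
                },
                # Custom filters are siblings of "analyzer", not children of it.
                "filter": {
                    "preserve_hyphens_filter": {
                        "type": "word_delimiter",
                        "generate_word_parts": False,
                        "catenate_words": True,
                    }
                },
            }
        }
    }


def undefined_filters(settings, builtin=("lowercase", "stop")):
    """List filters an analyzer references that are neither defined nor built in."""
    analysis = settings["index"]["analysis"]
    defined = set(analysis.get("filter", {}))
    missing = set()
    for analyzer in analysis.get("analyzer", {}).values():
        for name in analyzer.get("filter", []):
            if name not in defined and name not in builtin:
                missing.add(name)
    return sorted(missing)


body = json.dumps(build_analysis_settings(), indent=2)
print(body)  # JSON text that could be passed to curl with -d
print(undefined_filters(build_analysis_settings()))
```

An empty list from `undefined_filters` means every custom filter named by an analyzer has a definition in the expected place.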

#6 Updated by Courtney Mumma over 7 years ago

  • Target version changed from Release 1.0.0 to Release 1.1.0

#7 Updated by Justin Simpson about 7 years ago

  • Target version deleted (Release 1.1.0)

We do not have advanced search functionality on the Archivematica roadmap at present. We would like to provide a better search interface at some point, so I am moving this ticket out of any targeted version queues until requirements for the feature can be generated.

#8 Updated by Justin Simpson over 4 years ago

  • Priority changed from High to Low
