Bug #12186

Searching for titles with stop words will not give results

Added by Dan Gillean 11 months ago. Updated 11 days ago.

Status: New
Start date: 05/03/2018
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Google Code Legacy ID:
Tested version: 2.4
Sponsored: No
Requires documentation:

Description

  • Navigate to the AtoM demo site
  • Search for department of medical imaging - 0 results
  • Now search for department medical imaging - 1 result

Error encountered

No results found for a search on department of medical imaging, even though there is a matching result.

Expected result

1 matching record returned when search is: department of medical imaging

Analysis

I believe that 2 factors in ES are causing this issue - the fact that we use the English stopwords list, and that our default Boolean operator is AND in 2.4 and later. In ES, default English stopwords are as follows:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with

This means that "of" is being stripped from the index. However, our default Boolean operator is now AND in 2.4 or later, meaning that a search for department of medical imaging is in fact:

department AND of AND medical AND imaging

However, with the stopword removed from the index, no record contains the term "of", so AtoM doesn't find a record that satisfies all four required terms.
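To illustrate the suspected interaction, here is a minimal toy model in Python. It is not Elasticsearch code and makes no claim about how ES or AtoM actually process the query: the record title and the helper names index_terms and and_search are assumptions made up for this sketch. It only shows why an AND search fails when stopwords are dropped from the index but kept in the query.

```python
# Toy model of the hypothesis above; not Elasticsearch internals.
# Stopword list copied from the analysis in this ticket.
ES_ENGLISH_STOPWORDS = set(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def index_terms(title):
    """Index-time analysis: lowercase, split on whitespace, drop stopwords."""
    return {t for t in title.lower().split() if t not in ES_ENGLISH_STOPWORDS}


def and_search(query, indexed):
    """Default operator AND: every query term must be present in the index.

    Stopwords are deliberately NOT removed from the query here, which is
    the behaviour this ticket suspects.
    """
    return all(term in indexed for term in query.lower().split())


# Hypothetical record title, assumed for illustration only.
indexed = index_terms("Department of Medical Imaging")

print(and_search("department of medical imaging", indexed))  # False: "of" is not in the index
print(and_search("department medical imaging", indexed))     # True
```

If this model is right, any query containing a stopword would return 0 results under the AND operator, which matches the steps to reproduce.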

You would think this issue would come up for many ES users, but interestingly, I couldn't find many other examples of users reporting issues like this - so it's possible that there's something about how we've implemented our index settings that is incorrect.

More on stopwords:

There are alternatives to stopwords we could investigate as well - like using Common terms instead. See for example:

There is also the common_grams token filter:

And likely other configuration options.
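As a rough picture of what a common_grams style filter does, the Python sketch below keeps every token and also emits a joined bigram wherever a common word sits next to another token. This is a simplified approximation of the idea, not Lucene's or Elasticsearch's implementation, and the function name common_grams_sketch is made up for this example.

```python
def common_grams_sketch(tokens, common_words):
    """Emit each token, plus "left_right" bigrams around common words.

    Simplified approximation of a common-grams filter: the stopword is
    kept in the token stream instead of being discarded.
    """
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            # Join the pair only when at least one side is a common word.
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")
    return out


tokens = "department of medical imaging".split()
print(common_grams_sketch(tokens, {"of"}))
# ['department', 'department_of', 'of', 'of_medical', 'medical', 'imaging']
```

The point for this ticket is that "of" stays in the index, so an AND query containing it could still match, at the cost of a larger index.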

I think we need to first investigate if this is something we can adjust to work better while using the various stopwords lists. Barring that, we may have to investigate some of these alternative methods - which may require development support.


Related issues

Related to AtoM Wishlist - Feature #12862: Make search queries ignore stop words instead of killing ... New 03/05/2019

History

#1 Updated by Dan Gillean 11 days ago

Sounds like it should be possible to configure ES to remove stopwords from user queries as well, and not just from the index - this would lead to better search results without having to overhaul how we implement our index. See for example:
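A minimal toy model of that idea in Python (not actual ES configuration; the record title and the names analyze and and_search_analyzed are assumptions for this sketch): if the query goes through the same stopword removal as the index, the AND search only requires the terms that can actually be in the index.

```python
# Toy model only; how to configure this in ES is not shown here.
# Stopword list copied from the analysis in this ticket.
ES_ENGLISH_STOPWORDS = set(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def analyze(text):
    """Shared analysis step: lowercase, split on whitespace, drop stopwords."""
    return [t for t in text.lower().split() if t not in ES_ENGLISH_STOPWORDS]


def and_search_analyzed(query, indexed):
    """AND search where the query is analyzed the same way as the index."""
    terms = analyze(query)
    # Guard: a query made only of stopwords matches nothing in this sketch.
    return bool(terms) and all(term in indexed for term in terms)


# Hypothetical record title, assumed for illustration only.
indexed = set(analyze("Department of Medical Imaging"))

print(and_search_analyzed("department of medical imaging", indexed))  # True
print(and_search_analyzed("department of radiology", indexed))        # False
```

How a stopword-only query should behave is a design choice; returning no match here is just one option.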

#2 Updated by Dan Gillean 9 days ago

  • Related to Feature #12862: Make search queries ignore stop words instead of killing the query added
