Bug #12186

Searching for titles with stop words will not give results

Added by Dan Gillean 6 months ago.

Status: New
Start date: 05/03/2018
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Google Code Legacy ID:
Tested version: 2.4
Sponsored: No
Requires documentation:

Description

  • Navigate to the AtoM demo site
  • Search for department of medical imaging - 0 results
  • Now search for department medical imaging - 1 result

Error encountered

No results found for a search on department of medical imaging, even though there is a matching result.

Expected result

1 matching record returned when search is: department of medical imaging

Analysis

I believe that 2 factors in ES are causing this issue - the fact that we use the English stopwords list, and that our default Boolean operator is AND in 2.4 and later. In ES, default English stopwords are as follows:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with

This means that "of" is being stripped from the index. However, our default Boolean operator is now AND in 2.4 or later, meaning that a search for department of medical imaging is in fact:

department AND of AND medical AND imaging

But with the stopword removed from the index, no record can satisfy every term in the query, so AtoM doesn't find a match.
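
To make the suspected interaction concrete, here is a small pure-Python model of it. It does not call Elasticsearch and is only a sketch of the hypothesis above: the function names (index_tokens, and_search) and the analyze_query flag are made up for illustration, and the assumption that stopwords are stripped at index time but not from the query is exactly the part that still needs to be verified against our index settings.

```python
# Default English stopwords in ES, as listed in this ticket.
ENGLISH_STOPWORDS = set(
    "a an and are as at be but by for if in into is it no not of on or "
    "such that the their then there these they this to was will with".split()
)


def index_tokens(text):
    """Hypothetical index-time analysis: lowercase, split on whitespace, drop stopwords."""
    return {t for t in text.lower().split() if t not in ENGLISH_STOPWORDS}


def and_search(query, indexed, analyze_query=False):
    """Model of an AND query: every query term must be present in the indexed tokens.

    analyze_query=True is an assumed alternative in which the query goes
    through the same stopword filter as the indexed text.
    """
    terms = query.lower().split()
    if analyze_query:
        terms = [t for t in terms if t not in ENGLISH_STOPWORDS]
    return all(t in indexed for t in terms)


indexed = index_tokens("Department of Medical Imaging")

# "of" was never indexed, so the AND over all four terms fails.
print(and_search("department of medical imaging", indexed))  # False
# Without the stopword, all three terms match.
print(and_search("department medical imaging", indexed))  # True
# If the query were filtered the same way as the index, the match returns.
print(and_search("department of medical imaging", indexed, analyze_query=True))  # True
```

If this model is right, the third case suggests the first thing to check is whether the search query is being run through the same analyzer as the indexed field.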

You would think this issue would come up for many ES users, but interestingly, I couldn't find many other examples of users reporting issues like this - so it's possible that there's something about how we've implemented our index settings that is incorrect.

More on stopwords:

There are alternatives to stopwords we could investigate as well - like using Common terms instead. See for example:

There is also the common_grams token filter:

And likely other configuration options.
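
As a rough illustration of the common_grams idea mentioned above, the sketch below imitates in plain Python what that token filter does in its default (index) mode: it keeps every single token and additionally emits a joined bigram wherever a common word is involved, so a phrase containing "of" stays findable without dropping the word. The function name common_grams and the one-word common list are assumptions for the example, and this is not the ES implementation itself.

```python
def common_grams(tokens, common_words):
    """Emit each token, plus a "left_right" bigram when either side is a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")
    return out


tokens = "department of medical imaging".split()
print(common_grams(tokens, {"of"}))
# ['department', 'department_of', 'of', 'of_medical', 'medical', 'imaging']
```

Whether this would fit AtoM's existing analyzers and query building is an open question and would need testing.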

I think we need to first investigate if this is something we can adjust to work better while using the various stopwords lists. Barring that, we may have to investigate some of these alternative methods - which may require development support.
