Bug #5944

general search returns all descriptions from a repository when search string matches a string in the repository record

Added by Tim Hutchinson over 6 years ago. Updated almost 4 years ago.

Status: Verified
Start date: 11/12/2013
Priority: High
Due date:
Assignee: Dan Gillean
% Done: 0%
Category: Search / Browse
Target version: Release 2.4.0
Google Code Legacy ID:
Tested version: 2.0.0, 2.0.1
Sponsored: Yes
Requires documentation:

Description

I came across this on our test site while searching for a photographer's name (Dommasch): a search for dommasch returned all the records from U of S. A virtual exhibit relating to the Dommasch collection is mentioned in the Finding aids, guides and publications field. The same goes for Saskatchewan Wheat Pool, so "wheat" gives you all records. Same thing for "Saskatoon", which seems to be coming from the contact area. You can see an example of this in the AC beta by searching for Toronto within York University Archives & Special Collections.

To reproduce:
  • start with a reasonably small data set (i.e. so that re-indexing is quick)
  • find a repository record with at least one of the description area fields populated.
  • look for a word not likely to appear in every description from that repository. E.g. "email" for Regional Municipality of Waterloo, on the AC beta site
  • do a search for that word - if necessary, limited to the repository in question
  • all the descriptions from that repository will be in the result set
  • now edit the repository record; remove the word in question
  • rebuild the search index
  • try the search again - this time only the relevant records (perhaps none) will be returned

This applies to the general search box but also the advanced search, using "any field"


Related issues

Related to Access to Memory (AtoM) - Bug #8128: Actor histories return archival description results durin... Verified 03/23/2015

History

#1 Updated by Dan Gillean over 6 years ago

  • Subject changed from general search returning results from repository records to general search returns all descriptions from a repository when search string matches a string in the repository record
  • Category set to Search / Browse
  • Assignee set to Jesús García Crespo
  • Priority changed from Medium to High
  • Target version set to Release 2.0.2

Wow, this is a weird bug. Thanks for catching it Tim. It took me a bit to understand what the issue was (it takes a strange set of circumstances to be able to reproduce), but I have managed to recreate it. I've updated the title of the issue ticket to try to better express my understanding of what's going on - feel free to edit and improve if you think I've missed the issue.

#2 Updated by Tim Hutchinson over 6 years ago

Thanks Dan, I think that captures it better.

I think this will particularly come up if institutions use the ISDIAH record to summarize collection strengths. As well as the city name.

#3 Updated by Dan Gillean almost 6 years ago

  • Priority changed from High to Critical
  • Target version changed from Release 2.0.2 to Release 2.1.0
  • Tested version 2.0.0, 2.0.1 added

Bumping to critical and 2.1 in the hopes of fixing this for the upcoming release.

#4 Updated by Dan Gillean almost 6 years ago

  • Priority changed from Critical to High

#5 Updated by Jesús García Crespo almost 6 years ago

  • Status changed from New to Feedback
  • Assignee changed from Jesús García Crespo to Dan Gillean

In Elasticsearch, we embed the repository document in each of its descriptions. When you run a search while logged in, we set the query string query to search _all. For anonymous users, we set the fields we want to search based on the visibility of the fields (see example for ISAD).

So the problem with _all is that it makes the query search every field where "include_in_all" is true. In AtoM, that's true for every i18n field of every type (e.g. QubitInformationObject) and every i18n property of nested entities (e.g. QubitRepository is nested into QubitInformationObject).
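To make the effect concrete, here is a toy model in Python. It is not AtoM code and does not talk to Elasticsearch; the documents, field names, and the helpers `build_all_field`, `search_all`, and `search_fields` are all hypothetical. It only shows why a catch-all field that absorbs an embedded repository record makes every description match a word that appears in the repository.

```python
def build_all_field(doc):
    """Collect every word from every string value, including nested documents,
    roughly the way a catch-all field would (toy model, not Elasticsearch)."""
    tokens = set()
    for value in doc.values():
        if isinstance(value, dict):
            tokens |= build_all_field(value)
        elif isinstance(value, str):
            tokens |= set(value.lower().split())
    return tokens


# Hypothetical repository record, embedded in each of its descriptions.
repository = {
    "name": "Regional Municipality of Waterloo",
    "contact": "email the archives",
}
descriptions = [
    {"title": "Council minutes", "repository": repository},
    {"title": "Email policy files", "repository": repository},
    {"title": "Photographs", "repository": repository},
]


def search_all(term):
    """Match against the catch-all field: nested repository text is included."""
    return [d["title"] for d in descriptions
            if term.lower() in build_all_field(d)]


def search_fields(term, fields):
    """Match only against an explicit list of top-level fields."""
    hits = []
    for d in descriptions:
        words = set()
        for name in fields:
            words |= set(str(d.get(name, "")).lower().split())
        if term.lower() in words:
            hits.append(d["title"])
    return hits
```

In this model, `search_all("email")` returns all three titles because "email" sits in the shared repository record, while `search_fields("email", ["title"])` returns only the description that actually mentions it.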

The safest solution would be to stop the search query from using _all and declare a list of fields manually, as we do with anonymous users. It would be a relatively easy fix. Changing the mapping, the include_in_all properties, etc. would be easy too, but I'm afraid that would break other parts of the app that may rely on those specific features.
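As a rough sketch of what that change might look like at the query level, the Python snippet below builds the two query bodies side by side. It is an illustration only, not the AtoM implementation: the helper `build_query` and the field names passed to it are hypothetical, and the only assumption about Elasticsearch is that the `query_string` query accepts `query`, `default_field`, and `fields`.

```python
import json


def build_query(term, fields=None):
    """Build a query_string body.

    With no field list, the query targets the catch-all _all field (the
    behaviour described above for logged-in users). With a field list, only
    those fields are searched.
    """
    query_string = {"query": term}
    if fields:
        query_string["fields"] = list(fields)
    else:
        query_string["default_field"] = "_all"
    return {"query": {"query_string": query_string}}


# Hypothetical field names, for illustration only.
DESCRIPTION_FIELDS = ["i18n.en.title", "i18n.en.scopeAndContent"]

if __name__ == "__main__":
    print(json.dumps(build_query("email"), indent=2))
    print(json.dumps(build_query("email", DESCRIPTION_FIELDS), indent=2))
```

The trade-off is that the explicit list has to be maintained by hand as fields are added, but the mapping and the include_in_all settings stay untouched.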

Thoughts? Let me know if you need to clarify more details about the implementation.

#6 Updated by Jesús García Crespo almost 6 years ago

  • Target version changed from Release 2.1.0 to Release 2.2.0

We want to put some more thought into this after the release.

#7 Updated by José Raddaoui Marín over 5 years ago

  • Related to Bug #8128: Actor histories return archival description results during search when linked as subjects added

#8 Updated by Dan Gillean over 5 years ago

  • Target version deleted (Release 2.2.0)

#10 Updated by Dan Gillean almost 4 years ago

  • Status changed from Feedback to Verified
  • Target version set to Release 2.4.0
  • Sponsored changed from No to Yes
