general search returns all descriptions from a repository when search string matches a string in the repository record
|Assignee:|Dan Gillean|
|Category:|Search / Browse|
|Target version:|Release 2.4.0|
|Tested version:|2.0.0, 2.0.1|
I came across this on our test site while searching for a photographer's name (Dommasch): a search for "dommasch" returned all the records from U of S. A virtual exhibit relating to the Dommasch collection is mentioned in the Finding aids, guides and publications field. Similarly with Saskatchewan Wheat Pool: a search for "wheat" gives you all records. The same thing happens with "Saskatoon", which seems to be coming from the contact area. You can see an example of this in the AC beta by searching for Toronto within York University Archives & Special Collections.
To reproduce:
- start with a reasonably small data set (i.e. so that re-indexing is quick)
- find a repository record with at least one of the description area fields populated.
- look for a word not likely to appear in every description from that repository. E.g. "email" for Regional Municipality of Waterloo, on the AC beta site
- do a search for that word - if necessary, limited to the repository in question
- all the descriptions from that repository will be in the result set
- now edit the repository record; remove the word in question
- rebuild the search index
- try the search again - this time only the relevant records (perhaps none) will be returned
This applies to the general search box and also to the advanced search when using "any field".
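The reproduction steps above can be modelled with a toy in-memory index. This is a minimal sketch, not AtoM code: it assumes each indexed description carries a copy of its repository's text and that a catch-all search checks every string in the indexed document, roughly like Elasticsearch's _all field. The function names and sample records are hypothetical.

```python
# Toy model of the reported behaviour; this is NOT AtoM code.
# Assumption: each indexed description embeds a copy of its repository's
# text, and a catch-all search checks every string in the document,
# roughly like Elasticsearch's _all field. All names are hypothetical.


def build_index(descriptions, repositories):
    """Return one flat document per description, with its repository embedded."""
    return [
        {
            "title": desc["title"],
            "scope": desc["scope"],
            "repository": dict(repositories[desc["repository"]]),
        }
        for desc in descriptions
    ]


def all_strings(value):
    """Yield every string in a (possibly nested) document."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from all_strings(item)


def search_all(index, term):
    """Titles of documents where term appears in any field (case-insensitive)."""
    needle = term.lower()
    return [
        doc["title"]
        for doc in index
        if any(needle in text.lower() for text in all_strings(doc))
    ]


repositories = {
    "waterloo": {
        "name": "Regional Municipality of Waterloo",
        "contact": "Contact the archivist by email.",
    }
}
descriptions = [
    {"title": "Council minutes", "scope": "Minutes of council meetings.", "repository": "waterloo"},
    {"title": "Road plans", "scope": "Engineering drawings.", "repository": "waterloo"},
]

index = build_index(descriptions, repositories)
print(search_all(index, "email"))  # ['Council minutes', 'Road plans']

# Edit the repository record, rebuild the index, then search again.
repositories["waterloo"]["contact"] = "Contact the archivist by telephone."
index = build_index(descriptions, repositories)
print(search_all(index, "email"))  # []
```

The first search matches every description only because the word lives in the embedded repository text; once the repository record is edited and the index rebuilt, the false matches disappear, mirroring the last steps of the reproduction.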
#1 Updated by Dan Gillean over 8 years ago
- Subject changed from general search returning results from repository records to general search returns all descriptions from a repository when search string matches a string in the repository record
- Category set to Search / Browse
- Assignee set to Jesús García Crespo
- Priority changed from Medium to High
- Target version set to Release 2.0.2
Wow, this is a weird bug. Thanks for catching it, Tim. It took me a bit to understand what the issue was (it takes a strange set of circumstances to reproduce), but I have managed to recreate it. I've updated the title of the issue ticket to try to better express my understanding of what's going on; feel free to edit and improve it if you think I've missed the issue.
#5 Updated by Jesús García Crespo over 7 years ago
- Status changed from New to Feedback
- Assignee changed from Jesús García Crespo to Dan Gillean
In Elasticsearch, we embed the repository document in each of its descriptions. When you run a search while logged in, we set the query_string query to search the _all field. For anonymous users, we set the fields to search based on the visibility of the fields (see example for ISAD).
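The two query shapes described above can be sketched as Elasticsearch request bodies built in Python. This is an illustration under assumptions, not AtoM's code: the field names in ANONYMOUS_FIELDS are hypothetical placeholders rather than AtoM's real mapping, and the sketch uses the query_string query's default_field and fields parameters to express "search _all" versus "search an explicit list".

```python
# Sketch of the two query shapes, as Elasticsearch request bodies.
# ANONYMOUS_FIELDS holds hypothetical placeholder field names.
import json

ANONYMOUS_FIELDS = [  # hypothetical: fields visible to anonymous users
    "i18n.en.title",
    "i18n.en.scopeAndContent",
    "referenceCode",
]


def build_search_body(query_text, logged_in):
    """Return a query_string search body for an authenticated or anonymous user."""
    query_string = {"query": query_text}
    if logged_in:
        # Authenticated users: search the catch-all _all field, which also
        # holds text copied from the embedded repository document.
        query_string["default_field"] = "_all"
    else:
        # Anonymous users: search only an explicit, visibility-based field list.
        query_string["fields"] = list(ANONYMOUS_FIELDS)
    return {"query": {"query_string": query_string}}


print(json.dumps(build_search_body("dommasch", logged_in=True), indent=2))
print(json.dumps(build_search_body("dommasch", logged_in=False), indent=2))
```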
So the problem with _all is that it makes the query search every field where "include_in_all" is true. In AtoM, that's true for every i18n field of every type (e.g. QubitInformationObject) and every nested i18n property of nested entities (e.g. QubitRepository is nested into QubitInformationObject).
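To see why repository text ends up in every description's _all field, here is a toy walk over a simplified, hypothetical mapping that lists the leaf fields copied into _all. It assumes include_in_all defaults to true and that a value set on an object is inherited by its sub-fields, as in the older Elasticsearch versions that still had _all; the field names are illustrative only.

```python
# Toy walk over a simplified Elasticsearch-style mapping, listing which
# leaf fields feed the catch-all _all field. The mapping is hypothetical
# and heavily simplified; it is not AtoM's real mapping.

MAPPING = {
    "properties": {
        "slug": {"type": "string", "include_in_all": False},
        "i18n": {"properties": {
            "en": {"properties": {
                "title": {"type": "string"},
                "scopeAndContent": {"type": "string"},
            }},
        }},
        # Embedded repository document: its i18n fields also reach _all.
        "repository": {"properties": {
            "slug": {"type": "string", "include_in_all": False},
            "i18n": {"properties": {
                "en": {"properties": {
                    "authorizedFormOfName": {"type": "string"},
                    "findingAids": {"type": "string"},
                }},
            }},
        }},
    }
}


def fields_in_all(mapping, prefix="", inherited=True):
    """Return dotted paths of leaf fields whose text is copied into _all.

    include_in_all defaults to True; a value set on an object is
    inherited by its sub-fields unless they override it.
    """
    paths = []
    for name, spec in mapping.get("properties", {}).items():
        path = f"{prefix}{name}"
        included = spec.get("include_in_all", inherited)
        if "properties" in spec:
            paths.extend(fields_in_all(spec, path + ".", included))
        elif included:
            paths.append(path)
    return paths


print(fields_in_all(MAPPING))
# ['i18n.en.title', 'i18n.en.scopeAndContent',
#  'repository.i18n.en.authorizedFormOfName', 'repository.i18n.en.findingAids']
```

Setting include_in_all to false on the repository object would drop its sub-fields from _all; that is the mapping-level alternative to changing the query.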
The safest solution would be to stop the search query from using _all and declare a list of fields manually, as we do for anonymous users. It would be a relatively easy fix. Changing the mapping, the include_in_all properties, etc. would be easy too, but I'm afraid that would break other parts of the app that may rely on those specific features.
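A minimal sketch of the proposed fix, assuming a hypothetical SEARCH_FIELDS list that contains only the description's own fields: every user gets a query_string query over that explicit list, and a toy substring matcher (standing in for Elasticsearch) shows that text held only in the embedded repository document no longer matches. Field names and sample documents are made up for illustration.

```python
# Sketch of the proposed fix: search an explicit list of the description's
# own fields for every user instead of _all. SEARCH_FIELDS, the field names
# and the sample documents are hypothetical.

SEARCH_FIELDS = ["title", "scopeAndContent"]


def build_fixed_search_body(query_text, fields=SEARCH_FIELDS):
    """query_string body restricted to explicit fields (same for all users)."""
    return {"query": {"query_string": {"query": query_text, "fields": list(fields)}}}


def matches(doc, query_text, fields):
    """Toy matcher: case-insensitive substring test on the listed fields only."""
    needle = query_text.lower()
    return any(needle in str(doc.get(field, "")).lower() for field in fields)


exhibit = {"findingAids": "Virtual exhibit on the Dommasch collection"}
docs = [
    {"title": "Dommasch photographs", "scopeAndContent": "Portraits.", "repository": exhibit},
    {"title": "Campus planning records", "scopeAndContent": "Site plans.", "repository": exhibit},
]

print(build_fixed_search_body("dommasch"))
hits = [doc["title"] for doc in docs if matches(doc, "dommasch", SEARCH_FIELDS)]
print(hits)  # ['Dommasch photographs']
```

The trade-off is that the field list has to be kept in step with the mapping as fields are added, while the mapping itself, and anything else that relies on it, stays untouched.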
Thoughts? Let me know if you need to clarify more details about the implementation.