Feature #6695

Advanced search by reference code

Added by José Raddaoui Marín about 8 years ago. Updated over 7 years ago.

Status: Verified
Start date: 05/12/2014
Priority: Medium
Due date: 05/18/2014
Assignee: Dan Gillean
% Done: 100%
Category: Search / Browse
Estimated time: 4.00 hours
Target version: Release 2.1.0
Google Code Legacy ID:
Tested version: 2.0.1
Sponsored: No
Requires documentation:

Description

Limiting the search criteria to search by identifier causes AtoM to search only for a complete match between the search term string (case sensitive) and the identifier field (information_object.identifier). Search should not be case sensitive, should search on reference code, and allow for partial matches and wildcard searches.
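To make the requested behaviour concrete, here is a minimal Python sketch of the matching rules described above (case-insensitive, partial match, wildcards). It is only an illustration of the requirement, not how AtoM or Elasticsearch implements it; the function name matches_reference_code is hypothetical.

```python
from fnmatch import fnmatchcase


def matches_reference_code(query: str, reference_code: str) -> bool:
    """Return True if the query matches the reference code.

    Illustrative rules only (an assumption, not AtoM's implementation):
    - comparison ignores case
    - a query containing * or ? is treated as a wildcard pattern
    - any other query matches if it appears anywhere in the code
    """
    q = query.lower()
    code = reference_code.lower()
    if "*" in q or "?" in q:
        # fnmatchcase compares the whole string against the pattern
        return fnmatchcase(code, q)
    return q in code


code = "CA LAW PF11-PF11-1-984040-1-1-001"
print(matches_reference_code("pf11-pf11-1-984040", code))  # partial, different case
print(matches_reference_code("*984040*", code))            # wildcard on both sides
print(matches_reference_code("PF12", code))                # no match
```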


Related issues

Related to Access to Memory (AtoM) - Bug #5851: Advanced search - search in identifier only returns exact... Duplicate 10/22/2013

History

#1 Updated by José Raddaoui Marín about 8 years ago

  • Category set to Search / Browse
  • Status changed from New to QA/Review
  • Assignee changed from José Raddaoui Marín to Dan Gillean
  • % Done changed from 0 to 100

Fixed in 2.x, commit: 8c83658. The search index needs to be rebuilt.

#2 Updated by Sarah Romkey almost 8 years ago

  • Status changed from QA/Review to Feedback
  • Assignee changed from Dan Gillean to José Raddaoui Marín

Tested in 2x, in the advanced search screen, on the Reference code field:
- Searching for an exact reference code returns no results.
- Searching for a partial reference code returns no results. Also tested in quotation marks, which also returns no results.
- Wildcard searching verified: it worked with * representing characters either before or after the search term entered.
- As long as a wildcard is used, the search is not case sensitive. I cannot test an exact reference code in a different case because that search is not working.

#3 Updated by Dan Gillean almost 8 years ago

Other notes:

Take the following record - http://2x.test.artefactual.com/test-doc-01

Full inherited reference code (including country and repository info from ISDIAH record): CA LAW PF11-PF11-1-984040-1-1-001

1) Search for "CA LAW PF11-PF11-1-984040-1-1-001" = no results found
2) Search for "PF11-PF11-1-984040-1-1-001" = 1 record (the one I wanted)
3) Search for CA LAW PF11-PF11-1-984040-1-1-001 = 16007 results, but at least my record is first
4) Search for PF11-PF11-1-984040-1-1-001 = 9791 results, but at least mine is first
5) Various attempts to use the inheritReferenceCode: prefix gave inconsistent results, but maybe I just need its usage explained.

Expected outcome
If the full inherited ref code includes the country and repo identifiers, then when I search "CA LAW PF11-PF11-1-984040-1-1-001" I should get 1 result - same as 2) above

#4 Updated by José Raddaoui Marín almost 8 years ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from José Raddaoui Marín to Dan Gillean

The field was not being analyzed. It's fixed in commit:fcd88f7, but the search index needs to be rebuilt again.

Also note that '-' and ' ' may be interpreted as reserved characters:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_reserved_characters

#5 Updated by Dan Gillean over 7 years ago

  • Status changed from QA/Review to Feedback
  • Assignee changed from Dan Gillean to José Raddaoui Marín
  • Tested version 2.0.1 added

I'm still getting the exact same results as above, in our new test environment - see for example: http://qa-21x.test.artefactual.com/test-doc-one

Full reference code of this description, counting country and repository code, is: CA ON00311 PF11-PF11-1-984040-1-1-001

  • Navigate to advanced search page
  • Use "Reference code" as the field to search

1) Search CA ON00311 PF11-PF11-1-984040-1-1-001 = 25114 results
2) Search "CA ON00311 PF11-PF11-1-984040-1-1-001" = 0 results
3) Search PF11-PF11-1-984040-1-1-001 = 4179 results

Looks like the correct syntax for using the general searchbox field is inheritReferenceCode=""

inheritReferenceCode="PF11-PF11-1-984040-1-1-001" = 1 result
inheritReferenceCode="CA ON00311 PF11-PF11-1-984040-1-1-001" = 0 results

Expected outcome
If the full inherited ref code includes the country and repo identifiers, then when I search "CA LAW PF11-PF11-1-984040-1-1-001" I should get 1 result - same as 2) above

If this isn't possible, let me know - I can always update the docs to clarify this. However, as I said above, if we are adding the country code and the repository code to all reference numbers then I think users will expect that they can search using them as well.

#6 Updated by José Raddaoui Marín over 7 years ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from José Raddaoui Marín to Dan Gillean

Hi, a few notes:

- The field name has been recently changed to "referenceCode"
- The country and identifier from the repository are added to the field. However, if you add that data to the repository after the relation with the descriptions is made, those descriptions won't have it in the reference code until the search index is rebuilt or they are saved again.
- Spaces and '-' break the string if you are not using double quotes and it's actually querying for: CA OR ON00311 OR PF11 OR PF11 OR 1 ...
- I'm getting the expected result when using double quotes, probably because the index was rebuilt last night:

http://qa-21x.test.artefactual.com/search?query=%22CA+ON00311+PF11-PF11-1-984040-1-1-001%22
http://qa-21x.test.artefactual.com/search/advanced?f=&so0=and&sq0=%22CA+ON00311+PF11-PF11-1-984040-1-1-001%22&sf0=referenceCode
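For anyone scripting these checks, the advanced search link above can be rebuilt with a short Python sketch. The parameter names (f, so0, sq0, sf0) and the host are copied from the URL above. The helper name advanced_search_url is hypothetical, and the meaning of each parameter is inferred from that one URL, so treat it as an assumption.

```python
from urllib.parse import urlencode

# Host and path copied from the QA link above.
BASE_URL = "http://qa-21x.test.artefactual.com/search/advanced"


def advanced_search_url(reference_code: str, base_url: str = BASE_URL) -> str:
    """Build an advanced search URL that queries referenceCode as one phrase."""
    # Wrap the code in double quotes so spaces and '-' do not split it
    # into separate OR terms; drop any quotes already in the input.
    phrase = '"' + reference_code.replace('"', "") + '"'
    params = [
        ("f", ""),                 # empty in the URL above
        ("so0", "and"),            # value taken from the URL above
        ("sq0", phrase),           # the quoted search phrase
        ("sf0", "referenceCode"),  # field to search
    ]
    # urlencode turns spaces into '+' and '"' into %22
    return base_url + "?" + urlencode(params)


print(advanced_search_url("CA ON00311 PF11-PF11-1-984040-1-1-001"))
```

With the reference code from this ticket, the printed URL should be identical to the second link above.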

#7 Updated by Dan Gillean over 7 years ago

  • Status changed from QA/Review to Verified

Sounds good! Works well, I think. FYI for those reading, reserved special characters can be escaped using a backslash, so one could search a reference code like so: PF11\-PF11\-1\-984040\-1\-1\-001 or "PF11\-PF11\-1\-984040\-1\-1\-001"
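If the escaping needs to be done programmatically, something like the following Python sketch would do it. The character list is my reading of the reserved characters section of the Elasticsearch query string documentation linked in #4, so treat it as an assumption and check it against your Elasticsearch version. The function name escape_query_term is hypothetical.

```python
# Characters treated as reserved by the query string syntax (assumed list).
# '&' and '|' are escaped one at a time, which also covers '&&' and '||'.
RESERVED = set('+-=!(){}[]^"~*?:\\/&|')

# '<' and '>' cannot be escaped at all, so this sketch removes them.
UNESCAPABLE = set("<>")


def escape_query_term(term: str) -> str:
    """Backslash-escape reserved characters in a search term."""
    return "".join(
        "\\" + ch if ch in RESERVED else ch
        for ch in term
        if ch not in UNESCAPABLE
    )


print(escape_query_term("PF11-PF11-1-984040-1-1-001"))
```

Note that this also escapes * and ?, so it should not be applied to a term that is meant to contain wildcards.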

You can search without escaping the dashes, and it seems to work just the same. For example:

  • PF11-PF11-1-984040-1-1-001 = 4192 results (but with the correct result as the first returned)
  • PF11\-PF11\-1\-984040\-1\-1\-001 = 4192 results (correct result is also first returned)
  • "PF11\-PF11\-1\-984040\-1\-1\-001" = 1 result (correct)
  • "PF11-PF11-1-984040-1-1-001" = 1 result (correct)
  • CA ON00311 PF11-PF11-1-984040-1-1-001 = 25295 results (correct result is first returned)
  • CA ON00311 PF11\-PF11\-1\-984040\-1\-1\-001 = same
  • "CA ON00311 PF11-PF11-1-984040-1-1-001" = 1 result (correct)
  • "CA ON00311 PF11\-PF11\-1\-984040\-1\-1\-001" = same

Ergo: the best way to search for a full reference code is to wrap it in quotation marks. Escaping does not seem necessary. Searches without quotation marks return more results because the search tokenizes them, but the high number of matching tokens, on long reference codes at least, still ensures that the desired result is returned high in the results list.
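The difference between the quoted and unquoted searches can be illustrated with a rough Python sketch. It only approximates the behaviour José describes in #6 (spaces and '-' splitting an unquoted query into OR terms). It is not the actual Elasticsearch analyzer, and the function name approximate_query_terms is hypothetical.

```python
import re


def approximate_query_terms(query: str) -> list[str]:
    """Approximate how a reference code query is broken into terms.

    Assumption based on this thread, not on the real analyzer:
    - a query wrapped in double quotes is kept as a single phrase
    - otherwise spaces and '-' split it into terms that are ORed
    """
    query = query.strip()
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        return [query[1:-1]]
    return [term for term in re.split(r"[\s\-]+", query) if term]


unquoted = approximate_query_terms("CA ON00311 PF11-PF11-1-984040-1-1-001")
quoted = approximate_query_terms('"CA ON00311 PF11-PF11-1-984040-1-1-001"')

print(" OR ".join(unquoted))  # many short terms, hence thousands of hits
print(quoted)                 # one phrase, hence a single hit
```

Short terms such as 1 and 001 appear in many reference codes, which fits the large result counts seen for the unquoted searches.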
