Bug #5743

Institution dedicated search not returning results from some fields

Added by Dan Gillean over 8 years ago. Updated almost 8 years ago.

Status: Verified
Start date: 10/03/2013
Priority: Medium
Due date:
Assignee: José Raddaoui Marín
% Done: 100%
Category: Search / Browse
Target version: Release 2.1.0
Google Code Legacy ID:
Tested version:
Sponsored: No
Requires documentation:

Description

The dedicated search bar on the archival institution page is supposed to be global. However, testing has shown that data entered in some fields does not return results.

To reproduce
1) Create a new archival institution
2) Add unique data to each field in the edit template, to be able to test searching. Save.
3) Navigate to /repository/browse (Browse archival institution)
4) Search for data held in each field

Resulting error
Some fields are inexplicably not returning results:
  • Finding aids, guides, and publications (description area)
  • Access conditions and requirements (Access area)
  • Accessibility (Access area)

Note that search does not return results for data in the Control area - and for the address, it only returns hits on the locality (e.g. city). I think this is fine - but the other fields should be indexed at some point.


Related issues

Related to Access to Memory (AtoM) - Bug #5742: Authority record dedicated search not returning results f... Verified 10/03/2013
Related to Access to Memory (AtoM) - Bug #5741: Title should be weighted in search results for authority ... New 10/03/2013

History

#1 Updated by Jesús García Crespo over 8 years ago

  • Target version changed from Release 2.0.0 to Release 2.0.1

#2 Updated by Dan Gillean over 8 years ago

Note as well the recommendations for weighting Archival institution search results, on #5741

#3 Updated by Dan Gillean over 8 years ago

Also not indexed for searching (based on testing):

  • Parallel form(s) of name (Identity area)
  • Other form(s) of name (Identity area)

In the issue description I said that nothing in the control area was returning results, but this doesn't seem to be true, on further testing. The following fields will be returned: Dates of creation, revision, and deletion; Sources.

The following missing field, then, should be added:

  • Maintenance notes (control area)

#4 Updated by Dan Gillean over 8 years ago

Weirdness: in 2x I suddenly was getting a hit on data entered into Finding aids, guides, and publications (in the description area). However, when I tried to add test data to a different repository in the same field, I still am not getting any hits when searching for the term. I can't tell by GUI testing what is indexed and what's not.

#5 Updated by Tim Hutchinson over 8 years ago

I think the issue may be that the search index isn't getting updated after the record is updated. Or maybe just for certain fields? - not sure how that works. In any case, I tried editing a repository record with a made-up word. Searching for that didn't return any results; but after rebuilding the search index, it did.

#6 Updated by Jesús García Crespo over 8 years ago

  • Target version changed from Release 2.0.1 to Release 2.0.2

#7 Updated by José Raddaoui Marín over 8 years ago

  • Status changed from New to In progress
  • Assignee changed from Jesús García Crespo to José Raddaoui Marín

#8 Updated by José Raddaoui Marín over 8 years ago

  • Status changed from In progress to QA/Review
  • % Done changed from 0 to 100

As you suspected, there was a problem saving repositories to the search index: only some fields were being updated. This is fixed in:

AtoM|commit: 82f4269e50836e0c9880b4425ab7d6a159055a48

But the following fields are not in the search index yet:

  • Parallel form(s) of name (Identity area)
  • Other form(s) of name (Identity area)
  • Maintenance notes (control area)

Please, let me know if I should add them.

#9 Updated by Dan Gillean over 8 years ago

Hi Radda,

My initial tests on this look really good. There are some fields I don't get hits on, but I think it's fine - I just want to confirm this with you, for the documentation:

Not indexed:

Identity area (not indexed)
- Archive type

FYI, I think that we should add both forms of name (parallel and other) to the index, as name searches will be the primary search type performed. I'm less concerned about maintenance notes - the AC data demonstrates that this field is rarely used - esp since there is no visible elements module for the ISDIAH record, so archivists will not add internal notes here that the public can't see. I think we can leave it out of the index unless there is compelling reason to add it.

Contact area (NOT in index)
- Primary Contact
- Contact type
- Street address
- Postal code
- Country name
- Telephone
- Fax
- Email
- URL

(i.e. the only things indexed in this area are: locality, region, and note - is that right?)

If possible, I would propose adding the Primary Contact name to the index. If the information is public, it is conceivable that someone could interact with an archivist through their research, and remember the name, but not the institution specifics (if they are searching broadly, such as in a national portal). If this is difficult for some reason, it's not a priority, but it could be useful.

Control area (NOT in index)
- Description identifier
- Language
- Script
- Status
- Level of detail
- Maintenance notes

See my comments above about maintenance notes - I think it's fine to exclude for now.

Let me know if this is correct! Thanks.

Finally, this is mentioned on issue #5741, so perhaps we should address it there, but: I wonder if it is possible to weight the searches, or if this is already done? The proposal I made over there was:

Authorized form of name (i.e. Title) - x10
Identifier - x5
Other forms of name - x3
Contact information - x3

I realize that, with the "relevance" option removed from the sort button, there is confusion with returning weighted results versus the sorting of most recent/alphabetic. Ideally, the "relevance" option would appear as the default in the sort button ONLY when a search has been performed, and the user could still switch the results presented to most recent/alphabetic if they wanted. "Relevance" would NOT appear as an option when browsing.
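
The sort behaviour proposed above can be sketched in a few lines. This is only an illustration of the rule, not AtoM code: the function name `sort_options` and the option labels are hypothetical.

```python
def sort_options(has_query):
    """Return the sort choices to offer (hypothetical option names).

    When a search was performed, "relevance" is listed first so the
    caller can treat it as the default. When browsing, it is omitted.
    """
    options = ["mostRecent", "alphabetic"]
    if has_query:
        # Relevance only makes sense when there is a query to score against.
        return ["relevance"] + options
    return options


if __name__ == "__main__":
    print(sort_options(True))
    print(sort_options(False))
```

With this rule the user can still switch a search result list to most recent or alphabetic, while browse pages never show a relevance option.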

As I said, this might be better dealt with over on #5741, but since you are working on this, I wanted to mention it.

Thanks Radda! So - add the 2 forms of name, don't worry about maintenance notes, and let me know if we can do anything about weighting. Cheers!

#10 Updated by Dan Gillean over 8 years ago

  • Status changed from QA/Review to Feedback

#11 Updated by José Raddaoui Marín over 8 years ago

  • Status changed from Feedback to QA/Review

AtoM|commit: a6a1ff01408b4488a175ea7a51df63f95fc3e08b

Hi Dan,

Both forms of name (parallel and other) and the Primary Contact name are now included in the ES index (it needs to be rebuilt before testing).

Some of the fields you listed are in the search index, but they're not included in '_all' (an ES field that aggregates the fields we want searched), and these queries are run against the '_all' field.

As for weighting, we'll need to stop using the '_all' field and specify the fields and their weights manually. That is already done in the accessions search, so it shouldn't be a problem.
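
As a rough sketch of what specifying fields and weights manually could look like, here is an Elasticsearch query body built in Python. It is an illustration under assumptions, not the AtoM implementation: the field names and the helper `build_weighted_query` are hypothetical, and the weights are the ones proposed in comment #9. It relies on the standard `multi_match` query, where `field^N` multiplies the score of matches in that field by N.

```python
import json

# Hypothetical field names; the real AtoM repository mapping may differ.
# Weights follow the proposal in comment #9.
PROPOSED_WEIGHTS = {
    "authorizedFormOfName": 10,
    "identifier": 5,
    "otherNames": 3,
    "contact": 3,
}


def build_weighted_query(term, weights=PROPOSED_WEIGHTS):
    """Return a query body that searches named fields instead of '_all'."""
    # "name^10" is Elasticsearch's per-field boost syntax.
    fields = [f"{name}^{boost}" for name, boost in weights.items()]
    return {"query": {"multi_match": {"query": term, "fields": fields}}}


if __name__ == "__main__":
    print(json.dumps(build_weighted_query("city archives"), indent=2))
```

A field left out of the weights is simply not searched, so any field that must stay searchable has to be listed explicitly once '_all' is no longer used.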

#12 Updated by José Raddaoui Marín over 8 years ago

  • Assignee changed from José Raddaoui Marín to Austin Trask

Hi Austin, please update the search index for 2x.test.artefactual.com again. Thanks ;)

#13 Updated by Dan Gillean over 8 years ago

  • Status changed from QA/Review to Verified
  • Assignee changed from Austin Trask to José Raddaoui Marín

I have verified that the additional fields have been added, and return results as expected. Marking this issue as verified - questions of weighting and relevance sorting can be addressed on the issue ticket for #5741.

Thanks Radda!

#14 Updated by Dan Gillean almost 8 years ago

  • Target version changed from Release 2.0.2 to Release 2.1.0
