Bug #8744

Mapping issue causes ElasticSearch indexing to stop with an error

Added by Mike Cantelon about 3 years ago. Updated almost 3 years ago.

Status:VerifiedStart date:07/24/2015
Priority:MediumDue date:
Assignee:Dan Gillean% Done:

0%

Category:Search / Browse
Target version:Release 2.2.1
Google Code Legacy ID: Tested version:2.2
Sponsored:No Requires documentation:

History

#1 Updated by Mike Cantelon about 3 years ago

Made a pull request with the workaround: https://github.com/artefactual/atom/pull/212

#2 Updated by Mike Cantelon about 3 years ago

  • Status changed from New to Code Review
  • Assignee changed from Mike Cantelon to Nick Wilkinson

#3 Updated by Dan Gillean about 3 years ago

  • Tested version 2.2 added

Comments from the related PR: https://github.com/artefactual/atom/pull/212

This tweak to the mapping YAML file fixes an issue where ElasticSearch indexing (via the search:populate task) halts prematurely with an error.

Mapping for each ElasticSearch document type is set, in AtoM, in a YAML file. In the YAML, a type can have a "_foreign_types" property set, which allows it to include the mapping of another type defined in same YAML file. This allows, for example, an "accession" type to include the properties of the "actor" type as the property "creators". This allows creator data to be nested within an accession document.

An issue with how we render mapping from YAML is that the expansion of the "_foreign_types" property is non-recursive. The result of this is
that if a type that has a "_foreign_types" property is included in the mapping of another type before its own "_foreign_types" have been expanded then the mapping that's rendered and sent to ElasticSearch will be incomplete and when AtoM tries to index it will try to put data into properties that haven't been defined in the mapping and with halt with an error.

At some point we should make "_foreign_types" expansion recursive (maybe with a maximum depth to prevent circular references), but the meantime we can work around this by making sure "child" types (included by other types) are placed in the mapping YAML file before "parent" types so the child's "_foreign_types" property will be expanded before it's included by other types.

#4 Updated by Nick Wilkinson about 3 years ago

  • Assignee changed from Nick Wilkinson to Mike Gale

Hi Mike G, can you please take a look at this for Code Review?

#5 Updated by Mike Gale about 3 years ago

  • Status changed from Code Review to Feedback
  • Assignee changed from Mike Gale to Mike Cantelon

Hi Mike, it looks fine. I think that we should maybe put a comment at the top of mapping.yml indicating that the order of ES type definitions matters for this issue, at least until we come up with a better solution.

#6 Updated by Mike Cantelon about 3 years ago

Mike Gale wrote:

Hi Mike, it looks fine. I think that we should maybe put a comment at the top of mapping.yml indicating that the order of ES type definitions matters for this issue, at least until we come up with a better solution.

Yeah, good call. I'll add that.

#7 Updated by Mike Cantelon about 3 years ago

We're going to come out with a 2.2.1 release to fix this issue, I believe, a quick fix is to replace plugins/arElasticSearchPlugin/config/mapping.yml with the contents of this URL: http://tinyurl.com/pde6v8w.

A diff of the change is here (essentially the "accession" type definition needs to be moved so it comes after the "actor" type definition): https://github.com/artefactual/atom/pull/212/files .

#8 Updated by Dan Gillean about 3 years ago

  • Target version changed from Release 2.2.0 to Release 2.2.1

#9 Updated by Mike Cantelon almost 3 years ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from Mike Cantelon to Dan Gillean

#10 Updated by Dan Gillean almost 3 years ago

Mike Cantelon, how should I test this issue? Is this fix in qa/2.3.x as well? Thanks!

#11 Updated by Mike Cantelon almost 3 years ago

The fix is in 2.3.x. I'm try to work out a test case for so you can replicate it, but it's not failing. I'd originally been using user-provided data where I found the issue originally, but I don't the data anymore. Maybe we should just mark it as verified.

#12 Updated by Dan Gillean almost 3 years ago

  • Status changed from QA/Review to Verified

Also available in: Atom PDF