Bug #4375

Wrong number of Information Object items during rebuild of search index - duplicated search results on ICA-AtoM 1.3.0

Added by Anonymous about 9 years ago. Updated over 8 years ago.

Status: Verified
Priority: Critical
Assignee: David Juhasz
Category: Search / Browse
Target version: Release 1.3.1
Start date:
Due date:
% Done: 0%
Google Code Legacy ID: atom-2427
Tested version:
Sponsored: No
Requires documentation:

Description

Google user: crislea...@gmail.com

To reproduce this error:
========================
1) execute "php symfony search:populate QubitSearch"
2) version 1.3.0 - default culture - pt_BR
3) Linux Open Suse 11.2 with PHP 5.3.3, Apache 2.2.13 and MySQL 5.1.49

Resulting error:
================
Incorrect number of items indexed. There aren't 15733 information objects; there should be 11397.
During a search in the ICA-AtoM interface, some results appear duplicated.

sfSearch >> QubitInformationObject - Repensando o Brasil 500 anos depois inserted (352.13s) (15731/11397)
sfSearch >> QubitInformationObject - II Encontro Nacional das Frentes Parlamentares do Cooperativismo inserted (352.15s) (15732/11397)
sfSearch >> QubitInformationObject - Coleção Memória Política de Minas (Lançamento dos livros de Oscar Correia e Armando Ziller) inserted (352.17s) (15733/11397)

Expected result:
================
The command to populate the search index should reindex 11397 information objects.

sfSearch >> QubitInformationObject - Coleção Memória Política de Minas (Lançamento dos livros de Oscar Correia e Armando Ziller) inserted (352.17s) (11397/11397)

[g] Legacy categories: Search / browse


Related issues

Duplicates Access to Memory (AtoM) - Bug #584: Wrong number of Information Objects added during rebuild ... Verified 10/11/2012

History

#1 Updated by Anonymous about 9 years ago

- Missing comment -

#2 Updated by Anonymous about 9 years ago

I've found a weird thing in my information_object_i18n table...
My default culture is pt_BR.
But some items had the "culture" field set to "en".
And there were 3 ids with duplicated rows: one row with culture "en" and the other with culture "pt_BR".
The rows with culture "en" had all fields null, except id and culture.
I've fixed it and now the rebuild task works successfully.

Let me try to show what I did.
First of all, I did a full backup of the database.
I also created backup copies of the information_object and information_object_i18n tables.
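
A minimal sketch of how such backup tables might be created in MySQL. The `_backup` table names are hypothetical; the comment does not say which names or method were used. `CREATE TABLE ... AS SELECT` copies the rows but not indexes or foreign keys, so this complements a full dump and does not replace it.

```sql
-- Hypothetical backup table names; copies data only (no indexes or foreign keys).
CREATE TABLE information_object_backup AS
  SELECT * FROM information_object;

CREATE TABLE information_object_i18n_backup AS
  SELECT * FROM information_object_i18n;
```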

select id, count(*)
from information_object_i18n
group by id
having count(*) > 1;

id count(*)
496 2
8766 2
8767 2

select * from information_object_i18n
where id in (496, 8766, 8767)
order by id;

The rows with culture "en" had every field null except id and culture...
The rows with culture "pt_BR" were fine.

"title","alternate_title","edition","extent_and_medium","archival_history","acquisition","scope_and_content","appraisal","accruals","arrangement","access_conditions","reproduction_conditions","physical_characteristics","finding_aids","location_of_originals","location_of_copies","related_units_of_description","institution_responsible_identifier","rules","sources","revision_history","id","culture"
"","","","","","","","","","","","","","","","","","","","","",496,"en"
"Elaboração do Projeto","","","Textual, 0,10 metros lineares","","","Contém documentos relativos às atividades realizadas na Comissão Constitucional, tendo em vista a elaboração, a discussão e a votação do Projeto de Constituição, tais como: correspondências, abaixo-assinados, sugestões, propostas, estudos técnicos, relatórios, listagens, emendas, pareceres, destaques, requerimentos e pronunciamentos","","","","sem restrições","sem restrições, mediante compromisso de crédito","","","","","","","","","Data da Descrição: 13/08/2010",496,"pt_BR"
"","","","","","","","","","","","","","","","","","","","","",8766,"en"
"16ª Legislatura","","","","","","Compreende o conjunto de documentos produzidos pelas Comissões Permanentes durante a 16ª Legislatura (01/02/2007 - 31/01/2011). Inclui os dossiês das reuniões ordinárias, extraordinárias, especiais e visitas realizadas pelas Comissões","","","Os conjuntos documentais são dispostos por Comissão (arranjadas fisicamente por ordem alfabética). Dentro de cada comissão os dossiês de cada reunião, audiência pública ou visita estão por ordem cronológica descrescente.","Sem restrições","Sem restrições, mediante autorização e compromisso de crédito","","","","","","","","","Descrição: jul./2012",8766,"pt_BR"
"","","","","","","","","","","","","","","","","","","","","",8767,"en"
"6ª Legislatura","","","","","","Compreende o conjunto de documentos produzidos pelas Comissões Permanentes durante a 6ª Legislatura (01/02/1967 a 31/01/1971). Inclui os dossiês das reuniões ordinárias, extraordinárias, especiais e visitas realizadas pelas Comissões.","","","Os conjuntos documentais são dispostos por Comissão (arranjadas fisicamente por ordem alfabética). Dentro de cada comissão os dossiês de cada reunião, audiência pública ou visita estão por ordem cronológica descrescente.","Sem restrições","Sem restrições, mediante autorização e compromisso de crédito","","","","","","","","","Descrição: jul./2012",8767,"pt_BR"

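Empty "en" rows like the ones in the export above could be searched for across the whole table with a query along these lines. This is only a sketch: it checks a handful of the columns listed in the header row, and it treats NULL and empty strings alike because the export does not distinguish them. Matches are candidates to inspect, not rows that are necessarily safe to delete.

```sql
-- List translation rows that carry no content in the checked columns.
-- Column names are taken from the export header; extend the list as needed.
SELECT id, culture
FROM information_object_i18n
WHERE COALESCE(title, '') = ''
  AND COALESCE(extent_and_medium, '') = ''
  AND COALESCE(scope_and_content, '') = ''
  AND COALESCE(arrangement, '') = ''
  AND COALESCE(access_conditions, '') = ''
  AND COALESCE(reproduction_conditions, '') = ''
  AND COALESCE(revision_history, '') = ''
ORDER BY id, culture;
```
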
Then I decided to create 3 new information object items (in information_object), just temporarily.

I updated the 3 rows above in information_object_i18n that had culture "en" so that they pointed to these new temporary IDs.
This was just to be sure that their IDs would not match the correct rows with culture "pt_BR".

Finally I deleted these 3 rows from information_object_i18n (the updated ones) and the 3 temporary rows from information_object.
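
The temporary-ID steps above are what was actually done. For comparison, a more direct variant would delete the empty "en" rows by id and culture. This is an untested sketch, not the reporter's method. It assumes that (id, culture) identifies a single row in information_object_i18n, and that the table uses a transactional engine such as InnoDB so the change can be rolled back.

```sql
START TRANSACTION;

-- Remove only the empty "en" translations for the three affected ids.
DELETE FROM information_object_i18n
WHERE id IN (496, 8766, 8767)
  AND culture = 'en'
  AND COALESCE(title, '') = '';

-- Re-run the duplicate check; in this database it should now return no rows.
SELECT id, COUNT(*)
FROM information_object_i18n
GROUP BY id
HAVING COUNT(*) > 1;

COMMIT;  -- or ROLLBACK if the check still lists ids
```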

Now the information_object and information_object_i18n tables don't have any entries with culture "en" anymore.
And there are no more weird rows with many null fields...

The "search:populate" task worked successfully.

And no more duplicated results during search.

#3 Updated by David Juhasz almost 9 years ago

  • Status changed from New to In progress
  • Priority set to Critical
  • Target version set to Release 1.4.0

This is due to an error in the way search:populate was handling multi-lingual descriptions. :(

I've started work on a fix:
https://github.com/artefactual/atom/commit/18409663acad8e9d7f292711a11e69d4d1edccae

[g] Labels added: Component-SearchBrowse, Priority-Critical, Milestone-Release-1.3.1
[g] New owner: David Juhasz

#4 Updated by David Juhasz almost 9 years ago

  • Status changed from In progress to QA/Review

#5 Updated by Jessica Bushey almost 9 years ago

  • Assignee changed from Jessica Bushey to David Juhasz
  • Sponsored set to No

David - I'm not sure how to test this fix in order to verify.

#6 Updated by David Juhasz almost 9 years ago

  • Status changed from QA/Review to Verified

#7 Updated by David Juhasz over 8 years ago

  • Category set to Search / Browse
  • Target version changed from Release 1.4.0 to Release 1.3.1
