Bug #13423

The full width treeview fails to load for description with 900K children

Added by David Juhasz over 1 year ago. Updated 11 months ago.

Status: Verified
Start date: 09/22/2020
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: Performance / scalability
Target version: Release 2.7.0
Google Code Legacy ID:
Tested version: 2.6
Sponsored: Yes
Requires documentation: No

Description

When viewing a description with ~931 000 children, the full width treeview never renders and instead displays a "Loading..." message and an animated GIF indefinitely.

To reproduce

The issue is difficult to reproduce because it requires an archival description with a very large number of children. With a sufficiently large data set:

1) Search for a description with hundreds of thousands of child descriptions, or go directly to the description's URL

Error encountered

The treeview window displays "Loading ..." with an animated "loading" image indefinitely.

Expected behavior

The treeview loads and replaces the "Loading ..." message and image.


Related issues

Related to AtoM Wishlist - Feature #13293: Improve full-width paging options for large hierarchies New 04/24/2020
Related to Access to Memory (AtoM) - Bug #13414: Siblings order is not maintained in archival descriptions... Verified 09/04/2020

History

#1 Updated by David Juhasz over 1 year ago

  • Sponsored changed from No to Yes

#2 Updated by David Juhasz over 1 year ago

  • Description updated (diff)

#3 Updated by David Juhasz over 1 year ago

PHP error message logged after the treeview request returns a 500 error (536870912 bytes = 512 MB):

2020/09/18 18:37:39 [error] 30905#30905: *1476803 FastCGI sent in stderr:
"PHP message: PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted
(tried to allocate 4096 bytes) in /usr/share/nginx/atom/lib/QubitPdo.class.php on line 95
PHP message: PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted
(tried to allocate 4096 bytes) in /usr/share/nginx/atom/cache/qubit/prod/config/config_core_compile.yml.php on line 3861"
while reading response header from upstream, client: XXX.XXX.XXX.42, server: XXXXXXXXXXXXXX,
request: "GET /test-123/informationobject/fullWidthTreeView?nodeLimit=50&firstLoad=true HTTP/1.1",
upstream: "fastcgi://127.0.0.1:9000", host: "XXXXXXXXXXXXX", referrer: "XXXXXXXXXXXXXXXXXX"

#4 Updated by David Juhasz over 1 year ago

I increased the PHP memory_limit on the server from 512 MB to 2G, but the treeview request still exceeded that limit and died.

#5 Updated by David Juhasz over 1 year ago

  • Description updated (diff)

#6 Updated by David Juhasz over 1 year ago

  • Related to Feature #13293: Improve full-width paging options for large hierarchies added

#7 Updated by David Juhasz over 1 year ago

I've opened Pull Request 1205, which switches the data source for the full width treeview from MySQL to Elasticsearch. The switch drastically reduces the memory required by PHP and significantly improves full width treeview load times. It does not, however, allow the full width treeview to load all 900 000 siblings. With the fix applied, the full width treeview loads and can be navigated, but the selected description is only highlighted in the treeview if it falls within the first page of results (the page size can be configured in the AtoM Admin settings).

With the default configuration, Elasticsearch will not page past the first 10 000 search results (the index.max_result_window setting caps from + size). This limit can be raised, but the Elasticsearch documentation advises against doing so (ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#paginate-search-results). There are other ways to load more than 10 000 descriptions at a single level of the hierarchy, e.g. progressive loading, but it seems unlikely that a user will want to manually scroll through tens of thousands of descriptions in the treeview.
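The paging limit described above can be illustrated with a small calculation. This is a minimal Python sketch, not AtoM code: it assumes the treeview fetches siblings with Elasticsearch from/size paging, that the page size equals the configured treeview page size (the logged request above used nodeLimit=50), and that index.max_result_window is at its default of 10 000. The function names are hypothetical.

```python
# Default value of Elasticsearch's index.max_result_window setting.
# A from/size search is rejected when from + size exceeds it.
MAX_RESULT_WINDOW = 10_000


def page_for_position(position: int, page_size: int) -> int:
    """Return the 0-based page holding the sibling at 0-based `position`."""
    if position < 0 or page_size <= 0:
        raise ValueError("position must be >= 0 and page_size > 0")
    return position // page_size


def reachable_with_from_size(position: int, page_size: int,
                             window: int = MAX_RESULT_WINDOW) -> bool:
    """True if the page containing `position` can be fetched with from/size.

    That page is requested with from = page * page_size and
    size = page_size, which is only allowed while from + size <= window.
    """
    start = page_for_position(position, page_size) * page_size
    return start + page_size <= window


def highlighted_on_first_load(position: int, page_size: int) -> bool:
    """Hypothetical model of the behaviour described for PR 1205:
    the selected node is only highlighted if it is on the first page."""
    return page_for_position(position, page_size) == 0
```

With a page size of 50, the sibling at position 9 999 is still reachable, but position 10 000 (and anything near position 900 000) is not, so deep pages of a 900 000-child level cannot be fetched this way without raising the window or switching to another paging technique.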

One possible solution is implementing bi-directional scrolling in the treeview, as described in issue #13293, but that work would require additional funding.

#8 Updated by David Juhasz over 1 year ago

  • Status changed from New to Code Review
  • Assignee changed from David Juhasz to Mike Cantelon

#9 Updated by David Juhasz over 1 year ago

  • Status changed from Code Review to QA/Review

#10 Updated by David Juhasz over 1 year ago

  • Assignee deleted (Mike Cantelon)

#11 Updated by Dan Gillean over 1 year ago

  • Status changed from QA/Review to Feedback
  • Assignee set to David Juhasz

I may have found an issue with this, although it could be a regression caused by something else. New child descriptions created in the edit template with the "Add new child records" widget are not showing up in the treeview. A hard browser refresh doesn't help, and neither does clearing the application cache and restarting services; you have to re-edit the description for them to show up, or else run the search:populate task again. I assume that editing the parent triggers a search index update, which makes the child records appear, though I'm not sure.

To reproduce

  • Log in
  • Create a new description
  • While in the edit template, use the "Add new child records" widget (found in the Identity area of the ISAD template) to add a couple of child records
  • Finish your parent description and save

Resulting error

  • Child records are not shown in full-width treeview on save
  • Search index update is required for them to become displayed (which can also be triggered by editing the parent description and saving)

Expected result

  • Child records created with the widget are immediately available in the full-width treeview after saving a new description.

#12 Updated by Peter Van Garderen over 1 year ago

The same error is occurring on DIP upload and metadata-only DIP upload from Archivematica to AtoM.

To reproduce:

  • Perform a DIP upload or metadata-only DIP upload from an Archivematica instance connected to AtoM.
  • Navigate to the AtoM slug used for the DIP upload.

Resulting error:

The digital objects appear in the carousel, but their description records do not appear as children of the parent slug in the treeview widget.

Expected result:

Child records created via DIP upload are immediately available in the full-width treeview.

Current workaround:
Same as above. Re-save the parent description to trigger an index update. Then the children appear in the treeview.

#13 Updated by David Juhasz over 1 year ago

It sounds to me like this change is revealing some places in AtoM where the Elasticsearch index isn't updated when an archival description is added. Before this commit the descriptions added by the "Add new child records" and DIP upload functions were not showing up in search results or the browse page, but they were shown in the treeview, because it was reading the data from the database. Now that the treeview is loading data from the ES index, the descriptions added through these channels are no longer showing up because they are not updating the Elasticsearch index. :(

I think the solution should be to update the "Add new child records" and "DIP upload" features so they update the ES index after adding archival descriptions to the database, but this will require more work. :(

Radda also pointed out in the code review for PR 1205 that until issue #13414 is fixed, sibling descriptions added by import will be displayed in random sort order until the search index is updated.

#14 Updated by David Juhasz over 1 year ago

  • Related to Bug #13414: Siblings order is not maintained in archival descriptions CSV import added

#15 Updated by Dan Gillean over 1 year ago

Interesting note, David. However, I think this must be tied to a treeview-related regression: I can also reproduce this problem with a CSV import, and Steve has reproduced it using the digital object multi-uploader.

In both cases, child descriptions can be found in search/browse, suggesting that they are being added to the index, but they just don't show up in the full-width treeview. Also, if you switch to the sidebar treeview, everything is shown correctly.

Of course, I still don't know whether this issue is the direct cause, but I don't think a failure to index is the problem here; it seems to be particular to the full-width treeview.

#16 Updated by David Juhasz over 1 year ago

Ah, I found the problem. The treeview code is checking the parent ES node "children" field to see if the node has children, but it looks like the "children" field value is not being updated on the parent node after adding children. :(
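The stale "children" field can be modelled with a toy in-memory index. This Python sketch is hypothetical: Doc, parent_id, and the helper functions are illustrative names, not AtoM's Elasticsearch mapping or code. It shows how a check that trusts a denormalized field on the parent document misses a child whose own document was indexed, while a check that looks for documents pointing at the parent finds it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for one description document in the index."""
    id: int
    parent_id: Optional[int]
    # Denormalized list of child ids stored on the parent document.
    children: List[int] = field(default_factory=list)


def index_new_child(index: Dict[int, Doc], child_id: int, parent_id: int) -> None:
    # Only the child's own document is written. The parent's "children"
    # list is not refreshed, which is the failure mode described above.
    index[child_id] = Doc(child_id, parent_id)


def has_children_from_field(index: Dict[int, Doc], node_id: int) -> bool:
    # Trusts the denormalized field on the parent document.
    return len(index[node_id].children) > 0


def has_children_from_query(index: Dict[int, Doc], node_id: int) -> bool:
    # Looks for any document whose parent is node_id instead.
    return any(doc.parent_id == node_id for doc in index.values())


index = {1: Doc(1, None)}
index_new_child(index, 2, parent_id=1)
print(has_children_from_field(index, 1))  # False: the stale field hides the child
print(has_children_from_query(index, 1))  # True
```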

I added a quick fix while investigating the problem that seems to work, so I'll create a new pull request that should fix the problem.

Thanks for the follow up testing and information Dan, it was really helpful in finding the bug.

#17 Updated by David Juhasz over 1 year ago

Okay, I've submitted PR #1240 with a fix for the missing children. Unfortunately the fix is significantly slower for large trees (from 0.4s to 4s to load ~10,000 nodes) because it requires an Elasticsearch query to check if each node in the tree has any children. :(
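The slowdown follows from issuing one request per node. The Python sketch below uses a hypothetical FakeIndex that only counts round trips; it is not AtoM or Elasticsearch code. flags_per_node mirrors the per-node check described above, and flags_batched shows one possible alternative (a single grouped count per page, similar in spirit to an Elasticsearch terms aggregation on the parent field), which is not what PR #1240 does.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


class FakeIndex:
    """Hypothetical in-memory index that counts how many queries it serves."""

    def __init__(self, parent_of: Dict[int, Optional[int]]):
        self.parent_of = parent_of  # node id -> parent id (None for the root)
        self.queries = 0

    def has_child(self, node_id: int) -> bool:
        # One round trip per call, like one search request per node.
        self.queries += 1
        return any(p == node_id for p in self.parent_of.values())

    def child_counts(self, node_ids: Iterable[int]) -> Dict[int, int]:
        # One round trip for a whole page of nodes.
        self.queries += 1
        wanted = set(node_ids)
        counts = Counter(p for p in self.parent_of.values() if p in wanted)
        return {n: counts.get(n, 0) for n in wanted}


def flags_per_node(index: FakeIndex, page: List[int]) -> Dict[int, bool]:
    return {n: index.has_child(n) for n in page}


def flags_batched(index: FakeIndex, page: List[int]) -> Dict[int, bool]:
    return {n: c > 0 for n, c in index.child_counts(page).items()}


idx = FakeIndex({1: None, 2: 1, 3: 1, 4: 2})
page = [1, 2, 3, 4]
flags_per_node(idx, page)
print(idx.queries)  # 4
idx.queries = 0
flags_batched(idx, page)
print(idx.queries)  # 1
```

Both functions return the same flags; only the number of round trips differs. For a tree of ~10,000 nodes the per-node approach issues ~10,000 requests, which is one plausible reason for the jump from 0.4s to 4s, though real query and network costs will vary.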

#18 Updated by David Juhasz over 1 year ago

  • Status changed from Feedback to Code Review
  • Assignee deleted (David Juhasz)

#19 Updated by David Juhasz over 1 year ago

  • Status changed from Code Review to QA/Review

Merged 4076601 to qa/2.x. Ready for QA/Review.

#20 Updated by Dan Gillean 11 months ago

  • Status changed from QA/Review to Verified
