Task #13224

Improve hierarchy management queries

Added by José Raddaoui Marín 3 months ago. Updated about 1 month ago.

Status:Code ReviewStart date:12/09/2019
Priority:MediumDue date:
Assignee:-% Done:

0%

Category:Performance / scalability
Target version:Release 2.6.0
Google Code Legacy ID: Tested version:
Sponsored:No Requires documentation:

Description

After the upgrade to support MySQL 8 on #13220, which includes support for CTE queries, investigate options to improve hierarchy management queries.

ancestors_queries_charts.png (84.2 KB) José Raddaoui Marín, 01/08/2020 03:25 PM

ancestors_queries_new.png (118 KB) José Raddaoui Marín, 01/08/2020 03:25 PM

ancestors_queries_old.png (95.8 KB) José Raddaoui Marín, 01/08/2020 03:25 PM

do_browse_new_fetch.png (117 KB) José Raddaoui Marín, 01/08/2020 06:41 PM

do_browse_charts.png (205 KB) José Raddaoui Marín, 01/08/2020 06:41 PM

do_browse_old_count.png (87.7 KB) José Raddaoui Marín, 01/08/2020 06:41 PM

do_browse_old_fetch.png (100 KB) José Raddaoui Marín, 01/08/2020 06:41 PM

descendants_new_query.png (91.7 KB) José Raddaoui Marín, 01/08/2020 11:09 PM

descendants_chart.png (146 KB) José Raddaoui Marín, 01/08/2020 11:09 PM

descendants_old_query.png (105 KB) José Raddaoui Marín, 01/08/2020 11:09 PM

descendants_count.png (89.4 KB) José Raddaoui Marín, 01/09/2020 05:30 PM


Related issues

Related to Access to Memory (AtoM) - Task #13237: Reconsider nested set implementation in some models New 01/13/2020
Related to Access to Memory (AtoM) - Task #13238: Avoid fetching multiple times the languages enabled from ... QA/Review 01/13/2020
Related to Access to Memory (AtoM) - Bug #13211: The deletion of archival descriptions with a lot of desce... QA/Review 11/13/2019
Related to Access to Memory (AtoM) - Task #13239: Improve deletion of nested set hierarchies (specially terms) New 01/13/2020
Related to Access to Memory (AtoM) - Task #13240: Analyze full removal of nested set in favor of the curren... New 01/13/2020
Related to Access to Memory (AtoM) - Task #13241: Use CTE (or parent_id keymap) to get the related terms an... New 01/13/2020
Related to Access to Memory (AtoM) - Bug #13250: Editing/deleting a repository with a big amount of relate... QA/Review 01/28/2020
Related to Access to Memory (AtoM) - Task #13261: Improve Elasticsearch indexing process In progress 02/16/2020

History

#1 Updated by José Raddaoui Marín 3 months ago

  • Status changed from In progress to Code Review
  • Assignee changed from José Raddaoui Marín to Steve Breker

To help spotting database issues and measuring possible improvements and new features, I have configured Percona Monitoring and Management on the Docker Compose environment. Related PRs:

https://github.com/artefactual/atom/pull/1017
https://github.com/artefactual/atom-docs/pull/132

#2 Updated by José Raddaoui Marín 2 months ago

  • Status changed from Code Review to In progress
  • Assignee changed from Steve Breker to José Raddaoui Marín

#3 Updated by José Raddaoui Marín about 1 month ago

Ancestors query

The ancestors criteria generated by the ORM for each hierarchical model has been modified to use a recursive CTE (commit). This replaces the costly table scans over the rgt and lft columns for a few joins using id and parent_id columns. The improvement will depend on the full nested set size and the position of the record on it, as that will determine the amount of rows that need to be examined for the lft and rgt columns, while the number of ancestors for the record (which determines the amount of joins) is unlikely to increase that much.

In order to analyze and test the query improvements using the ORM directly, I've created this quick PHP script to loop over all IOs and print their ancestors titles. The following screenshot represents the execution of that script on a relatively small nested set (46625 records), the first time using the old query and the second one using the new implementation:

The important line for this matter on the second chart is the light blue, which indicates operations per second. The four "vertical" segments indicate the start and end of the script execution and the horizontal length of those sections indicate the execution time of the script.

Old query metrics

New query metrics:

Summary

+----------------------------+-----------+-----------+
|                            | OLD QUERY | NEW QUERY |
+----------------------------+-----------+-----------+
| Total script time          | 705.34 s  | 81.8 s    |
| Total query time           | 633 s     | 18.23 s   |
| Min. query time            | 157.34 µs | 255.48 µs |
| Max. query time            | 51.38 ms  | 10.82 ms  |
| Avg. query time            | 13.57 ms  | 391.01 µs |
| Max. operations per second | 1.07 k    | 2.44 k    |
| Min. operations per second | 122 k     | 2.21 k    |
| Rows examined              | 1.21 b    | 1.27 m    |
+----------------------------+-----------+-----------+

#4 Updated by José Raddaoui Marín about 1 month ago

Ancestors query (IO browse ACL checks)

A more significant use case, testing the ACL checks performed on the IO browse page. While doing this tests I noticed a "duplication" of the ancestors query on the ACL checks, which I've avoided on this commit. Therefore, in addition to the improvement to the ancestor query explained above, the new implementation will perform half of the ancestor queries than before.

Again, using the same data (46625 IOs) I've executed this script with the old and the new implementation. It requests the DO browse page 100 times with a 100 records limit and other filters and calculates the average response time.

The important line for this matter on the second chart is the light blue, which indicates operations per second and the duration of the script execution in both cases. On this case, you can also notice an impact on the CPU utilization (green line from the first chart). Bellow the charts, there is a table with the most demanding queries of the process, the first two are from the old implementation and the third is the only one executed with the new code.

Old query (count)

Old query (fetch)

New query (fetch)

Summary

+--------------------+---------------+-----------+
|                    |  OLD QUERIES  | NEW QUERY |
+--------------------+---------------+-----------+
| Total script time  | 346.86 s      | 91.68 s   |
| Avg. response time | 3.47 s        | 0.91 s    |
| Total query time   | 129 s * ~2    | 3.65 s    |
| Min. query time    | 4.29 ms * ~2  | 264.59 µs |
| Max. query time    | 34.48 ms * ~2 | 1.2 ms    |
| Avg. query time    | 13.03 ms * ~2 | 373.46 µs |
| Rows examined      | 466.28 m * ~2 | 271.2 k   |
+--------------------+---------------+-----------+

#5 Updated by José Raddaoui Marín about 1 month ago

Descendants query

Like in the ancestors query, the lft and rgt columns are also used to find the descendants of a resource. However, using CTE on that query is not as beneficial as in the ancestors case. On this one, the recursive CTE joins are performed on the opposite direction and their amount will be a lot higher in most of the cases.

Nevertheless, on a nested set implementation, the lft and rgt values of each record will not intersect, this means that if the LFT value is in between the lft and rgt values of another record, you could assume that the rgt value will also be in between those values. Using only the lft column to find descendants on the nested set avoids the scan over the rgt column, resulting in a big performance improvement (commit).

To test this changes I created a similar script to the one for the ancestors test, now counting the descendants of all IOs (46625). The following screenshot represents the execution of that script, the first time using the old query and the second one using the new implementation:

Again, the light blue line indicates operations per second. The four "vertical" segments indicate the start and end of the scripts execution and the horizontal length of those sections indicate the execution time of the scripts.

Old query metrics

New query metrics:

Summary

+----------------------------+-----------+-----------+
|                            | OLD QUERY | NEW QUERY |
+----------------------------+-----------+-----------+
| Total script time          | 595.68 s  | 52.4 s    |
| Total query time           | 546 s     | 6.89 s    |
| Min. query time            | 135.33 µs | 101.30 µs |
| Max. query time            | 89.87 ms  | 65.73 ms  |
| Avg. query time            | 11.58 ms  | 149.28 µs |
| Max. operations per second | 1.4 k     | 3.84 k    |
| Min. operations per second | 135       | 3.54 k    |
| Rows examined              | 1.21 b    | 317.34 k  |
+----------------------------+-----------+-----------+

#6 Updated by José Raddaoui Marín about 1 month ago

Descendants count

Running the search populate task, the most demanding query of the process is one used to count the descendants of each term (here). It's called indexing the terms documents and also from other resources related to the term, which may be a lot.

The following metrics of that query are taken from a single search populate run, executed using the demo data (a really small database that takes less than 20 seconds to be indexed):

On a nested set implementation the amount of descendants of a record can be calculated using the lft and rgt values with a simple formula without hitting the database.

In addition to that query, we count the descendants in a similar manner in some other places. I've tried to address most of them on this commit, removing the query completely and using the already populated lft and rgt values from the record.

#7 Updated by José Raddaoui Marín about 1 month ago

  • Status changed from In progress to Code Review
  • Assignee deleted (José Raddaoui Marín)

#8 Updated by José Raddaoui Marín about 1 month ago

  • Related to Task #13237: Reconsider nested set implementation in some models added

#9 Updated by José Raddaoui Marín about 1 month ago

  • Related to Task #13238: Avoid fetching multiple times the languages enabled from the database on the search populate task added

#10 Updated by José Raddaoui Marín about 1 month ago

  • Related to Bug #13211: The deletion of archival descriptions with a lot of descendants is prone to web server timeouts added

#11 Updated by José Raddaoui Marín about 1 month ago

  • Related to Task #13239: Improve deletion of nested set hierarchies (specially terms) added

#12 Updated by José Raddaoui Marín about 1 month ago

  • Related to Task #13240: Analyze full removal of nested set in favor of the current parent relation plus an order column added

#13 Updated by José Raddaoui Marín about 1 month ago

  • Related to Task #13241: Use CTE (or parent_id keymap) to get the related terms and their ancestors on the search populate task added

#14 Updated by José Raddaoui Marín 29 days ago

  • Related to Bug #13250: Editing/deleting a repository with a big amount of related descriptions can timeout added

#15 Updated by José Raddaoui Marín 9 days ago

  • Related to Task #13261: Improve Elasticsearch indexing process added

Also available in: Atom PDF