Task #12066

Improve performance of DO lookup query

Added by Steve Breker about 1 year ago. Updated 2 months ago.

Status:VerifiedStart date:03/26/2018
Priority:MediumDue date:
Assignee:-% Done:

0%

Category:Digital object
Target version:Release 2.5.0
Google Code Legacy ID: Tested version:
Sponsored:Yes Requires documentation:No

Description

Review query and reduce the number of rows read. This query was appearing in a user's slow query log frequently, and is doing a whole table search to grab 1 specific digital object.

# Query_time: 1.156579  Lock_time: 0.000083  Rows_sent: 1  Rows_examined: 314248  Rows_affected: 0  Rows_read: 314248
# Bytes_sent: 1745
SET timestamp=1521426279;
SELECT object.CLASS_NAME, object.CREATED_AT, object.UPDATED_AT, object.ID, object.SERIAL_NUMBER, digital_object.ID, digital_object.INFORMATION_OBJECT_ID, digital_object.USAGE_ID, digital_object.MIME_TYPE, digital_object.MEDIA_TYPE_ID, digital_object.NAME, digital_object.PATH, digital_object.SEQUENCE, digital_object.BYTE_SIZE, digital_object.CHECKSUM, digital_object.CHECKSUM_TYPE, digital_object.PARENT_ID FROM `object`, `digital_object` WHERE digital_object.PATH='/uploads/r/[client-repository]/9/9/7/9974ceb83498d59d9584f320f9179cfc51541e2d95ecc3d607c53f836fbaeca4/' AND digital_object.NAME='VM002-02-006-D006_1852_06_Saint-Laurent_op_142.jpg' AND digital_object.ID=object.ID LIMIT 1;

See 11875 note 48.

History

#2 Updated by Steve Breker about 1 year ago

The following types of range query where we try to match a range using LFT and RGT on the digital object table are doing a full table scans because there is no index on RGT. Any query with a range match like `WHERE information_object.LFT<'332760' AND information_object.RGT>'332761'` is going to review all rows for RGT's value without an index on RGT.

The following queries regularly appear in the slow query log at [client site] - below I show before and after this index is added showing number of rows affected by the query:

Adding the following index greatly reduces the number of rows scanned: CREATE INDEX `rgt` ON `information_object` (`RGT`);

SELECT object.CLASS_NAME, object.CREATED_AT, object.UPDATED_AT, object.ID, object.SERIAL_NUMBER, information_object.ID, information_object.IDENTIFIER, information_object.OAI_LOCAL_IDENTIFIER, information_object.LEVEL_OF_DESCRIPTION_ID, information_object.COLLECTION_TYPE_ID, information_object.REPOSITORY_ID, information_object.PARENT_ID, information_object.DESCRIPTION_STATUS_ID, information_object.DESCRIPTION_DETAIL_ID, information_object.DESCRIPTION_IDENTIFIER, information_object.SOURCE_STANDARD, information_object.DISPLAY_STANDARD_ID, information_object.LFT, information_object.RGT, information_object.SOURCE_CULTURE FROM `object`, `information_object` WHERE information_object.LFT<'332760' AND information_object.RGT>'332761' AND information_object.ID=object.ID ORDER BY information_object.LFT ASC,information_object.LFT ASC;

BEFORE

+----+-------------+--------------------+--------+---------------+---------+---------+-------------------------------------+--------+-----------------------------+
| id | select_type | table              | type   | possible_keys | key     | key_len | ref                                 | rows   | Extra                       |
+----+-------------+--------------------+--------+---------------+---------+---------+-------------------------------------+--------+-----------------------------+
|  1 | SIMPLE      | information_object | ALL    | PRIMARY,lft   | NULL    | NULL    | NULL                                | 197385 | Using where; Using filesort |
|  1 | SIMPLE      | object             | eq_ref | PRIMARY       | PRIMARY | 4       | information_object.id               |      1 |                             |
+----+-------------+--------------------+--------+---------------+---------+---------+-------------------------------------+--------+-----------------------------+

AFTER

+----+-------------+--------------------+--------+-----------------+---------+---------+----------------------------+------+-----------------------------+
| id | select_type | table              | type   | possible_keys   | key     | key_len | ref                        | rows | Extra                       |
+----+-------------+--------------------+--------+-----------------+---------+---------+----------------------------+------+-----------------------------+
|  1 | SIMPLE      | information_object | range  | PRIMARY,lft,rgt | rgt     | 4       | NULL                       |    1 | Using where; Using filesort |
|  1 | SIMPLE      | object             | eq_ref | PRIMARY         | PRIMARY | 4       | atom.information_object.id |    1 |                             |
+----+-------------+--------------------+--------+-----------------+---------+---------+----------------------------+------+-----------------------------+

explain SELECT COUNT() FROM `information_object`, `object` *WHERE information_object.LFT<'273524' AND information_object.RGT>'273525' AND information_object.ID=object.ID;

BEFORE io.RGT index

+----+-------------+--------------------+--------+---------------+---------+---------+-------------------------------------+--------+-------------+
| id | select_type | table              | type   | possible_keys | key     | key_len | ref                                 | rows   | Extra       |
+----+-------------+--------------------+--------+---------------+---------+---------+-------------------------------------+--------+-------------+
|  1 | SIMPLE      | information_object | ALL    | PRIMARY,lft   | NULL    | NULL    | NULL                                | 165632 | Using where |
|  1 | SIMPLE      | object             | eq_ref | PRIMARY       | PRIMARY | 4       | information_object.id               |      1 | Using index |
+----+-------------+--------------------+--------+---------------+---------+---------+-------------------------------------+--------+-------------+

AFTER

+----+-------------+--------------------+--------+-----------------+---------+---------+----------------------------+------+-------------+
| id | select_type | table              | type   | possible_keys   | key     | key_len | ref                        | rows | Extra       |
+----+-------------+--------------------+--------+-----------------+---------+---------+----------------------------+------+-------------+
|  1 | SIMPLE      | information_object | range  | PRIMARY,lft,rgt | rgt     | 4       | NULL                       |    1 | Using where |
|  1 | SIMPLE      | object             | eq_ref | PRIMARY         | PRIMARY | 4       | atom.information_object.id |    1 | Using index |
+----+-------------+--------------------+--------+-----------------+---------+---------+----------------------------+------+-------------+
2 rows in set (0.00 sec)

explain SELECT object.CLASS_NAME, object.CREATED_AT, object.UPDATED_AT, object.ID, object.SERIAL_NUMBER, digital_object.ID, digital_object.INFORMATION_OBJECT_ID, digital_object.USAGE_ID, digital_object.MIME_TYPE, digital_object.MEDIA_TYPE_ID, digital_object.NAME, digital_object.PATH, digital_object.SEQUENCE, digital_object.BYTE_SIZE, digital_object.CHECKSUM, digital_object.CHECKSUM_TYPE, digital_object.PARENT_ID FROM `object` CROSS JOIN `digital_object` CROSS JOIN `information_object` LEFT JOIN status ON (information_object.ID=status.OBJECT_ID) WHERE information_object.LFT>'187858' AND information_object.RGT<'187867' AND (status.TYPE_ID=158 AND status.STATUS_ID=160) AND information_object.ID=digital_object.INFORMATION_OBJECT_ID AND digital_object.ID=object.ID LIMIT 25;

BEFORE

+----+-------------+--------------------+--------+-------------------------------------+---------------------+---------+---------------------------------+-------+-------------+
| id | select_type | table              | type   | possible_keys                       | key                 | key_len | ref                             | rows  | Extra       |
+----+-------------+--------------------+--------+-------------------------------------+---------------------+---------+---------------------------------+-------+-------------+
|  1 | SIMPLE      | status             | ref    | status_FI_1,status_FI_2,status_FI_3 | status_FI_2         | 5       | const                           | 94165 | Using where |
|  1 | SIMPLE      | information_object | eq_ref | PRIMARY,lft                         | PRIMARY             | 4       | status.object_id                |     1 | Using where |
|  1 | SIMPLE      | digital_object     | ref    | PRIMARY,digital_object_FI_2         | digital_object_FI_2 | 5       | status.object_id                |     1 | Using where |
|  1 | SIMPLE      | object             | eq_ref | PRIMARY                             | PRIMARY             | 4       | digital_object.id               |     1 |             |
+----+-------------+--------------------+--------+-------------------------------------+---------------------+---------+---------------------------------+-------+-------------+

AFTER

+----+-------------+--------------------+--------+-------------------------------------+---------------------+---------+----------------------------+------+-------------+
| id | select_type | table              | type   | possible_keys                       | key                 | key_len | ref                        | rows | Extra       |
+----+-------------+--------------------+--------+-------------------------------------+---------------------+---------+----------------------------+------+-------------+
|  1 | SIMPLE      | information_object | range  | PRIMARY,lft,rgt                     | lft                 | 4       | NULL                       |    1 | Using where |
|  1 | SIMPLE      | status             | ref    | status_FI_1,status_FI_2,status_FI_3 | status_FI_1         | 4       | atom.information_object.id |    1 | Using where |
|  1 | SIMPLE      | digital_object     | ref    | PRIMARY,digital_object_FI_2         | digital_object_FI_2 | 5       | atom.status.object_id      |    1 | Using where |
|  1 | SIMPLE      | object             | eq_ref | PRIMARY                             | PRIMARY             | 4       | atom.digital_object.id     |    1 |             |
+----+-------------+--------------------+--------+-------------------------------------+---------------------+---------+----------------------------+------+-------------+

#3 Updated by Steve Breker about 1 year ago

Digital object queries on the DO path and name also appear in the slow query log with large data sets. Adding an index on PATH will narrow these lookups down significantly:

CREATE INDEX `digital_object_FI_6` ON `digital_object` (`path`);

explain SELECT object.CLASS_NAME, object.CREATED_AT, object.UPDATED_AT, object.ID, object.SERIAL_NUMBER, digital_object.ID, digital_object.INFORMATION_OBJECT_ID, digital_object.USAGE_ID, digital_object.MIME_TYPE, digital_object.MEDIA_TYPE_ID, digital_object.NAME, digital_object.PATH, digital_object.SEQUENCE, digital_object.BYTE_SIZE, digital_object.CHECKSUM, digital_object.CHECKSUM_TYPE, digital_object.PARENT_ID FROM `digital_object`,`object` WHERE digital_object.PATH='/uploads/r/example-repository/0/4/6/0464c8f8e8028808673ede90e6b5110f3d9e049e3679fa998e2810bee10589b0/' AND digital_object.NAME='Screen_Shot_2018-03-26_at_5_141.jpg' AND digital_object.ID=object.ID LIMIT 1;

BEFORE

+----+-------------+----------------+--------+---------------+---------+---------+------------------------+-------+-------------+
| id | select_type | table          | type   | possible_keys | key     | key_len | ref                    | rows  | Extra       |
+----+-------------+----------------+--------+---------------+---------+---------+------------------------+-------+-------------+
|  1 | SIMPLE      | digital_object | ALL    | PRIMARY       | NULL    | NULL    | NULL                   | 29155 | Using where |
|  1 | SIMPLE      | object         | eq_ref | PRIMARY       | PRIMARY | 4       | atom.digital_object.id |     1 |             |
+----+-------------+----------------+--------+---------------+---------+---------+------------------------+-------+-------------+
2 rows in set (0.00 sec)

AFTER:

+----+-------------+----------------+--------+-----------------------------+---------------------+---------+------------------------+------+-------------+
| id | select_type | table          | type   | possible_keys               | key                 | key_len | ref                    | rows | Extra       |
+----+-------------+----------------+--------+-----------------------------+---------------------+---------+------------------------+------+-------------+
|  1 | SIMPLE      | digital_object | ref    | PRIMARY,digital_object_FI_6 | digital_object_FI_6 | 767     | const                  |    1 | Using where |
|  1 | SIMPLE      | object         | eq_ref | PRIMARY                     | PRIMARY             | 4       | atom.digital_object.id |    1 |             |
+----+-------------+----------------+--------+-----------------------------+---------------------+---------+------------------------+------+-------------+
2 rows in set (0.00 sec)

#4 Updated by Steve Breker about 1 year ago

  • Status changed from New to Code Review
  • Assignee changed from Steve Breker to Nick Wilkinson

#5 Updated by Nick Wilkinson about 1 year ago

  • Assignee changed from Nick Wilkinson to José Raddaoui Marín

Hi Radda, passing to you for CR.

#6 Updated by José Raddaoui Marín about 1 year ago

  • Status changed from Code Review to Feedback
  • Assignee changed from José Raddaoui Marín to Steve Breker

#7 Updated by Steve Breker about 1 year ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from Steve Breker to Nick Wilkinson

Merged to qa/2.5.x. Ready for QA.

The goal of this ticket is to increase performance of DO adding, browsing, viewing. Basically we will want to ensure that DO import and linking still work as expected and that browsing and viewing of DO's still works as expected. Ping me if you have any questions.

#8 Updated by Steve Breker about 1 year ago

We will need to create a custom script to load these database changes into [client site]'s database once QA is complete. This script can be based off the 161 migration file.

#9 Updated by Nick Wilkinson about 1 year ago

  • Assignee changed from Nick Wilkinson to Dan Gillean

#10 Updated by Steve Breker about 1 year ago

I have manually created these indexes onsite at [client site] on Apr 25, 2018. Slow query log is running to record query run times over 1 second.

#11 Updated by Steve Breker about 1 year ago

As per this comment on the other ticket:

https://projects.artefactual.com/issues/11875#note-60

On May 10, 2018 I ran analyze-table on the information_object table. Since then, there are only ~5 instances of slow range queries in the slow query log which is a big imporvement. The slow query log was set to capture queries taking longer than one second. The slow query log has been capturing data from May 10 to May 22.

I see a number of entries in the slow query log for the full width treeview (query defined here: https://github.com/artefactual/atom/blob/qa/2.5.x/apps/qubit/modules/default/actions/fullTreeViewAction.class.php#L74), but this is unrelated to the queries we were trying to speed up. The majority of these treeview queries are very close to the 1 second mark and the query seems to be using appropriate indexes already. I expect moving to faster disk at [client site] will speed these up.

It is possible that the slow range queries are not appearing in the slow query log because the load is lighter on the server within this timeframe - perhaps fewer digital object uploads are occurring?

#12 Updated by Dan Gillean 2 months ago

  • Tracker changed from Bug to Task
  • Description updated (diff)
  • Category set to Digital object
  • Status changed from QA/Review to Verified
  • Assignee deleted (Dan Gillean)
  • Sponsored changed from No to Yes
  • Requires documentation set to No

#13 Updated by Dan Gillean 2 months ago

  • Description updated (diff)

Also available in: Atom PDF