One of the things that often comes up while thinking about query optimization is the order of execution. When the database gets the query, how can we reason about what it does in order to come up with a fix or a plan to optimize it?
What is order of execution?
This is the sequence in which different components of a query are processed:
- parsing and analyzing queries
- optimization
- query execution plan
- logical and physical phases of execution
- result retrieval
I don't know what most of that means when I'm faced with a bad query, so it's usually easier for me to look at it from a different perspective, using the logical order in which the clauses are evaluated (there's an annotated example right after this list):
- FROM + JOIN / ON
- WHERE
- GROUP BY
- HAVING
- SELECT columns
- ORDER BY
- LIMIT
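To make that order concrete, here is a toy query annotated with the step each clause runs at. The table and column names are made up purely for illustration:

```sql
SELECT   o.customer_id, SUM(o.total) AS revenue  -- 5. SELECT: project/aggregate columns
FROM     orders o                                -- 1. FROM/JOIN: build the working row set
JOIN     customers c ON c.id = o.customer_id     --    (ON filters rows as they join)
WHERE    o.status = 'shipped'                    -- 2. WHERE: filter individual rows
GROUP BY o.customer_id                           -- 3. GROUP BY: collapse rows into groups
HAVING   SUM(o.total) > 1000                     -- 4. HAVING: filter the groups
ORDER BY revenue DESC                            -- 6. ORDER BY: sort what's left
LIMIT    10;                                     -- 7. LIMIT: trim the result
```

This is also why you can't reference a SELECT alias in the WHERE clause but can in ORDER BY: WHERE runs before the projection, ORDER BY after it.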
I think of each step above as a temp table in memory, so the larger the temp table, the more memory used on the way to the final result set. This may be an oversimplification, but it's this mindset that matters when optimizing queries.
Recently, I was presented with a slow query. CPU and memory were fine: no peaks, nothing alarming on the host. A quick EXPLAIN showed that it was looking at two indexes and doing an index merge. I looked at the cardinality using SHOW INDEXES FROM and noticed that although cardinality was low on both (against 2.6 million rows, index 1 had 115 unique values and index 2 had 3), I could at least fold the lower-cardinality column into a composite index with the column that had 115 unique values, thereby removing the index merge. I quickly tested the new index, set ALGORITHM=INPLACE and LOCK=NONE, and ran it.
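A sketch of what that diagnosis looked like, with the table, column, and index names invented for illustration (the ALTER syntax is standard MySQL online DDL):

```sql
-- The plan showed an index merge across two single-column indexes
EXPLAIN SELECT * FROM events WHERE region_id = 42 AND status = 2;
-- type: index_merge, key: idx_region,idx_status

-- Check cardinality: ~2.6M rows, but only 115 and 3 distinct values
SHOW INDEXES FROM events;

-- Fold the low-cardinality column into a composite index so the
-- optimizer can walk one index instead of merging two
ALTER TABLE events
  ADD INDEX idx_region_status (region_id, status),
  ALGORITHM=INPLACE, LOCK=NONE;
```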
METADATA LOCKS everywhere. Index creation with INPLACE and LOCK=NONE is usually fast, safe, and nothing to worry about, so something else was going on. I started thinking through the order of execution, only to realize that from step 1 through step 5, this query was probably exceeding temp memory limits.
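For anyone chasing the same symptom, the waits are visible directly. Assuming performance_schema is enabled (and, for the second query, that the wait/lock/metadata/sql/mdl instrument is on), something like this shows them:

```sql
-- Sessions stuck behind the DDL show up in the processlist with
-- State: "Waiting for table metadata lock"
SHOW FULL PROCESSLIST;

-- MySQL 5.7+: inspect the metadata locks themselves
SELECT object_schema, object_name, lock_type, lock_status, owner_thread_id
FROM performance_schema.metadata_locks
WHERE object_type = 'TABLE';
```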
I also noticed some of the primary-key SELECTs were taking an inordinately long time for the questions they were asking. Remembering that MySQL will spill temp tables from memory to disk, I suspected that was what was happening, so I asked leadership for approval to run FLUSH TABLES. Someone suggested a server restart for a different reason, and, jumping on that, we were able to do the restart, clear the caches, and add the index.
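One way to sanity-check the spill-to-disk theory is to watch the server's temp table counters; a minimal sketch:

```sql
-- Created_tmp_disk_tables climbing fast relative to
-- Created_tmp_tables suggests in-memory limits are being exceeded
SHOW GLOBAL STATUS LIKE 'Created_tmp%';

-- The stop-gap we ran: close and reopen all tables
FLUSH TABLES;
```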
The index creation was a quick stop-gap to allow the development team to fix the cardinality issue. There is also a lot of politics involved in creating even a minuscule (to us) outage, and I was sitting in a Zoom meeting with 20 people, many of them asking questions while I was trying to focus on actually fixing the problem. Very chaotic indeed, but it is what it is.
What I believe was happening: due to thread re-use and a stream of similar queries, MySQL had exceeded the temp table or buffer space for those queries and spilled them to disk, which slowed the primary-key lookups, made the locks last longer, and caused the metadata locks to show up.
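The in-memory ceiling itself lives in two server variables, at least for the classic MEMORY-backed internal temp tables; an internal temp table is written to disk once it outgrows the smaller of the two (MySQL 8.0's TempTable engine adds its own limits on top):

```sql
-- An internal temp table spills to disk when it outgrows
-- the smaller of these two limits
SHOW GLOBAL VARIABLES LIKE 'tmp_table_size';
SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size';
```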
Some people may know the deeper technical specifics, but I only needed to know that it worked: there was a fix, and it got us back to making revenue. Without little tidbits of information like the order of execution, though, it gets much harder to find root cause.