View Full Version : Query optimization in MySQL

10-20-2010, 09:47 AM
Optimization is system change with purpose of processing speed rising.
Optimization of working with database can be divided into 3 types:

Query optimization
Structure optimization
Server optimization

Let�s view query optimization. Query optimization is the most simple and the most effective type of optimization.

Queries that more often optimized are select queries.
If you want to look how select query will run use operator EXPLAIN. With its help we can see the order of the tables and indexes which will be used.
The main mistake of beginners is that they don�t put indexes on the required fields or conversely put them on the wrong fields. If you make simple select, for example:

SELECT * FROM table WHERE field1 = 123 You have to put index on the field field1, if you make a select by two fields:

SELECT * FROM table WHERE field1 = 123 AND field2 = 234 You have to create aggregative index on field1 and field2
If you want to connect 2 or more tables:

FROM a, b
WHERE a.b_id = b.id Or more general view:

[LEFT] JOIN b ON b.id = a.b_id
[LEFT] JOIN ? ON ?.id = b.c_id You have to create indexes on the fields tables will be connected at. In our case it is field b.id and c.id. But that statement is true only in the case if select will be carried out in the order they listed in query. For example, if MySQL optimizer selects notations in the next order: c, b, a, indexes should be put on the fields b.c_id and a.b_id.
There are some cases when we have to select from the same part of the very big table, for example in many queries there is a connection with part of the table:

[LEFT] JOIN b ON b.id = a.b_id AND b.field1 = 123 AND b.field2 = 234 On those cases it will be reasonable to make that part as a separate temporary table:

CREATE TEMPORARY TABLE tmp_b TYPE=HEAP SELECT * FROM b WHERE b.field1 = 123 AND b.field2 = 234 And work with it.
If we several times calculate aggregate function for the same data, it is better to make such a calculation and put its result to the temporary table.
Sometimes people try to �kill two birds with one stone�. Here is an example:

SELECT f_m. *, MAX( f_m_v_w.date )
AS last_visited, COUNT( DISTINCT f_b.id ) AS books_num,
IF ( f_m.region != 999, f_r.name, f_m.region_other ) AS region_name
FROM fair_members f_m
LEFT JOIN fair_members_visits_week f_m_v_w ON f_m_v_w.member_id = f_m.id
LEFT JOIN fair_regions AS f_r ON f_m.region = f_r.id
LEFT JOIN fair_books AS f_b ON f_b.pub_id = f_m.id
GROUP BY f_m.id Author tries to calculate the maximum attribute value from one table and amount of notations in the other table in one query. As a result you have to connect 2 different tables to one query. It slows down the select. If you want to accelerate the select it is better to make MAX calculating and COUNT calculating as different queries.
Use function COUNT(*) for counting the amount of the lines.
The reason why COUNT(*) is faster than COUNT(id) is explained in the example:
We have table id | user_id | text with PRIMARY(id), INDEX(user_id) index.
We have to count user�s messages with given $user_id.
Let�s compare two queries:

SELECT COUNT(*) FROM message WHERE user_id = $user_id and

SELECT COUNT(id) FROM message WHERE user_id = $user_id To perform first query we have to look through the user_id index and count the number of notations which satisfy the condition. That operation is fast enough, because all indexes are sorted and situated in the buffer.
To perform the second query we have to look through the index for selecting notations that satisfy the condition. If that notation satisfies the condition we take it for getting id value and then increment the counter.
Thus we have a result. Even if there are a lot of notations, speed of the first query is faster.

The speed of insertion and update in database depend on inserted (updated) notation size and on the time of index insertion. Time of index insertion in its turn depends on amount of inserted indexes and table size. That dependence can be shown in the formula:

[Time of index insertion] = [Amount of indexes] * LOG2 ([Table size]) [Amount of indexes] is those indexes that include updated fields.
Conditions in update queries are optimized the same way as in the select queries.
If you want to delete all lines in the table use the command TRUNCATE TABLE table_name