You are here

Do you always need index on WHERE column ?

MySQL Performance Blog - Tue, 28/08/2007 - 5:53pm

I believe we wrote about this before, but this topic popups again and again.
Today I've read opinion that if we have clause WHERE has_something=1 we should have index on column `has_something` (the column has two values 0 and 1).

In reality the right answer is not so simple.

Let's look next table

PLAIN TEXT SQL:
  1. CREATE TABLE `testr` (
  2.   `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  3.   `name` varchar(32) NOT NULL,
  4.   `has_something` tinyint(3) UNSIGNED NOT NULL,
  5.   PRIMARY KEY  (`id`),
  6.   KEY `has_something` (`has_something`)
  7. ) ENGINE=MyISAM

with 20.000.000 records.

And in first case has_something=0 for 90% of rows (with random distribution)

PLAIN TEXT SQL:
  1. mysql> SELECT cnt0/cnt FROM (SELECT count(*) cnt0 FROM testr WHERE has_something=0) t, (SELECT count(*) cnt FROM testr) t1;
  2. +----------+
  3. | cnt0/cnt |
  4. +----------+
  5. |   0.9001 |
  6. +----------+
  7. 1 row IN SET (7.56 sec)

Let's check execution time with and without index


mysql> select count(name) from testr force key (has_something) where has_something=0;
+-------------+
| count(name) |
+-------------+
| 18001245 |
+-------------+
1 row in set (35.96 sec)


mysql> select count(name) from testr ignore key (has_something) where has_something=0;
+-------------+
| count(name) |
+-------------+
| 18001245 |
+-------------+
1 row in set (10.46 sec)

As you see with index the time is by 3.5 times slower.

Good that mysql in this case choose do not use index

PLAIN TEXT SQL:
  1. mysql> EXPLAIN SELECT count(name) FROM testr  WHERE has_something=0;                                 
  2. +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
  3. | id | select_type | TABLE | type | possible_keys | KEY  | key_len | ref  | rows     | Extra       |
  4. +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
  5. |  1 | SIMPLE      | testr | ALL  | has_something | NULL | NULL    | NULL | 15000000 | USING WHERE |
  6. +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
  7. 1 row IN SET (0.00 sec)

Let look the case when has_something = 0 for 50% of rows.


mysql> select count(name) from testr force key (has_something) where has_something=0;
1 row in set (20.27 sec)


mysql> select count(name) from testr ignore key (has_something) where has_something=0;
1 row in set (10.62 sec)

query with index is still 2 times slower.

and this time mysql is going to use index in execution plan:

PLAIN TEXT SQL:
  1. mysql> EXPLAIN SELECT count(name) FROM testr  WHERE has_something=0;                                               +----+-------------+-------+------+---------------+---------------+---------+-------+---------+-------+
  2. | id | select_type | TABLE | type | possible_keys | KEY           | key_len | ref   | rows    | Extra |
  3. +----+-------------+-------+------+---------------+---------------+---------+-------+---------+-------+
  4. |  1 | SIMPLE      | testr | ref  | has_something | has_something | 1       | const | 8890716 |       |
  5. +----+-------------+-------+------+---------------+---------------+---------+-------+---------+-------+
  6. 1 row IN SET (0.00 sec)

What about 30% rows with has_something=0 ?

mysql> select count(name) from testr force key (has_something) where has_something=0;
1 row in set (12.36 sec)


mysql> select count(name) from testr ignore key (has_something) where has_something=0;
1 row in set (10.51 sec)

Still query without index is faster.

And finally for case with 20% rows with has_someting=0


mysql> select count(name) from testr force key (has_something) where has_something=0;
1 row in set (8.39 sec)


mysql> select count(name) from testr ignore key (has_something) where has_something=0;
1 row in set (10.43 sec)

So only in the last case we really need the index on column `has_something`