You are here

Redundant index is not always bad

MySQL Performance Blog - Tue, 28/08/2007 - 9:54am

About year ago Peter wrote about redundant indexes and mentioned sometimes it is good to leave two indexes, even one is first part of another. I'm speaking about BTREE indexes, for example, KEY (A), and KEY (A,B). From SQL point of view KEY(A) is not needed, as for queries like WHERE A=5 the index (A,B) also can be used.

But there is case when for performance it would be good to have both

Let we have the table

PLAIN TEXT SQL:
  1. CREATE TABLE `userinfo` (
  2.   `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  3.   `name` varchar(64) NOT NULL DEFAULT '',
  4.   `email` varchar(64) NOT NULL DEFAULT '',
  5.   `password` varchar(64) NOT NULL DEFAULT '',
  6.   `dob` date DEFAULT NULL,
  7.   `address` varchar(255) NOT NULL DEFAULT '',
  8.   `city` varchar(64) NOT NULL DEFAULT '',
  9.   `state_id` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',
  10.   `zip` varchar(8) NOT NULL DEFAULT '',
  11.   `country_id` smallint(5) UNSIGNED NOT NULL DEFAULT '0',
  12.   `gender` enum('M','F') NOT NULL DEFAULT 'M',
  13.   `account_type` varchar(32) NOT NULL DEFAULT '',
  14.   `verified` tinyint(4) NOT NULL DEFAULT '0',
  15.   `allow_mail` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',
  16.   `parrent_account` int(10) UNSIGNED NOT NULL DEFAULT '0',
  17.   `closest_airport` varchar(3) NOT NULL DEFAULT '',
  18.   PRIMARY KEY (`id`),
  19.   UNIQUE KEY `email` (`email`),
  20.   KEY `country_id` (`country_id`),
  21.   KEY `state_id` (`state_id`),
  22. ) ENGINE=MyISAM

with 1,000,000 rows, and for each state_id there are about 20,000 records.

In the benchmark the query (Q1)

SELECT count(*) FROM userinfo WHERE state_id=5

which uses KEY (`state_id`), shows the result 115 queries/sec

And now we have the task to execute the query (Q2)
SELECT state_id,city,address FROM userinfo WHERE state_id=5
which extracts additional info - city and address, both long strings columns

In the benchmark for this query we have 10 queries per sec.

Usuall technique we use for such queries is covering index, which store all needed columns in the index, and there is no need to read row data from the table.

So - we can extend index `state_id_idx` (`state_id`) by two columns:

PLAIN TEXT SQL:
  1. ALTER TABLE userinfo DROP KEY state_id, ADD KEY `state_id_2` (state_id, city, address)

And now for Q2 we have 16.34 q/s, that is almost +100% comparing with non-convering index.

But if run the test for Q1 we have: 25.40 q/s, that in 4 times slower than in first run.
(This is for MyISAM, for InnoDB difference will be less. MyISAM uses key-compression for VARCHAR columns, so makes things worse).

So to have good performance for both case we should leave both indexes:
KEY `state_id` (state_id) and
KEY `state_id_2` (state_id, city, address)

The drawback is worse INSERT performance, here is time to INSERT 1million records:
only with state_id KEY - 72 sec
with state_id and state_id_2 KEYs - 470sec
(Here I limited available memory to fit only one index, that shows how can degrade performance if we balance on memory limits)

If you are interested in, the numbers for InnoDB in the same tests:
only state_id KEY:
Q1 - 108 q/s
Q2 - 12.12 q/s
only state_id_2 KEY:
Q1 - 100 q/s
Q2 - 24.08 q/s

And INSERT of 1mil records:
only with state_id KEY - 80 sec
with state_id and state_id_2 KEYs - 136sec
(but in this case the memory was enough for both indexes)