You are here

Multi-Column IN clause - Unexpected MySQL Issue

MySQL Performance Blog - Sat, 05/04/2008 - 1:08am

We have an application which stores massive amount of urls. To save on indexes instead of using URL we index CRC32 of the URL which allows to find matching urls quickly. There is a bit of chance there would be some false positives but these are filtered out after reading the data so it works all pretty well.

If we just process urls one by one it works great:

PLAIN TEXT SQL:
  1. mysql> EXPLAIN SELECT url FROM 124pages.124pages WHERE url_crc=484036220 AND url="http://www.dell.com/";
  2. +----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
  3. | id | select_type | TABLE    | type | possible_keys | KEY     | key_len | ref   | rows | Extra       |
  4. +----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
  5. |  1 | SIMPLE      | 124pages | ref  | url_crc       | url_crc | 4       | const |    1 | USING WHERE |
  6. +----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
  7. 1 row IN SET (0.02 sec)

Handling URLs one by one is however not efficient if you're processing millions of them so we tried to do bulk fetches:

PLAIN TEXT SQL:
  1. mysql> EXPLAIN SELECT  url FROM 106pages.106pages WHERE (url_crc, url) IN ((2752937066, 'http://members.aye.net/~gharris/blog/'), (3799762538, 'http://www.coxandforkum.com/'));
  2. +----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
  3. | id | select_type | TABLE    | type | possible_keys | KEY  | key_len | ref  | rows  | Extra       |
  4. +----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
  5. |  1 | SIMPLE      | 106pages | ALL  | NULL          | NULL | NULL    | NULL | 14936 | USING WHERE |
  6. +----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
  7. 1 row IN SET (0.00 sec)

As you can see just using multiple column IN makes MySQL to pick doing full table scan in this case, even though the cardinality on the first column is almost perfect. I did some more testing and it looks like a bug or missing optimizer feature.

I should not be surprised though as multi-column in is not the most used MySQL feature out there.

For given application case we could simply rewrite query using more standard single column IN clause:

PLAIN TEXT SQL:
  1. mysql> EXPLAIN SELECT  url FROM 106pages.106pages WHERE url_crc IN (2752937066,3799762538) AND url IN('http://members.aye.net/~gharris/blog/','http://www.coxandforkum.com/');                                                                  +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
  2. | id | select_type | TABLE    | type  | possible_keys | KEY     | key_len | ref  | rows | Extra       |
  3. +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
  4. |  1 | SIMPLE      | 106pages | range | url_crc       | url_crc | 4       | NULL |    2 | USING WHERE |
  5. +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
  6. 1 row IN SET (0.01 sec)

Theoretically speaking this query is not equivalent to the first one - because row having url_crc=2752937066 and url='http://www.coxandforkum.com/' would match it, while it should not. It however does not happen in our case as url_crc is functionally dependent on url so both queries are equivalent.

So we've got our work around and can forget about the issue and MySQL team gets yet another bug to deal with.
What worries me again is - this is very simple case which seems to to be generally broken which raises a question how good coverage MySQL tests have.

Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks