You are here

UNION vs UNION ALL Performance

MySQL Performance Blog - Fri, 05/10/2007 - 9:01pm

When I was comparing performance of UNION vs MySQL 5.0 index merge algorithm Sinisa pointed out I should be using UNION ALL instead of simple UNION in my benchmarks, and he was right. Numbers would be different but it should not change general point of having optimization of moving LIMIT inside of union clause being cool thing.

But So is UNION ALL indeed faster than UNION DISTINCT (the UNION is shortcut for UNION DISTINCT) ?

Indeed it is. I did not have the same data as I used for the other test but I created similar test case - table with separate indexes on "a" and "b" columns with cardinality of 100, having about 40.000.000 of rows

PLAIN TEXT SQL:
  1. SELECT * FROM test.abc WHERE i=5 union  SELECT * FROM test.abc WHERE j=5

This original query was taking about 22 seconds.

As I modified it:

PLAIN TEXT SQL:
  1. SELECT * FROM test.abc WHERE i=5 union ALL SELECT * FROM test.abc WHERE j=5 AND i!=5

The query time dropped to about 6 seconds which is 3.5 times faster - quite considerable improvement.

As you can notice I added "i!=5" clause - this is what allows us to ensure we do not have duplicate rows in result set matching both conditions and so result will be same as query with "i=5 or j=5" where clause.

I also tried this original query (which uses index merge method in MySQL 5.0):

PLAIN TEXT SQL:
  1. SELECT * FROM test.abc WHERE i=5 OR j=5

Such query takes 4 seconds so if you do not need to trick with order by and limit using index merge is faster than UNION as it indeed should be.

So why UNION ALL is faster than UNION DISTINCT ?

The first informed guess would be - because UNION ALL does not need to use temporary table to store result set, however this is not correct - both UNION ALL and UNION distinct use temporary table for result generation. Perhaps one more thing for Optimizer Team to look into.

Interesting enough the fact UNION and UNION ALL require temporary table can only be seen in SHOW STATUS - EXPLAIN does not want to tell you this shameful fact:

PLAIN TEXT SQL:
  1. mysql> EXPLAIN (SELECT * FROM test.abc WHERE i=5) union ALL (SELECT * FROM test.abc WHERE j=5 AND i!=5) \G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: PRIMARY
  5.         TABLE: abc
  6.          type: ref
  7. possible_keys: i
  8.           KEY: i
  9.       key_len: 5
  10.           ref: const
  11.          rows: 348570
  12.         Extra: USING WHERE
  13. *************************** 2. row ***************************
  14.            id: 2
  15.   select_type: UNION
  16.         TABLE: abc
  17.          type: ref
  18. possible_keys: i,j
  19.           KEY: j
  20.       key_len: 5
  21.           ref: const
  22.          rows: 349169
  23.         Extra: USING WHERE
  24. *************************** 3. row ***************************
  25.            id: NULL
  26.   select_type: UNION RESULT
  27.         TABLE: <union1,2>
  28.          type: ALL
  29. possible_keys: NULL
  30.           KEY: NULL
  31.       key_len: NULL
  32.           ref: NULL
  33.          rows: NULL
  34.         Extra:
  35. 3 rows IN SET (0.00 sec)

In fact EXPLAIN output is the same for UNION and UNION ALL (which is too bad as execution for them is obviously different).

The difference in execution speed comes from the fact UNION requires internal temporary table with index (to skip duplicate rows) while UNION ALL will create table without such index.

This also explains why difference becomes larger when on disk table is required (as in this case) - Hash indexes used by MEMORY table are very efficient and do not give so much overhead.

Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks