Using flow control functions for performance monitoring queries

MySQL Performance Blog - Sat, 24/05/2008 - 8:43am

I'm not a big fan of flow control functions such as IF or CASE in MySQL queries: they are often abused to build queries which are hard to read and which the MySQL optimizer can rarely optimize well.

One place where I do find the IF function very useful, though, is computing multiple aggregates over different sets of rows in a single query sweep.

Here is how I like to use it for web site performance analysis. In the table below we record "wtime", the wall-clock time it took to generate the page. We also track the type of each page, because different page types often have different performance profiles.
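For reference, here is a minimal sketch of what such a log table could look like. Only wtime, page_type and bot actually appear in the queries below; the other column names and the types are my assumptions.

PLAIN TEXT SQL:
-- Hypothetical structure of the per-day performance log table;
-- only wtime, page_type and bot are referenced in the queries below.
CREATE TABLE performance_log_080523 (
  request_time DATETIME NOT NULL,        -- assumed: when the request was served
  page_type    VARCHAR(32) NOT NULL,     -- e.g. 'search', 'profile', 'rss'
  bot          VARCHAR(32) DEFAULT NULL, -- crawler name, e.g. 'google'; assumed NULL for humans
  wtime        DOUBLE NOT NULL           -- wall-clock page generation time in seconds
);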

PLAIN TEXT SQL:
mysql> SELECT count(*) cnt, avg(wtime) avw, sum(IF(wtime>0.3,1,0))/count(*) soso, sum(IF(wtime>1,1,0))/count(*) poor, sum(IF(wtime>5,1,0))/count(*) fatal FROM performance_log_080523;
+---------+------------------+--------+--------+--------+
| cnt     | avw              | soso   | poor   | fatal  |
+---------+------------------+--------+--------+--------+
| 4412134 | 0.18669403011609 | 0.1280 | 0.0396 | 0.0017 |
+---------+------------------+--------+--------+--------+
1 row in set (7.51 sec)

This query scans the page generation log for a site and reports the number of requests and the average request time, while also classifying requests into several classes. From the same query we can see the portion of requests which took over 300 ms (12.8%) - we call these "so so", since 300 ms or less is the performance goal we set for this web site. About 4% of the requests are ranked "poor", taking over 1 second, and 0.17% get the "fatal" classification, because we assume a user will not wait more than 5 seconds and will already have gone away.
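By the way, since a comparison in MySQL evaluates to 1 or 0, the same fractions can be written a bit more compactly with AVG() instead of SUM(IF(...))/COUNT(*). This is just a sketch of an equivalent query, not what I ran above:

PLAIN TEXT SQL:
-- AVG() of a boolean expression gives the fraction of rows for which
-- it is true (MySQL treats true as 1 and false as 0).
SELECT count(*)       cnt,
       avg(wtime)     avw,
       avg(wtime>0.3) soso,  -- fraction of requests slower than 300 ms
       avg(wtime>1)   poor,  -- fraction slower than 1 s
       avg(wtime>5)   fatal  -- fraction slower than 5 s
FROM performance_log_080523;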

As a side note, I should say the average time is the least useful of these numbers, because an average does not tell you much. It is much better to set performance goals at a high percentile, such as 95% or 99%, and check what fraction of requests meets that goal.

In this example, if our goal were 95% of requests within 1 second we would pass, but a goal of 95% within 0.3 seconds would fail, and so would 99% of requests served within 1 second.
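The same single-sweep trick can be used to check such goals directly. A sketch, using the thresholds and target percentages discussed above:

PLAIN TEXT SQL:
-- A goal passes when the fraction of requests within the threshold
-- is at least the target percentile.
SELECT avg(wtime<=1)   >= 0.95 AS pass_95pct_1s,    -- passes: only ~4% are over 1 s
       avg(wtime<=0.3) >= 0.95 AS pass_95pct_300ms, -- fails: ~12.8% are over 300 ms
       avg(wtime<=1)   >= 0.99 AS pass_99pct_1s     -- fails: ~4% over 1 s exceeds 1%
FROM performance_log_080523;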

Here is another example:

PLAIN TEXT SQL:
mysql> SELECT count(*) cnt, avg(wtime) avw, sum(IF(wtime>0.3,1,0))/count(*) soso, sum(IF(wtime>1,1,0))/count(*) poor, sum(IF(wtime>5,1,0))/count(*) fatal, page_type FROM performance_log_080523 WHERE bot='google' GROUP BY page_type;
+-------+-------------------+--------+--------+--------+---------------+
| cnt   | avw               | soso   | poor   | fatal  | page_type     |
+-------+-------------------+--------+--------+--------+---------------+
| 21954 |  0.36869757085721 | 0.4373 | 0.0545 | 0.0008 | search        |
| 25843 |   1.1290003426468 | 0.9114 | 0.4267 | 0.0081 | profile       |
...
|  4393 |  0.63011296296095 | 0.5852 | 0.1689 | 0.0052 | rss           |
+-------+-------------------+--------+--------+--------+---------------+
15 rows in set (5.87 sec)

In this example I hid some rows to obfuscate some data :)

In this query we're looking at response times for the different page types: the "search" page responds within 1 second in about 95% of cases, while the profile page does so in less than 60%. It is very important to do this kind of grouping by user-facing function, because otherwise you will not catch performance problems in functions that are important but perhaps less frequently used.

Another thing you may notice is that I'm looking at performance stats not for all pages but only for pages retrieved by the Google crawler. In many cases this is worth looking at separately (perhaps for all search bots rather than just Google), because bots have a very different site access pattern. Your human visitors will typically hit a relatively small number of hot pages, which use content from the cache (or are served from the cache altogether). Bots, however, tend to visit distinct pages, which have a significantly lower cache hit rate.
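If you want to see that difference in the numbers, the same classification can be grouped by visitor type. A sketch, assuming the bot column is NULL or empty for human visitors (an assumption about the table, not something shown above):

PLAIN TEXT SQL:
-- Compare bots vs. human visitors in one sweep; assumes bot is NULL/'' for humans.
SELECT IF(bot IS NULL OR bot='', 'human', 'bot') visitor_type,
       count(*)     cnt,
       avg(wtime)   avw,
       avg(wtime>1) poor   -- fraction of requests slower than 1 s
FROM performance_log_080523
GROUP BY visitor_type;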

Entry posted by peter
