You are here

Should MySQL Extend GROUP BY Syntax ?

MySQL Performance Blog - Fri, 17/08/2007 - 8:55pm

Jan has a good article about finding the row matching some value in the group:

This is one illustration of group by limitations in SQL language which is not offset by any MySQL specific extensions,yet
As you can see if you want to get one row from the group which is sorted some way you have to use ugly hacks. This is because SQL does not have a notion of sorting the data within the groups and in fact ANSI SQL even forbids you to select columns which are not aggregates or part of group by because result in this case is not defined.

What would be quite helpful is to have GROUPORDER (pick the name) clause which defines which element is selected for non aggregate columns, something like

PLAIN TEXT SQL:
  1. SELECT MAX(Population),  City, Country  FROM City GROUP BY Country GROUPORDER BY Population DESC;

This could give you row which has highest population for which country, and really you would not even need to query MAX(population) if you do not need that because you're soring by population you would see MAX(Population) in the population column:

PLAIN TEXT SQL:
  1. SELECT City, Population City FROM City GROUP BY Country GROUPORDER BY  Population DESC

would simply return city with highest population for each country.

In fact we had similar problem in Sphinx when we were working on BoardReader project - we wanted to group search results by thread and return only one best matching post from each thread while matches themselve could be sorted by relevance or freshness. The benefit of not following any standards is what you can always add what you want and this is just what we did with sphinx in similar case.

SQL Standard might be good for enterprise applications but huge amount of Web applications do not care - they are stuck enough in MySQL they would have rewrite a lot going to other database anyway while they would love to get features which allow you to do stuff more efficient or more beautiful way.

On other hand what I'm talking about if probably the most commonly requested extension which is already supported by number of vendors - CONNECT BY is still not implemented.