I would appreciate your input to help me count unique values for a SET type in MySql. I have a column named "features" defined as a SET field as follows:
CREATE TABLE cars (features SET('power steering', 'power locks', 'satellite radio', 'power windows', 'sat nav', 'turbo'));
As I fill this table, since the features are not mutually exclusive, I will get records which include a combination of 2 or more of these features. For example:
Car 1 has power steering and power windows, but none of the remaining features. Car 2 has all features. Car 3 has all features, except sat nav and turbo.
What I want to do is to get a list of all single listed features in the table, including the count of records associated to each in a similar fashion as a SELECT statement using a GROUP BY clause. So, following with the example above, I should be able to get the following result:
features |count
---------------+------
power steering | 3 //All cars have this feature
power locks | 2 //Only cars 2 and 3 have it
satellite radio| 2 //Only cars 2 and 3 have it
power windows | 3
sat nav | 1 //only car 2 has it
turbo | 1 //only car 2 has it
I have tried using the following query with the expectation of obtaining the aforementioned result:
SELECT features, COUNT(features) FROM cars GROUP BY features;
However, instead of what I was expecting, I got the count of each of the existing feature combinations:
features |count
------------------------------------------------+--------
power steering, power windows | 1 //i.e. only 1 car has
| //only these 2 features
| //(car 1 in this example)
|
------------------------------------------------+-------
power steering, power locks, satellite radio, |
power windows, sat nav, turbo | 1
------------------------------------------------+-------
power steering, power locks, satellite radio, |
power windows | 1
So, the question is: Is there a way of obtaining the count of each single feature, as shown in the first table, using one single MySQL query? I could do it by executing one query for each feature, but I'm sure there must be a way of avoiding such hassle. Someone might as well suggest using a different table for the features and joining, but it is not possible at this point without heavily impacting the rest of the project. Thanks in advance!
SELECT set_list.features, COUNT(cars.features) FROM
(SELECT TRIM("'" FROM SUBSTRING_INDEX(SUBSTRING_INDEX(
(SELECT TRIM(')' FROM SUBSTR(column_type, 5)) FROM information_schema.columns
WHERE table_name = 'cars' AND column_name = 'features'),
',', @r:=@r+1), ',', -1)) AS features
FROM (SELECT @r:=0) deriv1,
(SELECT ID FROM information_schema.COLLATIONS) deriv2
HAVING @r <=
(SELECT LENGTH(column_type) - LENGTH(REPLACE(column_type, ',', ''))
FROM information_schema.columns
WHERE table_name = 'cars' AND column_name = 'features')) set_list
LEFT OUTER JOIN cars
ON FIND_IN_SET(set_list.features, cars.features) > 0
GROUP BY set_list.features
Adapted from:
MySQL: Query for list of available options for SET
My query takes the SQL at the above post as the basis, to get a list of the available column values. All of the indented SQL is that one query, if you execute it alone you'll get the list, and I create a result set from it which I call "set_list". I just copied that query as is, but it is basically doing a lot of string manipulation to get the list - as Mike Brant suggested, the code would be far simpler (but maybe just not as dynamic) if you put the list into another table, and just joined that.
I then join set_list back against the cars table, joining each item from set_list against the rows in cars that contain that feature - FIND_IN_SET(). It's an outer join, so if anything from the set list isn't represented, it will be there with a count of zero.
Typically, we use the FIND_IN_SET
function.
You could use a query like this to return the specified result:
SELECT f.feature
, COUNT(1)
FROM ( SELECT 'power steering' AS feature
UNION ALL SELECT 'power locks'
UNION ALL SELECT 'satellite radio'
UNION ALL SELECT 'power windows'
UNION ALL SELECT 'sat nav'
UNION ALL SELECT 'turbo'
) f
JOIN cars c
ON FIND_IN_SET(f.feature,c.features)>0
GROUP BY f.feature
ORDER BY f.feature
You could omit >0
and get the same result. This query omits "zero counts": rows with a "feature" that doesn't appear for any car. To get those, you could use an outer join (add the LEFT
keyword before JOIN
, and rather than COUNT(1)
in the SELECT list, COUNT(expr)
where expr
is a column from cars
that is NOT NULL
, or some other expression that will be non-NULL when a matching row is found, and NULL
when a matching row is not found.