Counting items with mysql and regex in the group by statement

Page Cached from:
http://www.cruzinthegalaxie.com/counting-items-with-mysql-and-regex-in-the-group-by-statement/

Yesterday I needed to do a little report for the boss but the data that I needed to group together wasn’t exactly the same. I hit up the “Goog” for ways to use regular expressions in a count query, but that is kinda tricky to find so this is what I was able to come up with….

So let’s say you had a table where you were attaching the landing url a site visiting is entering your website on, and you wanted to track that landing url the whole way to the end of some web based procedure. Of course certain urls will be different but some could be common for the purpose of counting them together on a report. In a practical example, lets say you are doing pay-per-click traffic with Google and Yahoo. Both of these providers use unique ids in the url to help distinguish between multiple clicks from the same person (so you end up with many urls that are different but mean the same thing). And of course you were a good little developer when you decided to send Google traffic to /google/ and Yahoo to /yahoo/!

With that premise and the following fields in a database table called “signup”, you can group the unique Google and Yahoo landing pages together with a simple regular experssion.

signup table

+----------------------+
| Field
+----------------------+
| id
| signup_date
| landing_page
| etc....
+----------------------+

SELECT
DATE_FORMAT(signup_date,’%m-%d-%Y’) as signup_date,
count(landing_page) as signup_count,
IF(landing_page REGEXP ‘^/(google|yahoo).*’,SUBSTRING(landing_page,1,7),landing_page) as lp
FROM signup
WHERE
1=1
AND signup_date >= ‘” . $istartdate . “‘
AND signup_date <= ‘” . $ienddate . “‘
GROUP BY DATE_FORMAT(signup_date,’%m-%d-%Y’), IF(landing_page REGEXP ‘^/(google|yahoo).*’,SUBSTRING(landing_page,1,7),landing_page)

If that makes no sense whatsoever…. awesome!

When I make a report based on grouping data by day, I select the date for each row, then the count of the column I am looking at, and in this example the field that I am grouping to look at. (This query is sitting behind a web form so the php variables for start and end date are passed in.) In the GROUP BY clause, if you specify the first grouping as the formatted date, you can base the format on the way you are outputting the information on the report. In this example I am doing the report by day. If you wanted to grab the values by month, you would just change the date format pattern from this: ‘%m-%d-%Y’ to ‘%m-%Y’. The tricky part in this query was I need to have one grouping condition that would pull together all the like landing page urls as each value. I chose to use an if statement IF(Condition,true,false) where the condition was a regular expression pattern. Using the condition = landing_page REGEXP ‘^/(google|yahoo).* ‘ tells MySQL that IF the landing_page starts with ‘/google’ or ‘/yahoo’ return the true value, otherwise return the false value. I was not able to return the entire value, so using SUBSTRING will chop off the characters that you want and ignore the rest. If the pattern fails to match it will return the normal value for landing_page. And that is all there is to it.

Let me know if that was helpful to anyone!