So I have a JS script that people embed into their sites, and it tracks all the URLs and clicks of visitors. Each visitor gets a “token”, a value unique to them, which is used to track their actions on the site.
I wanted to show site owners the actions of their visitors, so I wrote the following query. All the visits are stored in the “custom_logs” table.
SELECT *
FROM custom_logs
WHERE pn = 'pn-9283896662'
  AND id IN (SELECT MAX(id)
             FROM custom_logs
             WHERE action_clicked_text LIKE '%sometext%'
             GROUP BY token)
  AND token != ''
  AND action_timestamp > 11568
  AND action_timestamp < 1570846368
ORDER BY action_timestamp DESC
LIMIT 0, 30;
I narrowed the problem down to the “GROUP BY token” part of the query. When I remove that part the query runs much faster, but it is still pretty slow (0.7 s compared to 5 s with the GROUP BY part), and there are four of those queries on one page. The custom_logs table already has about 250,000 rows.
The “GROUP BY token” part is there because I want to show people one and only one log per user (the newest), so they can click on it and see that user’s complete logs. After the query I use mysqli_num_rows($results); to count the results. Is there any other way besides caching to quickly count all the results?
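(As a side note for other readers: if only the number of matching users is needed, one alternative is to have MySQL count distinct tokens directly rather than fetching every row and calling mysqli_num_rows(). This is a sketch against the schema described above and is not exactly equivalent, since the timestamp filters here apply to any matching row rather than to each user’s newest log only:

SELECT COUNT(DISTINCT token) AS user_count
FROM custom_logs
WHERE pn = 'pn-9283896662'
  AND token != ''
  AND action_clicked_text LIKE '%sometext%'
  AND action_timestamp > 11568
  AND action_timestamp < 1570846368;

It avoids transferring row data to PHP when only the total is wanted.)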
I read something about indexing columns, so I made token a varchar(255) and indexed it, but it did nothing in terms of speed. I am not very good at SQL, though.
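(For reference, a single-column index on token can be created, and its use checked, like this; the index name is arbitrary and the EXPLAIN output depends on your data:

ALTER TABLE custom_logs ADD INDEX idx_token (token);
EXPLAIN SELECT MAX(id) FROM custom_logs GROUP BY token;

One reason an index may not help much here is the leading-wildcard pattern LIKE '%sometext%', which a B-tree index on action_clicked_text cannot be used to satisfy.)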
Answer
I would suggest replacing the IN with a correlated subquery. I’m thinking:
SELECT cl.*
FROM custom_logs cl
WHERE cl.pn = 'pn-9283896662'
  AND cl.id = (SELECT MAX(cl2.id)
               FROM custom_logs cl2
               WHERE cl2.token = cl.token
                 AND cl2.action_clicked_text LIKE '%sometext%')
  AND cl.token <> ''
  AND cl.action_timestamp > 11568
  AND cl.action_timestamp < 1570846368
ORDER BY cl.action_timestamp DESC
LIMIT 0, 30;
And for this, I recommend the following indexes (note the table is named custom_logs in the question):
custom_logs(pn, action_timestamp, token, id)
custom_logs(token, action_clicked_text, id)
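As a sketch, those composite indexes could be created like so (index names are arbitrary, and the table name is taken from the question):

CREATE INDEX idx_pn_ts_token_id ON custom_logs (pn, action_timestamp, token, id);
CREATE INDEX idx_token_clicked_id ON custom_logs (token, action_clicked_text, id);

If action_clicked_text is a TEXT column rather than a varchar, MySQL will require a prefix length in the index definition, e.g. action_clicked_text(100).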