Skip to content
Advertisement

SELECT COUNT with JOIN optimization for tables with > 100M rows

I have following query

SELECT SUBSTRING(a0_.created_date FROM 1 FOR 10) AS sclr_0, 
       COUNT(1) AS sclr_1 
FROM applications a0_ INNER JOIN 
     package_codes p1_ ON a0_.id = p1_.application_id 
WHERE a0_.created_date BETWEEN '2019-01-01' AND '2020-01-01' AND
      p1_.type = 'Package 1'
GROUP BY sclr_0

— EDIT —

Most of you have focused on GROUP BY and SUBSTRING, but this is not the root of problem.

Following query have the same execution time:

SELECT COUNT(1) AS sclr_1 
FROM applications a0_ INNER JOIN 
     package_codes p1_ ON a0_.id = p1_.application_id 
WHERE a0_.created_date BETWEEN '2019-01-01' AND '2020-01-01' AND
      p1_.type = 'Package 1'

— EDIT 2 —

After adding index on applications.created_date and forcing the query to use specified indexes as @DDS suggests the execution time drops to ~750ms

The current query looks like:

SELECT SUBSTRING(a0_.created_date FROM 1 FOR 10) AS sclr_0, 
       COUNT(1) AS sclr_1 
FROM applications a0_ USE INDEX (applications_created_date_idx) INNER JOIN 
     package_codes p1_ USE INDEX (PRIMARY, UNIQ_70A9C6AA3E030ACD, package_codes_type_idx) ON a0_.id = p1_.application_id 
WHERE a0_.created_date BETWEEN '2019-01-01' AND '2020-01-01' AND
      p1_.type = 'Package 1'
GROUP BY sclr_0

— EDIT 3 —

I have found that using to much indexes in query may cause that in some cases MySQL will use non optimal indexes, so the final query should looks as following:

SELECT SUBSTRING(a0_.created_date FROM 1 FOR 10) AS sclr_0, 
       COUNT(1) AS sclr_1 
FROM applications a0_ USE INDEX (applications_created_date_idx) INNER JOIN 
     package_codes p1_ USE INDEX (package_codes_application_idx) ON a0_.id = p1_.application_id 
WHERE a0_.created_date BETWEEN '2019-01-01' AND '2020-01-01' AND
      p1_.type = 'Package 1'
GROUP BY sclr_0

— END EDIT —

package_codes contains over 100.000.000 records.

applications contains over 250.000 records.

The query takes 2 minutes to get result. Is there any way to optimize it? I’m stuck on MySQL 5.5.

Tables:

CREATE TABLE `applications` (
  `id` int(11) NOT NULL,
  `created_date` datetime NOT NULL,
  `name` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL,
  `surname` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

ALTER TABLE `applications`
  ADD PRIMARY KEY (`id`),
  ADD KEY `applications_created_date_idx` (`created_date`);

ALTER TABLE `applications`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
CREATE TABLE `package_codes` (
  `id` int(11) NOT NULL,
  `application_id` int(11) DEFAULT NULL,
  `created_date` datetime NOT NULL,
  `type` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
  `code` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
  `disabled` tinyint(1) NOT NULL DEFAULT '0',
  `meta_data` longtext COLLATE utf8mb4_unicode_ci
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

ALTER TABLE `package_codes`
  ADD PRIMARY KEY (`id`),
  ADD UNIQUE KEY `UNIQ_70A9C6AA3E030ACD` (`application_id`),
  ADD KEY `package_codes_code_idx` (`code`),
  ADD KEY `package_codes_type_idx` (`type`),
  ADD KEY `package_codes_application_idx` (`application_id`),
  ADD KEY `package_codes_code_application_idx` (`code`,`application_id`);

ALTER TABLE `package_codes`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;

ALTER TABLE `package_codes`
  ADD CONSTRAINT `FK_70A9C6AA3E030ACD` FOREIGN KEY (`application_id`) REFERENCES `applications` (`id`);

Advertisement

Answer

After adding index on applications.created_date and forcing the query to use specified indexes as @DDS suggests the execution time drops to ~750ms

The final query should look as following:

SELECT SUBSTRING(a0_.created_date FROM 1 FOR 10) AS sclr_0, 
       COUNT(1) AS sclr_1 
FROM applications a0_ USE INDEX (applications_created_date_idx) INNER JOIN 
     package_codes p1_ USE INDEX (package_codes_application_idx) ON a0_.id = p1_.application_id 
WHERE a0_.created_date BETWEEN '2019-01-01' AND '2020-01-01' AND
      p1_.type = 'Package 1'
GROUP BY sclr_0
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement