Skip to content
Advertisement

MYSQL joining two table slow

I have 2 main table in MYSQL DB. I will call it entityA and entityB.
There are some logic that user will start to create a entityA and then convert it to entityB but some time they will not create entityA and only create entityB
Therefore, I have created a middle table which stored entityAId and entityBId.

entityAId entityBId
1 1
2 null
null 2

I need to show them in one list view for user. If there are entityB, show the information in entityB table and if no entityB, show entityA information. What current SQL will be like below:

SELECT IFNULL(entityB.colA, entityA.colA), IFNULL(entityB.colB, entity.colB), ....
FROM middle
LEFT JOIN entityA on middle.entityAId = entityA.id
LEFT JOIN entityB on middle.entityBId = entityB.id
WHERE (entityB.colA = "XXX" OR entityA.colA= "XXX") AND ....
ORDER BY IFNULL(entityB.colA, entityA.colA)
LIMIT 20 OFFSET 0

The SQL become slow when the table become bigger as one customer imported a large amount of historical data. Can anybody help how to make it faster? I am not sure if it can use by rewriting the SQL or creating some index on it? Thanks

======== AND SOME MORE FROM COMMENT =========

Currently middle table has no id, I just added a unique for entityAId and entityBId

Below is the Create statement for 3 table:

CREATE TABLE `entityA` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `colA` varchar(100) DEFAULT NULL,
  `colB` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_entityA_party_group_code` (`colA`),
  KEY `idx_entityA_booking_no` (`colB`),
  KEY `idx_entityA_id_partyGroupCode` (`id`,`colA`),
  KEY `idx_entityA_id_moduleTypeCode` (`id`,`colB`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `entityB` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `colA` varchar(100) DEFAULT NULL,
  `colB` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_entityB_party_group_code` (`colA`),
  KEY `idx_entityB_booking_no` (`colB`),
  KEY `idx_entityB_id_colA` (`id`,`colA`),
  KEY `idx_entityB_id_colB` (`id`,`colB`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `middle` (
  `entityAId` bigint DEFAULT NULL,
  `entityBId` bigint DEFAULT NULL,
  UNIQUE KEY `idx_middle_entityAId_entityBId` (`entityAId`,`entityBId`),
  KEY `idx_middle_entityAId (`entityAId`),
  KEY `idx_middle_entityBId` (`entityBId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

====. ADD explain in JSON =====

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "717863.40"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "middle",
          "access_type": "index",
          "key": "idx_middle_entityAId_entityBId",
          "used_key_parts": [
            "entityAId",
            "entityBId"
          ],
          "key_length": "18",
          "rows_examined_per_scan": 896658,
          "rows_produced_per_join": 896658,
          "filtered": "100.00",
          "using_index": true,
          "cost_info": {
            "read_cost": "537.00",
            "eval_cost": "89665.80",
            "prefix_cost": "90202.80",
            "data_read_per_join": "20M"
          },
          "used_columns": [
            "entityAId",
            "entityBId"
          ]
        }
      },
      {
        "table": {
          "table_name": "entityA",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY",
            "idx_entityA_id_entityANo",
            "idx_entityA_id_partyGroupCode",
            "idx_entityA_id_moduleTypeCode"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "id"
          ],
          "key_length": "8",
          "ref": [
            "db_main.middle.entityAId"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 896658,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "224164.50",
            "eval_cost": "89665.80",
            "prefix_cost": "404033.10",
            "data_read_per_join": "23G"
          },
          "used_columns": [
            "id",
            "partyGroupCode"
          ]
        }
      },
      {
        "table": {
          "table_name": "entityB",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY",
            "idx_entityB_id_partyGroupCode",
            "idx_entityB_id_jobNo",
            "idx_entityB_id_masterNo",
            "idx_entityB_id_houseNo"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "id"
          ],
          "key_length": "8",
          "ref": [
            "db_main.middle.entityBId"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 896658,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "224164.50",
            "eval_cost": "89665.80",
            "prefix_cost": "717863.40",
            "data_read_per_join": "27G"
          },
          "used_columns": [
            "id",
            "partyGroupCode"
          ],
          "attached_condition": "<if>(found_match(entityB), (((`db_main`.`entityA`.`id` is not null) or (`db_main`.`entityB`.`id` is not null)) and ((`db_main`.`entityB`.`partyGroupCode` = 'DEV') or (`db_main`.`entityA`.`partyGroupCode` = 'DEV'))), true)"
        }
      }
    ]
  }
}

Advertisement

Answer

  • When you have PRIMARY KEY(id), do not also have INDEX(id, ...). (Space improvement, not speed.)

  • If middle is a many-to-many mapping table, I am surprised that either of the columns is declared NULL. Make them NOT NULL and follow the indexing instructions in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table (Speed and space improvements; hard to predict how much improvement.)

  • OR does not optimize well. Use a UNION: (Potentially a large speed improvement)

    ( SELECT  IFNULL(entityB.colA, entityA.colA),
          IFNULL(entityB.colB, entity.colB), ....
      FROM  middle
      LEFT JOIN  entityA  ON middle.entityAId = entityA.id
      LEFT JOIN  entityB  ON middle.entityBId = entityB.id
      WHERE  entityB.colA = "XXX"
        AND  ....
      ORDER BY  IFNULL(entityB.colA, entityA.colA)
      LIMIT  20 )
    UNION
    ( SELECT  IFNULL(entityB.colA, entityA.colA),
          IFNULL(entityB.colB, entity.colB), ....
      FROM  middle
      LEFT JOIN  entityA  ON middle.entityAId = entityA.id
      LEFT JOIN  entityB  ON middle.entityBId = entityB.id
      WHERE  entityA.colA= "XXX"
        AND  ....
      ORDER BY  IFNULL(entityB.colA, entityA.colA)
      LIMIT  20  )
    ORDER BY  IFNULL(entityB.colA, entityA.colA)
    LIMIT  20 OFFSET 0 
    
  • If you will be increasing the OFFSET, see this for how to deal with it: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or

  • UNION ALL is faster than UNION DISTINCT (which is the default). Use ALL if you “XXX” will not show up in both tables.

  • LEFT is probably unnecessary. (No speed change.)

  • "rows_examined_per_scan": 896658 may drop to 20 !

  • It may be possible to store the data so that the relationship always has A < B. That way, the search could do one Select (without OR or UNION) (double speed, but add complexity) Since you are hiding the “entity”, I cannot say whether it would matter.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement