
MySql Partitioning questions for very segmented data

There are a number of partitioning questions out there, and people often caution that partitioning does not help performance. I want to see if my use case is one of the few where it does, and also clarify something.

In my use case, every user is going to be assigned to a bucket. Every user in a bucket has no visibility to users in other buckets. Thus almost every single query the application runs will include WHERE bucket = X.

Therefore even if there are 1 billion rows in the table, any given query only cares about a small segment of this.

My questions are

  1. Is this truly a use case where partitioning is a good performance booster?

  2. When I add my partition column to every unique key, that index would look like (Column A, Partition Column). If I recall correctly, a compound index does not give you a free index on the second column by itself. So if I query every single user in partition 5, ordered by some non-unique Column C, does it matter that the partition column does not have its own index?

  3. There could be 100-200 of these partitions, depending on the volume of users in the system. I’m looking at a massive and nasty partition range expression that looks like the following

          ALTER TABLE XXX
          PARTITION BY RANGE(the_thing) (
          PARTITION p0 VALUES LESS THAN (1),
          PARTITION p1 VALUES LESS THAN (2),
          PARTITION p2 VALUES LESS THAN (3),
          PARTITION p3 VALUES LESS THAN (4),
          PARTITION p4 VALUES LESS THAN (5),
          PARTITION p5 VALUES LESS THAN (6),
          PARTITION p6 VALUES LESS THAN (7),
          PARTITION p7 VALUES LESS THAN (8),
          -- AND ON AND ON AND ON....
          PARTITION p199 VALUES LESS THAN (200),
          PARTITION pmax VALUES LESS THAN MAXVALUE);
    

That looks really awkward and ugly. Is this just how it is?


Answer

Yes, this sounds like a good case for partitioning, since the queries you want to optimize will all be searching for a specific partition. The usual problem is that many apps have a mix of queries, so only some of them benefit from partition pruning, while the others have a bit of extra overhead for doing N queries against partitions and unifying the results.

If you define unique indexes as compound indexes like (ColumnA, PartitionColumn) keep in mind that ColumnA will only be unique within a given partition. Other rows in other partitions can have the same value in ColumnA. For example, the following is allowed, even with the unique index:

INSERT INTO XXX (ColumnA, PartitionColumn) 
VALUES
  ('abc', 5),
  ('abc', 6),
  ('abc', 19);

This might not be what you want, but that’s how compound unique indexes work. As long as the combination of values across the indexed columns does not occur on any other row, the new row is allowed.
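
To make that concrete, here is a minimal schema sketch; the table and column names (users, email, bucket) are hypothetical stand-ins, not from the question:

```sql
-- Hypothetical sketch: table and column names are illustrative only.
CREATE TABLE users (
  id     BIGINT       NOT NULL AUTO_INCREMENT,
  email  VARCHAR(255) NOT NULL,
  bucket INT          NOT NULL,
  PRIMARY KEY (id, bucket),            -- partition column in every unique key
  UNIQUE KEY uk_email (email, bucket)  -- email is unique only within a bucket
)
PARTITION BY HASH(bucket) PARTITIONS 200;
```

With this definition, the same email can appear once per bucket, which matches the INSERT example above.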

As for using the index, yes, a query can use some other index that does not include the partition column, even after doing partition pruning.
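
For example, a sketch using the question’s naming (XXX, ColumnC, and PartitionColumn are placeholders; the index name is an assumption):

```sql
-- Hypothetical sketch: a non-unique index that omits the partition column.
ALTER TABLE XXX ADD INDEX idx_c (ColumnC);

-- Pruning narrows the query to one partition first (visible in EXPLAIN's
-- `partitions` column); within that partition, idx_c can serve the ORDER BY:
EXPLAIN SELECT * FROM XXX WHERE PartitionColumn = 5 ORDER BY ColumnC;
```
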

The bulky list of explicit partitions is required if you use RANGE partitioning as you show.

You could alternatively use HASH partitioning

ALTER TABLE XXX PARTITION BY HASH(PartitionColumn) PARTITIONS 200;

Then you don’t have to list every partition explicitly. Data will automatically be inserted into the respective partition according to the modulus of the partition expression. The partition expression must return an integer.

You can also use KEY partitioning, which is not restricted to a single column, nor integer columns. The value(s) are concatenated and hashed and that determines an integer used to find the right partition. The method of hashing depends on the storage engine of the table.
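
A hedged sketch of KEY partitioning (SomeCode is a hypothetical non-integer column, e.g. a VARCHAR):

```sql
-- Hypothetical sketch: KEY partitioning hashes the column values itself,
-- so non-integer columns are allowed, unlike HASH partitioning.
ALTER TABLE XXX PARTITION BY KEY(SomeCode) PARTITIONS 200;
```
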


Re comments:

I will have to create an additional unique constraint on the primary key alone to enforce cross-partition uniqueness

That doesn’t work. The partition column must be part of all unique keys.

This is the most common reason that most people who want to use partitioning decide they can’t.
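
If you want a surrogate key plus partitioning, the usual compromise is to append the partition column to the primary key rather than adding a standalone unique key. A sketch (column names are assumptions):

```sql
-- Hypothetical sketch: `id` and PartitionColumn stand in for your columns.
-- A separate UNIQUE KEY (id) alone would be rejected; this form is accepted:
ALTER TABLE XXX
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, PartitionColumn);
```

The trade-off, as noted above, is that `id` is then only unique per partition unless the application guarantees otherwise.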

is it guaranteed that only 1 number will map to only 1 partition (assuming I don’t go over 200) and that once I assign somebody a value in this column, unless I change it, they would always be on the same partition.

Yes, HASH partitioning uses a deterministic operation: modulus. If you have 200 partitions and all the values are less than 200, then each partition will have only rows with one value. And rows with that value will not occupy more than one partition.
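
You can see the arithmetic directly; MySQL places a row in partition number MOD(partition_expression, number_of_partitions):

```sql
-- With 200 partitions, the placement is just MOD(bucket, 200):
SELECT MOD(5, 200);    -- bucket 5 always maps to the same partition
SELECT MOD(199, 200);  -- no two bucket values below 200 share a partition
```
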

User contributions licensed under: CC BY-SA