Skip to content
Advertisement

MySQL Spatial Join on Closest Point

I’ve looked around a bit and found quite a few people seeking to order a table of points by distance to a set point, but I’m curious how one would go about efficiently joining two tables on the minimum distance between two points. In my case, consider the table nodes and centroids.

CREATE TABLE nodes (
    node_id VARCHAR(255),
    pt POINT
);
CREATE TABLE centroids (
    centroid_id MEDIUMINT UNSIGNED,
    temperature FLOAT,
    pt POINT
);

I have approximately 300k nodes and 15k centroids, and I want to get the closest centroid to each node so I can assign each node a temperature. So far I have created spatial indexes on pt on both tables and tried running the following query:

SELECT
    nodes.node_id,
    MIN(ST_DISTANCE(nodes.pt, centroids.pt))
FROM nodes
INNER JOIN centroids
ON ST_DISTANCE(nodes.pt, centroids.pt) <= 4810
GROUP BY
    nodes.node_id
LIMIT 10;

Clearly, this query is not going to solve my problem; it does not retrieve temperature, assumes that the closest centroid is within 4810, and only evaluates 10 nodes. However, even with these simplifications, this query is very poorly optimized, and is still running as I type this. When I have MySQL give details about the query, it says no indexes are being used and none of the spatial indexes are listed as possible keys.

How could I build a query that can actually return the data I want joined efficiently utilizing spatial indexes?

Advertisement

Answer

I think a good approach would be partitioning (numerically not db partitioning) the data into cells. I don’t know how well spatial indexes applies here, but the high-level logic is to say bin each node and centroid point into square regions and find matches between all the node-centroid in the same square, then make sure that there isn’t a closer match in an 8-adjacent square (e.g. using the same nodes in original square). The closest matches can then be used to compute and save the temperature. All subsequent queries should ignore nodes with the temperature set.

There will still be nodes with centroids that aren’t within the same or 8-adjacent squares, you would then expand the search, perhaps use squares with double the width and height. I can see this working with plain indexes on just the x and y coordinate of the points. I don’t know how spatial indexes can further improve this.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement