Skip to content
Advertisement

MariaDB/MySQL UPDATE statement with multiple joins including a ranged join

I have for tables

A Login History

create table login_history
(
    id         int auto_increment primary key,
    ip         int unsigned,
    created    datetime(6)  not null,
    uid    int unsigned not null,
);

An IP To Location Table

create table ip2location
(
    ip_from      int unsigned not null primary key,
    ip_to        int unsigned null,
    country_code char(2)      null,
)

An Account Table

create table account
(
    uid               int unsigned not null primary key,
);

Some Orders

create table order
(
    id             int auto_increment primary key,
    item_id        varchar(20)       not null,
    price          int               not null,
    timestamp      datetime(6)       not null,
    country_code   char(2)           null,
    uid            int unsigned      null
)

All tables have fitting indices for this problem.

My goal is to fill the country code of the orders with the country from the ip2location table. I have a login history and because I want to make the problem not more complicated as necessary, I am fine with using the most recent IP, a user had in the given time range. I assume that switching a country and buying something within the time range, is a negligible use case. Also, because the login history is only held for a couple of days, I want to fill old orders, which have set the country_code to null, also to get the country for the user.

My approach is the following.

I am trying to join both tables with the following “on” expression.

update order

left join account using(uid)
left join (
    select uid, 
           LAST_VALUE(ip) over (PARTITION BY uid) as `ip_int`
    from login_history
    where created >= '{{ current_date }}'
    and created < '{{ next_date }}'
    group by user_id
    ) as lh
on account.uid = lh.uid
left join ip2location as ip on
    (ip.ip_from < login_history.ip_int and ip.ip_to >= login_history.ip_int)
    or
    (ip.ip_from = lh.ip_int)
set
    order.country_id = ip.country_id
where order.country_id is null;

It works but is very slow. Probably also because of the size of the tables:

  • login_history > 15 Mio. entries (where statement reduces this to 500K entries)
  • account > 7 Mio. entries
  • ip2location ~ 200K entries
  • orders > 1 Mio.

Maybe that’s a use case where MariaDB can’t provide a solution. The target is, to finish this query in under 30 seconds. For reasons of not locking the table for too long, faster would be better of course.

I see some potential in the following statement. For finding the right entry in the ip2location table, I have to use a range and I have to also consider an entry exists, where only one IP is given, and the ip_to field is null.

left join ip2location as ip on
        (ip.ip_from <= login_history.ip_int and ip.ip_to >= login_history.ip_int)
        or
        (ip.ip_from = lh.ip_int)

Also, the following select looks somewhat time intense:

select uid, 
               LAST_VALUE(ip) over (PARTITION BY uid) as `ip_int`
        from login_history
        where created >= '{{ current_date }}'
        and created < '{{ next_date }}'
        group by user_id

I thought about splitting this up by first use a select and then an update statement, but in the end, this could cost more time and also would use more CPU time, because of the script, which organizes this task.

Can you help me find a better query or do you have some good advice on how to tackle this problem efficiently?

Thanks in advance and have a nice day!

Advertisement

Answer

I think the following approach, based on a correlated subquery, does what you ask for:

update orders o
set country = (
    select il.country_code
    from login_history lh
    inner join ip2location il on lh.ip >= il.ip_from and lh.ip_to < il.ip_to
    where lh.created <= o.timestamp and lh.uid = o.uid
    order by lh.created desc limit 1
) 
where o.country_id is null

This searches for the latest login_history for the same user whose date prior or equal to the order timestamp, and recovers the corresponding country.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement