I have for tables
A Login History
create table login_history ( id int auto_increment primary key, ip int unsigned, created datetime(6) not null, uid int unsigned not null, );
An IP To Location Table
create table ip2location ( ip_from int unsigned not null primary key, ip_to int unsigned null, country_code char(2) null, )
An Account Table
create table account ( uid int unsigned not null primary key, );
Some Orders
create table order ( id int auto_increment primary key, item_id varchar(20) not null, price int not null, timestamp datetime(6) not null, country_code char(2) null, uid int unsigned null )
All tables have fitting indices for this problem.
My goal is to fill the country code of the orders with the country from the ip2location table. I have a login history and because I want to make the problem not more complicated as necessary, I am fine with using the most recent IP, a user had in the given time range. I assume that switching a country and buying something within the time range, is a negligible use case. Also, because the login history is only held for a couple of days, I want to fill old orders, which have set the country_code to null, also to get the country for the user.
My approach is the following.
I am trying to join both tables with the following “on” expression.
update order left join account using(uid) left join ( select uid, LAST_VALUE(ip) over (PARTITION BY uid) as `ip_int` from login_history where created >= '{{ current_date }}' and created < '{{ next_date }}' group by user_id ) as lh on account.uid = lh.uid left join ip2location as ip on (ip.ip_from < login_history.ip_int and ip.ip_to >= login_history.ip_int) or (ip.ip_from = lh.ip_int) set order.country_id = ip.country_id where order.country_id is null;
It works but is very slow. Probably also because of the size of the tables:
- login_history > 15 Mio. entries (where statement reduces this to 500K entries)
- account > 7 Mio. entries
- ip2location ~ 200K entries
- orders > 1 Mio.
Maybe that’s a use case where MariaDB can’t provide a solution. The target is, to finish this query in under 30 seconds. For reasons of not locking the table for too long, faster would be better of course.
I see some potential in the following statement. For finding the right entry in the ip2location table, I have to use a range and I have to also consider an entry exists, where only one IP is given, and the ip_to field is null.
left join ip2location as ip on (ip.ip_from <= login_history.ip_int and ip.ip_to >= login_history.ip_int) or (ip.ip_from = lh.ip_int)
Also, the following select looks somewhat time intense:
select uid, LAST_VALUE(ip) over (PARTITION BY uid) as `ip_int` from login_history where created >= '{{ current_date }}' and created < '{{ next_date }}' group by user_id
I thought about splitting this up by first use a select and then an update statement, but in the end, this could cost more time and also would use more CPU time, because of the script, which organizes this task.
Can you help me find a better query or do you have some good advice on how to tackle this problem efficiently?
Thanks in advance and have a nice day!
Advertisement
Answer
I think the following approach, based on a correlated subquery, does what you ask for:
update orders o set country = ( select il.country_code from login_history lh inner join ip2location il on lh.ip >= il.ip_from and lh.ip_to < il.ip_to where lh.created <= o.timestamp and lh.uid = o.uid order by lh.created desc limit 1 ) where o.country_id is null
This searches for the latest login_history for the same user whose date prior or equal to the order timestamp, and recovers the corresponding country.