Skip to content
Advertisement

speed up join on string field

I have a table with data like the tableA example below. the date column is formatted as a string. the close column is an integer, ticker is formatted as string.
I’m trying to run the query below on a mysql database and it is taking a very long time. is there anything I can do to speed this up, like changing the format of the date column, or adding indices or primary keys? The combination of ticker and date should be a unique value, and the date field is a timestamp, it’s just currently formatted as string.

code:

select avg((a.close-b.close)/b.close) as avg_annual_returns,
a.ticker
from tableA a 
join tableA b
on cast(a.date as date)=date_add(cast(b.date as date),interval 365 DAY)
and a.ticker=b.ticker
where b.close is not null
group by a.ticker

tableA

+--------+-----+------+
|date    |close|ticker|
+--------+-----+------+
|2/1/2019|5    |abc   |
+--------+-----+------+
|2/3/2019|7    |efd   |
+--------+-----+------+
|2/4/2019|3    |hij   |
+--------+-----+------+

update answer:

select ticker,date, ( -1 +
        a.close / max(a.close) over (partition by ticker 
                                     order by date
                                     range between interval 365 day preceding and interval 365 day preceding
                                    ) 
       ) as annual_returns              
from tableA a
) b where annual_returns is not null
group by ticker

Advertisement

Answer

If you want the difference from a year ago, then use window functions. Before that, though, fix the data model! Do not store dates as strings. So:

alter table talbeA modify column date date;

Then to get the close from a year ago:

select( -1 +
        a.close / max(a.close) over (partition by ticker 
                                     order by date
                                     range between interval 365 day preceding and interval 365 day preceding
                                    )
       )               
from tablea a;

You don’t have to worry about NULL values because AVG() ignores them.

Here is a db<>fiddle.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement