so I have a statement I believe should work… However it feels pretty suboptimal and I can’t for the life of me figure out how to optimise it.
I have the following tables:
- Transactions
- [Id] is
PRIMARY KEY IDENTITY
- [Hash] has a
UNIQUE
constraint - [BlockNumber] has an Index
- [Id] is
- Transfers
- [Id] is
PRIMARY KEY IDENTITY
- [TransactionId] is a Foreign Key referencing [Transactions].[Id]
- [Id] is
- TokenPrices
- [Id] is
PRIMARY KEY IDENTITY
- [Id] is
- TokenPriceAttempts
- [Id] is
PRIMARY KEY IDENTITY
- [TransferId] is a Foreign Key referencing [Transfers].[Id]
- [Id] is
What I want to do, is select all the transfers, with a few bits of data from their related transaction (one transaction to many transfers), where I don’t currently have a price stored in TokenPrices related to that transfer.
In the first part of the query, I am getting a list of all the transfers, and calculating the DIFF between the nearest found token price. If one isn’t found, this is null (what I ultimately want to select). I am allowing 3 hours either side of the transaction timestamp – if nothing is found within that timespan, it will be null.
Secondly, I am selecting from this set, ensuring first that diff is null as this means the price is missing, and finally that token price attempts either doesn’t have an entry for attempting to get a price, or if it does than it has fewer than 5 attempts listed and the last attempt was more than a week ago.
The way I have laid this out results in essentially 3 of the same / similar SELECT statements within the WHERE clause, which feels hugely suboptimal…
How could I improve this approach?
WITH [transferDateDiff] AS ( SELECT [t1].[Id], [t1].[TransactionId], [t1].[From], [t1].[To], [t1].[Value], [t1].[Type], [t1].[ContractAddress], [t1].[TokenId], [t2].[Hash], [t2].[Timestamp], ABS(DATEDIFF(SECOND, [tp].[Timestamp], [t2].[Timestamp])) AS diff FROM [dbo].[Transfers] AS [t1] LEFT JOIN [dbo].[Transactions] AS [t2] ON [t1].[TransactionId] = [t2].[Id] LEFT JOIN ( SELECT * FROM [dbo].[TokenPrices] ) AS [tp] ON [tp].[ContractAddress] = [t1].[ContractAddress] AND [tp].[Timestamp] >= DATEADD(HOUR, - 3, [t2].[Timestamp]) AND [tp].[Timestamp] <= DATEADD(HOUR, 3, [t2].[Timestamp]) WHERE [t1].[Type] < 2 ) SELECT [tdd].[Id], [tdd].[TransactionId], [tdd].[From], [tdd].[To], [tdd].[Value], [tdd].[Type], [tdd].[ContractAddress], [tdd].[TokenId], [tdd].[Hash], [tdd].[Timestamp] FROM [transferDateDiff] AS tdd WHERE [tdd].[diff] IS NULL AND ( ( SELECT COUNT(*) FROM [dbo].[TokenPriceAttempts] tpa WHERE [tpa].[TransferId] = [tdd].[Id] ) = 0 OR ( ( SELECT COUNT(*) FROM [dbo].[TokenPriceAttempts] tpa WHERE [tpa].[TransferId] = [tdd].[Id] ) < 5 AND ( DATEDIFF(DAY, ( SELECT MAX([tpa].[Created]) FROM [dbo].[TokenPriceAttempts] tpa WHERE [tpa].[TransferId] = [tdd].[Id] ), CURRENT_TIMESTAMP ) >= 7 ) ) )
Advertisement
Answer
Here is an attempt to help simplify. I stripped out all the [brackets] that really are not required unless you are running into something like a reserved keyword, or columns with spaces in their name (bad to begin with).
Anyhow, your main query had 3 instances of a select per ID. To eliminate that, I did a LEFT JOIN to a subquery that pulls all transfers of type < 2 AND JOINS to the price attempts ONCE. This way, the result will have already pre-aggregated the count(*) and Max(Created) done ONCE for the same basis of transfers in question with your WITH CTE declaration. So you dont have to keep running the 3 queries each time, and you dont have to query the entire table of ALL transfers, just those with same underlying type < 2 condition. The result subquery alias “PQ” (preQuery)
This now simplifies the readability of the outer WHERE clause from the redundant counts per Id.
WITH transferDateDiff AS ( SELECT t1.Id, t1.TransactionId, t1.From, t1.To, t1.Value, t1.Type, t1.ContractAddress, t1.TokenId, t2.Hash, t2.Timestamp, ABS( DATEDIFF( SECOND, tp.Timestamp, t2.Timestamp )) AS diff FROM dbo.Transfers t1 LEFT JOIN dbo.Transactions t2 ON t1.TransactionId = t2.Id LEFT JOIN dbo.TokenPrices tp ON t1.ContractAddress = tp.ContractAddress AND tp.Timestamp >= DATEADD(HOUR, - 3, t2.Timestamp) AND tp.Timestamp <= DATEADD(HOUR, 3, t2.Timestamp) WHERE t1.Type < 2 ) SELECT tdd.Id, tdd.TransactionId, tdd.From, tdd.To, tdd.Value, tdd.Type, tdd.ContractAddress, tdd.TokenId, tdd.Hash, tdd.Timestamp FROM transferDateDiff tdd LEFT JOIN ( SELECT t1.Id, COUNT(*) Attempts, MAX(tpa.Created) MaxCreated FROM dbo.Transfers t1 JOIN dbo.TokenPriceAttempts tpa on t1.Id = tpa.TransferId WHERE t1.Type < 2 GROUP BY t1.Id ) PQ on tdd.Id = PQ.Id WHERE tdd.diff IS NULL AND ( PQ.Attempts IS NULL OR PQ.Attempts = 0 OR ( PQ.Attempts < 5 AND DATEDIFF(DAY, PQ.MaxCreated, CURRENT_TIMESTAMP ) >= 7 ) )
REVISED to remove the WITH CTE into a single query
SELECT t1.Id, t1.TransactionId, t1.From, t1.To, t1.Value, t1.Type, t1.ContractAddress, t1.TokenId, t2.Hash, t2.Timestamp FROM -- Now, this pre-query is left-joined to token price attempts -- so ALL Transfers of type < 2 are considered ( SELECT t1.Id, coalesce( COUNT(*), 0 ) Attempts, MAX(tpa.Created) MaxCreated FROM dbo.Transfers t1 LEFT JOIN dbo.TokenPriceAttempts tpa on t1.Id = tpa.TransferId WHERE t1.Type < 2 GROUP BY t1.Id ) PQ -- Now, we can just directly join to transfers for the rest JOIN dbo.Transfers t1 on PQ.Id = t1.Id -- and the rest from the WITH CTE construct LEFT JOIN dbo.Transactions t2 ON t1.TransactionId = t2.Id LEFT JOIN dbo.TokenPrices tp ON t1.ContractAddress = tp.ContractAddress AND tp.Timestamp >= DATEADD(HOUR, - 3, t2.Timestamp) AND tp.Timestamp <= DATEADD(HOUR, 3, t2.Timestamp) WHERE ABS( DATEDIFF( SECOND, tp.Timestamp, t2.Timestamp )) IS NULL AND ( PQ.Attempts = 0 OR ( PQ.Attempts < 5 AND DATEDIFF(DAY, PQ.MaxCreated, CURRENT_TIMESTAMP ) >= 7 ) )