Failed finding a solution to my problem, would love your help.
~~ Post has been edited to have only one question ~~-
Group by one query while selecting multiple columns.
In MySQL
you can simply group by whatever you want, and it will still select all of them, so if for example I wanted to select the newest 100 transactions, grouped by Email (only get the last transaction of a single email)
In MySQL I would do that:
SELECT * FROM db.transactionlog group by Email order by TransactionLogId desc LIMIT 100;
In SQL Server
its not possible, googling a bit suggested to specify each column that I want to have with an aggregate as a hack, that couldn’t cause a mix of values (mixing columns between the grouped rows)?
For example:
SELECT TOP(100) Email, MAX(ResultCode) as 'ResultCode', MAX(Amount) as 'Amount', MAX(TransactionLogId) as 'TransactionLogId' FROM [db].[dbo].[transactionlog] group by Email order by TransactionLogId desc
TransactionLogId
is the primarykey
which is identity , ordering by it to achieve the last inserted.
Just want to know that the ResultCode
and Amount
that I’ll get doing such query will be of the last inserted row, and not the highest of the grouped rows or w/e.
~Edit~ Sample data –
row1:
Email : test@email.com ResultCode : 100 Amount : 27 TransactionLogId : 1
row2:
Email: test@email.com ResultCode:50 Amount: 10 TransactionLogId: 2
Using the sample data above, my goal is to get the row details of
TransactionLogId
= 2.
but what actual happens is that I get a mixed values of the two, as I do get transactionLogId = 2, but the resultcode and amount of the first row.
How do I avoid that?
Thanks.
Advertisement
Answer
You should first find out which is the latest transaction log by each email, then join back against the same table to retrieve the full record:
;WITH MaxTransactionByEmail AS ( SELECT Email, MAX(TransactionLogId) as LatestTransactionLogId FROM [db].[dbo].[transactionlog] group by Email ) SELECT T.* FROM [db].[dbo].[transactionlog] AS T INNER JOIN MaxTransactionByEmail AS M ON T.TransactionLogId = M.LatestTransactionLogId
You are currently getting mixed results because your aggregate functions like MAX()
is considering all rows that correspond to a particular value of Email
. So the MAX()
value for the Amount
column between values 10
and 27
is 27
, even if the transaction log id is lower.
Another solution is using a ROW_NUMBER()
window function to get a row-ranking by each Email
, then just picking the first row:
;WITH TransactionsRanking AS ( SELECT T.*, MostRecentTransactionLogRanking = ROW_NUMBER() OVER ( PARTITION BY T.Email -- Start a different ranking for each different value of Email ORDER BY T.TransactionLogId DESC) -- Order the rows by the TransactionLogID descending FROM [db].[dbo].[transactionlog] AS T ) SELECT T.* FROM TransactionsRanking AS T WHERE T.MostRecentTransactionLogRanking = 1