Skip to content
Advertisement

Group by question in SQL Server, migration from MySQL

Failed finding a solution to my problem, would love your help.

~~ Post has been edited to have only one question ~~-

Group by one query while selecting multiple columns. In MySQL you can simply group by whatever you want, and it will still select all of them, so if for example I wanted to select the newest 100 transactions, grouped by Email (only get the last transaction of a single email)

In MySQL I would do that:

SELECT * FROM db.transactionlog
group by Email
order by TransactionLogId desc
LIMIT 100;

In SQL Server its not possible, googling a bit suggested to specify each column that I want to have with an aggregate as a hack, that couldn’t cause a mix of values (mixing columns between the grouped rows)?

For example:

SELECT TOP(100)
   Email,
   MAX(ResultCode) as 'ResultCode',
   MAX(Amount) as 'Amount',
   MAX(TransactionLogId) as 'TransactionLogId'
  FROM [db].[dbo].[transactionlog]

  group by Email
  order by TransactionLogId desc

TransactionLogId is the primarykey which is identity , ordering by it to achieve the last inserted. Just want to know that the ResultCode and Amount that I’ll get doing such query will be of the last inserted row, and not the highest of the grouped rows or w/e.

~Edit~ Sample data –

row1:

Email : test@email.com
ResultCode : 100
Amount : 27
TransactionLogId : 1

row2:

Email: test@email.com
ResultCode:50
Amount: 10
TransactionLogId: 2

Using the sample data above, my goal is to get the row details of TransactionLogId = 2. but what actual happens is that I get a mixed values of the two, as I do get transactionLogId = 2, but the resultcode and amount of the first row. How do I avoid that?

Thanks.

Advertisement

Answer

You should first find out which is the latest transaction log by each email, then join back against the same table to retrieve the full record:

;WITH MaxTransactionByEmail AS
(
    SELECT
        Email,
        MAX(TransactionLogId) as LatestTransactionLogId
    FROM 
        [db].[dbo].[transactionlog]
    group by 
        Email
)
SELECT
    T.*
FROM
    [db].[dbo].[transactionlog] AS T
    INNER JOIN MaxTransactionByEmail AS M ON T.TransactionLogId = M.LatestTransactionLogId

You are currently getting mixed results because your aggregate functions like MAX() is considering all rows that correspond to a particular value of Email. So the MAX() value for the Amount column between values 10 and 27 is 27, even if the transaction log id is lower.

Another solution is using a ROW_NUMBER() window function to get a row-ranking by each Email, then just picking the first row:

;WITH TransactionsRanking AS
(
    SELECT
        T.*,
        MostRecentTransactionLogRanking = ROW_NUMBER() OVER (
            PARTITION BY 
                T.Email                     -- Start a different ranking for each different value of Email
            ORDER BY 
                T.TransactionLogId DESC)    -- Order the rows by the TransactionLogID descending
    FROM 
        [db].[dbo].[transactionlog] AS T
)
SELECT
    T.*
FROM
    TransactionsRanking AS T
WHERE
    T.MostRecentTransactionLogRanking = 1
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement