In my SELECT statement below, I’m looking to get a COUNT of connections from a table, where connections are defined as 1 connection per unique transactionID.
Thus, in my COUNT, I want to check to make sure that a connection has not already been made (i.e. same transactionID but a lower primary key id).
I tried using COUNT(DISTINCT pc.TransactionID) but the problem was that the same transactionID can occur within each grouping (ThisDate, SiteName, UserName, Command), so it was counting some connections more than once.
Then I tried joining with the same table and used SUM instead of COUNT, which seemed to work, but that increased BytesTransferred value due to the extra records that were pulled in.
Ideas on the best way to handle this?
ProtocolCommands table
ProtocolCommandID, int, not null, primary key Time_Stamp Datetime SiteName nvarchar(50) null UserName nvarchar(50) null Command nvarchar(50) null BytesTransferred numeric(18,0) null TransactionID int not null
Select statement
SELECT CONVERT(date,pc.Time_stamp) As ThisDate, pc.SiteName, pc.UserName, pc.Command, SUM(CASE WHEN p.TransactionID IS NULL THEN 1 ELSE 0 END) As Connections, -- COUNT(DISTINCT pc.TransactionID) AS Connections, SUM(pc.BytesTransferred) AS DataTransferred FROM ProtocolCommands pc LEFT OUTER JOIN ProtocolCommands p ON p.TransactionID = pc.TransactionID AND p.ProtocolCommandID < pc.ProtocolCommandID WHERE pc.Time_Stamp >= '01/01/2019' GROUP BY CONVERT(date,pc.Time_stamp),pc.SiteName, pc.UserName, pc.Command
To give an example, here are some records:
Time_stamp SiteName UserName Command BytesTransferred TransactionID 2019-04-20 MySite Joe connect 2000 12345 2019-04-20 MySite Joe send 20000 12345 2019-04-21 MySite Joe retrieve 25952 12345 2019-04-21 MySite Joe send 30000 12345
When executing the SELECT with the COUNT(DISTINCT pc.TransactionID) and removing the inner join, the results are this (note incorrect DataTransferred for the last two rows):
ThisDate SiteName UserName Command Connections DataTransferred 2019-04-20 MySite Joe connect 1 2000 2019-04-20 MySite Joe send 0 20000 2019-04-21 MySite Joe retrieve 0 51904 2019-04-21 MySite Joe send 0 90000
When executing the SELECT with the SUM and adding the inner join, the results are this (note incorrect Connections – there should only be 1 total):
ThisDate SiteName UserName Command Connections DataTransferred 2019-04-20 MySite Joe connect 1 2000 2019-04-20 MySite Joe send 1 20000 2019-04-21 MySite Joe retrieve 1 25952 2019-04-21 MySite Joe send 1 30000
Expected result:
ThisDate SiteName UserName Command Connections DataTransferred 2019-04-20 MySite Joe connect 1 2000 2019-04-20 MySite Joe send 0 20000 2019-04-21 MySite Joe retrieve 0 25952 2019-04-21 MySite Joe send 0 30000
Advertisement
Answer
Not sure if I understood correctly, but you seem to want to detect the oldest occurance of a connection. You can do that using this query:
SELECT ROW_NUMBER() OVER (PARTITION BY TransactionId ORDER BY ProtocolCommandID ASC) as RowNum , ProtocolCommandId FROM ProtocolCommands
This will have RowNum = 1 for the record where connection appears for the very first time.
Joining that to your main table you can get the correct connection count
SELECT CONVERT(date,pc.Time_stamp) As ThisDate, pc.SiteName, pc.UserName, pc.Command, CASE WHEN first_conn.RowNum = 1 THEN SUM(first_conn.RowNum) ELSE 0 END AS Connection, SUM(pc.BytesTransferred) AS DataTransferred FROM ProtocolCommands pc LEFT OUTER JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY TransactionId ORDER BY ProtocolCommandID ASC) as RowNum , ProtocolCommandId FROM ProtocolCommands) as first_conn ON pc.ProtocolCommandID = first_conn.ProtocolCommandID GROUP BY CONVERT(date,pc.Time_stamp),pc.SiteName, pc.UserName, pc.Command, first_conn.RowNum