In my SELECT statement below, I’m looking to get a COUNT of connections from a table, where connections are defined as 1 connection per unique transactionID.
Thus, in my COUNT, I want to check to make sure that a connection has not already been made (i.e. same transactionID but a lower primary key id).
I tried using COUNT(DISTINCT pc.TransactionID) but the problem was that the same transactionID can occur within each grouping (ThisDate, SiteName, UserName, Command), so it was counting some connections more than once.
Then I tried joining with the same table and used SUM instead of COUNT, which seemed to work, but that increased BytesTransferred value due to the extra records that were pulled in.
Ideas on the best way to handle this?
ProtocolCommands table
ProtocolCommandID, int, not null, primary key
Time_Stamp Datetime
SiteName nvarchar(50) null
UserName nvarchar(50) null
Command nvarchar(50) null
BytesTransferred numeric(18,0) null
TransactionID int not null
Select statement
SELECT
CONVERT(date,pc.Time_stamp) As ThisDate,
pc.SiteName,
pc.UserName,
pc.Command,
SUM(CASE WHEN p.TransactionID IS NULL THEN 1 ELSE 0 END) As Connections,
-- COUNT(DISTINCT pc.TransactionID) AS Connections,
SUM(pc.BytesTransferred) AS DataTransferred
FROM ProtocolCommands pc
LEFT OUTER JOIN ProtocolCommands p ON p.TransactionID = pc.TransactionID AND p.ProtocolCommandID < pc.ProtocolCommandID
WHERE pc.Time_Stamp >= '01/01/2019'
GROUP BY CONVERT(date,pc.Time_stamp),pc.SiteName, pc.UserName, pc.Command
To give an example, here are some records:
Time_stamp SiteName UserName Command BytesTransferred TransactionID
2019-04-20 MySite Joe connect 2000 12345
2019-04-20 MySite Joe send 20000 12345
2019-04-21 MySite Joe retrieve 25952 12345
2019-04-21 MySite Joe send 30000 12345
When executing the SELECT with the COUNT(DISTINCT pc.TransactionID) and removing the inner join, the results are this (note incorrect DataTransferred for the last two rows):
ThisDate SiteName UserName Command Connections DataTransferred
2019-04-20 MySite Joe connect 1 2000
2019-04-20 MySite Joe send 0 20000
2019-04-21 MySite Joe retrieve 0 51904
2019-04-21 MySite Joe send 0 90000
When executing the SELECT with the SUM and adding the inner join, the results are this (note incorrect Connections – there should only be 1 total):
ThisDate SiteName UserName Command Connections DataTransferred
2019-04-20 MySite Joe connect 1 2000
2019-04-20 MySite Joe send 1 20000
2019-04-21 MySite Joe retrieve 1 25952
2019-04-21 MySite Joe send 1 30000
Expected result:
ThisDate SiteName UserName Command Connections DataTransferred
2019-04-20 MySite Joe connect 1 2000
2019-04-20 MySite Joe send 0 20000
2019-04-21 MySite Joe retrieve 0 25952
2019-04-21 MySite Joe send 0 30000
Advertisement
Answer
Not sure if I understood correctly, but you seem to want to detect the oldest occurance of a connection. You can do that using this query:
SELECT ROW_NUMBER() OVER (PARTITION BY TransactionId ORDER BY ProtocolCommandID ASC) as RowNum
, ProtocolCommandId
FROM ProtocolCommands
This will have RowNum = 1 for the record where connection appears for the very first time.
Joining that to your main table you can get the correct connection count
SELECT
CONVERT(date,pc.Time_stamp) As ThisDate,
pc.SiteName,
pc.UserName,
pc.Command,
CASE WHEN first_conn.RowNum = 1 THEN SUM(first_conn.RowNum) ELSE 0 END AS Connection,
SUM(pc.BytesTransferred) AS DataTransferred
FROM ProtocolCommands pc
LEFT OUTER JOIN
(SELECT ROW_NUMBER() OVER (PARTITION BY TransactionId ORDER BY ProtocolCommandID ASC) as RowNum
, ProtocolCommandId
FROM ProtocolCommands) as first_conn ON pc.ProtocolCommandID = first_conn.ProtocolCommandID
GROUP BY CONVERT(date,pc.Time_stamp),pc.SiteName, pc.UserName, pc.Command, first_conn.RowNum