The data I am working with
Consider the following 2 database tables:
CREATE TABLE [dbo].[Contact]( [ID] [int] IDENTITY(1,1) NOT NULL, [Contact_UID] [uniqueidentifier] NOT NULL CONSTRAINT [DF_Contact_Contact_UID] DEFAULT (newsequentialid()), [Name] [nvarchar](255) NOT NULL, [ContactStatus] [nvarchar](255) NOT NULL) CREATE TABLE [dbo].[Contact_Log]( [ID] [int] IDENTITY(1,1) NOT NULL, [LogDate] [datetimeoffset](7) NOT NULL CONSTRAINT [DF_Contact_Log_LogDate] DEFAULT (sysdatetimeoffset()), [Contact_UID] [uniqueidentifier] NOT NULL CONSTRAINT [DF_Contact_Log_Contact_UID] DEFAULT (newsequentialid()), [Name] [nvarchar](255) NOT NULL, [ContactStatus] [nvarchar](255) NOT NULL)
The Contact table is the main table for contact records. It stores the name of the contact and the status (e.g. “Alive”, “Dead” or whatever).
The Contact_Log table stores all changes made to the Contact table.
So here is some sample data:
Contact:
+----+--------------------------------------+------+---------------+ | ID | Contact_UID | Name | ContactStatus | +----+--------------------------------------+------+---------------+ | 1 | 62918AC1-1C6C-4DEB-B7F8-5D5EF913F667 | John | Dead | +----+--------------------------------------+------+---------------+ | 2 | F7844037-2FF5-47B9-874D-C0920E7DC092 | Jane | Alive | +----+--------------------------------------+------+---------------+
Contact_Log:
+----+--------------------------------------+------+---------------+------------+ | ID | Contact_UID | Name | ContactStatus | LogDate | +----+--------------------------------------+------+---------------+------------+ | 1 | 62918AC1-1C6C-4DEB-B7F8-5D5EF913F667 | John | Alive | 2019-01-01 | +----+--------------------------------------+------+---------------+------------+ | 2 | 62918AC1-1C6C-4DEB-B7F8-5D5EF913F667 | John | Dead | 2019-01-02 | +----+--------------------------------------+------+---------------+------------+ | 3 | 62918AC1-1C6C-4DEB-B7F8-5D5EF913F667 | John | Dead | 2019-01-03 | +----+--------------------------------------+------+---------------+------------+ | 4 | F7844037-2FF5-47B9-874D-C0920E7DC092 | Jane | Alive | 2019-01-04 | +----+--------------------------------------+------+---------------+------------+
NOTE: At this point I haven’t added any indexes or anything like that on that tables.
Test Scenario
The above is just some sample data. The data I am testing on has the following row counts:
Contact: ~10,000 rows
Contact_Log: ~3,000,000 rows
I am currently using SQL Server 2008 R2 for testing. So a solution that is supported in that and later is preferred.
What I am trying to achieve
Basically I am trying to formulate a query that can tell me the LogDate
for when the ContactStatus
field was last changed, for a specific Contact_UID
, taken from the Contact_Log
table.
For example, if the record I was interested in was “John”, then the result should be “2019-01-02”. As this is the date that John’s ContactStatus last changed (i.e. it changed from “Alive” to “Dead”).
Ultimately, I want to put this query into a function. A function that can be called by passing in the Contact_UID and the name of the field I want to check. This function could then be called as part of a more general query. For example:
SELECT Name, MyFunction('62918AC1-1C6C-4DEB-B7F8-5D5EF913F667', 'ContactStatus') AS StatusLastChanged FROM Contact
What I have tried so far
Well, I have tried a few things, and although I can get the results I want. My attempts are really struggling with performance problems.
NOTE: Although I only really want a single datetimeoffset result. Some attempts include more data/fields simply to try and validate the data is accurate.
Attempt 1:
SELECT TOP(1) a.LogDate FROM Contact_Log AS a WHERE a.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667' AND a.ContactStatus <> ( SELECT TOP(1) b.ContactStatus FROM Contact_Log AS b WHERE b.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667' AND a.LogDate > b.LogDate ORDER BY b.LogDate DESC ) ORDER BY LogDate DESC
Problem 1: Too slow. I had to stop the query after nearly an hour of waiting with no results.
Attempt 2:
SELECT A.LogDate FROM (SELECT ROW_NUMBER() OVER (ORDER BY LogDate DESC, ID DESC) AS rnum, ID, LogDate, Contact_UID, ContactStatus FROM Contact_Log) A LEFT JOIN (SELECT ROW_NUMBER() OVER (ORDER BY LogDate DESC, ID DESC) AS rnum, ID, LogDate, Contact_UID, ContactStatus FROM Contact_Log) B ON A.rnum = B.rnum-1 WHERE (B.rnum IS NULL OR (A.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667' AND B.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667' AND A.ContactStatus != B.ContactStatus)) ORDER BY A.rnum
Problem 2: This works and gives me the correct data set. However it takes 6 seconds which is too slow. Remember it will need to work as a function in a more general query (with ~10,000 rows).
Attempt 3: Now this is basically the same as attempt 2, expect I tried to apply TOP(1)
so that I can get the result that I actually want.
SELECT TOP(1) A.LogDate FROM (SELECT ROW_NUMBER() OVER (ORDER BY LogDate DESC, ID DESC) AS rnum, ID, LogDate, Contact_UID, ContactStatus FROM Contact_Log) A LEFT JOIN (SELECT ROW_NUMBER() OVER (ORDER BY LogDate DESC, ID DESC) AS rnum, ID, LogDate, Contact_UID, ContactStatus FROM Contact_Log) B ON A.rnum = B.rnum-1 WHERE (B.rnum IS NULL OR (A.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667' AND B.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667' AND A.ContactStatus != B.ContactStatus)) ORDER BY A.rnum
Problem 3: To my surprise, this takes a lot longer than attempt 2, even though all I did was add TOP(1)
at the start. This took over 5 minutes so I stopped the query and gave up.
Question
How can I do what I want in “What I am trying to achieve”, but with a reasonable amount of performance? (I would be happy getting it under 1 second at this stage).
Remember, I just want a single datetimeoffset as the result so it can be used in a function.
So far I have no specific indexes created. I am happy to consider suggestions for those as a suitable answer, if no improvement to the query are possible. Or any changes to the schema that is appropriate.
Bottom Line
I am looking for a query that will produce 1 result, with 1 datetimeoffset field. It needs to take less than 1 second to run.
Advertisement
Answer
You want to select the smallest date right after the highest date that does not equal the current ContactStatus. That would be something like this:
select min(LogDate) from Contact_Log where Contact_UID='62918AC1-1C6C-4DEB-B7F8-5D5EF913F667' and ContactStatus = ( select top 1 ContactStatus from Contact_Log where Contact_UID='62918AC1-1C6C-4DEB-B7F8-5D5EF913F667' order by Log_Date desc ) and LogDate > ( select max(LogDate) from Contact_Log where Contact_UID='62918AC1-1C6C-4DEB-B7F8-5D5EF913F667' and ContactStatus != ( select top 1 ContactStatus from Contact_Log where Contact_UID='62918AC1-1C6C-4DEB-B7F8-5D5EF913F667' order by Log_Date desc ) );