Skip to content
Advertisement

High performance query to get datetime when value was last changed

The data I am working with

Consider the following 2 database tables:

The Contact table is the main table for contact records. It stores the name of the contact and the status (e.g. “Alive”, “Dead” or whatever).

The Contact_Log table stores all changes made to the Contact table.

So here is some sample data:

Contact:

Contact_Log:

NOTE: At this point I haven’t added any indexes or anything like that on that tables.

Test Scenario

The above is just some sample data. The data I am testing on has the following row counts:

Contact: ~10,000 rows

Contact_Log: ~3,000,000 rows

I am currently using SQL Server 2008 R2 for testing. So a solution that is supported in that and later is preferred.

What I am trying to achieve

Basically I am trying to formulate a query that can tell me the LogDate for when the ContactStatus field was last changed, for a specific Contact_UID, taken from the Contact_Log table.

For example, if the record I was interested in was “John”, then the result should be “2019-01-02”. As this is the date that John’s ContactStatus last changed (i.e. it changed from “Alive” to “Dead”).

Ultimately, I want to put this query into a function. A function that can be called by passing in the Contact_UID and the name of the field I want to check. This function could then be called as part of a more general query. For example:

What I have tried so far

Well, I have tried a few things, and although I can get the results I want. My attempts are really struggling with performance problems.

NOTE: Although I only really want a single datetimeoffset result. Some attempts include more data/fields simply to try and validate the data is accurate.

Attempt 1:

Problem 1: Too slow. I had to stop the query after nearly an hour of waiting with no results.

Attempt 2:

Problem 2: This works and gives me the correct data set. However it takes 6 seconds which is too slow. Remember it will need to work as a function in a more general query (with ~10,000 rows).

Attempt 3: Now this is basically the same as attempt 2, expect I tried to apply TOP(1) so that I can get the result that I actually want.

Problem 3: To my surprise, this takes a lot longer than attempt 2, even though all I did was add TOP(1) at the start. This took over 5 minutes so I stopped the query and gave up.

Question

How can I do what I want in “What I am trying to achieve”, but with a reasonable amount of performance? (I would be happy getting it under 1 second at this stage).

Remember, I just want a single datetimeoffset as the result so it can be used in a function.

So far I have no specific indexes created. I am happy to consider suggestions for those as a suitable answer, if no improvement to the query are possible. Or any changes to the schema that is appropriate.

Bottom Line

I am looking for a query that will produce 1 result, with 1 datetimeoffset field. It needs to take less than 1 second to run.

Advertisement

Answer

You want to select the smallest date right after the highest date that does not equal the current ContactStatus. That would be something like this:

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement