Skip to content
Advertisement

How to find consecutive values in column

I have been given this task to try to detect some duplicate records in a table with a large volume of rows. The table comprises 2 joined tables. So to begin with I have:

Now we have a table to work from. The duplicate records have unique event_number_id, the reamining fields will contain identical data, so something like this:

So, this is an example of a duplicate record. There are consecutive event_number_id’s and all the remaining columns have identical information. We believe that our system has been creating duplicate events for some time now (this isn’t supposed to happen), so I want to query the whole joined table and find anything that has rows that have exactly the same data, but different and consecutive event numbers.

So far, I managed to make a simple query that shows me any rows that have identical information, excluding the event_number_id column:

which returns:

The problem is, sometimes there are rows that have identical data, but could be valid, so the best we can do at this point is find any of these matching rows that have consecutive event_number_id’s. This is where I am hung up. Is there a way to pull out only the rows that contain these consecutive numbers?

Advertisement

Answer

Here’s an approach based on a join of the data sets:

The efficiency will depend on a few factors, such as the efficiency of scanning the base tables and the size of the data sets.

It would be interesting to see a comparison of the execution plans of this method and the analytics-function methods. This common table expression-based join ought to be very efficient as it depends on hash joins, which have almost no cost as long as they stay in memory (a big question mark over that).

I’d be inclined to go for the analytic functions if the event_number_id’s were not consecutive — if there might be gaps, for instance, which would be harder to implement as a join. Given that one of them is the other incremented, I think it’s worth taking a punt on a join.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement