Skip to content
Advertisement

“Distinct” results in complex query

I have the following query, which returns all merchants in the database who have transactions between two given dates.

But, there is additional information I need to include in the query. I’ve written the enhanced query below, but it results in duplicated merchants. I know the query could be written better, but that’s the limit of my SQL knowledge.

I need to keep the same number of results as the above query.

This returns duplicated merchants. In the screenshot below, I have filtered on a single merchant to show the duplication. The first result set is from the original query. The second is from the updated query. “DISTINCT” does its job but I only want to see one merchant record.

enter image description here

There are two tables involved:

EDIT

I am trying to avoid having the dates inside a subquery

VARUN’S ANSWER

The updated query returns many duplicates as below.

MAX(CASE WHEN tt.bank_txt = 'IBA'

SQL SCRIPT

LATEST VERSION RESULTS

enter image description here

INDEXES

merchant_t

transaction_t

Advertisement

Answer

In your first query you have this clause:

In your second query you have this clause:

You are getting different number of rows in the second query because your GROUP BY is different. You need to keep it the same.


Here is one way to do it. I used CTE to make the query readable.

As you can see in the comments in the query, it returns the value of the trans_live column for the merchant’s most recent transaction. Most recent transaction of the merchant across all dates, regardless of the filter in the CTE. Same with the date of the last transaction that uses IBank (IBA). It returns the last transaction of the merchant across all dates, regardless of the filter in the CTE.

It is not clear from the question if this is what you want, though. Please clarify.

For this query to work efficiently, I would recommend to create certain indexes. You already have an index IX_status on the merchant_t table. This is good.

Those indexes that you have on the transaction_t table are not the best for this kind of query. You existing indexes IX_transaction_dt, IX_trans_live, IX_bank_txt are not useful for this query. Even IX_merchant_id by itself is not really useful, especially if you replace it with a composite index on (merchant_id, transaction_dt), as I’ve shown below.

I hope you do have a clustered primary key on id in transaction_t table. Similar to the primary key in the merchant_t table. If not, I’d create it.

Then, for efficient join between merchant_t and transaction_t tables and for efficient retrieval of the latest trans_live we need the following index:

The order of columns in this index is important.

You have an index on just transaction_dt. This query might use it, but it would be not as efficient as on (merchant_id, transaction_dt). Your current index is useful if you have queries that filter by transaction date without looking at the merchant_id. If you don’t have such queries, you’d better drop it. Having too many indexes “just in case” may be a problem for optimizer, and it slows down updates and inserts.

For efficient retrieval of the last last_IBA_transaction_dt we’ll need this index:

Again, the order of columns in this index is important.


If you don’t want to create two indexes specifically for this query, you can try just one index, which is great for getting trans_live and should somewhat help with last_IBA_transaction_dt.

You can try and measure performance with the first two indexes and then just with the third one and compare.


By the way, if there are merchants that don’t have any transactions within the given range of dates, then the query will return a row for these merchants. The COUNT in your original query will return 1 for these merchants. Probably this is not what you want.

To return 0 count for these merchants, the COUNT function should be COUNT(tt.id). I have made these changes in the code above.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement