Skip to content
Advertisement

Getting rows with count > 1 without executing two different queries

I have a table that looks like this:

ITE_ITEM_ID SIT_SITE_ID INVENTORY_ID ITE_VARIATION_ID STOCK_EAN STOCK_FULFILLMENT_ACTIVE
16302514 B WAE6496 62101793519 79098642 1
6210113 M GKVU072 [object Object] NULL 1
16021657 M YHQG635 60513515602 NULL 1
8449326 A ZZRV751 52555136800 NULL 1
1154338160 B VXWP565 NULL NULL 0
559568 M GYPZ201 32325593678 NULL 1
13255753 B PH63355 70388916917 NULL 1
7614543 M XOQO412 51698700618 NULL 1

I am trying to get the different ITE_VARIATION_ID that share the same STOCK_EAN and have a count > 1

I am currently doing this by splitting two statements:

create multiset volatile table eans_plus as (
SELECT STOCK_EAN, count(*) as count_ean
FROM my_table
group by 1
having count_ean > 1
) with data primary index (stock_ean) on commit preserve rows;

SELECT a.STOCK_EAN, a.ITE_VARIATION_ID
from my_table a inner join eans_plus b on a.stock_ean = b.stock_ean
;

however this takes a while to execute (>140 secs), since the table is very large, I was wondering if there is a more efficient way to do this, either by avoiding executing two queriees or adding an index. I am using alation on Teradata

Advertisement

Answer

You can combine into a single query easily enough, but the performance improvement from doing that is likely to be quite small:

WITH eans_plus AS (
SELECT STOCK_EAN, count(*) as count_ean
FROM my_table
WHERE STOCK_EAN IS NOT NULL
group by 1
having count_ean > 1
)
SELECT a.STOCK_EAN, a.ITE_VARIATION_ID
from my_table a inner join eans_plus b on a.stock_ean = b.stock_ean
;

You could also use a window function with QUALIFY; not sure this would be an improvement

SELECT STOCK_EAN, ITE_VARIATION_ID
FROM my_table a
WHERE STOCK_EAN IS NOT NULL
QUALIFY COUNT(*) OVER (PARTITION BY STOCK_EAN) > 1;

Making STOCK_EAN the Primary Index for my_table (if it isn’t already) could improve these queries, but you’d need to see how it impacts distribution and other usage of this table.

A single-table join index would likely improve performance for these queries, though table maintenance would see some performance degradation.

CREATE JOIN INDEX my_table_aji AS
SELECT STOCK_EAN, COUNT(*) as theCount FROM my_table
GROUP BY 1
PRIMARY INDEX (STOCK_EAN);

EDIT: add WHERE STOCK_EAN IS NOT NULL filtering

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement