I have a table that looks like this:
ITE_ITEM_ID | SIT_SITE_ID | INVENTORY_ID | ITE_VARIATION_ID | STOCK_EAN | STOCK_FULFILLMENT_ACTIVE |
---|---|---|---|---|---|
16302514 | B | WAE6496 | 62101793519 | 79098642 | 1 |
6210113 | M | GKVU072 | [object Object] | NULL | 1 |
16021657 | M | YHQG635 | 60513515602 | NULL | 1 |
8449326 | A | ZZRV751 | 52555136800 | NULL | 1 |
1154338160 | B | VXWP565 | NULL | NULL | 0 |
559568 | M | GYPZ201 | 32325593678 | NULL | 1 |
13255753 | B | PH63355 | 70388916917 | NULL | 1 |
7614543 | M | XOQO412 | 51698700618 | NULL | 1 |
I am trying to get the different ITE_VARIATION_ID
that share the same STOCK_EAN
and have a count > 1
I am currently doing this by splitting two statements:
create multiset volatile table eans_plus as ( SELECT STOCK_EAN, count(*) as count_ean FROM my_table group by 1 having count_ean > 1 ) with data primary index (stock_ean) on commit preserve rows; SELECT a.STOCK_EAN, a.ITE_VARIATION_ID from my_table a inner join eans_plus b on a.stock_ean = b.stock_ean ;
however this takes a while to execute (>140 secs), since the table is very large, I was wondering if there is a more efficient way to do this, either by avoiding executing two queriees or adding an index. I am using alation on Teradata
Advertisement
Answer
You can combine into a single query easily enough, but the performance improvement from doing that is likely to be quite small:
WITH eans_plus AS ( SELECT STOCK_EAN, count(*) as count_ean FROM my_table WHERE STOCK_EAN IS NOT NULL group by 1 having count_ean > 1 ) SELECT a.STOCK_EAN, a.ITE_VARIATION_ID from my_table a inner join eans_plus b on a.stock_ean = b.stock_ean ;
You could also use a window function with QUALIFY; not sure this would be an improvement
SELECT STOCK_EAN, ITE_VARIATION_ID FROM my_table a WHERE STOCK_EAN IS NOT NULL QUALIFY COUNT(*) OVER (PARTITION BY STOCK_EAN) > 1;
Making STOCK_EAN
the Primary Index for my_table
(if it isn’t already) could improve these queries, but you’d need to see how it impacts distribution and other usage of this table.
A single-table join index would likely improve performance for these queries, though table maintenance would see some performance degradation.
CREATE JOIN INDEX my_table_aji AS SELECT STOCK_EAN, COUNT(*) as theCount FROM my_table GROUP BY 1 PRIMARY INDEX (STOCK_EAN);
EDIT: add WHERE STOCK_EAN IS NOT NULL
filtering