Skip to content
Advertisement

SQL efficient way to match ANY in a large table

I am joining a small table and a very large table and want to return a distinct item if ANY items match. The table is so large that it takes hours for something that I think should take seconds.

The problem is that I am “iterating” over every single entry in the second table. I want to be able to “break” once a condition is met and return that value instead of continuing over every single account.

In the code below, I am finding every single row for each name that I am joining, even though I am only returning the DISTINCT example.name and don’t care about every row. How can I return DISTINCT.name after finding the first instance of new_ex.data = ... after performing the INNER JOIN?

SELECT DISTINCT example.name
FROM (
    SELECT DISTINCT ex.user AS name
    FROM exampleTable ex
    WHERE ex.timestamp >= '2022-01-01'
    AND ex.group = 'test'
    AND new_ex.data = '123'
) AS example_users
INNER JOIN exampleTable new_ex on example_users.name = new_ex.user
AND new_ex.timestamp >= '2022-01-01'
AND (
    OR new_ex.data = 'abc'
    OR new_ex.data = 'def'
    OR new_ex.data = 'ghi'
    -- ~10 more of these OR statements
)

Advertisement

Answer

Without seeing the data it’s hard to be sure this can’t be simplified further, but I think you can at least boil this down to

select distinct ex.user as name
from exampleTable ex
where ex.timestamp >= '2022-01-01'
and ex.group = 'test'
AND new_ex.data = '123'
and exists (
    select 1
    from exampleTable new_ex 
    where new_ex.user=ex.name
        and new_ex.data = '123'
        and new_ex.timestamp >= '2022-01-01'
        and new_ex.data in ('abc','def','ghi'...)
)
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement