Skip to content
Advertisement

Postgresql query: update status of limit number of records based on group size

I have a postgresql table contains a list of email addresses. The table has three columns, Email, EmailServer (e.g., gmail.com, outlook.com, msn.com, and yahoo.com.ca etc.), and Valid (boolean).

Now, I want to group those emails by EmailServer and then update the first 3 records of each large group (count >=6) as Valid = true while leaving the rest of each group as Valid = false.

I failed to get the wanted output by below query:

UPDATE public."EmailContacts"   
SET "Valid"=true
WHERE "EmailServer" IN (
    SELECT "EmailServer"
    FROM public."EmailContacts"
    GROUP by "EmailServer"
    HAVING count(*) >=6
    LIMIT 5)

Please help to modify so as to get the expected results. Would be greatly appreciated for any kind of your help!

Advertisement

Answer

WITH major_servers AS (
    SELECT email_server
    FROM email_address
    GROUP by email_server
    HAVING count(*) >=6
),
enumerated_emails AS (
    SELECT email,
           email_server,
           row_number() OVER (PARTITION BY email_server ORDER BY email) AS row_number --TODO:: ORDER BY email - attention
    FROM email_address
    WHERE email_server IN (SELECT email_server FROM major_servers)
)
UPDATE email_address
SET valid = true
WHERE email IN (SELECT email
                FROM enumerated_emails ee
                WHERE ee.row_number <= 3);
  • The first query major_servers finds major groups where more than 5 email servers exist.
  • The second query enumerated_emails enumerates emails by their natural order (see a TODO comment, I think you should choose another ORDER BY criteria) which belong to major groups using window function row_number().
  • The last query updates the first 3 rows in each major server group.

Find the sql-fiddle here.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement