Skip to content
Advertisement

How to query for missing fields and associations and return what’s missing in PostgreSQL?

I am learning more raw SQL after coming from years of Rails and other ORMs, so have quite a ways to learn how to do complex queries efficiently. What I’m wondering here is _how to find all users which are missing some fields and associations, and returning which fields/associations they are missing.

I have a rough idea of how to write this in SQL but not exact (for PostgreSQL).

I have something like this data model, a table of users, a table of “social media links”, and an association mapping the link to the user, so a user can have many social media links, but there could be more than one user associated with one link (i.e. organizations):

CREATE TABLE users (
  id INT GENERATED BY DEFAULT AS IDENTITY,
  slug VARCHAR(255) NOT NULL,
  name VARCHAR(255),
  description TEXT,
  PRIMARY KEY (id)
)

CREATE TABLE sociallinks (
  id INT GENERATED BY DEFAULT AS IDENTITY,
  type VARCHAR(255),
  value TEXT,
  PRIMARY KEY (id)
)

CREATE TABLE usersociallinks (
  id INT GENERATED BY DEFAULT AS IDENTITY,
  user_id INTEGER REFERENCES users;
  sociallink_id INTEGER REFERENCES sociallinks,
  PRIMARY KEY (id)
)

The question is, how do you perform the query (with this pseudocode):

select from users
join on usersociallinks.user_id = users.id
join on sociallinks.id = usersociallinks.id
where name = null
  or description = null
  or missing linkedin? (sociallinks.type == linkedin)
  or missing facebook? (sociallinks.type == facebook)
return slug from users table
return has_name = false if name is null from users table
return has_description = false if description is null from users table
return has_linkedin = false if linkedin is null from sociallinks table
return has_facebook = false if facebook is null from sociallinks table

In natural language, “select the users which either don’t have a name or a description, or are missing a linkedin or facebook link (sociallinks.value), and return what fields they are missing”.

I can do this the naive, long and convoluted way or querying one thing at a time, but I’m wondering how to do this efficiently, in possibly just one query (or as few queries as possible).

SELECT * FROM users
WHERE name IS NULL
  OR description IS NULL
LIMIT 1

Fetch those, then do:

const record = await knex.raw(SQL)
const output = {}
if (!record.name) output.hasName = false
if (!record.description) output.hasDescription = false
return output

Then the next query is more complex, but I would do one at a time.

How do you do it somewhat efficiently and in as few queries as possible? Fetching with a limit of say 100 users per query.

Input data would be:

users:
id,slug,name,description
1,foo,Foo,Im a description
2,bar,,Im a description too
3,baz,,
4,hello,Hello,
5,world,,
6,food,Foo,Im a descriptiond
7,bard,,Im a description tood
8,bazd,asdf,fdsa
9,hellod,,
10,worldd,,A worldd description

sociallinks:
id,type,value
1,facebook,foo
2,facebook,bar
3,facebook,baz
4,facebook,hello
5,facebook,world
6,linkedin,foo
7,linkedin,bar
8,linkedin,baz
9,linkedin,hello
10,linkedin,world

usersociallinks:
id,user_id,sociallink_id
1,1,1
2,2,2
3,2,6
4,3,7
5,5,3
6,8,4
7,8,8
8,9,9

Output data would be:

user_id,slug,has_name,has_description,has_linkedin,has_facebook
1,foo,true,true,false,true
2,bar,false,true,true,true
3,baz,false,false,true,false
4,hello,true,false,false,false
5,world,false,false,false,true
6,food,true,true,false,false
7,bard,false,true,false,false
// 8 has everything so it is missing
9,hellod,false,false,true,false
10,worldd,false,true,false,false

Advertisement

Answer

EXISTS() to the rescue:


SELECT u.id, u.slug
        , (u.name > '' ) AS has_name 
        , (u.description> '' ) AS has_description
        , EXISTS(SELECT 1 FROM usersociallinks sl JOIN sociallinks s ON s.id = usl.sociallink_i
                 WHERE sl.user_id = u.id AND s.type = 'facebook') AS has_facebook
        , EXISTS(SELECT 1 FROM usersociallinks sl JOIN sociallinks s ON s.id = usl.sociallink_i
                 WHERE sl.user_id = u.id AND s.type = 'linkedin') AS has_linkedin
FROM users u
ORDER BY u.id
        ;
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement