I am learning more raw SQL after coming from years of Rails and other ORMs, so have quite a ways to learn how to do complex queries efficiently. What I’m wondering here is _how to find all users which are missing some fields and associations, and returning which fields/associations they are missing.
I have a rough idea of how to write this in SQL but not exact (for PostgreSQL).
I have something like this data model, a table of users, a table of “social media links”, and an association mapping the link to the user, so a user can have many social media links, but there could be more than one user associated with one link (i.e. organizations):
CREATE TABLE users ( id INT GENERATED BY DEFAULT AS IDENTITY, slug VARCHAR(255) NOT NULL, name VARCHAR(255), description TEXT, PRIMARY KEY (id) ) CREATE TABLE sociallinks ( id INT GENERATED BY DEFAULT AS IDENTITY, type VARCHAR(255), value TEXT, PRIMARY KEY (id) ) CREATE TABLE usersociallinks ( id INT GENERATED BY DEFAULT AS IDENTITY, user_id INTEGER REFERENCES users; sociallink_id INTEGER REFERENCES sociallinks, PRIMARY KEY (id) )
The question is, how do you perform the query (with this pseudocode):
select from users join on usersociallinks.user_id = users.id join on sociallinks.id = usersociallinks.id where name = null or description = null or missing linkedin? (sociallinks.type == linkedin) or missing facebook? (sociallinks.type == facebook) return slug from users table return has_name = false if name is null from users table return has_description = false if description is null from users table return has_linkedin = false if linkedin is null from sociallinks table return has_facebook = false if facebook is null from sociallinks table
In natural language, “select the users which either don’t have a name or a description, or are missing a linkedin or facebook link (sociallinks.value), and return what fields they are missing”.
I can do this the naive, long and convoluted way or querying one thing at a time, but I’m wondering how to do this efficiently, in possibly just one query (or as few queries as possible).
SELECT * FROM users WHERE name IS NULL OR description IS NULL LIMIT 1
Fetch those, then do:
const record = await knex.raw(SQL) const output = {} if (!record.name) output.hasName = false if (!record.description) output.hasDescription = false return output
Then the next query is more complex, but I would do one at a time.
How do you do it somewhat efficiently and in as few queries as possible? Fetching with a limit of say 100 users per query.
Input data would be:
users: id,slug,name,description 1,foo,Foo,Im a description 2,bar,,Im a description too 3,baz,, 4,hello,Hello, 5,world,, 6,food,Foo,Im a descriptiond 7,bard,,Im a description tood 8,bazd,asdf,fdsa 9,hellod,, 10,worldd,,A worldd description sociallinks: id,type,value 1,facebook,foo 2,facebook,bar 3,facebook,baz 4,facebook,hello 5,facebook,world 6,linkedin,foo 7,linkedin,bar 8,linkedin,baz 9,linkedin,hello 10,linkedin,world usersociallinks: id,user_id,sociallink_id 1,1,1 2,2,2 3,2,6 4,3,7 5,5,3 6,8,4 7,8,8 8,9,9
Output data would be:
user_id,slug,has_name,has_description,has_linkedin,has_facebook 1,foo,true,true,false,true 2,bar,false,true,true,true 3,baz,false,false,true,false 4,hello,true,false,false,false 5,world,false,false,false,true 6,food,true,true,false,false 7,bard,false,true,false,false // 8 has everything so it is missing 9,hellod,false,false,true,false 10,worldd,false,true,false,false
Advertisement
Answer
EXISTS()
to the rescue:
SELECT u.id, u.slug , (u.name > '' ) AS has_name , (u.description> '' ) AS has_description , EXISTS(SELECT 1 FROM usersociallinks sl JOIN sociallinks s ON s.id = usl.sociallink_i WHERE sl.user_id = u.id AND s.type = 'facebook') AS has_facebook , EXISTS(SELECT 1 FROM usersociallinks sl JOIN sociallinks s ON s.id = usl.sociallink_i WHERE sl.user_id = u.id AND s.type = 'linkedin') AS has_linkedin FROM users u ORDER BY u.id ;