I’m trying to get the top 5 comments by score for each Reddit post. I only want to retrieve the top N comments by score for each post title.
Example: I only would want comment 1 and 2 for each post.
Post 1 | Comment 1 | Comment Score 10 Post 1 | Comment 2 | Comment Score 9 Post 1 | Comment 3 | Comment Score 8 Post 2 | Comment 1 | Comment Score 10 Post 2 | Comment 2 | Comment Score 9 Post 2 | Comment 3 | Comment Score 8
StandardSQL
SELECT 
    posts.title, 
    posts.url, 
    posts.score AS postsscore, 
    DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH), 
    SUBSTR(comments.body, 0, 80), 
    comments.score AS commentsscore, 
    comments.id
FROM 
    `fh-bigquery.reddit_posts.2015*` AS posts
    JOIN `fh-bigquery.reddit_comments.2015*` AS comments
        ON posts.id = SUBSTR(comments.link_id, 4)
WHERE 
    posts.subreddit = 'Showerthoughts' 
    AND posts.score >100 
    AND comments.score >100
ORDER BY 
    posts.score DESC, 
    posts.title DESC, 
    comments.score DESC
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(pos) FROM (
  SELECT 
    posts.title, 
    posts.url, 
    posts.score AS postsscore, 
    DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH), 
    SUBSTR(comments.body, 0, 80), 
    comments.score AS commentsscore, 
    comments.id,
    ROW_NUMBER() OVER(PARTITION BY posts.url ORDER BY comments.score DESC) pos
  FROM `fh-bigquery.reddit_posts.2015*` AS posts
  JOIN `fh-bigquery.reddit_comments.2015*` AS comments
  ON posts.id = SUBSTR(comments.link_id, 4)
  WHERE posts.subreddit = 'Showerthoughts' 
  AND posts.score >100 
  AND comments.score >100
) 
WHERE pos < 3
ORDER BY postsscore DESC, title DESC, commentsscore DESC