Skip to content
Advertisement

Unnecessary blank row in SQL query

Aim was to write a SQL query to return ride details from a public database.

While going through a data analytics course, I stumbled across this problem. the code used is correct as the instruction provided was the same however the result I obtained was different. The first row returned a blank row with just the number of trips mentioned.

1

This is the code I used

SELECT 
usertype,
CONCAT (start_station_name," to ",end_station_name) AS Route,
Count(*) AS num_trips,
Round(Avg(CAST(tripduration as INT64)/60),2) AS duration,
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY 
start_station_name, end_station_name, Usertype
ORDER BY 
num_trips DESC
Limit 10

Advertisement

Answer

That happens because you actually have 5.8M rows with empty usertype, start_station_name and end_station_name and as you are sorting in DESC order they show as first line in the output.

Assuming that you only need to count when there is at least a start_station_name then you can add a WHERE clause

SELECT 
usertype,
CONCAT (start_station_name," to ",end_station_name) AS Route,
Count(*) AS num_trips,
Round(Avg(CAST(tripduration as INT64)/60),2) AS duration,
FROM `bigquery-public-data.new_york_citibike.citibike_trips`

WHERE start_station_name <> ''

GROUP BY 
start_station_name, end_station_name, Usertype
ORDER BY 
num_trips DESC
Limit 10
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement