I am reading some raw input which looks something like this:
20 abc def 21 ghi jkl mno pqr 23 stu
Note the first two rows are “good” rows and the last two rows are “bad” rows since they are missing some data.
Here is the snippet of my hive query which is reading this raw data into a readonly external table:
DROP TABLE IF EXISTS readonly_s3; CREATE EXTERNAL TABLE readonly_s3 (id string, name string, data string) ROW FORMAT DELIMITED FIELDS TERMINATED BY 't'
I need to get the count of ALL the rows, both “good” and “bad.” The problem is some of the data is missing, and if I do SELECT count(id) as total_rows
for example, that doesn’t work since not all the rows have an id
.
Any suggestions on how I can count ALL the rows in this raw data file?
Advertisement
Answer
Hmmm . . . You can use:
select sum(case when col1 is not null and col2 is not null and col3 is not null then 1 else 0 end) as num_good, sum(case when col1 is null or col2 is null or col3 is null then 1 else 0 end) as num_bad from readonly_s3;