For example, which of the following calculations should perform faster?
sum ( case when fieldA is not null then 1 else 0 end ) total
or
sum ( case when fieldA is not null then fieldB -- binary field, 1 or 0. end ) total
For the sake of this example, assume that when fieldA is not null, fieldB will always equal 1. fieldB can also equal 1 if fieldA is null, which is why I use the case statement.
Advertisement
Answer
The two queries do not do the same thing, unless fieldB
is uniformly 1
(or uniformly 1
when fieldA
is not NULL
). In general, you should run the query that does what you really need.
Redshift is a columnar database. That means that every column used in a query adds overhead to the execution.
Hence, it is better to avoid reading a column if you can. Of course, if the column is referenced elsewhere in the query, then this does not apply.
In addition, SUM()
operates on numbers. I’m not sure if “binary” means that the value is a number. If not, then it needs to be converted, which also adds overhead.