Skip to content
Advertisement

why boolean field is not working in Hive?

I have a column in my hive table which datatype is boolean. when I tried to import data from csv, it stored as NULL.

This is my sample table :

My sample data :

Result :

when loading manually :

Result:

can someone explain why this dissimilarity?

Advertisement

Answer

Boolean type needs literals representation. SQL standard defines only three values for boolean: TRUE, FALSE, and UNKNOWN(=NULL in Hive). Using integers is not standardized in SQL, though many databases supports them.

You are using LazySimpleSerDe for deserealizing table data. LazySimpleSerDe uses this property hive.lazysimple.extended_boolean_literal to determine if it treats ‘T’, ‘t’, ‘F’, ‘f’, ‘1’, and ‘0’ as extended, legal boolean literals, in addition to ‘TRUE’ and ‘FALSE’. The default is false, which means only ‘TRUE’ and ‘FALSE’ are treated as legal boolean literals.

Set this property to be able to read CSV files with 1 and 0 as Booleans:

See this Jira HIVE-3635 Try also to set this property in the table DDL:

TBLPROPERTIES ("hive.lazysimple.extended_boolean_literal"="true")

About using other than TRUE or FALSE boolean literals in Hive query language official documentation says implicit conversion of other types to Boolean is not possible: AllowedImplicitConversions though as you can see it works.

Few tests with casting strings and integers to boolean:

Also have a look at this Jira: HIVE-3604 and Type Conversion Functions documentation, it says: If cast(expr as boolean) Hive returns true for a non-empty string. And actually this conforms with UDFToBoolean source code.

So, better use officially allowed literals for Boolean type to make sure you have no side effects also described in this article: Hive: Booleans Are Too Confusing To Be Usable

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement