When we use sqoop import to transfer an RDBMS table to Hive, are the constraints of the table, such as the primary key, preserved? That is, will the column that is the primary key in the source table still be a primary key on the Hive side? Will this information be in the Hive metastore?
Thanks a lot.
Answer
As you can see in the official Hive QL documentation, PRIMARY KEY and FOREIGN KEY constraints have been available since Hive version 2.1.0 (as informational constraints only; Hive does not enforce them).
So, I initially assumed that PRIMARY and FOREIGN KEY constraints would remain when you import your tables to Hive using sqoop.
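For reference, here is a minimal sketch of how such a constraint is declared in Hive 2.1.0+ (this table is illustrative, not part of the test below). Note that Hive requires DISABLE NOVALIDATE, meaning the key is recorded as metadata but never enforced:

-- illustrative Hive 2.1.0+ DDL: an informational (non-enforced) primary key
CREATE TABLE employees_with_pk (
  emp_no     INT,
  first_name STRING,
  PRIMARY KEY (emp_no) DISABLE NOVALIDATE
);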
I tested a sqoop import of a MySQL database, and I can confirm that the PRIMARY KEY constraint is not maintained during the import.
MySQL table definition:
mysql> show create table employees;
Table: employees
Create Table:
CREATE TABLE `employees` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0,00 sec)
The data was imported from MySQL into Hive with the following command:
sqoop import \
  --connect jdbc:mysql://localhost/employees \
  --username root --password password \
  --table employees \
  --hive-import --create-hive-table --hive-table employees
When I describe the table in Hive, the PRIMARY KEY constraint is nowhere to be seen:
hive> show create table employees;
OK
CREATE TABLE `employees`(
  `emp_no` int,
  `birth_date` string,
  `first_name` string,
  `last_name` string,
  `gender` string,
  `hire_date` string)
COMMENT 'Imported by sqoop on 2019/03/18 00:24:11'
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='',
  'line.delim'='\n',
  'serialization.format'='')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/user/hive/warehouse/employees'
TBLPROPERTIES (
  'transient_lastDdlTime'='1552865076')
Time taken: 1.304 seconds, Fetched: 22 row(s)
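To answer the metastore part of the question directly, you can also check the metastore database itself. A minimal sketch, assuming a MySQL-backed metastore with the Hive 2.1+ schema, where constraints live in the KEY_CONSTRAINTS table; after the sqoop import above, this query comes back empty, i.e. no primary key was recorded:

-- run against the metastore database (not inside Hive)
SELECT CONSTRAINT_NAME, CONSTRAINT_TYPE
FROM KEY_CONSTRAINTS;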
I then inserted a new row with the same employee number to check whether Hive enforces the PK constraint. The duplicate row was accepted, as you can see:
hive> insert into employees values (10001, "1986-04-17", "Hichem", "BOUSSETTA", "M", "2014-09-91");
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/employees/.hive-staging_hive_2019-03-18_00-32-16_851_8569619447966100947-1/-ext-10000
Loading data to table default.employees
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 5.79 sec   HDFS Read: 5080 HDFS Write: 120 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 790 msec
OK
Time taken: 42.422 seconds

hive> select * from employees;
OK
10001	1986-04-17	Hichem	BOUSSETTA	M	2014-09-91
10001	1953-09-02	Georgi	Facello	M	1986-06-26
10002	1964-06-02	Bezalel	Simmel	F	1985-11-21
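Since nothing prevents duplicates from landing in the table, deduplication has to happen at query time. A minimal sketch, assuming emp_no is meant to be unique; the ORDER BY inside the window is arbitrary here, so which duplicate survives is not deterministic:

-- keep one row per emp_no; Hive stages the output first,
-- so overwriting the table it reads from is safe
INSERT OVERWRITE TABLE employees
SELECT emp_no, birth_date, first_name, last_name, gender, hire_date
FROM (
  SELECT e.*, row_number() OVER (PARTITION BY emp_no ORDER BY hire_date) AS rn
  FROM employees e
) t
WHERE rn = 1;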
So, to conclude: Sqoop does not preserve the PRIMARY KEY constraint when importing RDBMS data into Hive.
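If you still want the key recorded in the metastore after the import, one option (a sketch, assuming Hive 2.1.0+; the constraint name employees_pk is arbitrary) is to re-declare it yourself. It will then show up as metadata but remains informational only, as the test above demonstrates:

-- re-declare the key after the sqoop import; recorded but not enforced
ALTER TABLE employees
  ADD CONSTRAINT employees_pk PRIMARY KEY (emp_no) DISABLE NOVALIDATE;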