Skip to content
Advertisement

Will the constraints remain after Importing table from RDBMS to HIVE with sqoop?

When we use sqoop import to transfer an RDBMS table to HIVE, will the constraints of the table such as primary key remain ?

i.e. will the column of the table which is the primary key remain as primary key at the HIVE. Will this information be in the Hive metastore ?

Thanks a lot.

Advertisement

Answer

As you can see in the link of the official documentation of Hive QL below, PRIMARY and FOREIGN constraints have been added since Hive version 2.1.0. Hive QL

Hive 2.1.0 Changes

So, I assume that the PRIMARY and FOREIGN Keys constraints will remain when you import your tables to Hive using sqoop.

I tested a sqoop import of MySQL database, and I can see that PRIMARY KEY CONSTRAINT is not maintained during the import.

MySQL Table Format:

    mysql> show create table employees;
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table     | Create Table                                                                                                                                                                                                                                                                                   |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| employees | CREATE TABLE `employees` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)

Data has been imported from MySQL to Hive with the following command:

sqoop import --connect jdbc:mysql://localhost/employees --username root --password password --table employees --hive-import --create-hive-table --hive-table employees

When I describe the table in hive, I cannot see the PRIMARY KEY CONSTRAINT

hive> show create table employees;
OK
CREATE TABLE `employees`(
  `emp_no` int, 
  `birth_date` string, 
  `first_name` string, 
  `last_name` string, 
  `gender` string, 
  `hire_date` string)
COMMENT 'Imported by sqoop on 2019/03/18 00:24:11'
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES ( 
  'field.delim'='', 
  'line.delim'='n', 
  'serialization.format'='') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/user/hive/warehouse/employees'
TBLPROPERTIES (
  'transient_lastDdlTime'='1552865076')
Time taken: 1.304 seconds, Fetched: 22 row(s)

I inserted a new row with same employee number to check if Hive manage PK Constraint. The new row has been added as you can see:

hive> insert into employees values (10001, "1986-04-17", "Hichem", 
"BOUSSETTA", "M", "2014-09-91");
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/employees/.hive-staging_hive_2019-03-18_00-32-16_851_8569619447966100947-1/-ext-10000
Loading data to table default.employees
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 5.79 sec   HDFS Read: 5080 HDFS Write: 120 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 790 msec
OK
Time taken: 42.422 seconds
hive> select * from employees;
OK
10001   1986-04-17  Hichem  BOUSSETTA   M   2014-09-91
10001   1953-09-02  Georgi  Facello M   1986-06-26
10002   1964-06-02  Bezalel Simmel  F   1985-11-21

So to conclude: Sqoop does not preserve PK constraint when importing RDBMS data to Hive

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement