Skip to content
Advertisement

Create external table from csv on HDFS , all values come with quotes

I have a csv file on HDFS and I am trying to create an impala table , the situation is it created the table and values with all the “

CREATE external TABLE abc.def
(

name STRING,
title STRING,
last  STRING, 
pno STRING
)
row format delimited fields terminated by ','

location 'hdfs:pathlocation'
tblproperties ("skip.header.line.count"="1") ;

The output is
name tile last pno
“abc” “mr” “xyz” “1234”
“rew” “ms” “pre” “654”

I just want to create table from csv file without quotes. Please guide where I am going wrong. Regards, R

Advertisement

Answer

A way to do that is creating a stage table that load the file with quotes and then with CTAS (Create table as select) create the right table cleaning the fields with replace function. As an example

CREATE TABLE quote_stage(
 id STRING,
 name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
+-----+----------+
| id  | name     |
+-----+----------+
| "1" | "pepe"   |
| "2" | "ana"    |
| "3" | "maria"  |
| "4" | "ramon"  |
| "5" | "lucia"  |
| "6" | "carmen" |
| "7" | "alicia" |
| "8" | "pedro"  |
+-----+----------+
CREATE TABLE t_quote 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS SELECT replace(id,'"','') AS id, replace(name,'"','') AS name FROM quote_stage;
+----+--------+
| id | name   |
+----+--------+
| 1  | pepe   |
| 2  | ana    |
| 3  | maria  |
| 4  | ramon  |
| 5  | lucia  |
| 6  | carmen |
| 7  | alicia |
| 8  | pedro  |
+----+--------+

Hope this helps.

3 People found this is helpful
Advertisement