I have a hive table with the following structure and data:
Table structure:
CREATE EXTERNAL TABLE IF NOT EXISTS db_crprcdtl.shcar_dtls ID string, CSK string, BRND string, MKTCP string, AMTCMP string, AMTSP string, RLBRND string, ROW FORMAT DELIMITED FIELDS TERMINATED BY 't' LINES TERMINATED BY 'n' STORED AS TEXTFILE LOCATION '/on/hadoop/dir/' ------------------------------------------------------------------------------- ID | CSK | BRND | MKTCP | AMTCMP ------------------------------------------------------------------------------- 782 flatn,grpl,mrtn hnd,mrc,nsn 34555,56566,66455 38900,59484,71450 1231 jikl,bngr su,mrc,frd 56566,32333,45000 59872,35673,48933 123 unsrvl tyt,frd,vlv 25000,34789,33443 29892,38922,36781
Trying to push this data into the SQL Server. But while doing so, getting the following error message:
SQL Error [107090] [S0001]: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: Not enough columns in this line.
What I tried:
There’s an online article where the author has documented similar kind of issues. I tried to implement one of them Looked in Excel and found two columns that had carriage returns
but this also doesn’t come handy.
Any suggestion/help would be really appreciated. Thanks
Advertisement
Answer
If I’m able to understand your issue, then it seems that your ,
separated data is getting divided into various columns rather one column on the SQL-SERVER
, something like:
------------------------------ ID |CSK |BRND |MKTCP |AMTCMP ------------------------------ 782 flatn grpl mrtn hnd mrc nsn 345 56566 66455 38900 59484 71450 1231 jikl bngr su mrc frd 56566 32333 45000 59872 35673 48933 123 unsrvl tyt frd vlv 25000 34789 33443 29892 38922 36781
So, if you look on Hive
there are only 5 columns. While on SQL-SERVER
the same. This I presume as you haven’t shared the schema. But if that’s the case, then you see that there are more than 5 values are being passed. While the schema definition is only of 5 columns.
So the error is populating.
Refer this Document by MS and try to create a FILE_FORMAT
with FIELD_TERMINATOR ='t'
,
like:
CREATE EXTERNAL FILE FORMAT <name> WITH ( FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS ( FIELD_TERMINATOR ='t', | STRING_DELIMITER = string_delimiter | First_Row = integer -- ONLY AVAILABLE SQL DW | DATE_FORMAT = datetime_format | USE_TYPE_DEFAULT = { TRUE | FALSE } | Encoding = {'UTF8' | 'UTF16'} ) );
Hope that helps to resolve to your issue 🙂