
PutHiveQL NiFi Processor extremely slow – misconfiguration?

I am currently setting up a simple NiFi flow that reads from an RDBMS source and writes to a Hive sink. The flow works as expected until the PutHiveQL processor, which runs extremely slowly: it inserts approximately one record per minute.
It is currently set up as a standalone instance running on a single node.


The logs show an insert roughly once a minute:

INSERT INTO customer (id, name, address) VALUES (x, x, x)

Any ideas why this might be happening, or improvements to try?

Thanks in advance


Answer

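Inserting one record at a time into Hive results in extreme slowness, because each INSERT ... VALUES statement runs as a separate Hive query (typically launching its own MapReduce or Tez job).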

Instead of running regular inserts into the Hive table, change your flow to:

QueryDatabaseTable
PutHDFS

Then create a Hive Avro table on top of the HDFS directory where you stored the data.
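For example, a minimal HiveQL sketch of such a table, assuming the `customer` columns from the question. The table name `customer_avro`, the column types and the HDFS path `/data/nifi/customer_avro` are hypothetical; use the directory configured in PutHDFS, and pick types compatible with the Avro schema that QueryDatabaseTable generates from your source table.

```sql
-- Hypothetical table name, column types and HDFS path:
-- adjust them to your source table and your PutHDFS directory.
CREATE EXTERNAL TABLE customer_avro (
  id      INT,
  name    STRING,
  address STRING
)
STORED AS AVRO
LOCATION '/data/nifi/customer_avro';

-- Quick check that Hive can read the files NiFi wrote.
SELECT COUNT(*) FROM customer_avro;
```

Making the table EXTERNAL means dropping it later leaves the NiFi-written files in HDFS, and new files that PutHDFS adds to the directory are picked up by subsequent queries without any INSERT.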

(or)

QueryDatabaseTable
ConvertAvroToORC // in case you need to store the data in ORC format
PutHDFS

Then create a Hive ORC table on top of the HDFS directory where you stored the data.
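A minimal HiveQL sketch of the ORC variant, again assuming the `customer` columns from the question. The table name `customer_orc`, the column types and the path `/data/nifi/customer_orc` are hypothetical placeholders; keep the column order and types consistent with the records that ConvertAvroToORC writes.

```sql
-- Hypothetical table name, column types and HDFS path:
-- point LOCATION at the directory PutHDFS writes the ORC files to.
CREATE EXTERNAL TABLE customer_orc (
  id      INT,
  name    STRING,
  address STRING
)
STORED AS ORC
LOCATION '/data/nifi/customer_orc';
```

As with any external table, Hive reads whatever files are in the directory at query time, so data is loaded simply by landing new files there.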

User contributions licensed under: CC BY-SA