I am currently setting up a simple NiFi flow that reads from an RDBMS source and writes to a Hive sink. The flow works as expected until the PutHiveQL processor, which runs extremely slowly. It inserts approximately one record every minute.
NiFi is currently set up as a standalone instance running on one node.
The logs show an insert roughly every minute:
(INSERT INTO customer (id, name, address) VALUES (x, x, x))
Any ideas about why this may be? Improvements to try?
Thanks in advance
Answer
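Inserting one record at a time into Hive will result in extreme slowness, because each INSERT ... VALUES statement typically runs as its own Hive job and writes its own small file.
Since you are doing regular inserts into a Hive table, change your flow to: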
QueryDatabaseTable -> PutHDFS
Then create a Hive Avro table on top of the HDFS directory where you have stored the data.
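As a rough sketch of that Avro table, assuming QueryDatabaseTable writes Avro files and PutHDFS places them in a directory such as /data/nifi/customer (a hypothetical path), the DDL could look something like this. The table name and column types are guesses based on the INSERT statement in the question, so adjust them to the real source schema:

```sql
-- Hypothetical table name, column types, and HDFS path: adjust to your setup.
-- EXTERNAL means Hive reads the files in place and does not delete them on DROP TABLE.
CREATE EXTERNAL TABLE customer_avro (
  id      INT,
  name    STRING,
  address STRING
)
STORED AS AVRO
LOCATION '/data/nifi/customer';
```

New files that PutHDFS drops into that directory should then show up in queries without any per-row INSERT.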
(or)
QueryDatabaseTable -> ConvertAvroToORC -> PutHDFS (in case you need to store the data in ORC format)
Then create a Hive ORC table on top of the HDFS directory where you have stored the data.
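For the ORC variant, a minimal sketch of the table definition might look like the following. The table name, column types, and the /data/nifi/customer_orc directory are assumptions for illustration only, and the column names must match the fields in the ORC files that ConvertAvroToORC produces:

```sql
-- Hypothetical table name, column types, and HDFS path: adjust to your setup.
CREATE EXTERNAL TABLE customer_orc (
  id      INT,
  name    STRING,
  address STRING
)
STORED AS ORC
LOCATION '/data/nifi/customer_orc';
```

Either way, the data lands in HDFS in bulk and Hive only reads it, which avoids the one-job-per-row cost of PutHiveQL.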