
JPA batch inserts with a non-auto-generated id

I'm trying to batch insert a few million entities. The batch insert kind of works, but my program executes a few JDBC statements in the background that I don't want.


My repository:

My entity:

My JPA settings:

The batch insert does work, but if I try to upload 100 entities I get 33 JDBC statements that check the id.

This is the output for 33 entities:

… My program is trying to load the entities, but I don't know why, since I haven't inserted them yet. It does this for 32 ids, that is, for every id except the first one (0). After this output, there is a batch insert for all 33 entities …

… after that I get this summary:

If I only use 1 entity, the output is:

For 2 entities it shows the following (my ids start at 0, so it only executes a JDBC statement for the second entity):

The output for 3 is:

… So I have two questions:

  1. Why do I have all these JDBC statements when I only want one batch insert? (And how do I fix this?)

  2. I tried this with a few million entities, but I can't see any updates in the database until the program is done. I call iceCreamRepository.saveAll(iceList) every 4000 lines, and I thought this would write all the entities to the database. My RAM isn't huge: I have a 10 GB data file and only 2 GB of RAM. If the program waits until the end to write all the data, why don't I run out of RAM?


Answer

The answer is going to be a bit convoluted, but bear with me.

I do call the iceCreamRepository.saveAll(iceList)

From the above, I’m assuming you’re using Spring Data with JPA.

Why do I have all these JDBC statements when I only want one batch insert? (And how do I fix this?)

The implementation of JpaRepository.saveAll() calls save() on each entity in the list, and save() in turn picks one of two branches: if the entity is considered new it calls EntityManager.persist(), else it calls EntityManager.merge().
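As a rough, dependency-free model of that decision (a simplified sketch, not the verbatim Spring Data source), the following plain Java class mimics the persist/merge split and logs the statements each branch would cause. The names SaveSketch, IceCream, table and log are made up for illustration, and the real insert would be deferred until flush rather than logged immediately.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the save() decision; NOT the Spring Data implementation.
final class SaveSketch {

    // Hypothetical entity whose id may be assigned by the caller.
    static final class IceCream {
        final Long id;
        final String flavor;

        IceCream(Long id, String flavor) {
            this.id = id;
            this.flavor = flavor;
        }
    }

    final Map<Long, IceCream> table = new HashMap<>(); // stands in for the DB table
    final List<String> log = new ArrayList<>();        // statements the DB would see

    // Default rule described in the answer: an entity is new only if its id is null.
    boolean isNew(IceCream e) {
        return e.id == null;
    }

    IceCream save(IceCream e) {
        if (isNew(e)) {
            // persist() branch: just an insert, no lookup needed.
            log.add("INSERT (id generated)");
        } else {
            // merge() branch: the current row has to be loaded first.
            log.add("SELECT id=" + e.id);
            log.add(table.containsKey(e.id) ? "UPDATE id=" + e.id : "INSERT id=" + e.id);
            table.put(e.id, e);
        }
        return e;
    }
}
```

In this model, saving two entities with pre-assigned ids 1 and 2 logs a SELECT before each INSERT, which is the same pattern as the extra lookups in the question.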

The default implementation of EntityInformation ‘considers an entity to be new whenever EntityInformation.getId(Object) returns null’. Since you assign the ids yourself, they are already set when save() is called, meaning your entities fall into the second branch of the if ... else ... statement.

Effectively, Spring Data is telling JPA to merge the entities with their existing version in the DB. Thus, JPA needs to load that existing version first, and this is why you’re seeing all the additional querying.

To solve this issue, either:

  • make your entity implement Persistable, and return true from isNew() (note that this may affect the persisting logic elsewhere; see this link for more information)
  • OR inject and interact with EntityManager directly, calling persist() instead of merge()
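The first option could look roughly like the sketch below. To keep it compilable on its own, it declares a local stand-in for org.springframework.data.domain.Persistable (which has the same two methods) and leaves the JPA annotations as comments. The class name PersistableIceCream, the newEntity flag and markNotNew() are assumptions made for illustration, not something taken from the question's code.

```java
// Local stand-in with the same two methods as
// org.springframework.data.domain.Persistable, so the sketch compiles alone.
interface Persistable<ID> {
    ID getId();

    boolean isNew();
}

// Hypothetical entity. In a real project it would carry @Entity, with @Id on
// the id field and @Transient on the newEntity flag.
class PersistableIceCream implements Persistable<Long> {

    private final Long id;            // assigned by the application, never generated
    private final String flavor;
    private boolean newEntity = true; // not a column; only steers save()

    PersistableIceCream(Long id, String flavor) {
        this.id = id;
        this.flavor = flavor;
    }

    @Override
    public Long getId() {
        return id;
    }

    // While this returns true, save() takes the persist() branch and skips the lookup.
    @Override
    public boolean isNew() {
        return newEntity;
    }

    // In a real entity this would be annotated with @PostPersist and @PostLoad,
    // so that saved or loaded instances are merged on later save() calls.
    void markNotNew() {
        newEntity = false;
    }

    String getFlavor() {
        return flavor;
    }
}
```

Flipping the flag after persist or load is one common way to limit the side effects mentioned in the first bullet: only freshly constructed instances are treated as new.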

I tried this with a few million entities, but I can't see any updates in the database until the program is done

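For the statements to actually be executed, you need to call EntityManager.flush() after each batch (if you choose not to interact with EntityManager directly, use JpaRepository.flush() instead). Note that other connections will only see the new rows once the surrounding transaction commits.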
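A minimal sketch of flushing per batch is shown below. The Sink interface is a local stand-in for the three EntityManager methods involved (persist, flush, clear), so the loop can be compiled and tried without JPA on the classpath. BatchWriter and writeInBatches are hypothetical names. Calling clear() after each flush is an addition on my part, a common way to detach already written entities so the persistence context does not keep growing.

```java
import java.util.List;

// Stand-in for the subset of EntityManager used here (persist, flush, clear).
interface Sink<T> {
    void persist(T entity);

    void flush();

    void clear();
}

final class BatchWriter {

    private BatchWriter() {
    }

    // Persists the entities in chunks of batchSize and flushes after each chunk.
    // Returns the number of flushes performed.
    static <T> int writeInBatches(List<T> entities, int batchSize, Sink<T> em) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        int pending = 0;
        for (T entity : entities) {
            em.persist(entity);
            pending++;
            if (pending == batchSize) {
                em.flush(); // send the queued inserts to the database
                em.clear(); // assumption: detach written entities to bound memory
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) {
            // Write the last, partially filled chunk.
            em.flush();
            em.clear();
            flushes++;
        }
        return flushes;
    }
}
```

With a real EntityManager, the chunk size would normally be aligned with the JDBC batch size configured for the JPA provider, and the whole loop would run inside a transaction.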

(As a side note, JPA comes with a lot of overhead involving caching, conversions etc. and is generally a poor choice for batch operations. I’d consider switching to Spring Batch with JDBC if I were you)

User contributions licensed under: CC BY-SA