Bulk and Batch imports with Spring Boot

20.12.2017

This article describes how to implement bulk and batch inserts with Spring Boot and Hibernate. To do so, you must configure a batch size so that Hibernate groups the SQL inserts into JDBC batches of up to batch_size statements. Set the following properties in the application.properties file of the Spring Boot application:

spring.jpa.properties.hibernate.jdbc.batch_size=5
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true

It is important to use the prefix spring.jpa.properties, because Spring Boot passes the properties under this prefix through to Hibernate. Many thanks to Michael Simons @rotnroll666 and Vlad Mihalcea @vlad_mihalcea for the helpful tips!
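To make the effect of batch_size concrete, the following minimal sketch estimates how many JDBC batches to expect for a given number of inserts. The class BatchMath and the method expectedBatches are hypothetical names used only for illustration, and the calculation assumes that all inserts target the same table so that Hibernate can group them.

```java
/** Back-of-the-envelope helper (hypothetical, not part of Spring or Hibernate). */
final class BatchMath {

    private BatchMath() {
    }

    /**
     * Ceiling division: a partially filled last batch still counts as one batch.
     * Assumption: all inserts go to the same table and can share one prepared statement.
     */
    static int expectedBatches(int inserts, int batchSize) {
        if (inserts < 0 || batchSize < 1) {
            throw new IllegalArgumentException("inserts must be >= 0 and batchSize >= 1");
        }
        return (inserts + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(expectedBatches(10, 5)); // prints 2
        System.out.println(expectedBatches(12, 5)); // prints 3
    }
}
```

With batch_size=5, ten inserts should therefore end up in two batches, while twelve inserts would need a third, partially filled batch.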

In addition, it is recommended to set the following property in application.properties. It makes Hibernate log its statistics, so you can check whether the SQL inserts were really executed in batches.

spring.jpa.properties.hibernate.generate_statistics=true

The following sections show two ways to perform a bulk import within a Spring Boot application.

Repository

One possibility is to create your own repository.

package org.hameister.bulk.data;

import org.springframework.data.jpa.repository.support.SimpleJpaRepository;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

import javax.persistence.EntityManager;
import java.util.List;

/**
 * Created by hameister on 19.12.17.
 */

@Repository
public class BulkImporterRepository extends SimpleJpaRepository<Item, String> {

    private final EntityManager entityManager;

    public BulkImporterRepository(EntityManager entityManager) {
        super(Item.class, entityManager);
        this.entityManager = entityManager;
    }

    // Calls persist() directly for every item, so merge() is never used for the import.
    @Transactional
    public List<Item> save(List<Item> items) {
        items.forEach(entityManager::persist);
        return items;
    }
}

The class extends SimpleJpaRepository and receives an EntityManager in its constructor. The save method uses the entity manager to store the Item objects with persist. The @Transactional annotation is important: it ensures that Spring handles the transaction.

Note that this example deliberately does not use the method SimpleJpaRepository.save(Iterable<S> entities). The example has to make sure that persist() is called and not merge(). Why merge() causes problems here and prevents the bulk import is described in the article Using Ids for the Bulk Import in the correct way.

Item

package org.hameister.bulk.data;

import javax.persistence.*;

/**
 * Created by hameister on 01.12.17.
 */
@Entity
@Table(name = "Item")
public class Item {

    @Id
    String id;

    @Column(name = "description")
    private String description;

    @Column(name = "location")
    private String location;

    public Item() {
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    public String getLocation() {
        return location;
    }

    public void setLocation(String location) {
        this.location = location;
    }
}
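Because the id of Item is a plain String without a generator, the application has to assign it before the item is persisted. The following sketch shows one way to build test data for the import. ItemTestData and createItems are hypothetical names, the description and location values are placeholders, and the nested Item class is only a stand-in for the entity above (JPA annotations left out) so that the snippet compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical helper that builds test data for the import; not part of the original project. */
final class ItemTestData {

    /** Stand-in for the Item entity, JPA annotations omitted so the sketch compiles alone. */
    static final class Item {
        private String id;
        private String description;
        private String location;

        String getId() { return id; }
        void setId(String id) { this.id = id; }
        String getDescription() { return description; }
        void setDescription(String description) { this.description = description; }
        String getLocation() { return location; }
        void setLocation(String location) { this.location = location; }
    }

    private ItemTestData() {
    }

    /** Creates count items; every id is assigned here because Item has no generated id. */
    static List<Item> createItems(int count) {
        List<Item> items = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Item item = new Item();
            item.setId(UUID.randomUUID().toString());
            item.setDescription("Item " + i);   // placeholder value
            item.setLocation("Location " + i);  // placeholder value
            items.add(item);
        }
        return items;
    }
}
```

A list created this way could be handed to either of the two import variants described in this article.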

Service

Another way to perform a bulk import is to create your own service.

package org.hameister.bulk.service;

import org.hameister.bulk.data.Item;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.util.Assert;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import java.util.List;

@Service
public class BulkImporterService {

    private final EntityManagerFactory emf;

    @Autowired
    public BulkImporterService(EntityManagerFactory emf) {
        Assert.notNull(emf, "EntityManagerFactory must not be null");
        this.emf = emf;
    }

    public List<Item> bulkWithEntityManager(List<Item> items) {
        EntityManager entityManager = emf.createEntityManager();
        try {
            entityManager.getTransaction().begin();
            items.forEach(entityManager::persist);
            // The commit flushes the pending inserts to the database.
            entityManager.getTransaction().commit();
        } catch (RuntimeException e) {
            // Roll back a still-active transaction before rethrowing.
            if (entityManager.getTransaction().isActive()) {
                entityManager.getTransaction().rollback();
            }
            throw e;
        } finally {
            // Always release the EntityManager, even if the import fails.
            entityManager.close();
        }
        return items;
    }
}

This solution uses constructor injection to obtain an EntityManagerFactory. The method bulkWithEntityManager uses it to create an EntityManager, begins a transaction on it, and stores all items by calling persist. Committing the transaction writes the data to the database. Afterwards the EntityManager should be closed with close(). As you can see, this variant requires you to handle the transaction yourself.
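For very large imports it can help to process the items in slices of batch_size instead of persisting the whole list at once. The following is only a sketch of the slicing part in plain Java. Chunks and forEachChunk are hypothetical names and are not part of the example project, which persists all items in one pass.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper that walks a list in slices of a fixed maximum size. */
final class Chunks {

    private Chunks() {
    }

    /** Calls action once per consecutive slice of at most chunkSize elements. */
    static <T> void forEachChunk(List<T> items, int chunkSize, Consumer<List<T>> action) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            // subList returns a view of the original list, so nothing is copied.
            action.accept(items.subList(from, to));
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);
        // Prints [1, 2, 3, 4, 5] then [6, 7, 8, 9, 10] then [11, 12]
        forEachChunk(ids, 5, chunk -> System.out.println(chunk));
    }
}
```

Inside bulkWithEntityManager, the callback could persist every item of a slice and then call entityManager.flush() followed by entityManager.clear(), so that the persistence context does not keep growing. Keep in mind that the items are detached after clear().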

The complete source code is available on GitHub as a Maven project: SpringBootBulkImport.

The example also contains a Spring Boot application with a REST controller to test the import. If you call the endpoint http://localhost:8080/repositoryimport in a browser after starting the application, you should see output similar to the following in the console for batch_size=5.

5015192 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
442437 nanoseconds spent preparing 1 JDBC statements;
0 nanoseconds spent executing 0 JDBC statements;
25708379 nanoseconds spent executing 2 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
110442364 nanoseconds spent executing 1 flushes (flushing a total of 10 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)

You can see that the 10 items (entities) were inserted in two JDBC batches using a single prepared statement.
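If you want to check the batch count automatically, for example in a test that captures the console output, the relevant statistics line can be parsed with a regular expression. This is a minimal sketch that assumes the log format shown above. StatisticsLog and jdbcBatches are hypothetical names and not part of Hibernate.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts the JDBC batch count from the statistics output. */
final class StatisticsLog {

    // Matches lines such as "25708379 nanoseconds spent executing 2 JDBC batches;"
    private static final Pattern BATCHES = Pattern.compile("executing (\\d+) JDBC batches");

    private StatisticsLog() {
    }

    /** Returns the number of executed JDBC batches, or empty if the line is missing. */
    static OptionalInt jdbcBatches(String log) {
        Matcher matcher = BATCHES.matcher(log);
        return matcher.find()
                ? OptionalInt.of(Integer.parseInt(matcher.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String log = "0 nanoseconds spent executing 0 JDBC statements;\n"
                + "25708379 nanoseconds spent executing 2 JDBC batches;";
        System.out.println(jdbcBatches(log).getAsInt()); // prints 2
    }
}
```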

Further information and explanations about Hibernate can be found in Vlad Mihalcea's blog post The best way to do batch processing with JPA and Hibernate.