Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
4.2k views
in Technique[技术] by (71.8m points)

java - Spring Data JPA Additional EntityManagerFactory Optimized for Cache and Bulk Operations Only

I have a legacy Spring Data JPA application that has a large number of Entities and CrudRepositories. JPA is configured using the XML below. We have a new requirement that requires us to insert 10,000 - 50,000 entities into the database at once via a FileUpload. With the existing configuration the database CPU spikes. After enabling hibernate statistics, its was apparent that these 10,000 insert operations were generating over 200,000 DB queries due to all the validation logic required with a single insert in the InvoiceService.

Original Configuration

<bean id="dataSource" destroy-method="close" class="org.apache.commons.dbcp2.BasicDataSource">
    <property name="driverClassName" value="${db.driver}"/>
    <property name="url" value="${db.jdbcurl}"/>
    <property name="username" value="${db.username}"/>
    <property name="password" value="${db.password}"/>

    <property name="maxTotal" value="${db.maxTotal}"/>
    <property name="maxIdle" value="${db.maxIdle}"/>
    <property name="minIdle" value="${db.minIdle}"/>
    <property name="initialSize" value="${db.initialSize}"/>
    <property name="maxWaitMillis" value="${db.maxWaitMillis}"/>
    <property name="minEvictableIdleTimeMillis" value="${db.minEvictableIdleTimeMillis}"/>
    <property name="timeBetweenEvictionRunsMillis" value="${db.timeBetweenEvictionRunsMillis}"/>
    <property name="testOnBorrow" value="${db.testOnBorrow}"/>
    <property name="testOnReturn" value="${db.testOnReturn}"/>
    <property name="testWhileIdle" value="${db.testWhileIdle}"/>
    <property name="removeAbandonedOnBorrow" value="${db.removeAbandonedOnBorrow}"/>
    <property name="removeAbandonedOnMaintenance" value="${db.removeAbandonedOnMaintenance}"/>
    <property name="removeAbandonedTimeout" value="${db.removeAbandonedTimeout}"/>
    <property name="logAbandoned" value="${db.logAbandoned}"/>
</bean>

<bean id="entityManagerFactory"
    class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" depends-on="flyway">
    <property name="dataSource" ref="dataSource" />
    <property name="packagesToScan" value="my.package.domain" />
    <property name="loadTimeWeaver">
        <bean class="org.springframework.instrument.classloading.InstrumentationLoadTimeWeaver" />
    </property>
    <property name="jpaVendorAdapter">
        <bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" />
    </property>
    <property name="jpaProperties">
        <props>
            <prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto:validate}</prop>
            <prop key="hibernate.dialect">org.hibernate.dialect.PostgreSQL9Dialect</prop>
            <prop key="hibernate.show_sql">${hibernate.show_sql:false}</prop>
        </props>
    </property>
    <property name="persistenceUnitName" value="entityManagerFactory" />
</bean>

<bean id="persistenceAnnotationBeanPostProcessor" class="org.springframework.orm.jpa.support.PersistenceAnnotationBeanPostProcessor">
    <property name="defaultPersistenceUnitName" value="entityManagerFactory"/>
</bean>

<bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
    <property name="entityManagerFactory" ref="entityManagerFactory" />
</bean>

<tx:annotation-driven proxy-target-class="true" />

<bean id="persistenceExceptionTranslationPostProcessor"
    class="org.springframework.dao.annotation.PersistenceExceptionTranslationPostProcessor" />

<jpa:repositories base-package="my.package.repository" entity-manager-factory-ref="entityManagerFactory"/>

The FileUploadService snippet looks like this...

    EntityManager batchEntityManager = entityManagerFactory.createEntityManager();
    EntityTransaction transaction = batchEntityManager.getTransaction();
    transaction.begin();
    try (BufferedReader buffer = new BufferedReader(new InputStreamReader(file.getInputStream()))) {

        buffer.lines()
            .filter(StringUtils::isNotBlank)
            .forEach(csvLine -> {
                invoiceService.createInvoice(csvLine);
                if (counter.incrementAndGet() % (updateFrequency.equals(0) ? 1 : updateFrequency) == 0) {
                    FileUpload fileUpload1 = fileUploadRepository.findOne(fileUpload.getId());
                    fileUpload1.setNumberOfSentRecords(sentCount.get());
                    fileUploadRepository.save(fileUpload1);
                    transaction.commit();
                    transaction.begin();
                    batchEntityManager.clear();
                }
            });
          transaction.commit();
    } catch (IOException ex) {
        systemException.incrementAndGet();
        log.error("Unexpected error while performing send task.", ex);
        transaction.rollback();
    }

    // Update FileUpload status.
    FileUpload fileUpload1 = fileUploadRepository.findOne(fileUpload.getId());
    fileUpload1.setNumberOfSentRecords(sentCount.get());
    if (systemException.get() != 0) {
        fileUpload1.setStatus(FileUploadStatus.SYSTEM_ERROR);
    } else {
        fileUpload1.setStatus(FileUploadStatus.SENT);
    }
    fileUploadRepository.save(fileUpload1);
    batchEntityManager.close();

Most of the DB queries were select statements that return the same results for each record being inserted. It was obvious that enabling EhCache as a Second-Level cache would have a significant performance improvement. However, this application has been running flawlessly in production for several years without ehcache enabled. I am hesitant to turn this on globally as I do not know how this will affect the large number of other repositories/queries.

Question 1

Is there a way to configure a "alternate" EntityManagerFactory that uses the second level cache for this batch process only? How can I choose to use this factory instead of the primary for this batch process only?

I experimented adding something like below to my spring config. I can easily inject this additional EntityManager into my class and use it. However, the existing Repositories (such as FileUploadRepository) don't seem to use it - they just return null. I am not sure if this approach is possible. The documentation for JpaTransactionManager says

This transaction manager is appropriate for applications that use a single JPA EntityManagerFactory for transactional data access

which is exactly what I am not doing. So what other options are there?

<bean id="batchEntityManagerFactory"
    class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" depends-on="flyway">
    <property name="dataSource" ref="dataSource" />
    <property name="packagesToScan" value="my.package.domain" />
    <property name="loadTimeWeaver">
        <bean class="org.springframework.instrument.classloading.InstrumentationLoadTimeWeaver" />
    </property>
    <property name="jpaVendorAdapter">
        <bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" />
    </property>
    <property name="jpaProperties">
        <props>
            <prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto:validate}</prop>
            <prop key="hibernate.dialect">org.hibernate.dialect.PostgreSQL9Dialect</prop>
            <prop key="hibernate.show_sql">${hibernate.show_sql:false}</prop>
            <prop key="hibernate.generate_statistics">${hibernate.generate_statistics:false}</prop>
            <prop key="hibernate.ejb.interceptor">my.package.HibernateStatistics</prop>
            <prop key="hibernate.cache.use_query_cache">true</prop>
            <prop key="hibernate.cache.use_second_level_cache">true</prop>
            <prop key="hibernate.cache.region.factory_class">org.hibernate.cache.ehcache.EhCacheRegionFactory</prop>
            <prop key="hibernate.jdbc.batch_size">100</prop>
            <prop key="hibernate.order_inserts">true</prop>
            <prop key="hibernate.order_updates">true</prop>
        </props>
    </property>
    <property name="persistenceUnitName" value="entityManagerFactory" />
</bean>

Question 2

Assuming there is no other option to "selectively" use EhCache, I tried enabling it on the primary EntityManagerFactory only. We can certainly do regression testing to make sure we don't introduce new issues. I assume this is fairly safe to do? However, another issue came up. I am trying to commit inserts in batches as described in this post and shown in my code above. I am getting an RollbackException when trying to commit the batch due to Connection org.postgresql.jdbc.PgConnection@1e7eb804 is closed.. I presume this is due to the maxWaitMillis property on the dataSource.

I don't want to change this property for every other existing spring Service/Repository/query in the application. If I could use the "custom" EntityManagerFactory I could easily provide a different DataSource Bean with the properties I want. Again, is this possible?

Maybe I looking at this problem all wrong. Are there any other suggestions?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can have another EntityManagerFactory bean with a different qualifier, so that's one option. I would still recommend you look into these select queries. I bet the problem is just a missing index which causes the database to do full table scans. If you add the proper indexes, you probably don't need to change a thing in your application.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...