本文根据SpringOne 2021的演讲Building Fast and Scalable Persistence Layers with Spring Data JPA整理而成,英语能力不错的建议直接观看下面的原始视频

演讲的PPT和源码在下面的地址

演讲中的示例代码使用了postgresql数据库,本地没有安装的可以使用 docker run --name some-postgres -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=test -p 5432:5432 --rm postgres 启动一个postgresql容器

1. Introduction

Spring Data JPA是Spring Data家族的其中一个项目,主要对一些实现了JPA规范的持久层框架进行封装和增强, 它极大地降低了JPA的使用难度,使那些基于Spring框架构建的应用非常轻松地实现持久层访问。

Spring Data JPA良好的封装性使它非常易用,但也隐藏了内部所有的执行逻辑,使其成为一个黑盒子,当然对于一般的中小型应用也确实不需要关注 其内部的执行逻辑,但是在构建大型应用或者对性能要求较高的中小型应用时很可能会发现Spring Data JPA并不能满足要求, 这通常是由于使用方法不当使用导致的,因此本文对Spring Data JPA的最佳实践进行说明,遵循这些建议可以在 提升开发效率的同时保证应用运行速度。

2. Identify Issues

要提升Spring Data JPA的执行效率,首先需要能够观测其内部执行情况, 大部分情况下Spring Data JPA与Hibernate一起使用,因此可以通过配置Hibernate的相关参数在控制台展示内部执行情况。

Properties
spring.jpa.properties.hibernate[generate_statistics]=true
spring.jpa.properties.hibernate.session.events.log[LOG_QUERIES_SLOWER_THAN_MS]=3
logging.level.org.hibernate.stat=debug
logging.level.org.hibernate[SQL]=debug
Yaml
spring:
  jpa:
    properties:
      hibernate:
        generate_statistics: true                  # <1>
        session:
          events:
            log:
              LOG_QUERIES_SLOWER_THAN_MS: 3        # <2>
logging:
  level:
    org:
      hibernate:
        stat: debug                                # <3>
        SQL: debug
1 输出执行统计
2 输出慢Sql
3 输出执行的Sql
注意这些参数配置会一定程度上影响执行速度,因此生产环境不建议使用

修改 ChessPlayer#tournaments 的 fetchType 为 EAGER 并运行 TestDemo#findAll() 会在控制台输出执行的SQL以及如下统计信息

2022-04-25 21:45:13.095  INFO 30756 --- [           main] i.StatisticalLoggingSessionEventListener : Session Metrics {
2150100 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
1206300 nanoseconds spent preparing 20 JDBC statements;
45058100 nanoseconds spent executing 20 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
92742400 nanoseconds spent performing 40 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
729500 nanoseconds spent performing 19 L2C misses;
0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
13700 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
}

3. Association Fetching

关联查询是JPA中非常常见的操作,定义实体关联关系的注解都包含了一个 fetch 属性用于定义关联查询方式

@ManyToOne(fetch = FetchType.LAZY)
private Publisher publisher;

FetchType 是一个枚举类型,包含 LAZYEAGER 两个值表示使用时查询和立即查询, ToMany默认使用 LAZY,ToOne默认使用 EAGER

EAGER 意味着不管是否使用都会查询关联的实体,因此一般情况下 LAZY 是更好的选择,也就是说 ToOne需要显式指定 fetch 值为 LAZY。而在可以确定后续代码中会使用关联的实体的场景中则 可以使用 @Query 注解手动在JPQL语句中指定join关系用于同时查询所需实体和其关联实体。

TestDemo#findWithTournaments() 用于演示 LEFT JOIN FETCH 和 EntityGraph 的效果

  1. 修改 ChessPlayer#tournamentsfetch 值为默认值。

  2. 注释 ChessPlayerRepository#findWithTournamentsBy() 的 @Query 和 @EntityGraph 后执行 TestDemo#findWithTournaments(),观察控制台统计信息发现会执行20条statement。

  3. 分别打开 ChessPlayerRepository#findWithTournamentsBy() 的 @Query 和 @EntityGraph 注释后再 执行 TestDemo#findWithTournaments(),观察控制台统计信息发现都只执行一条statement。

使用 join 查询确实会减少执行的sql数量,但也意味着会一次查询大量数据,因此很容易出现慢sql,尤其是 关联多个实体时,因此在使用时要特别注意。

4. Many-to-Many Association

多对多关联的场景中,我们经常使用 List 保存 的实体,然而此时Hibernate在对这些 的实体执行写操作时效率很低, 必须使用 Set 进行替换。

TestDemo#removePlayerFromTournament() 用于演示使用 ListSet 时执行不同的sql

  1. 首先修改 ChessTournament#players 的类型为List并执行 TestDemo#removePlayerFromTournament() 观察控制台信息会发现执行了 18条statement,因为先删除了所有的player再插入不需要删除的player。

  2. 修改 ChessTournament#players 的类型为Set再次执行 TestDemo#removePlayerFromTournament() 观察控制台信息会发现 执行了4条statement。

List
2022-04-26 20:30:50.183 DEBUG 19764 --- [           main] org.hibernate.SQL                        :
    select
        chesstourn0_.id as id1_2_0_,
        chesstourn0_.end_date as end_date2_2_0_,
        chesstourn0_.name as name3_2_0_,
        chesstourn0_.start_date as start_da4_2_0_,
        chesstourn0_.version as version5_2_0_
    from
        chess_tournament chesstourn0_
    where
        chesstourn0_.id=?
2022-04-26 20:30:50.214  INFO 19764 --- [           main] org.hibernate.SQL_SLOW                   : SlowQuery: 6 milliseconds. SQL: 'HikariProxyPreparedStatement@472019958 wrapping select chesstourn0_.id as id1_2_0_, chesstourn0_.end_date as end_date2_2_0_, chesstourn0_.name as name3_2_0_, chesstourn0_.start_date as start_da4_2_0_, chesstourn0_.version as version5_2_0_ from chess_tournament chesstourn0_ where chesstourn0_.id=2'
2022-04-26 20:30:50.386 DEBUG 19764 --- [           main] org.hibernate.SQL                        :
    select
        players0_.tournaments_id as tourname1_4_0_,
        players0_.players_id as players_2_4_0_,
        chessplaye1_.id as id1_1_1_,
        chessplaye1_.birth_date as birth_da2_1_1_,
        chessplaye1_.first_name as first_na3_1_1_,
        chessplaye1_.last_name as last_nam4_1_1_,
        chessplaye1_.version as version5_1_1_
    from
        chess_tournament_players players0_
    inner join
        chess_player chessplaye1_
            on players0_.players_id=chessplaye1_.id
    where
        players0_.tournaments_id=?
2022-04-26 20:30:50.406  INFO 19764 --- [           main] org.hibernate.SQL_SLOW                   : SlowQuery: 4 milliseconds. SQL: 'HikariProxyPreparedStatement@1453157901 wrapping select players0_.tournaments_id as tourname1_4_0_, players0_.players_id as players_2_4_0_, chessplaye1_.id as id1_1_1_, chessplaye1_.birth_date as birth_da2_1_1_, chessplaye1_.first_name as first_na3_1_1_, chessplaye1_.last_name as last_nam4_1_1_, chessplaye1_.version as version5_1_1_ from chess_tournament_players players0_ inner join chess_player chessplaye1_ on players0_.players_id=chessplaye1_.id where players0_.tournaments_id=2'
2022-04-26 20:30:50.558 DEBUG 19764 --- [           main] org.hibernate.SQL                        :
    update
        chess_tournament
    set
        end_date=?,
        name=?,
        start_date=?,
        version=?
    where
        id=?
        and version=?
2022-04-26 20:30:50.583  INFO 19764 --- [           main] org.hibernate.SQL_SLOW                   : SlowQuery: 5 milliseconds. SQL: 'HikariProxyPreparedStatement@177522915 wrapping update chess_tournament set end_date='2021-05-25 +08', name='Local Championship', start_date='2021-05-22 +08', version=1 where id=2 and version=0'
2022-04-26 20:30:50.590 DEBUG 19764 --- [           main] org.hibernate.SQL                        :
    delete
    from
        chess_tournament_players
    where
        tournaments_id=?
2022-04-26 20:30:50.594 DEBUG 19764 --- [           main] org.hibernate.SQL                        :
    insert
    into
        chess_tournament_players
        (tournaments_id, players_id)
    values
        (?, ?)
...
2022-04-26 20:30:50.646 DEBUG 19764 --- [           main] org.hibernate.SQL                        :
    insert
    into
        chess_tournament_players
        (tournaments_id, players_id)
    values
        (?, ?)
2022-04-26 20:30:50.650  INFO 19764 --- [           main] org.hibernate.SQL_SLOW                   : SlowQuery: 4 milliseconds. SQL: 'HikariProxyPreparedStatement@306941929 wrapping insert into chess_tournament_players (tournaments_id, players_id) values (2, 33)'
2022-04-26 20:30:50.669  INFO 19764 --- [           main] i.StatisticalLoggingSessionEventListener : Session Metrics {
    3880100 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    2100900 nanoseconds spent preparing 18 JDBC statements;
    58723500 nanoseconds spent executing 18 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    144689100 nanoseconds spent performing 18 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    6642700 nanoseconds spent performing 1 L2C misses;
    149136500 nanoseconds spent executing 1 flushes (flushing a total of 16 entities and 47 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
Set
2022-04-26 20:33:32.488 DEBUG 32020 --- [           main] org.hibernate.SQL                        :
    select
        chesstourn0_.id as id1_2_0_,
        chesstourn0_.end_date as end_date2_2_0_,
        chesstourn0_.name as name3_2_0_,
        chesstourn0_.start_date as start_da4_2_0_,
        chesstourn0_.version as version5_2_0_
    from
        chess_tournament chesstourn0_
    where
        chesstourn0_.id=?
2022-04-26 20:33:32.503  INFO 32020 --- [           main] org.hibernate.SQL_SLOW                   : SlowQuery: 5 milliseconds. SQL: 'HikariProxyPreparedStatement@622702995 wrapping select chesstourn0_.id as id1_2_0_, chesstourn0_.end_date as end_date2_2_0_, chesstourn0_.name as name3_2_0_, chesstourn0_.start_date as start_da4_2_0_, chesstourn0_.version as version5_2_0_ from chess_tournament chesstourn0_ where chesstourn0_.id=2'
2022-04-26 20:33:32.611 DEBUG 32020 --- [           main] org.hibernate.SQL                        :
    select
        players0_.tournaments_id as tourname1_4_0_,
        players0_.players_id as players_2_4_0_,
        chessplaye1_.id as id1_1_1_,
        chessplaye1_.birth_date as birth_da2_1_1_,
        chessplaye1_.first_name as first_na3_1_1_,
        chessplaye1_.last_name as last_nam4_1_1_,
        chessplaye1_.version as version5_1_1_
    from
        chess_tournament_players players0_
    inner join
        chess_player chessplaye1_
            on players0_.players_id=chessplaye1_.id
    where
        players0_.tournaments_id=?
2022-04-26 20:33:32.789 DEBUG 32020 --- [           main] org.hibernate.SQL                        :
    update
        chess_tournament
    set
        end_date=?,
        name=?,
        start_date=?,
        version=?
    where
        id=?
        and version=?
2022-04-26 20:33:32.809  INFO 32020 --- [           main] org.hibernate.SQL_SLOW                   : SlowQuery: 6 milliseconds. SQL: 'HikariProxyPreparedStatement@1761528165 wrapping update chess_tournament set end_date='2021-05-25 +08', name='Local Championship', start_date='2021-05-22 +08', version=1 where id=2 and version=0'
2022-04-26 20:33:32.818 DEBUG 32020 --- [           main] org.hibernate.SQL                        :
    delete
    from
        chess_tournament_players
    where
        tournaments_id=?
        and players_id=?
2022-04-26 20:33:32.836  INFO 32020 --- [           main] i.StatisticalLoggingSessionEventListener : Session Metrics {
    2230700 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    1028500 nanoseconds spent preparing 4 JDBC statements;
    17931500 nanoseconds spent executing 4 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    113258000 nanoseconds spent performing 18 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    11600200 nanoseconds spent performing 1 L2C misses;
    116423200 nanoseconds spent executing 1 flushes (flushing a total of 16 entities and 47 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

5. Projection

Spring Data JPA提供的默认查询虽然便于使用,但每次都会查询所有字段,很多时候这是不必要的, 比如在使用联合索引的情况下,只查询索引包含的字段可以提升查询效率。 为此Spring Data JPA提供了Projection实现只查询部分字段。

5.1. Projection with entity

TestDemo#getPlayerNamesDto() 用于演示基于实体类的Projection,从控制台的输出可以看到生成的SQL只查询了实体的两个属性

public class PlayerName {

    private String firstName;
    private String lastName;

    //getter, setter
}
2022-04-26 21:11:25.688 DEBUG 26960 --- [           main] org.hibernate.SQL                        :
    select
        chessplaye0_.first_name as col_0_0_,
        chessplaye0_.last_name as col_1_0_
    from
        chess_player chessplaye0_
    where
        chessplaye0_.first_name=?

5.2. Projection with interface

TestDemo#getPlayerNames() 用于演示基于接口的Projection,从控制台的输出可以看到生成的sql只查询了接口的get方法对于的字段

public interface PlayerNameIntf {

    String getFirstName();
    String getLastName();
}
2022-04-26 21:17:52.261 DEBUG 31952 --- [           main] org.hibernate.SQL                        :
    select
        chessplaye0_.first_name as col_0_0_,
        chessplaye0_.last_name as col_1_0_
    from
        chess_player chessplaye0_
    where
        chessplaye0_.first_name=?

一般情况下优先选择使用基于接口的Projection,除了声明更简单外还支撑Native Query,TestDemo#getPlayerNamesDtoNative()TestDemo#getPlayerNamesDtoNative() 分别使用实体类和接口接收Native Query的查询结果, 但使用实体类时会抛出 ConverterNotFoundException,而使用接口则能正常查询。

5.3. Nested associations

Projection还支持接口中定义关联的实体,TestDemo#getTournamentWithPlayers() 用于演示此场景

public interface TournamentIntf {

    String getName();
    List<PlayerNameIntf> getPlayers();
}

public interface PlayerNameIntf {

    String getFirstName();
    String getLastName();
}
2022-04-26 22:21:20.112 DEBUG 9948 --- [           main] org.hibernate.SQL                        :
    select
        chesstourn0_.id as id1_2_,
        chesstourn0_.end_date as end_date2_2_,
        chesstourn0_.name as name3_2_,
        chesstourn0_.start_date as start_da4_2_,
        chesstourn0_.version as version5_2_
    from
        chess_tournament chesstourn0_
    where
        chesstourn0_.name=?
2022-04-26 22:21:20.138  INFO 9948 --- [           main] org.hibernate.SQL_SLOW                   : SlowQuery: 13 milliseconds. SQL: 'HikariProxyPreparedStatement@1965388767 wrapping select chesstourn0_.id as id1_2_, chesstourn0_.end_date as end_date2_2_, chesstourn0_.name as name3_2_, chesstourn0_.start_date as start_da4_2_, chesstourn0_.version as version5_2_ from chess_tournament chesstourn0_ where chesstourn0_.name='Tata Steel Chess Tournament 2021''
2022-04-26 22:21:20.222 DEBUG 9948 --- [           main] o.h.stat.internal.StatisticsImpl         : HHH000117: HQL: select generatedAlias0 from ChessTournament as generatedAlias0 where generatedAlias0.name=:param0, time: 122ms, rows: 1
2022-04-26 22:21:20.238  INFO 9948 --- [           main] c.thorben.janssen.spring.data.TestDemo   : ======== Test Output ===========
2022-04-26 22:21:20.253 DEBUG 9948 --- [           main] org.hibernate.SQL                        :
    select
        players0_.tournaments_id as tourname1_4_0_,
        players0_.players_id as players_2_4_0_,
        chessplaye1_.id as id1_1_1_,
        chessplaye1_.birth_date as birth_da2_1_1_,
        chessplaye1_.first_name as first_na3_1_1_,
        chessplaye1_.last_name as last_nam4_1_1_,
        chessplaye1_.version as version5_1_1_
    from
        chess_tournament_players players0_
    inner join
        chess_player chessplaye1_
            on players0_.players_id=chessplaye1_.id
    where
        players0_.tournaments_id=?

尽管查询能够正常执行,但从控制台的输出可以看出查询了所有字段,而不只是Projection中声明的字段,这样的结果 完全失去了Projection的优势,反而还多定义了接口,因此绝对不要使用Nested associations。

5.4. SPEL

Projection的接口支持使用SPEL表达式,TestDemo#getPlayerFullNames() 演示了这一场景,与 Nested associations一样,最终会查询所有字段,因此这种方式也不要使用。

public interface PlayerFullNameIntf {

    @Value("#{target.lastName +', ' + target.firstName}")
    String getFullName();
}

通过在Projection接口中定义默认方法既可以保留Projection的好处,又能对查询的字段计算后返回, TestDemo#getBetterPlayerFullNames() 演示了这一用法

public interface BetterPlayerFullNameIntf {

    String getFirstName();
    String getLastName();

    default String getFullName() {return getLastName()+", "+getFirstName();}
}

从控制台的输出也可以看出只查询了get方法对应的字段

2022-04-26 22:29:44.729 DEBUG 25592 --- [           main] org.hibernate.SQL                        :
    select
        chessplaye0_.first_name as col_0_0_,
        chessplaye0_.last_name as col_1_0_
    from
        chess_player chessplaye0_
    where
        chessplaye0_.first_name=?