NPlus1

N + 1 is a general ORM term describing the situation where loading a single object graph takes N + 1 SQL queries. When the database is remote over a network (the most common situation) this can lead to very poor performance. A good ORM provides mechanisms to mitigate or avoid this scenario.

By default Ebean ORM applies batch lazy loading with a batch size of 10, so for a completely un-tuned query you would instead observe 1 + N/10 SQL queries.
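
For example, a minimal sketch (the getter names are illustrative) of a completely un-tuned query where traversing the order details triggers batch lazy loading:

// Query the orders - 1 SQL query
List<Order> orders = database.find(Order.class)
  .findList();

for (Order order : orders) {
  // First access of an unloaded "details" collection invokes lazy loading.
  // With the default batch size of 10, the details for up to 10 orders are
  // loaded per SQL query - roughly 1 + N/10 queries rather than 1 + N.
  int lines = order.getDetails().size();
}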

What is N?

When lazy loading is invoked on a OneToMany or ManyToMany relationship then N is the number of distinct parent Ids. For example, if you query for 100 orders and lazy load the order details (a OneToMany) then N is the number of distinct order ids.

When lazy loading a ManyToOne or OneToOne relationship then N is the number of distinct Ids for that bean type. For example, if you query for 100 orders and lazy load customers then N is the number of distinct customer Ids that relate to those orders.

N is per "path"

Lazy loading is invoked based on which "paths" of the object graph are traversed. If you fetch orders and traverse from each order to both "customer" and "details", those are 2 distinct paths and 2 distinct sets of lazy loading queries will fire.

For example, fetching 100 orders and then traversing from each order to the "customer" and the "details" could result in between 3 and 201 SQL queries depending on the lazy loading batch size.

1 + (# of customers)/batchSize + (# of orders)/batchSize SQL queries.

  • batch size 1 : 1 + 100/1 + 100/1 = up to 201 SQL queries
  • batch size 10 : 1 + 100/10 + 100/10 = up to 21 SQL queries
  • batch size 100 : 1 + 100/100 + 100/100 = up to 3 SQL queries

The N + 1 issue depends on the paths traversed / which parts of the object graph are used.
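
A small sketch of the 2 paths (again with illustrative getter names):

List<Order> orders = database.find(Order.class)
  .findList();                          // 1 query for the orders

for (Order order : orders) {
  // path 1: ManyToOne - customers loaded in batches (by distinct customer id)
  String name = order.getCustomer().getName();

  // path 2: OneToMany - details loaded in batches (by distinct order id)
  int lines = order.getDetails().size();
}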

Lazy load batch size

With Ebean the default lazy load batch size is 10. You can change the lazy load batch size:

  • Globally via ServerConfig.setLazyLoadBatchSize()
  • Per Query via Query.setLazyLoadBatchSize()
  • Per Query path via Query.fetch() and FetchConfig

Global default

If you want to change the global default lazy load batch size to 100, you can set this via ServerConfig.setLazyLoadBatchSize():

serverConfig.setLazyLoadBatchSize(100);

... or via ebean.properties:

ebean.lazyLoadBatchSize=100

Per Query

If you want to change the lazy load batch size on a specific query you can set this via Query.setLazyLoadBatchSize():

List<Order> orders = database.find(Order.class)
    // set lazy loading batch size for this query
    .setLazyLoadBatchSize(100)
    ...
    .findList();

Per Path

If you want to change the lazy load batch size on specific paths of a query you can set this via FetchConfig:

List<Order> orders = database.find(Order.class)

  // lazy fetch customer using batchSize of 10
  .fetch("customer", new FetchConfig().lazy(10))

  // eager fetch details using batchSize of 100
  .fetch("details", new FetchConfig().query(100))
  ...
  .findList();

Eager loading

The Ebean query language allows you to specify the "paths" that you want eagerly fetched. You can optionally specify a batch size to use and if not specified the batch size defaults to 100.

List<Order> orders = database.find(Order.class)

  // specify 'what to fetch'

  // eagerly fetch customer and details using batchSize of 100
  .fetch("customer")
  .fetch("details")

  // specify some predicates
  .where()
    .eq("status", Order.Status.NEW)
    .gt("orderDate", since)
  .findList();

ORM query to SQL Query

How is an ORM query broken into multiple SQL queries?

Ebean will automatically break a single ORM query into multiple SQL queries in order that:

  • No SQL Cartesian product is produced
  • FirstRows / MaxRows is always honoured

Ebean will always honour firstRows/maxRows and will never generate a SQL Cartesian product, both for performance reasons. What this means is that Ebean analyses all the "fetch paths" defined for a query and detects which paths contain a OneToMany or ManyToMany relationship. For Ebean a single SQL query can contain at most one path that contains a OneToMany/ManyToMany, and Ebean will automatically break up the ORM query based on this.
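
For example, a hedged sketch: a query that combines maxRows with a OneToMany fetch path would typically be executed as 2 SQL queries rather than a single join:

List<Order> orders = database.find(Order.class)
  // ManyToOne - can be joined into the main query
  .fetch("customer")
  // OneToMany - fetched via a secondary "query join" so that maxRows applies
  // to orders (not order detail rows) and no Cartesian product is produced
  .fetch("details")
  .setMaxRows(10)
  .findList();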

Refer to Query Joins

Question and Answer

Q: Is a global lazyLoadBatchSize of 100 reasonable?

A: Yes, I think it is reasonable and possibly a better default than 10 (to the point that perhaps 100 should be the Ebean default). The benefit of a smaller batch size is in scenarios where the object graph use is not 'symmetrical'. That is, when iterating a list of "root level objects" the parts of the object graph traversed/used are not the same for each one. This seems pretty rare though; it is much more common that the parts of the object graph used are the same for all the root level objects.

If the batch size is 100 but the batch contains far fewer than that, it doesn't matter at all. To explain: the number of bind values in the generated SQL IN clause falls into buckets of 1, 5, 10, 20, 50 and 100, where some bind values are repeated in order to fill up the bucket. Why do this? We don't want too many distinct SQL statements because the database itself parses and caches an execution plan for each SQL statement, so limiting the buckets gives us a better hit ratio on the database's cache of SQL execution plans.
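
A rough sketch of the bucketing idea (illustrative only, not Ebean's actual implementation):

// Pad the lazy loading ids up to the next bucket size (1, 5, 10, 20, 50, 100)
// so that fewer distinct IN-clause SQL statements are generated and the
// database's execution plan cache gets a better hit ratio.
// Uses java.util.List and java.util.ArrayList; assumes ids is non-empty.
static <T> List<T> padToBucket(List<T> ids) {
  int[] buckets = {1, 5, 10, 20, 50, 100};
  int bucket = buckets[buckets.length - 1];
  for (int b : buckets) {
    if (ids.size() <= b) {
      bucket = b;
      break;
    }
  }
  List<T> padded = new ArrayList<>(ids);
  T last = ids.get(ids.size() - 1);
  while (padded.size() < bucket) {
    padded.add(last); // repeat a bind value to fill up the bucket
  }
  return padded;
}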

Q: How does this relate to AutoTune?

A: Ebean's AutoTune feature profiles what part of the object graph is used and, based on that profile, can automatically tune the fetch() and select() clauses of a query, reducing lazy loading by replacing it with eager fetching. Paths that traverse ManyToOne and OneToOne can be combined, reducing the total number of SQL queries executed, so this can be better than batch lazy loading with a high batch size.

Q: How do I identify a N + 1 issue?

A: In Ebean terminology the 1 relates to the origin query and the N relates to secondary queries. You can turn on TRACE level logging for org.avaje.ebean.SUM, and included in the log entries is an attribute called origin. The origin is a hash used to link the secondary queries back to the origin query. A low-tech grep of the logs for that origin key will return the origin query and all related lazy loading queries.

Shortly a dashboard/monitoring service will be available that will be able to identify origin queries that result in a lot of lazy loading.

AutoTune profiling provides another alternative although it is orientated towards providing the tuned query that fixes excessive lazy loading.

Q: What about JPA FetchType.EAGER?

A: As a deployment annotation FetchType provides a hint that suits a single use case. The issue is that there are many use cases that beans and their properties/relationships need to support; optimising via a deployment annotation can optimise for one use case, potentially to the detriment of many others.

That is, an ORM should provide a mechanism/query language that enables the developer to optimise object graph construction for each use case, and fixed deployment annotations can't do that.

Q: What about JPA Fetch groups?

A: Fetch groups are a good thing for JPA users and provide the ability to control what part of the object graph to fetch, in a similar vein to Ebean's fetch() and select().

My personal disappointment with JPA Fetch groups is that:

  • Fetch groups seem verbose and painful to use
  • Fetch groups should have been part of the JPQL query language
  • Fetch groups are missing control over: eager/lazy, batch size, readOnly, L2/L3 cache use
  • Fetch groups are a hint? Say What?

Q: What about JPQL?

A: JPQL is not a good query language for optimising object graph construction and Ebean didn't adopt JPQL for this reason. That is, when building an object graph with Ebean we want to give the developers the ability to:

  • Control what part of the object graph to populate (which paths)
  • Have per path control over eager/lazy loading
  • Have per path control over loading batch size
  • Have per path control over readOnly and L2/L3 cache use

Ebean's ORM query language is designed to provide a powerful yet simple way to control and optimise object graph construction. When the ORM query language is not well suited (aggregate queries, reporting, recursive queries etc.) Ebean provides good integration with SQL. The ORM query language might cover 90%+ of your OLTP queries, but rather than expand and complicate the language we promote good mechanisms for integrating with raw SQL instead.
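
For example, a minimal sketch of Ebean's RawSql integration for an aggregate query; the OrderAggregate bean, table and column names here are illustrative assumptions:

// Hypothetical aggregate bean and columns - shown only to illustrate
// dropping down to raw SQL while still mapping results to beans.
String sql = "select order_id, count(*) as line_count "
           + "from o_order_detail group by order_id";

RawSql rawSql = RawSqlBuilder.parse(sql)
    .columnMapping("order_id", "order.id")
    .columnMapping("line_count", "lineCount")
    .create();

List<OrderAggregate> aggregates = database.find(OrderAggregate.class)
    .setRawSql(rawSql)
    .findList();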