Videos

SQL: Covering indexes and only fetching what we need

Optimisations available when we only fetch what we need from the Database

Overview

Partial Objects is where an entity bean is not fully populated with all its properties but instead has only some of it's properties populated/loaded.

Partial properties are very important from a database, JDBC and network performance perspective and everything in Ebean is built to support partially populated beans. The query language, enhancement, persisting, event adapters and all of Ebean's internals work on the premise that partially populated beans are the norm.

Index only query

If a SQL query executes where all the columns used are included in an index then the database has the ability to return the query result by only using the index and specifically avoid reading the data blocks associated with the table. This is sometimes described as a "covering index" or "covering query".

Small number of columns selected = opportunity for index only query
database.find(Customer.class)

// only fetch the customer id and name
// ... if there is an index covering id and name the database has the ability
// ... to avoid reading the customer table data blocks and instead resolve the
// ... query by only using the index (which can be much faster)
.select("name")

// predicates and order by covered by index
.where().name.istartsWith("rob")
.orderBy().name.asc()
.findList();

Index only join

Similar to 'Index only query' is the case for joins where the columns needed/used on the joined table can be fully resolved by an index. For example, joining the customer table and only needing customer name. Similarly, joining to the product table and only needing the product name. These are examples where the join can often be resolved by the database by just using an index and not actually needing to read/access the table data blocks (join resolved by just reading the index blocks and avoiding reading the table blocks).

Small number of columns selected in joined table = opportunity for index only join
database.find(Order.class)
...
// only fetch the customer id and name
// ... if there is an index on the customer table that is covering the customer id and name
// ... the database has the ability to avoid reading the customer table data blocks and instead
// ... resolve the join part of this query from only the customer index (which is faster)
.fetch("customer", "name")
...
.findList();

Database buffers

When the database is processing the query it creates buffers to performing sorting, joining etc. These buffers are larger when there is more columns involved as there is more data. If you can only select the columns needed then these buffers and the amount of data involved in the sorting, joining etc is reduced. This is more important if the columns that can be excluded are large varchar columns or the entity beans are mapped to tables with large numbers of columns.

Less columns selected = smaller buffers used by DB to process sorts/joins etc

JDBC buffers

The JDBC driver also has buffers to pump the resultSet data over the network. More columns means a bigger JDBC buffer and more memory required by the JDBC driver.

Less columns selected = smaller buffers used by JDBC driver

Network

More columns included in the fetch means more data is pumped over the network. Note that some JDBC drivers do have something like column wise compression to reduce the network payload when values repeat (as repeating values is a pretty common occurrence with SQL resultSet data).

Less columns selected = less data pumped over the network

Select / Fetch

Ebean makes it easy to control what you want to fetch on a query using select() and fetch(). select() is used to define the specific properties to fetch/load for the root level and fetch() is used to define the properties to fetch/load for all other levels (with additional control over eager/lazy loading and loading batch size etc).

// control the lazy loading of customers ...
List<Order> list = Ebean.find(Order.class)
   // fetch just these properties for the Order
  .select("status,orderDate,shipDate")
  // fetch customer name only
  .fetch("customer","name")
  .fetch("customer.shippingAddress", "*")
  .fetch("details", "*")
  // fetch product name only
  .fetch("details.product", "name")
  .where().eq("status",Order.Status.NEW)
  .findList();

AutoTune

AutoTune will tune each query for each use case to fetch the minimal set of properties

Manually optimising queries to fetch only what is needed can be some work and even when you do that you often have the case where a single query like 'find order by id' is used in multiple user cases and each use case uses a different part of the object graph.

AutoTune is a feature than can profile the object graph use relative to the call stack for each query and provide an query that selects/fetches only what is used for the given call stack - effectively provide a nice automated way of optimising the query to only fetch what the application uses.