Aggregation

SQL has the aggregation functions of SUM, MIN, MAX, COUNT and AVG.

We can use these functions in Ebean as dynamic formula or as properties via @Aggregation and @Sum.

Dynamic formula

Single attribute

LocalDate maxDate =
  new QOrder()
    .select("max(orderDate)")
    .customer.name.equalTo("Rob")
    .findSingleAttribute();

select max(t0.order_date)
from orders t0
join customer t1 on t1.id = t0.customer_id
where t1.name = ? ; --bind(Rob)

With aggregation queries that select only a single attribute the returned type matches the property type. In the above case the orderDate is a LocalDate and so that is the type returned.

Multiple attributes

When we specify non-aggregate properties (status in the example below) then a GROUP BY clause is generated that includes all the non-aggregate properties.

List<Order> orders =
  new QOrder()
  .select("status, max(orderDate)")     // status is non-aggregate
  .customer.name.equalTo("Rob")
  .findList();

select t0.status, max(t0.order_date)
from orders t0
join customer t1 on t1.id = t0.customer_id
where t1.name = ?
group by t0.status  -- group by non-aggregate properties

The beans returned by aggregation queries are partially populated. They don't contain the @Id property and thus do not support lazy loading or persisting.

The order beans in the example above will only have their status and orderDate properties loaded.

Fetch

Aggregation queries can include fetch or fetchQuery to load related beans.

List<Order> orders
  = new QOrder()
  .select("status, max(orderDate)")
  .customer.fetch("name")                                // (1) fetch
  .status.notEqualTo(Order.Status.NEW)
  .findList();

select t0.status, max(t0.order_date), t1.id, t1.name     -- (1) has customer id and name
from orders t0
join customer t1 on t1.id = t0.customer_id
where t0.status <> ?
group by t0.status, t1.id, t1.name                       -- (1) has customer id and name

When we use fetchQuery the ORM query is executed as 2 sql queries.

List<Order> orders
  = new QOrder()
  .select("status, max(orderDate)")
  .customer.fetchQuery("name")                           // (2) fetchQuery ...
  .status.notEqualTo(Order.Status.NEW)
  .findList();

-- Primary query
select t0.status, max(t0.order_date), t0.customer_id     -- (2) has customer id only
from orders t0
where t0.status <> ?
group by t0.status, t0.customer_id

-- Secondary query
select t0.id, t0.name                                    -- (2) customer id and name
from customer t0
where t0.id in (?, ?, ?, ?, ? )

Using fetchQuery() can be good when there are many properties on customer that we want to load (as these all go into the secondary sql query rather than the primary query and therefore are not part of the GROUP BY.

@Aggregation

We can model aggregations with properties annotated with @Aggregation. This is an alternative to using dynamic formula.

Properties with @Aggregation must be explicitly included in a query (they are considered Transient).

Example:

@Entity
@Table(name = "orders")
public class Order extends BaseModel {

  ...

  LocalDate orderDate;

  @Aggregation("max(orderDate)")     // aggregation property
  LocalDate maxOrderDate;

  @Aggregation("count(*)")           // aggregation property
  Long totalCount;

Having added the maxOrderDate and totalCount properties we can use them in select and having clauses in our queries.

QOrder o = QOrder.alias();

List<Order> orders = new QOrder()
  .select(o.status, o.maxOrderDate, o.totalCount)
  .findList();

select t0.status, max(t0.order_date), count(*)
from orders t0
group by t0.status

Having

Predicates on aggregation properties should be added to the Having clause.

List<Order> orders = new QOrder()
  .select(o.status, o.maxOrderDate, o.totalCount)
  .status.notEqualTo(Order.Status.COMPLETE)             // (1) where clause - non aggregate properties
  .having()
  .totalCount.greaterThan(1)                            // (2) having clause - aggregate properties
  .findList();

select t0.status, max(t0.order_date), count(*)
from orders t0
where t0.status <> ?                                    // (1)
group by t0.status
having count(*) > ?                                     // (2)

Aggregation beans

If we only want 1 or 2 aggregation properties we feel ok adding these to the existing entity beans. However, if instead we have many properties we want to aggregate this approach is messing with our model and it starts to look ugly.

We can create aggregation beans to model the properties - model the sum/group by/aggregation view.

Use @View rather than @Table
Does not extend Model as we don't save() or delete() these
Does not have @Id or @Version properties
Has properties we want to Aggregate on (sum, max etc)
Has properties we want to group by on (date, status, ManyToOne)

Example

Say we have an entity bean called MachineUse with:

@Entity
@Table(name = "machine_use")
public class MachineUse extends Model {

  @Id
  private long id;

  @ManyToOne(optional = false)
  private DMachine machine;

  private LocalDate date;

  private long distanceKms;     // we want to sum() this ...

  private long timeSecs;        // we want to sum() this ...

  private BigDecimal fuel;      // we want to sum() this ...

  @Version
  private long version;
  ...

We can create the aggregate bean for this as:

@Entity
@View(name = "machine_use")
public class MachineUseAggregate {

  @ManyToOne(optional = false)
  private DMachine machine;              // group by, fetch, fetchQuery on ...

  private LocalDate date;                // group by on ...

  @Aggregation("sum(distanceKms)")
  private Long distanceKms;

  @Aggregation("sum(timeSecs)")
  private Long timeSecs;

  @Aggregation("sum(fuel)")
  private BigDecimal fuel;

  @Aggregation("count(*)")
  private Long count;
  ...

@Sum

When we create aggregation beans we very often see the pattern where we sum a property like:

@Aggregation("sum(<property name>)")
Type <property name>;

For example, we see beans like:

@Aggregation("sum(distanceKms)")
BigDecimal distanceKms;

@Aggregation("sum(useSeconds)")
Long useSeconds;

@Aggregation("sum(cost)")
BigDecimal cost;

@Sum is syntatic sugar for this pattern and the above becomes:

@Sum
BigDecimal distanceKms;

@Sum
Long useSeconds;

@Sum
BigDecimal cost;