Post commit

When a transaction commits successfully, entity beans that are mapped to ElasticSearch, either directly via @DocStore or indirectly by being part of an embedded document (de-normalisation), will have changes that need to propagate to ElasticSearch.

These entity beans are processed in a background thread so as not to affect the normal response time of the transaction.

Changes are propagated based on their DocStoreMode:

  • UPDATE - changes are sent to ElasticSearch via its bulk API.
  • QUEUE - changes are pushed onto a queue for later processing.
  • IGNORE - changes are ignored by Ebean with the expectation that the application will find and propagate changes as needed.
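
The three modes can be pictured as a simple dispatch on each change. The sketch below is illustrative only and is not Ebean's internal code: the Mode enum mirrors the DocStoreMode values listed above, while PostCommitRouter, sentToBulk and queued are hypothetical names, and a change is reduced to a plain string.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: hypothetical names, not Ebean internals. */
final class PostCommitRouter {

  /** Mirrors the three DocStoreMode values described above. */
  enum Mode { UPDATE, QUEUE, IGNORE }

  // Stand-ins for the bulk API request and the processing queue.
  final List<String> sentToBulk = new ArrayList<>();
  final List<String> queued = new ArrayList<>();

  /** Routes one change (a plain string here) according to its mode. */
  void route(Mode mode, String change) {
    switch (mode) {
      case UPDATE:
        sentToBulk.add(change); // would be sent to ElasticSearch via the bulk API
        break;
      case QUEUE:
        queued.add(change);     // deferred for later processing
        break;
      case IGNORE:
        break;                  // dropped; the application propagates it later
    }
  }
}
```

With this sketch, routing a change with Mode.IGNORE leaves both lists untouched, which matches the expectation that the application finds and propagates such changes itself.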

Transaction DocStoreMode.IGNORE

When a transaction is set to DocStoreMode.IGNORE, Ebean ignores all changes made in that transaction. This is intended for large batch processing, where it is deemed best for Ebean to skip its normal processing of the changes and for the application to search for changes to propagate to ElasticSearch later.

Transaction transaction = server.beginTransaction();
transaction.setDocStoreMode(DocStoreMode.IGNORE);
try {

  // perform lots of changes that we don't want
  // Ebean to propagate (as it normally would)
  transaction.commit();
} finally {
  transaction.end();
}

// typically application code later finds and
// updates indexes as necessary
// for example:

Query<Product> query = server.find(Product.class)
  .where()
    .ge("whenModified", new Timestamp(since))
    .query();

// update products modified after a given dateTime
server.docStore().indexByQuery(query, 1000);

Insert

When an entity bean is inserted it is added to the DocStoreUpdate and sent to the DocStoreUpdateProcessor.

This translates into an index entry in ElasticSearch bulk updates.

Example: Insert a country

Country country = new Country("SA","South Africa");
country.save();

Bulk API

{"index":{"_id":"SA","_type":"country","_index":"country"}}
{"name":"South Africa"}

Delete

When an entity bean is deleted it is added to the DocStoreUpdate and sent to the DocStoreUpdateProcessor.

This translates into a delete entry in ElasticSearch bulk updates.

Example: Delete a country

Ebean.delete(Country.class, "SA");

Bulk API

{"delete":{"_id":"SA","_type":"country","_index":"country"}}

Update

Processing updates is more complex than processing inserts and deletes: with updates we need to update not only the main @DocStore index but also any indexes where the affected properties have been included as part of an embedded document (typically via @DocEmbedded).

Example: Update country

Country sa = fetchSaFromDocStore();
sa.setName("Sud Africa");
sa.save();

Bulk API

{"update":{"_id":"SA","_type":"country","_index":"country"}}
{"doc":{"name":"Sud Africa"}}
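
The insert, delete and update entries shown so far share the same action-line shape. As a minimal sketch of that shape (not Ebean's own serialisation code), the hypothetical BulkLines helper below assembles the lines by plain string concatenation. It assumes the id, type and index values need no JSON escaping.

```java
/** Illustrative sketch only: builds bulk API lines shaped like the examples above. */
final class BulkLines {

  /** Action line, e.g. {"index":{"_id":"SA","_type":"country","_index":"country"}} */
  static String action(String op, String index, String type, String id) {
    // Assumption: values contain no characters that need JSON escaping.
    return "{\"" + op + "\":{\"_id\":\"" + id + "\",\"_type\":\"" + type
        + "\",\"_index\":\"" + index + "\"}}";
  }

  /** Insert: action line followed by the full document source. */
  static String index(String index, String type, String id, String sourceJson) {
    return action("index", index, type, id) + "\n" + sourceJson + "\n";
  }

  /** Update: action line followed by the changed properties wrapped in "doc". */
  static String update(String index, String type, String id, String partialJson) {
    return action("update", index, type, id) + "\n{\"doc\":" + partialJson + "}\n";
  }

  /** Delete: action line only, there is no source line. */
  static String delete(String index, String type, String id) {
    return action("delete", index, type, id) + "\n";
  }
}
```

For example, BulkLines.update("country", "country", "SA", "{\"name\":\"Sud Africa\"}") produces the two lines of the update example above, each terminated by a newline.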

Embedded documents

When we update an entity bean we also need to update indexes where the entity bean has been embedded.

Each @DocEmbedded represents an embedded document (de-normalisation). When entity beans are updated Ebean will also look to update any related embedded documents.

Based on the mapping (@DocEmbedded doc attributes) Ebean knows the nested paths that need to be checked/updated when an entity bean is updated.
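
To make the idea of nested paths concrete, the sketch below flattens a doc attribute string into dotted property paths. How Ebean actually parses these attributes is not described here, so DocPaths and flatten are hypothetical names and the parsing rules (comma-separated names, parentheses for nesting) are an assumption based on the attribute syntax used in this section.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: a hypothetical parser, not Ebean's implementation. */
final class DocPaths {

  /** Flattens e.g. "id,billingAddress(*,country(*))" into dotted property paths. */
  static List<String> flatten(String doc) {
    List<String> paths = new ArrayList<>();
    List<String> prefix = new ArrayList<>(); // stack of enclosing property names
    StringBuilder token = new StringBuilder();
    for (char c : doc.toCharArray()) {
      if (c == ',' || c == '(' || c == ')') {
        String name = token.toString().trim();
        token.setLength(0);
        if (c == '(') {
          prefix.add(name); // descend into a nested document
        } else {
          if (!name.isEmpty()) {
            paths.add(join(prefix, name));
          }
          if (c == ')' && !prefix.isEmpty()) {
            prefix.remove(prefix.size() - 1); // leave the nested document
          }
        }
      } else {
        token.append(c);
      }
    }
    String last = token.toString().trim();
    if (!last.isEmpty()) {
      paths.add(join(prefix, last));
    }
    return paths;
  }

  private static String join(List<String> prefix, String name) {
    return prefix.isEmpty() ? name : String.join(".", prefix) + "." + name;
  }
}
```

Under these assumptions, DocPaths.flatten("*,country(*)") yields the two entries "*" and "country.*", which shows how an embedded document nested one level down gets its own path.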

E.g. Customer embedded in Order and Contacts

For example, let us index Customer but also have customer included as an embedded document within the Order index and Contacts index.

Customer indexed

@DocStore
@Entity
public class Customer ...

Customer embedded in Contacts

@DocStore
@Entity
public class Contact extends BasicDomain {

  ...
  @ManyToOne(optional = false)
  @DocEmbedded(doc = "id,name")
  Customer customer;

Customer embedded in Order

@DocStore
@Entity
@Table(name = "orders")
public class Order extends BasicDomain {

  ...
  @NotNull @ManyToOne
  @DocEmbedded(doc = "id,status,name,billingAddress(*,country(*))")
  Customer customer;

When customer name is updated Ebean needs to:

  • Update the Customer index
  • Update any related Contacts (based on nested path update)
  • Update any related Orders (based on nested path update)

When Ebean starts, it reads the @DocEmbedded doc attributes from the mapping and determines the nested document structure. It then registers a listener for each nested path. In the example above, two listeners are registered with Customer: one updates contacts (if the customer name is changed) and one updates orders (if the name, status or billing address is changed).
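
The listener idea can be sketched as a map from each embedding index to the Customer properties it depends on. This is a simplified illustration, not Ebean's listener code: NestedPathListeners, register and affectedIndexes are hypothetical names, and the property sets are reduced to the top-level names from the two @DocEmbedded doc attributes above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only: hypothetical names, not Ebean internals. */
final class NestedPathListeners {

  // embedding index -> embedded Customer properties that index depends on
  private final Map<String, Set<String>> listeners = new LinkedHashMap<>();

  /** Registers a listener for an index that embeds the given properties. */
  void register(String targetIndex, String... embeddedProperties) {
    listeners.put(targetIndex, new HashSet<>(Arrays.asList(embeddedProperties)));
  }

  /** Returns the indexes whose embedded documents use any changed property. */
  List<String> affectedIndexes(Set<String> changedProperties) {
    List<String> affected = new ArrayList<>();
    for (Map.Entry<String, Set<String>> entry : listeners.entrySet()) {
      for (String changed : changedProperties) {
        if (entry.getValue().contains(changed)) {
          affected.add(entry.getKey());
          break; // one matching property is enough for this index
        }
      }
    }
    return affected;
  }

  /** The two listeners from the Customer example above. */
  static NestedPathListeners customerExample() {
    NestedPathListeners example = new NestedPathListeners();
    example.register("contact", "id", "name");
    example.register("order", "id", "status", "name", "billingAddress");
    return example;
  }
}
```

In this sketch a change to name affects both the contact and order indexes, a change to status affects only order, and a change to a property that is not embedded anywhere affects neither.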

Change customer name

If we find customer 2 and change its name to "Roberto" we will see:

Bulk API

{"update":{"_id":"2","_type":"customer","_index":"customer"}}
{"doc":{"name":"Roberto","whenModified":1459206556280,"version":2}}
{"update":{"_id":"5","_type":"order","_index":"order"}}
{"doc":{"customer":{"id":2,"status":"NEW","name":"Roberto","billingAddress":null}}}
{"update":{"_id":"2","_type":"order","_index":"order"}}
{"doc":{"customer":{"id":2,"status":"NEW","name":"Roberto","billingAddress":null}}}
{"update":{"_id":"4","_type":"contact","_index":"contact"}}
{"doc":{"customer":{"id":2,"name":"Roberto"}}}

  • The 1st entry updates the Customer index
  • The 2nd and 3rd update Order 5 and Order 2 (Roberto's related orders)
  • The 4th updates Contact 4 (Roberto's related contact)

Nested paths

For each nested path Ebean will execute an ElasticSearch scan query to find the entries in the index that need to be updated.

Find related orders

{"fields":["customer.id","id"],"query":{"filtered":{
  "filter":{
    "terms":{"customer.id":[2]}
  }
}}}

Find related contacts

{"fields":["customer.id","id"],"query":{"filtered":{
  "filter":{
    "terms":{"customer.id":[2]}
  }
}}}

Ebean executes an ORM query against the database to build the JSON included in the Bulk API call but, as shown above, it executes ElasticSearch scan queries to find all the related entries to update.
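
As a rough sketch of how such a scan query body could be assembled for any nested path, the hypothetical NestedPathQuery helper below reproduces the query shape shown above on a single line. It is not Ebean's query builder: the class and method names are assumptions, and ids are written as bare numbers or quoted strings with no JSON escaping.

```java
import java.util.List;
import java.util.StringJoiner;

/** Illustrative sketch only: reproduces the query shape shown above. */
final class NestedPathQuery {

  /**
   * Builds a terms-filter query body for one nested path.
   * firstField is the first entry of "fields" (e.g. "customer.id"),
   * nestedPath is the path being matched and ids are the changed ids.
   */
  static String scanQuery(String firstField, String nestedPath, List<?> ids) {
    StringJoiner values = new StringJoiner(",", "[", "]");
    for (Object id : ids) {
      // Numbers are written bare, everything else as a quoted string.
      values.add(id instanceof Number ? id.toString() : "\"" + id + "\"");
    }
    return "{\"fields\":[\"" + firstField + "\",\"id\"],"
        + "\"query\":{\"filtered\":{\"filter\":{\"terms\":{\""
        + nestedPath + "\":" + values + "}}}}}";
  }
}
```

Calling NestedPathQuery.scanQuery("customer.id", "customer.id", java.util.Arrays.asList(2)) gives the same JSON as the two queries above, without the line breaks.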

E.g. Embedded Country

In the example below the Customer index contains embedded documents for both the billing and shipping address, and each address in turn embeds the country. Here "billingAddress.country.code" and "shippingAddress.country.code" are the nested paths that Ebean needs to check to find which Customer documents need to be updated when a country name is changed: when we update a Country we also need to update any Customer documents that contain that country in their billing or shipping address.

@DocStore
@Entity
public class Customer extends BasicDomain {
  ...
  @DocEmbedded(doc = "*,country(*)")
  @ManyToOne(cascade = CascadeType.ALL)
  Address billingAddress;

  @DocEmbedded(doc = "*,country(*)")
  @ManyToOne(cascade = CascadeType.ALL)
  Address shippingAddress;

E.g. Nested path - billingAddress.country.code

Ebean will execute a scan query against ElasticSearch using the nested path in order to find documents that need to be updated due to the change in the embedded document.

Find customers with billingAddress.country.code = SA

{"fields":["billingAddress.id","id"],"query":{
    "filtered":{"filter":
      {"terms":{"billingAddress.country.code":["SA"]}
    }}
}}

Find customers with shippingAddress.country.code = SA

{"fields":["shippingAddress.id","id"],"query":{
    "filtered":{"filter":
      {"terms":{"shippingAddress.country.code":["SA"]}
    }}
}}

Find orders with customer.billingAddress.country.code = SA

Country is also embedded in the Order index via customer.billingAddress, so we also find orders that have this embedded country.

{"fields":["customer.id","id"],"query":{"filtered":{
  "filter":{
    "terms":{"customer.billingAddress.country.code":["SA"]}
  }
}}}