Syncing
Post commit
When a transaction commits successfully, entity beans that are mapped to ElasticSearch, either directly via @DocStore or indirectly by being part of an embedded document (de-normalisation), will have changes that need to propagate to ElasticSearch.
The processing of these entity beans occurs in a background thread so as not to affect the normal response time of the transaction.
Changes are propagated based on their DocStoreMode:
- UPDATE - changes are sent to ElasticSearch via its bulk API.
- QUEUE - changes are pushed onto a queue for later processing.
- IGNORE - changes are ignored by Ebean with the expectation that the application will find and propagate changes as needed.
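The effect of the three modes can be pictured with a small, self-contained sketch. The ChangeRouter class and its nested Mode enum below are hypothetical stand-ins written for illustration (they are not Ebean types), and the sketch only shows how a change might be routed depending on its mode.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative only: a hypothetical router, not Ebean's implementation.
final class ChangeRouter {

  // Hypothetical stand-in that mirrors the three DocStoreMode values.
  enum Mode { UPDATE, QUEUE, IGNORE }

  final List<String> bulk = new ArrayList<>();    // changes to send via the bulk API
  final Queue<String> queue = new ArrayDeque<>(); // changes deferred for later processing

  void route(Mode mode, String change) {
    switch (mode) {
      case UPDATE:
        bulk.add(change);
        break;
      case QUEUE:
        queue.add(change);
        break;
      case IGNORE:
        // dropped here; the application is expected to propagate it later
        break;
    }
  }
}
```

Routing one change per mode leaves one entry in bulk, one in queue, and nothing recorded for the ignored change.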
Transaction DocStoreMode.IGNORE
A transaction can be set to DocStoreMode.IGNORE and then Ebean will ignore any and all changes in that transaction.
This is intended for use with large batch processing where it is deemed best to have Ebean not perform
normal processing of the changes and instead have the application search for changes to propagate to
ElasticSearch later.
Transaction transaction = server.beginTransaction();
transaction.setDocStoreMode(DocStoreMode.IGNORE);
try {
  // perform lots of changes and we don't want
  // Ebean to propagate those (as it would normally)
  transaction.commit();
} finally {
  transaction.end();
}

// typically application code later finds and
// updates indexes as necessary
// for example:
Query<Product> query = server.find(Product.class)
  .where()
  .ge("whenModified", new Timestamp(since))
  .query();

// update products modified after a given dateTime
server.docStore().indexByQuery(query, 1000);
Insert
When an entity bean is inserted it is added to the DocStoreUpdate and sent to the DocStoreUpdateProcessor.
This translates into an index entry in ElasticSearch bulk updates.
Example: Insert a country
Country country = new Country("SA","South Africa");
country.save();
Bulk API
{"index":{"_id":"SA","_type":"country","_index":"country"}}
{"name":"South Africa"}
Delete
When an entity bean is deleted it is added to the DocStoreUpdate and sent to the DocStoreUpdateProcessor.
This translates into a delete entry in ElasticSearch bulk updates.
Example: Delete a country
Ebean.delete(Country.class, "SA");
Bulk API
{"delete":{"_id":"SA","_type":"country","_index":"country"}}
Update
Processing updates is more complex than processing inserts and deletes: with updates we need to update not only the main @DocStore index but also any indexes where the affected/updated properties have been included as part of an embedded document (typically via @DocEmbedded).
Example: Update country
Country sa = fetchSaFromDocStore();
sa.setName("Sud Africa");
sa.save();
Bulk API
{"update":{"_id":"SA","_type":"country","_index":"country"}}
{"doc":{"name":"Sud Africa"}}
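The three kinds of bulk entries shown so far (index, delete, update) share the same action header and differ only in the line that follows it. The helper below is a minimal sketch of that shape, not Ebean's code: BulkLines is a hypothetical class, it assumes the type name equals the index name as in the examples here, and it does no JSON escaping of ids.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper (hypothetical): builds bulk API lines shaped like the examples above.
final class BulkLines {

  // Action header shared by index, update and delete entries.
  // Assumes _type equals _index and that id needs no JSON escaping.
  static String header(String action, String index, String id) {
    return "{\"" + action + "\":{\"_id\":\"" + id + "\",\"_type\":\"" + index
        + "\",\"_index\":\"" + index + "\"}}";
  }

  // Insert: header followed by the full document.
  static List<String> index(String index, String id, String docJson) {
    return Arrays.asList(header("index", index, id), docJson);
  }

  // Update: header followed by the changed properties wrapped in "doc".
  static List<String> update(String index, String id, String partialJson) {
    return Arrays.asList(header("update", index, id), "{\"doc\":" + partialJson + "}");
  }

  // Delete: header only, there is no document line.
  static List<String> delete(String index, String id) {
    return Arrays.asList(header("delete", index, id));
  }
}
```

For instance, BulkLines.update("country", "SA", "{\"name\":\"Sud Africa\"}") yields the two lines shown in the update example above.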
Embedded documents
When we update an entity bean we also need to update indexes where the entity bean has been embedded.
Each @DocEmbedded represents an embedded document (de-normalisation). When entity beans are updated Ebean will also look to update any related embedded documents.
Based on the mapping (the @DocEmbedded doc attributes) Ebean knows the nested paths that need to be checked/updated when an entity bean is updated.
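To make the idea of reading nested paths from a doc attribute concrete, here is a minimal sketch that flattens a doc string such as "*,country(*)" into dotted property paths. DocPaths is a hypothetical class and the parsing rules (comma-separated names, parentheses for nesting) are an assumption based on the examples on this page. It is not Ebean's own parser.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative parser (hypothetical): flattens a doc attribute into dotted paths.
final class DocPaths {

  static List<String> flatten(String doc) {
    List<String> out = new ArrayList<>();
    List<String> prefix = new ArrayList<>(); // enclosing property names, outermost first
    StringBuilder token = new StringBuilder();
    for (char c : doc.toCharArray()) {
      if (c == '(') {
        // the name just read becomes the enclosing path for what follows
        prefix.add(token.toString().trim());
        token.setLength(0);
      } else if (c == ',' || c == ')') {
        flush(out, prefix, token);
        if (c == ')') {
          if (prefix.isEmpty()) {
            throw new IllegalArgumentException("unbalanced ')' in: " + doc);
          }
          prefix.remove(prefix.size() - 1);
        }
      } else {
        token.append(c);
      }
    }
    flush(out, prefix, token);
    if (!prefix.isEmpty()) {
      throw new IllegalArgumentException("unbalanced '(' in: " + doc);
    }
    return out;
  }

  // Emits the pending name (if any) qualified by the enclosing path.
  private static void flush(List<String> out, List<String> prefix, StringBuilder token) {
    String name = token.toString().trim();
    token.setLength(0);
    if (name.isEmpty()) {
      return;
    }
    List<String> parts = new ArrayList<>(prefix);
    parts.add(name);
    out.add(String.join(".", parts));
  }
}
```

With this sketch, DocPaths.flatten("*,country(*)") returns the two paths "*" and "country.*", and a string with unbalanced parentheses is rejected with an IllegalArgumentException.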
E.g. Customer embedded in Order and Contacts
For example, let us index Customer but also have customer included as an embedded document within the Order index and Contacts index.
Customer indexed
@DocStore
@Entity
public class Customer ...
Customer embedded in Contacts
@DocStore
@Entity
public class Contact extends BasicDomain {
  ...
  @ManyToOne(optional = false)
  @DocEmbedded(doc = "id,name")
  Customer customer;
Customer embedded in Order
@DocStore
@Entity
@Table(name = "orders")
public class Order extends BasicDomain {
  ...
  @NotNull @ManyToOne
  @DocEmbedded(doc = "id,status,name,billingAddress(*,country(*))")
  Customer customer;
When customer name is updated Ebean needs to:
- Update the Customer index
- Update any related Contacts (based on nested path update)
- Update any related Orders (based on nested path update)
When Ebean starts it reads the @DocEmbedded doc attributes from the mapping and determines the nested document structure. It then registers a listener for each nested path. In the example above 2 listeners are registered with customer: one will update contacts (if customer name is changed) and one will update orders (if name, status or billing address is changed).
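The listener registration described above can be sketched as a small registry that records, per nested path, which embedded properties it depends on. NestedPathListeners and its methods are hypothetical names used only for illustration; the sketch assumes a listener fires when at least one changed property is among the properties it embeds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative registry (hypothetical): one listener per nested path.
final class NestedPathListeners {

  private static final class Listener {
    final String index;        // index that embeds the bean, e.g. "order"
    final String nestedPath;   // path of the embedded document, e.g. "customer"
    final Set<String> watched; // embedded properties taken from the doc attribute

    Listener(String index, String nestedPath, Set<String> watched) {
      this.index = index;
      this.nestedPath = nestedPath;
      this.watched = watched;
    }
  }

  private final List<Listener> listeners = new ArrayList<>();

  void register(String index, String nestedPath, String... embeddedProps) {
    listeners.add(new Listener(index, nestedPath, new HashSet<>(Arrays.asList(embeddedProps))));
  }

  // Returns the indexes whose embedded copy is affected by the changed properties.
  List<String> indexesToUpdate(String... changedProps) {
    Set<String> changed = new HashSet<>(Arrays.asList(changedProps));
    List<String> result = new ArrayList<>();
    for (Listener l : listeners) {
      if (!Collections.disjoint(l.watched, changed)) {
        result.add(l.index);
      }
    }
    return result;
  }
}
```

Registering "contact" with id and name, and "order" with id, status, name and billingAddress, a change to name selects both indexes while a change to status selects only "order".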
Change customer name
If we find customer 2 and change its name to "Roberto" we will see:
Bulk API
{"update":{"_id":"2","_type":"customer","_index":"customer"}}
{"doc":{"name":"Roberto","whenModified":1459206556280,"version":2}}
{"update":{"_id":"5","_type":"order","_index":"order"}}
{"doc":{"customer":{"id":2,"status":"NEW","name":"Roberto","billingAddress":null}}}
{"update":{"_id":"2","_type":"order","_index":"order"}}
{"doc":{"customer":{"id":2,"status":"NEW","name":"Roberto","billingAddress":null}}}
{"update":{"_id":"4","_type":"contact","_index":"contact"}}
{"doc":{"customer":{"id":2,"name":"Roberto"}}}
- The 1st entry updates the Customer index
- The 2nd and 3rd update Order 5 and Order 2 (Roberto's related orders)
- The 4th updates Contact 4 (Roberto's related contact)
Nested paths
For each nested path Ebean will execute an ElasticSearch scan query to find the entries in the index that need to be updated.
Find related orders
{"fields":["customer.id","id"],"query":{"filtered":{
"filter":{
"terms":{"customer.id":[2]}
}
}}}
Find related contacts
{"fields":["customer.id","id"],"query":{"filtered":{
"filter":{
"terms":{"customer.id":[2]}
}
}}}
Ebean executes an ORM query against the database to build the JSON to include in the Bulk API call but, as above, it executes ElasticSearch scan queries to find all the related entries to update.
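The scan queries above all follow one template: a fields list plus a terms filter on the nested path. A minimal sketch of building that request body is shown below. NestedPathQuery is a hypothetical helper, it simply reproduces the request shape shown on this page, and it expects the values to be passed as ready-made JSON literals.

```java
// Illustrative builder (hypothetical): reproduces the scan query shape shown above.
final class NestedPathQuery {

  // fieldPath  - first entry of "fields", e.g. "customer.id"
  // termsPath  - nested path used in the terms filter, e.g. "customer.id"
  // jsonValues - values already encoded as JSON literals, e.g. "2" or "\"SA\""
  static String build(String fieldPath, String termsPath, String... jsonValues) {
    return "{\"fields\":[\"" + fieldPath + "\",\"id\"],"
        + "\"query\":{\"filtered\":{\"filter\":{\"terms\":{\""
        + termsPath + "\":[" + String.join(",", jsonValues) + "]}}}}}";
  }
}
```

Calling NestedPathQuery.build("customer.id", "customer.id", "2") produces the same JSON as the "Find related orders" query, just without the line breaks.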
E.g. Embedded Country
In the example below, Country is embedded within the Customer index in both the billingAddress and the shippingAddress, because each address in turn embeds its country.
This makes "billingAddress.country.code" and "shippingAddress.country.code" nested paths that Ebean needs to check to see which Customer documents need to be updated when a country name is changed.
@DocStore
@Entity
public class Customer extends BasicDomain {
  ...
  @DocEmbedded(doc = "*,country(*)")
  @ManyToOne(cascade = CascadeType.ALL)
  Address billingAddress;

  @DocEmbedded(doc = "*,country(*)")
  @ManyToOne(cascade = CascadeType.ALL)
  Address shippingAddress;
E.g. Nested path - billingAddress.country.code
Ebean will execute a scan query against ElasticSearch using the nested path in order to find documents that need to be updated due to the change in the embedded document.
Find customers with billingAddress.country.code = SA
{"fields":["billingAddress.id","id"],"query":{
"filtered":{"filter":
{"terms":{"billingAddress.country.code":["SA"]}
}}
}}
Find customers with shippingAddress.country.code = SA
{"fields":["shippingAddress.id","id"],"query":{
"filtered":{"filter":
{"terms":{"shippingAddress.country.code":["SA"]}
}}
}}
Find orders with customer.billingAddress.country.code = SA
Country is also embedded in the Order index via customer.billingAddress, so we also find orders that have this embedded country.
{"fields":["customer.id","id"],"query":{"filtered":{
"filter":{
"terms":{"customer.billingAddress.country.code":["SA"]}
}
}}}