Documentation / Features / Elastic / Simple query comparison
Caveats
Benchmarks are very difficult and the intention here is look at patterns
in response
times and specifically look at an ORM query that can be executed against both Postgres
and ElasticSearch
to identify patterns and conditions for certain responses.
You should ALWAYS try to run your own benchmarks against your own data. There are so many factors (volume, cardinality, relationships, data skew etc) that randomly generated data does not cater to.
Don't focus on the absolute numbers
... with ElasticSearch by default we effectively get an index on every property. We generally can't do that on our OLTP database (have an index on every column in the table).
Setup
Populated customers, products, orders, order details (using a modifications to the example application: example-kotlin-web with:
- Orders: 500,000
- Customers: 100,000
- Products: 30,000
- Order lines: 5,263,023
The orders are generated with an orderDate
spread randomly over 12 months for 2015
and a random spread of orderStatus
.
This is loaded into a Postgres database and additionally indexed into ElasticSearch. This is a relatively small database but it is decent enough to play around with various aspects of the query (selectivity of the order date and denormalisation).
For Postgres I did add indexes on order date and status but other wise kept with defaults for both Postgres and ElasticSearch. This is not about getting the best numbers but instead looking at the variations we can see in changing selectivity of predicates and denormalisation.
It is also interesting to note that for ElasticSearch this is a non-scoring query (filter query only) so it's good to see how it compares ballpark wise to Postgres for this simple query.
The query
Run a simple ORM query against both Postgres and ElasticSearch ...
Find Top 100
NEW
orders after a given orderDate
.
We can pass in a boolean to switch between running the query against a Postgres database or ElasticSearch.
// top 100 new orders after (given date)
// ... run against Postgres or ElasticSearch
// ... based on boolean - asDocStore
return Order.find.where()
.status.in(Order.Status.NEW)
.orderDate.after(useDate)
.order()
.orderDate.desc()
.setMaxRows(100)
.setUseDocStore(asDocStore)
.findList()
SQL query
SELECT
t0.id c0,
t0.status c1,
t0.order_date c2,
t0.ship_date c3,
t0.version c4,
t0.when_created c5,
t0.when_modified c6,
t0.customer_id c7,
t0.shipping_address_id c8
FROM orders t0
WHERE t0.status IN (?) AND t0.order_date > ?
ORDER BY t0.order_date DESC, t0.id
LIMIT 100; --bind(NEW,2015-12-08)
Elastic query
{
"size": 100,
"sort": [ { "orderDate": { "order": "desc" } } ],
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{ "terms": { "status": [ "NEW" ] } },
{ "range": { "orderDate": { "gt": 1450263600000 } } }
]
}
}
}
}
}
Run mode 1
In this first run mode I restrict the order date to be dates in December
which makes the query
reasonably selective. This means the cost of the sort and filter required for the TOP 100 is not too bad.
DB mean: 10.2 std dev: 2.34 min:6 max:18
ES mean: 8.77 std dev: 1.08 min:7 max:14
ElasticSearch and Postgres are in the same ballpark here - near enough to the same which is great.
Could we give ElasticSearch this query to run in our application? Yes.
Run mode 2 - Denormalisation effect
Denormalisation benefit in action (ElasticSearch as an alternative to a DB materialised view)
ElasticSearch did very well but actually the object it returned contained some additional denormalised information that we didn't have in the Postgres query.
Specifically the ElasticSearch order index contains additional details for the customer (id,name)
and order details (*,product(id,sku,name))
. If we change the ORM query to match the data the
index is returning we need to fetch the customer name and order details (with some additional joins).
// change the ORM query to additionally fetch
// customer and order details to match the index
// which has been denormalised
query
.fetch("customer", "id,name")
.fetch("details", "*")
.fetch("details.product","id,sku,name")
So this is going to make the ORM query against the DB more expensive because now we join to customer, order details and products.
In the ElasticSearch index these 'joins' have been denormalised into the index so that will run as before. In a DB we could create a materialised view to get a similar effect.
DB mean: 56.4 std dev: 5.95 min:48 max:95
ES mean: 8.75 std dev: 1.01 min:7 max:13
So denormalisation in the ElasticSearch index when it is needed is good (which is pretty much stating the obvious but it is always good to check).
Database materialised views would give us a similar effect but we note that using ElasticSearch also has the advantage of taking load off the database.
Run mode 3 - less selective
In this run mode we drop fetching the customer and order details etc (so back to run mode 1) but
change the order date
used such that it is now randomly selected from
the first 100 days of the year
. This means the query is much less selective and so to
get the TOP 100
there are many more rows to sort and filter out of the result.
DB mean: 51.3 std dev: 3.71 min:44 max:60
ES mean: 14.0 std dev: 2.00 min:10 max:22
So here both queries go slower than run mode 1 as we expect (as they have to sort and filter out many more results to obtain the top 100). ElasticSearch does do rather well here in this TOP 100 relatively unselective query in the sense that it didn't really get that much slower.
ElasticSearch is orientated to paging query results but this a is pretty impressive result given that this is a non-scoring filter only query.
Appendix - Bean mapping
The mapping on the order entity bean is:
// this is Kotlin
// .. the @DocEmbedded is what you should look at
@DocStore
@Entity
@Table(name = "orders")
class Order : BaseModel() {
...
var status: Status = Status.NEW;
@DocEmbedded(doc = "id,name")
@ManyToOne @NotNull
var customer: Customer? = null;
@DocEmbedded(doc = "*,product(id,sku,name)")
@OneToMany(mappedBy = "order", cascade = arrayOf(CascadeType.PERSIST))
@OrderBy("id asc")
var details: MutableList<OrderDetail> = ArrayList();
Appendix: Index mapping
The mapping for the ElasticSearch order index. Note that enums are automatically treated as codes (hence status
is not_analysed). The customer and details mapping from the respective @DocEmbedded
annotations.
{
"mappings" : {
"order" : {
"properties" : {
"status": { "type": "string", "index": "not_analyzed" },
"orderDate": { "type": "date" },
"shipDate": { "type": "date" },
"customer" : {
"properties" : {
"id": { "type": "long" },
"name": { "type": "string" }
}
},
"details" : {
"type" : "nested",
"properties" : {
"id": { "type": "long" },
"orderQty": { "type": "integer" },
"shipQty": { "type": "integer" },
"unitPrice": { "type": "double" },
"product" : {
"properties" : {
"id": { "type": "long" },
"sku": { "type": "string" },
"name": { "type": "string" }
}
},
"version": { "type": "long" },
"whenCreated": { "type": "date" },
"whenModified": { "type": "date" }
}
},
"version": { "type": "long" },
"whenCreated": { "type": "date" },
"whenModified": { "type": "date" }
}
}
}
}
Appendix: Example Order
An example from the ElasticSearch index.
{
"_index": "order_v1",
"_type": "order",
"_id": "72033",
"_version": 1,
"found": true,
"_source": {
"status": "NEW",
"orderDate": 1446462000000,
"shipDate": 1446721200000,
"customer": {
"id": 31772,
"name": "big 31775"
},
"details": [
{
"id": 759300,
"orderQty": 7,
"unitPrice": 78,
"product": {
"id": 9697,
"sku": "A1672",
"name": "A1672"
},
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
},
{
"id": 759301,
"orderQty": 8,
"unitPrice": 94,
"product": {
"id": 18351,
"sku": "E2322",
"name": "E2322"
},
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
},
{
"id": 759302,
"orderQty": 11,
"unitPrice": 73,
"product": {
"id": 12358,
"sku": "B2332",
"name": "B2332"
},
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
},
{
"id": 759303,
"orderQty": 14,
"unitPrice": 33,
"product": {
"id": 11847,
"sku": "B1821",
"name": "B1821"
},
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
},
{
"id": 759304,
"orderQty": 11,
"unitPrice": 34,
"product": {
"id": 14230,
"sku": "C2203",
"name": "C2203"
},
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
},
{
"id": 759305,
"orderQty": 17,
"unitPrice": 95,
"product": {
"id": 20625,
"sku": "F2595",
"name": "F2595"
},
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
},
{
"id": 759306,
"orderQty": 4,
"unitPrice": 89,
"product": {
"id": 11653,
"sku": "B1627",
"name": "B1627"
},
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
},
{
"id": 759307,
"orderQty": 2,
"unitPrice": 42,
"product": {
"id": 2237,
"sku": "C450",
"name": "C450"
},
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
},
{
"id": 759308,
"orderQty": 1,
"unitPrice": 60,
"product": {
"id": 3404,
"sku": "D726",
"name": "D726"
},
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
},
{
"id": 759309,
"orderQty": 18,
"unitPrice": 80,
"product": {
"id": 19433,
"sku": "F1403",
"name": "F1403"
},
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
},
{
"id": 759310,
"orderQty": 9,
"unitPrice": 53,
"product": {
"id": 22608,
"sku": "G2577",
"name": "G2577"
},
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
}
],
"version": 1,
"whenCreated": 1460670857630,
"whenModified": 1460670857630
}
}