Simple query comparison

Caveats

Benchmarks are very difficult and the intention here is look at patterns in response times and specifically look at an ORM query that can be executed against both Postgres and ElasticSearch to identify patterns and conditions for certain responses.

You should ALWAYS try to run your own benchmarks against your own data. There are so many factors (volume, cardinality, relationships, data skew etc) that randomly generated data does not cater to.

Don't focus on the absolute numbers

... with ElasticSearch by default we effectively get an index on every property. We generally can't do that on our OLTP database (have an index on every column in the table).

Setup

Populated customers, products, orders, order details (using a modifications to the example application: example-kotlin-web with:

Orders: 500,000
Customers: 100,000
Products: 30,000
Order lines: 5,263,023

The orders are generated with an orderDate spread randomly over 12 months for 2015 and a random spread of orderStatus.

This is loaded into a Postgres database and additionally indexed into ElasticSearch. This is a relatively small database but it is decent enough to play around with various aspects of the query (selectivity of the order date and denormalisation).

For Postgres I did add indexes on order date and status but other wise kept with defaults for both Postgres and ElasticSearch. This is not about getting the best numbers but instead looking at the variations we can see in changing selectivity of predicates and denormalisation.

It is also interesting to note that for ElasticSearch this is a non-scoring query (filter query only) so it's good to see how it compares ballpark wise to Postgres for this simple query.

The query

Run a simple ORM query against both Postgres and ElasticSearch ...

Find Top 100 NEW orders after a given orderDate. We can pass in a boolean to switch between running the query against a Postgres database or ElasticSearch.

// top 100 new orders after (given date)
// ... run against Postgres or ElasticSearch
// ... based on boolean - asDocStore

return Order.find.where()
  .status.in(Order.Status.NEW)
  .orderDate.after(useDate)
  .order()
    .orderDate.desc()

  .setMaxRows(100)
  .setUseDocStore(asDocStore)
  .findList()

SQL query

SELECT
  t0.id                  c0,
  t0.status              c1,
  t0.order_date          c2,
  t0.ship_date           c3,
  t0.version             c4,
  t0.when_created        c5,
  t0.when_modified       c6,
  t0.customer_id         c7,
  t0.shipping_address_id c8
FROM orders t0
WHERE t0.status IN (?) AND t0.order_date > ?
ORDER BY t0.order_date DESC, t0.id
LIMIT 100; --bind(NEW,2015-12-08)

Elastic query

{
  "size": 100,
  "sort": [ { "orderDate": { "order": "desc" } } ],
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "terms": { "status": [ "NEW" ] } },
            { "range": { "orderDate": { "gt": 1450263600000 } } }
          ]
        }
      }
    }
  }
}

Run mode 1

In this first run mode I restrict the order date to be dates in December which makes the query reasonably selective. This means the cost of the sort and filter required for the TOP 100 is not too bad.

DB mean: 10.2    std dev: 2.34    min:6    max:18
ES mean: 8.77    std dev: 1.08    min:7    max:14

ElasticSearch and Postgres are in the same ballpark here - near enough to the same which is great.

Could we give ElasticSearch this query to run in our application? Yes.

Run mode 2 - Denormalisation effect

Denormalisation benefit in action (ElasticSearch as an alternative to a DB materialised view)

ElasticSearch did very well but actually the object it returned contained some additional denormalised information that we didn't have in the Postgres query.

Specifically the ElasticSearch order index contains additional details for the customer (id,name) and order details (*,product(id,sku,name)). If we change the ORM query to match the data the index is returning we need to fetch the customer name and order details (with some additional joins).

// change the ORM query to additionally fetch
// customer and order details to match the index
// which has been denormalised
query
  .fetch("customer", "id,name")
  .fetch("details", "*")
  .fetch("details.product","id,sku,name")

So this is going to make the ORM query against the DB more expensive because now we join to customer, order details and products.

In the ElasticSearch index these 'joins' have been denormalised into the index so that will run as before. In a DB we could create a materialised view to get a similar effect.

DB mean: 56.4    std dev: 5.95    min:48   max:95
ES mean: 8.75    std dev: 1.01    min:7    max:13

So denormalisation in the ElasticSearch index when it is needed is good (which is pretty much stating the obvious but it is always good to check).

Database materialised views would give us a similar effect but we note that using ElasticSearch also has the advantage of taking load off the database.

Run mode 3 - less selective

In this run mode we drop fetching the customer and order details etc (so back to run mode 1) but change the order date used such that it is now randomly selected from the first 100 days of the year. This means the query is much less selective and so to get the TOP 100 there are many more rows to sort and filter out of the result.

DB mean: 51.3    std dev: 3.71    min:44   max:60
ES mean: 14.0    std dev: 2.00    min:10   max:22

So here both queries go slower than run mode 1 as we expect (as they have to sort and filter out many more results to obtain the top 100). ElasticSearch does do rather well here in this TOP 100 relatively unselective query in the sense that it didn't really get that much slower.

ElasticSearch is orientated to paging query results but this a is pretty impressive result given that this is a non-scoring filter only query.

Appendix - Bean mapping

The mapping on the order entity bean is:

// this is Kotlin
// .. the @DocEmbedded is what you should look at

@DocStore
@Entity
@Table(name = "orders")
class Order : BaseModel() {

  ...

  var status: Status = Status.NEW;

  @DocEmbedded(doc = "id,name")
  @ManyToOne @NotNull
  var customer: Customer? = null;

  @DocEmbedded(doc = "*,product(id,sku,name)")
  @OneToMany(mappedBy = "order", cascade = arrayOf(CascadeType.PERSIST))
  @OrderBy("id asc")
  var details: MutableList<OrderDetail> = ArrayList();

Appendix: Index mapping

The mapping for the ElasticSearch order index. Note that enums are automatically treated as codes (hence status is not_analysed). The customer and details mapping from the respective @DocEmbedded annotations.

{
  "mappings" : {
    "order" : {
      "properties" : {
        "status": { "type": "string", "index": "not_analyzed" },
        "orderDate": { "type": "date" },
        "shipDate": { "type": "date" },
        "customer" : {
          "properties" : {
            "id": { "type": "long" },
            "name": { "type": "string" }
          }
        },
        "details" : {
          "type" : "nested",
          "properties" : {
            "id": { "type": "long" },
            "orderQty": { "type": "integer" },
            "shipQty": { "type": "integer" },
            "unitPrice": { "type": "double" },
            "product" : {
              "properties" : {
                "id": { "type": "long" },
                "sku": { "type": "string" },
                "name": { "type": "string" }
              }
            },
            "version": { "type": "long" },
            "whenCreated": { "type": "date" },
            "whenModified": { "type": "date" }
          }
        },
        "version": { "type": "long" },
        "whenCreated": { "type": "date" },
        "whenModified": { "type": "date" }
      }
    }
  }
}

Appendix: Example Order

An example from the ElasticSearch index.

{
  "_index": "order_v1",
  "_type": "order",
  "_id": "72033",
  "_version": 1,
  "found": true,
  "_source": {
    "status": "NEW",
    "orderDate": 1446462000000,
    "shipDate": 1446721200000,
    "customer": {
      "id": 31772,
      "name": "big 31775"
    },
    "details": [
      {
        "id": 759300,
        "orderQty": 7,
        "unitPrice": 78,
        "product": {
          "id": 9697,
          "sku": "A1672",
          "name": "A1672"
        },
        "version": 1,
        "whenCreated": 1460670857630,
        "whenModified": 1460670857630
      },
      {
        "id": 759301,
        "orderQty": 8,
        "unitPrice": 94,
        "product": {
          "id": 18351,
          "sku": "E2322",
          "name": "E2322"
        },
        "version": 1,
        "whenCreated": 1460670857630,
        "whenModified": 1460670857630
      },
      {
        "id": 759302,
        "orderQty": 11,
        "unitPrice": 73,
        "product": {
          "id": 12358,
          "sku": "B2332",
          "name": "B2332"
        },
        "version": 1,
        "whenCreated": 1460670857630,
        "whenModified": 1460670857630
      },
      {
        "id": 759303,
        "orderQty": 14,
        "unitPrice": 33,
        "product": {
          "id": 11847,
          "sku": "B1821",
          "name": "B1821"
        },
        "version": 1,
        "whenCreated": 1460670857630,
        "whenModified": 1460670857630
      },
      {
        "id": 759304,
        "orderQty": 11,
        "unitPrice": 34,
        "product": {
          "id": 14230,
          "sku": "C2203",
          "name": "C2203"
        },
        "version": 1,
        "whenCreated": 1460670857630,
        "whenModified": 1460670857630
      },
      {
        "id": 759305,
        "orderQty": 17,
        "unitPrice": 95,
        "product": {
          "id": 20625,
          "sku": "F2595",
          "name": "F2595"
        },
        "version": 1,
        "whenCreated": 1460670857630,
        "whenModified": 1460670857630
      },
      {
        "id": 759306,
        "orderQty": 4,
        "unitPrice": 89,
        "product": {
          "id": 11653,
          "sku": "B1627",
          "name": "B1627"
        },
        "version": 1,
        "whenCreated": 1460670857630,
        "whenModified": 1460670857630
      },
      {
        "id": 759307,
        "orderQty": 2,
        "unitPrice": 42,
        "product": {
          "id": 2237,
          "sku": "C450",
          "name": "C450"
        },
        "version": 1,
        "whenCreated": 1460670857630,
        "whenModified": 1460670857630
      },
      {
        "id": 759308,
        "orderQty": 1,
        "unitPrice": 60,
        "product": {
          "id": 3404,
          "sku": "D726",
          "name": "D726"
        },
        "version": 1,
        "whenCreated": 1460670857630,
        "whenModified": 1460670857630
      },
      {
        "id": 759309,
        "orderQty": 18,
        "unitPrice": 80,
        "product": {
          "id": 19433,
          "sku": "F1403",
          "name": "F1403"
        },
        "version": 1,
        "whenCreated": 1460670857630,
        "whenModified": 1460670857630
      },
      {
        "id": 759310,
        "orderQty": 9,
        "unitPrice": 53,
        "product": {
          "id": 22608,
          "sku": "G2577",
          "name": "G2577"
        },
        "version": 1,
        "whenCreated": 1460670857630,
        "whenModified": 1460670857630
      }
    ],
    "version": 1,
    "whenCreated": 1460670857630,
    "whenModified": 1460670857630
  }
}