Wednesday, December 16, 2015

MongoDB's BI Connector and pushdown of SQL WHERE clauses

[EDIT 05-Apr-2018: MongoDB BI Connector version 2+ uses a much more rich and powerful approach for pushing down SQL clauses to the database - for more info see here]

In previous posts I showed how to use SQL & ODBC to query data from MongoDB, via Windows clients and Linux clients. In this post, I want to explore what happens to the SQL statement when it is sent to the BI Connector and onto the MongoDB database. For example, is the SQL WHERE clause pushed down to the database to resolve?

Again, for these tests, I've used the same MOT UK car test results data set as my last two posts.

As I wanted to get a better insight into what MongoDB is doing under the covers to process SQL queries, I used the Mongo Shell to enable profiling for the MongoDB database holding the MOT data.

Then on my Linux desktop client, using the ODBC settings I'd configured in my last post, I fired up isql ready to start issuing queries against the MOT data set, via the BI Connector.

I submitted a SQL statement to query all cars with a recorded mileage of over 500,000 miles, selecting specific columns only.

> SELECT make, model, test_mileage FROM testresults WHERE test_mileage > 500000;

The results were correctly returned in isql, but I was more interested to see what MongoDB was asked to do on the server-side, to fulfil this request. So using the Mongo Shell I queried the system profile collection to show the last recorded entry, displaying the exact request that MongoDB had received as a result of the translated SQL query.

> db.system.profile.find({ns: "mot:testresults"}).sort({$natural: -1}).limit(1).pretty()

As you can see in the output, the BI Connector has indeed pushed down to the database, the WHERE clause, plus the projection to return only specific fields. The profiler output shows that this has been achieved by the BI Connector, by assembling an Aggregation Pipeline.

This is great to see. Most of the work to process the SQL query is being done at the database level, reducing the amount of data (less rows and less columns) that is returned to the ODBC client for final processing, and also enabling database indexes to be leveraged for maximum performance.

Song for today: End Come Too Soon by Wild Beasts

1 comment:

Unknown said...

For a more thorough writeup on the pushdown capabilities of the BI connector, see this blog post, which explains exactly what sort of WHERE clauses can be pushed into MongoDB with the BI connector.

The reason this works in your example is because the WHERE clause is purely a conjunction of relational predicates on untransformed source fields. Use of a disjunction or a predicate on any computed expression would result in zero pushdown.