Monday, May 05, 2014

Materialized Subquery Optimization

Greatest N per group is a hard problem. Everytime I comes up in my project at work (essentially a custom customer relation management system built in CodeIgniter) I end up wrestling with trying to do things the way that feels right but MySQL ain't havin' it. I usually end up having better luck saying to Hell with it and just rolling a Select N + 1 antipattern. Shipped beats perfect. I can beat myself up about sucking at database design later. At the moment I have problems to solve.

But today I discovered something interesting. I need to export the most recent values per account out of a couple different tables. Classic greatest N per group problem. I write a fairly ugly set of nested queries with the necessary ORDER BY and GROUP BY tomfoolery to get the behavior I want out of MySQL. I run it here on my local dev instance. It executes in a little under 4 minutes. Far from great, but for what should be a one-off across 200,000 accounts it's something I can live with. I switch over to production. Wait way longer than 4 seconds. As I approach a full minute I halt the query and scratch my head. I throw an EXPLAIN in front of the query in both environments.

The first thing I notice is while my local environment produces the query execution plan almost instantly (under 1ms) over in production the explain statement takes over 10 full seconds to run. Then I notice the output is also quite different. Locally I see the select_type MATERIALIZED. I'm familiar with materialized views even though MySQL doesn't support them. (At least not in the same way systems like PostGRES or Oracle do out of the box.) And I figured this difference in the output of the associated execution plans was an important clue.

A few weeks ago my early 2011 Macbook Pro gave up the ghost. So I'm running on new hardware. MySQL doesn't ship as part of MacOS so when I set up my stuff I (perhaps naively) just rolled with the most recent version. Let's just say our production server isn't quite so up to date.

The difference in query optimization between 5.5 and 5.6 is significant. Maybe I don't suck as database design so much afterall. Maybe I've just been wrestling with the query optimizer. (I probably still suck, just — ya know — less.)

The choice quote that is most relevant to the issues I was wrestling with today:

With Subquery Materialization, a subquery is executed once and the result is stored. In MySQL 5.5, the subquery would instead be executed for each row in the outer query. Our tests show that with a scale factor of 10, DBT-3 query 18 execution time drops from months to 68 seconds.

It's possible that my painfully slow but livable ~4 minute query in 5.6 would take months or even a year or more to run under 5.5. That's insane.