Complex filters -> Bad row estimates -> bad query plan

Started by Mathieu Fenniak over 6 years ago · 4 messages · general
#1 Mathieu Fenniak
mathieu.fenniak@replicon.com

Hi all,

I have a database query where I have a number of "simple" where clauses, a
number of "complex" subquery based where clauses, and one NOT EXISTS where
clause; it looks something like this:

SELECT ...some fields... FROM Table1
WHERE
Field1 IN (1, 2, 3, 4, 21, 24) AND -- simple filter
Field2 <> 1 AND -- simple filter
Field3 >= '2019-07-08' AND -- simple filter
Field3 <= '2019-08-18' AND -- simple filter
NOT EXISTS (SELECT 1 FROM Table2 WHERE Table2.Status = 2 AND
Table2.SomeId = Table1.Id) AND -- anti-join
COALESCE((SELECT Status FROM Table3 WHERE Table3.SomeId = Table1.Id),
(SELECT Status FROM Table4 WHERE Table4.SomeId = Table1.Id)) = 2 -- "complex" condition

The problem I'm encountering is degraded performance in some cases where
the Anti Join for the NOT EXISTS clause is planned based upon poor row
estimates for Table1. All of the other filters, and the SubPlan filter(s)
for the complex clauses, lead the query planner to estimate that only 1 row
from Table1 will result, so a Nested Loop Anti Join is used and the RHS of
that nested loop is a seqscan on Table2. In reality, many thousands of
records match all the conditions; a Merge Anti Join or Hash Anti Join would
be a better query plan.

I've tested the query planner with just the simpler conditions, and it
makes pretty reasonable estimates about the row count (+/- 10%). Adding
the NOT EXISTS results in a Merge Anti Join, and performance is great.

Adding the more "complex" conditions (there are potentially multiple of
these subquery-based searches) causes the estimated row count to drop to 1,
and performance dives.
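For reference, the estimate-vs-actual gap described above can be checked
with EXPLAIN (ANALYZE). This is a generic sketch using the filters from the
query above, not output from the thread:

```sql
-- Compare the planner's estimate ("rows=...") against the measured
-- count ("actual ... rows=...") in the plan output; a large gap on the
-- Table1 scan is what pushes the planner toward a Nested Loop Anti Join.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...some fields... FROM Table1
WHERE Field1 IN (1, 2, 3, 4, 21, 24)
  AND Field2 <> 1
  AND Field3 >= '2019-07-08'
  AND Field3 <= '2019-08-18';
```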

I know there are no "query hints" in PostgreSQL... any thoughts on
alternative approaches here? The only option I've used in the past for
this is creating specialized indexes, which can provide more targeted
statistics; but it's not applicable here since the "complex" conditions use
data from another table in a subquery.

Appreciate any thoughts, theories, or directions. :-) Thanks,

Mathieu

#2 Michael Lewis
mlewis@entrata.com
In reply to: Mathieu Fenniak (#1)
Re: Complex filters -> Bad row estimates -> bad query plan

If those conditions that are throwing off the stats are expected to be
minimally impactful/filtering few rows, then you can use the one
tried-and-true optimizer hint (aside from materialized CTEs, stylized
indexes, etc) --- OFFSET 0 at the end of a sub-query.

SELECT * FROM ( [your existing query without the sub-selects that are
complicated and produce bad estimates] OFFSET 0 ) AS sub WHERE [your other
conditions that don't produce good estimates]
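Applied to the query from the first message, a sketch might look like this
(the table and column names come from the thread; how the pieces are split
between inner and outer query is an assumption):

```sql
-- Inner query: only the filters the planner estimates well.
-- OFFSET 0 acts as an optimization fence, so the anti-join is planned
-- against the inner result's better row estimate, and the opaque
-- sub-selects are applied afterwards.
SELECT *
FROM (
    SELECT ...some fields..., Id
    FROM Table1
    WHERE Field1 IN (1, 2, 3, 4, 21, 24)
      AND Field2 <> 1
      AND Field3 >= '2019-07-08'
      AND Field3 <= '2019-08-18'
      AND NOT EXISTS (SELECT 1 FROM Table2
                      WHERE Table2.Status = 2
                        AND Table2.SomeId = Table1.Id)
    OFFSET 0
) AS t
WHERE COALESCE((SELECT Status FROM Table3 WHERE Table3.SomeId = t.Id),
               (SELECT Status FROM Table4 WHERE Table4.SomeId = t.Id)) = 2;
```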

If there is correlation between field1 and field2, you might also look at
CREATE STATISTICS assuming you are on PG 10 or 11.
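A sketch of that, assuming PG 10+ and that Field1 and Field2 really are
correlated (the statistics object name is made up):

```sql
-- Extended statistics record the functional dependency between
-- Field1 and Field2, so the planner stops multiplying their
-- selectivities as if they were independent columns.
CREATE STATISTICS table1_field1_field2 (dependencies)
    ON Field1, Field2 FROM Table1;
ANALYZE Table1;  -- populate the new statistics object
```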

Before I do any of that, I would try a LEFT JOIN to Table3 and Table4, then
use the where condition "AND 2 = COALESCE(Table3.Status, Table4.Status)"
and see if the optimizer likes that option better.
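A sketch of that rewrite, with the join keys assumed from the correlated
sub-selects in the original query:

```sql
-- Replacing the correlated sub-selects with LEFT JOINs gives the
-- planner ordinary join statistics to estimate, instead of opaque
-- SubPlan filters it pegs at near-zero selectivity.
SELECT ...some fields...
FROM Table1
LEFT JOIN Table3 ON Table3.SomeId = Table1.Id
LEFT JOIN Table4 ON Table4.SomeId = Table1.Id
WHERE Field1 IN (1, 2, 3, 4, 21, 24)
  AND Field2 <> 1
  AND Field3 >= '2019-07-08'
  AND Field3 <= '2019-08-18'
  AND NOT EXISTS (SELECT 1 FROM Table2
                  WHERE Table2.Status = 2
                    AND Table2.SomeId = Table1.Id)
  AND 2 = COALESCE(Table3.Status, Table4.Status);
```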

#3 Mathieu Fenniak
mathieu.fenniak@replicon.com
In reply to: Michael Lewis (#2)
Re: Complex filters -> Bad row estimates -> bad query plan

Thanks Michael.

I'll give some join alternatives a shot first... but, that's cool.

What about OFFSET 0 makes this approach work? I'm thinking the OFFSET 0
creates an optimization barrier that prevents the planner from collapsing
that sub-query into the top query, and enforces ordering in the query?

I appreciate your thoughts, thank-you very much for the feedback.

Mathieu


#4 Michael Lewis
mlewis@entrata.com
In reply to: Mathieu Fenniak (#3)
Re: Complex filters -> Bad row estimates -> bad query plan

-- I'm thinking the OFFSET 0 creates an optimization barrier that prevents
the planner from collapsing that sub-query into the top query, and enforces
ordering in the query?

That's my understanding. I think it is an optimizer hint by another name. I
used to put things in a CTE (which is always materialized until v12, where
it changes to inlined unless the keyword MATERIALIZED is included), or I
would create a temp table if the dataset is expected to contain many rows,
so that I can do ANALYZE pg_temp.table_table; and the optimizer has stats
to make good decisions.
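A sketch of the temp-table approach, using the filters from the thread's
query (the temp table name is illustrative):

```sql
-- Materialize the well-estimated part of the query, then ANALYZE so
-- the planner has real statistics for the remaining conditions.
CREATE TEMP TABLE filtered_rows AS
    SELECT Id, ...some fields...
    FROM Table1
    WHERE Field1 IN (1, 2, 3, 4, 21, 24)
      AND Field2 <> 1
      AND Field3 >= '2019-07-08'
      AND Field3 <= '2019-08-18';

ANALYZE pg_temp.filtered_rows;  -- gather stats on the temp table

-- Now apply the hard-to-estimate conditions against analyzed rows.
SELECT * FROM pg_temp.filtered_rows t
WHERE NOT EXISTS (SELECT 1 FROM Table2
                  WHERE Table2.Status = 2 AND Table2.SomeId = t.Id);
```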

Note- Replying to messages with a full quote of the conversation below your
comment (aka top-posting) is discouraged on these mailing lists. Please
quote the portion you are responding to and that's it.