Why PG uses nested-loop join when no indexes are available?

Started by David Grelaudabout 10 years ago7 messagesgeneral

dgrelaud@ideolys.com

about 10 years ago

Hi,

Statistics are not propagated when Common Table Expressions (CTE) are used.
The cardinality of a CTE is equal to 1 most of the time so every joins with
previously computed CTEs are done with the nested-loop algorithm.
This seems to be really a risky choice, even without CTEs, according to
this paper and our own experiments:

"How good are query optimizers, really?." Proceedings of the VLDB Endowment
9.3 (2015): 204-215. (Paragraph 4.1) Leis, Viktor, et al.
http://www.vldb.org/pvldb/vol9/p204-leis.pdf

There are interesting discussions on the web about CTEs and bad
performances:

-
/messages/by-id/COL116-W25F8931477407ED7689D69A3F40@phx.gbl
- http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/
- ...

So when the problem happens (underestimation costs -> nested-loop -> many
rows -> bad performance guarantee), we have currently these solutions:

- refactor the query using Subquery Expressions instead of CTEs but the
query looks really ugly to read (increasing maintenance cost), and we may
loose some other execution plan optimisations provided by CTEs
- refactor the query using temporary table but it becomes impossible to use
single-query prepared statement
- disable nested loop but PostgreSQL does not use Indexes anymore when
available
- use an extension to enable Oracle-style hints (
https://fr.osdn.jp/projects/pghintplan/) but the system becomes blindness
(not data-dependent, potential futures algorithms never used, ...)
- is there another existing solution I'm not aware of?

I'm sure PostgreSQL could provide a better solution to solve this problem.

1) Would it be easy to compute and propagate statistics of CTEs, like
subqueries?

The argument usually returned by the PostgreSQL community is:
"CTEs have been created to let the developer control the execution plan, so
the statistics computation is virtually disabled"
Ok, the developer can control roughly the execution plan but in the same
time, Postgres becomes "stupid" inside each CTEs and chooses always the
"same" join algorithm (the riskiest) to join previously computed CTEs.
It is like giving to somebody the power to fly, while removing his eyes ;).

Drawbacks: even if statistics are computed and propagated across CTEs, and
if queries are really complex, the cost estimator may fail to compute
cardinality and the problem of nested-loop joins still happens.

2) Would it be possible to let the developer inject cardinality hints in
the query?

As suggested by this paper:
"Query optimizers: time to rethink the contract?."" In : Proceedings of the
2009 ACM SIGMOD International Conference on Management of data. ACM, 2009.
p. 961-968. CHAUDHURI, Surajit.
http://courses.cs.washington.edu/courses/csep544/10au/readings/p961-chaudhuri.pdf

The SQL developer could for example inject cardinality in a comment
"my_cte:100000". The developer is responsible to update this cardinality
with its own metrics and tools.
Thus, the cardinality is still data-dependent (not constant Ad vitam
æternam) and the planner is able to choose the best join algorithm
according to all other parameters (system IO...).

3) Always avoid nested-loop join when no indexes are available?

Tom Lane said "There might be some cases where this would help, but there
would be many more where it would be useless or counterproductive."
Who is right between Tom Lane and the Leis Viktor's paper above?

We tried to disable nested_loop all the time in a production environment
and we observed an overall improvement in all queries where Indexes are not
useful or not available (CTEs), which confirms the paper.
In fact, one of our production environment is still running with
"nested_loop off" because benefits are a lot greater than drawbacks as long
as some tables are relatively small (Indexes not used).

4) Do runtime optimizations?

According to research papers, this will be the next challenge. But I think
it is difficult to implement it in a relatively short-term period?

What is the purpose of this message:

We would like to find a "simple" long-term solution to this
under-estimation cost problem, which generate hazarduous performance
regressions in our production environments.

We would like to hear critiques or other solutions from PostgreSQL experts.

We would like to help developing and testing the solution.

Thank you very much!

Regards,
-------------------
David Grelaud,
Ideolys.