Strange influence of default_statistics_target

Started by Вадим Акбашев · about 9 years ago · 5 messages · pgsql-bugs

Hello!
I have encountered a problem with query plan building.
I set default_statistics_target = 700 and ran ANALYZE. The Postgres optimizer
chose a plan with a hash join, and the query took about a minute to complete.
Then I set default_statistics_target = 500; the plan changed significantly and
used a merge join instead. Completion time dropped by a factor of several
hundred, and the estimated cost dropped drastically.
Now I can't understand why more precise statistics lead to a less optimized
plan, and what the right way to use the default_statistics_target parameter is.
I attach both the good and bad query plans and the query itself.
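In SQL, the sequence described above was roughly the following (a sketch; note that SET here is session-local, so the ANALYZE has to run in the same session):

```sql
-- Collect richer per-column statistics (more histogram buckets and
-- most-common-value entries; the built-in default target is 100).
SET default_statistics_target = 700;
ANALYZE;  -- re-samples all tables in the current database

-- Later, with a lower target:
SET default_statistics_target = 500;
ANALYZE;
```

To persist the value across sessions it has to be set in postgresql.conf (or via ALTER SYSTEM) instead.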

Attachments:

dst500.txt (text/plain; charset=US-ASCII)
dst700.txt (text/plain; charset=US-ASCII)
querry.sql (application/sql)
#2 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Вадим Акбашев (#1)
Re: Strange influence of default_statistics_target

Вадим Акбашев <ufaowl@gmail.com> writes:
I have encountered a problem with query plan building.
I set default_statistics_target = 700 and ran ANALYZE. The Postgres optimizer
chose a plan with a hash join, and the query took about a minute to complete.
Then I set default_statistics_target = 500; the plan changed significantly and
used a merge join instead. Completion time dropped by a factor of several
hundred, and the estimated cost dropped drastically.
Now I can't understand why more precise statistics lead to a less optimized
plan, and what the right way to use the default_statistics_target parameter is.
I attach both the good and bad query plans and the query itself.

Are those really the same query? Plan 2 is enforcing a "number_value IS
NOT NULL" condition on "attribute_value av1" that I don't see in plan 1.
And neither plan seems to have much to do with the query, since the
query has UNIONs that aren't in the plans.

But the short answer seems to be that in both cases, the only reason that
the plan doesn't take forever to run is that one sub-join chances to yield
precisely zero rows, and the PG executor happens to be more efficient
about that corner case in the one plan shape than the other. The planner
doesn't take the possibility of that short-circuit happening into account,
since it generally cannot be sure that a sub-join wouldn't yield any rows.
So it's just luck that one plan is noticeably faster in this case.

regards, tom lane
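As for the right way to use the parameter: one common alternative to raising the global default is to raise the target only for the columns whose estimates are off. A sketch, using the attribute_value.number_value column mentioned above:

```sql
-- Override the statistics target for a single column, leaving
-- default_statistics_target alone for everything else.
ALTER TABLE attribute_value ALTER COLUMN number_value SET STATISTICS 700;
ANALYZE attribute_value;

-- Setting the target to -1 makes the column follow
-- default_statistics_target again.
ALTER TABLE attribute_value ALTER COLUMN number_value SET STATISTICS -1;
ANALYZE attribute_value;
```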

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: Tom Lane (#2)
Re: Strange influence of default_statistics_target

Hi, Tom!
Thank you for your answer.
Those plans were indeed for the query I sent, but they were not complete, as
I sent only the parts that, as I thought, contained the problem. In this
letter I attach the full plans and also the statistics for the tables the
query works with, for default_statistics_target = 700 and for
default_statistics_target = 500.
This query runs much faster in our test environment, which is 75% the size
of our production server.
Also, I've noticed that the plan becomes "bad" 9-10 hours after it was
fixed, and if I then run ANALYZE again without changing effective_cache_size
in postgresql.conf it remains "bad".

Thank you in advance
Vadim
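The 9-10 hour pattern would be consistent with autovacuum's autoanalyze re-sampling the tables and replacing the statistics that the manual ANALYZE produced. Whether that is what happens here can be checked (a sketch) with:

```sql
-- Show when each table's statistics were last refreshed, manually
-- (last_analyze) and by the autovacuum daemon (last_autoanalyze).
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autoanalyze DESC NULLS LAST;
```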


In reply to: Tom Lane (#2)
Re: Strange influence of default_statistics_target

Sorry, I forgot to attach the files themselves.


Attachments:

stat_700 (application/octet-stream)
plan_500 (application/octet-stream)
plan_700 (application/octet-stream)
query (application/octet-stream)
stat_500 (application/octet-stream)
In reply to: Вадим Акбашев (#4)
Re: Strange influence of default_statistics_target

Hello.
Is the information I've sent sufficient?
Thank you.
