Optimization of range queries

Started by Konstantin Knizhnikabout 8 years ago3 messageshackers
Jump to latest
#1Konstantin Knizhnik
k.knizhnik@postgrespro.ru

Hi hackers,

Postgres optimizer is not able to build efficient execution plan for the
following query:

explain select * from  people_raw where not ("ID"<2068113880 AND "INN"
is not null) and "ID"<=2068629726 AND "INN" is not null;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Bitmap Heap Scan on people_raw  (cost=74937803.72..210449640.49
rows=121521030 width=336)
   Recheck Cond: ("ID" <= 2068629726)
   Filter: (("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN" IS
NULL)))
   ->  Bitmap Index Scan on "People_pkey" (cost=0.00..74907423.47
rows=2077021718 width=0)
         Index Cond: ("ID" <= 2068629726)
(5 rows)

Here the table is very large, but query effects only relatively small
number of rows located in the range: [2068113880,2068629726]
But unfortunately optimizer doesn't take it into the account.
Moreover, using "is not null" and "not null" is both operands of AND is
not smart:
     (("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN" IS NULL)))

If I remove "is not null" condition, then plan is perfect:

explain select * from  people_raw where not ("ID"<2068113880) and
"ID"<=2068629726;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Index Scan using "People_pkey" on people_raw  (cost=0.58..196745.57
rows=586160 width=336)
   Index Cond: (("ID" >= 2068113880) AND ("ID" <= 2068629726))
(2 rows)

Before starting  investigation of the problem, I will like to know
opinion and may be some advise of people familiar with optimizer:
how difficult will be to handle this case and where to look.

Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#2Teodor Sigaev
teodor@sigaev.ru
In reply to: Konstantin Knizhnik (#1)
Re: Optimization of range queries

Hi!

12 years ago I proposed patch to which could "union" OR clauses into one
range clause if it's possible. In that time pgsql could not use IS NULL
as index clause, so patch doesn't support that

/messages/by-id/45742C51.9020602@sigaev.ru

option number 4), all other are already committed.

Konstantin Knizhnik wrote:

Hi hackers,

Postgres optimizer is not able to build efficient execution plan for the
following query:

explain select * fromО©╫ people_raw where not ("ID"<2068113880 AND "INN"
is not null) and "ID"<=2068629726 AND "INN" is not null;
О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫ QUERY PLAN
--------------------------------------------------------------------------------------------

О©╫Bitmap Heap Scan on people_rawО©╫ (cost=74937803.72..210449640.49
rows=121521030 width=336)
О©╫О©╫ Recheck Cond: ("ID" <= 2068629726)
О©╫О©╫ Filter: (("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN" IS
NULL)))
О©╫О©╫ ->О©╫ Bitmap Index Scan on "People_pkey" (cost=0.00..74907423.47
rows=2077021718 width=0)
О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫ Index Cond: ("ID" <= 2068629726)
(5 rows)

Here the table is very large, but query effects only relatively small
number of rows located in the range: [2068113880,2068629726]
But unfortunately optimizer doesn't take it into the account.
Moreover, using "is not null" and "not null" is both operands of AND is
not smart:
О©╫О©╫О©╫О©╫ (("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN" IS NULL)))

If I remove "is not null" condition, then plan is perfect:

explain select * fromО©╫ people_raw where not ("ID"<2068113880) and
"ID"<=2068629726;
О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫ QUERY PLAN
--------------------------------------------------------------------------------------------

О©╫Index Scan using "People_pkey" on people_rawО©╫ (cost=0.58..196745.57
rows=586160 width=336)
О©╫О©╫ Index Cond: (("ID" >= 2068113880) AND ("ID" <= 2068629726))
(2 rows)

Before startingО©╫ investigation of the problem, I will like to know
opinion and may be some advise of people familiar with optimizer:
how difficult will be to handle this case and where to look.

Thanks in advance,

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#3Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Teodor Sigaev (#2)
Re: Optimization of range queries

On 09.04.2018 20:05, Teodor Sigaev wrote:

Hi!

12 years ago I proposed patch to which could "union" OR clauses into
one range clause if it's possible. In that time pgsql could not use IS
NULL as index clause, so patch doesn't support that

/messages/by-id/45742C51.9020602@sigaev.ru

option number 4), all other are already committed.

It seems to be slightly different optimization.
Attached please find small patch which extends simplify_and_arguments in
clauses.c to eliminated redundant checks.
It doesn't perform complete constrains propagation and not using
predicate_implied_by/predicate_refuted_by because them seems to be too
expensive and essentially increase
query optimization time. Instead of it it just strict match comparison
of predicates with some extra logic for handling negators.

With this patch constructed query plans are optimal:

postgres=# create table foo(x integer primary key, y integer);
CREATE TABLE
postgres=# insert into foo (x) values (generate_series(1,100000));
INSERT 0 100000
postgres=# insert into foo (x,y) values (generate_series(100001,200000), 1);
INSERT 0 100000
postgres=# vacuum analyze foo;
VACUUM
postgres=# explain select * from foo where not (x < 99999 and y is not
null) and (x <= 100001 and y is not null);
О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫ QUERY PLAN
--------------------------------------------------------------------
О©╫Index Scan using foo_pkey on fooО©╫ (cost=0.42..8.48 rows=2 width=8)
О©╫О©╫ Index Cond: ((x <= 100001) AND (x >= 99999))
О©╫О©╫ Filter: (y IS NOT NULL)
(3 rows)

postgres=# explain select * from foo whereО©╫ x <= 100001 and y is not
null and not (x < 99999 and y is not null);
О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫ QUERY PLAN
--------------------------------------------------------------------
О©╫Index Scan using foo_pkey on fooО©╫ (cost=0.42..8.48 rows=2 width=8)
О©╫О©╫ Index Cond: ((x <= 100001) AND (x >= 99999))
О©╫О©╫ Filter: (y IS NOT NULL)
(3 rows)

Konstantin Knizhnik wrote:

Hi hackers,

Postgres optimizer is not able to build efficient execution plan for
the following query:

explain select * fromО©╫ people_raw where not ("ID"<2068113880 AND
"INN" is not null) and "ID"<=2068629726 AND "INN" is not null;
О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫ QUERY PLAN
--------------------------------------------------------------------------------------------

О©╫О©╫Bitmap Heap Scan on people_raw (cost=74937803.72..210449640.49
rows=121521030 width=336)
О©╫О©╫О©╫ Recheck Cond: ("ID" <= 2068629726)
О©╫О©╫О©╫ Filter: (("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN"
IS NULL)))
О©╫О©╫О©╫ ->О©╫ Bitmap Index Scan on "People_pkey" (cost=0.00..74907423.47
rows=2077021718 width=0)
О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫ Index Cond: ("ID" <= 2068629726)
(5 rows)

Here the table is very large, but query effects only relatively small
number of rows located in the range: [2068113880,2068629726]
But unfortunately optimizer doesn't take it into the account.
Moreover, using "is not null" and "not null" is both operands of AND
is not smart:
О©╫О©╫О©╫О©╫О©╫ (("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN" IS
NULL)))

If I remove "is not null" condition, then plan is perfect:

explain select * fromО©╫ people_raw where not ("ID"<2068113880) and
"ID"<=2068629726;
О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫О©╫ QUERY PLAN
--------------------------------------------------------------------------------------------

О©╫О©╫Index Scan using "People_pkey" on people_raw (cost=0.58..196745.57
rows=586160 width=336)
О©╫О©╫О©╫ Index Cond: (("ID" >= 2068113880) AND ("ID" <= 2068629726))
(2 rows)

Before startingО©╫ investigation of the problem, I will like to know
opinion and may be some advise of people familiar with optimizer:
how difficult will be to handle this case and where to look.

Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

redundancy-elimination.patchtext/x-patch; name=redundancy-elimination.patchDownload+146-5