Secondary index access optimizations
Hi hackers,
I am trying to compare different ways of optimizing work with huge
append-only tables in PostgreSQL, where the primary key is something like
a timestamp and queries usually access the most recent data through some
secondary keys. The size of a secondary index is one of the most critical
factors limiting insert/search performance. Since data is inserted
in timestamp-ascending order, access to the primary key is well localized
and the accessed pages are present in memory. But if we create a secondary
index over the whole table, access to it will require random reads
from disk and significantly decrease performance.
There are two well-known solutions to this problem:
1. Table partitioning
2. Partial indexes
These are the approaches I want to compare. First of all, I want to check
whether the optimizer is able to generate an efficient query execution plan
covering different time intervals.
Unfortunately, in both cases the generated plan is not optimal.
1. Table partitioning:
create table base (k integer primary key, v integer);
create table part1 (check (k between 1 and 10000)) inherits (base);
create table part2 (check (k between 10001 and 20000)) inherits (base);
create index pi1 on part1(v);
create index pi2 on part2(v);
insert into part1 values (generate_series(1,10000), random());
insert into part2 values (generate_series(10001,20000), random());
explain select * from base where k between 1 and 20000 and v = 100;
QUERY PLAN
-----------------------------------------------------------------------
Append (cost=0.00..15.65 rows=3 width=8)
-> Seq Scan on base (cost=0.00..0.00 rows=1 width=8)
Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
-> Index Scan using pi1 on part1 (cost=0.29..8.31 rows=1 width=8)
Index Cond: (v = 100)
Filter: ((k >= 1) AND (k <= 20000))
-> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8)
Index Cond: (v = 100)
Filter: ((k >= 1) AND (k <= 20000))
Questions:
- Is there some way to avoid the sequential scan of the parent table? Yes,
it is empty, so the sequential scan will not take much time, but it still
requires some additional work and so increases query execution time.
- Why does the index scan of the partition indexes include a filter
condition when the check constraint guarantees that all records of the
partition match the search predicate?
2. Partial indexes:
create table t (k integer primary key, v integer);
insert into t values (generate_series(1,20000),random());
create index i1 on t(v) where k between 1 and 10000;
create index i2 on t(v) where k between 10001 and 20000;
postgres=# explain select * from t where k between 1 and 10000 and v = 100;
QUERY PLAN
------------------------------------------------------------
Index Scan using i1 on t (cost=0.29..7.28 rows=1 width=8)
Index Cond: (v = 100)
(2 rows)
Here we get a perfect plan. Let's try to extend the search interval:
postgres=# explain select * from t where k between 1 and 20000 and v = 100;
QUERY PLAN
------------------------------------------------------------------
Index Scan using t_pkey on t (cost=0.29..760.43 rows=1 width=8)
Index Cond: ((k >= 1) AND (k <= 20000))
Filter: (v = 100)
(3 rows)
Unfortunately, in this case Postgres is not able to apply the partial indexes.
And this is what I expected to get:
postgres=# explain select * from t where k between 1 and 10000 and v =
100 union all select * from t where k between 10001 and 20000 and v = 100;
QUERY PLAN
----------------------------------------------------------------------
Append (cost=0.29..14.58 rows=2 width=8)
-> Index Scan using i1 on t (cost=0.29..7.28 rows=1 width=8)
Index Cond: (v = 100)
-> Index Scan using i2 on t t_1 (cost=0.29..7.28 rows=1 width=8)
Index Cond: (v = 100)
I wonder if there are some fundamental problems with supporting these two
things in the optimizer:
1. Remove a search condition on the primary key if it is fully implied by
the derived table's check constraint.
2. Append index scans of several partial indexes if the specified interval
is covered by their predicates.
I wonder if someone who is familiar with this part of the optimizer can
easily fix it.
Otherwise I am going to spend some time on solving these problems (if the
community thinks that such optimizations would be useful).
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 14.08.2017 12:37, Konstantin Knizhnik wrote:
Replying to myself: the following small patch removes redundant checks
from index scans for derived tables:
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 939045d..1f7c9cf 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1441,6 +1441,20 @@ relation_excluded_by_constraints(PlannerInfo *root,
 	if (predicate_refuted_by(safe_constraints,
 							 rel->baserestrictinfo, false))
 		return true;
+	/*
+	 * Remove from restriction list items implied by table constraints
+	 */
+	safe_restrictions = NULL;
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (!predicate_implied_by(list_make1(rinfo->clause),
+								  safe_constraints, false)) {
+			safe_restrictions = lappend(safe_restrictions, rinfo);
+		}
+	}
+	rel->baserestrictinfo = safe_restrictions;
+
 	return false;
 }
=================================================
I am not sure whether this is the right place to adjust the restriction
list, or whether such a transformation is always correct.
But for the queries I have tested, the produced plans are correct:
postgres=# explain select * from base where k between 1 and 20000 and v
= 100;
QUERY PLAN
-----------------------------------------------------------------------
Append (cost=0.00..15.64 rows=3 width=8)
-> Seq Scan on base (cost=0.00..0.00 rows=1 width=8)
Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
-> Index Scan using pi1 on part1 (cost=0.29..8.30 rows=1 width=8)
Index Cond: (v = 100)
-> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8)
Index Cond: (v = 100)
(7 rows)
postgres=# explain select * from base where k between 1 and 15000 and v
= 100;
QUERY PLAN
-----------------------------------------------------------------------
Append (cost=0.00..15.64 rows=3 width=8)
-> Seq Scan on base (cost=0.00..0.00 rows=1 width=8)
Filter: ((k >= 1) AND (k <= 15000) AND (v = 100))
-> Index Scan using pi1 on part1 (cost=0.29..8.30 rows=1 width=8)
Index Cond: (v = 100)
-> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8)
Index Cond: (v = 100)
Filter: (k <= 15000)
(8 rows)
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 14.08.2017 19:33, Konstantin Knizhnik wrote:
There is one more problem, with the predicate_implied_by function and
standard Postgres partitions.
Right now PostgreSQL specifies constraints for range partitions using
intervals with an open upper boundary:
postgres=# create table bt (k integer, v integer) partition by range (k);
postgres=# create table dt1 partition of bt for values from (1) to (10001);
postgres=# create table dt2 partition of bt for values from (10001) to
(20001);
postgres=# create index dti1 on dt1(v);
postgres=# create index dti2 on dt2(v);
postgres=# insert into bt values (generate_series(1,20000), random());
postgres=# \d+ dt2
                                  Table "public.dt2"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 k      | integer |           |          |         | plain   |              |
 v      | integer |           |          |         | plain   |              |
Partition of: bt FOR VALUES FROM (10001) TO (20001)
Partition constraint: ((k IS NOT NULL) AND (k >= 10001) AND (k < 20001))
Indexes:
    "dti2" btree (v)
If I now run the query with the predicate "between 1 and 20000", I still
see an extra check in the plan:
postgres=# explain select * from bt where k between 1 and 20000 and v = 100;
QUERY PLAN
----------------------------------------------------------------------
Append (cost=0.29..15.63 rows=2 width=8)
-> Index Scan using dti1 on dt1 (cost=0.29..8.30 rows=1 width=8)
Index Cond: (v = 100)
-> Index Scan using dti2 on dt2 (cost=0.29..7.33 rows=1 width=8)
Index Cond: (v = 100)
Filter: (k <= 20000)
(6 rows)
This is because operator_predicate_proof is not able to understand that
k < 20001 and k <= 20000 are equivalent for the integer type.
If I rewrite the query as (k >= 1 and k < 20001), the plan is optimal:
postgres=# explain select * from bt where k >= 1 and k < 20001 and v = 100;
QUERY PLAN
----------------------------------------------------------------------
Append (cost=0.29..15.63 rows=2 width=8)
-> Index Scan using dti1 on dt1 (cost=0.29..8.30 rows=1 width=8)
Index Cond: (v = 100)
-> Index Scan using dti2 on dt2 (cost=0.29..7.33 rows=1 width=8)
Index Cond: (v = 100)
(5 rows)
It is fixed by one more patch, to predtest.c:
diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index 06fce84..c318d00 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "catalog/pg_operator.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
@@ -1407,6 +1408,11 @@ static const StrategyNumber BT_refute_table[6][6] = {
 	{none, none, BTEQ, none, none, none}	/* NE */
 };
+#define Int2LessOperator 95
+#define Int2LessOrEqualOperator 522
+#define Int4LessOrEqualOperator 523
+#define Int8LessOrEqualOperator 414
+
 /*
  * operator_predicate_proof
 	if (clause_const->constisnull)
 		return false;
+	if (!refute_it
+		&& ((pred_op == Int4LessOrEqualOperator && clause_op == Int4LessOperator)
+			|| (pred_op == Int8LessOrEqualOperator && clause_op == Int8LessOperator)
+			|| (pred_op == Int2LessOrEqualOperator && clause_op == Int2LessOperator))
+		&& pred_const->constbyval && clause_const->constbyval
+		&& pred_const->constvalue + 1 == clause_const->constvalue)
+	{
+		return true;
+	}
+
 	/*
 	 * Lookup the constant-comparison operator using the system catalogs and
 	 * the operator implication tables.
===============================
Now the plan is perfect once again:
postgres=# explain select * from bt where k between 1 and 20000 and v = 100;
QUERY PLAN
----------------------------------------------------------------------
Append (cost=0.29..15.63 rows=2 width=8)
-> Index Scan using dti1 on dt1 (cost=0.29..8.30 rows=1 width=8)
Index Cond: (v = 100)
-> Index Scan using dti2 on dt2 (cost=0.29..7.33 rows=1 width=8)
Index Cond: (v = 100)
(5 rows)
postgres=# explain select * from bt where k between 1 and 15000 and v = 100;
QUERY PLAN
----------------------------------------------------------------------
Append (cost=0.29..15.63 rows=2 width=8)
-> Index Scan using dti1 on dt1 (cost=0.29..8.30 rows=1 width=8)
Index Cond: (v = 100)
-> Index Scan using dti2 on dt2 (cost=0.29..7.33 rows=1 width=8)
Index Cond: (v = 100)
Filter: (k <= 15000)
(6 rows)
I am not sure that the proposed approach is general enough, but I do not
know how to implement it in a more elegant way.
The composed patch is attached to this mail.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
optimizer.patch (text/x-patch, +30 -0)
On Wed, Aug 16, 2017 at 9:23 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
postgres=# explain select * from bt where k between 1 and 20000 and v = 100;
QUERY PLAN
----------------------------------------------------------------------
Append (cost=0.29..15.63 rows=2 width=8)
-> Index Scan using dti1 on dt1 (cost=0.29..8.30 rows=1 width=8)
Index Cond: (v = 100)
-> Index Scan using dti2 on dt2 (cost=0.29..7.33 rows=1 width=8)
Index Cond: (v = 100)
Filter: (k <= 20000)
(6 rows)
+1
This seems like a good feature to me: filtering stuff that is
obviously true is a waste of CPU cycles and may even require people to
add redundant stuff to indexes. I was pondering something related to
this over in the partition-wise join thread (join quals that are
implied by partition constraints and should be discarded).
It'd be interesting to get Amit Langote's feedback, so I CC'd him.
I'd be surprised if he and others haven't got a plan or a patch for
this down the back of the sofa.
I might be missing some higher level architectural problems with the
patch, but for what it's worth here's some feedback after a first read
through:
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1441,6 +1441,20 @@ relation_excluded_by_constraints(PlannerInfo *root,
if (predicate_refuted_by(safe_constraints, rel->baserestrictinfo, false))
return true;
+ /*
+ * Remove from restrictions list items implied by table constraints
+ */
+ safe_restrictions = NULL;
+ foreach(lc, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
I think the new way to write this is "RestrictInfo *rinfo =
lfirst_node(RestrictInfo, lc)".
+ if (!predicate_implied_by(list_make1(rinfo->clause),
safe_constraints, false)) {
+ safe_restrictions = lappend(safe_restrictions, rinfo);
+ }
+ }
+ rel->baserestrictinfo = safe_restrictions;
It feels wrong to modify rel->baserestrictinfo in
relation_excluded_by_constraints(). I think there should probably be
a function with a name that more clearly indicates that it mutates its
input, and you should call that separately after
relation_excluded_by_constraints(). Something like
remove_restrictions_implied_by_constraints()?
It is because operator_predicate_proof is not able to understand that k <
20001 and k <= 20000 are equivalent for integer type.[...]
/*
 * operator_predicate_proof
	if (clause_const->constisnull)
		return false;
+	if (!refute_it
+		&& ((pred_op == Int4LessOrEqualOperator && clause_op == Int4LessOperator)
+			|| (pred_op == Int8LessOrEqualOperator && clause_op == Int8LessOperator)
+			|| (pred_op == Int2LessOrEqualOperator && clause_op == Int2LessOperator))
+		&& pred_const->constbyval && clause_const->constbyval
+		&& pred_const->constvalue + 1 == clause_const->constvalue)
+	{
+		return true;
+	}
+
I'm less sure about this part. It seems like a slippery slope.
A couple of regression test failures:
inherit ... FAILED
rowsecurity ... FAILED
========================
2 of 179 tests failed.
========================
I didn't try to understand the rowsecurity one, but at first glance
all the differences reported in the inherit test are in fact cases
where your patch is doing the right thing and removing redundant
filters from scans. Nice!
--
Thomas Munro
http://www.enterprisedb.com
On 09/02/2017 06:44 AM, Thomas Munro wrote:
Thank you for the review.
I attached a new version of the patch, with a remove_restrictions_implied_by_constraints() function.
Concerning the failed tests: this is actually a result of the optimization itself; the extra filter conditions are removed from the query plans.
Sorry, I had not included the updated versions of the expected test output files in the patch.
Now I have.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
optimizer-2.patch (text/x-diff, +99 -51)
On 2017/09/02 12:44, Thomas Munro wrote:
I agree that that's a good optimization in the cases where it's correct. Given
that check_index_predicates() already applies the same optimization when
considering using a partial index, it might make sense to try to do the
same even earlier for the table itself using its CHECK / NOT NULL
constraints as predicates (I said *earlier* because
relation_excluded_by_constraints happens for a relation before we look at
its indexes). Also, the end of relation_excluded_by_constraints() may
not be such a bad place to do this.
By the way, I read in check_index_predicates() that we should not apply
this optimization if the relation in question is a target of UPDATE /
DELETE / SELECT FOR UPDATE.
Thanks,
Amit
On 04.09.2017 05:38, Amit Langote wrote:
I agree that that's a good optimization in the cases where it's correct. Given
that check_index_predicates() already applies the same optimization when
considering using a partial index, it might make sense to try to do the
same even earlier for the table itself using its CHECK / NOT NULL
constraints as predicates (I said *earlier* because
relation_excluded_by_constraints happens for a relation before we look at
its indexes). Also, the end of relation_excluded_by_constraints() may
not be such a bad place to do this.
By the way, I read in check_index_predicates() that we should not apply
this optimization if the relation in question is a target of UPDATE /
DELETE / SELECT FOR UPDATE.
Please correct me if I am wrong, but it seems to me that in the case of table
constraints it is not necessary to specially handle the update case.
As far as I understand, we need to leave the predicate in the plan in the case
of partial indexes because, due to the "read committed" isolation policy,
we may need to recheck that the tuple still satisfies the update condition
(the tuple can be changed by some other committed transaction while we are
waiting for it, and may no longer satisfy this condition).
But no transaction can change a tuple in such a way that it violates table
constraints, right? So we do not need to recheck it.
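To illustrate the partial-index case, here is a minimal sketch (the table and
index names are hypothetical, not taken from the patch):

```sql
create table orders (id integer primary key, status text);
create index orders_pending_idx on orders(id) where status = 'pending';

-- The plan may use orders_pending_idx, whose predicate already implies
-- status = 'pending'.  Still, under READ COMMITTED another committed
-- transaction may change status while this UPDATE waits for the row lock,
-- so the executor must re-evaluate the condition (EvalPlanQual) rather
-- than trust the index predicate.
update orders set status = 'shipped'
where status = 'pending' and id = 42;
```

A CHECK constraint, by contrast, can never become false for a live tuple, so a
predicate implied by it would need no such recheck.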
Concerning your suggestion to merge the check_index_predicates() and
remove_restrictions_implied_by_constraints() functions: maybe it can be
done, but frankly speaking I do not see much sense in it - there are too
many differences between these functions and too little code to reuse.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi Konstantin,
On 2017/09/04 18:19, Konstantin Knizhnik wrote:
[...]
Please correct me if I am wrong, but it seems to me that in the case of table
constraints it is not necessary to specially handle the update case.
As far as I understand, we need to leave the predicate in the plan in the case
of partial indexes because, due to the "read committed" isolation policy,
we may need to recheck that the tuple still satisfies the update condition
(the tuple can be changed by some other committed transaction while we are
waiting for it, and may no longer satisfy this condition).
But no transaction can change a tuple in such a way that it violates table
constraints, right? So we do not need to recheck it.
Actually, I don't really know why check_index_predicates() skips this
optimization in the target relation case, just wanted to point out that
that's so.
Thinking a bit from what you wrote, maybe we need not worry about
EvalPlanQual in the context of your proposed optimization based on the
table's constraints.
Concerning your suggestion to merge check_index_predicates() and
remove_restrictions_implied_by_constraints() functions: may be it can be
done, but frankly speaking I do not see much sense in it - there are too
much differences between this functions and too few code reusing.
Maybe, you meant to address Thomas here. :) Reading his comment again, I
too am a bit concerned about destructively modifying the input rel's
baserestrictinfo. There should at least be a comment that that's being done.
Thanks,
Amit
On 04.09.2017 12:59, Amit Langote wrote:
Hi Konstantin,
[...]
Maybe, you meant to address Thomas here. :) Reading his comment again, I
too am a bit concerned about destructively modifying the input rel's
baserestrictinfo. There should at least be a comment that that's being done.
I have taken Thomas's comment into account and extracted the code updating
the relation's baserestrictinfo from relation_excluded_by_constraints()
into the remove_restrictions_implied_by_constraints() function. It is
included in the new version of the patch.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2017/09/04 20:46, Konstantin Knizhnik wrote:
[...]
I have taken Thomas's comment into account and extracted the code updating
the relation's baserestrictinfo from relation_excluded_by_constraints()
into the remove_restrictions_implied_by_constraints() function. It is
included in the new version of the patch.
Sorry, I hadn't noticed the new patch.
I was confused because I didn't suggest "to merge check_index_predicates()
and remove_restrictions_implied_by_constraints() functions". Perhaps, the
wording in my previous message wasn't clear.
Looking at the new patch, the changes regarding
remove_restrictions_implied_by_constraints() look fine.
Like Thomas, I'm not so sure about the whole predtest.c patch. The core
logic in operator_predicate_proof() should be able to conclude that, say,
k < 21 is implied by k <= 20, which you are trying to address with some
special-case code. If there is still a problem you think needs to be fixed
here, a better place to look would be somewhere around get_btree_test_op().
Thanks,
Amit
On 05.09.2017 04:02, Amit Langote wrote:
Like Thomas, I'm not so sure about the whole predtest.c patch. The core
logic in operator_predicate_proof() should be able to conclude that, say,
k < 21 is implied by k <= 20, which you are trying to address with some
special case code. If there is still problem you think need to be fixed
here, a better place to look at would be somewhere around get_btree_test_op().
Frankly speaking, I also do not like this part of my patch.
I will be pleased if you or somebody else can propose a better solution.
I do not understand how get_btree_test_op() can help here.
Yes, k < 21 is implied by k <= 20. That follows from the generic properties
of the < and <= operators.
But I need to prove something different: given the table partition
constraint (k < 21), I want to remove the predicate (k <= 20) from the query.
In other words, operator_predicate_proof() should be able to conclude
that (k <= 20) is implied by (k < 21).
But that is true only for integer types, not for floating-point types. And
the Postgres operator definition does not provide a way to determine that a
user-defined type is an integer type, i.e. has discrete integer values for
which such a conclusion holds.
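A concrete illustration of why the implication is type-dependent:

```sql
-- For a floating-point k, (k < 21) does NOT imply (k <= 20):
select 20.5 < 21 as satisfies_constraint,   -- true
       20.5 <= 20 as satisfies_predicate;   -- false

-- For an integer k, the largest value satisfying (k < 21) is 20,
-- so there (k < 21) does imply (k <= 20).
```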
Why do I think this is important? Certainly, it is possible to rewrite the
query as (k < 21), and then no changes in operator_predicate_proof() are needed.
Consider the most natural use case: I have some positive integer key and I
want to range-partition the table by this key, for example with an interval
of 10000.
Currently the standard PostgreSQL partitioning mechanism requires specifying
intervals with an open upper boundary.
So if I want the first partition to contain the interval [1,10000], the
second [10001,20000], ..., I have to create the partitions this way:
create table bt (k integer, v integer) partition by range (k);
create table dt1 partition of bt for values from (1) to (10001);
create table dt2 partition of bt for values from (10001) to (20001);
...
If I want to write a query inspecting the data of a particular partition,
then most likely I will use the BETWEEN operator:
SELECT * FROM t WHERE k BETWEEN 1 AND 10000;
But right now operator_predicate_proof() is not able to conclude that the
predicate (k BETWEEN 1 AND 10000), transformed to (k >= 1 AND k <= 10000),
is equivalent to (k >= 1 AND k < 10001),
which is used as the partition constraint.
Another very popular use case (mentioned, for example, in the PostgreSQL
partitioning documentation:
https://www.postgresql.org/docs/10/static/ddl-partitioning.html)
is using a date as the partition key:
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01')
Assume that now I want to get the measurements for March.
There are three ways to write this query:
select * from measurement where extract(month from logdate) = 3;
select * from measurement where logdate between '2006-03-01' AND
'2006-03-31';
select * from measurement where logdate >= '2006-03-01' AND logdate <
'2006-04-01';
Right now an optimal query plan is constructed only for the last query.
Unfortunately, my patch does not cover the date type.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Sun, Sep 3, 2017 at 4:34 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
Thank you for review.
I attached new version of the patch with
remove_restrictions_implied_by_constraints() function.
Concerning failed tests - this is actually result of this optimization:
extra filter conditions are removed from query plans.
Sorry, I have not included updated version of expected test output files to
the patch.
Now I did it.
A regression test under contrib/postgres_fdw now fails:
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T
1" WHERE (("C 1" IS NOT NULL))
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
(("C 1" IS NOT NULL)) is indeed redundant in that case, because column
"C 1" was declared to be NOT NULL. But:
1. Do we want to go this far? Note that this is not involving
inheritance and constraint exclusion. I don't immediately see any
reason why not, but I'm not sure.
2. If yes, then this postgres_fdw test should be changed, because I
think it was trying to demonstrate that IS NOT NULL expressions are
sent to remote databases -- it would need to be changed so that it
tries that with a column that is actually nullable.
--
Thomas Munro
http://www.enterprisedb.com
On 07.09.2017 13:00, Thomas Munro wrote:
On Sun, Sep 3, 2017 at 4:34 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
[...]

A regression test under contrib/postgres_fdw now fails:

- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL))
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"

(("C 1" IS NOT NULL)) is indeed redundant in that case, because column
"C 1" was declared to be NOT NULL. But:

1. Do we want to go this far? Note that this is not involving
inheritance and constraint exclusion. I don't immediately see any
reason why not, but I'm not sure.

2. If yes, then this postgres_fdw test should be changed, because I
think it was trying to demonstrate that IS NOT NULL expressions are
sent to remote databases -- it would need to be changed so that it
tries that with a column that is actually nullable.
I do not see any reason why we should disable this optimization in the
case of FDW.
And disabling it would require some extra effort...
So I have updated the test for postgres_fdw, replacing the query
SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;
with
SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL and c3 is not null;
Now it checks two things:
1. That the not-null check is passed to the foreign server for a nullable
column (c3).
2. That the not-null check is excluded from the query execution plan when
it can be omitted because the column is not nullable.
The updated version of the patch is attached to this mail.
Also, I added support for the date type to operator_predicate_proof() to be
able to imply (logdate <= '2017-03-31') from (logdate < '2017-04-01').
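The date case rests on the same discreteness argument as the integer case:
dates have one-day granularity, so the largest date satisfying
(logdate < '2017-04-01') is exactly '2017-03-31', which can be checked
directly:

```sql
-- Subtracting one day from the open upper bound gives the largest
-- value allowed by the strict comparison
select date '2017-04-01' - 1;   -- 2017-03-31
```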
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
optimizer-3.patch (text/x-patch)
On Fri, Sep 8, 2017 at 3:58 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
Updated version of the patch is attached to this mail.
Also I added support of date type to operator_predicate_proof to be able to
imply (logdate <= '2017-03-31') from (logdate < '2017-04-01') .
Hi Konstantin,
Is there any reason why you don't want to split this into two separate
proposals? One for remove_restrictions_implied_by_constraints() and
one for the operator_predicate_proof() changes.
Your v3 patch breaks the new partition_join test (the recently
committed partition-wise join stuff), as far as I can tell in a good
way. Can you please double check those changes and post an updated
patch?
--
Thomas Munro
http://www.enterprisedb.com
On 11/06/2017 04:27 AM, Thomas Munro wrote:
On Fri, Sep 8, 2017 at 3:58 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
[...]

Is there any reason why you don't want to split this into two separate
proposals? One for remove_restrictions_implied_by_constraints() and
one for the operator_predicate_proof() changes.

Your v3 patch breaks the new partition_join test (the recently
committed partition-wise join stuff), as far as I can tell in a good
way. Can you please double check those changes and post an updated
patch?
Hi Thomas.
The primary idea of this patch is to provide more efficient plans for queries on partitioned tables.
So remove_restrictions_implied_by_constraints() removes redundant predicate checks.
But it does not work for standard Postgres 10 partitioning, because there the constraints are set using intervals with an open upper boundary, and the original version of
operator_predicate_proof() is not able to handle this case.
I have explained this problem in my previous e-mails in this thread.
This is why I changed operator_predicate_proof() to handle this case correctly.
If you think this patch should be split into two, OK: I can do it.
I just want to note that without the operator_predicate_proof() change it may give no positive effect for standard partitioning,
which I expect to be the most popular use case where this optimization could matter.
Concerning the broken partition_join test: it is an "expected" failure: my patch removes redundant checks from the plans.
So the only required action is to update the expected file with the new results.
Attached please find the updated patch.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
optimizer-4.patch (text/x-diff)
On Mon, Nov 6, 2017 at 10:13 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
Concerning broken partition_join test: it is "expected" failure: my patch
removes from the plans redundant checks.
So the only required action is to update expected file with results.
Attached please find updated patch.
The last patch version has conflicts in regression tests and did not
get any reviews. Please provide a rebased version. I am moving the
patch to next CF with waiting on author as status. Thanks!
--
Michael
On 30.11.2017 05:16, Michael Paquier wrote:
On Mon, Nov 6, 2017 at 10:13 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
[...]

The last patch version has conflicts in regression tests and did not
get any reviews. Please provide a rebased version. I am moving the
patch to next CF with waiting on author as status. Thanks!
Rebased patch is attached.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
optimizer-5.patch (text/x-patch)
Konstantin Knizhnik wrote:
[...]
Rebased patch is attached.
I don't think this is a rebase on the previously posted patch ... it's
about 10x as big and appears to be a thorough rewrite of the entire
optimizer.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04.12.2017 19:44, Alvaro Herrera wrote:
[...]
I don't think this is a rebase on the previously posted patch ... it's
about 10x as big and appears to be a thorough rewrite of the entire
optimizer.
Oh, sorry. I accidentally committed a patch for aggregate push-down in
this branch.
The correct rebased patch is attached.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
optimizer-6.patch (text/x-patch)
On Tue, Sep 5, 2017 at 9:10 AM, Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> wrote:
[...]
Assume that now I want to get measurements for March.
There are three ways to write this query:
select * from measurement where extract(month from logdate) = 3;
select * from measurement where logdate between '2006-03-01' AND
'2006-03-31';
select * from measurement where logdate >= '2006-03-01' AND logdate <
'2006-04-01';
Right now only for the last query will an optimal query plan be constructed.
Perhaps the relevant pages (about partitioning and optimization) should
mention avoiding BETWEEN and using closed-open checks, as in the last query.
Dates are a perfect example to demonstrate that BETWEEN shouldn't be used,
in my opinion. Dates (and timestamps) are not like integers, as they are
often used with different levels of precision: day, month, year, hour,
minute, second, etc. (month in your example). Constructing the correct
expressions for the different precisions can be a nightmare with BETWEEN
but very simple with >= and < (in the example: take the start date,
'2006-03-01', and add one month).
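The closed-open form composes naturally with interval arithmetic. A sketch,
reusing the measurement table from the earlier example:

```sql
-- March 2006, without ever computing a "last day of the month";
-- the same pattern works unchanged for any start date and precision
select * from measurement
where logdate >= date '2006-03-01'
  and logdate <  date '2006-03-01' + interval '1 month';
```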
So, just my 2c: is it worth the trouble to implement this feature
(conversion of (k<21) to (k<=20) and vice versa), and how much work would it
need for all the data types that are commonly used for partitioning?