Strange (and good) side effect of partitioning ?

Started by Phil Florentabout 5 years ago6 messagesgeneral
Jump to latest
#1Phil Florent
philflorent@hotmail.com

Hi,

I read that on Jonathan Lewis' blog :

(I believe that there may be some RDBMS which will treat (e.g.) “X between 20 and 10” as being identical to “X between 10 and 20” )

I am puzzled. PostgreSQL seems NOT to treat X between 20 and 10” as being identical to “X between 10 and 20" but it's complicated.

Here is my test case:

select version();
version
----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit

create table t1 (rn integer , object_name text) partition by range(rn);

create table t1a partition of t1 for values from (1) to (50001);

\d+ t1
Table partitionnée « public.t1 »
Colonne | Type | Collationnement | NULL-able | Par défaut | Stockage | Cible de statistiques | Description
-------------+---------+-----------------+-----------+------------+----------+-----------------------+-------------
rn | integer | | | | plain | |
object_name | text | | | | extended | |
Clé de partition : RANGE (rn)
Partitions: t1a FOR VALUES FROM (1) TO (50001)

insert into t1 select
rownum rn,
upper(md5(random()::text)) object_name
from
(select generate_series(1,50000) rownum) serie
;

explain analyze select object_name
from t1
where
rn between 20 and 10
;
QUERY PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.002 rows=0 loops=1)
One-Time Filter: false
Planning Time: 0.116 ms
Execution Time: 0.020 ms

It's OK but:

explain analyze select object_name
from t1a
where
rn between 20 and 10
;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on t1a (cost=0.00..1167.00 rows=1 width=33) (actual time=6.553..6.553 rows=0 loops=1)
Filter: ((rn >= 20) AND (rn <= 10))
Rows Removed by Filter: 50000
Planning Time: 0.092 ms
Execution Time: 6.573 ms

At first I thought it was related to partition pruning but:

set enable_partition_pruning = false;

explain analyze select object_name
from t1
where
rn between 20 and 10
;

QUERY PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.003 rows=0 loops=1)
One-Time Filter: false
Planning Time: 0.104 ms
Execution Time: 0.021 ms

Confirmation since I still obtain "One-Time Filter: false" if I don't filter on the partition key:

create table t2 (rn integer , rn2 integer, object_name text) partition by range(rn);

create table t2a partition of t2 for values from (1) to (50001);

d+ t2
Table partitionnée « public.t2 »
Colonne | Type | Collationnement | NULL-able | Par défaut | Stockage | Cible de statistiques | Description
-------------+---------+-----------------+-----------+------------+----------+-----------------------+-------------
rn | integer | | | | plain | |
rn2 | integer | | | | plain | |
object_name | text | | | | extended | |
Clé de partition : RANGE (rn)
Partitions: t2a FOR VALUES FROM (1) TO (50001)

insert into t2 select
rownum rn, rownum rn2,
upper(md5(random()::text)) object_name
from
(select generate_series(1,50000) rownum) serie
;

explain analyze select object_name
from t2
where
rn2 between 20 and 10
;

QUERY PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.003 rows=0 loops=1)
One-Time Filter: false
Planning Time: 0.185 ms
Execution Time: 0.019 ms

I don't understand why I don't obtain " One-Time Filter: false" with a classic table or a partition ?

Best regards,

Phil

#2rob stone
floriparob@gmail.com
In reply to: Phil Florent (#1)
Re: Strange (and good) side effect of partitioning ?

Hi,

On Thu, 2021-01-14 at 20:48 +0000, Phil Florent wrote:

Hi,

I read that on Jonathan Lewis' blog :

(I believe that there may be some RDBMS which will treat (e.g.) “X
between 20 and 10” as being identical to“X between 10 and 20” )

I am puzzled. PostgreSQL seems NOT to treat X between 20 and 10” as
being identical to“X between 10 and 20" but it's complicated.

Here is my test case:

select version();                                                   
         version                                                    
         
---------------------------------------------------------------------
-------------------------------------------------------------
 PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit

create table t1 (rn integer , object_name text) partition by
range(rn);

create table t1a partition of t1 for values from (1) to (50001);

\d+ t1                                         Table partitionnée «
public.t1 »
   Colonne   |  Type   | Collationnement | NULL-able | Par défaut |
Stockage | Cible de statistiques | Description
-------------+---------+-----------------+-----------+------------+--
--------+-----------------------+-------------
 rn          | integer |                 |           |            |
plain    |                       |
 object_name | text    |                 |           |            |
extended |                       |
Clé de partition : RANGE (rn)
Partitions: t1a FOR VALUES FROM (1) TO (50001)

insert into t1 select                                    rownum  rn,
        upper(md5(random()::text)) object_name
from
        (select generate_series(1,50000) rownum) serie
;

explain analyze select  object_namefrom    t1
where
        rn between 20 and 10
;
                                     QUERY PLAN                      
             
---------------------------------------------------------------------
---------------
 Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.002
rows=0 loops=1)
   One-Time Filter: false
 Planning Time: 0.116 ms
 Execution Time: 0.020 ms

It's OK but:

explain analyze select  object_namefrom    t1a
where
        rn between 20 and 10
;
                                           QUERY PLAN                
                           
---------------------------------------------------------------------
----------------------------
 Seq Scan on t1a  (cost=0.00..1167.00 rows=1 width=33) (actual
time=6.553..6.553 rows=0 loops=1)
   Filter: ((rn >= 20) AND (rn <= 10))
   Rows Removed by Filter: 50000
 Planning Time: 0.092 ms
 Execution Time: 6.573 ms

At first I thought it was related to partition pruning but:

set enable_partition_pruning = false;

explain analyze select  object_namefrom    t1
where
        rn between 20 and 10
;

                                     QUERY PLAN                      
             --------------------------------------------------------
----------------------------
 Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.003
rows=0 loops=1)
   One-Time Filter: false
 Planning Time: 0.104 ms
 Execution Time: 0.021 ms

Confirmation since I still obtain "One-Time Filter: false" if I don't
filter on the partition key:

create table t2 (rn integer , rn2 integer, object_name text)
partition by range(rn);

create table t2a partition of t2 for values from (1) to (50001);

d+ t2                                         Table partitionnée «
public.t2 »
   Colonne   |  Type   | Collationnement | NULL-able | Par défaut |
Stockage | Cible de statistiques | Description
-------------+---------+-----------------+-----------+------------+--
--------+-----------------------+-------------
 rn          | integer |                 |           |            |
plain    |                       |
 rn2         | integer |                 |           |            |
plain    |                       |
 object_name | text    |                 |           |            |
extended |                       |
Clé de partition : RANGE (rn)
Partitions: t2a FOR VALUES FROM (1) TO (50001)

insert into t2 select                                               
  rownum  rn, rownum rn2,
        upper(md5(random()::text)) object_name
from
        (select generate_series(1,50000) rownum) serie
;

explain analyze select  object_namefrom    t2
where
        rn2 between 20 and 10
;

                                     QUERY PLAN                      
             --------------------------------------------------------
----------------------------
 Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.003
rows=0 loops=1)
   One-Time Filter: false
 Planning Time: 0.185 ms
 Execution Time: 0.019 ms

I don't understand why I don't obtain " One-Time Filter: false" with
a classic table or a partition ?

Best regards,

Phil

See table 9.2 in the documentation.

BETWEEN 10 AND 20 returns TRUE.
BETWEEN 20 AND 10 returns FALSE.
BETWEEN SYMMETRIC 20 AND 10 returns TRUE.

HTH,

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Phil Florent (#1)
Re: Strange (and good) side effect of partitioning ?

Phil Florent <philflorent@hotmail.com> writes:

I read that on Jonathan Lewis' blog :
(I believe that there may be some RDBMS which will treat (e.g.) "X between 20 and 10" as being identical to "X between 10 and 20" )

FWIW, I hope not, because the SQL spec is perfectly clear that it's
not supposed to work like that. As rob stone noted nearby, you're
supposed to say BETWEEN SYMMETRIC if you want that behavior.
"X between 20 and 10" should always be false (well, unless it's
null because X is null). But I think that's not really what
your question is.

I am puzzled. PostgreSQL seems NOT to treat X between 20 and 10" as being identical to "X between 10 and 20" but it's complicated.

There's no specific mechanism in Postgres that would cause "X between 20
and 10" to be reduced to constant-false (and I kind of think it would
be a waste of effort to add one). So that's why in simple cases you
get a plan like

Seq Scan on t1a (cost=0.00..1167.00 rows=1 width=33) (actual time=6.553..6.553 rows=0 loops=1)
Filter: ((rn >= 20) AND (rn <= 10))

I think that the other cases you show work as they do because the
code for excluding irrelevant range-based partitions is able to
conclude that no partition need be scanned. That is, the
constant-false-one-time-filter plan arises when we have no
partitions remaining to scan, not because the plan for any one
partition would have looked different from what's above.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: Strange (and good) side effect of partitioning ?

I wrote:

There's no specific mechanism in Postgres that would cause "X between 20
and 10" to be reduced to constant-false

Wait, I take that back. There is a mechanism that can conclude that
"X >= 20" and "X <= 10" are contradictory, but it's not applied by
default. Observe:

regression=# set constraint_exclusion = default;
SET
regression=# explain select * from tenk1 where unique1 between 20 and 10;
QUERY PLAN
-----------------------------------------------------------------------------
Index Scan using tenk1_unique1 on tenk1 (cost=0.29..8.30 rows=1 width=244)
Index Cond: ((unique1 >= 20) AND (unique1 <= 10))
(2 rows)

regression=# set constraint_exclusion = on;
SET
regression=# explain select * from tenk1 where unique1 between 20 and 10;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)

The default value of constraint_exclusion is "partition", which means
(you guessed it) that it's applied only to potential partitioning
constraints. This is a heuristic based on the typical payoff of
excluding whole partitions versus skipping an empty index scan.
But if you have a workload where it's really worth spending
planner cycles looking for self-contradictory queries, you can
turn it on.

regards, tom lane

#5Phil Florent
philflorent@hotmail.com
In reply to: Tom Lane (#4)
RE: Strange (and good) side effect of partitioning ?

Hi Tom, Hi Rob

Thanks for this clear and complete explanation. My question was unclear since I didn't even consider the results could be identical and it was about the plans. I had misunderstood what J.Lewis had written since he probably meant some RDBMS always do a BETWEEN SYMETRIC. Our application currently has double compatibility with Oracle and PostgreSQL, PostgreSQL only after 2021, and I hope we won't be compatible with a RDBMS that would not respect SQL standard on this aspect.

But if you have a workload where it's really worth spending
planner cycles looking for self-contradictory queries, you can
turn it on.

It was theoretical but it is a DSS tool and some queries can be dynamically built by the end users. Perhaps it really happens on some cases since I don't know if we always check criterias not to obtain self-contradictory queries. Since it's not OLTP our execution times are always much more important than our planning times anyway.

There are other places it could be more interesting to spend time for better performance. Debian 10/PostgreSQL 11 is our initial PostgreSQL platform. It's very efficient but I have to prepare Debian 11/PostgresQL 13. My first goal was to avoid performance regressions but I now want to always fully use planning capabilites of PostgreSQL. We currently have to completely deactive merge joins for some workload, nested loops for some other workload. It's OK but it's not optimal.
My current goal is to always activate (almost) everything with Debian 11/PostgreSQL 13 and everything with Debian 12/PostgreSQL 14+.
I will try to increase default_statistics_target it could be worth the price. I will also try to activate enable_partitionwise_aggregate and enable_partitionwise_join since we use partitioning by list of hospitals and subpartitioning by range of times. Replacing our slow Oracle "union all" views by PostgreSQL partitioned tables to deal with group of hospitals has still to be completed.

Best regards,

Phil

________________________________
De : Tom Lane <tgl@sss.pgh.pa.us>
Envoyé : vendredi 15 janvier 2021 03:12
À : Phil Florent <philflorent@hotmail.com>
Cc : pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Objet : Re: Strange (and good) side effect of partitioning ?

I wrote:

There's no specific mechanism in Postgres that would cause "X between 20
and 10" to be reduced to constant-false

Wait, I take that back. There is a mechanism that can conclude that
"X >= 20" and "X <= 10" are contradictory, but it's not applied by
default. Observe:

regression=# set constraint_exclusion = default;
SET
regression=# explain select * from tenk1 where unique1 between 20 and 10;
QUERY PLAN
-----------------------------------------------------------------------------
Index Scan using tenk1_unique1 on tenk1 (cost=0.29..8.30 rows=1 width=244)
Index Cond: ((unique1 >= 20) AND (unique1 <= 10))
(2 rows)

regression=# set constraint_exclusion = on;
SET
regression=# explain select * from tenk1 where unique1 between 20 and 10;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)

The default value of constraint_exclusion is "partition", which means
(you guessed it) that it's applied only to potential partitioning
constraints. This is a heuristic based on the typical payoff of
excluding whole partitions versus skipping an empty index scan.
But if you have a workload where it's really worth spending
planner cycles looking for self-contradictory queries, you can
turn it on.

regards, tom lane

#6Phil Florent
philflorent@hotmail.com
In reply to: Tom Lane (#4)
RE: Strange (and good) side effect of partitioning ?

Hi Tom,

The default value of constraint_exclusion is "partition", which means
(you guessed it) that it's applied only to potential partitioning
constraints. This is a heuristic based on the typical payoff of
excluding whole partitions versus skipping an empty index scan.
But if you have a workload where it's really worth spending
planner cycles looking for self-contradictory queries, you can
turn it on.

Interesting. Test case was not real but planning times have to be considered from a more general point of view. They are not a problem with our DSS app but we will also migrate our OLTP applications.
Partitioning is something new for me since we currently don't use it for our OLTP apps. It was not a technical choice, partitioning is not included in standard license of our current RDBMS. I will globally check the gain/loss with real workloads anyway.

Best regards,

Phil

________________________________
De : Tom Lane <tgl@sss.pgh.pa.us>
Envoyé : vendredi 15 janvier 2021 03:12
À : Phil Florent <philflorent@hotmail.com>
Cc : pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Objet : Re: Strange (and good) side effect of partitioning ?

I wrote:

There's no specific mechanism in Postgres that would cause "X between 20
and 10" to be reduced to constant-false

Wait, I take that back. There is a mechanism that can conclude that
"X >= 20" and "X <= 10" are contradictory, but it's not applied by
default. Observe:

regression=# set constraint_exclusion = default;
SET
regression=# explain select * from tenk1 where unique1 between 20 and 10;
QUERY PLAN
-----------------------------------------------------------------------------
Index Scan using tenk1_unique1 on tenk1 (cost=0.29..8.30 rows=1 width=244)
Index Cond: ((unique1 >= 20) AND (unique1 <= 10))
(2 rows)

regression=# set constraint_exclusion = on;
SET
regression=# explain select * from tenk1 where unique1 between 20 and 10;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)

The default value of constraint_exclusion is "partition", which means
(you guessed it) that it's applied only to potential partitioning
constraints. This is a heuristic based on the typical payoff of
excluding whole partitions versus skipping an empty index scan.
But if you have a workload where it's really worth spending
planner cycles looking for self-contradictory queries, you can
turn it on.

regards, tom lane