BUG #16251: ::text type casting of a constant breaks query performance

Started by PG Bug reporting formabout 6 years ago6 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16251
Logged by: Dima Pavlov
Email address: imyfess@gmail.com
PostgreSQL version: 12.1
Operating system: Windows 10
Description:

Test table and indexes (PostgreSQL 12.1):

CREATE TABLE t (dt timestamp with time zone);
CREATE INDEX ind ON t USING btree (dt);

INSERT
INTO t(dt)
SELECT
(
timestamp '2020-01-01 00:00:00' +
random() * (
timestamp '2020-02-29 00:00:00' -
timestamp '2020-01-01 00:00:00'
)
)
FROM generate_series(1, 10000)

-------------------------------------

In the first query, everything is ok, appropriate index "ind" is used:

explain (analyze, buffers)
SELECT *
FROM t
WHERE
('2020-02-08')::date IS NULL
OR
dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit (cost=0.29..0.37 rows=1 width=8) (actual time=0.186..0.188 rows=1
loops=1)"
" Buffers: shared hit=3"
" -> Index Only Scan using ind on t (cost=0.29..303.75 rows=3627 width=8)
(actual time=0.184..0.184 rows=1 loops=1)"
" Index Cond: (dt > '2020-02-08 00:00:00+05'::timestamp with time
zone)"
" Heap Fetches: 1"
" Buffers: shared hit=3"
"Planning Time: 2.365 ms"
"Execution Time: 0.239 ms"

-----------------------------------------------

With '::text' type casting of '2020-02-08' (which is already text) query
permofance is very low

explain (analyze, buffers)
SELECT *
FROM t
WHERE
('2020-02-08'::text)::date IS NULL
OR
dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit (cost=0.29..0.44 rows=1 width=8) (actual time=45.306..45.307 rows=1
loops=1)"
" Buffers: shared hit=6232"
" -> Index Only Scan using ind on t (cost=0.29..561.28 rows=3658 width=8)
(actual time=45.304..45.304 rows=1 loops=1)"
" Filter: ((('2020-02-08'::cstring)::date IS NULL) OR (dt >
'2020-02-08 00:00:00+05'::timestamp with time zone))"
" Rows Removed by Filter: 6367"
" Heap Fetches: 6368"
" Buffers: shared hit=6232"
"Planning Time: 0.348 ms"
"Execution Time: 45.343 ms"

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #16251: ::text type casting of a constant breaks query performance

so 8. 2. 2020 v 7:44 odesílatel PG Bug reporting form <
noreply@postgresql.org> napsal:

The following bug has been logged on the website:

Bug reference: 16251
Logged by: Dima Pavlov
Email address: imyfess@gmail.com
PostgreSQL version: 12.1
Operating system: Windows 10
Description:

Test table and indexes (PostgreSQL 12.1):

CREATE TABLE t (dt timestamp with time zone);
CREATE INDEX ind ON t USING btree (dt);

INSERT
INTO t(dt)
SELECT
(
timestamp '2020-01-01 00:00:00' +
random() * (
timestamp '2020-02-29 00:00:00' -
timestamp '2020-01-01 00:00:00'
)
)
FROM generate_series(1, 10000)

-------------------------------------

In the first query, everything is ok, appropriate index "ind" is used:

explain (analyze, buffers)
SELECT *
FROM t
WHERE
('2020-02-08')::date IS NULL
OR
dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit (cost=0.29..0.37 rows=1 width=8) (actual time=0.186..0.188 rows=1
loops=1)"
" Buffers: shared hit=3"
" -> Index Only Scan using ind on t (cost=0.29..303.75 rows=3627
width=8)
(actual time=0.184..0.184 rows=1 loops=1)"
" Index Cond: (dt > '2020-02-08 00:00:00+05'::timestamp with time
zone)"
" Heap Fetches: 1"
" Buffers: shared hit=3"
"Planning Time: 2.365 ms"
"Execution Time: 0.239 ms"

-----------------------------------------------

With '::text' type casting of '2020-02-08' (which is already text) query
permofance is very low

explain (analyze, buffers)
SELECT *
FROM t
WHERE
('2020-02-08'::text)::date IS NULL
OR
dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit (cost=0.29..0.44 rows=1 width=8) (actual time=45.306..45.307 rows=1
loops=1)"
" Buffers: shared hit=6232"
" -> Index Only Scan using ind on t (cost=0.29..561.28 rows=3658
width=8)
(actual time=45.304..45.304 rows=1 loops=1)"
" Filter: ((('2020-02-08'::cstring)::date IS NULL) OR (dt >
'2020-02-08 00:00:00+05'::timestamp with time zone))"
" Rows Removed by Filter: 6367"
" Heap Fetches: 6368"
" Buffers: shared hit=6232"
"Planning Time: 0.348 ms"
"Execution Time: 45.343 ms"

This is not a bug, but just feature.

The '2020-02-08' is not text type - it is 'unknown' type - and then is just
directly transformed to date. I think so cast from text to date is not
maybe immutable, and it can stops some optimizations.

postgres=# explain analyze select * from foo where '2020-02-01'::date is
null or a > '2020-01-01';
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Index Only Scan using foo_a_idx on foo (cost=0.42..9.12 rows=40 width=8)
(actual time=0.083..0.125 rows=37 loops=1) │
│ Index Cond: (a > '2020-01-01 00:00:00+01'::timestamp with time zone)

│ Heap Fetches: 37

│ Planning Time: 0.279 ms

│ Execution Time: 0.189 ms

└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)

postgres=# explain analyze select * from foo where '2020-02-01'::text::date
is null or a > '2020-01-01';
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Seq Scan on foo (cost=0.00..2193.00 rows=540 width=8) (actual
time=0.050..71.671 rows=37 loops=1) │
│ Filter: ((('2020-02-01'::cstring)::date IS NULL) OR (a > '2020-01-01
00:00:00+01'::timestamp with time zone)) │
│ Rows Removed by Filter: 99963

│ Planning Time: 0.299 ms

│ Execution Time: 71.714 ms

└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)

The problem is in casting from Date to Timestamptz - the related function
"timestamptz" is marked as "stable" - and then probably the all expression
is marked as "stable", what block to use this expression as index condition.

I don't know the context, but the expression "'2020-02-01'::date is null or
a > '2020-01-01'" has not too much sense, so just don't do this. Postgres
is not too smart and don't try to reduce some useless part of expressions.

Or if you need it, then use UNION and separate this expression to two
independent expressions

postgres=# explain analyze select * from foo where '2020-02-01'::text::date
is null union select * from foo where a > '2020-01-01';
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ HashAggregate (cost=3202.82..4203.22 rows=100040 width=8) (actual
time=0.175..1.635 rows=37 loops=1) │
│ Group Key: foo.a

│ -> Append (cost=0.01..2952.72 rows=100040 width=8) (actual
time=0.061..0.120 rows=37 loops=1) │
│ -> Result (cost=0.01..1443.01 rows=100000 width=8) (actual
time=0.016..0.017 rows=0 loops=1) │
│ One-Time Filter: (('2020-02-01'::cstring)::date IS NULL)

│ -> Seq Scan on foo (cost=0.01..1443.01 rows=100000
width=8) (never executed) │
│ -> Index Only Scan using foo_a_idx on foo foo_1
(cost=0.42..9.12 rows=40 width=8) (actual time=0.043..0.090 rows=37
loops=1) │
│ Index Cond: (a > '2020-01-01 00:00:00+01'::timestamp with
time zone) │
│ Heap Fetches: 37

│ Planning Time: 0.437 ms

│ Execution Time: 6.690 ms

└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Regards

Pavel

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#2)
Re: BUG #16251: ::text type casting of a constant breaks query performance

so 8. 2. 2020 v 8:09 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

so 8. 2. 2020 v 7:44 odesílatel PG Bug reporting form <
noreply@postgresql.org> napsal:

The following bug has been logged on the website:

Bug reference: 16251
Logged by: Dima Pavlov
Email address: imyfess@gmail.com
PostgreSQL version: 12.1
Operating system: Windows 10
Description:

Test table and indexes (PostgreSQL 12.1):

CREATE TABLE t (dt timestamp with time zone);
CREATE INDEX ind ON t USING btree (dt);

INSERT
INTO t(dt)
SELECT
(
timestamp '2020-01-01 00:00:00' +
random() * (
timestamp '2020-02-29 00:00:00' -
timestamp '2020-01-01 00:00:00'
)
)
FROM generate_series(1, 10000)

-------------------------------------

In the first query, everything is ok, appropriate index "ind" is used:

explain (analyze, buffers)
SELECT *
FROM t
WHERE
('2020-02-08')::date IS NULL
OR
dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit (cost=0.29..0.37 rows=1 width=8) (actual time=0.186..0.188 rows=1
loops=1)"
" Buffers: shared hit=3"
" -> Index Only Scan using ind on t (cost=0.29..303.75 rows=3627
width=8)
(actual time=0.184..0.184 rows=1 loops=1)"
" Index Cond: (dt > '2020-02-08 00:00:00+05'::timestamp with time
zone)"
" Heap Fetches: 1"
" Buffers: shared hit=3"
"Planning Time: 2.365 ms"
"Execution Time: 0.239 ms"

-----------------------------------------------

With '::text' type casting of '2020-02-08' (which is already text) query
permofance is very low

explain (analyze, buffers)
SELECT *
FROM t
WHERE
('2020-02-08'::text)::date IS NULL
OR
dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit (cost=0.29..0.44 rows=1 width=8) (actual time=45.306..45.307
rows=1
loops=1)"
" Buffers: shared hit=6232"
" -> Index Only Scan using ind on t (cost=0.29..561.28 rows=3658
width=8)
(actual time=45.304..45.304 rows=1 loops=1)"
" Filter: ((('2020-02-08'::cstring)::date IS NULL) OR (dt >
'2020-02-08 00:00:00+05'::timestamp with time zone))"
" Rows Removed by Filter: 6367"
" Heap Fetches: 6368"
" Buffers: shared hit=6232"
"Planning Time: 0.348 ms"
"Execution Time: 45.343 ms"

This is not a bug, but just feature.

The '2020-02-08' is not text type - it is 'unknown' type - and then is
just directly transformed to date. I think so cast from text to date is not
maybe immutable, and it can stops some optimizations.

postgres=# explain analyze select * from foo where '2020-02-01'::date is
null or a > '2020-01-01';

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Index Only Scan using foo_a_idx on foo (cost=0.42..9.12 rows=40
width=8) (actual time=0.083..0.125 rows=37 loops=1) │
│ Index Cond: (a > '2020-01-01 00:00:00+01'::timestamp with time zone)

│ Heap Fetches: 37

│ Planning Time: 0.279 ms

│ Execution Time: 0.189 ms

└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)

postgres=# explain analyze select * from foo where
'2020-02-01'::text::date is null or a > '2020-01-01';

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Seq Scan on foo (cost=0.00..2193.00 rows=540 width=8) (actual
time=0.050..71.671 rows=37 loops=1) │
│ Filter: ((('2020-02-01'::cstring)::date IS NULL) OR (a > '2020-01-01
00:00:00+01'::timestamp with time zone)) │
│ Rows Removed by Filter: 99963

│ Planning Time: 0.299 ms

│ Execution Time: 71.714 ms

└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)

The problem is in casting from Date to Timestamptz - the related function
"timestamptz" is marked as "stable" - and then probably the all expression
is marked as "stable", what block to use this expression as index condition.

I was wrong - problem is in probably cast from text to date.

I don't know the context, but the expression "'2020-02-01'::date is null
or a > '2020-01-01'" has not too much sense, so just don't do this.
Postgres is not too smart and don't try to reduce some useless part of
expressions.

More times Postgres is more sensitive on data types due special rules of
casting.

Show quoted text

Or if you need it, then use UNION and separate this expression to two
independent expressions

postgres=# explain analyze select * from foo where
'2020-02-01'::text::date is null union select * from foo where a >
'2020-01-01';

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ HashAggregate (cost=3202.82..4203.22 rows=100040 width=8) (actual
time=0.175..1.635 rows=37 loops=1) │
│ Group Key: foo.a

│ -> Append (cost=0.01..2952.72 rows=100040 width=8) (actual
time=0.061..0.120 rows=37 loops=1) │
│ -> Result (cost=0.01..1443.01 rows=100000 width=8) (actual
time=0.016..0.017 rows=0 loops=1) │
│ One-Time Filter: (('2020-02-01'::cstring)::date IS NULL)

│ -> Seq Scan on foo (cost=0.01..1443.01 rows=100000
width=8) (never executed) │
│ -> Index Only Scan using foo_a_idx on foo foo_1
(cost=0.42..9.12 rows=40 width=8) (actual time=0.043..0.090 rows=37
loops=1) │
│ Index Cond: (a > '2020-01-01 00:00:00+01'::timestamp with
time zone) │
│ Heap Fetches: 37

│ Planning Time: 0.437 ms

│ Execution Time: 6.690 ms

└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Regards

Pavel

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#2)
Re: BUG #16251: ::text type casting of a constant breaks query performance

Pavel Stehule <pavel.stehule@gmail.com> writes:

so 8. 2. 2020 v 7:44 odesílatel PG Bug reporting form <
noreply@postgresql.org> napsal:

With '::text' type casting of '2020-02-08' (which is already text) query
permofance is very low

The '2020-02-08' is not text type - it is 'unknown' type - and then is just
directly transformed to date. I think so cast from text to date is not
maybe immutable, and it can stops some optimizations.

Yeah. There actually isn't any cast from text to date, if you look into
pg_cast. So "('2020-02-08'::text)::date" is implemented as a text Const
that's fed through a CoerceViaIO node that applies date_in(), and
date_in() is only stable not immutable. (That must be so because its
behavior depends on the DateStyle setting, and maybe TimeZone too; not
sure about the latter but definitely the former.) So the planner is
unable to reduce the IS NULL test to constant-false and thereby get
rid of the OR, and that means it can't usefully apply the index.

If you can't rearrange things so that the IS NULL argument is seen
as a constant, the UNION trick that Pavel mentioned might be a useful
workaround. But I'm inclined to think that you need to take two steps
back and figure out whether this query logic is really sane or not.
You do realize that the query is asking to retrieve the entire table,
if whatever-it-is is NULL? Why would that be what you want?

regards, tom lane

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#4)
Re: BUG #16251: ::text type casting of a constant breaks query performance

so 8. 2. 2020 v 17:49 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

so 8. 2. 2020 v 7:44 odesílatel PG Bug reporting form <
noreply@postgresql.org> napsal:

With '::text' type casting of '2020-02-08' (which is already text) query
permofance is very low

The '2020-02-08' is not text type - it is 'unknown' type - and then is

just

directly transformed to date. I think so cast from text to date is not
maybe immutable, and it can stops some optimizations.

Yeah. There actually isn't any cast from text to date, if you look into
pg_cast. So "('2020-02-08'::text)::date" is implemented as a text Const
that's fed through a CoerceViaIO node that applies date_in(), and
date_in() is only stable not immutable. (That must be so because its
behavior depends on the DateStyle setting, and maybe TimeZone too; not
sure about the latter but definitely the former.) So the planner is
unable to reduce the IS NULL test to constant-false and thereby get
rid of the OR, and that means it can't usefully apply the index.

If you can't rearrange things so that the IS NULL argument is seen
as a constant, the UNION trick that Pavel mentioned might be a useful
workaround. But I'm inclined to think that you need to take two steps
back and figure out whether this query logic is really sane or not.
You do realize that the query is asking to retrieve the entire table,
if whatever-it-is is NULL? Why would that be what you want?

If I remember well, this technique was a trick to use one query for
variables that can be (or should not be) specified by user.

I can has a variable $ID. If user specifies this variable, it has some
number, else it has NULL.

When you want to use one query for both possibilities (static query), then
you can write

SELECT * FROM tab WHERE ($ID is NULL OR id = $ID)

We used this technique 20 years ago, and I think it was very popular, but
databases was significantly smaller, and only few people had good knowledge
of SQL databases.

Show quoted text

regards, tom lane

#6Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#4)
Re: BUG #16251: ::text type casting of a constant breaks query performance

On Sat, Feb 8, 2020 at 11:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

so 8. 2. 2020 v 7:44 odesílatel PG Bug reporting form <
noreply@postgresql.org> napsal:

If you can't rearrange things so that the IS NULL argument is seen
as a constant, the UNION trick that Pavel mentioned might be a useful
workaround. But I'm inclined to think that you need to take two steps
back and figure out whether this query logic is really sane or not.
You do realize that the query is asking to retrieve the entire table,
if whatever-it-is is NULL? Why would that be what you want?

It is a pretty common tactic to do this. It much easier on the client side
to bind NULL to a parameter when you don't care, rather than dynamically
rewrite the query text to remove that condition from it. Of course that
whole thing is likely to be ANDed together with other clauses in an
unsimplified real-world example.

Cheers,

Jeff