BUG #18973: The default enable_material=ON affects the cost estimation of optimizer, resulting in 10968x slow

Started by PG Bug reporting form10 months ago2 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18973
Logged by: Jinhui
Email address: jinhui-lai@foxmail.com
PostgreSQL version: 17.5
Operating system: ubuntu 22.04
Description:

Dear PG developers,
Thank you for taking the time to read my report.
I may have found a performance issue. The parameter enable_material is set
to ON by default, and it affects the cost estimation of optimizer, resulting
in 10968x slow. You can reproduce it as follows:
CREATE TABLE t0(c0 INT8);
CREATE TABLE t1(c1 INT8);
INSERT INTO t1 SELECT i FROM generate_series(1, 100000000) AS i;
SELECT * FROM t0 INNER JOIN t1 ON t0.c0 != t1.c1;
c0 | c1
----+----
(0 rows)
Time: 9794.016 ms (00:09.794)
SET enable_material = off;
SELECT * FROM t0 INNER JOIN t1 ON t0.c0 != t1.c1;
c0 | c1
----+----
(0 rows)
Time: 0.893 ms
The enable_material=ON also affects CROSS/NATURAL JOIN, but not affects
LEFT JOIN:
SELECT * FROM t0 NATURAL JOIN t1;
c0 | c1
----+----
(0 rows)
Time: 7350.216 ms (00:07.350)
SELECT * FROM t0 CROSS JOIN t1;
c0 | c1
----+----
(0 rows)
Time: 6823.532 ms (00:06.824)
SELECT * FROM t0 LEFT JOIN t1 ON t0.c0 != t1.c1;
c0 | c1
----+----
(0 rows)
Time: 0.798 ms
Adding the following code in
postgres/blob/master/src/backend/optimizer/util/plancat.c may works
#include "catalog/pg_statistic_history.h"
...
bool is_table_vacuumed_or_analyzed(Oid relid)
{
Relation pgstahis = NULL;
SysScanDesc scan = NULL;
ScanKeyData key[1];
HeapTuple tuple = NULL;
bool found = false;
ScanKeyInit(&key[0], Anum_pg_statistic_history_starelid,
BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(relid));
pgstahis = relation_open(StatisticHistoryRelationId, AccessShareLock);
scan = systable_beginscan(pgstahis, StatisticHistoryTabTypAttnumIndexId,
true, NULL, 1, key);
if (HeapTupleIsValid(tuple = systable_getnext(scan))) {
found = true;
}
systable_endscan(scan);
relation_close(pgstahis, AccessShareLock);
return found;
}
Best regard,
Jinhui

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #18973: The default enable_material=ON affects the cost estimation of optimizer, resulting in 10968x slow

PG Bug reporting form <noreply@postgresql.org> writes:

I may have found a performance issue. The parameter enable_material is set
to ON by default, and it affects the cost estimation of optimizer, resulting
in 10968x slow. You can reproduce it as follows:
CREATE TABLE t0(c0 INT8);
CREATE TABLE t1(c1 INT8);
INSERT INTO t1 SELECT i FROM generate_series(1, 100000000) AS i;
SELECT * FROM t0 INNER JOIN t1 ON t0.c0 != t1.c1;
c0 | c1
----+----
(0 rows)
Time: 9794.016 ms (00:09.794)

The problem with this example is that you didn't ANALYZE the tables.
If you do, it switches to a plan without Materialize:

regression=# CREATE TABLE t0(c0 INT8);
CREATE TABLE
regression=# CREATE TABLE t1(c1 INT8);
CREATE TABLE
regression=# INSERT INTO t1 SELECT i FROM generate_series(1, 100000000) AS i;
INSERT 0 100000000
regression=# explain analyze SELECT * FROM t0 INNER JOIN t1 ON t0.c0 != t1.c1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..3391443465.73 rows=224870062964 width=16) (actual time=19992.481..19992.483 rows=0.00 loops=1)
Join Filter: (t0.c0 <> t1.c1)
Buffers: shared read=442478 dirtied=442478 written=428541
-> Seq Scan on t1 (cost=0.00..1442478.28 rows=100000028 width=8) (actual time=0.136..11957.262 rows=100000000.00 loops=1)
Buffers: shared read=442478 dirtied=442478 written=428541
-> Materialize (cost=0.00..43.90 rows=2260 width=8) (actual time=0.000..0.000 rows=0.00 loops=100000000)
Storage: Memory Maximum Storage: 17kB
-> Seq Scan on t0 (cost=0.00..32.60 rows=2260 width=8) (actual time=0.005..0.005 rows=0.00 loops=1)
Planning:
Buffers: shared hit=68 read=33
Planning Time: 4.135 ms
Execution Time: 19992.525 ms
(12 rows)

regression=# vacuum analyze t0,t1;
VACUUM
regression=# explain analyze SELECT * FROM t0 INNER JOIN t1 ON t0.c0 != t1.c1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..2692478.72 rows=100000031 width=16) (actual time=0.004..0.005 rows=0.00 loops=1)
Join Filter: (t0.c0 <> t1.c1)
-> Seq Scan on t0 (cost=0.00..0.00 rows=1 width=8) (actual time=0.004..0.004 rows=0.00 loops=1)
-> Seq Scan on t1 (cost=0.00..1442478.32 rows=100000032 width=8) (never executed)
Planning:
Buffers: shared hit=9
Planning Time: 0.094 ms
Execution Time: 0.017 ms
(8 rows)

But really that's kind of cheating, because it depends critically
on t0 being completely empty. If we add a row there so that the
join has to do some work, there is not so much value after all:

regression=# insert into t0 values(1);
INSERT 0 1
regression=# vacuum analyze t0;
VACUUM
regression=# explain analyze SELECT * FROM t0 INNER JOIN t1 ON t0.c0 != t1.c1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..2692479.73 rows=100000031 width=16) (actual time=0.051..11894.867 rows=99999999.00 loops=1)
Join Filter: (t0.c0 <> t1.c1)
Rows Removed by Join Filter: 1
Buffers: shared hit=15701 read=426778
-> Seq Scan on t0 (cost=0.00..1.01 rows=1 width=8) (actual time=0.003..0.005 rows=1.00 loops=1)
Buffers: shared hit=1
-> Seq Scan on t1 (cost=0.00..1442478.32 rows=100000032 width=8) (actual time=0.044..3853.565 rows=100000000.00 loops=1)
Buffers: shared hit=15700 read=426778
Planning:
Buffers: shared hit=6
Planning Time: 0.068 ms
Execution Time: 14050.387 ms
(12 rows)

We don't optimize for the case of tables being completely
empty, because that's basically a zero-probability situation
in real-world queries. So even though this don't-scan-the-
inner-table-when-the-outer-one-is-empty short-circuit exists
in the executor, the optimizer does not plan on the assumption
of that happening. That's not a bug, it's intentional.
We judge that a plan made on that assumption will be too
brittle if the table turns out to not be empty after all.

regards, tom lane