BUG #18851: Queries with xxx NOT IN (SELECT xxx FROM table) fail to run (or run very slowly) on v17 (v14 ok)

Started by PG Bug reporting formabout 1 year ago2 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18851
Logged by: Joan Sánchez Sabé
Email address: joan@sanchezsabe.com
PostgreSQL version: 17.4
Operating system: Windows 11 24H2 (26100.3194)
Description:

I have experienced a misbehaviour (IMHO) in PostgreSQL 17 that didn't occur
in PostgreSQL 14.

Minimum reproducible scenario to show the error:

1) Table with a primary key (integer) and 500_000 records on it.

CREATE TABLE t1 AS
(
SELECT
id
FROM
generate_series(1, 1000000) AS s(id)
ORDER BY
random()
LIMIT
500000
) ;
ALTER TABLE t1
ADD PRIMARY KEY (id);

ANALYZE t1 ;

2) Second table, with a smaller (200_000) number of values of the same type,
some of them overlapping the first table, without a primary key (nor any
index).

CREATE TABLE t2 AS
(
SELECT
id
FROM
generate_series(1, 1000000) AS s(id)
ORDER BY
random()
LIMIT
200000
) ;

3) The following query, which just looks for values in t2 not in t1, takes
about 200 ms on my computer using "PostgreSQL 14.17, compiled by Visual C++
build 1942, 64-bit"; running on Windows 11 Pro 24H2 (26100.3194).

SELECT
t2.id
FROM
t2
WHERE
t2.id NOT IN (SELECT id FROM t1) ;

Output: a random list of around 100_000 numbers.

The query plan is as follows:

Seq Scan on public.t2 (cost=8463.00..11848.00 rows=100000 width=4)
(actual time=122.093..186.668 rows=99566 loops=1)
Output: t2.id
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 100434
SubPlan 1
-> Seq Scan on public.t1 (cost=0.00..7213.00 rows=500000 width=4)
(actual time=0.017..20.573 rows=500000 loops=1)
Output: t1.id
Planning Time: 0.173 ms
Execution Time: 192.002 ms

You can check a (smallish) version on https://dbfiddle.uk/aHTC8YQd

--------
The same query, executed on "PostgreSQL 17.4 on x86_64-windows, compiled by
msvc-19.42.34436, 64-bit)" takes "forever". I actually gave up after 10
minutes.

The query plan was:

Seq Scan on public.t2 (cost=0.00..1294403396.00 rows=100000 width=4)
Output: t2.id
Filter: (NOT (ANY (t2.id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.00..11694.00 rows=500000 width=4)
Output: t1.id
-> Seq Scan on public.t1 (cost=0.00..7240.00 rows=500000
width=4)
Output: t1.id

The same query works on smaller tables (sizes 100_000 and 40_000,
respectively; i.e.: 5 x smaller). In that case, the execution plan is the
same as with PostgreSQL 14 (with values divided by aprox. 5, as expected)

Seq Scan on public.t2 (cost=1698.00..2502.00 rows=24480 width=4)
(actual time=20.813..28.825 rows=20016 loops=1)
Output: t2.id
Filter: (NOT (ANY (t2.id = (hashed SubPlan 1).col1)))
Rows Removed by Filter: 19984
SubPlan 1
-> Seq Scan on public.t1 (cost=0.00..1448.00 rows=100000 width=4)
(actual time=0.013..4.844 rows=100000 loops=1)
Output: t1.id
Planning Time: 0.205 ms
Execution Time: 30.565 ms

So, it seems that the problem appears when the execution planner adds the
"Materialize" node.

If necessary, I can add configuration files for both versions of
PostgreSQL.

Many thanks in advance

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #18851: Queries with xxx NOT IN (SELECT xxx FROM table) fail to run (or run very slowly) on v17 (v14 ok)

PG Bug reporting form <noreply@postgresql.org> writes:

I have experienced a misbehaviour (IMHO) in PostgreSQL 17 that didn't occur
in PostgreSQL 14.

What have you got work_mem set to in the two installations?

For me, both versions switch to the cheap hashed plan if work_mem
is large enough that the inner table is estimated to fit in it
(somewhere around 10MB-20MB for this specific example). If anything
the newer code seems to have a smaller threshold.

regards, tom lane