Why does delete from table where not exists (select 1 from ... LIMIT 1) perform badly?
Hi,
I've read about the reason for this before, but cannot find a reference
to it now.
How come the planner treats the
delete from table where not extists(select 1 from table2 where ... LIMIT 1)
so differently, and usually badly, when the LIMIT 1 is there. In older
version of postgresql, I remember that the effect was the opposite, a
limit 1 would actually perform substantially better. Hence we have old
code (and old habits), where the LIMIT 1 is still used.
Shouldn't the planner really understand that the intention is the same
in these two queries?
-- bad:
DELETE FROM iup_locked_gradings ilg
WHERE NOT EXISTS (
SELECT 1 FROM iup_locked_subject ils
WHERE ils.locked_gradings_id = ilg.locked_gradings_id
LIMIT 1
)
;
-- good:
DELETE FROM iup_locked_gradings ilg
WHERE NOT EXISTS (
SELECT 1 FROM iup_locked_subject ils
WHERE ils.locked_gradings_id = ilg.locked_gradings_id
)
;
pp=# begin; explain DELETE FROM iup_locked_gradings ilg WHERE NOT EXISTS
(SELECT 1 FROM iup_locked_subject ils WHERE ils.locked_gradings_id =
ilg.locked_gradings_id LIMIT 1);
BEGIN
QUERY PLAN
-----------------------------------------------------------------------------------------------
Delete (cost=0.00..523542963.48 rows=291737 width=6)
-> Seq Scan on iup_locked_gradings ilg (cost=0.00..523542963.48
rows=291737 width=6)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Limit (cost=0.00..897.27 rows=1 width=0)
-> Seq Scan on iup_locked_subject ils
(cost=0.00..18842.76 rows=21 width=0)
Filter: (locked_gradings_id = $0)
(7 rows)
pp=# begin; explain DELETE FROM iup_locked_gradings ilg WHERE NOT EXISTS
(SELECT 1 FROM iup_locked_subject ils WHERE ils.locked_gradings_id =
ilg.locked_gradings_id );
BEGIN
QUERY PLAN
---------------------------------------------------------------------------------------------------
Delete (cost=31705.39..47934.47 rows=553737 width=12)
-> Hash Anti Join (cost=31705.39..47934.47 rows=553737 width=12)
Hash Cond: (ilg.locked_gradings_id = ils.locked_gradings_id)
-> Seq Scan on iup_locked_gradings ilg (cost=0.00..6677.44
rows=583474 width=10)
-> Hash (cost=15776.83..15776.83 rows=1226373 width=10)
-> Seq Scan on iup_locked_subject ils
(cost=0.00..15776.83 rows=1226373 width=10)
(6 rows)
pp=#
chalmers=# \d iup_locked_gradings
Table "public.iup_locked_gradings"
Column | Type |
Modifiers
----------------------+---------+----------------------------------------------------------------------------------
locked_gradings_id | integer | not null default
nextval('iup_locked_gradings_locked_gradings_id_seq'::regclass)
type | integer |
description | text |
name | text |
original_gradings_id | integer |
Indexes:
"iup_locked_gradings_pkey" PRIMARY KEY, btree (locked_gradings_id),
tablespace "opt"
Referenced by:
TABLE "iup_locked_subject" CONSTRAINT
"iup_locked_subject_locked_gradings_id_fkey" FOREIGN KEY
(locked_gradings_id) REFERENCES iup_locked_gradings(locked_gradings_id)
ON UPDATE CASCADE ON DELETE SET NULL
Tablespace: "opt"
chalmers=# \d iup_locked_subject
Table "public.iup_locked_subject"
Column | Type |
Modifiers
---------------------+---------+--------------------------------------------------------------------------------
locked_subject_id | integer | not null default
nextval('iup_locked_subject_locked_subject_id_seq'::regclass)
name | text | not null
link_url | text |
description | text |
use_measures | boolean | not null default true
locked_gradings_id | integer |
original_subject_id | integer |
use_fail_warning | boolean | not null default false
Indexes:
"iup_locked_subject_pkey" PRIMARY KEY, btree (locked_subject_id),
tablespace "opt"
Foreign-key constraints:
"iup_locked_subject_locked_gradings_id_fkey" FOREIGN KEY
(locked_gradings_id) REFERENCES iup_locked_gradings(locked_gradings_id)
ON UPDATE CASCADE ON DELETE SET NULL
Referenced by:
Tablespace: "opt"
Palle Girgensohn <girgen@pingpong.net> writes:
How come the planner treats the
delete from table where not extists(select 1 from table2 where ... LIMIT 1)
so differently, and usually badly, when the LIMIT 1 is there.
Because it can't optimize it into an antijoin.
In older
version of postgresql, I remember that the effect was the opposite, a
limit 1 would actually perform substantially better. Hence we have old
code (and old habits), where the LIMIT 1 is still used.
Well, you're basically forcing it into the same type of plan you would
have gotten before antijoins were implemented (circa 8.4), so I don't
see that this is a regression. But I'd get rid of the LIMIT 1 if I were
you. It's been a *very* long time since that was a net benefit in an
EXISTS subquery, if indeed it ever was --- AFAIR, even the earliest PG
versions that understood about optimizing for fast-start plans would do
so in an EXISTS subquery, with or without any LIMIT.
regards, tom lane