BUG #17590: [OPTIMIZER] DELETE never ends on a small table, found a workaround which makes it instant

Started by PG Bug reporting formover 3 years ago2 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17590
Logged by: Guillaume FOUET
Email address: g.fouet@gmail.com
PostgreSQL version: 14.5
Operating system: Windows 10 - 64x
Description:

Tested with Postgres 13.5, 13.8 and 14.5 (I updated to see if that was a
bug).

I have four tables: ADDRESS, COMPANY, CONTACT, TRIP
ADDRESS has ~130000 rows
COMPANY has a FK toward ADDRESS and ~16000 rows
CONTACT has a FK toward ADDRESS and ~12000 rows
TRIP has a FK toward ADDRESS and ~137500 rows

We wanted to purge the address table from old, unused addresses:
DELETE FROM address WHERE address_id NOT IN (
SELECT DISTINCT address_id FROM company
UNION
SELECT DISTINCT address_id FROM contact
UNION
SELECT DISTINCT address_id FROM trip
);

This query above never ends (I waited 15 minutes and it was still going, HDD
doing nothing, one core CPU used).
The EXPLAIN says it materializes the address_ids aggregates then scans
ADDRESS for deletion.

After many tries, I made this query instead:
CREATE TEMPORARY TABLE used_address_id AS (
SELECT DISTINCT address_id FROM company
UNION
SELECT DISTINCT address_id FROM contact
UNION
SELECT DISTINCT address_id FROM trip
);
CREATE UNIQUE INDEX ON used_address_id (address_id);
DELETE FROM used_address_id WHERE address_id IS NULL;
DELETE FROM address WHERE address_id NOT IN (SELECT address_id FROM
used_address_id);

This was resolved in 500ms (basically instantly).
I have the feeling there's an O(n²) somewhere in the first query.

Thanks.

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17590: [OPTIMIZER] DELETE never ends on a small table, found a workaround which makes it instant

On 8/19/22 17:13, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 17590
Logged by: Guillaume FOUET
Email address: g.fouet@gmail.com
PostgreSQL version: 14.5
Operating system: Windows 10 - 64x
Description:

Tested with Postgres 13.5, 13.8 and 14.5 (I updated to see if that was a
bug).

I have four tables: ADDRESS, COMPANY, CONTACT, TRIP
ADDRESS has ~130000 rows
COMPANY has a FK toward ADDRESS and ~16000 rows
CONTACT has a FK toward ADDRESS and ~12000 rows
TRIP has a FK toward ADDRESS and ~137500 rows

We wanted to purge the address table from old, unused addresses:
DELETE FROM address WHERE address_id NOT IN (
SELECT DISTINCT address_id FROM company
UNION
SELECT DISTINCT address_id FROM contact
UNION
SELECT DISTINCT address_id FROM trip
);

This query above never ends (I waited 15 minutes and it was still going, HDD
doing nothing, one core CPU used).
The EXPLAIN says it materializes the address_ids aggregates then scans
ADDRESS for deletion.

It's probably better to include the query plan. Anyway, the union
essentially creates a new relation, making indexes (on the base
relations unusable).

After many tries, I made this query instead:
CREATE TEMPORARY TABLE used_address_id AS (
SELECT DISTINCT address_id FROM company
UNION
SELECT DISTINCT address_id FROM contact
UNION
SELECT DISTINCT address_id FROM trip
);
CREATE UNIQUE INDEX ON used_address_id (address_id);
DELETE FROM used_address_id WHERE address_id IS NULL;
DELETE FROM address WHERE address_id NOT IN (SELECT address_id FROM
used_address_id);

This was resolved in 500ms (basically instantly).
I have the feeling there's an O(n²) somewhere in the first query.

Yeah. The temporary table means we can use the indexes again.

I'm not sure I'd call this a bug, it's simply how we deal with this sort
of queries. Maybe try splitting the one "NOT IN" condition into a
separate condition per table. I mean, something like

DELETE FROM address
WHERE address_id NOT IN (SELECT address_id FROM company)
AND address_id NOT IN (SELECT address_id FROM contact)
AND address_id NOT IN (SELECT address_id FROM trip)

or something like that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company