Tips on troubleshooting slow DELETE (suspect cascades)

Started by Jim Vannsabout 2 years ago9 messagesgeneral
Jump to latest
#1Jim Vanns
jvanns@ilm.com

Hi,

I have a slow (CPU bound) DELETE statement I'm attempting to debug and I
suspect that its actually the ON DELETE CASCADE on the foreign key thats
causing it. I suspect this because the dry-run mode of the same query (a
SELECT instead of DELETE) doesn't suffer the same fate. The statement is
effectively;

# Dry mode
SELECT prune_function(timestamp);
# Destructive mode
DELETE FROM foobar p USING prune_function(timestamp) AS e WHERE p.id =
e.prune_id

The logs seem to hold no information on the progress of the statement but
the CPU is pegged at 100% for hours. The SELECT equivalent runs in under a
minute.

What I need is a way to see into this statement as it executes to confirm
my suspicion - does anyone have any tips on that?

Cheers

Jim

--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Jim Vanns (#1)
Re: Tips on troubleshooting slow DELETE (suspect cascades)

On 1/16/24 09:45, Jim Vanns wrote:

Hi,

I have a slow (CPU bound) DELETE statement I'm attempting to debug and I
suspect that its actually the ON DELETE CASCADE on the foreign key thats
causing it. I suspect this because the dry-run mode of the same query (a
SELECT instead of DELETE) doesn't suffer the same fate. The statement is
effectively;

# Dry mode
SELECT prune_function(timestamp);
# Destructive mode
DELETE FROM foobar p USING prune_function(timestamp) AS e WHERE p.id
<http://p.id&gt; = e.prune_id

The logs seem to hold no information on the progress of the statement
but the CPU is pegged at 100% for hours. The SELECT equivalent runs in
under a minute.

What I need is a way to see into this statement as it executes to
confirm my suspicion - does anyone have any tips on that?

Explain:

https://www.postgresql.org/docs/current/sql-explain.html

It would also be helpful to reply with the table definitions for the
tables. If that is not possible then at least whether there is an index
on the FK reference in the child table(s)?

Cheers

Jim

--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London

--
Adrian Klaver
adrian.klaver@aklaver.com

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adrian Klaver (#2)
Re: Tips on troubleshooting slow DELETE (suspect cascades)

Adrian Klaver <adrian.klaver@aklaver.com> writes:

On 1/16/24 09:45, Jim Vanns wrote:

I have a slow (CPU bound) DELETE statement I'm attempting to debug and I
suspect that its actually the ON DELETE CASCADE on the foreign key thats
causing it.

99% of the time, the cause is lack of an index on the foreign key's
referencing columns. We make you have a unique index on the
referenced columns, because otherwise the FK constraint's semantics
are unclear. But you're not required to make one on the other side.

What I need is a way to see into this statement as it executes to
confirm my suspicion - does anyone have any tips on that?

Explain:
https://www.postgresql.org/docs/current/sql-explain.html

Specifically, if EXPLAIN ANALYZE shows a lot of time spent in the
enforcement trigger for the FK, this is likely what's happening.

regards, tom lane

#4Jim Vanns
jvanns@ilm.com
In reply to: Tom Lane (#3)
Re: Tips on troubleshooting slow DELETE (suspect cascades)

Hi Tom/Adrian.

I should have already stated I did begin with EXPLAIN but given they
don't easily work with (the internals) stored/procedures, it wasn't
useful in this case. Also, I keep having to terminate the statement
because it never runs to completion and produces the plan (at least in
ANALYZE VERBOSE mode anyway).

I have, however, pulled the function body code out and produced an
isolated case that can be EXPLAINED. The table in question is a
curious one since it models a hierarchy as an adjacency list and so
the fkey reference is back to itself (to a primary key - so is an
additional index required?):

CREATE TABLE tree (
ts TIMESTAMPTZ NOT NULL
tree_id BIGINT NOT NULL,
parent_id BIGINT NULL,
--
CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
);
CREATE INDEX ON tree USING BRIN (ts);

The tree table has 95,915,630 rows.

I've not yet got a complete or reliable plan :( I have made a DB copy
and will be dropping the constraint to see what effect that has.

Cheers,

Jim

On Tue, 16 Jan 2024 at 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Adrian Klaver <adrian.klaver@aklaver.com> writes:

On 1/16/24 09:45, Jim Vanns wrote:

I have a slow (CPU bound) DELETE statement I'm attempting to debug and I
suspect that its actually the ON DELETE CASCADE on the foreign key thats
causing it.

99% of the time, the cause is lack of an index on the foreign key's
referencing columns. We make you have a unique index on the
referenced columns, because otherwise the FK constraint's semantics
are unclear. But you're not required to make one on the other side.

What I need is a way to see into this statement as it executes to
confirm my suspicion - does anyone have any tips on that?

Explain:
https://www.postgresql.org/docs/current/sql-explain.html

Specifically, if EXPLAIN ANALYZE shows a lot of time spent in the
enforcement trigger for the FK, this is likely what's happening.

regards, tom lane

--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Jim Vanns (#4)
Re: Tips on troubleshooting slow DELETE (suspect cascades)

On 1/18/24 08:37, Jim Vanns wrote:

Hi Tom/Adrian.

I should have already stated I did begin with EXPLAIN but given they
don't easily work with (the internals) stored/procedures, it wasn't
useful in this case. Also, I keep having to terminate the statement
because it never runs to completion and produces the plan (at least in
ANALYZE VERBOSE mode anyway).

I have, however, pulled the function body code out and produced an
isolated case that can be EXPLAINED. The table in question is a

I'm guessing the function is prune_function(timestamp)?

What is the function body code?

curious one since it models a hierarchy as an adjacency list and so
the fkey reference is back to itself (to a primary key - so is an
additional index required?):

CREATE TABLE tree (
ts TIMESTAMPTZ NOT NULL
tree_id BIGINT NOT NULL,
parent_id BIGINT NULL,
--
CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
);
CREATE INDEX ON tree USING BRIN (ts);

The tree table has 95,915,630 rows.

I've not yet got a complete or reliable plan :( I have made a DB copy
and will be dropping the constraint to see what effect that has.

Cheers,

--
Adrian Klaver
adrian.klaver@aklaver.com

#6Rob Sargent
robjsargent@gmail.com
In reply to: Adrian Klaver (#5)
Re: Tips on troubleshooting slow DELETE (suspect cascades)

On Jan 18, 2024, at 9:46 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 1/18/24 08:37, Jim Vanns wrote:

Hi Tom/Adrian.
I should have already stated I did begin with EXPLAIN but given they
don't easily work with (the internals) stored/procedures, it wasn't
useful in this case. Also, I keep having to terminate the statement
because it never runs to completion and produces the plan (at least in
ANALYZE VERBOSE mode anyway).
I have, however, pulled the function body code out and produced an
isolated case that can be EXPLAINED. The table in question is a

I'm guessing the function is prune_function(timestamp)?

What is the function body code?

curious one since it models a hierarchy as an adjacency list and so
the fkey reference is back to itself (to a primary key - so is an
additional index required?):
CREATE TABLE tree (
ts TIMESTAMPTZ NOT NULL
tree_id BIGINT NOT NULL,
parent_id BIGINT NULL,
--
CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
);
CREATE INDEX ON tree USING BRIN (ts);
The tree table has 95,915,630 rows.
I've not yet got a complete or reliable plan :( I have made a DB copy
and will be dropping the constraint to see what effect that has.
Cheers,

Have you tried making a table from prune_function() and using that in the delete statement?

Show quoted text

--
Adrian Klaver
adrian.klaver@aklaver.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Vanns (#4)
Re: Tips on troubleshooting slow DELETE (suspect cascades)

Jim Vanns <jvanns@ilm.com> writes:

I should have already stated I did begin with EXPLAIN but given they
don't easily work with (the internals) stored/procedures, it wasn't
useful in this case.

auto_explain with nested statements enabled might help?

regards, tom lane

#8Jim Vanns
jvanns@ilm.com
In reply to: Jim Vanns (#4)
Re: Tips on troubleshooting slow DELETE (suspect cascades)

After dropping the constraint entirely the DELETE completes in 4
minutes (the same time as the dry-run using SELECT against the
function instead of a DELETE). A marked improvement on 3 hours
followed by a pg_cancel_backend()!

Jim

On Thu, 18 Jan 2024 at 16:37, Jim Vanns <jvanns@ilm.com> wrote:

Hi Tom/Adrian.

I should have already stated I did begin with EXPLAIN but given they
don't easily work with (the internals) stored/procedures, it wasn't
useful in this case. Also, I keep having to terminate the statement
because it never runs to completion and produces the plan (at least in
ANALYZE VERBOSE mode anyway).

I have, however, pulled the function body code out and produced an
isolated case that can be EXPLAINED. The table in question is a
curious one since it models a hierarchy as an adjacency list and so
the fkey reference is back to itself (to a primary key - so is an
additional index required?):

CREATE TABLE tree (
ts TIMESTAMPTZ NOT NULL
tree_id BIGINT NOT NULL,
parent_id BIGINT NULL,
--
CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
);
CREATE INDEX ON tree USING BRIN (ts);

The tree table has 95,915,630 rows.

I've not yet got a complete or reliable plan :( I have made a DB copy
and will be dropping the constraint to see what effect that has.

Cheers,

Jim

On Tue, 16 Jan 2024 at 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Adrian Klaver <adrian.klaver@aklaver.com> writes:

On 1/16/24 09:45, Jim Vanns wrote:

I have a slow (CPU bound) DELETE statement I'm attempting to debug and I
suspect that its actually the ON DELETE CASCADE on the foreign key thats
causing it.

99% of the time, the cause is lack of an index on the foreign key's
referencing columns. We make you have a unique index on the
referenced columns, because otherwise the FK constraint's semantics
are unclear. But you're not required to make one on the other side.

What I need is a way to see into this statement as it executes to
confirm my suspicion - does anyone have any tips on that?

Explain:
https://www.postgresql.org/docs/current/sql-explain.html

Specifically, if EXPLAIN ANALYZE shows a lot of time spent in the
enforcement trigger for the FK, this is likely what's happening.

regards, tom lane

--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London

--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London

#9Martin Ritchie
martinritchie@geotab.com
In reply to: Jim Vanns (#8)
Re: Tips on troubleshooting slow DELETE (suspect cascades)

Add an index to parent_id. What is likely happening is each time a row is
deleted, it has to scan the entire table to make sure it is not referenced
by any parent_id records.

On Thu, Jan 18, 2024 at 12:04 PM Jim Vanns <jvanns@ilm.com> wrote:

Show quoted text

After dropping the constraint entirely the DELETE completes in 4
minutes (the same time as the dry-run using SELECT against the
function instead of a DELETE). A marked improvement on 3 hours
followed by a pg_cancel_backend()!

Jim

On Thu, 18 Jan 2024 at 16:37, Jim Vanns <jvanns@ilm.com> wrote:

Hi Tom/Adrian.

I should have already stated I did begin with EXPLAIN but given they
don't easily work with (the internals) stored/procedures, it wasn't
useful in this case. Also, I keep having to terminate the statement
because it never runs to completion and produces the plan (at least in
ANALYZE VERBOSE mode anyway).

I have, however, pulled the function body code out and produced an
isolated case that can be EXPLAINED. The table in question is a
curious one since it models a hierarchy as an adjacency list and so
the fkey reference is back to itself (to a primary key - so is an
additional index required?):

CREATE TABLE tree (
ts TIMESTAMPTZ NOT NULL
tree_id BIGINT NOT NULL,
parent_id BIGINT NULL,
--
CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
);
CREATE INDEX ON tree USING BRIN (ts);

The tree table has 95,915,630 rows.

I've not yet got a complete or reliable plan :( I have made a DB copy
and will be dropping the constraint to see what effect that has.

Cheers,

Jim

On Tue, 16 Jan 2024 at 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Adrian Klaver <adrian.klaver@aklaver.com> writes:

On 1/16/24 09:45, Jim Vanns wrote:

I have a slow (CPU bound) DELETE statement I'm attempting to debug

and I

suspect that its actually the ON DELETE CASCADE on the foreign key

thats

causing it.

99% of the time, the cause is lack of an index on the foreign key's
referencing columns. We make you have a unique index on the
referenced columns, because otherwise the FK constraint's semantics
are unclear. But you're not required to make one on the other side.

What I need is a way to see into this statement as it executes to
confirm my suspicion - does anyone have any tips on that?

Explain:
https://www.postgresql.org/docs/current/sql-explain.html

Specifically, if EXPLAIN ANALYZE shows a lot of time spent in the
enforcement trigger for the FK, this is likely what's happening.

regards, tom lane

--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London

--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London