BUG #15489: Segfault on DELETE

Started by PG Bug reporting form (8 messages, list: bugs)
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15489
Logged by: Kanwei Li
Email address: kanwei@gmail.com
PostgreSQL version: 11.0
Operating system: Debian 9
Description:

We started seeing a segfault crash on our PostgreSQL 11 server instance
today when attempting to delete certain rows in the database:

2018-11-06 21:02:07.553 UTC [60606] LOG: server process (PID 66881) was
terminated by signal 11: Segmentation fault
2018-11-06 21:02:07.553 UTC [60606] DETAIL: Failed process was running:
delete from integration_account
where partner_id = 24

Attempting to delete certain rows was causing this segfault, while
attempting to delete other rows was not. There didn't seem to be a pattern,
and because this was in production we couldn't risk playing around too much.

Doing a SELECT on the rows that couldn't be deleted worked fine. There
didn't seem to be data corruption since all the data could be read. However,
attempting to DELETE certain rows would crash it. pg_dump also worked
fine.

What fixed it was performing a VACUUM ANALYZE on the database. After that,
the deletes worked again.

I'm sorry I can no longer list steps to reproduce this, since the VACUUM
fixed it, but I figured I should report it in case others have seen it, or
if anyone can maybe guess what the problem is.

#2 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: PG Bug reporting form (#1)
Re: BUG #15489: Segfault on DELETE

On 2018/11/07 14:01, PG Bug reporting form wrote:

> We started seeing a segfault crash on our PostgreSQL 11 server instance
> today when attempting to delete certain rows in the database:
>
> 2018-11-06 21:02:07.553 UTC [60606] LOG: server process (PID 66881) was
> terminated by signal 11: Segmentation fault
> 2018-11-06 21:02:07.553 UTC [60606] DETAIL: Failed process was running:
> delete from integration_account
> where partner_id = 24
>
> Attempting to delete certain rows was causing this segfault, while
> attempting to delete other rows was not. There didn't seem to be a pattern,
> and because this was in production we couldn't risk playing around too much.
>
> Doing a SELECT on the rows that couldn't be deleted worked fine. There
> didn't seem to be data corruption since all the data could be read. However,
> attempting to DELETE certain rows would crash it. pg_dump also worked
> fine.

Are there any triggers defined on integration_account? Also, has there
recently been any ALTER TABLE ADD/DROP COLUMN activity on that table?

PG 11.1, to be released later this week, fixes a bug that could cause a
segmentation fault when running triggers (including, but not limited to,
DELETE triggers).
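One way to answer the trigger question on a live system is to ask the catalog directly. The sketch below lists every trigger on integration_account, including the internal ones (tgisinternal = true) that PostgreSQL creates on a table referenced by a foreign key, since those fire on DELETE even when no user-defined trigger exists. It is a sketch under one assumption: the table is matched by relname only, so same-named tables in other schemas would also appear.

```sql
-- List every trigger on integration_account, including internal
-- foreign-key (RI) triggers, with the function each one calls.
-- Matching on relname (rather than 'integration_account'::regclass) keeps
-- the query from erroring if the table does not exist; it then returns no rows.
SELECT c.relname      AS table_name,
       t.tgname       AS trigger_name,
       t.tgisinternal AS is_internal,
       p.proname      AS trigger_function
FROM pg_trigger AS t
JOIN pg_class   AS c ON c.oid = t.tgrelid
JOIN pg_proc    AS p ON p.oid = t.tgfoid
WHERE c.relname = 'integration_account'
ORDER BY t.tgname;
```

A foreign key declared with the default ON DELETE NO ACTION shows up here as an internal trigger calling RI_FKey_noaction_del.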

> What fixed it was performing a VACUUM ANALYZE on the database. After that,
> the deletes worked again.

Hmm, that's a bit mysterious to me if your case really is hitting the bug
I suspect.

Thanks,
Amit

#3 Michael Paquier
michael@paquier.xyz
In reply to: Amit Langote (#2)
Re: BUG #15489: Segfault on DELETE

On Wed, Nov 07, 2018 at 02:34:12PM +0900, Amit Langote wrote:

> Are there any triggers defined on integration_account? Also, has there
> recently been any ALTER TABLE ADD/DROP COLUMN activity on that table?
>
> PG 11.1, to be released later this week, fixes a bug that could cause a
> segmentation fault when running triggers (including, but not limited to,
> DELETE triggers).

The point is that without more information about the schema in use, enough
to build a reproducible test case from scratch (or, even better, a
self-contained test case), there is not much we can do except guess at
what has been happening here.
--
Michael

#4 Frederico Galvão
frederico.costa.galvao@gmail.com
In reply to: Michael Paquier (#3)
Re: Re: BUG #15489: Segfault on DELETE

I stumbled upon this issue yesterday, and trying to reduce and pinpoint
it, I managed to get to this:

//start
CREATE TABLE a (
    id bigint
);

INSERT INTO a (id) VALUES (1); -- this id's value doesn't matter

ALTER TABLE ONLY a
    ADD CONSTRAINT a_pkey PRIMARY KEY (id);

CREATE TABLE b (
    a_id bigint
);

ALTER TABLE ONLY b
    ADD CONSTRAINT b_a_id_fkey FOREIGN KEY (a_id) REFERENCES a(id);

-- PG 11 adds this column without rewriting the existing row in a
ALTER TABLE a ADD x BOOLEAN NOT NULL DEFAULT FALSE; -- or TRUE, doesn't matter

-- VACUUM FULL ANALYZE a; -- uncomment this to fix the bug

DELETE FROM a;
//end

This was the bare minimum I could get to reproduce the segfault in a
portable way. It seems to lie somewhere in the interaction between foreign
keys and the tables they point to, when those tables have gone through the
new no-table-rewrite handling of NOT NULL columns with non-volatile default
values.

Also, plain VACUUM ANALYZE didn't fix the corrupted data: it needs to be
VACUUM FULL.
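To check whether other tables might be in the same situation, here is a heuristic sketch. In PostgreSQL 11, a column added with a non-volatile default is recorded in pg_attribute (atthasmissing = true, value in attmissingval) instead of being written into the existing rows. The query lists such columns on tables that some foreign key references, which is the combination the reproduction above needs. The assumption that this pairing is the relevant risk factor is inferred from this thread, not from the fix itself: a row in the output marks a candidate table, not proof that its tuples still lack the column on disk. These catalog columns do not exist before version 11.

```sql
-- Columns added through the PG 11 "fast default" path (atthasmissing)
-- on tables that are the target of at least one foreign key.
SELECT a.attrelid::regclass   AS referenced_table,
       a.attname              AS fast_default_column,
       con.conname            AS fk_constraint,
       con.conrelid::regclass AS referencing_table
FROM pg_attribute  AS a
JOIN pg_constraint AS con
  ON con.confrelid = a.attrelid   -- table referenced by the FK
 AND con.contype = 'f'            -- foreign-key constraints only
WHERE a.atthasmissing
  AND NOT a.attisdropped
ORDER BY a.attrelid::regclass::text, a.attname;
```

Run against the reproduction above, table a and column x would be expected to appear, paired with b_a_id_fkey.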

I'm on <Xubuntu 16.04 x86_64>, with <psql (PostgreSQL) 11.0 (Ubuntu
11.0-1.pgdg16.04+2)>.
I have some simple custom settings on postgresql.conf that I don't think
are related to the issue, but I'm willing to provide if needed.

---

Frederico Costa Galvão


#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frederico Galvão (#4)
Re: BUG #15489: Segfault on DELETE

Frederico Costa Galvão <frederico.costa.galvao@gmail.com> writes:

> I stumbled upon this issue yesterday, and trying to reduce and pinpoint
> it, I managed to get to this:

Yeah, this looks like the expand_tuple bug: you've got a foreign-key
trigger and a tuple that doesn't match the table rowtype anymore.
This example doesn't crash for me in HEAD or 11.1.

regards, tom lane

#6 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Frederico Galvão (#4)
Re: BUG #15489: Segfault on DELETE

Thanks Frederico for your reply.

On 2018/11/08 10:10, Frederico Costa Galvão wrote:

> I stumbled upon this issue yesterday, and trying to reduce and pinpoint
> it, I managed to get to this:
>
> //start
> CREATE TABLE a (
>     id bigint
> );
>
> INSERT INTO a (id) VALUES (1); -- this id's value doesn't matter
>
> ALTER TABLE ONLY a
>     ADD CONSTRAINT a_pkey PRIMARY KEY (id);
>
> CREATE TABLE b (
>     a_id bigint
> );
>
> ALTER TABLE ONLY b
>     ADD CONSTRAINT b_a_id_fkey FOREIGN KEY (a_id) REFERENCES a(id);
>
> ALTER TABLE a ADD x BOOLEAN NOT NULL DEFAULT FALSE; -- or TRUE, doesn't matter

There it is. These are similar to the steps I used to track down a bug
that's now fixed in 11.1.

/messages/by-id/9cb4aa1c-12ba-59c3-fd75-545fa90fb92f@lab.ntt.co.jp

The bug had to do with the foreign key trigger not getting a proper
representation of the tuple being deleted, one that accounts for the newly
added column.

> -- VACUUM FULL ANALYZE a; -- uncomment this to fix the bug

Ah, VACUUM FULL will rewrite the tuples such that they're not hit by the
aforementioned bug.

So, if the OP can confirm that this is what happened in their case too,
then 11.1 will have fixed the issue.

Thanks,
Amit

#7 Frederico Galvão
frederico.costa.galvao@gmail.com
In reply to: Amit Langote (#6)
Re: BUG #15489: Segfault on DELETE

I'm happy I could help, and I'm even happier to see you guys were 10 steps
ahead of me and already fixed it for 11.1, which I'm definitely looking
forward to.


--
Frederico Costa Galvão
Computer Engineer - Universidade Federal de Goiás
PontoGet Inovação Web
Tippz Mobile

#8 Kanwei Li
kanwei@gmail.com
In reply to: Amit Langote (#6)
Re: BUG #15489: Segfault on DELETE

I did do a VACUUM FULL on that one particular table as well, so that may have been the command that fixed it, yes.

I have just upgraded to PG 11.1 and will report if I see this again. Thanks all!

Kanwei
