BUG #14228: replication slot catalog_xmin not cleared on slot reuse

Started by Andrew Gierthalmost 10 years ago13 messagesbugs
Jump to latest
#1Andrew Gierth
andrew@tao11.riddles.org.uk

The following bug has been logged on the website:

Bug reference: 14228
Logged by: Andrew Gierth
Email address: andrew@tao11.riddles.org.uk
PostgreSQL version: 9.6beta2
Operating system: any
Description:

[My analysis of a bug reported on IRC; please CC original reporter: md at
chewy dot com]

Reported on pg 9.4.6 but also reproduced on 9.6beta2

When creating a physical replication slot, the catalog_xmin field of the new
slot is not initialized. If the slot storage had previously been used for a
logical slot, the old catalog_xmin will remain in place and interfere with
vacuum.

Trivially reproducible:

select pg_create_logical_replication_slot('test','test_decoding');
select pg_drop_replication_slot('test');
select pg_create_physical_replication_slot('test');
select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | active | active_pid |
xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
test | | physical | | | f | |
| 546 | | 0/1525438
(1 row)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Michael Paquier
michael@paquier.xyz
In reply to: Andrew Gierth (#1)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

On Wed, Jul 6, 2016 at 2:35 AM, <andrew@tao11.riddles.org.uk> wrote:

When creating a physical replication slot, the catalog_xmin field of the new
slot is not initialized. If the slot storage had previously been used for a
logical slot, the old catalog_xmin will remain in place and interfere with
vacuum.

Good catch! The same applies to confirmed_flush_lsn, which is used
only by logical decoding and should remain as NULL for physical slots.
So I propose the patch attached to address both problems.
--
Michael

Attachments:

fix-repslot-init.patchtext/x-diff; charset=US-ASCII; name=fix-repslot-init.patchDownload+2-0
#3Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Michael Paquier (#2)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

"Michael" == Michael Paquier <michael.paquier@gmail.com> writes:

When creating a physical replication slot, the catalog_xmin field of
the new slot is not initialized. If the slot storage had previously
been used for a logical slot, the old catalog_xmin will remain in
place and interfere with vacuum.

Michael> Good catch! The same applies to confirmed_flush_lsn, which is
Michael> used only by logical decoding and should remain as NULL for
Michael> physical slots. So I propose the patch attached to address
Michael> both problems.

What about slot->effective_catalog_xmin ?

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Michael Paquier
michael@paquier.xyz
In reply to: Andrew Gierth (#3)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

_

On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Michael" == Michael Paquier <michael.paquier@gmail.com> writes:

When creating a physical replication slot, the catalog_xmin field of
the new slot is not initialized. If the slot storage had previously
been used for a logical slot, the old catalog_xmin will remain in
place and interfere with vacuum.

Michael> Good catch! The same applies to confirmed_flush_lsn, which is
Michael> used only by logical decoding and should remain as NULL for
Michael> physical slots. So I propose the patch attached to address
Michael> both problems.

What about slot->effective_catalog_xmin ?

Yes. I guess so, as well as the other candidate_* fields in the slot
to begin from a clean state.
--
Michael

Attachments:

fix-repslot-init-v2.patchapplication/x-patch; name=fix-repslot-init-v2.patchDownload+7-0
#5Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#4)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

* Michael Paquier (michael.paquier@gmail.com) wrote:

On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Michael" == Michael Paquier <michael.paquier@gmail.com> writes:

When creating a physical replication slot, the catalog_xmin field of
the new slot is not initialized. If the slot storage had previously
been used for a logical slot, the old catalog_xmin will remain in
place and interfere with vacuum.

Michael> Good catch! The same applies to confirmed_flush_lsn, which is
Michael> used only by logical decoding and should remain as NULL for
Michael> physical slots. So I propose the patch attached to address
Michael> both problems.

What about slot->effective_catalog_xmin ?

Yes. I guess so, as well as the other candidate_* fields in the slot
to begin from a clean state.

Seems like we should try to get this in before the next round of point
releases...?

Thanks!

Stephen

#6Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#5)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

On Tue, Jul 26, 2016 at 12:25 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Michael Paquier (michael.paquier@gmail.com) wrote:

On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Michael" == Michael Paquier <michael.paquier@gmail.com> writes:

When creating a physical replication slot, the catalog_xmin field of
the new slot is not initialized. If the slot storage had previously
been used for a logical slot, the old catalog_xmin will remain in
place and interfere with vacuum.

Michael> Good catch! The same applies to confirmed_flush_lsn, which is
Michael> used only by logical decoding and should remain as NULL for
Michael> physical slots. So I propose the patch attached to address
Michael> both problems.

What about slot->effective_catalog_xmin ?

Yes. I guess so, as well as the other candidate_* fields in the slot
to begin from a clean state.

Seems like we should try to get this in before the next round of point
releases...?

That would be nice, I would guess that Andres or Robert (added in CC)
are the best fits to commit this patch, even if this is just a
variable initialization issue.
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#4)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

On 2016-07-06 13:07:36 +0900, Michael Paquier wrote:

_

On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Michael" == Michael Paquier <michael.paquier@gmail.com> writes:

When creating a physical replication slot, the catalog_xmin field of
the new slot is not initialized. If the slot storage had previously
been used for a logical slot, the old catalog_xmin will remain in
place and interfere with vacuum.

Michael> Good catch! The same applies to confirmed_flush_lsn, which is
Michael> used only by logical decoding and should remain as NULL for
Michael> physical slots. So I propose the patch attached to address
Michael> both problems.

What about slot->effective_catalog_xmin ?

Yes. I guess so, as well as the other candidate_* fields in the slot
to begin from a clean state.

I think it'd be better if we explicitly zeroed .data - that way the
likelihood of future bugs of the same ilk is smaller.

Greetings,

Andres Freund

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#7)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

On Thu, Jul 28, 2016 at 9:24 AM, Andres Freund <andres@anarazel.de> wrote:

I think it'd be better if we explicitly zeroed .data - that way the
likelihood of future bugs of the same ilk is smaller.

Okay, I have spent some time looking at all the fields here, and their
significance before reaching this code path in ReplicationSlotCreate,
but did not find any hole if slot->data is zeroed. So here is an
updated patch. You could get rid of all the field initializations I
have done for slot->data, but I think that's cheap to keep them.
--
Michael

Attachments:

fix-repslot-init-v3.patchapplication/x-patch; name=fix-repslot-init-v3.patchDownload+8-0
#9Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Stephen Frost (#5)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

"Stephen" == Stephen Frost <sfrost@snowman.net> writes:

Stephen> Seems like we should try to get this in before the next round
Stephen> of point releases...?

I notice that this didn't happen....

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10Michael Paquier
michael@paquier.xyz
In reply to: Andrew Gierth (#9)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

On Tue, Aug 9, 2016 at 10:30 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Stephen" == Stephen Frost <sfrost@snowman.net> writes:

Stephen> Seems like we should try to get this in before the next round
Stephen> of point releases...?

I notice that this didn't happen....

At this point I have added it to the next CF as a bug fix:
https://commitfest.postgresql.org/10/705/
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#11Andres Freund
andres@anarazel.de
In reply to: Andrew Gierth (#9)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

On 2016-08-09 14:30:58 +0100, Andrew Gierth wrote:

"Stephen" == Stephen Frost <sfrost@snowman.net> writes:

Stephen> Seems like we should try to get this in before the next round
Stephen> of point releases...?

I notice that this didn't happen....

I'll piuck it up when I'm back from holidays (Tuesday/Wednesday). Sorry
for the delay.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#12Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#11)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

On Mon, Aug 15, 2016 at 10:03 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-08-09 14:30:58 +0100, Andrew Gierth wrote:

"Stephen" == Stephen Frost <sfrost@snowman.net> writes:

Stephen> Seems like we should try to get this in before the next round
Stephen> of point releases...?

I notice that this didn't happen....

I'll piuck it up when I'm back from holidays (Tuesday/Wednesday). Sorry
for the delay.

Thanks!
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#13Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#8)
Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

On 2016-07-28 14:22:29 +0900, Michael Paquier wrote:

On Thu, Jul 28, 2016 at 9:24 AM, Andres Freund <andres@anarazel.de> wrote:

I think it'd be better if we explicitly zeroed .data - that way the
likelihood of future bugs of the same ilk is smaller.

Okay, I have spent some time looking at all the fields here, and their
significance before reaching this code path in ReplicationSlotCreate,
but did not find any hole if slot->data is zeroed. So here is an
updated patch. You could get rid of all the field initializations I
have done for slot->data, but I think that's cheap to keep them.

Pushed without the additional initializations - they're imo more
confusing than helpful - and with some more reordering to match the
struct order.

Regards,

Andres

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs