Two-phase update of restart_lsn in LogicalConfirmReceivedLocation

Started by Arseny Sher · about 8 years ago · 5 messages · pgsql-hackers
#1Arseny Sher
a.sher@postgrespro.ru

Hello,

In LogicalConfirmReceivedLocation, two fields (data.catalog_xmin and
effective_catalog_xmin) of the ReplicationSlot structure are used for
advancing the xmin of the slot. This avoids a window in which tuples might
already have been vacuumed but the slot's state was not yet flushed to
disk: if we crash during this window, after restart we would assume that
all tuples with xmax > old catalog_xmin are still there, while actually
some of them might already have been vacuumed. However, the same trick is
not used for restart_lsn advancement. This means that under somewhat
unlikely circumstances we may start decoding from a segment which has
already been recycled, leaving the slot broken. Shouldn't this be fixed?

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#2Craig Ringer
craig@2ndquadrant.com
In reply to: Arseny Sher (#1)
Re: Two-phase update of restart_lsn in LogicalConfirmReceivedLocation

On 1 March 2018 at 13:39, Arseny Sher <a.sher@postgrespro.ru> wrote:

> Hello,
>
> In LogicalConfirmReceivedLocation, two fields (data.catalog_xmin and
> effective_catalog_xmin) of the ReplicationSlot structure are used for
> advancing the xmin of the slot. This avoids a window in which tuples might
> already have been vacuumed but the slot's state was not yet flushed to
> disk: if we crash during this window, after restart we would assume that
> all tuples with xmax > old catalog_xmin are still there, while actually
> some of them might already have been vacuumed. However, the same trick is
> not used for restart_lsn advancement. This means that under somewhat
> unlikely circumstances we may start decoding from a segment which has
> already been recycled, leaving the slot broken. Shouldn't this be fixed?

You mean the small window between

    MyReplicationSlot->data.restart_lsn =
        MyReplicationSlot->candidate_restart_lsn;

and

    ReplicationSlotMarkDirty();
    ReplicationSlotSave();

in LogicalConfirmReceivedLocation?

We do release the slot spinlock after updating the slot and before dirtying
and flushing it. But to make the change visible, someone else would have to
call ReplicationSlotsComputeRequiredLSN(). That's possible by the looks,
and could be caused by a concurrent slot drop, physical slot confirmation,
or another logical slot handling a concurrent confirmation.

For something to break, we'd have to:

* have XLogCtl->replicationSlotMinLSN advanced by a concurrent call to
ReplicationSlotsComputeRequiredLSN while we're in the race window in
LogicalConfirmReceivedLocation
* have the furthest-behind slot be the one in that race window
* have a checkpoint run here, before the slot is marked dirty
* actually recycle/remove the needed xlog
* crash before writing the new slot state

Checkpoints write out slot state, but only for dirty slots. And we didn't
dirty the slot while we held its spinlock; we only dirty it just before
writing.

So I can't say it's definitely impossible. It seems astonishingly unlikely,
but that's not always good enough.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3Robert Haas
robertmhaas@gmail.com
In reply to: Craig Ringer (#2)
Re: Two-phase update of restart_lsn in LogicalConfirmReceivedLocation

On Thu, Mar 1, 2018 at 2:03 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

> So I can't say it's definitely impossible. It seems astonishingly unlikely,
> but that's not always good enough.

Race conditions tend to happen a lot more often than one might think.
If there's a theoretical opportunity for this to go wrong, it would
probably be a good idea to fix it. Not that I'm volunteering.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: Two-phase update of restart_lsn in LogicalConfirmReceivedLocation

Robert Haas <robertmhaas@gmail.com> writes:

> On Thu, Mar 1, 2018 at 2:03 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> So I can't say it's definitely impossible. It seems astonishingly unlikely,
>> but that's not always good enough.
>
> Race conditions tend to happen a lot more often than one might think.

Just to back that up --- we've seen cases where people could repeatably
hit race-condition windows that are just an instruction or two wide.
The first one I came to in an idle archive search is
/messages/by-id/15543.1130714273@sss.pgh.pa.us
I vaguely recall others but don't feel like digging harder right now.

regards, tom lane

#5Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: Two-phase update of restart_lsn in LogicalConfirmReceivedLocation

On 8 March 2018 at 07:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Mar 1, 2018 at 2:03 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> So I can't say it's definitely impossible. It seems astonishingly
>>> unlikely, but that's not always good enough.
>>
>> Race conditions tend to happen a lot more often than one might think.
>
> Just to back that up --- we've seen cases where people could repeatably
> hit race-condition windows that are just an instruction or two wide.
> The first one I came to in an idle archive search is
> /messages/by-id/15543.1130714273@sss.pgh.pa.us
> I vaguely recall others but don't feel like digging harder right now.

That's astonishing.

I guess if you repeat something enough times...

The reason I'm less concerned about this one is that you have to crash in
exactly the wrong place, *while* a badly timed race is in progress. But the
downside, if it does happen, is an unusable logical slot.

The simplest solution is probably just to mark the slot dirty while we hold
the spinlock, at the same time as we advance its restart_lsn. Any checkpoint
will then flush it via CheckPointReplicationSlots(). We don't remove/recycle
xlog segments until after that's done in CheckPointGuts(), so it's guaranteed
that the slot's new state will be on disk and we can never have a stale
restart_lsn pointing into truncated-away WAL.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services