locking [user] catalog tables vs 2pc vs logical rep

Started by Andres Freundover 5 years ago57 messageshackers

andres@anarazel.de

over 5 years ago

Hi,

The 2pc decoding added in

commit a271a1b50e9bec07e2ef3a05e38e7285113e4ce6
Author: Amit Kapila <akapila@postgresql.org>
Date: 2021-01-04 08:34:50 +0530

Allow decoding at prepare time in ReorderBuffer.

has a deadlock danger when used in a way that takes advantage of
separate decoding of the 2PC PREPARE.

I assume the goal of decoding the 2PC PREPARE is so one can wait for the
PREPARE to have logically replicated, before doing the COMMIT PREPARED.

However, currently it's pretty easy to get into a state where logical
decoding cannot progress until the 2PC transaction has
committed/aborted. Which essentially would lead to undetected deadlocks.

The problem is that catalog tables accessed during logical decoding need
to get locked (otherwise e.g. a table rewrite could happen
concurrently). But if the prepared transaction itself holds a lock on a
catalog table, logical decoding will block on that lock - which won't be
released until replication progresses. A deadlock.

A trivial example:

SELECT pg_create_logical_replication_slot('test', 'test_decoding');
CREATE TABLE foo(id serial primary key);
BEGIN;
LOCK pg_class;
INSERT INTO foo DEFAULT VALUES;
PREPARE TRANSACTION 'foo';

-- hangs waiting for pg_class to be unlocked
SELECT pg_logical_slot_get_changes('test', NULL, NULL, 'two-phase-commit', '1');

Now, more realistic versions of this scenario would probably lock a
'user catalog table' containing replication metadata instead of
pg_class, but ...

At first this seems to be a significant issue. But on the other hand, if
you were to shut the cluster down in this situation (or disconnect all
sessions), you have broken cluster on your hand - without logical
decoding being involved. As it turns out, we need to read pg_class to
log in... And I can't remember this being reported to be a problem?

Perhaps all that we need to do is to disallow 2PC prepare if [user]
catalog tables have been locked exclusively? Similar to how we're
disallowing preparing tables with temp table access.

Greetings,

Andres Freund

Amit Kapila

amit.kapila16@gmail.com

over 5 years ago

In reply to: Andres Freund (#1)

Re: locking [user] catalog tables vs 2pc vs logical rep

On Tue, Feb 23, 2021 at 3:58 AM Andres Freund <andres@anarazel.de> wrote:

At first this seems to be a significant issue. But on the other hand, if
you were to shut the cluster down in this situation (or disconnect all
sessions), you have broken cluster on your hand - without logical
decoding being involved. As it turns out, we need to read pg_class to
log in... And I can't remember this being reported to be a problem?

I don't remember seeing such a report but I think that is a reason
enough (leaving aside logical decoding of 2PC) to either disallow
locking catalog tables or at least document it in some way.

Perhaps all that we need to do is to disallow 2PC prepare if [user]
catalog tables have been locked exclusively?

Right, and we have discussed this during development [1]/messages/by-id/CAA4eK1JeeXOwD6rYnhSOYk5YN-fUTmxe1GkTpN2-BvgnKN6gZg@mail.gmail.com[2]/messages/by-id/CAMGcDxf83P5SGnGH52=_0wRP9pO6uRWCMRwAA0nxKtZvir2_vQ@mail.gmail.com. We
thought either we disallow this operation or will document it. I
thought of doing this along with a core-implementation of Prepare
waiting to get it logically replicated. But at this stage, I think if
the user wants he can do a similar thing in his application where
after prepare it can wait for the transaction to get logically
replicated (if they have their own replication solution based on
logical decoding) and then decide whether to rollback or commit. So,
maybe we should either disallow this operation or at least document
it. What do you think?

Similar to how we're
disallowing preparing tables with temp table access.

Yeah, we disallow other things like pg_export_snapshot as well in a
Prepared transaction, so we can probably disallow this operation as
well.

[1]: /messages/by-id/CAA4eK1JeeXOwD6rYnhSOYk5YN-fUTmxe1GkTpN2-BvgnKN6gZg@mail.gmail.com
[2]: /messages/by-id/CAMGcDxf83P5SGnGH52=_0wRP9pO6uRWCMRwAA0nxKtZvir2_vQ@mail.gmail.com

--
With Regards,
Amit Kapila.

Andres Freund

andres@anarazel.de

over 5 years ago

In reply to: Amit Kapila (#2)

Re: locking [user] catalog tables vs 2pc vs logical rep

Hi,

On 2021-02-23 08:56:39 +0530, Amit Kapila wrote:

On Tue, Feb 23, 2021 at 3:58 AM Andres Freund <andres@anarazel.de> wrote:

Perhaps all that we need to do is to disallow 2PC prepare if [user]
catalog tables have been locked exclusively?

Right, and we have discussed this during development [1][2].

I remember bringing it up before as well... Issues like this really need
to be mentioned as explicit caveats at least somewhere in the code and
commit message. You can't expect people to look at 3+ year old threads.

Greetings,

Andres Freund

Amit Kapila

amit.kapila16@gmail.com

over 5 years ago

In reply to: Andres Freund (#3)

Re: locking [user] catalog tables vs 2pc vs logical rep

On Tue, Feb 23, 2021 at 9:09 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-02-23 08:56:39 +0530, Amit Kapila wrote:

On Tue, Feb 23, 2021 at 3:58 AM Andres Freund <andres@anarazel.de> wrote:

Perhaps all that we need to do is to disallow 2PC prepare if [user]
catalog tables have been locked exclusively?

Right, and we have discussed this during development [1][2].

I remember bringing it up before as well... Issues like this really need
to be mentioned as explicit caveats at least somewhere in the code and
commit message.

Okay, so is it sufficient to add comments in code, or do we want to
add something in docs? I am not completely sure if we need to add in
docs till we have core-implementation of prepare waiting to get
logically replicated.

--
With Regards,
Amit Kapila.

Andres Freund

andres@anarazel.de

over 5 years ago

In reply to: Amit Kapila (#4)

Re: locking [user] catalog tables vs 2pc vs logical rep

On 2021-02-23 09:24:18 +0530, Amit Kapila wrote:

Okay, so is it sufficient to add comments in code, or do we want to
add something in docs? I am not completely sure if we need to add in
docs till we have core-implementation of prepare waiting to get
logically replicated.

There's plenty users of logical decoding that aren't going through the
normal replication mechanism - so they can hit this. So I think it needs
to be documented somewhere.

Greetings,

Andres Freund

Amit Kapila

amit.kapila16@gmail.com

over 5 years ago

In reply to: Andres Freund (#5)

Re: locking [user] catalog tables vs 2pc vs logical rep

On Tue, Feb 23, 2021 at 9:33 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-02-23 09:24:18 +0530, Amit Kapila wrote:

Okay, so is it sufficient to add comments in code, or do we want to
add something in docs? I am not completely sure if we need to add in
docs till we have core-implementation of prepare waiting to get
logically replicated.

There's plenty users of logical decoding that aren't going through the
normal replication mechanism - so they can hit this. So I think it needs
to be documented somewhere.

As per discussion, the attached patch updates both docs and comments
in the code.

--
With Regards,
Amit Kapila.

Amit Kapila

amit.kapila16@gmail.com

about 5 years ago

In reply to: Amit Kapila (#6)

Re: locking [user] catalog tables vs 2pc vs logical rep

On Tue, Feb 23, 2021 at 12:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Feb 23, 2021 at 9:33 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-02-23 09:24:18 +0530, Amit Kapila wrote:

Okay, so is it sufficient to add comments in code, or do we want to
add something in docs? I am not completely sure if we need to add in
docs till we have core-implementation of prepare waiting to get
logically replicated.

There's plenty users of logical decoding that aren't going through the
normal replication mechanism - so they can hit this. So I think it needs
to be documented somewhere.

As per discussion, the attached patch updates both docs and comments
in the code.

I have pushed this patch.

--
With Regards,
Amit Kapila.

vignesh C

vignesh21@gmail.com

about 5 years ago

In reply to: Andres Freund (#1)

Re: locking [user] catalog tables vs 2pc vs logical rep

On Tue, Feb 23, 2021 at 3:59 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

The 2pc decoding added in

commit a271a1b50e9bec07e2ef3a05e38e7285113e4ce6
Author: Amit Kapila <akapila@postgresql.org>
Date: 2021-01-04 08:34:50 +0530

Allow decoding at prepare time in ReorderBuffer.

has a deadlock danger when used in a way that takes advantage of
separate decoding of the 2PC PREPARE.

I assume the goal of decoding the 2PC PREPARE is so one can wait for the
PREPARE to have logically replicated, before doing the COMMIT PREPARED.

However, currently it's pretty easy to get into a state where logical
decoding cannot progress until the 2PC transaction has
committed/aborted. Which essentially would lead to undetected deadlocks.

The problem is that catalog tables accessed during logical decoding need
to get locked (otherwise e.g. a table rewrite could happen
concurrently). But if the prepared transaction itself holds a lock on a
catalog table, logical decoding will block on that lock - which won't be
released until replication progresses. A deadlock.

A trivial example:

SELECT pg_create_logical_replication_slot('test', 'test_decoding');
CREATE TABLE foo(id serial primary key);
BEGIN;
LOCK pg_class;
INSERT INTO foo DEFAULT VALUES;
PREPARE TRANSACTION 'foo';

-- hangs waiting for pg_class to be unlocked
SELECT pg_logical_slot_get_changes('test', NULL, NULL, 'two-phase-commit', '1');

Now, more realistic versions of this scenario would probably lock a
'user catalog table' containing replication metadata instead of
pg_class, but ...

At first this seems to be a significant issue. But on the other hand, if
you were to shut the cluster down in this situation (or disconnect all
sessions), you have broken cluster on your hand - without logical
decoding being involved. As it turns out, we need to read pg_class to
log in... And I can't remember this being reported to be a problem?

Perhaps all that we need to do is to disallow 2PC prepare if [user]
catalog tables have been locked exclusively? Similar to how we're
disallowing preparing tables with temp table access.

Even I felt we should not allow prepare a transaction that has locked
system tables, as it does not allow creating a new session after
restart and also causes the deadlock while logical decoding of
prepared transaction.
I have made a patch to make the prepare transaction fail in this
scenario. Attached the patch for the same.
Thoughts?

Regards,
Vignesh

Ajin Cherian

itsajin@gmail.com

about 5 years ago

In reply to: vignesh C (#8)

Re: locking [user] catalog tables vs 2pc vs logical rep

On Tue, Mar 16, 2021 at 1:36 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Feb 23, 2021 at 3:59 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

The 2pc decoding added in

commit a271a1b50e9bec07e2ef3a05e38e7285113e4ce6
Author: Amit Kapila <akapila@postgresql.org>
Date: 2021-01-04 08:34:50 +0530

Allow decoding at prepare time in ReorderBuffer.

has a deadlock danger when used in a way that takes advantage of
separate decoding of the 2PC PREPARE.

I assume the goal of decoding the 2PC PREPARE is so one can wait for the
PREPARE to have logically replicated, before doing the COMMIT PREPARED.

However, currently it's pretty easy to get into a state where logical
decoding cannot progress until the 2PC transaction has
committed/aborted. Which essentially would lead to undetected deadlocks.

The problem is that catalog tables accessed during logical decoding need
to get locked (otherwise e.g. a table rewrite could happen
concurrently). But if the prepared transaction itself holds a lock on a
catalog table, logical decoding will block on that lock - which won't be
released until replication progresses. A deadlock.

A trivial example:

SELECT pg_create_logical_replication_slot('test', 'test_decoding');
CREATE TABLE foo(id serial primary key);
BEGIN;
LOCK pg_class;
INSERT INTO foo DEFAULT VALUES;
PREPARE TRANSACTION 'foo';

-- hangs waiting for pg_class to be unlocked
SELECT pg_logical_slot_get_changes('test', NULL, NULL,

'two-phase-commit', '1');

Now, more realistic versions of this scenario would probably lock a
'user catalog table' containing replication metadata instead of
pg_class, but ...

At first this seems to be a significant issue. But on the other hand, if
you were to shut the cluster down in this situation (or disconnect all
sessions), you have broken cluster on your hand - without logical
decoding being involved. As it turns out, we need to read pg_class to
log in... And I can't remember this being reported to be a problem?

Perhaps all that we need to do is to disallow 2PC prepare if [user]
catalog tables have been locked exclusively? Similar to how we're
disallowing preparing tables with temp table access.

Even I felt we should not allow prepare a transaction that has locked
system tables, as it does not allow creating a new session after
restart and also causes the deadlock while logical decoding of
prepared transaction.
I have made a patch to make the prepare transaction fail in this
scenario. Attached the patch for the same.
Thoughts?

The patch applies fine on HEAD and "make check" passes fine. No major
comments on the patch, just a minor comment:

If you could change the error from, " cannot PREPARE a transaction that has
a lock on user catalog/system table(s)"
to "cannot PREPARE a transaction that has an *exclusive lock* on user
catalog/system table(s)" that would be a more
accurate instruction to the user.

regards,
Ajin Cherian
Fujitsu Australia

#10

vignesh C

vignesh21@gmail.com

about 5 years ago

In reply to: Ajin Cherian (#9)

Re: locking [user] catalog tables vs 2pc vs logical rep

On Wed, Mar 31, 2021 at 2:35 PM Ajin Cherian <itsajin@gmail.com> wrote:

The patch applies fine on HEAD and "make check" passes fine. No major comments on the patch, just a minor comment:

If you could change the error from, " cannot PREPARE a transaction that has a lock on user catalog/system table(s)"
to "cannot PREPARE a transaction that has an exclusive lock on user catalog/system table(s)" that would be a more
accurate instruction to the user.

Thanks for reviewing the patch.
Please find the updated patch which includes the fix for the same.

Regards,
Vignesh

#11

vignesh C

vignesh21@gmail.com

about 5 years ago

In reply to: vignesh C (#10)

Re: locking [user] catalog tables vs 2pc vs logical rep

On Wed, Mar 31, 2021 at 5:47 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Mar 31, 2021 at 2:35 PM Ajin Cherian <itsajin@gmail.com> wrote:

The patch applies fine on HEAD and "make check" passes fine. No major comments on the patch, just a minor comment:

If you could change the error from, " cannot PREPARE a transaction that has a lock on user catalog/system table(s)"
to "cannot PREPARE a transaction that has an exclusive lock on user catalog/system table(s)" that would be a more
accurate instruction to the user.

Thanks for reviewing the patch.
Please find the updated patch which includes the fix for the same.

This similar problem exists in case of synchronous replication setup
having synchronous_standby_names referring to the subscriber, when we
do the steps "begin;lock pg_class; insert into test1 values(10);
commit". In this case while decoding of commit, the commit will wait
while trying to acquire a lock on pg_class relation, stack trace for
the same is given below:
#4 0x0000556936cd5d37 in ProcSleep (locallock=0x556937de8728,
lockMethodTable=0x5569371c2620 <default_lockmethod>) at proc.c:1361
#5 0x0000556936cc294a in WaitOnLock (locallock=0x556937de8728,
owner=0x556937e3cd90) at lock.c:1858
#6 0x0000556936cc1231 in LockAcquireExtended (locktag=0x7ffcbb23cff0,
lockmode=1, sessionLock=false, dontWait=false, reportMemoryError=true,
locallockp=0x7ffcbb23cfe8)
at lock.c:1100
#7 0x0000556936cbdbce in LockRelationOid (relid=1259, lockmode=1) at lmgr.c:117
#8 0x00005569367afb12 in relation_open (relationId=1259, lockmode=1)
at relation.c:56
#9 0x00005569368888a2 in table_open (relationId=1259, lockmode=1) at table.c:43
#10 0x0000556936e90a91 in RelidByRelfilenode (reltablespace=0,
relfilenode=16385) at relfilenodemap.c:192
#11 0x0000556936c40361 in ReorderBufferProcessTXN (rb=0x556937e8e760,
txn=0x556937eb8778, commit_lsn=23752880, snapshot_now=0x556937ea0a90,
command_id=0, streaming=false)
at reorderbuffer.c:2122
#12 0x0000556936c411b7 in ReorderBufferReplay (txn=0x556937eb8778,
rb=0x556937e8e760, xid=590, commit_lsn=23752880, end_lsn=23752928,
commit_time=672204445820756,
origin_id=0, origin_lsn=0) at reorderbuffer.c:2589
#13 0x0000556936c41239 in ReorderBufferCommit (rb=0x556937e8e760,
xid=590, commit_lsn=23752880, end_lsn=23752928,
commit_time=672204445820756, origin_id=0, origin_lsn=0)
at reorderbuffer.c:2613
#14 0x0000556936c2f4d9 in DecodeCommit (ctx=0x556937e8c750,
buf=0x7ffcbb23d610, parsed=0x7ffcbb23d4b0, xid=590, two_phase=false)
at decode.c:744

Thoughts?

Regards,
Vignesh

#12

Amit Kapila

amit.kapila16@gmail.com

about 5 years ago

In reply to: vignesh C (#11)

Re: locking [user] catalog tables vs 2pc vs logical rep

On Tue, Apr 20, 2021 at 9:57 AM vignesh C <vignesh21@gmail.com> wrote:

This similar problem exists in case of synchronous replication setup
having synchronous_standby_names referring to the subscriber, when we
do the steps "begin;lock pg_class; insert into test1 values(10);
commit". In this case while decoding of commit, the commit will wait
while trying to acquire a lock on pg_class relation,

So, this appears to be an existing caveat of synchronous replication.
If that is the case, I am not sure if it is a good idea to just block
such ops for the prepared transaction. Also, what about other
operations which acquire an exclusive lock on [user]_catalog_tables
like:
cluster pg_trigger using pg_class_oid_index, similarly cluster on any
user_catalog_table, then the other problematic operation could
truncate of user_catalog_table as is discussed in another thread [1]https://www.postgresql.org/message- id/OSBPR01MB4888314C70DA6B112E32DD6AED2B9%40OSBPR01MB4888.jpnprd01.prod.outlook.com.
I think all such operations can block even with synchronous
replication. I am not sure if we can create examples for all cases
because for ex. we don't have use of user_catalog_tables in-core but
maybe for others, we can try to create examples and see what happens?

If all such operations can block for synchronous replication and
prepared transactions replication then we might want to document them
as caveats at page:
https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
and then also give the reference for these caveats at prepared
transactions page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-commits.html

What do you think?

As this appears to be an existing caveat of logical replication, I
have added the Petr and Peter E in this email.

[1]: https://www.postgresql.org/message- id/OSBPR01MB4888314C70DA6B112E32DD6AED2B9%40OSBPR01MB4888.jpnprd01.prod.outlook.com
id/OSBPR01MB4888314C70DA6B112E32DD6AED2B9%40OSBPR01MB4888.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

#13

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Andres Freund (#1)

Re: locking [user] catalog tables vs 2pc vs logical rep

On Mon, Feb 22, 2021 at 02:28:47PM -0800, Andres Freund wrote:

Perhaps all that we need to do is to disallow 2PC prepare if [user]
catalog tables have been locked exclusively? Similar to how we're
disallowing preparing tables with temp table access.

At least for anything involving critical relations that get loaded at
startup? It seems to me that if we can avoid users to get them
completely locked out even if they run the operation on an object
they own, that would be better than requiring tweaks involving
pg_resetwal or equal to rip of the 2PC transaction from existence.
--
Michael

#14

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Amit Kapila (#12)

Re: locking [user] catalog tables vs 2pc vs logical rep

On Mon, May 24, 2021 at 10:03:01AM +0530, Amit Kapila wrote:

So, this appears to be an existing caveat of synchronous replication.
If that is the case, I am not sure if it is a good idea to just block
such ops for the prepared transaction. Also, what about other
operations which acquire an exclusive lock on [user]_catalog_tables
like:
cluster pg_trigger using pg_class_oid_index, similarly cluster on any
user_catalog_table, then the other problematic operation could
truncate of user_catalog_table as is discussed in another thread [1].
I think all such operations can block even with synchronous
replication. I am not sure if we can create examples for all cases
because for ex. we don't have use of user_catalog_tables in-core but
maybe for others, we can try to create examples and see what happens?

If all such operations can block for synchronous replication and
prepared transactions replication then we might want to document them
as caveats at page:
https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
and then also give the reference for these caveats at prepared
transactions page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-commits.html

What do you think?

It seems to me that the 2PC issues on catalog tables and the issues
related to logical replication in synchonous mode are two distinct
things that need to be fixed separately.

The issue with LOCK taken on a catalog while a PREPARE TRANSACTION
holds locks around is bad enough in itself as it could lock down a
user from a cluster as long as the PREPARE TRANSACTION is not removed
from WAL (say the relation is critical for the connection startup).
This could be really disruptive for the user even if he tried to take
a lock on an object he owns, and the way to recover is not easy here,
and the way to recover involves either an old backup or worse,
pg_resetwal.

The second issue with logical replication is still disruptive, but it
looks to me more like a don't-do-it issue, and documenting the caveats
sounds fine enough.

Looking at the patch from upthread..

+           /*
+            * Make note that we've locked a system table or an user catalog
+            * table. This flag will be checked later during prepare transaction
+            * to fail the prepare transaction.
+            */
+           if (lockstmt->mode >= ExclusiveLock &&
+               (IsCatalogRelationOid(reloid) ||
+                RelationIsUsedAsCatalogTable(rel)))
+               MyXactFlags |= XACT_FLAGS_ACQUIREDEXCLUSIVELOCK_SYSREL;
I think that I'd just use IsCatalogRelationOid() here, and I'd be more
severe and restrict all attempts for any lock levels.  It seems to me
that this needs to happen within RangeVarCallbackForLockTable().
I would also rename the flag as just XACT_FLAGS_LOCKEDCATALOG.

+                errmsg("cannot PREPARE a transaction that has an exclusive lock on user catalog/system table(s)")));
What about "cannot PREPARE a transaction that has locked a catalog
relation"?
--
Michael

locking [user] catalog tables vs 2pc vs logical rep

Attachments:

Attachments:

Attachments: