invisible commit question for sync replication

Started by qihua wuabout 3 years ago6 messagesgeneral
Jump to latest
#1qihua wu
staywithpin@gmail.com

When run a cluster with sync replication, if DML is done on primary, but
primary is isolated from all slave, then the DML will hang, if cancel it
DML, it will say:
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have
been replicated to the standby

So the workflow is
1: commit to local.
2: waiting for ACK from remote sync.

When cancel the DML at step 2. the data are arealy on local, that's why
it's warning.

But when runs an insert which is waiting for remote ACK, and then query
from another session, I didn't find that row. Why this happen? If the
insert is already one locally, whey another session can't read it?

#2Julien Rouhaud
rjuju123@gmail.com
In reply to: qihua wu (#1)
Re: invisible commit question for sync replication

Hi,

On Wed, Feb 01, 2023 at 02:52:49PM +0800, qihua wu wrote:

When run a cluster with sync replication, if DML is done on primary, but
primary is isolated from all slave, then the DML will hang, if cancel it
DML, it will say:
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have
been replicated to the standby

So the workflow is
1: commit to local.
2: waiting for ACK from remote sync.

When cancel the DML at step 2. the data are arealy on local, that's why
it's warning.

But when runs an insert which is waiting for remote ACK, and then query
from another session, I didn't find that row. Why this happen? If the
insert is already one locally, whey another session can't read it?

It works as expected for me, are you sure both sessions are actually connected
to the same server and/or querying the same table?

[1456]: rjuju@127.0.0.1:14295) rjuju=# select pg_backend_pid(), * from tt; pg_backend_pid | id | val ----------------+----+-------- 1456 | 1 | <NULL> (1 row)
id | val
----+-----
(0 rows)

[1456]: rjuju@127.0.0.1:14295) rjuju=# select pg_backend_pid(), * from tt; pg_backend_pid | id | val ----------------+----+-------- 1456 | 1 | <NULL> (1 row)
^CCancel request sent
WARNING: 01000: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
LOCATION: SyncRepWaitForLSN, syncrep.c:287
INSERT 0 1

[1456]: rjuju@127.0.0.1:14295) rjuju=# select pg_backend_pid(), * from tt; pg_backend_pid | id | val ----------------+----+-------- 1456 | 1 | <NULL> (1 row)
pg_backend_pid | id | val
----------------+----+--------
1456 | 1 | <NULL>
(1 row)

and another session:

[3327]: rjuju@127.0.0.1:14295) rjuju=# select pg_backend_pid(), * from tt; pg_backend_pid | id | val ----------------+----+-------- 3327 | 1 | <NULL> (1 row)
pg_backend_pid | id | val
----------------+----+--------
3327 | 1 | <NULL>
(1 row)

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Julien Rouhaud (#2)
Re: invisible commit question for sync replication

On Wednesday, February 1, 2023, Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi,

On Wed, Feb 01, 2023 at 02:52:49PM +0800, qihua wu wrote:

When run a cluster with sync replication, if DML is done on primary, but
primary is isolated from all slave, then the DML will hang, if cancel it
DML, it will say:
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not

have

been replicated to the standby

So the workflow is
1: commit to local.
2: waiting for ACK from remote sync.

When cancel the DML at step 2. the data are arealy on local, that's why
it's warning.

But when runs an insert which is waiting for remote ACK, and then query
from another session, I didn't find that row. Why this happen? If the
insert is already one locally, whey another session can't read it?

It works as expected for me, are you sure both sessions are actually
connected
to the same server and/or querying the same table?

[1456]rjuju@127.0.0.1:14295) rjuju=# select * from tt;
id | val
----+-----
(0 rows)

[1456]rjuju@127.0.0.1:14295) rjuju=# insert into tt select 1;
^CCancel request sent
WARNING: 01000: canceling wait for synchronous replication due to user
request
DETAIL: The transaction has already committed locally, but might not have
been replicated to the standby.
LOCATION: SyncRepWaitForLSN, syncrep.c:287
INSERT 0 1

[1456]rjuju@127.0.0.1:14295) rjuju=# select pg_backend_pid(), * from tt;
pg_backend_pid | id | val
----------------+----+--------
1456 | 1 | <NULL>
(1 row)

and another session:

[3327]rjuju@127.0.0.1:14295) rjuju=# select pg_backend_pid(), * from tt;
pg_backend_pid | id | val
----------------+----+--------
3327 | 1 | <NULL>
(1 row)

This wasn’t the question though. Can the second session see the inserted
row before you cancel the insert that is waiting for sync ack?

Supposedly it can (not able to test myself). Basically, the primary waits
to make the local transaction visible until either sync ack or until the
wait for sync ack is cancelled. It doesn’t make sense to make it visible
while waiting for sync ack since that would defeat the very behavior sync
ack provides for.

David J.

#4qihua wu
staywithpin@gmail.com
In reply to: David G. Johnston (#3)
Re: invisible commit question for sync replication

==》Can the second session see the inserted row before you cancel the insert
that is waiting for sync ack?
The second session can NOT see the inserted row if the first session is
still waiting for sync ACK.

I checked the source code, it makes sense to me now:
The waiting for sync ACK is called in
EndPrepare(gxact);
and after that it will call
ProcArrayClearTransaction(MyProc);

and only after ProcArrayClearTransaction(MyProc) is called, will the row be
visible to others.

On Wed, Feb 1, 2023 at 3:21 PM David G. Johnston <david.g.johnston@gmail.com>
wrote:

Show quoted text

On Wednesday, February 1, 2023, Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi,

On Wed, Feb 01, 2023 at 02:52:49PM +0800, qihua wu wrote:

When run a cluster with sync replication, if DML is done on primary, but
primary is isolated from all slave, then the DML will hang, if cancel it
DML, it will say:
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not

have

been replicated to the standby

So the workflow is
1: commit to local.
2: waiting for ACK from remote sync.

When cancel the DML at step 2. the data are arealy on local, that's why
it's warning.

But when runs an insert which is waiting for remote ACK, and then query
from another session, I didn't find that row. Why this happen? If the
insert is already one locally, whey another session can't read it?

It works as expected for me, are you sure both sessions are actually
connected
to the same server and/or querying the same table?

[1456]rjuju@127.0.0.1:14295) rjuju=# select * from tt;
id | val
----+-----
(0 rows)

[1456]rjuju@127.0.0.1:14295) rjuju=# insert into tt select 1;
^CCancel request sent
WARNING: 01000: canceling wait for synchronous replication due to user
request
DETAIL: The transaction has already committed locally, but might not
have been replicated to the standby.
LOCATION: SyncRepWaitForLSN, syncrep.c:287
INSERT 0 1

[1456]rjuju@127.0.0.1:14295) rjuju=# select pg_backend_pid(), * from tt;
pg_backend_pid | id | val
----------------+----+--------
1456 | 1 | <NULL>
(1 row)

and another session:

[3327]rjuju@127.0.0.1:14295) rjuju=# select pg_backend_pid(), * from tt;
pg_backend_pid | id | val
----------------+----+--------
3327 | 1 | <NULL>
(1 row)

This wasn’t the question though. Can the second session see the inserted
row before you cancel the insert that is waiting for sync ack?

Supposedly it can (not able to test myself). Basically, the primary waits
to make the local transaction visible until either sync ack or until the
wait for sync ack is cancelled. It doesn’t make sense to make it visible
while waiting for sync ack since that would defeat the very behavior sync
ack provides for.

David J.

#5Laurenz Albe
laurenz.albe@cybertec.at
In reply to: qihua wu (#1)
Re: invisible commit question for sync replication

On Wed, 2023-02-01 at 14:52 +0800, qihua wu wrote:

When run a cluster with sync replication, if DML is done on primary, but primary is
isolated from all slave, then the DML will hang, if cancel it DML, it will say:
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby

So the workflow is
1: commit to local.
2: waiting for ACK from remote sync.

When cancel the DML at step 2. the data are arealy on local, that's why it's warning.

But when runs an insert which is waiting for remote ACK, and then query from another
session, I didn't find that row. Why this happen? If the insert is already one locally,
whey another session can't read it?

COMMIT is not as atomic as it appears. When the backend is waiting for the standby,
it has already committed the transaction on disk, but that fact is not advertised to
the other backends yet.

Yours,
Laurenz Albe

#6qihua wu
staywithpin@gmail.com
In reply to: Laurenz Albe (#5)
Re: invisible commit question for sync replication

In the code it will write a warning to postgresql log. Why not also write
the detailed sql? with the exact sql, DBA might do something to fix the
issue.

if (ProcDiePending)
{
ereport(WARNING,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("canceling the wait for synchronous replication and terminating
connection due to administrator command"),
errdetail("The transaction has already committed locally, but might not
have been replicated to the standby.")));
whereToSendOutput = DestNone;
SyncRepCancelWait();
break;
}

On Wed, Feb 1, 2023 at 4:38 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

Show quoted text

On Wed, 2023-02-01 at 14:52 +0800, qihua wu wrote:

When run a cluster with sync replication, if DML is done on primary, but

primary is

isolated from all slave, then the DML will hang, if cancel it DML, it

will say:

WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not

have been replicated to the standby

So the workflow is
1: commit to local.
2: waiting for ACK from remote sync.

When cancel the DML at step 2. the data are arealy on local, that's why

it's warning.

But when runs an insert which is waiting for remote ACK, and then query

from another

session, I didn't find that row. Why this happen? If the insert is

already one locally,

whey another session can't read it?

COMMIT is not as atomic as it appears. When the backend is waiting for
the standby,
it has already committed the transaction on disk, but that fact is not
advertised to
the other backends yet.

Yours,
Laurenz Albe