Issues with logical replication

Started by Konstantin Knizhnik over 8 years ago · 26 messages
#1 Konstantin Knizhnik
k.knizhnik@postgrespro.ru

I have run into two issues with logical replication.
Reproduction scenario:

1. Start two Postgres instances (I start both on the local machine).
2. Initialize pgbench tables at both instances:
pgbench -i -s 10 postgres
3. Create a publication for the pgbench_accounts table at one node:
create publication pub for table pgbench_accounts;
4. Create a subscription at the other node:
delete from pgbench_accounts;
CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost
port=5432 sslmode=disable' publication pub;
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS
TRIGGER AS $$
BEGIN
-- RETURN NEW;
END $$ LANGUAGE plpgsql;
CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE
ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE
replication_trigger_proc();
ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger;
5. Start pgbench at the primary node:
pgbench -T 1000 -P 2 -c 10 postgres

Please notice the commented-out "RETURN NEW" statement. Invoking this
function causes an error, and in the log we see repeated messages:

2017-10-02 16:47:46.764 MSK [32129] LOG: logical replication table
synchronization worker for subscription "sub", table "pgbench_accounts"
has started
2017-10-02 16:47:46.771 MSK [32129] ERROR: control reached end of
trigger procedure without RETURN
2017-10-02 16:47:46.771 MSK [32129] CONTEXT: PL/pgSQL function
replication_trigger_proc()
COPY pgbench_accounts, line 1: "1 1 0 "
2017-10-02 16:47:46.772 MSK [28020] LOG: worker process: logical
replication worker for subscription 17264 sync 17251 (PID 32129) exited
with exit code 1
...

After a few minutes the primary node (the one with the publication) crashes
with the following stack trace:

Program terminated with signal SIGABRT, Aborted.
#0 0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f3608f92028 in __GI_abort () at abort.c:89
#2 0x00000000009f5740 in ExceptionalCondition (conditionName=0xbf6b30
"!(((xid) != ((TransactionId) 0)))",
errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c",
lineNumber=582) at assert.c:54
#3 0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
oper=XLTW_None) at lmgr.c:582
#4 0x000000000081c9f7 in SnapBuildWaitSnapshot (running=0x2749198,
cutoff=898498) at snapbuild.c:1400
#5 0x000000000081c7c7 in SnapBuildFindSnapshot (builder=0x2807910,
lsn=798477760, running=0x2749198) at snapbuild.c:1311
#6 0x000000000081c339 in SnapBuildProcessRunningXacts
(builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106
#7 0x000000000080a1c7 in DecodeStandbyOp (ctx=0x27ef870,
buf=0x7ffd301858d0) at decode.c:314
#8 0x0000000000809ce1 in LogicalDecodingProcessRecord (ctx=0x27ef870,
record=0x27efb30) at decode.c:117
#9 0x000000000080ddf9 in DecodingContextFindStartpoint (ctx=0x27ef870)
at logical.c:458
#10 0x0000000000823968 in CreateReplicationSlot (cmd=0x27483a8) at
walsender.c:934
#11 0x00000000008246ee in exec_replication_command (
cmd_string=0x27b9520 "CREATE_REPLICATION_SLOT
\"sub_17264_sync_17251\" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at
walsender.c:1511
#12 0x000000000088eb44 in PostgresMain (argc=1, argv=0x275b738,
dbname=0x275b578 "postgres", username=0x272b800 "knizhnik")
at postgres.c:4086
#13 0x00000000007ef9a1 in BackendRun (port=0x27532a0) at postmaster.c:4357
#14 0x00000000007ef0cb in BackendStartup (port=0x27532a0) at
postmaster.c:4029
#15 0x00000000007eb68b in ServerLoop () at postmaster.c:1753
#16 0x00000000007eac77 in PostmasterMain (argc=3, argv=0x2729670) at
postmaster.c:1361
#17 0x0000000000728552 in main (argc=3, argv=0x2729670) at main.c:228

Now fix the trigger function:
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
BEGIN
RETURN NEW;
END $$ LANGUAGE plpgsql;

And manually perform two updates inside one transaction at the master:

postgres=# begin;
BEGIN
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
UPDATE 1
postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
UPDATE 1
postgres=# commit;
<hangs>

and in the replica log we see:
2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply
worker for subscription "sub" has started
2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible
tuple
2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical
replication worker for subscription 16403 (PID 2954) exited with exit
code 1

The error happens in trigger.c:

#3 0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
trigger.c:3103
#4 0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
#5 0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
at execReplication.c:461
#6 0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
worker.c:736
#7 0x0000000000820d83 in apply_dispatch (s=0x7ffc442163b0) at worker.c:892
#8 0x000000000082121b in LogicalRepApplyLoop (last_received=2225693736)
at worker.c:1095
#9 0x000000000082219d in ApplyWorkerMain (main_arg=0) at worker.c:1647
#10 0x00000000007dd496 in StartBackgroundWorker () at bgworker.c:835
#11 0x00000000007f0889 in do_start_bgworker (rw=0x2d75f50) at
postmaster.c:5680
#12 0x00000000007f0bc3 in maybe_start_bgworkers () at postmaster.c:5884
#13 0x00000000007efbeb in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:5073
#14 <signal handler called>
#15 0x00007fbe83a054a3 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:81
#16 0x00000000007eb517 in ServerLoop () at postmaster.c:1717
#17 0x00000000007eac48 in PostmasterMain (argc=3, argv=0x2d4e660) at
postmaster.c:1361
#18 0x0000000000728523 in main (argc=3, argv=0x2d4e660) at main.c:228

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Konstantin Knizhnik (#1)
Re: Issues with logical replication

On 02/10/17 17:49, Konstantin Knizhnik wrote:

I have run into two issues with logical replication.
Reproduction scenario:

1. Start two Postgres instances (I start both on the local machine).
2. Initialize pgbench tables at both instances:
    pgbench -i -s 10 postgres
3. Create a publication for the pgbench_accounts table at one node:
    create publication pub for table pgbench_accounts;
4. Create a subscription at the other node:
    delete from pgbench_accounts;
    CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost
port=5432 sslmode=disable' publication pub;
    CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS
TRIGGER AS $$
    BEGIN
    --  RETURN NEW;
    END $$ LANGUAGE plpgsql;
   CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE
ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE
replication_trigger_proc();
   ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger;
5. Start pgbench at the primary node:
    pgbench -T 1000 -P 2 -c 10 postgres

Please notice the commented-out "RETURN NEW" statement. Invoking this
function causes an error, and in the log we see repeated messages:

2017-10-02 16:47:46.764 MSK [32129] LOG:  logical replication table
synchronization worker for subscription "sub", table "pgbench_accounts"
has started
2017-10-02 16:47:46.771 MSK [32129] ERROR:  control reached end of
trigger procedure without RETURN
2017-10-02 16:47:46.771 MSK [32129] CONTEXT:  PL/pgSQL function
replication_trigger_proc()
    COPY pgbench_accounts, line 1: "1    1    0 "
2017-10-02 16:47:46.772 MSK [28020] LOG:  worker process: logical
replication worker for subscription 17264 sync 17251 (PID 32129) exited
with exit code 1
...

After a few minutes the primary node (the one with the publication) crashes
with the following stack trace:

Program terminated with signal SIGABRT, Aborted.
#0  0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f3608f92028 in __GI_abort () at abort.c:89
#2  0x00000000009f5740 in ExceptionalCondition (conditionName=0xbf6b30
"!(((xid) != ((TransactionId) 0)))",
    errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c",
lineNumber=582) at assert.c:54
#3  0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
oper=XLTW_None) at lmgr.c:582
#4  0x000000000081c9f7 in SnapBuildWaitSnapshot (running=0x2749198,
cutoff=898498) at snapbuild.c:1400
#5  0x000000000081c7c7 in SnapBuildFindSnapshot (builder=0x2807910,
lsn=798477760, running=0x2749198) at snapbuild.c:1311
#6  0x000000000081c339 in SnapBuildProcessRunningXacts
(builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106

Hmm, this would suggest that the XLOG_RUNNING_XACTS record contains invalid
transaction ids, but we specifically skip those in
GetRunningTransactionData(). Can you check the pg_waldump output for that
record (the lsn is shown in the backtrace)?

Now fix the trigger function:
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
BEGIN
  RETURN NEW;
END $$ LANGUAGE plpgsql;

And manually perform two updates inside one transaction at the master:

postgres=# begin;
BEGIN
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
UPDATE 1
postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
UPDATE 1
postgres=# commit;
<hangs>

and in the replica log we see:
2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
worker for subscription "sub" has started
2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
tuple
2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
replication worker for subscription 16403 (PID 2954) exited with exit
code 1

The error happens in trigger.c:

#3  0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
    lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
trigger.c:3103
#4  0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
    fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
#5  0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
    at execReplication.c:461
#6  0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
worker.c:736

We have locked the same tuple in RelationFindReplTupleByIndex() just
before this gets called and didn't get the same error. I guess we do
something wrong with snapshots. Will need to investigate more.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#3 Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Petr Jelinek (#2)
1 attachment(s)
Re: Issues with logical replication

On 02/10/17 18:59, Petr Jelinek wrote:

Now fix the trigger function:
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
BEGIN
  RETURN NEW;
END $$ LANGUAGE plpgsql;

And manually perform two updates inside one transaction at the master:

postgres=# begin;
BEGIN
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
UPDATE 1
postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
UPDATE 1
postgres=# commit;
<hangs>

and in the replica log we see:
2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
worker for subscription "sub" has started
2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
tuple
2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
replication worker for subscription 16403 (PID 2954) exited with exit
code 1

The error happens in trigger.c:

#3  0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
    lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
trigger.c:3103
#4  0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
    fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
#5  0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
    at execReplication.c:461
#6  0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
worker.c:736

We have locked the same tuple in RelationFindReplTupleByIndex() just
before this gets called and didn't get the same error. I guess we do
something wrong with snapshots. Will need to investigate more.

Okay, so it's not the snapshot. It's the fact that we don't set
es_output_cid in the replication worker, which GetTupleForTrigger uses
when locking the tuple. The attached one-liner fixes it.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

Assign-CCI-in-lrep-estate.patch (text/x-patch)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b58d2e1008..d6a5cf7088 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -204,6 +204,8 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	estate->es_num_result_relations = 1;
 	estate->es_result_relation_info = resultRelInfo;
 
+	estate->es_output_cid = GetCurrentCommandId(true);
+
 	/* Triggers might need a slot */
 	if (resultRelInfo->ri_TrigDesc)
 		estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);
#4 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Petr Jelinek (#3)
Re: Issues with logical replication

On Wed, Oct 4, 2017 at 8:35 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

On 02/10/17 18:59, Petr Jelinek wrote:

Now fix the trigger function:
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
BEGIN
RETURN NEW;
END $$ LANGUAGE plpgsql;

And manually perform two updates inside one transaction at the master:

postgres=# begin;
BEGIN
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
UPDATE 1
postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
UPDATE 1
postgres=# commit;
<hangs>

and in the replica log we see:
2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply
worker for subscription "sub" has started
2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible
tuple
2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical
replication worker for subscription 16403 (PID 2954) exited with exit
code 1

The error happens in trigger.c:

#3 0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
trigger.c:3103
#4 0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
#5 0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
at execReplication.c:461
#6 0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
worker.c:736

We have locked the same tuple in RelationFindReplTupleByIndex() just
before this gets called and didn't get the same error. I guess we do
something wrong with snapshots. Will need to investigate more.

Okay, so it's not the snapshot. It's the fact that we don't set
es_output_cid in the replication worker, which GetTupleForTrigger uses
when locking the tuple. The attached one-liner fixes it.

Thank you for the patch.
This bug can happen even without the trigger, and I confirmed that the
bug is fixed by the patch. I think the patch fixes it properly.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


#5 Stas Kelvich
s.kelvich@postgrespro.ru
In reply to: Petr Jelinek (#2)
1 attachment(s)
Re: Issues with logical replication

On 2 Oct 2017, at 19:59, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

Program terminated with signal SIGABRT, Aborted.
#0 0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f3608f92028 in __GI_abort () at abort.c:89
#2 0x00000000009f5740 in ExceptionalCondition (conditionName=0xbf6b30
"!(((xid) != ((TransactionId) 0)))",
errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c",
lineNumber=582) at assert.c:54
#3 0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
oper=XLTW_None) at lmgr.c:582
#4 0x000000000081c9f7 in SnapBuildWaitSnapshot (running=0x2749198,
cutoff=898498) at snapbuild.c:1400
#5 0x000000000081c7c7 in SnapBuildFindSnapshot (builder=0x2807910,
lsn=798477760, running=0x2749198) at snapbuild.c:1311
#6 0x000000000081c339 in SnapBuildProcessRunningXacts
(builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106

Hmm, this would suggest that the XLOG_RUNNING_XACTS record contains invalid
transaction ids, but we specifically skip those in
GetRunningTransactionData(). Can you check the pg_waldump output for that
record (the lsn is shown in the backtrace)?

I investigated this case and it seems that XactLockTableWait() in SnapBuildWaitSnapshot()
does not always work as expected. XactLockTableWait() waits on LockAcquire() for the xid to
complete, and if we finally get the lock but the transaction is still in progress, the xid is
assumed to be a subxid. However, the LockAcquire/LockRelease cycle can happen after the
transaction has set its xid, but before XactLockTableInsert().

Namely, the following history happened for xid 5225 and led to the crash:

[backend] LOG: AssignTransactionId: XactTopTransactionId = 5225
[walsender] LOG: LogCurrentRunningXacts: Wrote RUNNING_XACTS xctn=1, xid[0]=5225
[walsender] LOG: XactLockTableWait: LockAcquire 5225
[walsender] LOG: XactLockTableWait: LockRelease 5225
[backend] LOG: AssignTransactionId: LockAcquire ExclusiveLock 5225
[walsender] LOG: TransactionIdIsInProgress: SVC->latestCompletedXid=5224 < xid=5225 => true
[backend] LOG: CommitTransaction: ProcArrayEndTransaction xid=5225, ipw=0
[backend] LOG: CommitTransaction: ResourceOwnerRelease locks xid=5225

I’ve quickly checked other usages of XactLockTableWait() and it seems that it is mostly used
with xids taken from the heap, so the transaction has definitely taken its lock somewhere in the past.

I'm not sure what the best approach to handling that is. Maybe write out running xacts only if
they have already taken their lock?

Also attaching a pgbench script that can reproduce the failure. I run it over the usual pgbench
database in 10 threads. It takes several minutes to crash.

Attachments:

reload.pgb (application/octet-stream)
#6 Robert Haas
robertmhaas@gmail.com
In reply to: Stas Kelvich (#5)
Re: [HACKERS] Issues with logical replication

On Mon, Oct 9, 2017 at 9:19 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:

I investigated this case and it seems that XactLockTableWait() in SnapBuildWaitSnapshot()
does not always work as expected. XactLockTableWait() waits on LockAcquire() for the xid to
complete, and if we finally get the lock but the transaction is still in progress, the xid is
assumed to be a subxid. However, the LockAcquire/LockRelease cycle can happen after the
transaction has set its xid, but before XactLockTableInsert().

Namely, the following history happened for xid 5225 and led to the crash:

[backend] LOG: AssignTransactionId: XactTopTransactionId = 5225
[walsender] LOG: LogCurrentRunningXacts: Wrote RUNNING_XACTS xctn=1, xid[0]=5225
[walsender] LOG: XactLockTableWait: LockAcquire 5225
[walsender] LOG: XactLockTableWait: LockRelease 5225
[backend] LOG: AssignTransactionId: LockAcquire ExclusiveLock 5225
[walsender] LOG: TransactionIdIsInProgress: SVC->latestCompletedXid=5224 < xid=5225 => true
[backend] LOG: CommitTransaction: ProcArrayEndTransaction xid=5225, ipw=0
[backend] LOG: CommitTransaction: ResourceOwnerRelease locks xid=5225

Ouch. This seems like a bug that needs to be fixed, but do you think
it's related to Petr's proposed fix to set es_output_cid? That fix
looks reasonable, since we shouldn't try to lock tuples without a
valid CommandId.

Now, having said that, I understand how the lack of that fix could cause:

2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible tuple

But I do not understand how it could cause:

#3 0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
oper=XLTW_None) at lmgr.c:582

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7 Stas Kelvich
s.kelvich@postgrespro.ru
In reply to: Robert Haas (#6)
1 attachment(s)
Re: [HACKERS] Issues with logical replication

On 15 Nov 2017, at 23:09, Robert Haas <robertmhaas@gmail.com> wrote:

Ouch. This seems like a bug that needs to be fixed, but do you think
it's related to Petr's proposed fix to set es_output_cid? That fix
looks reasonable, since we shouldn't try to lock tuples without a
valid CommandId.

Now, having said that, I understand how the lack of that fix could cause:

2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible tuple

But I do not understand how it could cause:

#3 0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
oper=XLTW_None) at lmgr.c:582

I think those are two separate bugs. The second one is due to a race between
XactLockTableWait() and the acquisition of the heavyweight transaction lock in
AssignTransactionId(). XactLockTableWait() assumes that the transaction
has already taken its hw transaction lock, but with the current machinery of
RunningXacts and snapshot building we can start waiting before the
transaction gets its hw lock.

I can see two ways out of this, and both of them are quite unattractive:

1. Create a flag in PGPROC indicating that the hw lock has been acquired,
and include in RunningXacts only transactions with this flag set.
That's probably quite expensive, since we need to hold one more exclusive
lwlock in AssignTransactionId.

2. Rewrite XactLockTableWait() so that it is aware of the fact that a
valid xid can be a topmost transaction that does not yet have a hw lock entry.
In that situation we can just wait and try to lock again. A good stop
condition for this loop is TransactionIdIsInProgress(xid) becoming
false.
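Sketched very roughly, the second option amounts to treating a premature return from XactLockTableWait() as a spurious wakeup. Everything below uses toy stand-ins for the real procarray/lmgr routines (the state, names, and signatures are simplified assumptions for illustration, not the actual PostgreSQL code):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Toy stand-in for shared state; in PostgreSQL this lives in the procarray. */
static bool in_progress[1024];

static bool TransactionIdIsInProgress(TransactionId xid)
{
    return in_progress[xid];
}

static void XactLockTableWait(TransactionId xid)
{
    /* The real function blocks on the xid's heavyweight lock, but in the
     * race window it can return while the xact is still running.  Here we
     * simulate the transaction finishing after one wait. */
    in_progress[xid] = false;
}

/* Option 2: retry the wait until the xact is provably no longer running. */
static void XactLockTableWaitPatient(TransactionId xid)
{
    while (TransactionIdIsInProgress(xid))
        XactLockTableWait(xid);
}
```

The real version would presumably need some back-off between retries to avoid busy-waiting while the backend is between setting its xid and inserting the lock entry.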

I did a sketch of the first approach just to confirm that it solves the problem.
But there I hold ProcArrayLock during the update of the flag. Since the only
reader is GetRunningTransactionData, it is possible to use a custom lock there.
In that case GetRunningTransactionData would hold three locks simultaneously,
since it already holds ProcArrayLock and XidGenLock =)

Any better ideas?

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

xltw_fix.diff (application/octet-stream)
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 07e4fd85305..1032365f0a5 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -589,6 +589,10 @@ AssignTransactionId(TransactionState s)
 
 	XactLockTableInsert(s->transactionId);
 
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	MyProc->xltwSet = true;
+	LWLockRelease(ProcArrayLock);
+
 	CurrentResourceOwner = currentOwner;
 
 	/*
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index fa9f297ac2d..b2a092641b6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -419,6 +419,10 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 		}
 		else
 			ProcArrayGroupClearXid(proc, latestXid);
+
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->xltwSet = false;
+		LWLockRelease(ProcArrayLock);
 	}
 	else
 	{
@@ -2001,6 +2005,7 @@ GetRunningTransactionData(void)
 	{
 		int			pgprocno = arrayP->pgprocnos[index];
 		volatile PGXACT *pgxact = &allPgXact[pgprocno];
+		volatile PGPROC *proc   = &allProcs[pgprocno];
 		TransactionId xid;
 
 		/* Fetch xid just once - see GetNewTransactionId */
@@ -2010,7 +2015,7 @@ GetRunningTransactionData(void)
 		 * We don't need to store transactions that don't have a TransactionId
 		 * yet because they will not show as running on a standby server.
 		 */
-		if (!TransactionIdIsValid(xid))
+		if (!TransactionIdIsValid(xid) || !proc->xltwSet)
 			continue;
 
 		xids[count++] = xid;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 5f6727d5014..e83e12fa6b5 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -381,6 +381,7 @@ InitProcess(void)
 	MyProc->lwWaitMode = 0;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
+	MyProc->xltwSet = false;
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 205f4845108..e993811c1ef 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -138,6 +138,7 @@ struct PGPROC
 	LOCKMODE	waitLockMode;	/* type of lock we're waiting for */
 	LOCKMASK	heldLocks;		/* bitmask for lock types already held on this
 								 * lock object by this backend */
+	bool		xltwSet;
 
 	/*
 	 * Info to allow us to wait for synchronous replication, if needed.
#8 Robert Haas
robertmhaas@gmail.com
In reply to: Stas Kelvich (#7)
Re: [HACKERS] Issues with logical replication

On Wed, Nov 15, 2017 at 8:20 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:

I did a sketch of the first approach just to confirm that it solves the problem.
But there I hold ProcArrayLock during the update of the flag. Since the only
reader is GetRunningTransactionData, it is possible to use a custom lock there.
In that case GetRunningTransactionData would hold three locks simultaneously,
since it already holds ProcArrayLock and XidGenLock =)

To me, it seems like SnapBuildWaitSnapshot() is fundamentally
misdesigned, and ideally Petr (who wrote the patch) or Andres (who
committed it) ought to get involved here and help fix this problem.
My own first inclination would be to rewrite this as a loop: if the
transaction ID precedes the oldest running XID, then continue; else if
TransactionIdDidCommit() || TransactionIdDidAbort() then conclude that
we don't need to wait; else XactLockTableWait() then loop. That way,
if you hit the race condition, you'll just busy-wait instead of doing
the wrong thing. Maybe insert a sleep(1) if we retry more than once.
That sucks, of course, but it seems like a better idea than trying to
redesign XactLockTableWait() or the procarray, which could affect an
awful lot of other things.
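Robert's proposed loop could look roughly like the sketch below. The stubs stand in for the real transam/lmgr routines; their state and signatures are assumptions for illustration only, and the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

typedef uint32_t TransactionId;

/* Toy stand-ins for the real routines and shared state. */
static TransactionId oldest_running_xid = 100;
static bool committed[1024];
static bool aborted[1024];

static bool TransactionIdPrecedes(TransactionId a, TransactionId b) { return a < b; }
static bool TransactionIdDidCommit(TransactionId xid) { return committed[xid]; }
static bool TransactionIdDidAbort(TransactionId xid)  { return aborted[xid]; }
static void XactLockTableWait(TransactionId xid) { (void) xid; /* may return early in the race window */ }

/*
 * Wait until xid is provably finished.  If XactLockTableWait() returns
 * before the xact has inserted its lock entry, loop and check the clog
 * again instead of concluding anything about the xid.
 */
static void
SnapBuildWaitXid(TransactionId xid)
{
    int retries = 0;

    for (;;)
    {
        if (TransactionIdPrecedes(xid, oldest_running_xid))
            return;             /* older than any running xact */
        if (TransactionIdDidCommit(xid) || TransactionIdDidAbort(xid))
            return;             /* the clog already knows the outcome */
        XactLockTableWait(xid);
        if (retries++ > 0)
            sleep(1);           /* back off if we hit the race repeatedly */
    }
}
```

As Robert notes, hitting the race just turns into a busy-wait (softened by the sleep), which is ugly but local: it does not require touching XactLockTableWait() or the procarray.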

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#8)
Re: [HACKERS] Issues with logical replication

On 2017-11-16 10:36:40 -0500, Robert Haas wrote:

On Wed, Nov 15, 2017 at 8:20 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:

I did a sketch of the first approach just to confirm that it solves the problem.
But there I hold ProcArrayLock during the update of the flag. Since the only
reader is GetRunningTransactionData, it is possible to use a custom lock there.
In that case GetRunningTransactionData would hold three locks simultaneously,
since it already holds ProcArrayLock and XidGenLock =)

To me, it seems like SnapBuildWaitSnapshot() is fundamentally
misdesigned

Maybe I'm confused, but why is it fundamentally misdesigned? It's not
such an absurd idea to wait for an xid in a WAL record. I get that
there's a race condition here, which is obviously bad, but I don't really
see it as evidence of the above claim.

I actually think this code used to be safe because ProcArrayLock used to
be held while generating and logging the running snapshots record. That
was removed when fixing some other bug, but perhaps that shouldn't have
been done...

and ideally Petr (who wrote the patch) or Andres (who committed it)
ought to get involved here and help fix this problem. My own first
inclination would be to rewrite this as a loop: if the transaction ID
precedes the oldest running XID, then continue; else if
TransactionIdDidCommit() || TransactionIdDidAbort() then conclude that
we don't need to wait; else XactLockTableWait() then loop. That way,
if you hit the race condition, you'll just busy-wait instead of doing
the wrong thing. Maybe insert a sleep(1) if we retry more than once.
That sucks, of course, but it seems like a better idea than trying to
redesign XactLockTableWait() or the procarray, which could affect an
awful lot of other things.

Hm. Thinking about this.

Greetings,

Andres Freund

#10 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#9)
Re: [HACKERS] Issues with logical replication

On Thu, Nov 16, 2017 at 2:41 PM, Andres Freund <andres@anarazel.de> wrote:

To me, it seems like SnapBuildWaitSnapshot() is fundamentally
misdesigned

Maybe I'm confused, but why is it fundamentally misdesigned? It's not
such an absurd idea to wait for an xid in a WAL record. I get that
there's a race condition here, which is obviously bad, but I don't really
see it as evidence of the above claim.

I actually think this code used to be safe because ProcArrayLock used to
be held while generating and logging the running snapshots record. That
was removed when fixing some other bug, but perhaps that shouldn't have
been done...

OK. Well, I might be overstating the case. My comment about
fundamental misdesign was really just based on the assumption that
XactLockTableWait() could be used to wait for an XID the instant it
was generated. That was never gonna work and there's no obvious clean
workaround for the problem. Getting snapshot building to work
properly seems to be Hard (TM).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11 Michael Paquier
michael.paquier@gmail.com
In reply to: Robert Haas (#10)
Re: [HACKERS] Issues with logical replication

On Fri, Nov 17, 2017 at 5:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Nov 16, 2017 at 2:41 PM, Andres Freund <andres@anarazel.de> wrote:

To me, it seems like SnapBuildWaitSnapshot() is fundamentally
misdesigned

Maybe I'm confused, but why is it fundamentally misdesigned? It's not
such an absurd idea to wait for an xid in a WAL record. I get that
there's a race condition here, which is obviously bad, but I don't really
see it as evidence of the above claim.

I actually think this code used to be safe because ProcArrayLock used to
be held while generating and logging the running snapshots record. That
was removed when fixing some other bug, but perhaps that shouldn't have
been done...

OK. Well, I might be overstating the case. My comment about
fundamental misdesign was really just based on the assumption that
XactLockTableWait() could be used to wait for an XID the instant it
was generated. That was never gonna work and there's no obvious clean
workaround for the problem. Getting snapshot building to work
properly seems to be Hard (TM).

The patches discussed here deserve tracking, so please note that I
have added an entry in the CF app:
https://commitfest.postgresql.org/16/1381/
--
Michael

#12Craig Ringer
craig@2ndquadrant.com
In reply to: Petr Jelinek (#3)
Re: [HACKERS] Issues with logical replication

On 4 October 2017 at 07:35, Petr Jelinek <petr.jelinek@2ndquadrant.com>
wrote:

On 02/10/17 18:59, Petr Jelinek wrote:

Now fix the trigger function:
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER

AS $$

BEGIN
RETURN NEW;
END $$ LANGUAGE plpgsql;

And manually perform at master two updates inside one transaction:

postgres=# begin;
BEGIN
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
UPDATE 1
postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
UPDATE 1
postgres=# commit;
<hangs>

and in replica log we see:
2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply
worker for subscription "sub" has started
2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible
tuple
2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical
replication worker for subscription 16403 (PID 2954) exited with exit
code 1

Error happens in trigger.c:

#3 0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
trigger.c:3103
#4 0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
#5 0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
at execReplication.c:461
#6 0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
worker.c:736

We have locked the same tuple in RelationFindReplTupleByIndex() just
before this gets called and didn't get the same error. I guess we do
something wrong with snapshots. Will need to investigate more.

Okay, so it's not the snapshot. It's the fact that we don't set
es_output_cid in the replication worker, which GetTupleForTrigger uses
when locking the tuple. The attached one-liner fixes it.

This seems like a clear-cut bug with a simple fix.

Let's get this committed so we don't lose it. The rest of the thread is
going off into the weeds a bit on issues unrelated to the original problem.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13Simon Riggs
simon@2ndquadrant.com
In reply to: Petr Jelinek (#3)
Re: [HACKERS] Issues with logical replication

On 4 October 2017 at 10:35, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 02/10/17 18:59, Petr Jelinek wrote:

Now fix the trigger function:
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
BEGIN
RETURN NEW;
END $$ LANGUAGE plpgsql;

And manually perform at master two updates inside one transaction:

postgres=# begin;
BEGIN
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
UPDATE 1
postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
UPDATE 1
postgres=# commit;
<hangs>

and in replica log we see:
2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply
worker for subscription "sub" has started
2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible
tuple
2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical
replication worker for subscription 16403 (PID 2954) exited with exit
code 1

Error happens in trigger.c:

#3 0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
trigger.c:3103
#4 0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
#5 0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
at execReplication.c:461
#6 0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
worker.c:736

We have locked the same tuple in RelationFindReplTupleByIndex() just
before this gets called and didn't get the same error. I guess we do
something wrong with snapshots. Will need to investigate more.

Okay, so it's not the snapshot. It's the fact that we don't set
es_output_cid in the replication worker, which GetTupleForTrigger uses
when locking the tuple. The attached one-liner fixes it.

Applied, thanks

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Robert Haas (#10)
Re: [HACKERS] Issues with logical replication

Hi,

(sorry for not being active here, I am still catching up after being
away for some family issues)

On 16/11/17 21:12, Robert Haas wrote:

On Thu, Nov 16, 2017 at 2:41 PM, Andres Freund <andres@anarazel.de> wrote:

To me, it seems like SnapBuildWaitSnapshot() is fundamentally
misdesigned

Maybe I'm confused, but why is it fundamentally misdesigned? It's not
such an absurd idea to wait for an xid in a WAL record. I get that
there's a race condition here, which is obviously bad, but I don't really
see it as evidence of the above claim.

I actually think this code used to be safe because ProcArrayLock used to
be held while generating and logging the running snapshots record. That
was removed when fixing some other bug, but perhaps that shouldn't have
been done...

OK. Well, I might be overstating the case. My comment about
fundamental misdesign was really just based on the assumption that
XactLockTableWait() could be used to wait for an XID the instant it
was generated. That was never gonna work and there's no obvious clean
workaround for the problem. Getting snapshot building to work
properly seems to be Hard (TM).

No kidding, we've been at it since 9.4.

But anyway, while sure, we have a race condition because
XactLockTableWait() finishes too early, all that should mean is that we
call LogStandbySnapshot() too early, and as a result taking the snapshot
(and hence slot creation) will take longer, as we'll wait for the next
call of LogStandbySnapshot() from some other caller (because the tx we
care about will still be running). The whole SnapBuildWaitSnapshot() is
just an optimization to make slot creation take less time (and also to
be able to write tests).

What I don't understand is how it leads to a crash (and I could not
reproduce it using the pgbench file attached in this thread either),
and moreover how it leads to a 0 xid being logged. The only explanation
I can come up with is that some kind of similar race has to exist in
LogStandbySnapshot(), but we explicitly check for a 0 xid value there.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#15Stas Kelvich
s.kelvich@postgrespro.ru
In reply to: Petr Jelinek (#14)
1 attachment(s)
Re: [HACKERS] Issues with logical replication

On 29 Nov 2017, at 18:46, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

What I don't understand is how it leads to a crash (and I could not
reproduce it using the pgbench file attached in this thread either),
and moreover how it leads to a 0 xid being logged. The only explanation
I can come up with is that some kind of similar race has to exist in
LogStandbySnapshot(), but we explicitly check for a 0 xid value there.

Zero xid isn’t logged. The loop in XactLockTableWait() does the following:

for (;;)
{
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));

<...>

xid = SubTransGetParent(xid);
}

So if the last statement is reached for a top-level transaction, then the
next iteration will crash on the first assert. And it will be reached if
this whole loop happens before the transaction acquired its heavyweight lock.

The probability of that crash can be significantly increased by adding a
sleep between xid generation and lock insertion in AssignTransactionId().

Attachments:

AssignTransactionId.patch (application/octet-stream)
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 046898c6190..acc3eb64e69 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -569,6 +569,13 @@ AssignTransactionId(TransactionState s)
 	if (!isSubXact)
 		RegisterPredicateLockingXid(s->transactionId);
 
+
+	(void) WaitLatch(MyLatch,
+			WL_LATCH_SET | WL_TIMEOUT,
+			30,
+			WAIT_EVENT_PG_SLEEP);
+	ResetLatch(MyLatch);
+
 	/*
 	 * Acquire lock on the transaction XID.  (We assume this cannot block.) We
 	 * have to ensure that the lock is assigned to the transaction's own
#16Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Stas Kelvich (#15)
Re: [HACKERS] Issues with logical replication

On 29/11/17 20:11, Stas Kelvich wrote:

On 29 Nov 2017, at 18:46, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

What I don't understand is how it leads to a crash (and I could not
reproduce it using the pgbench file attached in this thread either),
and moreover how it leads to a 0 xid being logged. The only explanation
I can come up with is that some kind of similar race has to exist in
LogStandbySnapshot(), but we explicitly check for a 0 xid value there.

Zero xid isn’t logged. The loop in XactLockTableWait() does the following:

for (;;)
{
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));

<...>

xid = SubTransGetParent(xid);
}

So if the last statement is reached for a top-level transaction, then the
next iteration will crash on the first assert. And it will be reached if
this whole loop happens before the transaction acquired its heavyweight lock.

The probability of that crash can be significantly increased by adding a
sleep between xid generation and lock insertion in AssignTransactionId().

Yes, that helps, thanks. Now that I have reproduced it I understand; I was
confused by the backtrace that said xid was 0 on input to
XactLockTableWait(), but that's not the case: it's what xid is changed to
in the inner loop.

So what happens is that we manage to do LogStandbySnapshot(), decode the
logged snapshot, and run SnapBuildWaitSnapshot() for a transaction in
between GetNewTransactionId() and XactLockTableInsert() calls in
AssignTransactionId() for that same transaction.

I guess the probability of this happening is increased by the fact that
GetRunningTransactionData() acquires XidGenLock so if there is
GetNewTransactionId() running in parallel it will wait for it to finish
and we'll log immediately after that.

Hmm, that means that Robert's loop idea will not help, and ProcArrayLock
will not save us either. Maybe we could either rewrite XactLockTableWait
or create another version of it with the SubTransGetParent() call replaced
by SubTransGetTopmostTransaction(), as that will return the same top-level
xid in case the input xid wasn't a subxact. That would make it safe to
call on transactions that haven't acquired a lock on themselves yet.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#16)
Re: [HACKERS] Issues with logical replication

On 2017-11-30 00:25:58 +0100, Petr Jelinek wrote:

Yes that helps thanks. Now that I reproduced it I understand, I was
confused by the backtrace that said xid was 0 on the input to
XactLockTableWait() but that's not the case, it's what xid is changed to
in the inner loop.

So what happens is that we manage to do LogStandbySnapshot(), decode the
logged snapshot, and run SnapBuildWaitSnapshot() for a transaction in
between GetNewTransactionId() and XactLockTableInsert() calls in
AssignTransactionId() for that same transaction.

I guess the probability of this happening is increased by the fact that
GetRunningTransactionData() acquires XidGenLock so if there is
GetNewTransactionId() running in parallel it will wait for it to finish
and we'll log immediately after that.

Hmm that means that Robert's loop idea will not help and ProcArrayLock
will not save us either. Maybe we could either rewrite XactLockTableWait
or create another version of it with SubTransGetParent() call replaced
by SubTransGetTopmostTransaction() as that will return the same top
level xid in case the input xid wasn't a subxact. That would make it
safe to be called on transactions that didn't acquire lock on themselves
yet.

I've not really looked into this deeply, but afair we can just make this
code accept that edge case and be done with it. As the comment says:

 * Iterate through xids in record, wait for all older than the cutoff to
 * finish.  Then, if possible, log a new xl_running_xacts record.
 *
--- highlight ---
 * This isn't required for the correctness of decoding, but to:
--- highlight ---
 * a) allow isolationtester to notice that we're currently waiting for
 *	  something.
 * b) log a new xl_running_xacts record where it'd be helpful, without having
 *	  to wait for bgwriter or checkpointer.

Greetings,

Andres Freund

#18Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#17)
Re: [HACKERS] Issues with logical replication

On 30/11/17 00:40, Andres Freund wrote:

On 2017-11-30 00:25:58 +0100, Petr Jelinek wrote:

Yes that helps thanks. Now that I reproduced it I understand, I was
confused by the backtrace that said xid was 0 on the input to
XactLockTableWait() but that's not the case, it's what xid is changed to
in the inner loop.

So what happens is that we manage to do LogStandbySnapshot(), decode the
logged snapshot, and run SnapBuildWaitSnapshot() for a transaction in
between GetNewTransactionId() and XactLockTableInsert() calls in
AssignTransactionId() for that same transaction.

I guess the probability of this happening is increased by the fact that
GetRunningTransactionData() acquires XidGenLock so if there is
GetNewTransactionId() running in parallel it will wait for it to finish
and we'll log immediately after that.

Hmm that means that Robert's loop idea will not help and ProcArrayLock
will not save us either. Maybe we could either rewrite XactLockTableWait
or create another version of it with SubTransGetParent() call replaced
by SubTransGetTopmostTransaction() as that will return the same top
level xid in case the input xid wasn't a subxact. That would make it
safe to be called on transactions that didn't acquire lock on themselves
yet.

I've not really looked into this deeply, but afair we can just make this
code accept that edge case and be done with it. As the comment says:

* Iterate through xids in record, wait for all older than the cutoff to
* finish.  Then, if possible, log a new xl_running_xacts record.
*
--- highlight ---
* This isn't required for the correctness of decoding, but to:
--- highlight ---
* a) allow isolationtester to notice that we're currently waiting for
*	  something.
* b) log a new xl_running_xacts record where it'd be helpful, without having
*	  to wait for bgwriter or checkpointer.

I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
with it, but the problematic logic happens inside XactLockTableWait(),
and SnapBuildWaitSnapshot() has no way of detecting the situation short
of reimplementing XactLockTableWait() instead of calling it.

The problem is that SubTransGetParent() returns InvalidTransactionId
when the race happens, and SubTransGetParent() is called because
XactLockTableWait() assumes that if the transaction lock was acquired and
the xid is still in progress, the input must have been the xid of a
subtransaction. This is why I suggested replacing that call with
SubTransGetTopmostTransaction(), which will just return the same xid it
got on input, so we'll simply retry the lock instead of crashing.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#19Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#18)
Re: [HACKERS] Issues with logical replication

On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:

I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
with it, but the problematic logic happens inside XactLockTableWait(),
and SnapBuildWaitSnapshot() has no way of detecting the situation short
of reimplementing XactLockTableWait() instead of calling it.

Right. But we can change that fairly trivially. I'm remarking on it
because other people's suggestions (not yours) were aimed at making this
bulletproof. I just wanted to make clear that I don't think that's
necessary at all.

Greetings,

Andres Freund

#20Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#19)
1 attachment(s)
Re: [HACKERS] Issues with logical replication

On 30/11/17 00:47, Andres Freund wrote:

On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:

I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
with it, but the problematic logic happens inside XactLockTableWait(),
and SnapBuildWaitSnapshot() has no way of detecting the situation short
of reimplementing XactLockTableWait() instead of calling it.

Right. But we can change that fairly trivially. I'm remarking on it
because other people's suggestions (not yours) were aimed at making this
bulletproof. I just wanted to make clear that I don't think that's
necessary at all.

Okay, then I guess we are in agreement. I can confirm that the attached
fixes the issue in my tests. Using SubTransGetTopmostTransaction()
instead of SubTransGetParent() means 3 more ifs in terms of extra CPU
cost for other callers. I don't think it's worth worrying about given we
are waiting for a heavyweight lock, but if we did, we could just inline the
code directly into SnapBuildWaitSnapshot().

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Make-XactLockTableWait-work-for-transactions-that-ar.patch (text/x-patch)
From 7d230d6453e2ac6e4c47b2af989dea62e667fc4b Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Thu, 30 Nov 2017 01:18:55 +0100
Subject: [PATCH] Make XactLockTableWait work for transactions that are not yet
 self locked

XactLockTableWait assumed that the xid given on input has already
added itself to the lock table.  This led to another assumption that if
the lock on the xid has succeeded but the xid is still in progress, the
input xid must have been a subtransaction.  These assumptions might not be
true in logical slot snapshot building, on the rare occasion where the
standby snapshot logged contains an xid that has already been added to
the ProcArray but not to the lock table yet.

The new coding will just not assume the xid is a subtransaction anymore,
and will retry the lock with the same toplevel xid it was given before
(courtesy of SubTransGetTopmostTransaction()).

For consistency, the same change is made to
ConditionalXactLockTableWait.
---
 src/backend/storage/lmgr/lmgr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index da5679b7a3..9ba6ea00b1 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -590,7 +590,7 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
 
 		if (!TransactionIdIsInProgress(xid))
 			break;
-		xid = SubTransGetParent(xid);
+		xid = SubTransGetTopmostTransaction(xid);
 	}
 
 	if (oper != XLTW_None)
@@ -622,7 +622,7 @@ ConditionalXactLockTableWait(TransactionId xid)
 
 		if (!TransactionIdIsInProgress(xid))
 			break;
-		xid = SubTransGetParent(xid);
+		xid = SubTransGetTopmostTransaction(xid);
 	}
 
 	return true;
-- 
2.14.1

#21Stas Kelvich
s.kelvich@postgrespro.ru
In reply to: Petr Jelinek (#20)
Re: [HACKERS] Issues with logical replication

On 30 Nov 2017, at 03:30, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 30/11/17 00:47, Andres Freund wrote:

On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:

I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
with it, but the problematic logic happens inside XactLockTableWait(),
and SnapBuildWaitSnapshot() has no way of detecting the situation short
of reimplementing XactLockTableWait() instead of calling it.

Right. But we can change that fairly trivially. I'm remarking on it
because other people's suggestions (not yours) were aimed at making this
bulletproof. I just wanted to make clear that I don't think that's
necessary at all.

Okay, then I guess we are in agreement. I can confirm that the attached
fixes the issue in my tests. Using SubTransGetTopmostTransaction()
instead of SubTransGetParent() means 3 more ifs in terms of extra CPU
cost for other callers. I don't think it's worth worrying about given we
are waiting for a heavyweight lock, but if we did, we could just inline the
code directly into SnapBuildWaitSnapshot().

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
<0001-Make-XactLockTableWait-work-for-transactions-that-ar.patch>

+1

Seems that having a busy loop is the best idea out of the several discussed.

I thought about a small sleep at the bottom of that loop if we reached the
topmost transaction, but taking into account the low probability of that
event, maybe it is faster to just busy-wait.

Also, some clarifying comment in the code would be nice.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#22Simon Riggs
simon@2ndquadrant.com
In reply to: Petr Jelinek (#20)
Re: [HACKERS] Issues with logical replication

On 30 November 2017 at 11:30, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 30/11/17 00:47, Andres Freund wrote:

On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:

I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
with it, but the problematic logic happens inside XactLockTableWait(),
and SnapBuildWaitSnapshot() has no way of detecting the situation short
of reimplementing XactLockTableWait() instead of calling it.

Right. But we can change that fairly trivially. I'm remarking on it
because other people's suggestions (not yours) were aimed at making this
bulletproof. I just wanted to make clear that I don't think that's
necessary at all.

Okay, then I guess we are in agreement. I can confirm that the attached
fixes the issue in my tests. Using SubTransGetTopmostTransaction()
instead of SubTransGetParent() means 3 more ifs in terms of extra CPU
cost for other callers. I don't think it's worth worrying about given we
are waiting for a heavyweight lock, but if we did, we could just inline the
code directly into SnapBuildWaitSnapshot().

This will still fail an Assert in TransactionIdIsInProgress() when
snapshots are overflowed.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#23Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Simon Riggs (#22)
Re: [HACKERS] Issues with logical replication

On 30/11/17 11:48, Simon Riggs wrote:

On 30 November 2017 at 11:30, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 30/11/17 00:47, Andres Freund wrote:

On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:

I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
with it, but the problematic logic happens inside XactLockTableWait(),
and SnapBuildWaitSnapshot() has no way of detecting the situation short
of reimplementing XactLockTableWait() instead of calling it.

Right. But we can change that fairly trivially. I'm remarking on it
because other people's suggestions (not yours) were aimed at making this
bulletproof. I just wanted to make clear that I don't think that's
necessary at all.

Okay, then I guess we are in agreement. I can confirm that the attached
fixes the issue in my tests. Using SubTransGetTopmostTransaction()
instead of SubTransGetParent() means 3 more ifs in terms of extra CPU
cost for other callers. I don't think it's worth worrying about given we
are waiting for a heavyweight lock, but if we did, we could just inline the
code directly into SnapBuildWaitSnapshot().

This will still fail an Assert in TransactionIdIsInProgress() when
snapshots are overflowed.

Hmm, which one, why?

I see 2 Asserts there, one is:

Assert(nxids == 0);

Which is inside the RecoveryInProgress() branch; surely on standbys there
will still be no PGXACTs with assigned xids, so that should be fine.

The other one is:

Assert(TransactionIdIsValid(topxid));

Which should again be fine; the toplevel xid of a toplevel xid is the same
xid, which is a valid one.

So I think we should be fine there.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#24Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Stas Kelvich (#21)
1 attachment(s)
Re: [HACKERS] Issues with logical replication

Stas Kelvich wrote:

Seems that having a busy loop is the best idea out of the several discussed.

I thought about a small sleep at the bottom of that loop if we reached the
topmost transaction, but taking into account the low probability of that
event, maybe it is faster to just busy-wait.

In other locations where we do similar things we have 1ms sleeps. I
agree with the need for a comment here.

Proposed patch attached. I tried your reload.pgb test case in 9.4
(after changing pgoutput to test_decoding and removing the 3rd arg to a
function call), and without the patch the crash takes about 3 seconds on
my machine. No crash with this patch.

Will push this shortly after lunch.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Make-XactLockTableWait-work-for-transactions-that-ar.patch (text/plain; charset=iso-8859-1)
From abca9e98d1ddf86e892bd23bd24878516225a2d1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 3 Jan 2018 12:36:47 -0300
Subject: [PATCH] Make XactLockTableWait work for transactions that are not yet
 self-locked
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

XactLockTableWait assumed that its xid argument has already added itself
to the lock table.  That assumption led to another assumption that if
locking the xid has succeeded but the xid is reported as still in
progress, then the input xid must have been a subtransaction.

These assumptions hold true for the original uses of this code in
locking related to on-disk tuples, but they break down in logical
replication slot snapshot building -- in particular, when a standby
snapshot logged contains an xid that's already in ProcArray but not yet
in the lock table.  This leads to assertion failures that can be
reproduced all the way back to 9.4, when logical decoding was
introduced.

The new coding does not assume that the xid is a subtransaction anymore,
and will retry the lock with same toplevel xid it was given before
(courtesy of SubTransGetTopmostTransaction()).

For consistency, change ConditionalXactLockTableWait the same way.

Author: Petr Jelínek
Discussion: https://postgr.es/m/1B3E32D8-FCF4-40B4-AEF9-5C0E3AC57969@postgrespro.ru
Reported-by: Konstantin Knizhnik
Diagnosed-by: Stas Kelvich, Petr Jelínek
Reviewed-by: Andres Freund, Robert Haas
---
 src/backend/storage/lmgr/lmgr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 3754283d2b..c184f1f956 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -590,7 +590,7 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
 
 		if (!TransactionIdIsInProgress(xid))
 			break;
-		xid = SubTransGetParent(xid);
+		xid = SubTransGetTopmostTransaction(xid);
 	}
 
 	if (oper != XLTW_None)
@@ -622,7 +622,7 @@ ConditionalXactLockTableWait(TransactionId xid)
 
 		if (!TransactionIdIsInProgress(xid))
 			break;
-		xid = SubTransGetParent(xid);
+		xid = SubTransGetTopmostTransaction(xid);
 	}
 
 	return true;
-- 
2.11.0

#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#24)
Re: [HACKERS] Issues with logical replication

Alvaro Herrera wrote:

Will push this shortly after lunch.

Pushed. Will you (Konstantin, Stas, Masahiko) please verify that after
this commit all the problems reported with logical replication are
fixed?

Thanks

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#26Stas Kelvich
s.kelvich@postgrespro.ru
In reply to: Alvaro Herrera (#25)
Re: [HACKERS] Issues with logical replication

On 3 Jan 2018, at 23:35, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Pushed. Will you (Konstantin, Stas, Masahiko) please verify that after
this commit all the problems reported with logical replication are
fixed?

Checked that both with and without the extra sleep in AssignTransactionId().
In both cases the patch works.

Thanks!

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company