A failure in t/001_rep_changes.pl

Started by Bharath Rupireddy almost 2 years ago, 3 messages
#1 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com

Hi,

I recently observed an assertion failure twice in t/001_rep_changes.pl
on HEAD, with the backtrace [1], on my dev EC2 c5.4xlarge instance [2].
Unfortunately, I have not been able to reproduce it since, and I haven't
had a chance to dig into it. I'm posting it here for the record, in case
something can be derived from the backtrace.

[1]: t/001_rep_changes.pl

2024-01-31 12:24:38.474 UTC [840166]
pg_16435_sync_16393_7330237333761601891 STATEMENT:
DROP_REPLICATION_SLOT pg_16435_sync_16393_7330237333761601891 WAIT
TRAP: failed Assert("list->head != INVALID_PGPROCNO"), File:
"../../../../src/include/storage/proclist.h", Line: 101, PID: 840166
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ExceptionalCondition+0xbb)[0x55c8edf6b8f9]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x6637de)[0x55c8edd517de]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ConditionVariablePrepareToSleep+0x85)[0x55c8edd51b91]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ReplicationSlotAcquire+0x142)[0x55c8edcead6b]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ReplicationSlotDrop+0x51)[0x55c8edceb47f]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x60da71)[0x55c8edcfba71]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(exec_replication_command+0x47e)[0x55c8edcfc96a]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(PostgresMain+0x7df)[0x55c8edd7d644]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x5ab50c)[0x55c8edc9950c]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x5aab21)[0x55c8edc98b21]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x5a70de)[0x55c8edc950de]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(PostmasterMain+0x1534)[0x55c8edc949db]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x459c47)[0x55c8edb47c47]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f19fe629d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f19fe629e40]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(_start+0x25)[0x55c8ed7c4565]
2024-01-31 12:24:38.476 UTC [840168]
pg_16435_sync_16390_7330237333761601891 LOG: statement: SELECT
a.attnum, a.attname, a.atttypid, a.attnum =
ANY(i.indkey) FROM pg_catalog.pg_attribute a LEFT JOIN
pg_catalog.pg_index i ON (i.indexrelid =
pg_get_replica_identity_index(16391)) WHERE a.attnum >
0::pg_catalog.int2 AND NOT a.attisdropped AND a.attgenerated = ''
AND a.attrelid = 16391 ORDER BY a.attnum

[2]: Linux ip-000-00-0-000 6.2.0-1018-aws #18~22.04.1-Ubuntu SMP Wed Jan 10 22:54:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#2 vignesh C
vignesh21@gmail.com
In reply to: Bharath Rupireddy (#1)
Re: A failure in t/001_rep_changes.pl

On Wed, 14 Feb 2024 at 13:19, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

> Hi,
>
> I recently observed an assertion failure twice in t/001_rep_changes.pl
> on HEAD, with the backtrace [1], on my dev EC2 c5.4xlarge instance [2].
> Unfortunately, I have not been able to reproduce it since, and I haven't
> had a chance to dig into it. I'm posting it here for the record, in case
> something can be derived from the backtrace.

By any chance, do you have the log files from when this failure
occurred? If so, please share them.

Regards,
Vignesh

#3 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: vignesh C (#2)
Re: A failure in t/001_rep_changes.pl

At Fri, 23 Feb 2024 15:50:21 +0530, vignesh C <vignesh21@gmail.com> wrote in

> By any chance, do you have the log files from when this failure
> occurred? If so, please share them.

In my understanding, within a single instance, no two proclists can
simultaneously share the same waitlink member of PGPROC.

On the other hand, a publisher uses two condition variables, one for
slots and one for WAL waiting, and both work on the same PGPROC member,
cvWaitLink. I suspect this issue arises from this configuration.
Separately, although it is unlikely to be related to this specific
issue, a similar problem can arise in instances that function as both
logical publisher and physical primary.

Regardless of this issue, I think we should provide separate waitlink
members for condition variables that can possibly be used
simultaneously.
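
To make the concern concrete, here is a minimal toy model in Python. It is not the PostgreSQL code: ProcList, new_links, run, and the UNLINKED marker are hypothetical names, and the list logic is only loosely patterned on an index-linked list such as a proclist. It is also not a claim about how the reported assertion was actually reached. It only shows that when two lists thread through one link member, removing a process from one list damages both, whereas one link member per list avoids that.

```python
INVALID = -1   # "no process" sentinel, standing in for INVALID_PGPROCNO
UNLINKED = -2  # toy marker (assumption): this link is not on any list


def new_links(nprocs):
    """One [prev, next] link pair per process."""
    return [[UNLINKED, UNLINKED] for _ in range(nprocs)]


class ProcList:
    """Toy doubly linked list of process numbers (hypothetical, not proclist.h)."""

    def __init__(self, links):
        self.links = links  # link storage; may be shared with another list
        self.head = INVALID
        self.tail = INVALID

    def push_tail(self, procno):
        if self.tail == INVALID:
            self.links[procno][:] = [INVALID, INVALID]
            self.head = self.tail = procno
        else:
            self.links[procno][:] = [self.tail, INVALID]
            self.links[self.tail][1] = procno
            self.tail = procno

    def delete(self, procno):
        prev, nxt = self.links[procno]
        if prev == INVALID:
            self.head = nxt
        else:
            self.links[prev][1] = nxt
        if nxt == INVALID:
            self.tail = prev
        else:
            self.links[nxt][0] = prev
        self.links[procno][:] = [UNLINKED, UNLINKED]

    def members(self):
        """Walk the list from head, following next links."""
        out, procno = [], self.head
        while procno >= 0:
            out.append(procno)
            procno = self.links[procno][1]
        return out


def run(slot_links, wal_links):
    """Proc 0 waits on both lists at once, then leaves only the slot list."""
    slot_waiters, wal_waiters = ProcList(slot_links), ProcList(wal_links)
    slot_waiters.push_tail(0)
    wal_waiters.push_tail(1)
    wal_waiters.push_tail(0)   # with shared links, this overwrites proc 0's link
    slot_waiters.delete(0)
    return slot_waiters, wal_waiters


shared = new_links(2)
slot_bad, wal_bad = run(shared, shared)            # one link member for both lists
slot_ok, wal_ok = run(new_links(2), new_links(2))  # one link member per list

print(slot_bad.head, slot_bad.tail, wal_bad.members(), wal_bad.tail)  # 0 1 [1] 0
print(slot_ok.head, slot_ok.tail, wal_ok.members(), wal_ok.tail)      # -1 -1 [1, 0] 0
```

In the shared case the slot list ends up with a head it already removed and a tail it never contained, and the WAL list loses proc 0 while still naming it as tail. With separate link storage, the slot list is cleanly empty and the WAL list still holds both waiters.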

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center