Truncate in synchronous logical replication failed

Started by tanghy.fnst@fujitsu.comabout 5 years ago40 messageshackers
Jump to latest
#1tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com

Hi

I met a problem in synchronous logical replication. The client hangs when TRUNCATE TABLE at publisher.

Example of the procedure:
------publisher------
create table test (a int primary key);
create publication pub for table test;

------subscriber------
create table test (a int primary key);
create subscription sub connection 'dbname=postgres' publication pub;

Then, set synchronous_standby_names = 'sub’ on publisher, and reload publisher.

------publisher------
truncate test;

Then the client of publisher will wait for a long time. A moment later, the publisher and subscriber will report following errors.
Subscriber log
2021-04-07 12:13:07.700 CST [3542235] logical replication worker ERROR: terminating logical replication worker due to timeout
2021-04-07 12:13:07.722 CST [3542217] postmaster LOG: background worker "logical replication worker" (PID 3542235) exited with exit code 1
2021-04-07 12:13:07.723 CST [3542357] logical replication worker LOG: logical replication apply worker for subscription "sub" has started
2021-04-07 12:13:07.745 CST [3542357] logical replication worker ERROR: could not start WAL streaming: ERROR: replication slot "sub" is active for PID 3542236
Publisher log
2021-04-07 12:13:07.745 CST [3542358] walsender ERROR: replication slot "sub" is active for PID 3542236
2021-04-07 12:13:07.745 CST [3542358] walsender STATEMENT: START_REPLICATION SLOT "sub" LOGICAL 0/169ECE8 (proto_version '2', publication_names '"pub"')

I checked the PG-DOC, found it says that “Replication of TRUNCATE commands is supported”[1]https://www.postgresql.org/docs/devel/logical-replication-restrictions.html, so maybe TRUNCATE is not supported in synchronous logical replication?

If my understanding is right, maybe PG-DOC can be modified like this. Any thought?
Replication of TRUNCATE commands is supported
->
Replication of TRUNCATE commands is supported in asynchronous mode

[1]: https://www.postgresql.org/docs/devel/logical-replication-restrictions.html

Regards,
Tang

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: tanghy.fnst@fujitsu.com (#1)
Re: Truncate in synchronous logical replication failed

On Wed, Apr 7, 2021 at 12:26 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

Hi

I met a problem in synchronous logical replication. The client hangs when TRUNCATE TABLE at publisher.

Can you please check if the behavior is the same for PG-13? This is
just to ensure that we have not introduced any bug in PG-14.

--
With Regards,
Amit Kapila.

#3tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Amit Kapila (#2)
RE: Truncate in synchronous logical replication failed

On Wednesday, April 7, 2021 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote

Can you please check if the behavior is the same for PG-13? This is
just to ensure that we have not introduced any bug in PG-14.

Yes, same failure happens at PG-13, too.

Regards,
Tang

#4Japin Li
japinli@hotmail.com
In reply to: tanghy.fnst@fujitsu.com (#3)
Re: Truncate in synchronous logical replication failed

On Wed, 07 Apr 2021 at 16:34, tanghy.fnst@fujitsu.com <tanghy.fnst@fujitsu.com> wrote:

On Wednesday, April 7, 2021 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote

Can you please check if the behavior is the same for PG-13? This is
just to ensure that we have not introduced any bug in PG-14.

Yes, same failure happens at PG-13, too.

I found that when we truncate a table in synchronous logical replication,
LockAcquireExtended() [1]if (EligibleForRelationFastPath(locktag, lockmode) && FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND) { uint32 fasthashcode = FastPathStrongLockHashPartition(hashcode); bool acquired; will try to take a lock via fast path and it
failed (FastPathStrongRelationLocks->count[fasthashcode] = 1).
However, it can acquire the lock when in asynchronous logical replication.
I'm not familiar with the locks, any suggestions? What the difference
between sync and async logical replication for locks?

[1]: if (EligibleForRelationFastPath(locktag, lockmode) && FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND) { uint32 fasthashcode = FastPathStrongLockHashPartition(hashcode); bool acquired;
if (EligibleForRelationFastPath(locktag, lockmode) &&
FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND)
{
uint32 fasthashcode = FastPathStrongLockHashPartition(hashcode);
bool acquired;

/*
* LWLockAcquire acts as a memory sequencing point, so it's safe to
* assume that any strong locker whose increment to
* FastPathStrongRelationLocks->counts becomes visible after we test
* it has yet to begin to transfer fast-path locks.
*/
LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
if (FastPathStrongRelationLocks->count[fasthashcode] != 0)
acquired = false;
else
acquired = FastPathGrantRelationLock(locktag->locktag_field2,
lockmode);
LWLockRelease(&MyProc->fpInfoLock);
if (acquired)
{
/*
* The locallock might contain stale pointers to some old shared
* objects; we MUST reset these to null before considering the
* lock to be acquired via fast-path.
*/
locallock->lock = NULL;
locallock->proclock = NULL;
GrantLockLocal(locallock, owner);
return LOCKACQUIRE_OK;
}
}

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#5Japin Li
japinli@hotmail.com
In reply to: Japin Li (#4)
Re: Truncate in synchronous logical replication failed

On Thu, 08 Apr 2021 at 19:20, Japin Li <japinli@hotmail.com> wrote:

On Wed, 07 Apr 2021 at 16:34, tanghy.fnst@fujitsu.com <tanghy.fnst@fujitsu.com> wrote:

On Wednesday, April 7, 2021 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote

Can you please check if the behavior is the same for PG-13? This is
just to ensure that we have not introduced any bug in PG-14.

Yes, same failure happens at PG-13, too.

I found that when we truncate a table in synchronous logical replication,
LockAcquireExtended() [1] will try to take a lock via fast path and it
failed (FastPathStrongRelationLocks->count[fasthashcode] = 1).
However, it can acquire the lock when in asynchronous logical replication.
I'm not familiar with the locks, any suggestions? What the difference
between sync and async logical replication for locks?

After some analyze, I find that when the TRUNCATE finish, it will call
SyncRepWaitForLSN(), for asynchronous logical replication, it will exit
early, and then it calls ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS) to
release the locks, so the walsender can acquire the lock.

But for synchronous logical replication, SyncRepWaitForLSN() will wait
for specified LSN to be confirmed, so it cannot release the lock, and
the walsender try to acquire the lock. Obviously, it cannot acquire the
lock, because the lock hold by the process which performs TRUNCATE
command. This is why the TRUNCATE in synchronous logical replication is
blocked.

I don't know if it makes sense to fix this, if so, how to do fix it?
Thoughts?

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#6osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Japin Li (#5)
RE: Truncate in synchronous logical replication failed

Hi

On Saturday, April 10, 2021 11:52 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 08 Apr 2021 at 19:20, Japin Li <japinli@hotmail.com> wrote:

On Wed, 07 Apr 2021 at 16:34, tanghy.fnst@fujitsu.com

<tanghy.fnst@fujitsu.com> wrote:

On Wednesday, April 7, 2021 5:28 PM Amit Kapila
<amit.kapila16@gmail.com> wrote

Can you please check if the behavior is the same for PG-13? This is
just to ensure that we have not introduced any bug in PG-14.

Yes, same failure happens at PG-13, too.

I found that when we truncate a table in synchronous logical
replication,
LockAcquireExtended() [1] will try to take a lock via fast path and it
failed (FastPathStrongRelationLocks->count[fasthashcode] = 1).
However, it can acquire the lock when in asynchronous logical replication.
I'm not familiar with the locks, any suggestions? What the difference
between sync and async logical replication for locks?

After some analyze, I find that when the TRUNCATE finish, it will call
SyncRepWaitForLSN(), for asynchronous logical replication, it will exit early,
and then it calls ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS) to
release the locks, so the walsender can acquire the lock.

But for synchronous logical replication, SyncRepWaitForLSN() will wait for
specified LSN to be confirmed, so it cannot release the lock, and the
walsender try to acquire the lock. Obviously, it cannot acquire the lock,
because the lock hold by the process which performs TRUNCATE command.
This is why the TRUNCATE in synchronous logical replication is blocked.

Yeah, the TRUNCATE waits in SyncRepWaitForLSN() while
the walsender is blocked by the AccessExclusiveLock taken by it,
which makes the subscriber cannot take the change and leads to a sort of deadlock.

On Wednesday, April 7, 2021 3:56 PM tanghy.fnst@fujitsu.com <tanghy.fnst@fujitsu.com> wrote:

I checked the PG-DOC, found it says that “Replication of TRUNCATE
commands is supported”[1], so maybe TRUNCATE is not supported in
synchronous logical replication?

If my understanding is right, maybe PG-DOC can be modified like this. Any
thought?
Replication of TRUNCATE commands is supported
->
Replication of TRUNCATE commands is supported in asynchronous mode

I'm not sure if this becomes the final solution,
but if we take a measure to fix the doc, we have to be careful for the description,
because when we remove the primary keys of 'test' tables on the scenario in [1]/messages/by-id/OS0PR01MB6113C2499C7DC70EE55ADB82FB759@OS0PR01MB6113.jpnprd01.prod.outlook.com, we don't have this issue.
It means TRUNCATE in synchronous logical replication is not always blocked.

Having the primary key on the pub only causes the hang.
Also, I can observe the same hang using REPLICA IDENTITY USING INDEX and without primary key on the pub,
while I cannot reproduce the problem with the REPLICA IDENTITY FULL and without primary key.
This difference comes from logicalrep_write_attrs() which has a branch to call RelationGetIndexAttrBitmap().
Therefore, the description above is not correct, strictly speaking, I thought.

I'll share my analysis when I get a better idea to address this.

[1]: /messages/by-id/OS0PR01MB6113C2499C7DC70EE55ADB82FB759@OS0PR01MB6113.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#6)
Re: Truncate in synchronous logical replication failed

On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

I checked the PG-DOC, found it says that “Replication of TRUNCATE
commands is supported”[1], so maybe TRUNCATE is not supported in
synchronous logical replication?

If my understanding is right, maybe PG-DOC can be modified like this. Any
thought?
Replication of TRUNCATE commands is supported
->
Replication of TRUNCATE commands is supported in asynchronous mode

I'm not sure if this becomes the final solution,

I think unless the solution is not possible or extremely complicated
going via this route doesn't seem advisable.

but if we take a measure to fix the doc, we have to be careful for the description,
because when we remove the primary keys of 'test' tables on the scenario in [1], we don't have this issue.
It means TRUNCATE in synchronous logical replication is not always blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

--
With Regards,
Amit Kapila.

#8osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#7)
RE: Truncate in synchronous logical replication failed

On Monday, April 12, 2021 3:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

but if we take a measure to fix the doc, we have to be careful for the
description, because when we remove the primary keys of 'test' tables on the

scenario in [1], we don't have this issue.

It means TRUNCATE in synchronous logical replication is not always

blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that information
which locks the required indexes. Now, because TRUNCATE has already
acquired an exclusive lock on the index, it seems to create a sort of deadlock
where the actual Truncate operation waits for logical replication of operation to
complete and logical replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build IDENTITY_KEY
attributes? During decoding, we don't even lock the main relation, we just scan
the system table and build that information using a historic snapshot. Can't we
do something similar here?

I think we can build the IDENTITY_KEY attributes with NoLock
instead of calling RelationGetIndexAttrBitmap().

When we trace back the caller side of logicalrep_write_attrs(),
doing the thing equivalent to RelationGetIndexAttrBitmap()
for INDEX_ATTR_BITMAP_IDENTITY_KEY impacts only pgoutput_truncate.

OTOH, I can't find codes similar to RelationGetIndexAttrBitmap()
in pgoutput_* functions and in the file of relcache.c.
Therefore, I'd like to discuss how to address the hang.

My first idea is to extract some parts of RelationGetIndexAttrBitmap()
only for INDEX_ATTR_BITMAP_IDENTITY_KEY and implement those
either in a logicalrep_write_attrs() or as a new function.
RelationGetIndexAttrBitmap() has 'restart' label for goto statement
in order to ensure to return up-to-date attribute bitmaps, so
I prefer having a new function when we choose this direction.
Having that goto in logicalrep_write_attrs() makes it a little bit messy, I felt.

The other direction might be to extend RelationGetIndexAttrBitmap's function definition
to accept lockmode to give NoLock from logicalrep_write_attrs().
But, this change impacts on other several callers so is not as good as the first direction above, I think.

If someone has any better idea, please let me know.

Best Regards,
Takamichi Osumi

#9Petr Jelinek
petr@2ndquadrant.com
In reply to: Amit Kapila (#7)
Re: Truncate in synchronous logical replication failed

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

I checked the PG-DOC, found it says that “Replication of TRUNCATE
commands is supported”[1], so maybe TRUNCATE is not supported in
synchronous logical replication?

If my understanding is right, maybe PG-DOC can be modified like this. Any
thought?
Replication of TRUNCATE commands is supported
->
Replication of TRUNCATE commands is supported in asynchronous mode

I'm not sure if this becomes the final solution,

I think unless the solution is not possible or extremely complicated
going via this route doesn't seem advisable.

but if we take a measure to fix the doc, we have to be careful for the description,
because when we remove the primary keys of 'test' tables on the scenario in [1], we don't have this issue.
It means TRUNCATE in synchronous logical replication is not always blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason. I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

--
Petr

#10Japin Li
japinli@hotmail.com
In reply to: osumi.takamichi@fujitsu.com (#8)
Re: Truncate in synchronous logical replication failed

On Tue, 13 Apr 2021 at 21:54, osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:

On Monday, April 12, 2021 3:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

but if we take a measure to fix the doc, we have to be careful for the
description, because when we remove the primary keys of 'test' tables on the

scenario in [1], we don't have this issue.

It means TRUNCATE in synchronous logical replication is not always

blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that information
which locks the required indexes. Now, because TRUNCATE has already
acquired an exclusive lock on the index, it seems to create a sort of deadlock
where the actual Truncate operation waits for logical replication of operation to
complete and logical replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build IDENTITY_KEY
attributes? During decoding, we don't even lock the main relation, we just scan
the system table and build that information using a historic snapshot. Can't we
do something similar here?

I think we can build the IDENTITY_KEY attributes with NoLock
instead of calling RelationGetIndexAttrBitmap().

When we trace back the caller side of logicalrep_write_attrs(),
doing the thing equivalent to RelationGetIndexAttrBitmap()
for INDEX_ATTR_BITMAP_IDENTITY_KEY impacts only pgoutput_truncate.

OTOH, I can't find codes similar to RelationGetIndexAttrBitmap()
in pgoutput_* functions and in the file of relcache.c.
Therefore, I'd like to discuss how to address the hang.

My first idea is to extract some parts of RelationGetIndexAttrBitmap()
only for INDEX_ATTR_BITMAP_IDENTITY_KEY and implement those
either in a logicalrep_write_attrs() or as a new function.
RelationGetIndexAttrBitmap() has 'restart' label for goto statement
in order to ensure to return up-to-date attribute bitmaps, so
I prefer having a new function when we choose this direction.
Having that goto in logicalrep_write_attrs() makes it a little bit messy, I felt.

The other direction might be to extend RelationGetIndexAttrBitmap's function definition
to accept lockmode to give NoLock from logicalrep_write_attrs().
But, this change impacts on other several callers so is not as good as the first direction above, I think.

If someone has any better idea, please let me know.

I think the first idea is better than the second. OTOH, can we release the
locks before SyncRepWaitForLSN(), since it already flush to local WAL files.

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#11osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Petr Jelinek (#9)
RE: Truncate in synchronous logical replication failed

On Tuesday, April 13, 2021 11:38 PM Petr Jelinek <petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

I checked the PG-DOC, found it says that “Replication of TRUNCATE
commands is supported”[1], so maybe TRUNCATE is not supported in
synchronous logical replication?

If my understanding is right, maybe PG-DOC can be modified like
this. Any thought?
Replication of TRUNCATE commands is supported
->
Replication of TRUNCATE commands is supported in asynchronous

mode

I'm not sure if this becomes the final solution,

I think unless the solution is not possible or extremely complicated
going via this route doesn't seem advisable.

but if we take a measure to fix the doc, we have to be careful for
the description, because when we remove the primary keys of 'test' tables

on the scenario in [1], we don't have this issue.

It means TRUNCATE in synchronous logical replication is not always

blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other
reason.I am not sure what exact locking we need but I would have guessed at
least AccessShareLock would be needed.

This was true.

Having a look at the comment of index_open(), there's a description of basic rule
that NoLock should be used if appropriate lock on the index is already taken.
And, making the walsender use NoLock to build the attributes
leads us to the Assert in the relation_open().

Please ignore the two ideas I suggested in another mail,
which doesn't follow the basic and work.

Best Regards,
Takamichi Osumi

#12osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Japin Li (#10)
RE: Truncate in synchronous logical replication failed

On Wednesday, April 14, 2021 11:38 AM Japin Li <japinli@hotmail.com> wrote:

On Tue, 13 Apr 2021 at 21:54, osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Monday, April 12, 2021 3:58 PM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

but if we take a measure to fix the doc, we have to be careful for
the description, because when we remove the primary keys of 'test'
tables on the

scenario in [1], we don't have this issue.

It means TRUNCATE in synchronous logical replication is not always

blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical replication waits

for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

I think we can build the IDENTITY_KEY attributes with NoLock instead
of calling RelationGetIndexAttrBitmap().

When we trace back the caller side of logicalrep_write_attrs(), doing
the thing equivalent to RelationGetIndexAttrBitmap() for
INDEX_ATTR_BITMAP_IDENTITY_KEY impacts only pgoutput_truncate.

OTOH, I can't find codes similar to RelationGetIndexAttrBitmap() in
pgoutput_* functions and in the file of relcache.c.
Therefore, I'd like to discuss how to address the hang.

My first idea is to extract some parts of RelationGetIndexAttrBitmap()
only for INDEX_ATTR_BITMAP_IDENTITY_KEY and implement those either

in

a logicalrep_write_attrs() or as a new function.
RelationGetIndexAttrBitmap() has 'restart' label for goto statement in
order to ensure to return up-to-date attribute bitmaps, so I prefer
having a new function when we choose this direction.
Having that goto in logicalrep_write_attrs() makes it a little bit messy, I felt.

The other direction might be to extend RelationGetIndexAttrBitmap's
function definition to accept lockmode to give NoLock from

logicalrep_write_attrs().

But, this change impacts on other several callers so is not as good as the first

direction above, I think.

If someone has any better idea, please let me know.

I think the first idea is better than the second. OTOH, can we release the locks
before SyncRepWaitForLSN(), since it already flush to local WAL files.

Thank you for your comments.
I didn't mean to change and touch TRUNCATE side to release the locks,
because I expected that its AccessExclusiveLock
protects any other operation (e.g. DROP INDEX) to the table
which affects IDENTITY KEY building. But, now as I said in another e-mail,
both ideas above can't work. Really sorry for making noises.

Best Regards,
Takamichi Osumi

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Petr Jelinek (#9)
Re: Truncate in synchronous logical replication failed

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work? Am, I
missing something?

--
With Regards,
Amit Kapila.

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#13)
Re: Truncate in synchronous logical replication failed

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex. We can
then cache it in relation->rd_idattr. I am not sure if it is really
required to do all the other work in RelationGetIndexAttrBitmap.

--
With Regards,
Amit Kapila.

#15Japin Li
japinli@hotmail.com
In reply to: Amit Kapila (#14)
Re: Truncate in synchronous logical replication failed

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex.

Sorry, I don't know how can we build the idattr without open the index.
If we should open the index, then we should use NoLock, since the TRUNCATE
side hold AccessExclusiveLock. As Osumi points out in [1]/messages/by-id/OSBPR01MB488834BDBD67BFF2FB048B3DED4E9@OSBPR01MB4888.jpnprd01.prod.outlook.com, The NoLock mode
assumes that the appropriate lock on the index is already taken.

Please let me know if I have misunderstood.

[1]: /messages/by-id/OSBPR01MB488834BDBD67BFF2FB048B3DED4E9@OSBPR01MB4888.jpnprd01.prod.outlook.com

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Japin Li (#15)
Re: Truncate in synchronous logical replication failed

On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex.

Sorry, I don't know how can we build the idattr without open the index.
If we should open the index, then we should use NoLock, since the TRUNCATE
side hold AccessExclusiveLock. As Osumi points out in [1], The NoLock mode
assumes that the appropriate lock on the index is already taken.

Why can't we use RelationIdGetRelation() by passing the required
indexOid to it?

--
With Regards,
Amit Kapila.

#17Japin Li
japinli@hotmail.com
In reply to: Amit Kapila (#16)
Re: Truncate in synchronous logical replication failed

On Thu, 15 Apr 2021 at 19:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex.

Sorry, I don't know how can we build the idattr without open the index.
If we should open the index, then we should use NoLock, since the TRUNCATE
side hold AccessExclusiveLock. As Osumi points out in [1], The NoLock mode
assumes that the appropriate lock on the index is already taken.

Why can't we use RelationIdGetRelation() by passing the required
indexOid to it?

Thanks for your reminder. It might be a way to solve this problem.
I'll try it later.

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#18Japin Li
japinli@hotmail.com
In reply to: Amit Kapila (#16)
Re: Truncate in synchronous logical replication failed

On Thu, 15 Apr 2021 at 19:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex.

Sorry, I don't know how can we build the idattr without open the index.
If we should open the index, then we should use NoLock, since the TRUNCATE
side hold AccessExclusiveLock. As Osumi points out in [1], The NoLock mode
assumes that the appropriate lock on the index is already taken.

Why can't we use RelationIdGetRelation() by passing the required
indexOid to it?

Hi Amit, as your suggested, I try to use RelationIdGetRelation() replace
the index_open() to avoid specify the AccessSharedLock, then the TRUNCATE
will not be blocked.

Attached patch fixed it. Thoughts?

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

Attachments:

truncate-table-in-synchronous-logical-replication.patchtext/x-patchDownload+28-3
#19osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Japin Li (#17)
RE: Truncate in synchronous logical replication failed

Hi

On Friday, April 16, 2021 11:02 AM Japin Li <japinli@hotmail.com>

On Thu, 15 Apr 2021 at 19:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila

<amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com>

wrote:

The problem happens only when we try to fetch IDENTITY_KEY
attributes because pgoutput uses RelationGetIndexAttrBitmap()
to get that information which locks the required indexes. Now,
because TRUNCATE has already acquired an exclusive lock on the
index, it seems to create a sort of deadlock where the actual
Truncate operation waits for logical replication of operation
to complete and logical replication waits for actual Truncate

operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock
the main relation, we just scan the system table and build
that information using a historic snapshot. Can't we do something

similar here?

Adding Petr J. and Peter E. to know their views as this seems
to be an old problem (since the decoding of Truncate operation is

introduced).

We used RelationGetIndexAttrBitmap because it already existed, no

other reason.

Fair enough. But I think we should do something about it because
using the same (RelationGetIndexAttrBitmap) just breaks the
synchronous logical replication. I think this is broken since the
logical replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed

at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so,
why is it different from how we access the relation during
decoding (basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think
we do it that way because we need it to process WAL entries and we
need the relation state based on the historic snapshot, so, even
if the relation is later changed/dropped, we are fine with the old
state we got. Isn't the same thing applies here in
logicalrep_write_attrs? If that is true then some equivalent of
RelationGetIndexAttrBitmap where we use RelationIdGetRelation

instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call
RelationGetIndexList and then build the idattr list for

relation->rd_replidindex.

Sorry, I don't know how can we build the idattr without open the index.
If we should open the index, then we should use NoLock, since the
TRUNCATE side hold AccessExclusiveLock. As Osumi points out in [1],
The NoLock mode assumes that the appropriate lock on the index is

already taken.

Why can't we use RelationIdGetRelation() by passing the required
indexOid to it?

Thanks for your reminder. It might be a way to solve this problem.

Yeah. I've made the 1st patch for this issue.

In my env, with the patch
the TRUNCATE in synchronous logical replication doesn't hang.
It's OK with make check-world as well.

Best Regards,
Takamichi Osumi

Attachments:

Truncate_in_synchronous_logical_replication_v01.patchapplication/octet-stream; name=Truncate_in_synchronous_logical_replication_v01.patchDownload+101-2
#20Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#19)
Re: Truncate in synchronous logical replication failed

On Fri, Apr 16, 2021 at 12:56 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

Thanks for your reminder. It might be a way to solve this problem.

Yeah. I've made the 1st patch for this issue.

In my env, with the patch
the TRUNCATE in synchronous logical replication doesn't hang.

Few initial comments:
=====================
1.
+ relreplindex = relation->rd_replidindex;
+
+ /*
+ * build attributes to idindexattrs.
+ */
+ idindexattrs = NULL;
+ foreach(l, indexoidlist)
+ {
+ Oid indexOid = lfirst_oid(l);
+ Relation indexDesc;
+ int i;
+ bool isIDKey; /* replica identity index */
+
+ indexDesc = RelationIdGetRelation(indexOid);

When you have oid of replica identity index (relreplindex) then what
is the need to traverse all the indexes?

2.
It is better to name the function as RelationGet...

--
With Regards,
Amit Kapila.

#21Amit Kapila
amit.kapila16@gmail.com
In reply to: Japin Li (#18)
#22osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#20)
#23Japin Li
japinli@hotmail.com
In reply to: Amit Kapila (#21)
#24Amit Kapila
amit.kapila16@gmail.com
In reply to: Japin Li (#23)
#25Japin Li
japinli@hotmail.com
In reply to: osumi.takamichi@fujitsu.com (#22)
#26osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Japin Li (#25)
#27Japin Li
japinli@hotmail.com
In reply to: osumi.takamichi@fujitsu.com (#26)
#28Ajin Cherian
itsajin@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#26)
#29osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Ajin Cherian (#28)
#30Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#29)
#31shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Amit Kapila (#30)
#32osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#30)
#33Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#32)
#34osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#33)
#35osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: osumi.takamichi@fujitsu.com (#34)
#36Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#35)
#37osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#36)
#38Japin Li
japinli@hotmail.com
In reply to: Amit Kapila (#36)
#39Amit Kapila
amit.kapila16@gmail.com
In reply to: Japin Li (#38)
#40tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Amit Kapila (#39)