Truncate in synchronous logical replication failed

Started by tanghy.fnst@fujitsu.com · almost 5 years ago · 40 messages
#1 tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com

Hi

I met a problem in synchronous logical replication: the client hangs when running TRUNCATE TABLE at the publisher.

Example of the procedure:
------publisher------
create table test (a int primary key);
create publication pub for table test;

------subscriber------
create table test (a int primary key);
create subscription sub connection 'dbname=postgres' publication pub;

Then, set synchronous_standby_names = 'sub' on the publisher, and reload the publisher.

------publisher------
truncate test;

Then the client at the publisher will wait for a long time. A moment later, the publisher and subscriber will report the following errors.
Subscriber log
2021-04-07 12:13:07.700 CST [3542235] logical replication worker ERROR: terminating logical replication worker due to timeout
2021-04-07 12:13:07.722 CST [3542217] postmaster LOG: background worker "logical replication worker" (PID 3542235) exited with exit code 1
2021-04-07 12:13:07.723 CST [3542357] logical replication worker LOG: logical replication apply worker for subscription "sub" has started
2021-04-07 12:13:07.745 CST [3542357] logical replication worker ERROR: could not start WAL streaming: ERROR: replication slot "sub" is active for PID 3542236
Publisher log
2021-04-07 12:13:07.745 CST [3542358] walsender ERROR: replication slot "sub" is active for PID 3542236
2021-04-07 12:13:07.745 CST [3542358] walsender STATEMENT: START_REPLICATION SLOT "sub" LOGICAL 0/169ECE8 (proto_version '2', publication_names '"pub"')

I checked the PG-DOC and found it says that “Replication of TRUNCATE commands is supported” [1], so maybe TRUNCATE is not supported in synchronous logical replication?

If my understanding is right, maybe the PG-DOC can be modified like this. Any thoughts?
Replication of TRUNCATE commands is supported
->
Replication of TRUNCATE commands is supported in asynchronous mode

[1]: https://www.postgresql.org/docs/devel/logical-replication-restrictions.html

Regards,
Tang

#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: tanghy.fnst@fujitsu.com (#1)
Re: Truncate in synchronous logical replication failed

On Wed, Apr 7, 2021 at 12:26 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

Hi

I met a problem in synchronous logical replication. The client hangs when TRUNCATE TABLE at publisher.

Can you please check if the behavior is the same for PG-13? This is
just to ensure that we have not introduced any bug in PG-14.

--
With Regards,
Amit Kapila.

#3 tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Amit Kapila (#2)
RE: Truncate in synchronous logical replication failed

On Wednesday, April 7, 2021 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote

Can you please check if the behavior is the same for PG-13? This is
just to ensure that we have not introduced any bug in PG-14.

Yes, same failure happens at PG-13, too.

Regards,
Tang

#4 Japin Li
japinli@hotmail.com
In reply to: tanghy.fnst@fujitsu.com (#3)
Re: Truncate in synchronous logical replication failed

On Wed, 07 Apr 2021 at 16:34, tanghy.fnst@fujitsu.com <tanghy.fnst@fujitsu.com> wrote:

On Wednesday, April 7, 2021 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote

Can you please check if the behavior is the same for PG-13? This is
just to ensure that we have not introduced any bug in PG-14.

Yes, same failure happens at PG-13, too.

I found that when we truncate a table in synchronous logical replication,
LockAcquireExtended() [1] will try to take the lock via the fast path, and it
fails (FastPathStrongRelationLocks->count[fasthashcode] = 1).
However, it can acquire the lock when in asynchronous logical replication.
I'm not familiar with the locks; any suggestions? What's the difference
between sync and async logical replication with respect to locks?

[1]:
	if (EligibleForRelationFastPath(locktag, lockmode) &&
		FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND)
	{
		uint32		fasthashcode = FastPathStrongLockHashPartition(hashcode);
		bool		acquired;

		/*
		 * LWLockAcquire acts as a memory sequencing point, so it's safe to
		 * assume that any strong locker whose increment to
		 * FastPathStrongRelationLocks->counts becomes visible after we test
		 * it has yet to begin to transfer fast-path locks.
		 */
		LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
		if (FastPathStrongRelationLocks->count[fasthashcode] != 0)
			acquired = false;
		else
			acquired = FastPathGrantRelationLock(locktag->locktag_field2,
												 lockmode);
		LWLockRelease(&MyProc->fpInfoLock);
		if (acquired)
		{
			/*
			 * The locallock might contain stale pointers to some old shared
			 * objects; we MUST reset these to null before considering the
			 * lock to be acquired via fast-path.
			 */
			locallock->lock = NULL;
			locallock->proclock = NULL;
			GrantLockLocal(locallock, owner);
			return LOCKACQUIRE_OK;
		}
	}

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#5 Japin Li
japinli@hotmail.com
In reply to: Japin Li (#4)
Re: Truncate in synchronous logical replication failed

On Thu, 08 Apr 2021 at 19:20, Japin Li <japinli@hotmail.com> wrote:

On Wed, 07 Apr 2021 at 16:34, tanghy.fnst@fujitsu.com <tanghy.fnst@fujitsu.com> wrote:

On Wednesday, April 7, 2021 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote

Can you please check if the behavior is the same for PG-13? This is
just to ensure that we have not introduced any bug in PG-14.

Yes, same failure happens at PG-13, too.

I found that when we truncate a table in synchronous logical replication,
LockAcquireExtended() [1] will try to take a lock via fast path and it
failed (FastPathStrongRelationLocks->count[fasthashcode] = 1).
However, it can acquire the lock when in asynchronous logical replication.
I'm not familiar with the locks, any suggestions? What the difference
between sync and async logical replication for locks?

After some analysis, I find that when the TRUNCATE finishes, it calls
SyncRepWaitForLSN(). For asynchronous logical replication, it exits
early and then calls ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS) to
release the locks, so the walsender can acquire the lock.

But for synchronous logical replication, SyncRepWaitForLSN() waits
for the specified LSN to be confirmed, so it cannot release the lock,
while the walsender tries to acquire that lock. Obviously, it cannot
acquire the lock, because the lock is held by the process performing
the TRUNCATE command. This is why TRUNCATE in synchronous logical
replication is blocked.

I don't know if it makes sense to fix this; if so, how should we fix it?
Thoughts?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#6 osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Japin Li (#5)
RE: Truncate in synchronous logical replication failed

Hi

On Saturday, April 10, 2021 11:52 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 08 Apr 2021 at 19:20, Japin Li <japinli@hotmail.com> wrote:

On Wed, 07 Apr 2021 at 16:34, tanghy.fnst@fujitsu.com

<tanghy.fnst@fujitsu.com> wrote:

On Wednesday, April 7, 2021 5:28 PM Amit Kapila
<amit.kapila16@gmail.com> wrote

Can you please check if the behavior is the same for PG-13? This is
just to ensure that we have not introduced any bug in PG-14.

Yes, same failure happens at PG-13, too.

I found that when we truncate a table in synchronous logical
replication,
LockAcquireExtended() [1] will try to take a lock via fast path and it
failed (FastPathStrongRelationLocks->count[fasthashcode] = 1).
However, it can acquire the lock when in asynchronous logical replication.
I'm not familiar with the locks, any suggestions? What the difference
between sync and async logical replication for locks?

After some analyze, I find that when the TRUNCATE finish, it will call
SyncRepWaitForLSN(), for asynchronous logical replication, it will exit early,
and then it calls ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS) to
release the locks, so the walsender can acquire the lock.

But for synchronous logical replication, SyncRepWaitForLSN() will wait for
specified LSN to be confirmed, so it cannot release the lock, and the
walsender try to acquire the lock. Obviously, it cannot acquire the lock,
because the lock hold by the process which performs TRUNCATE command.
This is why the TRUNCATE in synchronous logical replication is blocked.

Yeah, the TRUNCATE waits in SyncRepWaitForLSN() while
the walsender is blocked by the AccessExclusiveLock taken by it,
which means the subscriber cannot consume the change, leading to a sort of deadlock.

On Wednesday, April 7, 2021 3:56 PM tanghy.fnst@fujitsu.com <tanghy.fnst@fujitsu.com> wrote:

I checked the PG-DOC, found it says that “Replication of TRUNCATE
commands is supported”[1], so maybe TRUNCATE is not supported in
synchronous logical replication?

If my understanding is right, maybe PG-DOC can be modified like this. Any
thought?
Replication of TRUNCATE commands is supported
->
Replication of TRUNCATE commands is supported in asynchronous mode

I'm not sure if this becomes the final solution,
but if we take a measure to fix the doc, we have to be careful with the description,
because when we remove the primary keys of the 'test' tables in the scenario in [1], we don't have this issue.
It means TRUNCATE in synchronous logical replication is not always blocked.

Having the primary key on the pub side alone causes the hang.
Also, I can observe the same hang using REPLICA IDENTITY USING INDEX without a primary key on the pub,
while I cannot reproduce the problem with REPLICA IDENTITY FULL and no primary key.
This difference comes from logicalrep_write_attrs(), which has a branch that calls RelationGetIndexAttrBitmap().
Therefore, strictly speaking, I think the description above would not be correct.

I'll share my analysis when I get a better idea to address this.

[1]: /messages/by-id/OS0PR01MB6113C2499C7DC70EE55ADB82FB759@OS0PR01MB6113.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi

#7 Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#6)
Re: Truncate in synchronous logical replication failed

On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

I checked the PG-DOC, found it says that “Replication of TRUNCATE
commands is supported”[1], so maybe TRUNCATE is not supported in
synchronous logical replication?

If my understanding is right, maybe PG-DOC can be modified like this. Any
thought?
Replication of TRUNCATE commands is supported
->
Replication of TRUNCATE commands is supported in asynchronous mode

I'm not sure if this becomes the final solution,

I think unless a solution is not possible or is extremely complicated,
going via this route doesn't seem advisable.

but if we take a measure to fix the doc, we have to be careful for the description,
because when we remove the primary keys of 'test' tables on the scenario in [1], we don't have this issue.
It means TRUNCATE in synchronous logical replication is not always blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes,
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information, which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of the operation to complete and logical
replication waits for the actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

--
With Regards,
Amit Kapila.

#8 osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#7)
RE: Truncate in synchronous logical replication failed

On Monday, April 12, 2021 3:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

but if we take a measure to fix the doc, we have to be careful for the
description, because when we remove the primary keys of 'test' tables on the

scenario in [1], we don't have this issue.

It means TRUNCATE in synchronous logical replication is not always

blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that information
which locks the required indexes. Now, because TRUNCATE has already
acquired an exclusive lock on the index, it seems to create a sort of deadlock
where the actual Truncate operation waits for logical replication of operation to
complete and logical replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build IDENTITY_KEY
attributes? During decoding, we don't even lock the main relation, we just scan
the system table and build that information using a historic snapshot. Can't we
do something similar here?

I think we can build the IDENTITY_KEY attributes with NoLock
instead of calling RelationGetIndexAttrBitmap().

When we trace back the callers of logicalrep_write_attrs(),
doing the equivalent of RelationGetIndexAttrBitmap()
for INDEX_ATTR_BITMAP_IDENTITY_KEY affects only pgoutput_truncate.

OTOH, I can't find code similar to RelationGetIndexAttrBitmap()
in the pgoutput_* functions or in relcache.c.
Therefore, I'd like to discuss how to address the hang.

My first idea is to extract the parts of RelationGetIndexAttrBitmap()
needed for INDEX_ATTR_BITMAP_IDENTITY_KEY and implement them
either in logicalrep_write_attrs() or as a new function.
RelationGetIndexAttrBitmap() has a 'restart' label for a goto statement
in order to ensure it returns up-to-date attribute bitmaps, so
I'd prefer a new function if we choose this direction.
Having that goto in logicalrep_write_attrs() would make it a little messy, I felt.

The other direction might be to extend RelationGetIndexAttrBitmap()'s definition
to accept a lockmode, so logicalrep_write_attrs() can pass NoLock.
But this change impacts several other callers, so it is not as good as the first direction above, I think.

If someone has any better idea, please let me know.

Best Regards,
Takamichi Osumi

#9 Petr Jelinek
petr.jelinek@enterprisedb.com
In reply to: Amit Kapila (#7)
Re: Truncate in synchronous logical replication failed

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

I checked the PG-DOC, found it says that “Replication of TRUNCATE
commands is supported”[1], so maybe TRUNCATE is not supported in
synchronous logical replication?

If my understanding is right, maybe PG-DOC can be modified like this. Any
thought?
Replication of TRUNCATE commands is supported
->
Replication of TRUNCATE commands is supported in asynchronous mode

I'm not sure if this becomes the final solution,

I think unless the solution is not possible or extremely complicated
going via this route doesn't seem advisable.

but if we take a measure to fix the doc, we have to be careful for the description,
because when we remove the primary keys of 'test' tables on the scenario in [1], we don't have this issue.
It means TRUNCATE in synchronous logical replication is not always blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason. I am not sure what exact locking we need, but I would have guessed at least AccessShareLock would be needed.

--
Petr

#10 Japin Li
japinli@hotmail.com
In reply to: osumi.takamichi@fujitsu.com (#8)
Re: Truncate in synchronous logical replication failed

On Tue, 13 Apr 2021 at 21:54, osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:

On Monday, April 12, 2021 3:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

but if we take a measure to fix the doc, we have to be careful for the
description, because when we remove the primary keys of 'test' tables on the

scenario in [1], we don't have this issue.

It means TRUNCATE in synchronous logical replication is not always

blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that information
which locks the required indexes. Now, because TRUNCATE has already
acquired an exclusive lock on the index, it seems to create a sort of deadlock
where the actual Truncate operation waits for logical replication of operation to
complete and logical replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build IDENTITY_KEY
attributes? During decoding, we don't even lock the main relation, we just scan
the system table and build that information using a historic snapshot. Can't we
do something similar here?

I think we can build the IDENTITY_KEY attributes with NoLock
instead of calling RelationGetIndexAttrBitmap().

When we trace back the caller side of logicalrep_write_attrs(),
doing the thing equivalent to RelationGetIndexAttrBitmap()
for INDEX_ATTR_BITMAP_IDENTITY_KEY impacts only pgoutput_truncate.

OTOH, I can't find codes similar to RelationGetIndexAttrBitmap()
in pgoutput_* functions and in the file of relcache.c.
Therefore, I'd like to discuss how to address the hang.

My first idea is to extract some parts of RelationGetIndexAttrBitmap()
only for INDEX_ATTR_BITMAP_IDENTITY_KEY and implement those
either in a logicalrep_write_attrs() or as a new function.
RelationGetIndexAttrBitmap() has 'restart' label for goto statement
in order to ensure to return up-to-date attribute bitmaps, so
I prefer having a new function when we choose this direction.
Having that goto in logicalrep_write_attrs() makes it a little bit messy, I felt.

The other direction might be to extend RelationGetIndexAttrBitmap's function definition
to accept lockmode to give NoLock from logicalrep_write_attrs().
But, this change impacts on other several callers so is not as good as the first direction above, I think.

If someone has any better idea, please let me know.

I think the first idea is better than the second. OTOH, can we release the
locks before SyncRepWaitForLSN(), since the changes have already been flushed
to the local WAL files?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#11 osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Petr Jelinek (#9)
RE: Truncate in synchronous logical replication failed

On Tuesday, April 13, 2021 11:38 PM Petr Jelinek <petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

I checked the PG-DOC, found it says that “Replication of TRUNCATE
commands is supported”[1], so maybe TRUNCATE is not supported in
synchronous logical replication?

If my understanding is right, maybe PG-DOC can be modified like
this. Any thought?
Replication of TRUNCATE commands is supported
->
Replication of TRUNCATE commands is supported in asynchronous

mode

I'm not sure if this becomes the final solution,

I think unless the solution is not possible or extremely complicated
going via this route doesn't seem advisable.

but if we take a measure to fix the doc, we have to be careful for
the description, because when we remove the primary keys of 'test' tables

on the scenario in [1], we don't have this issue.

It means TRUNCATE in synchronous logical replication is not always

blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other
reason. I am not sure what exact locking we need but I would have guessed at
least AccessShareLock would be needed.

This was true.

Having a look at the comment of index_open(), there's a description of the basic rule
that NoLock should be used only if an appropriate lock on the index is already taken.
And making the walsender use NoLock to build the attributes
leads us to the Assert in relation_open().

Please ignore the two ideas I suggested in the other mail;
they don't follow that basic rule and don't work.

Best Regards,
Takamichi Osumi

#12 osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Japin Li (#10)
RE: Truncate in synchronous logical replication failed

On Wednesday, April 14, 2021 11:38 AM Japin Li <japinli@hotmail.com> wrote:

On Tue, 13 Apr 2021 at 21:54, osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Monday, April 12, 2021 3:58 PM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Mon, Apr 12, 2021 at 10:03 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

but if we take a measure to fix the doc, we have to be careful for
the description, because when we remove the primary keys of 'test'
tables on the

scenario in [1], we don't have this issue.

It means TRUNCATE in synchronous logical replication is not always

blocked.

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical replication waits

for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

I think we can build the IDENTITY_KEY attributes with NoLock instead
of calling RelationGetIndexAttrBitmap().

When we trace back the caller side of logicalrep_write_attrs(), doing
the thing equivalent to RelationGetIndexAttrBitmap() for
INDEX_ATTR_BITMAP_IDENTITY_KEY impacts only pgoutput_truncate.

OTOH, I can't find codes similar to RelationGetIndexAttrBitmap() in
pgoutput_* functions and in the file of relcache.c.
Therefore, I'd like to discuss how to address the hang.

My first idea is to extract some parts of RelationGetIndexAttrBitmap()
only for INDEX_ATTR_BITMAP_IDENTITY_KEY and implement those either

in

a logicalrep_write_attrs() or as a new function.
RelationGetIndexAttrBitmap() has 'restart' label for goto statement in
order to ensure to return up-to-date attribute bitmaps, so I prefer
having a new function when we choose this direction.
Having that goto in logicalrep_write_attrs() makes it a little bit messy, I felt.

The other direction might be to extend RelationGetIndexAttrBitmap's
function definition to accept lockmode to give NoLock from

logicalrep_write_attrs().

But, this change impacts on other several callers so is not as good as the first

direction above, I think.

If someone has any better idea, please let me know.

I think the first idea is better than the second. OTOH, can we release the locks
before SyncRepWaitForLSN(), since it already flush to local WAL files.

Thank you for your comments.
I didn't mean to change the TRUNCATE side to release the locks,
because I expected that its AccessExclusiveLock
protects against any other operation (e.g. DROP INDEX) on the table
that affects IDENTITY KEY building. But, as I said in another e-mail,
both ideas above can't work. Really sorry for the noise.

Best Regards,
Takamichi Osumi

#13 Amit Kapila
amit.kapila16@gmail.com
In reply to: Petr Jelinek (#9)
Re: Truncate in synchronous logical replication failed

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks synchronous
logical replication. I think this has been broken since the logical
replication of Truncate was introduced.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you saying that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so even if the
relation is later changed/dropped, we are fine with the old state we
got. Doesn't the same thing apply here in logicalrep_write_attrs? If
that is true, then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work. Am I
missing something?

--
With Regards,
Amit Kapila.

#14 Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#13)
Re: Truncate in synchronous logical replication failed

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex. We can
then cache it in relation->rd_idattr. I am not sure if it is really
required to do all the other work in RelationGetIndexAttrBitmap.

--
With Regards,
Amit Kapila.

#15 Japin Li
japinli@hotmail.com
In reply to: Amit Kapila (#14)
Re: Truncate in synchronous logical replication failed

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex.

Sorry, I don't know how we can build the idattr without opening the
index. If we have to open the index, then we should use NoLock, since
the TRUNCATE side holds AccessExclusiveLock. As Osumi points out in [1],
the NoLock mode assumes that the appropriate lock on the index is
already taken.

Please let me know if I have misunderstood.

[1]: /messages/by-id/OSBPR01MB488834BDBD67BFF2FB048B3DED4E9@OSBPR01MB4888.jpnprd01.prod.outlook.com

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#16 Amit Kapila
amit.kapila16@gmail.com
In reply to: Japin Li (#15)
Re: Truncate in synchronous logical replication failed

On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex.

Sorry, I don't know how can we build the idattr without open the index.
If we should open the index, then we should use NoLock, since the TRUNCATE
side hold AccessExclusiveLock. As Osumi points out in [1], The NoLock mode
assumes that the appropriate lock on the index is already taken.

Why can't we use RelationIdGetRelation() by passing the required
indexOid to it?

--
With Regards,
Amit Kapila.

#17 Japin Li
japinli@hotmail.com
In reply to: Amit Kapila (#16)
Re: Truncate in synchronous logical replication failed

On Thu, 15 Apr 2021 at 19:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex.

Sorry, I don't know how can we build the idattr without open the index.
If we should open the index, then we should use NoLock, since the TRUNCATE
side hold AccessExclusiveLock. As Osumi points out in [1], The NoLock mode
assumes that the appropriate lock on the index is already taken.

Why can't we use RelationIdGetRelation() by passing the required
indexOid to it?

Thanks for your reminder. It might be a way to solve this problem.
I'll try it later.

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#18 Japin Li
japinli@hotmail.com
In reply to: Amit Kapila (#16)
1 attachment(s)
Re: Truncate in synchronous logical replication failed

On Thu, 15 Apr 2021 at 19:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex.

Sorry, I don't know how can we build the idattr without open the index.
If we should open the index, then we should use NoLock, since the TRUNCATE
side hold AccessExclusiveLock. As Osumi points out in [1], The NoLock mode
assumes that the appropriate lock on the index is already taken.

Why can't we use RelationIdGetRelation() by passing the required
indexOid to it?

Hi Amit, as you suggested, I tried using RelationIdGetRelation() in
place of index_open() to avoid taking AccessShareLock, and then the
TRUNCATE is no longer blocked.

Attached patch fixes it. Thoughts?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

Attachments:

truncate-table-in-synchronous-logical-replication.patch (text/x-patch)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 29702d6eab..0ad59ef189 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5060,7 +5060,7 @@ restart:
 		bool		isPK;		/* primary key */
 		bool		isIDKey;	/* replica identity index */
 
-		indexDesc = index_open(indexOid, AccessShareLock);
+		indexDesc = RelationIdGetRelation(indexOid);
 
 		/*
 		 * Extract index expressions and index predicate.  Note: Don't use
@@ -5134,7 +5134,7 @@ restart:
 		/* Collect all attributes in the index predicate, too */
 		pull_varattnos(indexPredicate, 1, &indexattrs);
 
-		index_close(indexDesc, AccessShareLock);
+		RelationClose(indexDesc);
 	}
 
 	/*
diff --git a/src/test/subscription/t/010_truncate.pl b/src/test/subscription/t/010_truncate.pl
index be2c0bdc35..cfcee2f1a7 100644
--- a/src/test/subscription/t/010_truncate.pl
+++ b/src/test/subscription/t/010_truncate.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 11;
 
 # setup
 
@@ -158,3 +158,28 @@ is($result, qq(0||), 'truncate of multiple tables some not published');
 $result = $node_subscriber->safe_psql('postgres',
 	"SELECT count(*), min(a), max(a) FROM tab2");
 is($result, qq(3|1|3), 'truncate of multiple tables some not published');
+
+# setup synchronous logical replication
+
+$node_publisher->safe_psql('postgres',
+	"ALTER SYSTEM SET synchronous_standby_names TO 'sub1'");
+$node_publisher->safe_psql('postgres', "SELECT pg_reload_conf()");
+
+# insert data to truncate
+
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab1 VALUES (1), (2), (3)");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(3|1|3), 'check synchronous logical replication');
+
+$node_publisher->safe_psql('postgres', "TRUNCATE tab1");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(0||), 'truncate replicated in synchronous logical replication');

#19 osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Japin Li (#17)
1 attachment(s)
RE: Truncate in synchronous logical replication failed

Hi

On Friday, April 16, 2021 11:02 AM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 19:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila

<amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com>

wrote:

The problem happens only when we try to fetch IDENTITY_KEY
attributes because pgoutput uses RelationGetIndexAttrBitmap()
to get that information which locks the required indexes. Now,
because TRUNCATE has already acquired an exclusive lock on the
index, it seems to create a sort of deadlock where the actual
Truncate operation waits for logical replication of operation
to complete and logical replication waits for actual Truncate

operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock
the main relation, we just scan the system table and build
that information using a historic snapshot. Can't we do something

similar here?

Adding Petr J. and Peter E. to know their views as this seems
to be an old problem (since the decoding of Truncate operation is

introduced).

We used RelationGetIndexAttrBitmap because it already existed, no

other reason.

Fair enough. But I think we should do something about it because
using the same (RelationGetIndexAttrBitmap) just breaks the
synchronous logical replication. I think this is broken since the
logical replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed

at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so,
why is it different from how we access the relation during
decoding (basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think
we do it that way because we need it to process WAL entries and we
need the relation state based on the historic snapshot, so, even
if the relation is later changed/dropped, we are fine with the old
state we got. Isn't the same thing applies here in
logicalrep_write_attrs? If that is true then some equivalent of
RelationGetIndexAttrBitmap where we use RelationIdGetRelation

instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call
RelationGetIndexList and then build the idattr list for

relation->rd_replidindex.

Sorry, I don't know how can we build the idattr without open the index.
If we should open the index, then we should use NoLock, since the
TRUNCATE side hold AccessExclusiveLock. As Osumi points out in [1],
The NoLock mode assumes that the appropriate lock on the index is

already taken.

Why can't we use RelationIdGetRelation() by passing the required
indexOid to it?

Thanks for your reminder. It might be a way to solve this problem.

Yeah. I've made a first patch for this issue.

In my environment, with the patch, the TRUNCATE in synchronous logical
replication doesn't hang. It also passes make check-world.

Best Regards,
Takamichi Osumi

Attachments:

Truncate_in_synchronous_logical_replication_v01.patch (application/octet-stream)
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 2a1f983..130e15f 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -668,8 +668,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
 	/* fetch bitmap of REPLICATION IDENTITY attributes */
 	replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL);
 	if (!replidentfull)
-		idattrs = RelationGetIndexAttrBitmap(rel,
-											 INDEX_ATTR_BITMAP_IDENTITY_KEY);
+		idattrs = RelationSetIdentityKeyBitmap(rel);
 
 	/* send the attributes */
 	for (i = 0; i < desc->natts; i++)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 29702d6..218ab91 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5206,6 +5206,104 @@ restart:
 	}
 }
 
+Bitmapset *
+RelationSetIdentityKeyBitmap(Relation relation)
+{
+	Bitmapset  *idindexattrs;	/* columns in the replica identity */
+	List	   *indexoidlist;
+	List	   *newindexoidlist;
+	Oid			relpkindex;
+	Oid			relreplindex;
+	ListCell   *l;
+	MemoryContext oldcxt;
+
+	/* Quick exit if we already computed the result. */
+	if (relation->rd_idattr != NULL)
+		return bms_copy(relation->rd_idattr);
+
+	/* Fast path if definitely no indexes */
+	if (!RelationGetForm(relation)->relhasindex)
+		return NULL;
+
+	/*
+	 * Get cached list of index OIDs. If we have to start over, we do so here.
+	 */
+restart:
+	indexoidlist = RelationGetIndexList(relation);
+
+	/* Fall out if no indexes (but relhasindex was set) */
+	if (indexoidlist == NIL)
+		return NULL;
+
+	/* Save some values to compare after building attributes */
+	relpkindex = relation->rd_pkindex;
+	relreplindex = relation->rd_replidindex;
+
+	/*
+	 * build attributes to idindexattrs.
+	 */
+	idindexattrs = NULL;
+	foreach(l, indexoidlist)
+	{
+		Oid			indexOid = lfirst_oid(l);
+		Relation	indexDesc;
+		int			i;
+		bool		isIDKey;	/* replica identity index */
+
+		indexDesc = RelationIdGetRelation(indexOid);
+
+		/* Is this index the configured (or default) replica identity? */
+		isIDKey = (indexOid == relreplindex);
+
+		/* Collect simple attribute references */
+		for (i = 0; i < indexDesc->rd_index->indnatts; i++)
+		{
+			int			attrnum = indexDesc->rd_index->indkey.values[i];
+
+			/* Remove non-key columns here. */
+			if (attrnum != 0)
+			{
+				if (isIDKey && i < indexDesc->rd_index->indnkeyatts)
+					idindexattrs = bms_add_member(idindexattrs,
+												  attrnum - FirstLowInvalidHeapAttributeNumber);
+			}
+		}
+		RelationClose(indexDesc);
+	}
+
+	/* Check if we need to redo */
+	newindexoidlist = RelationGetIndexList(relation);
+	if (equal(indexoidlist, newindexoidlist) &&
+		relpkindex == relation->rd_pkindex &&
+		relreplindex == relation->rd_replidindex)
+	{
+		/* Still the same index set, so proceed */
+		list_free(newindexoidlist);
+		list_free(indexoidlist);
+	}
+	else
+	{
+		/* Gotta do it over ... might as well not leak memory */
+		list_free(newindexoidlist);
+		list_free(indexoidlist);
+		bms_free(idindexattrs);
+
+		goto restart;
+	}
+
+	/* Don't leak the old values of these bitmaps, if any */
+	bms_free(relation->rd_idattr);
+	relation->rd_idattr = NULL;
+
+	/* Now save copies of the bitmaps in the relcache entry */
+	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	relation->rd_idattr = bms_copy(idindexattrs);
+	MemoryContextSwitchTo(oldcxt);
+
+	/* We return our original working copy for caller to play with */
+	return idindexattrs;
+}
+
 /*
  * RelationGetExclusionInfo -- get info about index's exclusion constraint
  *
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79..de49ecc 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -65,6 +65,8 @@ typedef enum IndexAttrBitmapKind
 extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
 											 IndexAttrBitmapKind attrKind);
 
+extern Bitmapset *RelationSetIdentityKeyBitmap(Relation relation);
+
 extern void RelationGetExclusionInfo(Relation indexRelation,
 									 Oid **operators,
 									 Oid **procs,
#20 Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#19)
Re: Truncate in synchronous logical replication failed

On Fri, Apr 16, 2021 at 12:56 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

Thanks for your reminder. It might be a way to solve this problem.

Yeah. I've made the 1st patch for this issue.

In my env, with the patch
the TRUNCATE in synchronous logical replication doesn't hang.

Few initial comments:
=====================
1.
+ relreplindex = relation->rd_replidindex;
+
+ /*
+ * build attributes to idindexattrs.
+ */
+ idindexattrs = NULL;
+ foreach(l, indexoidlist)
+ {
+ Oid indexOid = lfirst_oid(l);
+ Relation indexDesc;
+ int i;
+ bool isIDKey; /* replica identity index */
+
+ indexDesc = RelationIdGetRelation(indexOid);

When you have the OID of the replica identity index (relreplindex),
what is the need to traverse all the indexes?

2.
It is better to name the function as RelationGet...

--
With Regards,
Amit Kapila.

#21 Amit Kapila
amit.kapila16@gmail.com
In reply to: Japin Li (#18)
Re: Truncate in synchronous logical replication failed

On Fri, Apr 16, 2021 at 12:55 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 19:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex.

Sorry, I don't know how can we build the idattr without open the index.
If we should open the index, then we should use NoLock, since the TRUNCATE
side hold AccessExclusiveLock. As Osumi points out in [1], The NoLock mode
assumes that the appropriate lock on the index is already taken.

Why can't we use RelationIdGetRelation() by passing the required
indexOid to it?

Hi Amit, as your suggested, I try to use RelationIdGetRelation() replace
the index_open() to avoid specify the AccessSharedLock, then the TRUNCATE
will not be blocked.

It is okay as a POC, but we can't change the existing function
RelationGetIndexAttrBitmap, which is used from other places as well. It
might be better to write a separate function for this, something like
what Osumi-San's patch is trying to do.

--
With Regards,
Amit Kapila.

#22 osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#20)
RE: Truncate in synchronous logical replication failed

Hi

On Friday, April 16, 2021 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 16, 2021 at 12:56 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

Thanks for your reminder. It might be a way to solve this problem.

Yeah. I've made the 1st patch for this issue.

In my env, with the patch
the TRUNCATE in synchronous logical replication doesn't hang.

Few initial comments:
=====================
1.
+ relreplindex = relation->rd_replidindex;
+
+ /*
+ * build attributes to idindexattrs.
+ */
+ idindexattrs = NULL;
+ foreach(l, indexoidlist)
+ {
+ Oid indexOid = lfirst_oid(l);
+ Relation indexDesc;
+ int i;
+ bool isIDKey; /* replica identity index */
+
+ indexDesc = RelationIdGetRelation(indexOid);

When you have oid of replica identity index (relreplindex) then what is the
need to traverse all the indexes?

Ok. No need to traverse all the indexes. Will fix this part.

2.
It is better to name the function as RelationGet...

You are right. I'll modify this in my next version.

Best Regards,
Takamichi Osumi

#23 Japin Li
japinli@hotmail.com
In reply to: Amit Kapila (#21)
Re: Truncate in synchronous logical replication failed

On Fri, 16 Apr 2021 at 16:52, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 16, 2021 at 12:55 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 19:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli@hotmail.com> wrote:

On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:

The problem happens only when we try to fetch IDENTITY_KEY attributes
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, it seems to
create a sort of deadlock where the actual Truncate operation waits
for logical replication of operation to complete and logical
replication waits for actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. to know their views as this seems to be an
old problem (since the decoding of Truncate operation is introduced).

We used RelationGetIndexAttrBitmap because it already existed, no other reason.

Fair enough. But I think we should do something about it because using
the same (RelationGetIndexAttrBitmap) just breaks the synchronous
logical replication. I think this is broken since the logical
replication of Truncate is supported.

I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.

Are you telling that we need AccessShareLock on the index? If so, why
is it different from how we access the relation during decoding
(basically in ReorderBufferProcessTXN, we directly use
RelationIdGetRelation() without any lock on the relation)? I think we
do it that way because we need it to process WAL entries and we need
the relation state based on the historic snapshot, so, even if the
relation is later changed/dropped, we are fine with the old state we
got. Isn't the same thing applies here in logicalrep_write_attrs? If
that is true then some equivalent of RelationGetIndexAttrBitmap where
we use RelationIdGetRelation instead of index_open should work?

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex.

Sorry, I don't know how can we build the idattr without open the index.
If we should open the index, then we should use NoLock, since the TRUNCATE
side hold AccessExclusiveLock. As Osumi points out in [1], The NoLock mode
assumes that the appropriate lock on the index is already taken.

Why can't we use RelationIdGetRelation() by passing the required
indexOid to it?

Hi Amit, as your suggested, I try to use RelationIdGetRelation() replace
the index_open() to avoid specify the AccessSharedLock, then the TRUNCATE
will not be blocked.

It is okay as POC but we can't change the existing function
RelationGetIndexAttrBitmap. It is used from other places as well. It
might be better to write a separate function for this, something like
what Osumi-San's patch is trying to do.

Agreed!

Hi Osumi-San, can you merge the test case in your next version?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#24Amit Kapila
amit.kapila16@gmail.com
In reply to: Japin Li (#23)
Re: Truncate in synchronous logical replication failed

On Fri, Apr 16, 2021 at 3:08 PM Japin Li <japinli@hotmail.com> wrote:

On Fri, 16 Apr 2021 at 16:52, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 16, 2021 at 12:55 PM Japin Li <japinli@hotmail.com> wrote:
It is okay as a POC, but we can't change the existing function
RelationGetIndexAttrBitmap; it is used from other places as well. It
might be better to write a separate function for this, something like
what Osumi-San's patch is trying to do.

Agreed!

Hi Osumi-San, can you merge the test case in your next version?

+1. Your test looks reasonable to me.

--
With Regards,
Amit Kapila.

#25Japin Li
japinli@hotmail.com
In reply to: osumi.takamichi@fujitsu.com (#22)
1 attachment(s)
Re: Truncate in synchronous logical replication failed

On Fri, 16 Apr 2021 at 17:19, osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:

Hi

On Friday, April 16, 2021 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 16, 2021 at 12:56 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

Thanks for your reminder. It might be a way to solve this problem.

Yeah. I've made the 1st patch for this issue.

In my env, with the patch
the TRUNCATE in synchronous logical replication doesn't hang.

Few initial comments:
=====================
1.
+ relreplindex = relation->rd_replidindex;
+
+ /*
+ * build attributes to idindexattrs.
+ */
+ idindexattrs = NULL;
+ foreach(l, indexoidlist)
+ {
+ Oid indexOid = lfirst_oid(l);
+ Relation indexDesc;
+ int i;
+ bool isIDKey; /* replica identity index */
+
+ indexDesc = RelationIdGetRelation(indexOid);

When you have the OID of the replica identity index (relreplindex), what is the
need to traverse all the indexes?

Ok. No need to traverse all the indexes. Will fix this part.

2.
It is better to name the function as RelationGet...

You are right. I'll modify this in my next version.

I took the liberty of addressing the review comments and providing a v2 patch on top of
your v1 patch, and also merged the test case.

Sorry, I forgot to attach it earlier; attaching now.

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

Attachments:

truncate_in_synchronous_logical_replication_v02.patch (text/x-patch)
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 2a1f9830e0..1cf59e0fb0 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -668,8 +668,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
 	/* fetch bitmap of REPLICATION IDENTITY attributes */
 	replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL);
 	if (!replidentfull)
-		idattrs = RelationGetIndexAttrBitmap(rel,
-											 INDEX_ATTR_BITMAP_IDENTITY_KEY);
+		idattrs = RelationGetIdentityKeyBitmap(rel);
 
 	/* send the attributes */
 	for (i = 0; i < desc->natts; i++)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 29702d6eab..ace167f324 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5206,6 +5206,94 @@ restart:
 	}
 }
 
+Bitmapset *
+RelationGetIdentityKeyBitmap(Relation relation)
+{
+	Bitmapset  *idindexattrs;	/* columns in the replica identity */
+	List	   *indexoidlist;
+	List	   *newindexoidlist;
+	Oid			relpkindex;
+	Oid			relreplindex;
+	Relation	indexDesc;
+	int			i;
+	MemoryContext oldcxt;
+
+	/* Quick exit if we already computed the result. */
+	if (relation->rd_idattr != NULL)
+		return bms_copy(relation->rd_idattr);
+
+	/* Fast path if definitely no indexes */
+	if (!RelationGetForm(relation)->relhasindex)
+		return NULL;
+
+	/*
+	 * Get cached list of index OIDs. If we have to start over, we do so here.
+	 */
+restart:
+	indexoidlist = RelationGetIndexList(relation);
+
+	/* Fall out if no indexes (but relhasindex was set) */
+	if (indexoidlist == NIL)
+		return NULL;
+
+	/* Save some values to compare after building attributes */
+	relpkindex = relation->rd_pkindex;
+	relreplindex = relation->rd_replidindex;
+
+	/*
+	 * build attributes to idindexattrs.
+	 */
+	idindexattrs = NULL;
+	indexDesc = RelationIdGetRelation(relreplindex);
+
+	/* Collect simple attribute references */
+	for (i = 0; i < indexDesc->rd_index->indnatts; i++)
+	{
+		int			attrnum = indexDesc->rd_index->indkey.values[i];
+
+		/* Remove non-key columns here. */
+		if (attrnum != 0)
+		{
+			if (i < indexDesc->rd_index->indnkeyatts)
+				idindexattrs = bms_add_member(idindexattrs,
+											  attrnum - FirstLowInvalidHeapAttributeNumber);
+		}
+	}
+	RelationClose(indexDesc);
+
+	/* Check if we need to redo */
+	newindexoidlist = RelationGetIndexList(relation);
+	if (equal(indexoidlist, newindexoidlist) &&
+		relpkindex == relation->rd_pkindex &&
+		relreplindex == relation->rd_replidindex)
+	{
+		/* Still the same index set, so proceed */
+		list_free(newindexoidlist);
+		list_free(indexoidlist);
+	}
+	else
+	{
+		/* Gotta do it over ... might as well not leak memory */
+		list_free(newindexoidlist);
+		list_free(indexoidlist);
+		bms_free(idindexattrs);
+
+		goto restart;
+	}
+
+	/* Don't leak the old values of these bitmaps, if any */
+	bms_free(relation->rd_idattr);
+	relation->rd_idattr = NULL;
+
+	/* Now save copies of the bitmaps in the relcache entry */
+	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	relation->rd_idattr = bms_copy(idindexattrs);
+	MemoryContextSwitchTo(oldcxt);
+
+	/* We return our original working copy for caller to play with */
+	return idindexattrs;
+}
+
 /*
  * RelationGetExclusionInfo -- get info about index's exclusion constraint
  *
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79323..f772855ac6 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -65,6 +65,8 @@ typedef enum IndexAttrBitmapKind
 extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
 											 IndexAttrBitmapKind attrKind);
 
+extern Bitmapset *RelationGetIdentityKeyBitmap(Relation relation);
+
 extern void RelationGetExclusionInfo(Relation indexRelation,
 									 Oid **operators,
 									 Oid **procs,
diff --git a/src/test/subscription/t/010_truncate.pl b/src/test/subscription/t/010_truncate.pl
index be2c0bdc35..cfcee2f1a7 100644
--- a/src/test/subscription/t/010_truncate.pl
+++ b/src/test/subscription/t/010_truncate.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 11;
 
 # setup
 
@@ -158,3 +158,28 @@ is($result, qq(0||), 'truncate of multiple tables some not published');
 $result = $node_subscriber->safe_psql('postgres',
 	"SELECT count(*), min(a), max(a) FROM tab2");
 is($result, qq(3|1|3), 'truncate of multiple tables some not published');
+
+# setup synchronous logical replication
+
+$node_publisher->safe_psql('postgres',
+	"ALTER SYSTEM SET synchronous_standby_names TO 'sub1'");
+$node_publisher->safe_psql('postgres', "SELECT pg_reload_conf()");
+
+# insert data to truncate
+
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab1 VALUES (1), (2), (3)");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(3|1|3), 'check synchronous logical replication');
+
+$node_publisher->safe_psql('postgres', "TRUNCATE tab1");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(0||), 'truncate replicated in synchronous logical replication');
#26osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Japin Li (#25)
1 attachment(s)
RE: Truncate in synchronous logical replication failed

On Saturday, April 17, 2021 12:53 AM Japin Li <japinli@hotmail.com> wrote:

On Fri, 16 Apr 2021 at 17:19, osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Friday, April 16, 2021 5:50 PM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Fri, Apr 16, 2021 at 12:56 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

Thanks for your reminder. It might be a way to solve this problem.

Yeah. I've made the 1st patch for this issue.

In my env, with the patch
the TRUNCATE in synchronous logical replication doesn't hang.

Few initial comments:
=====================
1.
+ relreplindex = relation->rd_replidindex;
+
+ /*
+ * build attributes to idindexattrs.
+ */
+ idindexattrs = NULL;
+ foreach(l, indexoidlist)
+ {
+ Oid indexOid = lfirst_oid(l);
+ Relation indexDesc;
+ int i;
+ bool isIDKey; /* replica identity index */
+
+ indexDesc = RelationIdGetRelation(indexOid);

When you have the OID of the replica identity index (relreplindex), what
is the need to traverse all the indexes?

Ok. No need to traverse all the indexes. Will fix this part.

2.
It is better to name the function as RelationGet...

You are right. I'll modify this in my next version.

I took the liberty of addressing the review comments and providing a v2 patch on top
of your v1 patch, and also merged the test case.

Sorry, I forgot to attach it earlier; attaching now.

No problem. Thank you for updating the patch.
I've made some cosmetic changes. Could you please check this?
It has already been run through pgindent.

I executed the regression tests for this with no failures.
Just in case, I executed the 010_truncate.pl test 100 times in a tight loop,
which also didn't fail.

Best Regards,
Takamichi Osumi

Attachments:

truncate_in_synchronous_logical_replication_v03.patch (application/octet-stream)
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 2a1f983..1cf59e0 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -668,8 +668,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
 	/* fetch bitmap of REPLICATION IDENTITY attributes */
 	replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL);
 	if (!replidentfull)
-		idattrs = RelationGetIndexAttrBitmap(rel,
-											 INDEX_ATTR_BITMAP_IDENTITY_KEY);
+		idattrs = RelationGetIdentityKeyBitmap(rel);
 
 	/* send the attributes */
 	for (i = 0; i < desc->natts; i++)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 29702d6..7b15440 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5207,6 +5207,101 @@ restart:
 }
 
 /*
+ * RelationGetIdentityKeyBitmap -- get a bitmap of replica identity attribute numbers
+ *
+ * A bitmap of index attribute numbers for the configured replica identity index is returned.
+ * See also comments of RelationGetIndexAttrBitmap().
+ *
+ * NB: this function calls RelationGetIndexList() on the relation and doesn't take a lock on it,
+ * unlike RelationGetIndexAttrBitmap().
+ */
+Bitmapset *
+RelationGetIdentityKeyBitmap(Relation relation)
+{
+	Bitmapset  *idindexattrs;	/* columns in the replica identity */
+	List	   *indexoidlist;
+	List	   *newindexoidlist;
+	Oid			relpkindex;
+	Oid			relreplindex;
+	Relation	indexDesc;
+	int			i;
+	MemoryContext oldcxt;
+
+	/* Quick exit if we already computed the result. */
+	if (relation->rd_idattr != NULL)
+		return bms_copy(relation->rd_idattr);
+
+	/* Fast path if definitely no indexes */
+	if (!RelationGetForm(relation)->relhasindex)
+		return NULL;
+
+	/*
+	 * Get cached list of index OIDs. If we have to start over, we do so here.
+	 */
+restart:
+	indexoidlist = RelationGetIndexList(relation);
+
+	/* Fall out if no indexes (but relhasindex was set) */
+	if (indexoidlist == NIL)
+		return NULL;
+
+	/* Save some values to compare after building attributes */
+	relpkindex = relation->rd_pkindex;
+	relreplindex = relation->rd_replidindex;
+
+	/* Build attributes to idindexattrs */
+	idindexattrs = NULL;
+	indexDesc = RelationIdGetRelation(relreplindex);
+
+	/* Collect attribute references */
+	for (i = 0; i < indexDesc->rd_index->indnatts; i++)
+	{
+		int			attrnum = indexDesc->rd_index->indkey.values[i];
+
+		/* Remove non-key columns */
+		if (attrnum != 0)
+		{
+			if (i < indexDesc->rd_index->indnkeyatts)
+				idindexattrs = bms_add_member(idindexattrs,
+											  attrnum - FirstLowInvalidHeapAttributeNumber);
+		}
+	}
+	RelationClose(indexDesc);
+
+	/* Check if we need to redo */
+	newindexoidlist = RelationGetIndexList(relation);
+	if (equal(indexoidlist, newindexoidlist) &&
+		relpkindex == relation->rd_pkindex &&
+		relreplindex == relation->rd_replidindex)
+	{
+		/* Still the same index set, so proceed */
+		list_free(newindexoidlist);
+		list_free(indexoidlist);
+	}
+	else
+	{
+		/* Gotta do it over ... might as well not leak memory */
+		list_free(newindexoidlist);
+		list_free(indexoidlist);
+		bms_free(idindexattrs);
+
+		goto restart;
+	}
+
+	/* Don't leak the old values of these bitmaps, if any */
+	bms_free(relation->rd_idattr);
+	relation->rd_idattr = NULL;
+
+	/* Now save copies of the bitmaps in the relcache entry */
+	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	relation->rd_idattr = bms_copy(idindexattrs);
+	MemoryContextSwitchTo(oldcxt);
+
+	/* We return our original working copy for caller to play with */
+	return idindexattrs;
+}
+
+/*
  * RelationGetExclusionInfo -- get info about index's exclusion constraint
  *
  * This should be called only for an index that is known to have an
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79..f772855 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -65,6 +65,8 @@ typedef enum IndexAttrBitmapKind
 extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
 											 IndexAttrBitmapKind attrKind);
 
+extern Bitmapset *RelationGetIdentityKeyBitmap(Relation relation);
+
 extern void RelationGetExclusionInfo(Relation indexRelation,
 									 Oid **operators,
 									 Oid **procs,
diff --git a/src/test/subscription/t/010_truncate.pl b/src/test/subscription/t/010_truncate.pl
index be2c0bd..cfcee2f 100644
--- a/src/test/subscription/t/010_truncate.pl
+++ b/src/test/subscription/t/010_truncate.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 11;
 
 # setup
 
@@ -158,3 +158,28 @@ is($result, qq(0||), 'truncate of multiple tables some not published');
 $result = $node_subscriber->safe_psql('postgres',
 	"SELECT count(*), min(a), max(a) FROM tab2");
 is($result, qq(3|1|3), 'truncate of multiple tables some not published');
+
+# setup synchronous logical replication
+
+$node_publisher->safe_psql('postgres',
+	"ALTER SYSTEM SET synchronous_standby_names TO 'sub1'");
+$node_publisher->safe_psql('postgres', "SELECT pg_reload_conf()");
+
+# insert data to truncate
+
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab1 VALUES (1), (2), (3)");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(3|1|3), 'check synchronous logical replication');
+
+$node_publisher->safe_psql('postgres', "TRUNCATE tab1");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(0||), 'truncate replicated in synchronous logical replication');
#27Japin Li
japinli@hotmail.com
In reply to: osumi.takamichi@fujitsu.com (#26)
Re: Truncate in synchronous logical replication failed

On Sat, 17 Apr 2021 at 12:03, osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:

On Saturday, April 17, 2021 12:53 AM Japin Li <japinli@hotmail.com> wrote:

On Fri, 16 Apr 2021 at 17:19, osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Friday, April 16, 2021 5:50 PM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Fri, Apr 16, 2021 at 12:56 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

Thanks for your reminder. It might be a way to solve this problem.

Yeah. I've made the 1st patch for this issue.

In my env, with the patch
the TRUNCATE in synchronous logical replication doesn't hang.

Few initial comments:
=====================
1.
+ relreplindex = relation->rd_replidindex;
+
+ /*
+ * build attributes to idindexattrs.
+ */
+ idindexattrs = NULL;
+ foreach(l, indexoidlist)
+ {
+ Oid indexOid = lfirst_oid(l);
+ Relation indexDesc;
+ int i;
+ bool isIDKey; /* replica identity index */
+
+ indexDesc = RelationIdGetRelation(indexOid);

When you have the OID of the replica identity index (relreplindex), what
is the need to traverse all the indexes?

Ok. No need to traverse all the indexes. Will fix this part.

2.
It is better to name the function as RelationGet...

You are right. I'll modify this in my next version.

I took the liberty of addressing the review comments and providing a v2 patch on top
of your v1 patch, and also merged the test case.

Sorry, I forgot to attach it earlier; attaching now.

No problem. Thank you for updating the patch.
I've made some cosmetic changes. Could you please check this?
It has already been run through pgindent.

I executed the regression tests for this with no failures.
Just in case, I executed the 010_truncate.pl test 100 times in a tight loop,
which also didn't fail.

LGTM. I made an entry in the commitfest [1], so that the patches will get a
chance to run on all the platforms.

[1]: https://commitfest.postgresql.org/33/3081/

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#28Ajin Cherian
itsajin@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#26)
Re: Truncate in synchronous logical replication failed

On Sat, Apr 17, 2021 at 2:04 PM osumi.takamichi@fujitsu.com <
osumi.takamichi@fujitsu.com> wrote:

No problem. Thank you for updating the patch.
I've conducted some cosmetic changes. Could you please check this ?
That's already applied by pgindent.

I executed RT for this and made no failure.
Just in case, I executed 010_truncate.pl test 100 times in a tight loop,
which also didn't fail.

I reviewed the patch and ran make check; no issues. One minor comment:

Could you add a comment, similar to the one in RelationGetIndexAttrBitmap(), on why
we redo? It's not very obvious to someone reading the code why we are
refetching the index list here.

+ /* Check if we need to redo */
+ newindexoidlist = RelationGetIndexList(relation);

thanks,
Ajin Cherian
Fujitsu Australia
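For readers unfamiliar with the idiom being referenced, the redo in RelationGetIndexAttrBitmap can be modeled roughly as follows, in self-contained C with all names hypothetical and a plain counter standing in for the index OID list: snapshot the state before building, re-check it afterwards, and start over if a concurrent relcache flush changed it underneath us.

```c
#include <assert.h>

static int index_list_version = 0;  /* bumped whenever "the index set" changes */
static int pending_flushes = 0;     /* test hook: flushes to inject mid-build */

/* Build the (elided) result with the redo idiom; return the number of
 * attempts it took to get a stable answer. */
int
build_with_redo(int inject_flushes)
{
    int attempts = 0;

    pending_flushes = inject_flushes;

restart:
    attempts++;
    {
        int before = index_list_version;

        /* ... build the attribute bitmap here; meanwhile a concurrent
         * relcache invalidation may arrive: */
        if (pending_flushes > 0)
        {
            pending_flushes--;
            index_list_version++;
        }

        /* Check if we need to redo. */
        if (before != index_list_version)
            goto restart;       /* index set changed: do it over */
    }
    return attempts;
}
```

Each injected "flush" forces exactly one extra rebuild; with no concurrent change, one pass suffices.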

#29osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Ajin Cherian (#28)
1 attachment(s)
RE: Truncate in synchronous logical replication failed

On Tuesday, April 20, 2021 10:53 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Sat, Apr 17, 2021 at 2:04 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

No problem. Thank you for updating the patch.
I've made some cosmetic changes. Could you please check this?
It has already been run through pgindent.

I executed the regression tests for this with no failures.
Just in case, I executed the 010_truncate.pl test 100 times in a tight loop,
which also didn't fail.

I reviewed the patch, ran make check, no issues. One minor comment:

Could you add a comment, similar to the one in RelationGetIndexAttrBitmap(), on why
we redo? It's not very obvious to someone reading the code why we are
refetching the index list here.

+ /* Check if we need to redo */

+ newindexoidlist = RelationGetIndexList(relation);

Yeah, makes sense. Fixed.
Its indentation looks a bit odd, but that's what pgindent produced.
Thank you for your review!

Best Regards,
Takamichi Osumi

Attachments:

truncate_in_synchronous_logical_replication_v04.patch (application/octet-stream)
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 2a1f983..1cf59e0 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -668,8 +668,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
 	/* fetch bitmap of REPLICATION IDENTITY attributes */
 	replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL);
 	if (!replidentfull)
-		idattrs = RelationGetIndexAttrBitmap(rel,
-											 INDEX_ATTR_BITMAP_IDENTITY_KEY);
+		idattrs = RelationGetIdentityKeyBitmap(rel);
 
 	/* send the attributes */
 	for (i = 0; i < desc->natts; i++)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 29702d6..6fee830 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5207,6 +5207,105 @@ restart:
 }
 
 /*
+ * RelationGetIdentityKeyBitmap -- get a bitmap of replica identity attribute numbers
+ *
+ * A bitmap of index attribute numbers for the configured replica identity index is returned.
+ * See also comments of RelationGetIndexAttrBitmap().
+ *
+ * NB: this function calls RelationGetIndexList() on the relation and doesn't take a lock on it,
+ * unlike RelationGetIndexAttrBitmap().
+ */
+Bitmapset *
+RelationGetIdentityKeyBitmap(Relation relation)
+{
+	Bitmapset  *idindexattrs;	/* columns in the replica identity */
+	List	   *indexoidlist;
+	List	   *newindexoidlist;
+	Oid			relpkindex;
+	Oid			relreplindex;
+	Relation	indexDesc;
+	int			i;
+	MemoryContext oldcxt;
+
+	/* Quick exit if we already computed the result. */
+	if (relation->rd_idattr != NULL)
+		return bms_copy(relation->rd_idattr);
+
+	/* Fast path if definitely no indexes */
+	if (!RelationGetForm(relation)->relhasindex)
+		return NULL;
+
+	/*
+	 * Get cached list of index OIDs. If we have to start over, we do so here.
+	 */
+restart:
+	indexoidlist = RelationGetIndexList(relation);
+
+	/* Fall out if no indexes (but relhasindex was set) */
+	if (indexoidlist == NIL)
+		return NULL;
+
+	/* Save some values to compare after building attributes */
+	relpkindex = relation->rd_pkindex;
+	relreplindex = relation->rd_replidindex;
+
+	/* Build attributes to idindexattrs */
+	idindexattrs = NULL;
+	indexDesc = RelationIdGetRelation(relreplindex);
+
+	/* Collect attribute references */
+	for (i = 0; i < indexDesc->rd_index->indnatts; i++)
+	{
+		int			attrnum = indexDesc->rd_index->indkey.values[i];
+
+		/* Remove non-key columns */
+		if (attrnum != 0)
+		{
+			if (i < indexDesc->rd_index->indnkeyatts)
+				idindexattrs = bms_add_member(idindexattrs,
+											  attrnum - FirstLowInvalidHeapAttributeNumber);
+		}
+	}
+	RelationClose(indexDesc);
+
+	/*
+	 * If we have received a relcache flush event on this relcache entry, we
+	 * might get a change in the rel's index list during above loop. In this
+	 * case, we'd better redo to ensure we deliver up-to-date bitmap.
+	 */
+	newindexoidlist = RelationGetIndexList(relation);
+	if (equal(indexoidlist, newindexoidlist) &&
+		relpkindex == relation->rd_pkindex &&
+		relreplindex == relation->rd_replidindex)
+	{
+		/* Still the same index set, so proceed */
+		list_free(newindexoidlist);
+		list_free(indexoidlist);
+	}
+	else
+	{
+		/* Gotta do it over ... might as well not leak memory */
+		list_free(newindexoidlist);
+		list_free(indexoidlist);
+		bms_free(idindexattrs);
+
+		goto restart;
+	}
+
+	/* Don't leak the old values of these bitmaps, if any */
+	bms_free(relation->rd_idattr);
+	relation->rd_idattr = NULL;
+
+	/* Now save copies of the bitmaps in the relcache entry */
+	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	relation->rd_idattr = bms_copy(idindexattrs);
+	MemoryContextSwitchTo(oldcxt);
+
+	/* We return our original working copy for caller to play with */
+	return idindexattrs;
+}
+
+/*
  * RelationGetExclusionInfo -- get info about index's exclusion constraint
  *
  * This should be called only for an index that is known to have an
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79..f772855 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -65,6 +65,8 @@ typedef enum IndexAttrBitmapKind
 extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
 											 IndexAttrBitmapKind attrKind);
 
+extern Bitmapset *RelationGetIdentityKeyBitmap(Relation relation);
+
 extern void RelationGetExclusionInfo(Relation indexRelation,
 									 Oid **operators,
 									 Oid **procs,
diff --git a/src/test/subscription/t/010_truncate.pl b/src/test/subscription/t/010_truncate.pl
index be2c0bd..cfcee2f 100644
--- a/src/test/subscription/t/010_truncate.pl
+++ b/src/test/subscription/t/010_truncate.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 11;
 
 # setup
 
@@ -158,3 +158,28 @@ is($result, qq(0||), 'truncate of multiple tables some not published');
 $result = $node_subscriber->safe_psql('postgres',
 	"SELECT count(*), min(a), max(a) FROM tab2");
 is($result, qq(3|1|3), 'truncate of multiple tables some not published');
+
+# setup synchronous logical replication
+
+$node_publisher->safe_psql('postgres',
+	"ALTER SYSTEM SET synchronous_standby_names TO 'sub1'");
+$node_publisher->safe_psql('postgres', "SELECT pg_reload_conf()");
+
+# insert data to truncate
+
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab1 VALUES (1), (2), (3)");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(3|1|3), 'check synchronous logical replication');
+
+$node_publisher->safe_psql('postgres', "TRUNCATE tab1");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(0||), 'truncate replicated in synchronous logical replication');
#30Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#29)
1 attachment(s)
Re: Truncate in synchronous logical replication failed

On Tue, Apr 20, 2021 at 9:00 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Tuesday, April 20, 2021 10:53 AM Ajin Cherian <itsajin@gmail.com> wrote:

I reviewed the patch, ran make check, no issues. One minor comment:

Could you add a comment, similar to the one in RelationGetIndexAttrBitmap(), on why
we redo? It's not very obvious to someone reading the code why we are
refetching the index list here.

+ /* Check if we need to redo */

+ newindexoidlist = RelationGetIndexList(relation);

Yeah, makes sense. Fixed.

I don't think we need to restart here to get a stable list of indexes
as we do in RelationGetIndexAttrBitmap. The reason is that here we build
the cache entry using a historic snapshot, and all the later changes
are absorbed while decoding WAL. I have updated that and modified a few
comments in the attached patch. Can you please test this in
clobber_cache_always mode? I think just testing
subscription/t/010_truncate.pl would be sufficient.

Now, this bug exists in prior versions (>= v11), where we introduced
decoding of TRUNCATE. Do we want to backpatch this? As no user has
reported this till now apart from Tang, who I think hit it while doing
some other patch testing, we might want to just push it to HEAD. I am
fine either way. Petr, others, do you have any opinion on this matter?

--
With Regards,
Amit Kapila.

Attachments:

truncate_in_synchronous_logical_replication_v05.patch (application/octet-stream)
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 2a1f9830e05..1cf59e0fb0f 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -668,8 +668,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
 	/* fetch bitmap of REPLICATION IDENTITY attributes */
 	replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL);
 	if (!replidentfull)
-		idattrs = RelationGetIndexAttrBitmap(rel,
-											 INDEX_ATTR_BITMAP_IDENTITY_KEY);
+		idattrs = RelationGetIdentityKeyBitmap(rel);
 
 	/* send the attributes */
 	for (i = 0; i < desc->natts; i++)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 29702d6eab1..0fdae9ea7d6 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5206,6 +5206,88 @@ restart:
 	}
 }
 
+/*
+ * RelationGetIdentityKeyBitmap -- get a bitmap of replica identity attribute
+ * numbers
+ *
+ * A bitmap of index attribute numbers for the configured replica identity
+ * index is returned.
+ *
+ * See also comments of RelationGetIndexAttrBitmap().
+ *
+ * This is a special purpose function used during logical replication. Here,
+ * unlike RelationGetIndexAttrBitmap(), we don't acquire a lock on the required
+ * index as we build the cache entry using a historic snapshot and all the
+ * later changes are absorbed while decoding WAL. Due to this reason, we don't
+ * need to retry here in case of a change in the set of indexes.
+ */
+Bitmapset *
+RelationGetIdentityKeyBitmap(Relation relation)
+{
+	Bitmapset  *idindexattrs;	/* columns in the replica identity */
+	List	   *indexoidlist;
+	Oid			relreplindex;
+	Relation	indexDesc;
+	int			i;
+	MemoryContext oldcxt;
+
+	/* Quick exit if we already computed the result. */
+	if (relation->rd_idattr != NULL)
+		return bms_copy(relation->rd_idattr);
+
+	/* Fast path if definitely no indexes */
+	if (!RelationGetForm(relation)->relhasindex)
+		return NULL;
+
+	/* Historic snapshot must be set. */
+	Assert(HistoricSnapshotActive());
+
+	indexoidlist = RelationGetIndexList(relation);
+
+	/* Fall out if no indexes (but relhasindex was set) */
+	if (indexoidlist == NIL)
+		return NULL;
+
+	/* Save some values to compare after building attributes */
+	relreplindex = relation->rd_replidindex;
+
+	/* Build attributes to idindexattrs */
+	idindexattrs = NULL;
+	indexDesc = RelationIdGetRelation(relreplindex);
+
+	/* Collect attribute references */
+	for (i = 0; i < indexDesc->rd_index->indnatts; i++)
+	{
+		int			attrnum = indexDesc->rd_index->indkey.values[i];
+
+		/*
+		 * We don't include non-key columns into idindexattrs bitmaps. See
+		 * RelationGetIndexAttrBitmap.
+		 */
+		if (attrnum != 0)
+		{
+			if (i < indexDesc->rd_index->indnkeyatts)
+				idindexattrs = bms_add_member(idindexattrs,
+											  attrnum - FirstLowInvalidHeapAttributeNumber);
+		}
+	}
+
+	RelationClose(indexDesc);
+	list_free(indexoidlist);
+
+	/* Don't leak the old values of these bitmaps, if any */
+	bms_free(relation->rd_idattr);
+	relation->rd_idattr = NULL;
+
+	/* Now save copy of the bitmap in the relcache entry */
+	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	relation->rd_idattr = bms_copy(idindexattrs);
+	MemoryContextSwitchTo(oldcxt);
+
+	/* We return our original working copy for caller to play with */
+	return idindexattrs;
+}
+
 /*
  * RelationGetExclusionInfo -- get info about index's exclusion constraint
  *
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf793238..f772855ac69 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -65,6 +65,8 @@ typedef enum IndexAttrBitmapKind
 extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
 											 IndexAttrBitmapKind attrKind);
 
+extern Bitmapset *RelationGetIdentityKeyBitmap(Relation relation);
+
 extern void RelationGetExclusionInfo(Relation indexRelation,
 									 Oid **operators,
 									 Oid **procs,
diff --git a/src/test/subscription/t/010_truncate.pl b/src/test/subscription/t/010_truncate.pl
index be2c0bdc35e..7fc403e1385 100644
--- a/src/test/subscription/t/010_truncate.pl
+++ b/src/test/subscription/t/010_truncate.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 11;
 
 # setup
 
@@ -158,3 +158,28 @@ is($result, qq(0||), 'truncate of multiple tables some not published');
 $result = $node_subscriber->safe_psql('postgres',
 	"SELECT count(*), min(a), max(a) FROM tab2");
 is($result, qq(3|1|3), 'truncate of multiple tables some not published');
+
+# Test that truncate works for synchronous logical replication
+
+$node_publisher->safe_psql('postgres',
+	"ALTER SYSTEM SET synchronous_standby_names TO 'sub1'");
+$node_publisher->safe_psql('postgres', "SELECT pg_reload_conf()");
+
+# insert data to truncate
+
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab1 VALUES (1), (2), (3)");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(3|1|3), 'check synchronous logical replication');
+
+$node_publisher->safe_psql('postgres', "TRUNCATE tab1");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(0||), 'truncate replicated in synchronous logical replication');
#31shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Amit Kapila (#30)
RE: Truncate in synchronous logical replication failed

I don't think here we need to restart to get a stable list of indexes
as we do in RelationGetIndexAttrBitmap. The reason is here we build
the cache entry using a historic snapshot and all the later changes
are absorbed while decoding WAL. I have updated that and modified a few
comments in the attached patch. Can you please test this in
clobber_cache_always mode? I think just testing
subscription/t/010_truncate.pl would be sufficient.

Thanks for your patch. I tested it; it passes 'make check-world' and works as expected.
By the way, I also tested it in clobber_cache_always mode and it passed, too. (I only tested subscription/t/010_truncate.pl.)

Regards,
Shi yu

#32osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#30)
RE: Truncate in synchronous logical replication failed

On Thursday, April 22, 2021 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 20, 2021 at 9:00 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Tuesday, April 20, 2021 10:53 AM Ajin Cherian <itsajin@gmail.com>

wrote:

I reviewed the patch, ran make check, no issues. One minor comment:

Could you add the comment similar to RelationGetIndexAttrBitmap() on
why the redo, it's not very obvious to someone reading the code, why
we are refetching the index list here.

+ /* Check if we need to redo */

+ newindexoidlist = RelationGetIndexList(relation);

Yeah, makes sense. Fixed.

I don't think here we need to restart to get a stable list of indexes as we do in
RelationGetIndexAttrBitmap. The reason is here we build the cache entry
using a historic snapshot and all the later changes are absorbed while
decoding WAL.

I rechecked this and I agree.
I don't see any problem with removing the redo check.
Based on this direction, I'll share several minor comments.

[1]: a typo of RelationGetIdentityKeyBitmap's comment

+ * This is a special purpose function used during logical replication. Here,
+ * unlike RelationGetIndexAttrBitmap(), we don't a acquire lock on the required

We have "a" in an unnatural place. It should be "we don't acquire...".

[2]: suggestion to fix RelationGetIdentityKeyBitmap's comment

+ * later changes are absorbed while decoding WAL. Due to this reason, we don't
+ * need to retry here in case of a change in the set of indexes.

I think it's better to use "Because of" instead of "Due to".
This patch works to solve the deadlock.

[3]: wrong comment in RelationGetIdentityKeyBitmap

+ /* Save some values to compare after building attributes */

I wrote this comment for the redo check
that has been removed already. We can delete it.

[4]: suggestion to remove local variable relreplindex in RelationGetIdentityKeyBitmap

I think we don't need relreplindex.
We can just pass relation->rd_replidindex to RelationIdGetRelation().
There is no other reference to the variable.

[5]: suggestion to fix the place to release indexoidlist

I thought we could do its list_free() earlier,
after checking that there are no indexes.

[6]: suggestion to remove period in one comment.

+ /* Quick exit if we already computed the result. */

This comes from RelationGetIndexAttrBitmap (and my posted versions),
but I think we can remove the period to better align with the other comments in the function.

I have updated that and modified a few comments in the
attached patch. Can you please test this in clobber_cache_always mode? I
think just testing subscription/t/010_truncate.pl would be sufficient.

I did it. It didn't fail. No problem.

Now, this bug exists in prior versions (>= v11) where we have introduced
decoding of Truncate. Do we want to backpatch this? As no user has reported
this till now apart from Tang who I think got it while doing some other patch
testing, we might want to just push it in HEAD. I am fine either way. Petr,
others, do you have any opinion on this matter?

I think we are fine with applying this patch to HEAD only,
since nobody has complained about the hang issue.

Best Regards,
Takamichi Osumi

#33Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#32)
Re: Truncate in synchronous logical replication failed

On Fri, Apr 23, 2021 at 12:04 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Thursday, April 22, 2021 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 20, 2021 at 9:00 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Tuesday, April 20, 2021 10:53 AM Ajin Cherian <itsajin@gmail.com>

wrote:

I reviewed the patch, ran make check, no issues. One minor comment:

Could you add the comment similar to RelationGetIndexAttrBitmap() on
why the redo, it's not very obvious to someone reading the code, why
we are refetching the index list here.

+ /* Check if we need to redo */

+ newindexoidlist = RelationGetIndexList(relation);

Yeah, makes sense. Fixed.

I don't think here we need to restart to get a stable list of indexes as we do in
RelationGetIndexAttrBitmap. The reason is here we build the cache entry
using a historic snapshot and all the later changes are absorbed while
decoding WAL.

I rechecked this and I agree.
I don't see any problem with removing the redo check.
Based on this direction, I'll share several minor comments.

[1] a typo of RelationGetIdentityKeyBitmap's comment

+ * This is a special purpose function used during logical replication. Here,
+ * unlike RelationGetIndexAttrBitmap(), we don't a acquire lock on the required

We have "a" in an unnatural place. It should be "we don't acquire...".

[2] suggestion to fix RelationGetIdentityKeyBitmap's comment

+ * later changes are absorbed while decoding WAL. Due to this reason, we don't
+ * need to retry here in case of a change in the set of indexes.

I think it's better to use "Because of" instead of "Due to".
This patch works to solve the deadlock.

I am not sure which one is better. I would like to keep it as it is
unless you feel strongly about point 2.

[3] wrong comment in RelationGetIdentityKeyBitmap

+ /* Save some values to compare after building attributes */

I wrote this comment for the redo check
that has been removed already. We can delete it.

[4] suggestion to remove local variable relreplindex in RelationGetIdentityKeyBitmap

I think we don't need relreplindex.
We can just pass relation->rd_replidindex to RelationIdGetRelation().
There is no other reference to the variable.

[5] suggestion to fix the place to release indexoidlist

I thought we could do its list_free() earlier,
after checking that there are no indexes.

Hmm, why? If there are no indexes then we wouldn't have allocated any memory.

[6] suggestion to remove period in one comment.

+ /* Quick exit if we already computed the result. */

This comes from RelationGetIndexAttrBitmap (and my posted versions),
but I think we can remove the period to better align with the other comments in the function.

Can you please update the patch except for the two points to which I
responded above?

--
With Regards,
Amit Kapila.

#34osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#33)
1 attachment(s)
RE: Truncate in synchronous logical replication failed

On Friday, April 23, 2021 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 23, 2021 at 12:04 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Thursday, April 22, 2021 6:33 PM Amit Kapila

<amit.kapila16@gmail.com> wrote:

On Tue, Apr 20, 2021 at 9:00 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Tuesday, April 20, 2021 10:53 AM Ajin Cherian
<itsajin@gmail.com>

wrote:

I reviewed the patch, ran make check, no issues. One minor

comment:

Could you add the comment similar to
RelationGetIndexAttrBitmap() on why the redo, it's not very
obvious to someone reading the code, why we are refetching the

index list here.

+ /* Check if we need to redo */

+ newindexoidlist = RelationGetIndexList(relation);

Yeah, makes sense. Fixed.

I don't think here we need to restart to get a stable list of
indexes as we do in RelationGetIndexAttrBitmap. The reason is here
we build the cache entry using a historic snapshot and all the later
changes are absorbed while decoding WAL.

I rechecked this and I agree.
I don't see any problem with removing the redo check.
Based on this direction, I'll share several minor comments.

[1] a typo of RelationGetIdentityKeyBitmap's comment

+ * This is a special purpose function used during logical
+ replication. Here,
+ * unlike RelationGetIndexAttrBitmap(), we don't a acquire lock on
+ the required

We have "a" in an unnatural place. It should be "we don't acquire...".

[2] suggestion to fix RelationGetIdentityKeyBitmap's comment

+ * later changes are absorbed while decoding WAL. Due to this reason,
+ we don't
+ * need to retry here in case of a change in the set of indexes.

I think it's better to use "Because of" instead of "Due to".
This patch works to solve the deadlock.

I am not sure which one is better. I would like to keep it as it is unless you feel
strongly about point 2.

[3] wrong comment in RelationGetIdentityKeyBitmap

+ /* Save some values to compare after building attributes */

I wrote this comment for the redo check that has been removed already.
We can delete it.

[4] suggestion to remove local variable relreplindex in
RelationGetIdentityKeyBitmap

I think we don't need relreplindex.
We can just pass relation->rd_replidindex to RelationIdGetRelation().
There is no other reference to the variable.

[5] suggestion to fix the place to release indexoidlist

I thought we could do its list_free() earlier, after checking that there
are no indexes.

Hmm, why? If there are no indexes then we wouldn't have allocated any
memory.

[6] suggestion to remove period in one comment.

+ /* Quick exit if we already computed the result. */

This comes from RelationGetIndexAttrBitmap (and my posted versions),
but I think we can remove the period to better align with the
other comments in the function.

Can you please update the patch except for the two points to which I
responded above?

I've combined v5 with the accepted comments above.

Just in case, I've run make check-world, the clobber_cache_always-mode
test again for v6, and a tight loop of 100 runs of 010_truncate.pl.
All of them passed with no failures.

Best Regards,
Takamichi Osumi

Attachments:

truncate_in_synchronous_logical_replication_v06.patch (application/octet-stream)
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 2a1f983..1cf59e0 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -668,8 +668,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
 	/* fetch bitmap of REPLICATION IDENTITY attributes */
 	replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL);
 	if (!replidentfull)
-		idattrs = RelationGetIndexAttrBitmap(rel,
-											 INDEX_ATTR_BITMAP_IDENTITY_KEY);
+		idattrs = RelationGetIdentityKeyBitmap(rel);
 
 	/* send the attributes */
 	for (i = 0; i < desc->natts; i++)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 29702d6..d0c127e 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5207,6 +5207,84 @@ restart:
 }
 
 /*
+ * RelationGetIdentityKeyBitmap -- get a bitmap of replica identity attribute
+ * numbers
+ *
+ * A bitmap of index attribute numbers for the configured replica identity
+ * index is returned.
+ *
+ * See also comments of RelationGetIndexAttrBitmap().
+ *
+ * This is a special purpose function used during logical replication. Here,
+ * unlike RelationGetIndexAttrBitmap(), we don't acquire lock on the required
+ * index as we build the cache entry using a historic snapshot and all the
+ * later changes are absorbed while decoding WAL. Due to this reason, we don't
+ * need to retry here in case of a change in the set of indexes.
+ */
+Bitmapset *
+RelationGetIdentityKeyBitmap(Relation relation)
+{
+	Bitmapset  *idindexattrs;	/* columns in the replica identity */
+	List	   *indexoidlist;
+	Relation	indexDesc;
+	int			i;
+	MemoryContext oldcxt;
+
+	/* Quick exit if we already computed the result */
+	if (relation->rd_idattr != NULL)
+		return bms_copy(relation->rd_idattr);
+
+	/* Fast path if definitely no indexes */
+	if (!RelationGetForm(relation)->relhasindex)
+		return NULL;
+
+	/* Historic snapshot must be set. */
+	Assert(HistoricSnapshotActive());
+
+	indexoidlist = RelationGetIndexList(relation);
+
+	/* Fall out if no indexes (but relhasindex was set) */
+	if (indexoidlist == NIL)
+		return NULL;
+
+	/* Build attributes to idindexattrs */
+	idindexattrs = NULL;
+	indexDesc = RelationIdGetRelation(relation->rd_replidindex);
+
+	/* Collect attribute references */
+	for (i = 0; i < indexDesc->rd_index->indnatts; i++)
+	{
+		int			attrnum = indexDesc->rd_index->indkey.values[i];
+
+		/*
+		 * We don't include non-key columns into idindexattrs bitmaps. See
+		 * RelationGetIndexAttrBitmap.
+		 */
+		if (attrnum != 0)
+		{
+			if (i < indexDesc->rd_index->indnkeyatts)
+				idindexattrs = bms_add_member(idindexattrs,
+											  attrnum - FirstLowInvalidHeapAttributeNumber);
+		}
+	}
+
+	RelationClose(indexDesc);
+	list_free(indexoidlist);
+
+	/* Don't leak the old values of these bitmaps, if any */
+	bms_free(relation->rd_idattr);
+	relation->rd_idattr = NULL;
+
+	/* Now save copy of the bitmap in the relcache entry */
+	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	relation->rd_idattr = bms_copy(idindexattrs);
+	MemoryContextSwitchTo(oldcxt);
+
+	/* We return our original working copy for caller to play with */
+	return idindexattrs;
+}
+
+/*
  * RelationGetExclusionInfo -- get info about index's exclusion constraint
  *
  * This should be called only for an index that is known to have an
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79..f772855 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -65,6 +65,8 @@ typedef enum IndexAttrBitmapKind
 extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
 											 IndexAttrBitmapKind attrKind);
 
+extern Bitmapset *RelationGetIdentityKeyBitmap(Relation relation);
+
 extern void RelationGetExclusionInfo(Relation indexRelation,
 									 Oid **operators,
 									 Oid **procs,
diff --git a/src/test/subscription/t/010_truncate.pl b/src/test/subscription/t/010_truncate.pl
index be2c0bd..7fc403e 100644
--- a/src/test/subscription/t/010_truncate.pl
+++ b/src/test/subscription/t/010_truncate.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 11;
 
 # setup
 
@@ -158,3 +158,28 @@ is($result, qq(0||), 'truncate of multiple tables some not published');
 $result = $node_subscriber->safe_psql('postgres',
 	"SELECT count(*), min(a), max(a) FROM tab2");
 is($result, qq(3|1|3), 'truncate of multiple tables some not published');
+
+# Test that truncate works for synchronous logical replication
+
+$node_publisher->safe_psql('postgres',
+	"ALTER SYSTEM SET synchronous_standby_names TO 'sub1'");
+$node_publisher->safe_psql('postgres', "SELECT pg_reload_conf()");
+
+# insert data to truncate
+
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab1 VALUES (1), (2), (3)");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(3|1|3), 'check synchronous logical replication');
+
+$node_publisher->safe_psql('postgres', "TRUNCATE tab1");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(0||), 'truncate replicated in synchronous logical replication');
#35osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: osumi.takamichi@fujitsu.com (#34)
1 attachment(s)
RE: Truncate in synchronous logical replication failed

On Friday, April 23, 2021 6:03 PM I wrote:

I've combined v5 with above accepted comments.

Just in case, I've run make check-world, the clobber_cache_always-mode
test again for v6, and a tight loop of 100 runs of
010_truncate.pl.
All of them passed with no failures.

I'm sorry, I realized another minor thing that should be fixed.

In v6, I did below.
+Bitmapset *
+RelationGetIdentityKeyBitmap(Relation relation)
+{
+       Bitmapset  *idindexattrs;       /* columns in the replica identity */
...
+       /* Build attributes to idindexattrs */
+       idindexattrs = NULL;

But we removed the loop, so we can initialize idindexattrs
to NULL in its declaration.
v7 updates this part and the related
comments from v6.

Best Regards,
Takamichi Osumi

Attachments:

truncate_in_synchronous_logical_replication_v07.patch (application/octet-stream)
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 2a1f983..1cf59e0 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -668,8 +668,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
 	/* fetch bitmap of REPLICATION IDENTITY attributes */
 	replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL);
 	if (!replidentfull)
-		idattrs = RelationGetIndexAttrBitmap(rel,
-											 INDEX_ATTR_BITMAP_IDENTITY_KEY);
+		idattrs = RelationGetIdentityKeyBitmap(rel);
 
 	/* send the attributes */
 	for (i = 0; i < desc->natts; i++)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 29702d6..c6fcc54 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5207,6 +5207,81 @@ restart:
 }
 
 /*
+ * RelationGetIdentityKeyBitmap -- get a bitmap of replica identity attribute
+ * numbers
+ *
+ * A bitmap of index attribute numbers for the configured replica identity
+ * index is returned.
+ *
+ * See also comments of RelationGetIndexAttrBitmap().
+ *
+ * This is a special purpose function used during logical replication. Here,
+ * unlike RelationGetIndexAttrBitmap(), we don't acquire lock on the required
+ * index as we build the cache entry using a historic snapshot and all the
+ * later changes are absorbed while decoding WAL. Due to this reason, we don't
+ * need to retry here in case of a change in the set of indexes.
+ */
+Bitmapset *
+RelationGetIdentityKeyBitmap(Relation relation)
+{
+	Bitmapset  *idindexattrs = NULL;	/* columns in the replica identity */
+	List	   *indexoidlist;
+	Relation	indexDesc;
+	int			i;
+	MemoryContext oldcxt;
+
+	/* Quick exit if we already computed the result */
+	if (relation->rd_idattr != NULL)
+		return bms_copy(relation->rd_idattr);
+
+	/* Fast path if definitely no indexes */
+	if (!RelationGetForm(relation)->relhasindex)
+		return NULL;
+
+	/* Historic snapshot must be set. */
+	Assert(HistoricSnapshotActive());
+
+	indexoidlist = RelationGetIndexList(relation);
+
+	/* Fall out if no indexes (but relhasindex was set) */
+	if (indexoidlist == NIL)
+		return NULL;
+
+	/* Build attributes to idindexattrs by collecting attribute references */
+	indexDesc = RelationIdGetRelation(relation->rd_replidindex);
+	for (i = 0; i < indexDesc->rd_index->indnatts; i++)
+	{
+		int			attrnum = indexDesc->rd_index->indkey.values[i];
+
+		/*
+		 * We don't include non-key columns into idindexattrs bitmaps. See
+		 * RelationGetIndexAttrBitmap.
+		 */
+		if (attrnum != 0)
+		{
+			if (i < indexDesc->rd_index->indnkeyatts)
+				idindexattrs = bms_add_member(idindexattrs,
+											  attrnum - FirstLowInvalidHeapAttributeNumber);
+		}
+	}
+
+	RelationClose(indexDesc);
+	list_free(indexoidlist);
+
+	/* Don't leak the old values of these bitmaps, if any */
+	bms_free(relation->rd_idattr);
+	relation->rd_idattr = NULL;
+
+	/* Now save copy of the bitmap in the relcache entry */
+	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	relation->rd_idattr = bms_copy(idindexattrs);
+	MemoryContextSwitchTo(oldcxt);
+
+	/* We return our original working copy for caller to play with */
+	return idindexattrs;
+}
+
+/*
  * RelationGetExclusionInfo -- get info about index's exclusion constraint
  *
  * This should be called only for an index that is known to have an
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79..f772855 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -65,6 +65,8 @@ typedef enum IndexAttrBitmapKind
 extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
 											 IndexAttrBitmapKind attrKind);
 
+extern Bitmapset *RelationGetIdentityKeyBitmap(Relation relation);
+
 extern void RelationGetExclusionInfo(Relation indexRelation,
 									 Oid **operators,
 									 Oid **procs,
diff --git a/src/test/subscription/t/010_truncate.pl b/src/test/subscription/t/010_truncate.pl
index be2c0bd..7fc403e 100644
--- a/src/test/subscription/t/010_truncate.pl
+++ b/src/test/subscription/t/010_truncate.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 11;
 
 # setup
 
@@ -158,3 +158,28 @@ is($result, qq(0||), 'truncate of multiple tables some not published');
 $result = $node_subscriber->safe_psql('postgres',
 	"SELECT count(*), min(a), max(a) FROM tab2");
 is($result, qq(3|1|3), 'truncate of multiple tables some not published');
+
+# Test that truncate works for synchronous logical replication
+
+$node_publisher->safe_psql('postgres',
+	"ALTER SYSTEM SET synchronous_standby_names TO 'sub1'");
+$node_publisher->safe_psql('postgres', "SELECT pg_reload_conf()");
+
+# insert data to truncate
+
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab1 VALUES (1), (2), (3)");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(3|1|3), 'check synchronous logical replication');
+
+$node_publisher->safe_psql('postgres', "TRUNCATE tab1");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(0||), 'truncate replicated in synchronous logical replication');
#36Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#35)
1 attachment(s)
Re: Truncate in synchronous logical replication failed

On Fri, Apr 23, 2021 at 7:18 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

The latest patch looks good to me. I have made a minor modification
and added a commit message in the attached. I would like to once again
ask whether anybody else thinks we should backpatch this? Just a
summary for anybody not following this thread:

This patch fixes the Logical Replication of Truncate in synchronous
commit mode. The Truncate operation acquires an exclusive lock on the
target relation and indexes and waits for logical replication of the
operation to finish at commit. Now because we are acquiring the shared
lock on the target index to get index attributes in pgoutput while
sending the changes for the Truncate operation, it leads to a
deadlock.

Actually, we don't need to acquire a lock on the target index as we
build the cache entry using a historic snapshot and all the later
changes are absorbed while decoding WAL. So, we wrote a special
purpose function for logical replication to get a bitmap of replica
identity attribute numbers where we get that information without
locking the target index.

We are planning not to backpatch this as there doesn't seem to be any
field complaint about this issue since it was introduced in commit
5dfd1e5a in v11.

--
With Regards,
Amit Kapila.

Attachments:

truncate_in_synchronous_logical_replication_v08.patch (application/octet-stream)
From efa9b99db9c1d2f0ef1484c557b8c042f0e08b02 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Mon, 26 Apr 2021 09:45:39 +0530
Subject: [PATCH v8] Fix Logical Replication of Truncate in synchronous commit
 mode.

The Truncate operation acquires an exclusive lock on the target relation
and indexes and waits for logical replication of the operation to finish
at commit. Now because we are acquiring the shared lock on the target
index to get index attributes in pgoutput while sending the changes for
the Truncate operation, it leads to a deadlock.

Actually, we don't need to acquire a lock on the targetindex as we build
the cache entry using a historic snapshot and all the later changes are
absorbed while decoding WAL. So, we wrote a special purpose function for
logical replication to get a bitmap of replica identity attribute numbers
where we get that information without locking the target index.

We decided not to backpatch this as there doesn't seem to be any field
complaint about this issue since it was introduced in commit 5dfd1e5a in
v11.

Reported-by: Haiying Tang
Author: Takamichi Osumi, test case by Li Japin
Reviewed-by: Amit Kapila, Ajin Cherian
Discussion: https://postgr.es/m/OS0PR01MB6113C2499C7DC70EE55ADB82FB759@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/proto.c |  3 +-
 src/backend/utils/cache/relcache.c      | 75 +++++++++++++++++++++++++++++++++
 src/include/utils/relcache.h            |  2 +
 src/test/subscription/t/010_truncate.pl | 27 +++++++++++-
 4 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 2a1f983..1cf59e0 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -668,8 +668,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
 	/* fetch bitmap of REPLICATION IDENTITY attributes */
 	replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL);
 	if (!replidentfull)
-		idattrs = RelationGetIndexAttrBitmap(rel,
-											 INDEX_ATTR_BITMAP_IDENTITY_KEY);
+		idattrs = RelationGetIdentityKeyBitmap(rel);
 
 	/* send the attributes */
 	for (i = 0; i < desc->natts; i++)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 29702d6..316a256 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5207,6 +5207,81 @@ restart:
 }
 
 /*
+ * RelationGetIdentityKeyBitmap -- get a bitmap of replica identity attribute
+ * numbers
+ *
+ * A bitmap of index attribute numbers for the configured replica identity
+ * index is returned.
+ *
+ * See also comments of RelationGetIndexAttrBitmap().
+ *
+ * This is a special purpose function used during logical replication. Here,
+ * unlike RelationGetIndexAttrBitmap(), we don't acquire a lock on the required
+ * index as we build the cache entry using a historic snapshot and all the
+ * later changes are absorbed while decoding WAL. Due to this reason, we don't
+ * need to retry here in case of a change in the set of indexes.
+ */
+Bitmapset *
+RelationGetIdentityKeyBitmap(Relation relation)
+{
+	Bitmapset  *idindexattrs = NULL;	/* columns in the replica identity */
+	List	   *indexoidlist;
+	Relation	indexDesc;
+	int			i;
+	MemoryContext oldcxt;
+
+	/* Quick exit if we already computed the result */
+	if (relation->rd_idattr != NULL)
+		return bms_copy(relation->rd_idattr);
+
+	/* Fast path if definitely no indexes */
+	if (!RelationGetForm(relation)->relhasindex)
+		return NULL;
+
+	/* Historic snapshot must be set. */
+	Assert(HistoricSnapshotActive());
+
+	indexoidlist = RelationGetIndexList(relation);
+
+	/* Fall out if no indexes (but relhasindex was set) */
+	if (indexoidlist == NIL)
+		return NULL;
+
+	/* Build attributes to idindexattrs by collecting attribute references */
+	indexDesc = RelationIdGetRelation(relation->rd_replidindex);
+	for (i = 0; i < indexDesc->rd_index->indnatts; i++)
+	{
+		int			attrnum = indexDesc->rd_index->indkey.values[i];
+
+		/*
+		 * We don't include non-key columns into idindexattrs bitmaps. See
+		 * RelationGetIndexAttrBitmap.
+		 */
+		if (attrnum != 0)
+		{
+			if (i < indexDesc->rd_index->indnkeyatts)
+				idindexattrs = bms_add_member(idindexattrs,
+											  attrnum - FirstLowInvalidHeapAttributeNumber);
+		}
+	}
+
+	RelationClose(indexDesc);
+	list_free(indexoidlist);
+
+	/* Don't leak the old values of these bitmaps, if any */
+	bms_free(relation->rd_idattr);
+	relation->rd_idattr = NULL;
+
+	/* Now save copy of the bitmap in the relcache entry */
+	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	relation->rd_idattr = bms_copy(idindexattrs);
+	MemoryContextSwitchTo(oldcxt);
+
+	/* We return our original working copy for caller to play with */
+	return idindexattrs;
+}
+
+/*
  * RelationGetExclusionInfo -- get info about index's exclusion constraint
  *
  * This should be called only for an index that is known to have an
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79..f772855 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -65,6 +65,8 @@ typedef enum IndexAttrBitmapKind
 extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
 											 IndexAttrBitmapKind attrKind);
 
+extern Bitmapset *RelationGetIdentityKeyBitmap(Relation relation);
+
 extern void RelationGetExclusionInfo(Relation indexRelation,
 									 Oid **operators,
 									 Oid **procs,
diff --git a/src/test/subscription/t/010_truncate.pl b/src/test/subscription/t/010_truncate.pl
index be2c0bd..7fc403e 100644
--- a/src/test/subscription/t/010_truncate.pl
+++ b/src/test/subscription/t/010_truncate.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 11;
 
 # setup
 
@@ -158,3 +158,28 @@ is($result, qq(0||), 'truncate of multiple tables some not published');
 $result = $node_subscriber->safe_psql('postgres',
 	"SELECT count(*), min(a), max(a) FROM tab2");
 is($result, qq(3|1|3), 'truncate of multiple tables some not published');
+
+# Test that truncate works for synchronous logical replication
+
+$node_publisher->safe_psql('postgres',
+	"ALTER SYSTEM SET synchronous_standby_names TO 'sub1'");
+$node_publisher->safe_psql('postgres', "SELECT pg_reload_conf()");
+
+# insert data to truncate
+
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab1 VALUES (1), (2), (3)");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(3|1|3), 'check synchronous logical replication');
+
+$node_publisher->safe_psql('postgres', "TRUNCATE tab1");
+
+$node_publisher->wait_for_catchup('sub1');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab1");
+is($result, qq(0||), 'truncate replicated in synchronous logical replication');
-- 
1.8.3.1

#37osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#36)
RE: Truncate in synchronous logical replication failed

On Monday, April 26, 2021 1:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Fri, Apr 23, 2021 at 7:18 PM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
>
> The latest patch looks good to me. I have made a minor modification and
> added a commit message in the attached.

Thank you for updating the patch.

I think we need one space for "targetindex" in the commit message.
From my side, there are no additional comments!

> I would like to once again ask
> whether anybody else thinks we should backpatch this? Just a summary for
> anybody not following this thread:
>
> This patch fixes the Logical Replication of Truncate in synchronous commit
> mode. The Truncate operation acquires an exclusive lock on the target
> relation and indexes and waits for logical replication of the operation to finish
> at commit. Now because we are acquiring the shared lock on the target index
> to get index attributes in pgoutput while sending the changes for the Truncate
> operation, it leads to a deadlock.
>
> Actually, we don't need to acquire a lock on the target index as we build the
> cache entry using a historic snapshot and all the later changes are absorbed
> while decoding WAL. So, we wrote a special purpose function for logical
> replication to get a bitmap of replica identity attribute numbers where we get
> that information without locking the target index.
>
> We are planning not to backpatch this as there doesn't seem to be any field
> complaint about this issue since it was introduced in commit 5dfd1e5a in v11.

Anyone, please share your opinion on this matter when you have time.

Best Regards,
Takamichi Osumi
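
[Editorial aside] The lock cycle summarized above can be sketched with plain Python threads. This is an illustrative simulation under stated assumptions, not PostgreSQL code: the function names `backend_truncate` and `walsender_decode` are invented for the sketch, a `threading.Lock` stands in for the index's heavyweight lock, and a `threading.Event` stands in for the synchronous-commit confirmation. Timeouts are added only so the demo terminates; the real wait in PostgreSQL has no timeout, which is why the session hangs.

```python
import threading

index_lock = threading.Lock()    # stands in for the index's heavyweight lock
ack = threading.Event()          # stands in for the subscriber's commit ack
holding = threading.Event()      # lets the demo order the two threads deterministically
got_index_lock = False

def backend_truncate():
    # TRUNCATE: take AccessExclusiveLock on the index, then wait at commit
    # for the synchronous standby's confirmation.
    with index_lock:
        holding.set()
        ack.wait(timeout=1.0)    # in PostgreSQL this wait has no timeout

def walsender_decode():
    # pgoutput (pre-fix): take a shared lock on the same index to read the
    # replica identity attributes before sending the TRUNCATE downstream.
    global got_index_lock
    got_index_lock = index_lock.acquire(timeout=0.3)
    if got_index_lock:
        ack.set()                # the ack could only be sent after decoding
        index_lock.release()

b = threading.Thread(target=backend_truncate)
b.start()
holding.wait()                   # ensure the "backend" holds the lock first
w = threading.Thread(target=walsender_decode)
w.start()
b.join()
w.join()

# Neither side can proceed: the walsender never gets the lock, so the ack
# never arrives and the backend's commit keeps waiting -- a lock-wait cycle.
print("walsender got lock:", got_index_lock)  # False
print("commit acked:", ack.is_set())          # False
```

The committed fix breaks the cycle at the `acquire` step: `RelationGetIdentityKeyBitmap` reads the replica identity attributes from the catalog under a historic snapshot instead of locking the target index.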

#38Japin Li
japinli@hotmail.com
In reply to: Amit Kapila (#36)
Re: Truncate in synchronous logical replication failed

On Mon, 26 Apr 2021 at 12:49, Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Fri, Apr 23, 2021 at 7:18 PM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
>
> The latest patch looks good to me. I have made a minor modification
> and added a commit message in the attached. I would like to once again
> ask whether anybody else thinks we should backpatch this? Just a
> summary for anybody not following this thread:
>
> This patch fixes the Logical Replication of Truncate in synchronous
> commit mode. The Truncate operation acquires an exclusive lock on the
> target relation and indexes and waits for logical replication of the
> operation to finish at commit. Now because we are acquiring the shared
> lock on the target index to get index attributes in pgoutput while
> sending the changes for the Truncate operation, it leads to a
> deadlock.
>
> Actually, we don't need to acquire a lock on the target index as we
> build the cache entry using a historic snapshot and all the later
> changes are absorbed while decoding WAL. So, we wrote a special
> purpose function for logical replication to get a bitmap of replica
> identity attribute numbers where we get that information without
> locking the target index.
>
> We are planning not to backpatch this as there doesn't seem to be any
> field complaint about this issue since it was introduced in commit
> 5dfd1e5a in v11.

+1 for apply only on HEAD.

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#39Amit Kapila
amit.kapila16@gmail.com
In reply to: Japin Li (#38)
Re: Truncate in synchronous logical replication failed

On Mon, Apr 26, 2021 at 12:37 PM Japin Li <japinli@hotmail.com> wrote:

> On Mon, 26 Apr 2021 at 12:49, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>> On Fri, Apr 23, 2021 at 7:18 PM osumi.takamichi@fujitsu.com
>> <osumi.takamichi@fujitsu.com> wrote:
>>
>> The latest patch looks good to me. I have made a minor modification
>> and added a commit message in the attached. I would like to once again
>> ask whether anybody else thinks we should backpatch this? Just a
>> summary for anybody not following this thread:
>>
>> This patch fixes the Logical Replication of Truncate in synchronous
>> commit mode. The Truncate operation acquires an exclusive lock on the
>> target relation and indexes and waits for logical replication of the
>> operation to finish at commit. Now because we are acquiring the shared
>> lock on the target index to get index attributes in pgoutput while
>> sending the changes for the Truncate operation, it leads to a
>> deadlock.
>>
>> Actually, we don't need to acquire a lock on the target index as we
>> build the cache entry using a historic snapshot and all the later
>> changes are absorbed while decoding WAL. So, we wrote a special
>> purpose function for logical replication to get a bitmap of replica
>> identity attribute numbers where we get that information without
>> locking the target index.
>>
>> We are planning not to backpatch this as there doesn't seem to be any
>> field complaint about this issue since it was introduced in commit
>> 5dfd1e5a in v11.
>
> +1 for apply only on HEAD.

Seeing no other suggestions, I have pushed this in HEAD only. Thanks!

--
With Regards,
Amit Kapila.

#40tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Amit Kapila (#39)
RE: Truncate in synchronous logical replication failed

On Tuesday, April 27, 2021 1:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

> Seeing no other suggestions, I have pushed this in HEAD only. Thanks!

Sorry for the late reply on this issue.
I tested with the latest HEAD, and the issue is fixed now. Thanks a lot.

Regards
Tang