Decoding speculative insert with toast leaks memory

amit.kapila16@gmail.com

over 4 years ago

In reply to: Ashutosh Bapat (#1)

Re: Decoding speculative insert with toast leaks memory

On Thu, Mar 25, 2021 at 11:04 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Hi All,
We saw OOM in a system where WAL sender consumed gigabytes of memory
which was never released. Upon investigation, we found out that there
were many ReorderBufferToastHash memory contexts linked to
ReorderBuffer context, together consuming gigs of memory. They were
running INSERT ... ON CONFLICT .. among other things. A similar report
at [1]

but by then we might have reused the toast_hash and thus can not be
destroyed. But that isn't the problem since the reused toast_hash will
be destroyed eventually.

It's only when the change next to speculative insert is something
other than INSERT/UPDATE/DELETE that we have to worry about a
speculative insert that was never confirmed. So maybe for those
cases, we check whether specinsert != null and destroy toast_hash if
it exists.

Can we consider the possibility to destroy the toast_hash in
ReorderBufferCleanupTXN/ReorderBufferTruncateTXN? It will delay the
clean up of memory till the end of stream or txn but there won't be
any memory leak.
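
To make the proposal concrete, here is a minimal, self-contained toy model of the leak. It is only a sketch: ToyTxn, ToyChangeKind and every toy_* function are hypothetical stand-ins, not the reorderbuffer.c code, and a malloc'd block stands in for the hash table. It assumes, as described above, that a plain INSERT/UPDATE/DELETE (or a confirmed speculative insert) resets the hash while other change types leave it alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical change kinds; the real enum is ReorderBufferChangeType. */
typedef enum ToyChangeKind
{
	TOY_TOAST_CHUNK,	/* TOAST chunk for an upcoming tuple */
	TOY_SPEC_INSERT,	/* speculative insert, not yet confirmed */
	TOY_SPEC_CONFIRM,	/* confirmation, replayed like a plain insert */
	TOY_INSERT,			/* plain INSERT/UPDATE/DELETE */
	TOY_MESSAGE			/* any other change, e.g. a logical message */
} ToyChangeKind;

typedef struct ToyTxn
{
	void	   *toast_hash;	/* stands in for txn->toast_hash */
} ToyTxn;

/* Number of toy hashes currently allocated. */
static int	toy_live_hashes = 0;

/* Create the per-transaction hash on first use. */
static void
toy_toast_append(ToyTxn *txn)
{
	if (txn->toast_hash == NULL)
	{
		txn->toast_hash = malloc(64);
		assert(txn->toast_hash != NULL);
		toy_live_hashes++;
	}
}

/* Destroy the hash if there is one; a no-op otherwise. */
static void
toy_toast_reset(ToyTxn *txn)
{
	if (txn->toast_hash == NULL)
		return;
	free(txn->toast_hash);
	txn->toast_hash = NULL;
	toy_live_hashes--;
}

static void
toy_apply_change(ToyTxn *txn, ToyChangeKind kind)
{
	switch (kind)
	{
		case TOY_TOAST_CHUNK:
			toy_toast_append(txn);
			break;
		case TOY_INSERT:
		case TOY_SPEC_CONFIRM:
			toy_toast_reset(txn);	/* the only places that reset today */
			break;
		case TOY_SPEC_INSERT:
		case TOY_MESSAGE:
			break;					/* hash survives these changes */
	}
}

/* Release the transaction; reset_toast models the proposed fix. */
static void
toy_release_txn(ToyTxn *txn, bool reset_toast)
{
	if (reset_toast)
		toy_toast_reset(txn);
}

/*
 * Replay "toast chunk, unconfirmed speculative insert, other change" and
 * report how many hashes are still allocated after the release.
 */
int
toy_leaked_hashes(bool reset_on_release)
{
	ToyTxn		txn = {NULL};
	int			before = toy_live_hashes;
	int			leaked;

	toy_apply_change(&txn, TOY_TOAST_CHUNK);
	toy_apply_change(&txn, TOY_SPEC_INSERT);
	toy_apply_change(&txn, TOY_MESSAGE);
	toy_release_txn(&txn, reset_on_release);

	leaked = toy_live_hashes - before;
	toy_toast_reset(&txn);	/* tidy up so the demo itself does not leak */
	return leaked;
}
```

In this model the hash outlives the transaction unless the release step resets it, which is the behaviour the proposal aims to change.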

--
With Regards,
Amit Kapila.

amit.kapila16@gmail.com

over 4 years ago

In reply to: Peter Geoghegan (#2)

Re: Decoding speculative insert with toast leaks memory

On Thu, May 27, 2021 at 8:27 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Mar 24, 2021 at 10:34 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Hi All,
We saw OOM in a system where WAL sender consumed gigabytes of memory
which was never released. Upon investigation, we found out that there
were many ReorderBufferToastHash memory contexts linked to
ReorderBuffer context, together consuming gigs of memory. They were
running INSERT ... ON CONFLICT .. among other things. A similar report
at [1]

What is the relationship between this bug and commit 7259736a6e5,
which dealt specifically with TOAST and speculative insertion resource
management issues within reorderbuffer.c? Amit?

This seems to be a pre-existing bug. It should be reproducible in
PG-13 and/or prior to that commit. Ashutosh, can you confirm?

--
With Regards,
Amit Kapila.

amit.kapila16@gmail.com

over 4 years ago

In reply to: Amit Kapila (#3)

Re: Decoding speculative insert with toast leaks memory

On Thu, May 27, 2021 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 25, 2021 at 11:04 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Hi All,
We saw OOM in a system where WAL sender consumed gigabytes of memory
which was never released. Upon investigation, we found out that there
were many ReorderBufferToastHash memory contexts linked to
ReorderBuffer context, together consuming gigs of memory. They were
running INSERT ... ON CONFLICT .. among other things. A similar report
at [1]

..

but by then we might have reused the toast_hash and thus can not be
destroyed. But that isn't the problem since the reused toast_hash will
be destroyed eventually.

It's only when the change next to speculative insert is something
other than INSERT/UPDATE/DELETE that we have to worry about a
speculative insert that was never confirmed. So maybe for those
cases, we check whether specinsert != null and destroy toast_hash if
it exists.

Can we consider the possibility to destroy the toast_hash in
ReorderBufferCleanupTXN/ReorderBufferTruncateTXN? It will delay the
clean up of memory till the end of stream or txn but there won't be
any memory leak.

The other possibility could be to clean it up when we clean the spec
insert change in the below code:
    /*
     * There's a speculative insertion remaining, just clean in up, it
     * can't have been successful, otherwise we'd gotten a confirmation
     * record.
     */
    if (specinsert)
    {
        ReorderBufferReturnChange(rb, specinsert, true);
        specinsert = NULL;
    }

But I guess we might miss cleaning it up in case of an error. A
similar problem could be there in the idea where we will try to tie
the clean up with the next change.

--
With Regards,
Amit Kapila.

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#3)

Re: Decoding speculative insert with toast leaks memory

On Thu, May 27, 2021 at 9:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 25, 2021 at 11:04 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Hi All,
We saw OOM in a system where WAL sender consumed gigabytes of memory
which was never released. Upon investigation, we found out that there
were many ReorderBufferToastHash memory contexts linked to
ReorderBuffer context, together consuming gigs of memory. They were
running INSERT ... ON CONFLICT .. among other things. A similar report
at [1]

..

but by then we might have reused the toast_hash and thus can not be
destroyed. But that isn't the problem since the reused toast_hash will
be destroyed eventually.

It's only when the change next to speculative insert is something
other than INSERT/UPDATE/DELETE that we have to worry about a
speculative insert that was never confirmed. So maybe for those
cases, we check whether specinsert != null and destroy toast_hash if
it exists.

Can we consider the possibility to destroy the toast_hash in
ReorderBufferCleanupTXN/ReorderBufferTruncateTXN? It will delay the
clean up of memory till the end of stream or txn but there won't be
any memory leak.

Currently, we are ignoring XLH_DELETE_IS_SUPER, so maybe we can do
something based on this flag?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#5)

Re: Decoding speculative insert with toast leaks memory

On Thu, May 27, 2021 at 9:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 27, 2021 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 25, 2021 at 11:04 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Hi All,
We saw OOM in a system where WAL sender consumed gigabytes of memory
which was never released. Upon investigation, we found out that there
were many ReorderBufferToastHash memory contexts linked to
ReorderBuffer context, together consuming gigs of memory. They were
running INSERT ... ON CONFLICT .. among other things. A similar report
at [1]

..

but by then we might have reused the toast_hash and thus can not be
destroyed. But that isn't the problem since the reused toast_hash will
be destroyed eventually.

It's only when the change next to speculative insert is something
other than INSERT/UPDATE/DELETE that we have to worry about a
speculative insert that was never confirmed. So maybe for those
cases, we check whether specinsert != null and destroy toast_hash if
it exists.

Can we consider the possibility to destroy the toast_hash in
ReorderBufferCleanupTXN/ReorderBufferTruncateTXN? It will delay the
clean up of memory till the end of stream or txn but there won't be
any memory leak.

The other possibility could be to clean it up when we clean the spec
insert change in the below code:

Yeah that could be done.

    /*
     * There's a speculative insertion remaining, just clean in up, it
     * can't have been successful, otherwise we'd gotten a confirmation
     * record.
     */
    if (specinsert)
    {
        ReorderBufferReturnChange(rb, specinsert, true);
        specinsert = NULL;
    }

But I guess we might miss cleaning it up in case of an error. A
similar problem could be there in the idea where we will try to tie
the clean up with the next change.

In the error case we can also handle it in the CATCH block, no?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#7)

Re: Decoding speculative insert with toast leaks memory

On Thu, May 27, 2021 at 9:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, May 27, 2021 at 9:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Can we consider the possibility to destroy the toast_hash in
ReorderBufferCleanupTXN/ReorderBufferTruncateTXN? It will delay the
clean up of memory till the end of stream or txn but there won't be
any memory leak.

The other possibility could be to clean it up when we clean the spec
insert change in the below code:

Yeah that could be done.

    /*
     * There's a speculative insertion remaining, just clean in up, it
     * can't have been successful, otherwise we'd gotten a confirmation
     * record.
     */
    if (specinsert)
    {
        ReorderBufferReturnChange(rb, specinsert, true);
        specinsert = NULL;
    }

But I guess we might miss cleaning it up in case of an error. A
similar problem could be there in the idea where we will try to tie
the clean up with the next change.

In the error case we can also handle it in the CATCH block, no?

True, but if you do this clean-up in ReorderBufferCleanupTXN then you
don't need to take care of it in separate places. Also, toast_hash is
stored in txn, so it appears natural to clean it up while releasing the TXN.
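
The "one place covers both paths" argument can be sketched in plain C. This is a toy under stated assumptions, not the real code: PostgreSQL's PG_TRY/PG_CATCH macros are built on sigsetjmp, and plain setjmp/longjmp stands in for them here. ToyTxn and the toy_* functions are hypothetical, and a malloc'd block stands in for the hash.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct ToyTxn
{
	void	   *toast_hash;	/* stands in for txn->toast_hash */
} ToyTxn;

static jmp_buf toy_error_ctx;
static int	toy_live_hashes = 0;

/* Single release point: frees the hash no matter how replay ended. */
static void
toy_release_txn(ToyTxn *txn)
{
	if (txn->toast_hash != NULL)
	{
		free(txn->toast_hash);
		toy_live_hashes--;
	}
	free(txn);
}

/* Replays a transaction whose last change leaves a hash behind. */
static void
toy_process_changes(ToyTxn *txn, bool fail)
{
	txn->toast_hash = malloc(64);
	assert(txn->toast_hash != NULL);
	toy_live_hashes++;

	if (fail)
		longjmp(toy_error_ctx, 1);	/* stands in for an ERROR during replay */
}

/* Returns the number of hashes still allocated after the release. */
int
toy_decode_txn(bool fail)
{
	/* Heap-allocated so its contents stay well defined across longjmp. */
	ToyTxn	   *txn = malloc(sizeof(ToyTxn));

	assert(txn != NULL);
	txn->toast_hash = NULL;

	if (setjmp(toy_error_ctx) == 0)
		toy_process_changes(txn, fail);

	/* The normal path and the error path converge here. */
	toy_release_txn(txn);
	return toy_live_hashes;
}
```

Because the release function owns the cleanup, neither the replay loop nor the error handler needs its own copy of it.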

--
With Regards,
Amit Kapila.

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#8)

1 attachment(s)

Re: Decoding speculative insert with toast leaks memory

On Thu, May 27, 2021 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 27, 2021 at 9:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

True, but if you do this clean-up in ReorderBufferCleanupTXN then you
don't need to take care at separate places. Also, toast_hash is stored
in txn so it appears natural to clean it up while releasing TXN.

Makes sense. Basically, IMHO we will have to do it in TruncateTXN and
ReturnTXN, as attached?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v1-0001-Cleanup-toast-hash.patch (text/x-patch; charset=US-ASCII)

From ff049e1ab141ba6bc5c05d62e7a225abfb18fad8 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Thu, 27 May 2021 10:03:27 +0530
Subject: [PATCH v1] Cleanup toast hash

---
 src/backend/replication/logical/reorderbuffer.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 4401608..39242f2 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -437,6 +437,10 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->tuplecid_hash = NULL;
 	}
 
+	/* cleanup the toast hash if it is not done already. */
+	if (txn->toast_hash != NULL)
+		ReorderBufferToastReset(rb, txn);	
+
 	if (txn->invalidations)
 	{
 		pfree(txn->invalidations);
@@ -1629,6 +1633,10 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prep
 		txn->tuplecid_hash = NULL;
 	}
 
+	/* cleanup the toast hash if it is not done already. */
+	if (txn->toast_hash != NULL)
+		ReorderBufferToastReset(rb, txn);
+
 	/* If this txn is serialized then clean the disk space. */
 	if (rbtxn_is_serialized(txn))
 	{
-- 
1.8.3.1

#10

tomas.vondra@enterprisedb.com

over 4 years ago

In reply to: Dilip Kumar (#9)

Re: Decoding speculative insert with toast leaks memory

On 5/27/21 6:36 AM, Dilip Kumar wrote:

On Thu, May 27, 2021 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 27, 2021 at 9:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

True, but if you do this clean-up in ReorderBufferCleanupTXN then you
don't need to take care at separate places. Also, toast_hash is stored
in txn so it appears natural to clean it up while releasing TXN.

Make sense, basically, IMHO we will have to do in TruncateTXN and
ReturnTXN as attached?

Yeah, I've been working on a fix over the last couple days (because of a
customer issue), and I ended up with the reset in ReorderBufferReturnTXN
too - it does solve the issue in the case I've been investigating.

I'm not sure the reset in ReorderBufferTruncateTXN is correct, though.
Isn't it possible that we'll need the TOAST data after streaming part of
the transaction? After all, we're not resetting invalidations, tuplecids
and snapshot either ... And we'll eventually clean it after the streamed
transaction gets committed (ReorderBufferStreamCommit ends up calling
ReorderBufferReturnTXN too).

I wonder if there's a way to free the TOASTed data earlier, instead of
waiting until the end of the transaction (as this patch does). But I
suspect it'd be way more complex, harder to backpatch, and destroying
the hash table is a good idea anyway.

Also, I think the "if (txn->toast_hash != NULL)" checks are not needed,
because it's the first thing ReorderBufferToastReset does.
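
A small sketch of why the caller-side guard is redundant when the reset function performs the same check itself. ToyTxn and the toy_* names are hypothetical, and a malloc'd block stands in for the hash table; only the early-return shape is meant to mirror what is described above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyTxn
{
	void	   *toast_hash;	/* stands in for txn->toast_hash */
} ToyTxn;

/* Returns 1 if a hash was destroyed, 0 if there was nothing to do. */
int
toy_toast_reset(ToyTxn *txn)
{
	if (txn->toast_hash == NULL)
		return 0;			/* the check the caller does not need to repeat */

	free(txn->toast_hash);
	txn->toast_hash = NULL;
	return 1;
}

/* Calls the reset twice without any guard and counts the real frees. */
int
toy_double_reset(void)
{
	ToyTxn		txn;
	int			freed;

	txn.toast_hash = malloc(64);
	assert(txn.toast_hash != NULL);

	freed = toy_toast_reset(&txn);	/* destroys the hash */
	freed += toy_toast_reset(&txn);	/* safe no-op on the second call */
	return freed;
}
```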

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11

dilipbalaut@gmail.com

over 4 years ago

In reply to: Tomas Vondra (#10)

Re: Decoding speculative insert with toast leaks memory

On Fri, May 28, 2021 at 5:16 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/27/21 6:36 AM, Dilip Kumar wrote:

On Thu, May 27, 2021 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 27, 2021 at 9:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

True, but if you do this clean-up in ReorderBufferCleanupTXN then you
don't need to take care at separate places. Also, toast_hash is stored
in txn so it appears natural to clean it up while releasing TXN.

Make sense, basically, IMHO we will have to do in TruncateTXN and
ReturnTXN as attached?

Yeah, I've been working on a fix over the last couple days (because of a
customer issue), and I ended up with the reset in ReorderBufferReturnTXN
too - it does solve the issue in the case I've been investigating.

I'm not sure the reset in ReorderBufferTruncateTXN is correct, though.
Isn't it possible that we'll need the TOAST data after streaming part of
the transaction? After all, we're not resetting invalidations, tuplecids
and snapshot either

Actually, as per the current design, we don't need the toast data
after the streaming, because currently we don't allow the transaction
to be streamed if we need to keep the toast across streams. E.g., if we
only have the toast insert without the main insert, we treat this as
partial changes. Another case is a multi-insert with toast, where we
keep the transaction marked as partial until we get the last insert of
the multi-insert. So with the current design we don't have any case
where we need to keep toast data across streams.

... And we'll eventually clean it after the streamed
transaction gets committed (ReorderBufferStreamCommit ends up calling
ReorderBufferReturnTXN too).

Right, but generally after streaming we assert that txn->size should
be 0. That could be changed if we change the above design but this is
what it is today.

I wonder if there's a way to free the TOASTed data earlier, instead of
waiting until the end of the transaction (as this patch does). But I
suspect it'd be way more complex, harder to backpatch, and destroying
the hash table is a good idea anyway.

Right.

Also, I think the "if (txn->toast_hash != NULL)" checks are not needed,
because it's the first thing ReorderBufferToastReset does.

I see, I will change this if we all agree with this idea.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#12

tomas.vondra@enterprisedb.com

over 4 years ago

In reply to: Dilip Kumar (#11)

Re: Decoding speculative insert with toast leaks memory

On 5/28/21 2:17 PM, Dilip Kumar wrote:

On Fri, May 28, 2021 at 5:16 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/27/21 6:36 AM, Dilip Kumar wrote:

On Thu, May 27, 2021 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 27, 2021 at 9:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

True, but if you do this clean-up in ReorderBufferCleanupTXN then you
don't need to take care at separate places. Also, toast_hash is stored
in txn so it appears natural to clean it up while releasing TXN.

Make sense, basically, IMHO we will have to do in TruncateTXN and
ReturnTXN as attached?

Yeah, I've been working on a fix over the last couple days (because of a
customer issue), and I ended up with the reset in ReorderBufferReturnTXN
too - it does solve the issue in the case I've been investigating.

I'm not sure the reset in ReorderBufferTruncateTXN is correct, though.
Isn't it possible that we'll need the TOAST data after streaming part of
the transaction? After all, we're not resetting invalidations, tuplecids
and snapshot either

Actually, as per the current design, we don't need the toast data
after the streaming, because currently we don't allow the transaction
to be streamed if we need to keep the toast across streams. E.g., if we
only have the toast insert without the main insert, we treat this as
partial changes. Another case is a multi-insert with toast, where we
keep the transaction marked as partial until we get the last insert of
the multi-insert. So with the current design we don't have any case
where we need to keep toast data across streams.

... And we'll eventually clean it after the streamed
transaction gets committed (ReorderBufferStreamCommit ends up calling
ReorderBufferReturnTXN too).

Right, but generally after streaming we assert that txn->size should
be 0. That could be changed if we change the above design but this is
what it is today.

Can we add some assert to enforce this?

I wonder if there's a way to free the TOASTed data earlier, instead of
waiting until the end of the transaction (as this patch does). But I
suspect it'd be way more complex, harder to backpatch, and destroying
the hash table is a good idea anyway.

Right.

Also, I think the "if (txn->toast_hash != NULL)" checks are not needed,
because it's the first thing ReorderBufferToastReset does.

I see, I will change this if we all agree with this idea.

+1 from me. I think it's good enough to do the cleanup at the end, and
it's an improvement compared to current state. There might be cases of
transactions doing many such speculative inserts and accumulating a lot
of data in the TOAST hash, but I find it very unlikely.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13

amit.kapila16@gmail.com

over 4 years ago

In reply to: Tomas Vondra (#12)

Re: Decoding speculative insert with toast leaks memory

On Fri, May 28, 2021 at 6:01 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/28/21 2:17 PM, Dilip Kumar wrote:

On Fri, May 28, 2021 at 5:16 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/27/21 6:36 AM, Dilip Kumar wrote:

On Thu, May 27, 2021 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 27, 2021 at 9:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

True, but if you do this clean-up in ReorderBufferCleanupTXN then you
don't need to take care at separate places. Also, toast_hash is stored
in txn so it appears natural to clean it up while releasing TXN.

Make sense, basically, IMHO we will have to do in TruncateTXN and
ReturnTXN as attached?

Yeah, I've been working on a fix over the last couple days (because of a
customer issue), and I ended up with the reset in ReorderBufferReturnTXN
too - it does solve the issue in the case I've been investigating.

I'm not sure the reset in ReorderBufferTruncateTXN is correct, though.
Isn't it possible that we'll need the TOAST data after streaming part of
the transaction? After all, we're not resetting invalidations, tuplecids
and snapshot either

Actually, as per the current design, we don't need the toast data
after the streaming, because currently we don't allow the transaction
to be streamed if we need to keep the toast across streams. E.g., if we
only have the toast insert without the main insert, we treat this as
partial changes. Another case is a multi-insert with toast, where we
keep the transaction marked as partial until we get the last insert of
the multi-insert. So with the current design we don't have any case
where we need to keep toast data across streams.

... And we'll eventually clean it after the streamed
transaction gets committed (ReorderBufferStreamCommit ends up calling
ReorderBufferReturnTXN too).

Right, but generally after streaming we assert that txn->size should
be 0. That could be changed if we change the above design but this is
what it is today.

Can we add some assert to enforce this?

There is already an Assert for this. See ReorderBufferCheckMemoryLimit.

--
With Regards,
Amit Kapila.

#14

amit.kapila16@gmail.com

over 4 years ago

In reply to: Tomas Vondra (#10)

Re: Decoding speculative insert with toast leaks memory

On Fri, May 28, 2021 at 5:16 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I wonder if there's a way to free the TOASTed data earlier, instead of
waiting until the end of the transaction (as this patch does).

IIUC we are anyway freeing the toasted data at the next
insert/update/delete. We can try to free at other change message types
like REORDER_BUFFER_CHANGE_MESSAGE but as you said that may make the
patch more complex, so it seems better to do the fix on the lines of
what is proposed in the patch.

--
With Regards,
Amit Kapila.

#15

tomas.vondra@enterprisedb.com

over 4 years ago

In reply to: Amit Kapila (#14)

Re: Decoding speculative insert with toast leaks memory

On 5/29/21 6:29 AM, Amit Kapila wrote:

On Fri, May 28, 2021 at 5:16 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I wonder if there's a way to free the TOASTed data earlier, instead of
waiting until the end of the transaction (as this patch does).

IIUC we are anyway freeing the toasted data at the next
insert/update/delete. We can try to free at other change message types
like REORDER_BUFFER_CHANGE_MESSAGE but as you said that may make the
patch more complex, so it seems better to do the fix on the lines of
what is proposed in the patch.

Even if we started doing what you mention (freeing the hash for other
change types), we'd still need to do what the patch proposes because the
speculative insert may be the last change in the transaction. For the
other cases it works as a mitigation, so that we don't leak the memory
forever.

So let's get this committed, perhaps with a comment explaining that it
might be possible to reset earlier if needed.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16

amit.kapila16@gmail.com

over 4 years ago

In reply to: Tomas Vondra (#15)

Re: Decoding speculative insert with toast leaks memory

On Sat, May 29, 2021 at 5:45 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/29/21 6:29 AM, Amit Kapila wrote:

On Fri, May 28, 2021 at 5:16 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I wonder if there's a way to free the TOASTed data earlier, instead of
waiting until the end of the transaction (as this patch does).

IIUC we are anyway freeing the toasted data at the next
insert/update/delete. We can try to free at other change message types
like REORDER_BUFFER_CHANGE_MESSAGE but as you said that may make the
patch more complex, so it seems better to do the fix on the lines of
what is proposed in the patch.

+1

Even if we started doing what you mention (freeing the hash for other
change types), we'd still need to do what the patch proposes because the
speculative insert may be the last change in the transaction. For the
other cases it works as a mitigation, so that we don't leak the memory
forever.

Right.

So let's get this committed, perhaps with a comment explaining that it
might be possible to reset earlier if needed.

Okay, I think it would be better if we can test this once for the
streaming case as well. Dilip, would you like to do that and send the
updated patch as per one of the comments by Tomas?

--
With Regards,
Amit Kapila.

#17

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#16)

Re: Decoding speculative insert with toast leaks memory

On Mon, 31 May 2021 at 8:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, May 29, 2021 at 5:45 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/29/21 6:29 AM, Amit Kapila wrote:

On Fri, May 28, 2021 at 5:16 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I wonder if there's a way to free the TOASTed data earlier, instead of
waiting until the end of the transaction (as this patch does).

IIUC we are anyway freeing the toasted data at the next
insert/update/delete. We can try to free at other change message types
like REORDER_BUFFER_CHANGE_MESSAGE but as you said that may make the
patch more complex, so it seems better to do the fix on the lines of
what is proposed in the patch.

+1

Even if we started doing what you mention (freeing the hash for other
change types), we'd still need to do what the patch proposes because the
speculative insert may be the last change in the transaction. For the
other cases it works as a mitigation, so that we don't leak the memory
forever.

Right.

So let's get this committed, perhaps with a comment explaining that it
might be possible to reset earlier if needed.

Okay, I think it would be better if we can test this once for the
streaming case as well. Dilip, would you like to do that and send the
updated patch as per one of the comments by Tomas?

I will do that sometime.

--

Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#18

dilipbalaut@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#17)

2 attachment(s)

Re: Decoding speculative insert with toast leaks memory

On Mon, May 31, 2021 at 8:50 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, 31 May 2021 at 8:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, I think it would be better if we can test this once for the
streaming case as well. Dilip, would you like to do that and send the
updated patch as per one of the comments by Tomas?

I will do that sometime.

I have changed the patches as Tomas suggested and also created patches
for the back branches.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v1-0001-Fix-memory-leak-in-toast-hash_v13-to-v9.6.patch (text/x-patch; charset=US-ASCII)

From 792892fce8771c003eca16847e5e494a70f318ed Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Mon, 31 May 2021 15:20:05 +0530
Subject: [PATCH v1] Fix memory leak in toast hash

While cleaning up the changes just destory the toast hash so that if
it is not already done in some cases we don't leak memory.
---
 src/backend/replication/logical/reorderbuffer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5251932..89fa7e7 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -391,6 +391,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->tuplecid_hash = NULL;
 	}
 
+	/* cleanup the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	if (txn->invalidations)
 	{
 		pfree(txn->invalidations);
-- 
1.8.3.1

v1-0001-Fix-memory-leak-in-toast-hash-v14.patch (text/x-patch; charset=US-ASCII)

From 9a7eeaf7b47ec6faff0bbc524412d46869136720 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Thu, 27 May 2021 10:03:27 +0530
Subject: [PATCH v1] Fix memory leak in toast hash

While cleaning up the changes just destory the toast hash so that if
it is not already done in some cases we don't leak memory.
---
 src/backend/replication/logical/reorderbuffer.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2d9e127..ab65d3b 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -437,6 +437,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->tuplecid_hash = NULL;
 	}
 
+	/* cleanup the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	if (txn->invalidations)
 	{
 		pfree(txn->invalidations);
@@ -1637,6 +1640,9 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prep
 		txn->tuplecid_hash = NULL;
 	}
 
+	/* Cleanup the toast hash. */
+	ReorderBufferToastReset(rb, txn);
+
 	/* If this txn is serialized then clean the disk space. */
 	if (rbtxn_is_serialized(txn))
 	{
-- 
1.8.3.1

#19

dilipbalaut@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#18)

Re: Decoding speculative insert with toast leaks memory

On Mon, 31 May 2021 at 4:29 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, May 31, 2021 at 8:50 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, 31 May 2021 at 8:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, I think it would be better if we can test this once for the
streaming case as well. Dilip, would you like to do that and send the
updated patch as per one of the comments by Tomas?

I will do that sometime.

I have changed patches as Tomas suggested and also created back patches.

I missed doing the test for streaming. I will do that tomorrow and reply
back.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#20

dilipbalaut@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#19)

Re: Decoding speculative insert with toast leaks memory

On Mon, May 31, 2021 at 6:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, 31 May 2021 at 4:29 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, May 31, 2021 at 8:50 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, 31 May 2021 at 8:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, I think it would be better if we can test this once for the
streaming case as well. Dilip, would you like to do that and send the
updated patch as per one of the comments by Tomas?

I will do that sometime.

I have changed patches as Tomas suggested and also created back patches.

I missed doing the test for streaming. I will do that tomorrow and reply back.

For streaming transactions this issue is not there. The problem
occurs only if the last change is a *SPEC INSERT* with no UPDATE/INSERT
change after it, because that is the change on which we reset the
toast hash. Now, if the transaction has only a *SPEC INSERT* without a
SPEC CONFIRM or any other INSERT/UPDATE, then we will not stream it.
Only if we get a subsequent INSERT/UPDATE can we select it for
streaming, but in that case the toast hash will be reset. So as of
today we don't have this problem in streaming mode.
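
The reasoning above can be modelled with a single "partial change" flag. This is only a sketch of the rule as described in this thread: ToyTxn, ToyChangeKind and the toy_* functions are hypothetical, the flag stands in for the transaction's partial-change marker, and multi-insert handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum ToyChangeKind
{
	TOY_TOAST_INSERT,	/* toast chunk without its main tuple yet */
	TOY_MAIN_INSERT,	/* INSERT/UPDATE that completes the tuple */
	TOY_SPEC_INSERT,	/* speculative insert awaiting confirmation */
	TOY_SPEC_CONFIRM	/* confirmation of the speculative insert */
} ToyChangeKind;

typedef struct ToyTxn
{
	bool		has_partial_change;	/* true while a tuple is incomplete */
} ToyTxn;

/* Update the flag for one decoded change. */
static void
toy_track_partial(ToyTxn *txn, ToyChangeKind kind)
{
	switch (kind)
	{
		case TOY_TOAST_INSERT:
		case TOY_SPEC_INSERT:
			txn->has_partial_change = true;
			break;
		case TOY_MAIN_INSERT:
		case TOY_SPEC_CONFIRM:
			txn->has_partial_change = false;
			break;
	}
}

/* A transaction may be picked for streaming only when nothing is pending. */
bool
toy_streamable_after(const ToyChangeKind *changes, size_t nchanges)
{
	ToyTxn		txn = {false};

	for (size_t i = 0; i < nchanges; i++)
		toy_track_partial(&txn, changes[i]);

	return !txn.has_partial_change;
}
```

In this model a transaction ending in an unconfirmed speculative insert is never streamable, and it only becomes streamable after a change that would also have reset the toast hash.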

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#21

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#20)

Re: Decoding speculative insert with toast leaks memory

On Mon, May 31, 2021 at 8:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, May 31, 2021 at 6:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I missed doing the test for streaming. I will do that tomorrow and reply back.

For streaming transactions this issue is not there. The problem
occurs only if the last change is a *SPEC INSERT* with no UPDATE/INSERT
change after it, because that is the change on which we reset the
toast hash. Now, if the transaction has only a *SPEC INSERT* without a
SPEC CONFIRM or any other INSERT/UPDATE, then we will not stream it.
Only if we get a subsequent INSERT/UPDATE can we select it for
streaming, but in that case the toast hash will be reset. So as of
today we don't have this problem in streaming mode.

What if the next change is a different SPEC_INSERT
(REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT)? I think in that case we
will stream but won't free the toast memory.

--
With Regards,
Amit Kapila.

#22

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#21)

Re: Decoding speculative insert with toast leaks memory

On Tue, Jun 1, 2021 at 9:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

What if the next change is a different SPEC_INSERT
(REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT)? I think in that case we
will stream but won't free the toast memory.

But if the next change is again a SPEC INSERT, then we will keep the
PARTIAL change flag set and will not select this transaction for
streaming, right?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#23

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#22)

Re: Decoding speculative insert with toast leaks memory

On Tue, Jun 1, 2021 at 9:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

But if the next change is again the SPEC INSERT then we will keep the
PARTIAL change flag set and we will not select this transaction for
stream right?

Right, I think you can remove the change related to stream xact and
probably write some comments on why we don't need it for streamed
transactions. But, now I have another question related to fixing the
non-streamed case. What if after the missing spec_confirm we get the
delete operation in the transaction? It seems
ReorderBufferToastReplace always expects Insert/Update if we have
toast hash active in the transaction.

--
With Regards,
Amit Kapila.

#24

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#23)

Re: Decoding speculative insert with toast leaks memory

On Tue, Jun 1, 2021 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Right, I think you can remove the change related to stream xact and
probably write some comments on why we don't need it for streamed
transactions. But, now I have another question related to fixing the
non-streamed case. What if after the missing spec_confirm we get the
delete operation in the transaction? It seems
ReorderBufferToastReplace always expects Insert/Update if we have
toast hash active in the transaction.

Yeah, that looks like a problem, I will test this case.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#25

dilipbalaut@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#24)

Re: Decoding speculative insert with toast leaks memory

On Tue, Jun 1, 2021 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 1, 2021 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Right, I think you can remove the change related to stream xact and
probably write some comments on why we don't need it for streamed
transactions. But, now I have another question related to fixing the
non-streamed case. What if after the missing spec_confirm we get the
delete operation in the transaction? It seems
ReorderBufferToastReplace always expects Insert/Update if we have
toast hash active in the transaction.

Yeah, that looks like a problem, I will test this case.

I was able to hit that Assert after a slight modification to the
original test case: I added an extra delete in the spec abort
transaction and got this assertion failure.

#0 0x00007f7b8cc3a387 in raise () from /lib64/libc.so.6
#1 0x00007f7b8cc3ba78 in abort () from /lib64/libc.so.6
#2 0x0000000000b027c7 in ExceptionalCondition (conditionName=0xcc11df
"change->data.tp.newtuple", errorType=0xcc0244 "FailedAssertion",
fileName=0xcc0290 "reorderbuffer.c", lineNumber=4601) at assert.c:69
#3 0x00000000008dfeaf in ReorderBufferToastReplace (rb=0x1a73e40,
txn=0x1b5d6e8, relation=0x7f7b8dab4d78, change=0x1b5fb68) at
reorderbuffer.c:4601
#4 0x00000000008db769 in ReorderBufferProcessTXN (rb=0x1a73e40,
txn=0x1b5d6e8, commit_lsn=24329048, snapshot_now=0x1b4b8d0,
command_id=0, streaming=false)
at reorderbuffer.c:2187
#5 0x00000000008dc1df in ReorderBufferReplay (txn=0x1b5d6e8,
rb=0x1a73e40, xid=748, commit_lsn=24329048, end_lsn=24329096,
commit_time=675842700629597,
origin_id=0, origin_lsn=0) at reorderbuffer.c:2601
#6 0x00000000008dc267 in ReorderBufferCommit (rb=0x1a73e40, xid=748,
commit_lsn=24329048, end_lsn=24329096, commit_time=675842700629597,
origin_id=0, origin_lsn=0)
at reorderbuffer.c:2625
#7 0x00000000008cc144 in DecodeCommit (ctx=0x1b319b0,
buf=0x7ffdf15fb0a0, parsed=0x7ffdf15faf00, xid=748, two_phase=false)
at decode.c:744

IMHO, as I stated earlier one way to fix this problem is that we add
the spec abort operation (DELETE + XLH_DELETE_IS_SUPER flag) to the
queue, maybe with action name
"REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT" and as part of processing
that just cleans up the toast and specinsert tuple and nothing else.
If we think that makes sense then I can work on that patch?
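
To make the proposal concrete, here is a minimal, self-contained model of the replay loop. It is a sketch under stated assumptions, not PostgreSQL code: ReplayKind, ReplayState and replay() are hypothetical names, an integer counter stands in for txn->toast_hash, and the queue_spec_abort switch models whether the super-deletion is queued as a change or dropped during decoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical change kinds for this sketch only. */
typedef enum
{
    RC_TOAST_INSERT,
    RC_SPEC_INSERT,
    RC_SPEC_CONFIRM,
    RC_SPEC_ABORT,
    RC_INSERT,
    RC_DELETE
} ReplayKind;

typedef struct
{
    int  toast_chunks;      /* stands in for txn->toast_hash */
    bool specinsert_held;   /* stands in for the pending specinsert change */
    int  stale_toast_hits;  /* DELETEs that met leftover toast data */
} ReplayState;

/* Replay a list of changes and report the final state of the model. */
static ReplayState
replay(const ReplayKind *changes, size_t n, bool queue_spec_abort)
{
    ReplayState st = {0, false, 0};

    assert(changes != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        switch (changes[i])
        {
            case RC_TOAST_INSERT:
                st.toast_chunks++;
                break;
            case RC_SPEC_INSERT:
                st.specinsert_held = true;
                break;
            case RC_SPEC_CONFIRM:
                /* confirmed: the tuple is emitted and toast is reset */
                st.toast_chunks = 0;
                st.specinsert_held = false;
                break;
            case RC_SPEC_ABORT:
                /* if the abort is not queued, replay never sees it */
                if (queue_spec_abort && st.specinsert_held)
                {
                    st.toast_chunks = 0;
                    st.specinsert_held = false;
                }
                break;
            case RC_INSERT:
                /* a regular insert consumes and resets toast data */
                st.toast_chunks = 0;
                break;
            case RC_DELETE:
                /* a DELETE has no new tuple; leftover toast is a bug */
                if (st.toast_chunks > 0)
                    st.stale_toast_hits++;
                break;
        }
    }
    return st;
}
```

Replaying toast insert, spec insert, spec abort, delete with queue_spec_abort set to false leaves toast data behind and the DELETE runs into it, which corresponds to the assertion in the backtrace above. With queue_spec_abort set to true, the abort clears the toast data before the DELETE is processed.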

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#26

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#25)

Re: Decoding speculative insert with toast leaks memory

On Tue, Jun 1, 2021 at 11:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 1, 2021 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 1, 2021 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Right, I think you can remove the change related to stream xact and
probably write some comments on why we don't need it for streamed
transactions. But, now I have another question related to fixing the
non-streamed case. What if after the missing spec_confirm we get the
delete operation in the transaction? It seems
ReorderBufferToastReplace always expects Insert/Update if we have
toast hash active in the transaction.

Yeah, that looks like a problem, I will test this case.

I am able to hit that Assert after slight modification in the original
test case, basically, I added an extra delete in the spec abort
transaction and I got this assertion.

Can we try with other Insert/Update after spec abort to check if there
can be other problems due to active toast_hash?

IMHO, as I stated earlier one way to fix this problem is that we add
the spec abort operation (DELETE + XLH_DELETE_IS_SUPER flag) to the
queue, maybe with action name
"REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT" and as part of processing
that just cleans up the toast and specinsert tuple and nothing else.
If we think that makes sense then I can work on that patch?

I think this should solve the problem but let's first try to see if we
have any other problems. Because, say, if we don't have any other
problem, then maybe removing Assert might work but I guess we still
need to process the tuple to find that we don't need to assemble toast
for it which again seems like overkill.

--
With Regards,
Amit Kapila.

#27

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#26)

Re: Decoding speculative insert with toast leaks memory

On Tue, Jun 1, 2021 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

IMHO, as I stated earlier one way to fix this problem is that we add
the spec abort operation (DELETE + XLH_DELETE_IS_SUPER flag) to the
queue, maybe with action name
"REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT" and as part of processing
that just cleans up the toast and specinsert tuple and nothing else.
If we think that makes sense then I can work on that patch?

I think this should solve the problem but let's first try to see if we
have any other problems. Because, say, if we don't have any other
problem, then maybe removing Assert might work but I guess we still
need to process the tuple to find that we don't need to assemble toast
for it which again seems like overkill.

Yeah, other operations will also fail. Basically, if txn->toast_hash is
not NULL, we assume that we need to assemble the tuple using toast.
But if the next insert is into another relation that doesn't have a
toast table, it will fail with the error below. Otherwise, if it is
the same relation, the toast chunks of the previous tuple will be used
to construct the new tuple. I think we must clean up the toast hash
before processing the next tuple, so I think we can go with the
solution I mentioned above.

static void
ReorderBufferToastReplace
{
...
toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
if (!RelationIsValid(toast_rel))
elog(ERROR, "could not open relation with OID %u",
relation->rd_rel->reltoastrelid);

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#28

dilipbalaut@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#27)

1 attachment(s)

Re: Decoding speculative insert with toast leaks memory

On Tue, Jun 1, 2021 at 5:22 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 1, 2021 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

IMHO, as I stated earlier one way to fix this problem is that we add
the spec abort operation (DELETE + XLH_DELETE_IS_SUPER flag) to the
queue, maybe with action name
"REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT" and as part of processing
that just cleans up the toast and specinsert tuple and nothing else.
If we think that makes sense then I can work on that patch?

I think this should solve the problem but let's first try to see if we
have any other problems. Because, say, if we don't have any other
problem, then maybe removing Assert might work but I guess we still
need to process the tuple to find that we don't need to assemble toast
for it which again seems like overkill.

Yeah, other operation will also fail, basically, if txn->toast_hash is
not NULL then we assume that we need to assemble the tuple using
toast, but if there is next insert in another relation and if that
relation doesn't have a toast table then it will fail with below
error. And otherwise also, if it is the same relation, then the toast
chunks of previous tuple will be used for constructing this new tuple.
I think we must have to clean the toast before processing the next
tuple so I think we can go with the solution I mentioned above.

static void
ReorderBufferToastReplace
{
...
toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
if (!RelationIsValid(toast_rel))
elog(ERROR, "could not open relation with OID %u",
relation->rd_rel->reltoastrelid);

The attached patch fixes this by queuing the spec abort change and
cleaning up the toast hash on spec abort. Currently, the patch queues
all spec abort changes. As an optimization, we could avoid queuing the
spec abort for toast tables, but for that we would need to log a flag
in WAL indicating that this XLH_DELETE_IS_SUPER is for a toast
relation.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v2-0001-Bug-fix-for-speculative-abort.patch (text/x-patch; charset=US-ASCII)

From f607e140cae21183fea2fc226029d7673478509e Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Tue, 1 Jun 2021 19:53:47 +0530
Subject: [PATCH v2] Bug fix for speculative abort

If a speculative insert has a toast table insert and that tuple is
not confirmed, the toast hash is not cleaned up, which creates
various problems such as a) a memory leak and b) the next insert
using the stale toast data for its insertion, along with other
errors and assertion failures.  So this patch handles that by
queuing the spec abort changes and cleaning up the toast hash
on spec abort.  Currently, in this patch we are queuing up all
the spec abort changes, but as an optimization we can avoid
queuing the spec abort for toast tables but for that we need to
log that as a flag in WAL.
---
 src/backend/replication/logical/decode.c        | 16 +++++++-----
 src/backend/replication/logical/reorderbuffer.c | 34 +++++++++++++++++++++++++
 src/include/replication/reorderbuffer.h         |  1 +
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7067016..1a0d7dc 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -1040,19 +1040,21 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
+	/* output plugin doesn't look for this origin, no need to queue */
+	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
+		return;
+
+	change = ReorderBufferGetChange(ctx->reorder);
+
 	/*
 	 * Super deletions are irrelevant for logical decoding, it's driven by the
 	 * confirmation records.
 	 */
 	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
-	/* output plugin doesn't look for this origin, no need to queue */
-	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
-		return;
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
 
-	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2d9e127..4940cf5 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -520,6 +520,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change,
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -2254,6 +2255,36 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived.  So cleanup the
+					 * specinsert tuple and toast hash.  If spec insert change
+					 * is NULL then do nothing, this is possible because we
+					 * have spec abort for each toast entry.  So we just have
+					 * to clean the specinsert and toast hash for the first
+					 * spec abort for the main table and for the remaining
+					 * entries we can just ignore.
+					 *
+					 * XXX For optimization, we may log a flag saying this is
+					 * a spec abort for the toast table and we can avoid queuing
+					 * that change.
+					 */
+					if (specinsert != NULL)
+					{
+						/* Clear the toast chunk */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/*
+						* If the speculative insertion was aborted, the record
+						* isn't needed anymore.
+						*/
+						ReorderBufferReturnChange(rb, specinsert, true);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -3754,6 +3785,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4017,6 +4049,7 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4315,6 +4348,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 0c6e9d1..9ff0986 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1

#29

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#28)

Re: Decoding speculative insert with toast leaks memory

On Tue, Jun 1, 2021 at 8:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

The attached patch fixes by queuing the spec abort change and cleaning
up the toast hash on spec abort. Currently, in this patch I am
queuing up all the spec abort changes, but as an optimization we can
avoid
queuing the spec abort for toast tables but for that we need to log
that as a flag in WAL. that this XLH_DELETE_IS_SUPER is for a toast
relation.

I don't think that is required especially because we intend to
backpatch this, so I would like to keep such optimization for another
day. Few comments:

Comments:
------------
/*
* Super deletions are irrelevant for logical decoding, it's driven by the
* confirmation records.
*/
1. The above comment is not required after your other changes.

/*
* Either speculative insertion was confirmed, or it was
* unsuccessful and the record isn't needed anymore.
*/
if (specinsert != NULL)
2. The above comment needs some adjustment.

/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
* record.
*/
if (specinsert)
{
ReorderBufferReturnChange(rb, specinsert, true);
specinsert = NULL;
}

3. Ideally, we should have an Assert here because we shouldn't reach
this point without cleaning up specinsert. If there is still a chance
of that, then we should mention it in the comments.

4.
+ case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+ /*
+ * Abort for speculative insertion arrived.

I think here we should explain why we can't piggyback cleanup on next
insert/update/delete.

5. Can we write a test case for it? I guess we don't need to use
multiple sessions if the conflicting record is already present.

Please check whether the same patch works on back-branches. I guess
this makes the change a bit tricky as it involves decoding a new
message, but I am not sure if there is a better way.

--
With Regards,
Amit Kapila.

#30

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#27)

Re: Decoding speculative insert with toast leaks memory

On Tue, Jun 1, 2021 at 5:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 1, 2021 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

IMHO, as I stated earlier one way to fix this problem is that we add
the spec abort operation (DELETE + XLH_DELETE_IS_SUPER flag) to the
queue, maybe with action name
"REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT" and as part of processing
that just cleans up the toast and specinsert tuple and nothing else.
If we think that makes sense then I can work on that patch?

I think this should solve the problem but let's first try to see if we
have any other problems. Because, say, if we don't have any other
problem, then maybe removing Assert might work but I guess we still
need to process the tuple to find that we don't need to assemble toast
for it which again seems like overkill.

Yeah, other operation will also fail, basically, if txn->toast_hash is
not NULL then we assume that we need to assemble the tuple using
toast, but if there is next insert in another relation and if that
relation doesn't have a toast table then it will fail with below
error. And otherwise also, if it is the same relation, then the toast
chunks of previous tuple will be used for constructing this new tuple.

I think the same relation case might not create a problem because it
won't find the entry for it in the toast_hash, so it will return from
there but the other two problems will be there. So, one idea could be
to just avoid those two cases (by simply adding return for those
cases) and still we can rely on toast clean up on the next
insert/update/delete. However, your approach looks more natural to me
as that will allow us to clean up memory in all cases instead of
waiting till the transaction end. So, I still vote for going with your
patch's idea of cleaning at spec_abort but I am fine if you and others
decide not to process spec_abort message. What do you think? Tomas, do
you have any opinion on this matter?
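
The alternative of simply returning in the two failing cases can be sketched as below. This is a hedged illustration only: toast_replace(), ChangeModel and the result codes are hypothetical names, and the function models just the checks discussed in this thread rather than the real ReorderBufferToastReplace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    TR_DONE,            /* toast data reassembled into the new tuple */
    TR_SKIPPED,         /* nothing to do, or leftover hash tolerated */
    TR_ASSERT_FAILED,   /* models Assert(change->data.tp.newtuple) */
    TR_ERROR            /* models "could not open relation with OID" */
} ToastReplaceResult;

typedef struct
{
    bool has_newtuple;  /* false for a DELETE */
    bool has_toast_rel; /* false if the relation has no toast table */
} ChangeModel;

/*
 * With tolerate_leftover = false this models the current checks; with
 * true it models returning early when a leftover hash is detected.
 */
static ToastReplaceResult
toast_replace(bool toast_hash_active, const ChangeModel *change,
              bool tolerate_leftover)
{
    assert(change != NULL);

    if (!toast_hash_active)
        return TR_SKIPPED;
    if (!change->has_newtuple)
        return tolerate_leftover ? TR_SKIPPED : TR_ASSERT_FAILED;
    if (!change->has_toast_rel)
        return tolerate_leftover ? TR_SKIPPED : TR_ERROR;
    return TR_DONE;
}
```

Note that in this variant the leftover toast data is not released by the early return itself. It still has to be cleaned up later, on the next insert/update/delete or at the end of the transaction, which is the trade-off described above.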

--
With Regards,
Amit Kapila.

#31

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#30)

2 attachment(s)

Re: Decoding speculative insert with toast leaks memory

On Wed, Jun 2, 2021 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 1, 2021 at 5:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 1, 2021 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

IMHO, as I stated earlier one way to fix this problem is that we add
the spec abort operation (DELETE + XLH_DELETE_IS_SUPER flag) to the
queue, maybe with action name
"REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT" and as part of processing
that just cleans up the toast and specinsert tuple and nothing else.
If we think that makes sense then I can work on that patch?

I think this should solve the problem but let's first try to see if we
have any other problems. Because, say, if we don't have any other
problem, then maybe removing Assert might work but I guess we still
need to process the tuple to find that we don't need to assemble toast
for it which again seems like overkill.

Yeah, other operation will also fail, basically, if txn->toast_hash is
not NULL then we assume that we need to assemble the tuple using
toast, but if there is next insert in another relation and if that
relation doesn't have a toast table then it will fail with below
error. And otherwise also, if it is the same relation, then the toast
chunks of previous tuple will be used for constructing this new tuple.

I think the same relation case might not create a problem because it
won't find the entry for it in the toast_hash, so it will return from
there but the other two problems will be there.

Right

So, one idea could be to just avoid those two cases (by simply adding
return for those
cases) and still we can rely on toast clean up on the next
insert/update/delete. However, your approach looks more natural to me
as that will allow us to clean up memory in all cases instead of
waiting till the transaction end. So, I still vote for going with your
patch's idea of cleaning at spec_abort but I am fine if you and others
decide not to process spec_abort message. What do you think? Tomas, do
you have any opinion on this matter?

I agree that processing the spec abort looks more natural, and ideally
the current code expects the toast hash to be cleaned up after each
change; that's the reason we have those assertions and errors. OTOH, I
agree that we can just return from those conditions because now we
know that with the current code those situations are possible. My vote
is for handling the spec abort (Option1) because that looks like a
more natural way of handling these issues, and we also don't have to
clean up the hash in "ReorderBufferReturnTXN" when no change follows
the spec abort. I am attaching patches for both approaches for
reference.

Once we finalize the approach, I will work on the pending review
comments and also prepare the back-branch patches.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

Option1_v2-0001-Bug-fix-for-speculative-abort.patch (text/x-patch; charset=US-ASCII)

From f607e140cae21183fea2fc226029d7673478509e Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Tue, 1 Jun 2021 19:53:47 +0530
Subject: [PATCH v2] Bug fix for speculative abort

If a speculative insert has a toast table insert and that tuple is
not confirmed, the toast hash is not cleaned up, which creates
various problems such as a) a memory leak and b) the next insert
using the stale toast data for its insertion, along with other
errors and assertion failures.  So this patch handles that by
queuing the spec abort changes and cleaning up the toast hash
on spec abort.  Currently, in this patch we are queuing up all
the spec abort changes, but as an optimization we can avoid
queuing the spec abort for toast tables but for that we need to
log that as a flag in WAL.
---
 src/backend/replication/logical/decode.c        | 16 +++++++-----
 src/backend/replication/logical/reorderbuffer.c | 34 +++++++++++++++++++++++++
 src/include/replication/reorderbuffer.h         |  1 +
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7067016..1a0d7dc 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -1040,19 +1040,21 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
+	/* output plugin doesn't look for this origin, no need to queue */
+	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
+		return;
+
+	change = ReorderBufferGetChange(ctx->reorder);
+
 	/*
 	 * Super deletions are irrelevant for logical decoding, it's driven by the
 	 * confirmation records.
 	 */
 	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
-	/* output plugin doesn't look for this origin, no need to queue */
-	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
-		return;
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
 
-	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2d9e127..4940cf5 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -520,6 +520,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change,
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -2254,6 +2255,36 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived.  So cleanup the
+					 * specinsert tuple and toast hash.  If spec insert change
+					 * is NULL then do nothing, this is possible because we
+					 * have spec abort for each toast entry.  So we just have
+					 * to clean the specinsert and toast hash for the first
+					 * spec abort for the main table and for the remaining
+					 * entries we can just ignore.
+					 *
+					 * XXX For optimization, we may log a flag saying this is
+					 * a spec abort for the toast table and we can avoid queuing
+					 * that change.
+					 */
+					if (specinsert != NULL)
+					{
+						/* Clear the toast chunk */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/*
+						* If the speculative insertion was aborted, the record
+						* isn't needed anymore.
+						*/
+						ReorderBufferReturnChange(rb, specinsert, true);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -3754,6 +3785,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4017,6 +4049,7 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4315,6 +4348,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 0c6e9d1..9ff0986 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1

Option2_v3-0001-Bug-fix-in-case-of-leftover-hash-after-spec-abort.patch (text/x-patch; charset=US-ASCII)

From 68c672ec05b1867e3b170435f929321e43d05020 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Wed, 2 Jun 2021 11:16:34 +0530
Subject: [PATCH v3] Bug fix in case of leftover hash after spec abort

If a toast table insertion for a speculative insert is followed by a
spec abort, the toast hash is not cleaned up immediately.  Some
assertions and errors assumed that this situation could not happen,
so just ignore the leftover hash in those cases.
---
 src/backend/replication/logical/reorderbuffer.c | 32 +++++++++++++++++--------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2d9e127..5c20e13 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -437,6 +437,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->tuplecid_hash = NULL;
 	}
 
+	/* cleanup the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	if (txn->invalidations)
 	{
 		pfree(txn->invalidations);
@@ -4583,6 +4586,25 @@ ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		return;
 
 	/*
+	 * If the current change is not a INSERT or UPDATE then it might be the
+	 * leftover toast hash from previous spec insert without spec confirm.  So
+	 * we can just ignore it.
+	 */
+	if (change->data.tp.newtuple == NULL)
+		return;
+
+	desc = RelationGetDescr(relation);
+
+	/*
+	 * If the current relation doesn't have a toast relation then it might be
+	 * the leftover toast hash from previous spec insert without spec confirm.
+	 * So we can just ignore it.
+	 */
+	toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
+	if (!RelationIsValid(toast_rel))
+		return;
+
+	/*
 	 * We're going to modify the size of the change, so to make sure the
 	 * accounting is correct we'll make it look like we're removing the change
 	 * now (with the old size), and then re-add it at the end.
@@ -4591,16 +4613,6 @@ ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 	oldcontext = MemoryContextSwitchTo(rb->context);
 
-	/* we should only have toast tuples in an INSERT or UPDATE */
-	Assert(change->data.tp.newtuple);
-
-	desc = RelationGetDescr(relation);
-
-	toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
-	if (!RelationIsValid(toast_rel))
-		elog(ERROR, "could not open relation with OID %u",
-			 relation->rd_rel->reltoastrelid);
-
 	toast_desc = RelationGetDescr(toast_rel);
 
 	/* should we allocate from stack instead? */
-- 
1.8.3.1

#32

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#31)

Re: Decoding speculative insert with toast leaks memory

On Wed, Jun 2, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Jun 2, 2021 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 1, 2021 at 5:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 1, 2021 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

IMHO, as I stated earlier one way to fix this problem is that we add
the spec abort operation (DELETE + XLH_DELETE_IS_SUPER flag) to the
queue, maybe with action name
"REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT" and as part of processing
that just cleans up the toast and specinsert tuple and nothing else.
If we think that makes sense then I can work on that patch?
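
As a rough illustration of the idea above, here is a toy model in C of a change-processing loop that treats a queued spec abort purely as a cleanup step. All names (ToyChangeKind, ToyTxnState, toy_process) are hypothetical, and the counters only stand in for txn->toast_hash and the pending specinsert change. It is a sketch of the control flow under those assumptions, not the reorderbuffer.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified change kinds (not the real ReorderBufferChangeType). */
typedef enum
{
	TOY_INSERT,			/* ordinary insert that is sent to the plugin */
	TOY_TOAST_CHUNK,	/* insert into the toast relation */
	TOY_SPEC_INSERT,	/* speculative insert, outcome not yet known */
	TOY_SPEC_CONFIRM,	/* speculative insert succeeded */
	TOY_SPEC_ABORT		/* speculative insert was super-deleted */
} ToyChangeKind;

typedef struct
{
	int			toast_chunks;		/* stands in for txn->toast_hash */
	bool		have_specinsert;	/* stands in for specinsert != NULL */
	int			rows_emitted;		/* rows handed to the output plugin */
} ToyTxnState;

static void
toy_process(ToyTxnState *st, const ToyChangeKind *changes, size_t nchanges)
{
	for (size_t i = 0; i < nchanges; i++)
	{
		switch (changes[i])
		{
			case TOY_TOAST_CHUNK:
				st->toast_chunks++;
				break;
			case TOY_SPEC_INSERT:
				st->have_specinsert = true;
				break;
			case TOY_SPEC_CONFIRM:
				if (st->have_specinsert)
				{
					/* confirmed: emit the row, then drop its toast data */
					st->rows_emitted++;
					st->toast_chunks = 0;
					st->have_specinsert = false;
				}
				break;
			case TOY_SPEC_ABORT:
				/* only the first abort has work to do; later ones are no-ops */
				if (st->have_specinsert)
				{
					st->toast_chunks = 0;
					st->have_specinsert = false;
				}
				break;
			case TOY_INSERT:
				st->rows_emitted++;
				st->toast_chunks = 0;
				break;
		}
	}
}
```

Feeding this loop two toast chunks, a speculative insert, and then several spec aborts leaves no toast data behind and emits nothing, which is the behaviour the proposal is after.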

I think this should solve the problem but let's first try to see if we
have any other problems. Because, say, if we don't have any other
problem, then maybe removing Assert might work but I guess we still
need to process the tuple to find that we don't need to assemble toast
for it which again seems like overkill.

Yeah, other operations will also fail. Basically, if txn->toast_hash is
not NULL then we assume that we need to assemble the tuple using
toast, but if the next insert is in another relation and that
relation doesn't have a toast table, then it will fail with an
error. Otherwise, if it is the same relation, the toast chunks of
the previous tuple will be used for constructing the new tuple.

I think the same relation case might not create a problem because it
won't find the entry for it in the toast_hash, so it will return from
there but the other two problems will be there.
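
The failure cases discussed above can be modelled in a few lines of C. This is only a sketch: ToyChange, ToyToastResult and toy_toast_replace are hypothetical names, and the two early returns mirror the checks in the Option2 patch (no new tuple, or no toast relation) rather than reproducing ReorderBufferToastReplace itself.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	TOY_NOTHING_TO_DO,		/* no toast hash at all */
	TOY_IGNORED_LEFTOVER,	/* hash is stale, left by an unconfirmed spec insert */
	TOY_ASSEMBLED			/* tuple would be rebuilt from toast chunks */
} ToyToastResult;

typedef struct
{
	bool		has_new_tuple;	/* false for e.g. a DELETE */
	bool		rel_has_toast;	/* does the changed relation have a toast table? */
} ToyChange;

static ToyToastResult
toy_toast_replace(bool toast_hash_present, const ToyChange *change)
{
	/* no toast chunks were collected, so there is nothing to assemble */
	if (!toast_hash_present)
		return TOY_NOTHING_TO_DO;

	/* without the guard this is where the assertion on newtuple would fire */
	if (!change->has_new_tuple)
		return TOY_IGNORED_LEFTOVER;

	/* without the guard this is where the "could not open relation" error is raised */
	if (!change->rel_has_toast)
		return TOY_IGNORED_LEFTOVER;

	return TOY_ASSEMBLED;
}
```

The model makes the trade-off visible: the guards avoid the assertion and the error, but the stale hash itself stays around until something else resets it.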

Right

So, one idea could be to just avoid those two cases (by simply adding return for those
cases) and still we can rely on toast clean up on the next
insert/update/delete. However, your approach looks more natural to me
as that will allow us to clean up memory in all cases instead of
waiting till the transaction end. So, I still vote for going with your
patch's idea of cleaning at spec_abort but I am fine if you and others
decide not to process spec_abort message. What do you think? Tomas, do
you have any opinion on this matter?

I agree that processing with spec abort looks more natural and ideally
the current code expects it to be getting cleaned after the change,
that's the reason we have those assertions and errors. OTOH I agree
that we can just return from those conditions because now we know that
with the current code those situations are possible. My vote is with
handling the spec abort option (Option1) because that looks more
natural way of handling these issues and we also don't have to clean
up the hash in "ReorderBufferReturnTXN" if no followup change after
spec abort.

Even if we decide to go with the spec_abort approach, it might be better
to still keep the toastreset call in ReorderBufferReturnTXN so that it
can be freed in case of error.
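
A minimal sketch of why the extra reset in the transaction cleanup path is cheap to keep, assuming the reset is a no-op when there is no hash. The names (ToyTxn, toy_toast_init, toy_toast_reset, toy_return_txn) and the toy_live_hashes counter are hypothetical and exist only to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of toy toast hashes currently allocated; used only to observe leaks. */
static int	toy_live_hashes = 0;

typedef struct
{
	int		   *toast_hash;		/* stands in for txn->toast_hash */
} ToyTxn;

static ToyTxn *
toy_txn_create(void)
{
	ToyTxn	   *txn = calloc(1, sizeof(ToyTxn));

	assert(txn != NULL);
	return txn;
}

/* Allocate the toast hash the first time a toast chunk is seen. */
static void
toy_toast_init(ToyTxn *txn)
{
	if (txn->toast_hash != NULL)
		return;
	txn->toast_hash = malloc(sizeof(int));
	assert(txn->toast_hash != NULL);
	*txn->toast_hash = 0;
	toy_live_hashes++;
}

/* Safe to call any number of times: does nothing if the hash is gone. */
static void
toy_toast_reset(ToyTxn *txn)
{
	if (txn->toast_hash == NULL)
		return;
	free(txn->toast_hash);
	txn->toast_hash = NULL;
	toy_live_hashes--;
}

/* Final cleanup: the reset here acts as a safety net for error paths. */
static void
toy_return_txn(ToyTxn *txn)
{
	toy_toast_reset(txn);
	free(txn);
}
```

If processing stops with an error before the spec abort is reached, toy_return_txn still frees the hash. On the normal path the second reset simply finds nothing to do.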

--
With Regards,
Amit Kapila.

#33

amit.kapila16@gmail.com

over 4 years ago

In reply to: Amit Kapila (#32)

Re: Decoding speculative insert with toast leaks memory

On Wed, Jun 2, 2021 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 2, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Jun 2, 2021 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think the same relation case might not create a problem because it
won't find the entry for it in the toast_hash, so it will return from
there but the other two problems will be there.

Right

So, one idea could be to just avoid those two cases (by simply adding return for those
cases) and still we can rely on toast clean up on the next
insert/update/delete. However, your approach looks more natural to me
as that will allow us to clean up memory in all cases instead of
waiting till the transaction end. So, I still vote for going with your
patch's idea of cleaning at spec_abort but I am fine if you and others
decide not to process spec_abort message. What do you think? Tomas, do
you have any opinion on this matter?

I agree that processing with spec abort looks more natural and ideally
the current code expects it to be getting cleaned after the change,
that's the reason we have those assertions and errors.

Okay, so, let's go with that approach. I have thought about whether it
creates any problem in back-branches but couldn't think of any
primarily because we are not going to send anything additional to
plugin/subscriber. Do you see any problems with back branches if we go
with this approach?
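
The point about the plugin can be sketched as follows, assuming the new change type stays internal to the reorder buffer. The names and the flag bit below are illustrative only; the real flag is XLH_DELETE_IS_SUPER and the real action is REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit only; not taken from the PostgreSQL headers. */
#define TOY_DELETE_IS_SUPER (1 << 3)

typedef enum
{
	TOY_CHANGE_DELETE,
	TOY_CHANGE_INTERNAL_SPEC_ABORT
} ToyAction;

/* Decode step: a super-deletion is queued as an internal spec abort. */
static ToyAction
toy_classify_delete(uint8_t flags)
{
	if (flags & TOY_DELETE_IS_SUPER)
		return TOY_CHANGE_INTERNAL_SPEC_ABORT;
	return TOY_CHANGE_DELETE;
}

/* Apply step: internal changes are consumed locally, never sent downstream. */
static bool
toy_forwarded_to_plugin(ToyAction action)
{
	return action == TOY_CHANGE_DELETE;
}
```

Under this model the output plugin sees exactly the same stream as before, so only the in-memory (and spilled) representation of queued changes differs.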

OTOH I agree
that we can just return from those conditions because now we know that
with the current code those situations are possible. My vote is with
handling the spec abort option (Option1) because that looks more
natural way of handling these issues and we also don't have to clean
up the hash in "ReorderBufferReturnTXN" if no followup change after
spec abort.

Even if we decide to go with the spec_abort approach, it might be better
to still keep the toastreset call in ReorderBufferReturnTXN so that it
can be freed in case of error.

Please take care of this as well.

--
With Regards,
Amit Kapila.

#34

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#33)

Re: Decoding speculative insert with toast leaks memory

On Mon, 7 Jun 2021 at 8:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 2, 2021 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 2, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Jun 2, 2021 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think the same relation case might not create a problem because it
won't find the entry for it in the toast_hash, so it will return from
there but the other two problems will be there.

Right

So, one idea could be to just avoid those two cases (by simply adding
return for those cases) and still we can rely on toast clean up on the
next insert/update/delete. However, your approach looks more natural to
me as that will allow us to clean up memory in all cases instead of
waiting till the transaction end. So, I still vote for going with your
patch's idea of cleaning at spec_abort but I am fine if you and others
decide not to process spec_abort message. What do you think? Tomas, do
you have any opinion on this matter?

I agree that processing with spec abort looks more natural and ideally
the current code expects it to be getting cleaned after the change,
that's the reason we have those assertions and errors.

Okay, so, let's go with that approach. I have thought about whether it
creates any problem in back-branches but couldn't think of any
primarily because we are not going to send anything additional to
plugin/subscriber. Do you see any problems with back branches if we go
with this approach?

I will check this and let you know.

OTOH I agree
that we can just return from those conditions because now we know that
with the current code those situations are possible. My vote is with
handling the spec abort option (Option1) because that looks more
natural way of handling these issues and we also don't have to clean
up the hash in "ReorderBufferReturnTXN" if no followup change after
spec abort.

Even if we decide to go with the spec_abort approach, it might be better
to still keep the toastreset call in ReorderBufferReturnTXN so that it
can be freed in case of error.

Please take care of this as well.

--

Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#35

dilipbalaut@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#34)

1 attachment(s)

Re: Decoding speculative insert with toast leaks memory

On Mon, Jun 7, 2021 at 8:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, 7 Jun 2021 at 8:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 2, 2021 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 2, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Jun 2, 2021 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think the same relation case might not create a problem because it
won't find the entry for it in the toast_hash, so it will return from
there but the other two problems will be there.

Right

So, one idea could be to just avoid those two cases (by simply adding
return for those cases) and still we can rely on toast clean up on the
next insert/update/delete. However, your approach looks more natural to
me as that will allow us to clean up memory in all cases instead of
waiting till the transaction end. So, I still vote for going with your
patch's idea of cleaning at spec_abort but I am fine if you and others
decide not to process spec_abort message. What do you think? Tomas, do
you have any opinion on this matter?

I agree that processing with spec abort looks more natural and ideally
the current code expects it to be getting cleaned after the change,
that's the reason we have those assertions and errors.

Okay, so, let's go with that approach. I have thought about whether it
creates any problem in back-branches but couldn't think of any
primarily because we are not going to send anything additional to
plugin/subscriber. Do you see any problems with back branches if we go
with this approach?

I will check this and let you know.

OTOH I agree
that we can just return from those conditions because now we know that
with the current code those situations are possible. My vote is with
handling the spec abort option (Option1) because that looks more
natural way of handling these issues and we also don't have to clean
up the hash in "ReorderBufferReturnTXN" if no followup change after
spec abort.

Even if we decide to go with the spec_abort approach, it might be better
to still keep the toastreset call in ReorderBufferReturnTXN so that it
can be freed in case of error.

Please take care of this as well.

Ok

I have fixed all pending review comments and also added a test case, which
is working fine. I haven't yet checked on the back branches. Let's
discuss; if we think this patch looks fine, I can apply and test it on the
back branches.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v3-0001-Bug-fix-for-speculative-abort.patch (text/x-patch; charset=US-ASCII)

From 0b9c93398ef108a3d71cbac6f793a0314964aaa2 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Tue, 1 Jun 2021 19:53:47 +0530
Subject: [PATCH v3] Bug fix for speculative abort

If a speculative insert includes a toast table insert and the tuple
is never confirmed, the toast hash is not cleaned up.  That causes
several problems: a) a memory leak, b) the next insert uses the
stale toast data for its own insertion, and c) other errors and
assertion failures.  This patch handles that by queuing the spec
abort changes and cleaning up the toast hash on spec abort.
Currently the patch queues all spec abort changes; as an
optimization we could avoid queuing the spec abort for toast
tables, but that would require logging a flag in WAL.
---
 contrib/test_decoding/Makefile                     |  2 +-
 .../test_decoding/expected/speculative_abort.out   | 64 +++++++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 67 ++++++++++++++++++++++
 src/backend/replication/logical/decode.c           | 14 ++---
 src/backend/replication/logical/reorderbuffer.c    | 43 +++++++++++++-
 src/include/replication/reorderbuffer.h            |  1 +
 6 files changed, 180 insertions(+), 11 deletions(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b..1cab935 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -8,7 +8,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
 	spill slot truncate stream stats twophase twophase_stream
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
-	twophase_snapshot
+	twophase_snapshot speculative_abort
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..d672deb
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,64 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s1_session s1_lock_s2 s1_lock_s3 s1_begin s1_insert_tbl1 s2_session s2_begin s2_insert_tbl1 s3_session s3_begin s3_insert_tbl1 s1_unlock_s2 s1_unlock_s3 s1_lock_s2 s1_abort s3_commit s1_unlock_s2 s2_insert_tbl2 s2_commit s1_get_changes
+data           
+
+step s1_session: SET spec.session = 1;
+step s1_lock_s2: SELECT pg_advisory_lock(2);
+pg_advisory_lock
+
+               
+step s1_lock_s3: SELECT pg_advisory_lock(2);
+pg_advisory_lock
+
+               
+step s1_begin: BEGIN;
+step s1_insert_tbl1: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING;
+step s2_session: SET spec.session = 2;
+step s2_begin: BEGIN;
+s2: NOTICE:  2acquiring advisory lock on 2
+step s2_insert_tbl1: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step s3_session: SET spec.session = 3;
+step s3_begin: BEGIN;
+s3: NOTICE:  3acquiring advisory lock on 3
+s3: NOTICE:  3acquiring advisory lock on 3
+step s3_insert_tbl1: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step s1_unlock_s2: SELECT pg_advisory_unlock(2);
+pg_advisory_unlock
+
+t              
+step s1_unlock_s3: SELECT pg_advisory_unlock(2);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  2acquiring advisory lock on 2
+step s1_lock_s2: SELECT pg_advisory_lock(2);
+pg_advisory_lock
+
+               
+step s1_abort: ROLLBACK;
+s2: NOTICE:  2acquiring advisory lock on 2
+s3: NOTICE:  3acquiring advisory lock on 3
+step s3_insert_tbl1: <... completed>
+step s3_commit: COMMIT;
+step s1_unlock_s2: SELECT pg_advisory_unlock(2);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  2acquiring advisory lock on 2
+s2: NOTICE:  2acquiring advisory lock on 2
+step s2_insert_tbl1: <... completed>
+step s2_insert_tbl2: INSERT INTO tbl2 VALUES(1);
+step s2_commit: COMMIT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..0e2ab29
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,67 @@
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+	CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+	BEGIN
+	-- depending on lock state, wait for lock 2 or 3
+	IF current_setting('spec.session')::int =  2 THEN
+		RAISE NOTICE '2acquiring advisory lock on 2';
+		PERFORM pg_advisory_lock(2);
+		PERFORM pg_advisory_unlock(2);
+	ELSIF current_setting('spec.session')::int =  3 THEN
+		RAISE NOTICE '3acquiring advisory lock on 3';
+		PERFORM pg_advisory_lock(3);
+		PERFORM pg_advisory_unlock(3);
+	END IF;
+	RETURN $1;
+	END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "s1"
+setup { SET synchronous_commit=on; }
+
+step "s1_lock_s2" { SELECT pg_advisory_lock(2); }
+step "s1_lock_s3" { SELECT pg_advisory_lock(2); }
+step "s1_session" { SET spec.session = 1; }
+step "s1_begin" { BEGIN; }
+step "s1_insert_tbl1" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_abort" { ROLLBACK; }
+step "s1_unlock_s2" { SELECT pg_advisory_unlock(2); }
+step "s1_unlock_s3" { SELECT pg_advisory_unlock(2); }
+step "s1_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s2"
+setup { SET synchronous_commit=on; }
+
+step "s2_session" { SET spec.session = 2; }
+step "s2_begin" { BEGIN; }
+step "s2_insert_tbl1" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s2_insert_tbl2" { INSERT INTO tbl2 VALUES(1); }
+step "s2_commit" { COMMIT; }
+
+session "s3"
+setup { SET synchronous_commit=on; }
+
+step "s3_session" { SET spec.session = 3; }
+step "s3_begin" { BEGIN; }
+step "s3_insert_tbl1" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s3_commit" { COMMIT; }
+
+
+permutation "s1_session" "s1_lock_s2" "s1_lock_s3" "s1_begin" "s1_insert_tbl1" "s2_session" "s2_begin" "s2_insert_tbl1" "s3_session" "s3_begin" "s3_insert_tbl1" "s1_unlock_s2" "s1_unlock_s3" "s1_lock_s2" "s1_abort" "s3_commit" "s1_unlock_s2" "s2_insert_tbl2" "s2_commit" "s1_get_changes"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7067016..453efc5 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -1040,19 +1040,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2d9e127..dd95785 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -443,6 +443,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -520,6 +523,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change,
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -2211,8 +2215,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -2254,6 +2258,38 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived.  So cleanup the
+					 * specinsert tuple and toast hash.  If spec insert change
+					 * is NULL then do nothing, this is possible because we
+					 * have spec abort for each toast entry.  So we just have
+					 * to clean the specinsert and toast hash for the first
+					 * spec abort for the main table and remaining changes for
+					 * the tables can be ignored.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * Clear the toast hash, we must clean the toast hash
+						 * before we start with a completely new tuple,
+						 * otherwise, while processing the new tuple it would
+						 * create a confusion that whether we need to process
+						 * these toast chunks or not.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/*
+						* If the speculative insertion was aborted, the record
+						* isn't needed anymore.
+						*/
+						ReorderBufferReturnChange(rb, specinsert, true);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -3754,6 +3790,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4017,6 +4054,7 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4315,6 +4353,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 0c6e9d1..9ff0986 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1

#36

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#35)

Re: Decoding speculative insert with toast leaks memory

On Mon, Jun 7, 2021 at 6:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have fixed all pending review comments and also added a test case which is working fine.

Few observations and questions on testcase:
1.
+step "s1_lock_s2" { SELECT pg_advisory_lock(2); }
+step "s1_lock_s3" { SELECT pg_advisory_lock(2); }
+step "s1_session" { SET spec.session = 1; }
+step "s1_begin" { BEGIN; }
+step "s1_insert_tbl1" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000))
ON CONFLICT DO NOTHING; }
+step "s1_abort" { ROLLBACK; }
+step "s1_unlock_s2" { SELECT pg_advisory_unlock(2); }
+step "s1_unlock_s3" { SELECT pg_advisory_unlock(2); }

Here, s1_lock_s3 and s1_unlock_s3 use 2 as the identifier. Don't you need
to use 3 in that part of the test?

2. In the test, there seems to be an assumption that we can unlock s2
and s3 one after another, and then both will start waiting on s1, but
isn't it possible that before s2 starts waiting on s1, s3 completes its
insertion and then s2 will never proceed with the speculative insertion?

I haven't yet checked on the back branches. Let's discuss; if we think this patch looks fine, I can apply and test it on the back branches.

Sure, that makes sense.

--
With Regards,
Amit Kapila.

#37

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#36)

Re: Decoding speculative insert with toast leaks memory

On Mon, Jun 7, 2021 at 6:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 7, 2021 at 6:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have fixed all pending review comments and also added a test case
which is working fine.

Few observations and questions on testcase:
1.
+step "s1_lock_s2" { SELECT pg_advisory_lock(2); }
+step "s1_lock_s3" { SELECT pg_advisory_lock(2); }
+step "s1_session" { SET spec.session = 1; }
+step "s1_begin" { BEGIN; }
+step "s1_insert_tbl1" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000))
ON CONFLICT DO NOTHING; }
+step "s1_abort" { ROLLBACK; }
+step "s1_unlock_s2" { SELECT pg_advisory_unlock(2); }
+step "s1_unlock_s3" { SELECT pg_advisory_unlock(2); }
Here, s1_lock_s3 and s1_unlock_s3 use 2 as the identifier. Don't you need
to use 3 in that part of the test?

Yeah this should be 3.
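
That is, the two s3 steps would presumably take 3 as the lock key. A sketch of the corrected steps, derived only from the steps quoted above and not taken from any later patch version:

```
step "s1_lock_s3" { SELECT pg_advisory_lock(3); }
step "s1_unlock_s3" { SELECT pg_advisory_unlock(3); }
```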

2. In the test, there seems to be an assumption that we can unlock s2
and s3 one after another, and then both will start waiting on s1. But
isn't it possible that before s2 starts waiting on s1, s3 completes its
insertion, and then s2 will never proceed to speculative insertion?

I agree, such race conditions are possible. Currently, I can't think of
what we can do here, but I will think more about this.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#38

dilipbalaut@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#37)

2 attachment(s)

Re: Decoding speculative insert with toast leaks memory

On Mon, Jun 7, 2021 at 6:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

2. In the test, there seems to be an assumption that we can unlock s2
and s3 one after another, and then both will start waiting on s1. But
isn't it possible that before s2 starts waiting on s1, s3 completes its
insertion, and then s2 will never proceed to speculative insertion?

I agree, such race conditions are possible. Currently, I can't think of what we can do here, but I will think more about this.

Based on the off-list discussion, I have modified the test along the
lines of "isolation/specs/insert-conflict-specconflict.spec". The other
open point was the race condition: how do we ensure that a session makes
progress once we unlock it? IMHO the isolation tester is designed so
that each waiting session either returns with its output or blocks again
on a heavyweight lock before the next step is performed.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v4-0001-Bug-fix-for-speculative-abort.patch (text/x-patch; charset=US-ASCII)

From dcea4c36267ad2dc58dd0a57733a6f6276e2d754 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Tue, 8 Jun 2021 17:06:39 +0530
Subject: [PATCH v4 1/2] Bug fix for speculative abort

If a speculative insert includes a toast table insert and the tuple
is never confirmed, the toast hash is not cleaned up.  This causes
various problems: a) a memory leak, and b) the next insert uses the
stale toast data for its own insertion, leading to other errors and
assertion failures.  This patch handles that by queuing the spec
abort changes and cleaning up the toast hash on spec abort.
Currently, the patch queues all spec abort changes; as an
optimization we could avoid queuing spec aborts for toast tables,
but for that we would need to log a flag in WAL.
---
 src/backend/replication/logical/decode.c        | 14 ++++----
 src/backend/replication/logical/reorderbuffer.c | 43 +++++++++++++++++++++++--
 src/include/replication/reorderbuffer.h         |  1 +
 3 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7067016..453efc5 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -1040,19 +1040,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2d9e127..dd95785 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -443,6 +443,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -520,6 +523,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change,
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -2211,8 +2215,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -2254,6 +2258,38 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived.  So cleanup the
+					 * specinsert tuple and toast hash.  If spec insert change
+					 * is NULL then do nothing, this is possible because we
+					 * have spec abort for each toast entry.  So we just have
+					 * to clean the specinsert and toast hash for the first
+					 * spec abort for the main table and remaining changes for
+					 * the tables can be ignored.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * Clear the toast hash, we must clean the toast hash
+						 * before we start with a completely new tuple,
+						 * otherwise, while processing the new tuple it would
+						 * create a confusion that whether we need to process
+						 * these toast chunks or not.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/*
+						* If the speculative insertion was aborted, the record
+						* isn't needed anymore.
+						*/
+						ReorderBufferReturnChange(rb, specinsert, true);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -3754,6 +3790,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4017,6 +4054,7 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4315,6 +4353,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 0c6e9d1..9ff0986 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1
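
A toy model may help when reading the new SPEC_ABORT case in ReorderBufferProcessTXN above. This is a minimal sketch, not PostgreSQL code: the enum, the DecodeState struct, and both functions are hypothetical stand-ins, and the toast hash is reduced to a counter. It assumes the change order described in the patch comment: toast chunks first, then the speculative insert, then one abort for the main table followed by one per toast entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified change kinds; not the real ReorderBufferChangeType. */
typedef enum
{
	CHANGE_INSERT,			/* plain insert into a main table */
	CHANGE_TOAST_CHUNK,		/* insert into a toast table */
	CHANGE_SPEC_INSERT,		/* speculative insert, outcome unknown */
	CHANGE_SPEC_CONFIRM,	/* speculative insert confirmed */
	CHANGE_SPEC_ABORT		/* speculative insert super-deleted */
} ChangeKind;

typedef struct
{
	int			pending_chunks;		/* stands in for txn->toast_hash */
	int			have_specinsert;	/* stands in for specinsert != NULL */
	int			emitted_rows;		/* rows passed to the output plugin */
	int			last_row_chunks;	/* toast chunks attached to the last row */
} DecodeState;

/* Emit one row; it consumes whatever toast chunks are pending. */
static void
emit_row(DecodeState *st)
{
	st->last_row_chunks = st->pending_chunks;
	st->pending_chunks = 0;
	st->emitted_rows++;
}

/*
 * Replay a change stream.  With handle_spec_abort == 0 the abort records
 * are skipped, modeling the old early return in DecodeDelete().
 */
static void
process_changes(DecodeState *st, const ChangeKind *changes, size_t n,
				int handle_spec_abort)
{
	assert(st != NULL && changes != NULL);

	for (size_t i = 0; i < n; i++)
	{
		switch (changes[i])
		{
			case CHANGE_TOAST_CHUNK:
				st->pending_chunks++;
				break;
			case CHANGE_SPEC_INSERT:
				st->have_specinsert = 1;
				break;
			case CHANGE_SPEC_CONFIRM:
				if (st->have_specinsert)
				{
					emit_row(st);
					st->have_specinsert = 0;
				}
				break;
			case CHANGE_SPEC_ABORT:
				/* Only the first abort (main table) finds a specinsert. */
				if (handle_spec_abort && st->have_specinsert)
				{
					st->pending_chunks = 0;
					st->have_specinsert = 0;
				}
				break;
			case CHANGE_INSERT:
				st->have_specinsert = 0;	/* unconfirmed record dropped */
				emit_row(st);
				break;
		}
	}
}
```

Replaying two toast chunks, a speculative insert, three aborts, and a plain insert with handle_spec_abort set to 1 leaves the final row with no toast chunks attached. With 0, the two stale chunks end up on that row, which corresponds to problem b) in the commit message; if the stream ends without the trailing insert, they simply stay pending, which corresponds to the leak.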

v4-0002-test-case.patch (text/x-patch; charset=US-ASCII)

From 8b5a46a99489c5c1cacbaf6047440c4689727b9c Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Tue, 8 Jun 2021 17:11:24 +0530
Subject: [PATCH v4 2/2] test case

---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 +++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 105 +++++++++++++++++++++
 3 files changed, 191 insertions(+), 1 deletion(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b..1cab935 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -8,7 +8,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
 	spill slot truncate stream stats twophase twophase_stream
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
-	twophase_snapshot
+	twophase_snapshot speculative_abort
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..79ee67c
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock_123() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock_123() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock_123() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock_123() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock_123() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock_123() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..01b6be5
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,105 @@
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock_123(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+     BEGIN
+        RAISE NOTICE 'blurt_and_lock_123() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock_123(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+#permutation "s1_session" "s1_lock_s2" "s1_lock_s3" "s1_begin" "s1_insert_tbl1" "s2_session" "s2_begin" "s2_insert_tbl1" "s3_session" "s3_begin" "s3_insert_tbl1" "s1_unlock_s2" "s1_unlock_s3" "s1_lock_s2" "s1_abort" "s3_commit" "s1_unlock_s2" "s2_insert_tbl2" "s2_commit" "s1_get_changes"
+
+# Test that speculative locks are correctly acquired and released, s2
+# inserts, s1 updates.
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock_123 function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
-- 
1.8.3.1

#39

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#38)

Re: Decoding speculative insert with toast leaks memory

On Tue, Jun 8, 2021 at 5:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Based on the off-list discussion, I have modified the test along the
lines of "isolation/specs/insert-conflict-specconflict.spec". The other
open point was the race condition: how do we ensure that a session makes
progress once we unlock it? IMHO the isolation tester is designed so
that each waiting session either returns with its output or blocks again
on a heavyweight lock before the next step is performed.

Few comments:
1. The test has a lot of similarities and test duplication with what
we are doing in insert-conflict-specconflict.spec. Can we move it to
insert-conflict-specconflict.spec? I understand that having it in
test_decoding has the advantage that we can have all decoding tests in
one place but OTOH, we can avoid a lot of test-code duplication if we
add it in insert-conflict-specconflict.spec.

2.
+#permutation "s1_session" "s1_lock_s2" "s1_lock_s3" "s1_begin"
"s1_insert_tbl1" "s2_session" "s2_begin" "s2_insert_tbl1" "s3_session"
"s3_begin" "s3_insert_tbl1" "s1_unlock_s2" "s1_unlock_s3" "s1_lock_s2"
"s1_abort" "s3_commit" "s1_unlock_s2" "s2_insert_tbl2" "s2_commit"
"s1_get_changes"

This permutation does not match what we are actually doing.

3.
+# Test that speculative locks are correctly acquired and released, s2
+# inserts, s1 updates.

This test description doesn't seem to be correct. Can we change it to
something like: "Test logical decoding of speculative aborts for toast
insertion followed by insertion into a different table which doesn't
have a toast"?

Also, let's prepare and test the patches for back-branches. It would
be better if you can prepare separate patches for code and test-case
for each branch then I can merge them before commit. This helps with
testing on back-branches.

--
With Regards,
Amit Kapila.

#40

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#39)

Re: Decoding speculative insert with toast leaks memory

On Wed, Jun 9, 2021 at 11:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 8, 2021 at 5:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Based on the off-list discussion, I have modified the test along the
lines of "isolation/specs/insert-conflict-specconflict.spec". The other
open point was the race condition: how do we ensure that a session makes
progress once we unlock it? IMHO the isolation tester is designed so
that each waiting session either returns with its output or blocks again
on a heavyweight lock before the next step is performed.

Few comments:
1. The test has a lot of similarities and test duplication with what
we are doing in insert-conflict-specconflict.spec. Can we move it to
insert-conflict-specconflict.spec? I understand that having it in
test_decoding has the advantage that we can have all decoding tests in
one place but OTOH, we can avoid a lot of test-code duplication if we
add it in insert-conflict-specconflict.spec.

It seems the isolation tests run with the default configuration. Would it
be a good idea to change wal_level to logical for the whole isolation
tester folder?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#41

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#40)

Re: Decoding speculative insert with toast leaks memory

On Wed, Jun 9, 2021 at 4:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Jun 9, 2021 at 11:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 8, 2021 at 5:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Based on the off-list discussion, I have modified the test along the
lines of "isolation/specs/insert-conflict-specconflict.spec". The other
open point was the race condition: how do we ensure that a session makes
progress once we unlock it? IMHO the isolation tester is designed so
that each waiting session either returns with its output or blocks again
on a heavyweight lock before the next step is performed.

Few comments:
1. The test has a lot of similarities and test duplication with what
we are doing in insert-conflict-specconflict.spec. Can we move it to
insert-conflict-specconflict.spec? I understand that having it in
test_decoding has the advantage that we can have all decoding tests in
one place but OTOH, we can avoid a lot of test-code duplication if we
add it in insert-conflict-specconflict.spec.

It seems the isolation tests run with the default configuration. Would it be a good idea to change wal_level to logical for the whole isolation tester folder?

No, that doesn't sound like a good idea to me. Let's keep it in
test_decoding then.

--
With Regards,
Amit Kapila.

#42

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#41)

Re: Decoding speculative insert with toast leaks memory

On Wed, Jun 9, 2021 at 4:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 9, 2021 at 4:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Few comments:
1. The test has a lot of similarities and test duplication with what
we are doing in insert-conflict-specconflict.spec. Can we move it to
insert-conflict-specconflict.spec? I understand that having it in
test_decoding has the advantage that we can have all decoding tests in
one place but OTOH, we can avoid a lot of test-code duplication if we
add it in insert-conflict-specconflict.spec.

It seems the isolation tests run with the default configuration. Would it
be a good idea to change wal_level to logical for the whole isolation
tester folder?

No, that doesn't sound like a good idea to me. Let's keep it in
test_decoding then.

Okay, I will work on the remaining comments and the back-branch patches and
send them by tomorrow.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#43

Alvaro Herrera

alvherre@alvh.no-ip.org

over 4 years ago

In reply to: Dilip Kumar (#42)

Re: Decoding speculative insert with toast leaks memory

May I suggest using a different name for the blurt_and_lock_123()
function, so that it doesn't conflict with the one in
insert-conflict-specconflict? Thanks

--
Álvaro Herrera 39°49'30"S 73°17'W

#44

dilipbalaut@gmail.com

over 4 years ago

In reply to: Alvaro Herrera (#43)

10 attachment(s)

Re: Decoding speculative insert with toast leaks memory

On Wed, Jun 9, 2021 at 8:59 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

May I suggest using a different name for the blurt_and_lock_123()
function, so that it doesn't conflict with the one in
insert-conflict-specconflict? Thanks

Renamed to blurt_and_lock(), is that fine?

I have fixed the other comments and also prepared patches for the back branches.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v5-0001-Bug-fix-for-speculative-abort_HEAD.patch (text/x-patch; charset=US-ASCII)

From dcea4c36267ad2dc58dd0a57733a6f6276e2d754 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Tue, 8 Jun 2021 17:06:39 +0530
Subject: [PATCH v5 1/2] Bug fix for speculative abort

If a speculative insert includes a toast table insert and the tuple
is never confirmed, the toast hash is not cleaned up.  This causes
various problems: a) a memory leak, and b) the next insert uses the
stale toast data for its own insertion, leading to other errors and
assertion failures.  This patch handles that by queuing the spec
abort changes and cleaning up the toast hash on spec abort.
Currently, the patch queues all spec abort changes; as an
optimization we could avoid queuing spec aborts for toast tables,
but for that we would need to log a flag in WAL.
---
 src/backend/replication/logical/decode.c        | 14 ++++----
 src/backend/replication/logical/reorderbuffer.c | 43 +++++++++++++++++++++++--
 src/include/replication/reorderbuffer.h         |  1 +
 3 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7067016..453efc5 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -1040,19 +1040,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2d9e127..dd95785 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -443,6 +443,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -520,6 +523,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change,
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -2211,8 +2215,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -2254,6 +2258,38 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived.  So cleanup the
+					 * specinsert tuple and toast hash.  If spec insert change
+					 * is NULL then do nothing, this is possible because we
+					 * have spec abort for each toast entry.  So we just have
+					 * to clean the specinsert and toast hash for the first
+					 * spec abort for the main table and remaining changes for
+					 * the tables can be ignored.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * Clear the toast hash, we must clean the toast hash
+						 * before we start with a completely new tuple,
+						 * otherwise, while processing the new tuple it would
+						 * create a confusion that whether we need to process
+						 * these toast chunks or not.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/*
+						* If the speculative insertion was aborted, the record
+						* isn't needed anymore.
+						*/
+						ReorderBufferReturnChange(rb, specinsert, true);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -3754,6 +3790,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4017,6 +4054,7 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4315,6 +4353,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 0c6e9d1..9ff0986 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1
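
To make the intended behaviour of the new SPEC_ABORT case easier to follow, here is a minimal, self-contained sketch of the replay loop as a toy state machine. It is not PostgreSQL code: ToyTxn, ToyChangeType, toy_apply() and toy_replay() are hypothetical names invented for illustration, and a plain counter stands in for the toast hash. It only assumes what the patch comments state: the first spec abort drops the pending speculative insert and resets the toast state, and later spec aborts (one per toast entry) are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical change kinds; they loosely mirror the REORDER_BUFFER_CHANGE_* values. */
typedef enum
{
	TOY_TOAST_CHUNK,
	TOY_INSERT,
	TOY_SPEC_INSERT,
	TOY_SPEC_CONFIRM,
	TOY_SPEC_ABORT
} ToyChangeType;

/* Hypothetical per-transaction state. */
typedef struct
{
	int		toast_chunks;		/* stands in for the toast hash */
	bool	spec_pending;		/* stands in for specinsert != NULL */
	int		rows_emitted;		/* rows handed to the output plugin */
	int		chunks_in_last_row;	/* toast chunks attached to the last row */
} ToyTxn;

/* Emit a row: it consumes whatever toast chunks are currently collected. */
void
toy_emit_row(ToyTxn *txn)
{
	txn->rows_emitted++;
	txn->chunks_in_last_row = txn->toast_chunks;
	txn->toast_chunks = 0;
}

/* Apply one change, following the patched behaviour described above. */
void
toy_apply(ToyTxn *txn, ToyChangeType type)
{
	switch (type)
	{
		case TOY_TOAST_CHUNK:
			txn->toast_chunks++;
			break;
		case TOY_INSERT:
			toy_emit_row(txn);
			break;
		case TOY_SPEC_INSERT:
			txn->spec_pending = true;
			break;
		case TOY_SPEC_CONFIRM:
			if (txn->spec_pending)
			{
				toy_emit_row(txn);
				txn->spec_pending = false;
			}
			break;
		case TOY_SPEC_ABORT:
			/* Only the first abort after a spec insert does any work. */
			if (txn->spec_pending)
			{
				txn->toast_chunks = 0;
				txn->spec_pending = false;
			}
			break;
	}
}

/* Replay n changes in order and return the number of rows emitted. */
int
toy_replay(ToyTxn *txn, const ToyChangeType *changes, int n)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++)
		toy_apply(txn, changes[i]);
	return txn->rows_emitted;
}
```

Replaying two toast chunks, a spec insert, three spec aborts and then a plain insert emits exactly one row with no leftover chunks attached, which is the situation the second patch in this series tests with tbl1 and tbl2.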

Attachment: v5-0001-Bug-fix-for-speculative-abort-v10.patch (text/x-patch; charset=US-ASCII)

From 4440e5dca68da59d9d397efb890893470dd92aaf Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Thu, 10 Jun 2021 11:59:58 +0530
Subject: [PATCH v5 1/2] Bug fix for speculative abort v10

If a speculative insert includes a toast table insert and the tuple
is not confirmed, the toast hash is not cleaned up.  This causes
various problems: a) a memory leak, b) the next insert uses the
stale toast data for its insertion, and other errors and assertion
failures.  This patch handles that by queuing the spec abort
changes and cleaning up the toast hash on spec abort.  Currently,
the patch queues all the spec abort changes; as an optimization we
could avoid queuing the spec abort for toast tables, but for that
we would need to log a flag in WAL.
---
 src/backend/replication/logical/decode.c        | 14 ++++-----
 src/backend/replication/logical/reorderbuffer.c | 42 +++++++++++++++++++++++--
 src/include/replication/reorderbuffer.h         |  3 +-
 3 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index d7a6f5d..3778ea9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -778,19 +778,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 165ba8f..2f80c73 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -353,6 +353,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -413,6 +416,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			break;
 			/* no data in addition to the struct itself */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1621,8 +1625,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1664,6 +1668,38 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived.  So cleanup the
+					 * specinsert tuple and toast hash.  If spec insert change
+					 * is NULL then do nothing, this is possible because we
+					 * have spec abort for each toast entry.  So we just have
+					 * to clean the specinsert and toast hash for the first
+					 * spec abort for the main table and remaining changes for
+					 * the tables can be ignored.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * Clear the toast hash, we must clean the toast hash
+						 * before we start with a completely new tuple,
+						 * otherwise, while processing the new tuple it would
+						 * create a confusion that whether we need to process
+						 * these toast chunks or not.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/*
+						* If the speculative insertion was aborted, the record
+						* isn't needed anymore.
+						*/
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_MESSAGE:
 					rb->message(rb, txn, change->lsn, true,
 								change->data.msg.prefix,
@@ -2423,6 +2459,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2716,6 +2753,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 			/* the base struct contains all the data, easy peasy */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5530c4f..6c2889a 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -59,7 +59,8 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
-	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT
 };
 
 /*
-- 
1.8.3.1
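
The DecodeDelete hunk repeated in each back-branch patch can be summarised as a small decision function. The sketch below is an assumption-laden illustration, not the real decoder: ToyAction, toy_classify_delete() and the TOY_DELETE_IS_SUPER bit (whose value is chosen only for this example) are hypothetical. The `patched` parameter switches between the old behaviour, where super deletions were dropped before queuing, and the new behaviour, where they are queued as a spec abort once the origin filter has passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the value is illustrative only. */
#define TOY_DELETE_IS_SUPER (1u << 3)

/* Hypothetical outcomes of decoding one delete record. */
typedef enum
{
	TOY_ACTION_NONE,		/* nothing is queued */
	TOY_ACTION_DELETE,		/* queued as an ordinary delete */
	TOY_ACTION_SPEC_ABORT	/* queued as a speculative abort */
} ToyAction;

/*
 * Decide what to queue for a delete record.
 *
 * flags              - record flags, possibly containing TOY_DELETE_IS_SUPER
 * filtered_by_origin - true if the output plugin is not interested in the origin
 * patched            - false models the code before the patch, true after it
 */
ToyAction
toy_classify_delete(uint8_t flags, bool filtered_by_origin, bool patched)
{
	bool	is_super = (flags & TOY_DELETE_IS_SUPER) != 0;

	/* Before the patch, super deletions were ignored outright. */
	if (!patched && is_super)
		return TOY_ACTION_NONE;

	/* The origin filter applies in both versions. */
	if (filtered_by_origin)
		return TOY_ACTION_NONE;

	return is_super ? TOY_ACTION_SPEC_ABORT : TOY_ACTION_DELETE;
}
```

The point of the sketch is that, with the patch, a super deletion is no longer invisible to the reorder buffer; it becomes the signal that lets the replay loop discard the pending speculative insert and its toast data.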

Attachment: v5-0001-Bug-fix-for-speculative-abort-v12_and_v13.patch (text/x-patch; charset=US-ASCII)

From e739d144dd9e2ed1e4903d3992b56b0ac76c1526 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Thu, 10 Jun 2021 11:09:47 +0530
Subject: [PATCH v5 1/2] Bug fix for speculative abort v13

If a speculative insert includes a toast table insert and the tuple
is not confirmed, the toast hash is not cleaned up.  This causes
various problems: a) a memory leak, b) the next insert uses the
stale toast data for its insertion, and other errors and assertion
failures.  This patch handles that by queuing the spec abort
changes and cleaning up the toast hash on spec abort.  Currently,
the patch queues all the spec abort changes; as an optimization we
could avoid queuing the spec abort for toast tables, but for that
we would need to log a flag in WAL.
---
 src/backend/replication/logical/decode.c        | 14 ++++-----
 src/backend/replication/logical/reorderbuffer.c | 42 +++++++++++++++++++++++--
 src/include/replication/reorderbuffer.h         |  1 +
 3 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c2e5e3a..4985c2a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -800,19 +800,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5251932..5e571ff 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -397,6 +397,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -467,6 +470,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1677,8 +1681,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1720,6 +1724,38 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived.  So cleanup the
+					 * specinsert tuple and toast hash.  If spec insert change
+					 * is NULL then do nothing, this is possible because we
+					 * have spec abort for each toast entry.  So we just have
+					 * to clean the specinsert and toast hash for the first
+					 * spec abort for the main table and remaining changes for
+					 * the tables can be ignored.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * Clear the toast hash, we must clean the toast hash
+						 * before we start with a completely new tuple,
+						 * otherwise, while processing the new tuple it would
+						 * create a confusion that whether we need to process
+						 * these toast chunks or not.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/*
+						* If the speculative insertion was aborted, the record
+						* isn't needed anymore.
+						*/
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -2640,6 +2676,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -3032,6 +3069,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 019bd38..dbd2e84 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -62,6 +62,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1

Attachment: v5-0001-Bug-fix-for-speculative-abort-v96.patch (text/x-patch; charset=US-ASCII)

From ef1253e047556459cdcd415e4e2558353fa41e76 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Thu, 10 Jun 2021 13:53:00 +0530
Subject: [PATCH v5 1/2] Bug fix for speculative abort v96

If a speculative insert includes a toast table insert and the tuple
is not confirmed, the toast hash is not cleaned up.  This causes
various problems: a) a memory leak, b) the next insert uses the
stale toast data for its insertion, and other errors and assertion
failures.  This patch handles that by queuing the spec abort
changes and cleaning up the toast hash on spec abort.  Currently,
the patch queues all the spec abort changes; as an optimization we
could avoid queuing the spec abort for toast tables, but for that
we would need to log a flag in WAL.
---
 src/backend/replication/logical/decode.c        | 14 ++++-----
 src/backend/replication/logical/reorderbuffer.c | 42 +++++++++++++++++++++++--
 src/include/replication/reorderbuffer.h         |  3 +-
 3 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 1300902..571a901 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -778,19 +778,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index f0de337..54c8901 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -364,6 +364,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	/* check whether to put into the slab cache */
 	if (rb->nr_cached_transactions < max_cached_transactions)
 	{
@@ -449,6 +452,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			break;
 			/* no data in addition to the struct itself */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1674,8 +1678,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1717,6 +1721,38 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived.  So cleanup the
+					 * specinsert tuple and toast hash.  If spec insert change
+					 * is NULL then do nothing, this is possible because we
+					 * have spec abort for each toast entry.  So we just have
+					 * to clean the specinsert and toast hash for the first
+					 * spec abort for the main table and remaining changes for
+					 * the tables can be ignored.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * Clear the toast hash, we must clean the toast hash
+						 * before we start with a completely new tuple,
+						 * otherwise, while processing the new tuple it would
+						 * create a confusion that whether we need to process
+						 * these toast chunks or not.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/*
+						* If the speculative insertion was aborted, the record
+						* isn't needed anymore.
+						*/
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_MESSAGE:
 					rb->message(rb, txn, change->lsn, true,
 								change->data.msg.prefix,
@@ -2476,6 +2512,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2764,6 +2801,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 			/* the base struct contains all the data, easy peasy */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index e085088..67f2981 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -59,7 +59,8 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
-	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT
 };
 
 /*
-- 
1.8.3.1

Attachment: v5-0001-Bug-fix-for-speculative-abort-v11.patch (text/x-patch; charset=US-ASCII)

From 1c08381fd1a32406a80f6f9a0bf6ce5916022b7c Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Thu, 10 Jun 2021 11:09:47 +0530
Subject: [PATCH v5 1/2] Bug fix for speculative abort v11

If a speculative insert includes a toast table insert and the tuple
is not confirmed, the toast hash is not cleaned up.  This causes
various problems: a) a memory leak, b) the next insert uses the
stale toast data for its insertion, and other errors and assertion
failures.  This patch handles that by queuing the spec abort
changes and cleaning up the toast hash on spec abort.  Currently,
the patch queues all the spec abort changes; as an optimization we
could avoid queuing the spec abort for toast tables, but for that
we would need to log a flag in WAL.
---
 src/backend/replication/logical/decode.c        | 14 ++++-----
 src/backend/replication/logical/reorderbuffer.c | 42 +++++++++++++++++++++++--
 src/include/replication/reorderbuffer.h         |  1 +
 3 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index d7da3d6..676f921 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -804,19 +804,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 1a4b87c..b9c6016 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -351,6 +351,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -418,6 +421,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1620,8 +1624,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1663,6 +1667,38 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived.  So cleanup the
+					 * specinsert tuple and toast hash.  If spec insert change
+					 * is NULL then do nothing, this is possible because we
+					 * have spec abort for each toast entry.  So we just have
+					 * to clean the specinsert and toast hash for the first
+					 * spec abort for the main table and remaining changes for
+					 * the tables can be ignored.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * Clear the toast hash, we must clean the toast hash
+						 * before we start with a completely new tuple,
+						 * otherwise, while processing the new tuple it would
+						 * create a confusion that whether we need to process
+						 * these toast chunks or not.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/*
+						* If the speculative insertion was aborted, the record
+						* isn't needed anymore.
+						*/
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -2475,6 +2511,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2778,6 +2815,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index c41f362..f51336f 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -60,6 +60,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1

Attachment: v5-0002-Test-logical-decoding-of-speculative-aborts_HEAD.patch (text/x-patch; charset=US-ASCII)

From cb8f217a3c9764ff60294298965809a3b55dbcae Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Tue, 8 Jun 2021 17:11:24 +0530
Subject: [PATCH v5 2/2] Test logical decoding of speculative aborts

Test logical decoding of a speculative abort of a toast
insertion, followed by an insertion into a different table
that has no toast table.
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 3 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b..1cab935 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -8,7 +8,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
 	spill slot truncate stream stats twophase twophase_stream
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
-	twophase_snapshot
+	twophase_snapshot speculative_abort
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..7e80a25
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative aborts for toast
+# insertion are handled during logical decoding
+#
+# Does this by using advisory locks controlling progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
-- 
1.8.3.1

Attachment: v5-0002-Test-logical-decoding-of-speculative-aborts-v10.patch (text/x-patch; charset=US-ASCII)

From a6bc6c989b3a0fcab69bb1e63af239b30dfd4f9d Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Thu, 10 Jun 2021 11:11:44 +0530
Subject: [PATCH v5 2/2] Test logical decoding of speculative aborts v10

Test logical decoding of speculative aborts for toast
insertion followed by insertion into a different table
which doesn't have a toast
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 3 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 2db2b27..c7def09 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -51,7 +51,7 @@ regresscheck-install-force: | submake-regress submake-test_decoding temp-install
 	    $(REGRESSCHECKS)
 
 ISOLATIONCHECKS=mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer subxact_without_top
+	oldest_xmin snapshot_transfer subxact_without_top speculative_abort
 
 isolationcheck: | submake-isolation submake-test_decoding temp-install
 	$(pg_isolation_regress_check) \
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..7e80a25
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative aborts for toast
+# insertion are handled during logical decoding
+#
+# Does this by using advisory locks controlling progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
-- 
1.8.3.1

Attachment: v5-0002-Test-logical-decoding-of-speculative-aborts-v11.patch (text/x-patch; charset=US-ASCII)

From d5f3cb36e68fde7dcd2a207a8e5443ebf7d23c83 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Thu, 10 Jun 2021 11:11:44 +0530
Subject: [PATCH v5 2/2] Test logical decoding of speculative aborts v11

Test logical decoding of speculative aborts for toast
insertion followed by insertion into a different table
which doesn't have a toast
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 3 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 65a91a8..29e5a3e 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -51,7 +51,7 @@ regresscheck-install-force: | submake-regress submake-test_decoding temp-install
 	    $(REGRESSCHECKS)
 
 ISOLATIONCHECKS=mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer subxact_without_top
+	oldest_xmin snapshot_transfer subxact_without_top speculative_abort
 
 isolationcheck: | submake-isolation submake-test_decoding temp-install
 	$(pg_isolation_regress_check) \
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..7e80a25
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative abort for toast
+# insertion are handled during logical decoding
+#
+# Does this by using advisory locks controlling progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
-- 
1.8.3.1

v5-0002-Test-logical-decoding-of-speculative-aborts-v12_and_v13.patch (text/x-patch; charset=US-ASCII)

From 34f99cd579b7994707b207ef8fd748a40c970e4a Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Thu, 10 Jun 2021 11:31:31 +0530
Subject: [PATCH v5 2/2] Test logical decoding of speculative aborts v13

Test logical decoding of speculative aborts for toast
insertion followed by insertion into a different table
which doesn't have a toast
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 3 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index f439c58..14e60bd 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -7,7 +7,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
 	decoding_into_rel binary prepared replorigin time messages \
 	spill slot truncate
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer subxact_without_top
+	oldest_xmin snapshot_transfer subxact_without_top speculative_abort
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..7e80a25
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative abort for toast
+# insertion are handled during logical decoding
+#
+# Does this by using advisory locks controlling progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
-- 
1.8.3.1

v5-0002-Test-logical-decoding-of-speculative-aborts-v96.patch (text/x-patch; charset=US-ASCII)

From 4d8e5a68b63ca14abfef755d78f383746b41000f Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Thu, 10 Jun 2021 13:57:17 +0530
Subject: [PATCH v5 2/2] Test logical decoding of speculative aborts v96

Test logical decoding of speculative aborts for toast
insertion followed by insertion into a different table
which doesn't have a toast
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 3 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index b6fc8da..18dcd2d 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -54,7 +54,7 @@ regresscheck-install-force: | submake-regress submake-test_decoding temp-install
 	    $(REGRESSCHECKS)
 
 ISOLATIONCHECKS=mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer subxact_without_top
+	oldest_xmin snapshot_transfer subxact_without_top speculative_abort
 
 isolationcheck: | submake-isolation submake-test_decoding temp-install
 	$(MKDIR_P) isolation_output
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..7e80a25
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative abort for toast
+# insertion are handled during logical decoding
+#
+# Does this by using advisory locks controlling progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
-- 
1.8.3.1

#45

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#44)

1 attachment(s)

Re: Decoding speculative insert with toast leaks memory

On Thu, Jun 10, 2021 at 2:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Jun 9, 2021 at 8:59 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

May I suggest using a different name for the blurt_and_lock_123()
function, so that it doesn't conflict with the one in
insert-conflict-specconflict? Thanks

Renamed to blurt_and_lock(), is that fine?

I think a non-conflicting name should be fine.

I have fixed other comments and also prepared patches for the back branches.

Okay, I have verified the fix on all branches: the newly added test
fails without the patch and passes with the code-change patch applied.
A few minor things:
1. You forgot to make the change in ReorderBufferChangeSize for the v13 patch.
2. I have made a few changes in the HEAD patch: (a) There was an
unnecessary cleanup of the spec insert in one place. I have replaced that
with an Assert. (b) I have added and edited a few comments in both the
code and the test patch.

Please find the patch for HEAD attached. Can you please prepare the
patch for back-branches by doing all the changes I have done in the
patch for HEAD?
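
For readers following the thread, here is a minimal sketch of the cleanup
rule the HEAD patch adds, written as a standalone toy model rather than
PostgreSQL code. The names ToyChangeKind, ToyTxnState, toy_reset_toast and
toy_process are hypothetical and exist only for illustration; the counter
toast_chunks merely stands in for the per-transaction toast hash, and the
flag have_specinsert stands in for a non-NULL specinsert pointer. The
assumed change order (toast chunks, speculative insert, then one abort for
the main tuple followed by one per toast chunk) follows the comment in the
patch about receiving a spec abort for each toast entry.

```c
#include <assert.h>
#include <stddef.h>

/* Toy change kinds. The names echo the patch, but this is NOT PostgreSQL code. */
typedef enum
{
	TOY_INSERT,			/* ordinary insert, ends any toast reassembly */
	TOY_TOAST_CHUNK,	/* insert into the toast relation */
	TOY_SPEC_INSERT,	/* speculative insert of the main tuple */
	TOY_SPEC_CONFIRM,	/* speculative insert succeeded */
	TOY_SPEC_ABORT		/* super-delete, queued for main and toast tuples */
} ToyChangeKind;

typedef struct
{
	int			toast_chunks;		/* stands in for the toast hash */
	int			have_specinsert;	/* stands in for specinsert != NULL */
	int			emitted;			/* changes handed to the output plugin */
} ToyTxnState;

/* Hypothetical helper: forget all buffered toast chunks. */
void
toy_reset_toast(ToyTxnState *st)
{
	st->toast_chunks = 0;
}

/* Replay a sequence of changes, applying the cleanup rule on spec abort. */
void
toy_process(ToyTxnState *st, const ToyChangeKind *changes, size_t n)
{
	size_t		i;

	for (i = 0; i < n; i++)
	{
		switch (changes[i])
		{
			case TOY_TOAST_CHUNK:
				st->toast_chunks++;
				break;
			case TOY_SPEC_INSERT:
				st->have_specinsert = 1;
				break;
			case TOY_SPEC_CONFIRM:
				/* confirmed: emit the pending insert, then clean up */
				assert(st->have_specinsert);
				st->have_specinsert = 0;
				st->emitted++;
				toy_reset_toast(st);
				break;
			case TOY_SPEC_ABORT:
				/* only the first abort (main table) has work to do */
				if (st->have_specinsert)
				{
					toy_reset_toast(st);
					st->have_specinsert = 0;
				}
				break;
			case TOY_INSERT:
				st->emitted++;
				toy_reset_toast(st);
				break;
		}
	}

	/* mirrors the end-of-loop assertion that no spec insert is pending */
	assert(!st->have_specinsert);
}
```

Feeding this model two toast chunks, a speculative insert and three aborts
leaves toast_chunks at 0 and emitted at 0, so nothing is left to leak even
if no further change follows in the transaction. A later TOY_INSERT, which
plays the role of the insert into tbl2 in the new test, is then emitted
without seeing stale chunks.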

--
With Regards,
Amit Kapila.

Attachments:

v6-0001-Fix-decoding-of-speculative-aborts.patch (application/octet-stream)

From 3ac8ba2025f5db2b63ab94b0b51789e14ae9ce46 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 10 Jun 2021 18:29:31 +0530
Subject: [PATCH v6] Fix decoding of speculative aborts.

During decoding for speculative inserts, we were relying for cleaning
toast hash on confirmation records or next change records. But that
could lead to multiple problems (a) memory leak if there is neither a
confirmation record nor any other record after toast insertion for a
speculative insert in the transaction, (b) error and assertion failures
if the next operation is not an insert/update on the same table.

The fix is to start queuing spec abort change and clean up toast hash
and change record during its processing. Currently, we are queuing the
spec aborts for both toast and main table even though we perform cleanup
while processing the main table's spec abort record. Later, if we have a
way to distinguish between the spec abort record of toast and the main
table, we can avoid queuing the change for spec aborts of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 src/backend/replication/logical/decode.c           |  14 ++-
 src/backend/replication/logical/reorderbuffer.c    |  49 ++++++---
 src/include/replication/reorderbuffer.h            |   9 +-
 6 files changed, 245 insertions(+), 25 deletions(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b..1cab935 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -8,7 +8,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
 	spill slot truncate stream stats twophase twophase_stream
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
-	twophase_snapshot
+	twophase_snapshot speculative_abort
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..6b3cbfb
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative abort for toast
+# insertions are handled during logical decoding.
+#
+# Does this by using advisory locks controlling the progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast.
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7067016..453efc5 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -1040,19 +1040,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2d9e127..d905f32 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -443,6 +443,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -520,6 +523,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change,
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -2211,8 +2215,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -2254,6 +2258,32 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert, true);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -2360,16 +2390,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert, true);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -3754,6 +3776,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4017,6 +4040,7 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4315,6 +4339,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 0c6e9d1..ba257d8 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -46,10 +46,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and abort
+ * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
+ * logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1

#46

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#45)

6 attachment(s)

Re: Decoding speculative insert with toast leaks memory

On Thu, Jun 10, 2021 at 7:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Please find the patch for HEAD attached. Can you please prepare the
patch for back-branches by doing all the changes I have done in the
patch for HEAD?

Done

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v7-0001-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From 3ac8ba2025f5db2b63ab94b0b51789e14ae9ce46 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 10 Jun 2021 18:29:31 +0530
Subject: [PATCH v6] Fix decoding of speculative aborts.

During decoding for speculative inserts, we were relying for cleaning
toast hash on confirmation records or next change records. But that
could lead to multiple problems (a) memory leak if there is neither a
confirmation record nor any other record after toast insertion for a
speculative insert in the transaction, (b) error and assertion failures
if the next operation is not an insert/update on the same table.

The fix is to start queuing spec abort change and clean up toast hash
and change record during its processing. Currently, we are queuing the
spec aborts for both toast and main table even though we perform cleanup
while processing the main table's spec abort record. Later, if we have a
way to distinguish between the spec abort record of toast and the main
table, we can avoid queuing the change for spec aborts of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 src/backend/replication/logical/decode.c           |  14 ++-
 src/backend/replication/logical/reorderbuffer.c    |  49 ++++++---
 src/include/replication/reorderbuffer.h            |   9 +-
 6 files changed, 245 insertions(+), 25 deletions(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b..1cab935 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -8,7 +8,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
 	spill slot truncate stream stats twophase twophase_stream
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
-	twophase_snapshot
+	twophase_snapshot speculative_abort
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..6b3cbfb
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative abort for toast
+# insertions are handled during logical decoding.
+#
+# Does this by using advisory locks controlling the progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast.
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7067016..453efc5 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -1040,19 +1040,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2d9e127..d905f32 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -443,6 +443,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -520,6 +523,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change,
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -2211,8 +2215,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -2254,6 +2258,32 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert, true);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -2360,16 +2390,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert, true);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -3754,6 +3776,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4017,6 +4040,7 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4315,6 +4339,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 0c6e9d1..ba257d8 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -46,10 +46,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and abort
+ * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
+ * logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1
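
For readers following the reorderbuffer.c hunks above, here is a minimal toy model of the cleanup rule the patch introduces. It is a sketch under stated assumptions, not the PostgreSQL implementation: every type and function name below (ToyTxn, ToyChangeType, toy_process_txn, toy_toast_reset) is hypothetical, the toast hash is reduced to a chunk counter, and the pending speculative insert is reduced to a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the reorder buffer change types. */
typedef enum {
    TOY_CHANGE_INSERT,               /* ordinary insert, passed to the output plugin */
    TOY_CHANGE_TOAST_CHUNK,          /* insert into the toast relation */
    TOY_CHANGE_INTERNAL_SPEC_INSERT,
    TOY_CHANGE_INTERNAL_SPEC_CONFIRM,
    TOY_CHANGE_INTERNAL_SPEC_ABORT
} ToyChangeType;

typedef struct {
    size_t toast_chunks;  /* stand-in for the per-transaction toast hash */
    size_t emitted;       /* number of changes handed to the output plugin */
} ToyTxn;

/* Stand-in for ReorderBufferToastReset(): drop all collected chunks. */
static void toy_toast_reset(ToyTxn *txn)
{
    txn->toast_chunks = 0;
}

/*
 * Replay a transaction's changes in order. Returns true when no speculative
 * insert is left pending at the end, which is the condition the patched code
 * asserts with Assert(!specinsert).
 */
static bool toy_process_txn(ToyTxn *txn, const ToyChangeType *changes, size_t n)
{
    bool spec_pending = false;

    for (size_t i = 0; i < n; i++) {
        switch (changes[i]) {
        case TOY_CHANGE_TOAST_CHUNK:
            txn->toast_chunks++;
            break;
        case TOY_CHANGE_INTERNAL_SPEC_INSERT:
            spec_pending = true;
            break;
        case TOY_CHANGE_INTERNAL_SPEC_CONFIRM:
            if (spec_pending) {
                txn->emitted++;          /* confirmed: emit as a normal insert */
                toy_toast_reset(txn);
                spec_pending = false;
            }
            break;
        case TOY_CHANGE_INTERNAL_SPEC_ABORT:
            /*
             * Only the first abort seen while an insert is pending does any
             * work; later aborts (queued for toast entries) are no-ops.
             */
            if (spec_pending) {
                toy_toast_reset(txn);
                spec_pending = false;
            }
            break;
        case TOY_CHANGE_INSERT:
            txn->emitted++;
            toy_toast_reset(txn);        /* toy simplification of per-tuple cleanup */
            break;
        }
    }
    return !spec_pending;
}
```

Feeding this the sequence from the new isolation test (toast chunks, a speculative insert, its aborts, then an insert into another table) leaves the chunk counter at zero before the unrelated insert is reached, which is the property the patch is after.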

Attachment: v7-0001-96-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From 5d5116aac9db01242e36bc629fa19af816c47e23 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 11 Jun 2021 10:08:42 +0530
Subject: [PATCH v7] 96-Fix decoding of speculative aborts.

During decoding of speculative inserts, we relied on confirmation records
or the next change record to clean up the toast hash. That could lead to
multiple problems: (a) a memory leak if there is neither a confirmation
record nor any other record after the toast insertion for a speculative
insert in the transaction, (b) errors and assertion failures if the next
operation is not an insert/update on the same table.

The fix is to start queuing a spec abort change and to clean up the toast
hash and the change record while processing it. Currently, we queue spec
aborts for both the toast and the main table even though we perform the
cleanup while processing the main table's spec abort record. Later, if we
have a way to distinguish between the spec abort records of the toast and
the main table, we can avoid queuing the change for spec aborts of toast
tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 src/backend/replication/logical/decode.c           |  14 ++-
 src/backend/replication/logical/reorderbuffer.c    |  48 ++++++---
 src/include/replication/reorderbuffer.h            |  11 +-
 6 files changed, 245 insertions(+), 26 deletions(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index b6fc8da..18dcd2d 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -54,7 +54,7 @@ regresscheck-install-force: | submake-regress submake-test_decoding temp-install
 	    $(REGRESSCHECKS)
 
 ISOLATIONCHECKS=mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer subxact_without_top
+	oldest_xmin snapshot_transfer subxact_without_top speculative_abort
 
 isolationcheck: | submake-isolation submake-test_decoding temp-install
 	$(MKDIR_P) isolation_output
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..6b3cbfb
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative abort for toast
+# insertions are handled during logical decoding.
+#
+# Does this by using advisory locks controlling the progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast.
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 1300902..571a901 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -778,19 +778,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index f0de337..1cd0bbd 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -364,6 +364,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	/* check whether to put into the slab cache */
 	if (rb->nr_cached_transactions < max_cached_transactions)
 	{
@@ -449,6 +452,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			break;
 			/* no data in addition to the struct itself */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1674,8 +1678,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1717,6 +1721,32 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_MESSAGE:
 					rb->message(rb, txn, change->lsn, true,
 								change->data.msg.prefix,
@@ -1792,16 +1822,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -2476,6 +2498,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2764,6 +2787,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 			/* the base struct contains all the data, easy peasy */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index e085088..ddba4bd 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -44,10 +44,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and abort
+ * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
+ * logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -59,7 +59,8 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
-	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT
 };
 
 /*
-- 
1.8.3.1
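
The decode.c hunk in the patch above stops discarding super deletions and instead queues them under a different action. A minimal sketch of that classification follows, assuming a made-up flag bit and made-up names (TOY_XLH_DELETE_IS_SUPER, ToyAction, toy_delete_action); the real flag is defined in PostgreSQL's heapam_xlog.h and the real assignment happens on change->action inside DecodeDelete().

```c
#include <stdint.h>

/* Hypothetical flag bit for this sketch only. */
#define TOY_XLH_DELETE_IS_SUPER 0x08

/* Hypothetical stand-ins for the two reorder buffer actions involved. */
typedef enum {
    TOY_CHANGE_DELETE,
    TOY_CHANGE_INTERNAL_SPEC_ABORT
} ToyAction;

/*
 * Map a delete record's flags to the action that gets queued. Before the
 * patch a super deletion returned early and queued nothing; after it, the
 * record is queued as a speculative abort so later processing can clean up.
 */
static ToyAction toy_delete_action(uint8_t flags)
{
    if (flags & TOY_XLH_DELETE_IS_SUPER)
        return TOY_CHANGE_INTERNAL_SPEC_ABORT;
    return TOY_CHANGE_DELETE;
}
```

Queuing the record rather than dropping it is what gives the processing loop an explicit point at which to free the pending speculative insert and its toast chunks.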

Attachment: v7-0001-v11-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From 96ef95fdb64afcdf1f5c5cb0c298286c2cbc1e9f Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 11 Jun 2021 10:58:59 +0530
Subject: [PATCH v7] v11-Fix decoding of speculative aborts.

While decoding speculative inserts, we relied on confirmation records or
the next change record to clean up the toast hash. That could lead to
two problems: (a) a memory leak if the transaction contains neither a
confirmation record nor any other record after the toast insertion for a
speculative insert, and (b) errors and assertion failures if the next
operation is not an insert/update on the same table.

The fix is to queue a spec abort change and to clean up the toast hash
and the change record while processing it. Currently, we queue spec
aborts for both the toast and the main table, even though cleanup is
performed only while processing the main table's spec abort record.
Later, if we have a way to distinguish the spec abort record of a toast
table from that of the main table, we can avoid queuing the change for
spec aborts of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 src/backend/replication/logical/decode.c           |  14 ++-
 src/backend/replication/logical/reorderbuffer.c    |  48 ++++++---
 src/include/replication/reorderbuffer.h            |   9 +-
 6 files changed, 244 insertions(+), 25 deletions(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 65a91a8..29e5a3e 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -51,7 +51,7 @@ regresscheck-install-force: | submake-regress submake-test_decoding temp-install
 	    $(REGRESSCHECKS)
 
 ISOLATIONCHECKS=mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer subxact_without_top
+	oldest_xmin snapshot_transfer subxact_without_top speculative_abort
 
 isolationcheck: | submake-isolation submake-test_decoding temp-install
 	$(pg_isolation_regress_check) \
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..6b3cbfb
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative aborts for toast
+# insertions are handled during logical decoding.
+#
+# Does this by using advisory locks controlling the progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast.
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index d7da3d6..676f921 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -804,19 +804,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 1a4b87c..244d203 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -351,6 +351,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -418,6 +421,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1620,8 +1624,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1695,6 +1699,32 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 						break;
 					}
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_MESSAGE:
 					rb->message(rb, txn, change->lsn, true,
 								change->data.msg.prefix,
@@ -1770,16 +1800,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -2475,6 +2497,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2778,6 +2801,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index c41f362..ffb5a94 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -44,10 +44,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT, INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and abort
+ * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
+ * logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -60,6 +60,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1
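
For readers who want the idea without reading the whole diff, here is a minimal standalone sketch of the replay logic the patch above moves to. It is not the PostgreSQL code: ReplayState, ChangeKind, replay_toast_chunk() and replay_change() are hypothetical names, the toast hash is reduced to a counter, and the pending speculative insert (specinsert in ReorderBufferCommit) is reduced to a flag. It only illustrates that an explicit spec abort change releases both, and that repeated aborts (one per toast entry) are harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified change kinds; the real enum is
 * ReorderBufferChangeType and has more members. */
typedef enum
{
	CHANGE_INSERT,
	CHANGE_SPEC_INSERT,
	CHANGE_SPEC_CONFIRM,
	CHANGE_SPEC_ABORT
} ChangeKind;

/* Hypothetical stand-in for the per-transaction replay state. */
typedef struct
{
	int		toast_chunks;	/* models the contents of the toast hash */
	bool	spec_pending;	/* models specinsert != NULL */
	int		emitted;		/* rows handed to the output plugin */
} ReplayState;

/* Models ReorderBufferToastReset(): drop all collected toast chunks. */
static void
toast_reset(ReplayState *st)
{
	st->toast_chunks = 0;
}

/* Models a toast-table insert being collected into the toast hash. */
static void
replay_toast_chunk(ReplayState *st)
{
	st->toast_chunks++;
}

static void
replay_change(ReplayState *st, ChangeKind kind)
{
	switch (kind)
	{
		case CHANGE_INSERT:
			/* A regular insert is emitted and consumes its toast chunks. */
			st->emitted++;
			toast_reset(st);
			break;

		case CHANGE_SPEC_INSERT:
			/* Hold the tuple back until it is confirmed or aborted. */
			st->spec_pending = true;
			break;

		case CHANGE_SPEC_CONFIRM:
			/* Confirmation: emit the held tuple, then clean up. */
			if (st->spec_pending)
			{
				st->emitted++;
				toast_reset(st);
				st->spec_pending = false;
			}
			break;

		case CHANGE_SPEC_ABORT:
			/*
			 * Abort: discard the held tuple and its toast chunks.  Only
			 * the first abort after a speculative insert does any work;
			 * later ones (e.g. for toast entries) find nothing pending.
			 */
			if (st->spec_pending)
			{
				toast_reset(st);
				st->spec_pending = false;
			}
			break;
	}
}

/* True if nothing is left behind at the end of the transaction. */
static bool
replay_finished_cleanly(const ReplayState *st)
{
	return !st->spec_pending && st->toast_chunks == 0;
}
```

Replaying two toast chunks, a speculative insert, two aborts and then a plain insert into another table should leave the state clean with exactly one emitted row, which roughly mirrors what the speculative_abort isolation test checks for session s1.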

Attachment: v7-0001-v12-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From 30124d6c24daa4b9058259e0e940b1671e6c541b Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 11 Jun 2021 11:09:02 +0530
Subject: [PATCH v7] v12-Fix decoding of speculative aborts.

While decoding speculative inserts, we relied on confirmation records or
the next change record to clean up the toast hash. That could lead to
two problems: (a) a memory leak if the transaction contains neither a
confirmation record nor any other record after the toast insertion for a
speculative insert, and (b) errors and assertion failures if the next
operation is not an insert/update on the same table.

The fix is to queue a spec abort change and to clean up the toast hash
and the change record while processing it. Currently, we queue spec
aborts for both the toast and the main table, even though cleanup is
performed only while processing the main table's spec abort record.
Later, if we have a way to distinguish the spec abort record of a toast
table from that of the main table, we can avoid queuing the change for
spec aborts of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 src/backend/replication/logical/decode.c           |  14 ++-
 src/backend/replication/logical/reorderbuffer.c    |  48 ++++++---
 src/include/replication/reorderbuffer.h            |   9 +-
 6 files changed, 244 insertions(+), 25 deletions(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index f439c58..14e60bd 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -7,7 +7,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
 	decoding_into_rel binary prepared replorigin time messages \
 	spill slot truncate
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer subxact_without_top
+	oldest_xmin snapshot_transfer subxact_without_top speculative_abort
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..6b3cbfb
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative aborts for toast
+# insertions are handled during logical decoding.
+#
+# Does this by using advisory locks controlling the progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast.
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index eef2f88..ff18861 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -803,19 +803,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 941ca9a..3a922b7 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -359,6 +359,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -426,6 +429,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1628,8 +1632,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1703,6 +1707,32 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 						break;
 					}
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_MESSAGE:
 					rb->message(rb, txn, change->lsn, true,
 								change->data.msg.prefix,
@@ -1778,16 +1808,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -2483,6 +2505,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2797,6 +2820,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 49df12a..1ced41f 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -44,10 +44,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT, INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and their
+ * abort, respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.
+ * Users of logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -60,6 +60,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1
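
For readers skimming the diff, the replay-side behaviour it introduces can be modeled roughly as below. This is only a standalone sketch, not code from the patch: ChangeKind, ReplayState, replay_one() and replay() are hypothetical names, the toast hash is reduced to a counter, and the toast cleanup that a plain INSERT does at change_done is left out. It assumes, as the comment in the patch says, that one spec abort is queued for the main table and further ones for the toast entries, and that only the first one has any work to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for ReorderBufferChangeType values. */
typedef enum ChangeKind
{
	CH_INSERT,			/* ordinary insert into a table without toast */
	CH_TOAST_INSERT,	/* insert into the toast relation */
	CH_SPEC_INSERT,		/* speculative insert into the main table */
	CH_SPEC_CONFIRM,	/* speculative insert confirmed */
	CH_SPEC_ABORT		/* super-delete, queued as a spec abort */
} ChangeKind;

/* Hypothetical replay state; the counter models txn->toast_hash. */
typedef struct ReplayState
{
	int			toast_chunks;	/* chunks accumulated for a pending tuple */
	int			pending_spec;	/* models "specinsert != NULL" */
	int			emitted;		/* rows handed to the output plugin */
} ReplayState;

void
replay_one(ReplayState *st, ChangeKind kind)
{
	switch (kind)
	{
		case CH_TOAST_INSERT:
			st->toast_chunks++;
			break;
		case CH_SPEC_INSERT:
			st->pending_spec = 1;
			break;
		case CH_SPEC_CONFIRM:
			if (st->pending_spec)
			{
				/* the tuple is emitted, then its toast chunks are dropped */
				st->emitted++;
				st->toast_chunks = 0;
				st->pending_spec = 0;
			}
			break;
		case CH_SPEC_ABORT:
			/* only the first abort finds a pending tuple to clean up */
			if (st->pending_spec)
			{
				st->toast_chunks = 0;
				st->pending_spec = 0;
			}
			break;
		case CH_INSERT:
			st->emitted++;
			break;
	}
}

ReplayState
replay(const ChangeKind *changes, size_t n)
{
	ReplayState st = {0, 0, 0};

	for (size_t i = 0; i < n; i++)
		replay_one(&st, changes[i]);

	/* mirrors the patch's Assert(!specinsert) after the change loop */
	assert(!st.pending_spec);
	return st;
}
```

Feeding it the sequence from the new isolation test (toast chunks, a speculative insert, the spec aborts, then an insert into another table) should leave no toast chunks behind and emit only the second table's row.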

Attachment: v7-0001-v10-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From 8aab67b16187179c16eaa4bb29e9673cd4313a5f Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 11 Jun 2021 10:45:49 +0530
Subject: [PATCH v7] v10-Fix decoding of speculative aborts.

While decoding speculative inserts, we relied on the confirmation record
or on the next change record to clean up the toast hash. That could lead
to two problems: (a) a memory leak if the transaction contains neither a
confirmation record nor any other record after the toast insertion for a
speculative insert, and (b) errors and assertion failures if the next
operation is not an insert/update on the same table.

The fix is to queue a spec abort change and to clean up the toast hash
and the change record while processing it. Currently, we queue spec
aborts for both the toast and the main table, even though the cleanup is
performed while processing the main table's spec abort record. Later, if
we find a way to distinguish the spec abort record of a toast table from
that of the main table, we can avoid queuing the change for spec aborts
of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 src/backend/replication/logical/decode.c           |  14 ++-
 src/backend/replication/logical/reorderbuffer.c    |  48 ++++++---
 src/include/replication/reorderbuffer.h            |  11 +-
 6 files changed, 245 insertions(+), 26 deletions(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 2db2b27..c7def09 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -51,7 +51,7 @@ regresscheck-install-force: | submake-regress submake-test_decoding temp-install
 	    $(REGRESSCHECKS)
 
 ISOLATIONCHECKS=mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer subxact_without_top
+	oldest_xmin snapshot_transfer subxact_without_top speculative_abort
 
 isolationcheck: | submake-isolation submake-test_decoding temp-install
 	$(pg_isolation_regress_check) \
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..6b3cbfb
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative aborts for toast
+# insertions are handled during logical decoding.
+#
+# Does this by using advisory locks controlling the progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast.
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index d7a6f5d..3778ea9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -778,19 +778,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 165ba8f..0963b23 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -353,6 +353,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -413,6 +416,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			break;
 			/* no data in addition to the struct itself */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1621,8 +1625,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1664,6 +1668,32 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_MESSAGE:
 					rb->message(rb, txn, change->lsn, true,
 								change->data.msg.prefix,
@@ -1739,16 +1769,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -2423,6 +2445,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2716,6 +2739,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 			/* the base struct contains all the data, easy peasy */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5530c4f..d455567 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -44,10 +44,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT, INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and their
+ * abort, respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.
+ * Users of logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -59,7 +59,8 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
-	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT
 };
 
 /*
-- 
1.8.3.1

Attachment: v7-0001-v13-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From 2e80138ad62a053ca051172bca4aef0081d59236 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 11 Jun 2021 11:17:34 +0530
Subject: [PATCH v7] v13-Fix decoding of speculative aborts.

While decoding speculative inserts, we relied on the confirmation record
or on the next change record to clean up the toast hash. That could lead
to two problems: (a) a memory leak if the transaction contains neither a
confirmation record nor any other record after the toast insertion for a
speculative insert, and (b) errors and assertion failures if the next
operation is not an insert/update on the same table.

The fix is to queue a spec abort change and to clean up the toast hash
and the change record while processing it. Currently, we queue spec
aborts for both the toast and the main table, even though the cleanup is
performed while processing the main table's spec abort record. Later, if
we find a way to distinguish the spec abort record of a toast table from
that of the main table, we can avoid queuing the change for spec aborts
of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 contrib/test_decoding/Makefile                     |   2 +-
 .../test_decoding/expected/speculative_abort.out   |  85 ++++++++++++++++
 contrib/test_decoding/specs/speculative_abort.spec | 111 +++++++++++++++++++++
 src/backend/replication/logical/decode.c           |  14 ++-
 src/backend/replication/logical/reorderbuffer.c    |  49 ++++++---
 src/include/replication/reorderbuffer.h            |   9 +-
 6 files changed, 245 insertions(+), 25 deletions(-)
 create mode 100644 contrib/test_decoding/expected/speculative_abort.out
 create mode 100644 contrib/test_decoding/specs/speculative_abort.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index f439c58..14e60bd 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -7,7 +7,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
 	decoding_into_rel binary prepared replorigin time messages \
 	spill slot truncate
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer subxact_without_top
+	oldest_xmin snapshot_transfer subxact_without_top speculative_abort
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/speculative_abort.out b/contrib/test_decoding/expected/speculative_abort.out
new file mode 100644
index 0000000..7492506
--- /dev/null
+++ b/contrib/test_decoding/expected/speculative_abort.out
@@ -0,0 +1,85 @@
+Parsed test spec with 3 sessions
+
+starting permutation: controller_locks controller_show_count s1_begin s1_insert_toast s2_insert_toast controller_show_count controller_unlock_1_1 controller_unlock_2_1 controller_unlock_1_3 controller_unlock_2_3 controller_show_count controller_unlock_2_2 controller_show_count controller_unlock_1_2 s1_insert_other s1_commit controller_get_changes
+data           
+
+step controller_locks: SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);
+pg_advisory_locksess           lock           
+
+               1              1              
+               1              2              
+               1              3              
+               2              1              
+               2              2              
+               2              3              
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step s1_begin: BEGIN;
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 3
+step s1_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 3
+step s2_insert_toast: INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; <waiting ...>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_1_1: SELECT pg_advisory_unlock(1, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_2_1: SELECT pg_advisory_unlock(2, 1);
+pg_advisory_unlock
+
+t              
+step controller_unlock_1_3: SELECT pg_advisory_unlock(1, 3);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step controller_unlock_2_3: SELECT pg_advisory_unlock(2, 3);
+pg_advisory_unlock
+
+t              
+s2: NOTICE:  blurt_and_lock() called for 1 in session 2
+s2: NOTICE:  acquiring advisory lock on 2
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+0              
+step controller_unlock_2_2: SELECT pg_advisory_unlock(2, 2);
+pg_advisory_unlock
+
+t              
+step s2_insert_toast: <... completed>
+step controller_show_count: SELECT COUNT(*) FROM tbl1;
+count          
+
+1              
+step controller_unlock_1_2: SELECT pg_advisory_unlock(1, 2);
+pg_advisory_unlock
+
+t              
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock() called for 1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
+step s1_insert_toast: <... completed>
+step s1_insert_other: INSERT INTO tbl2 VALUES(1);
+step s1_commit: COMMIT;
+step controller_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data           
+
+BEGIN          
+table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+COMMIT         
+BEGIN          
+table public.tbl2: INSERT: a[integer]:1
+COMMIT         
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/specs/speculative_abort.spec b/contrib/test_decoding/specs/speculative_abort.spec
new file mode 100644
index 0000000..6b3cbfb
--- /dev/null
+++ b/contrib/test_decoding/specs/speculative_abort.spec
@@ -0,0 +1,111 @@
+# INSERT ... ON CONFLICT test verifying that speculative abort for toast
+# insertions are handled during logical decoding.
+#
+# Does this by using advisory locks controlling the progress of
+# insertions. By waiting when building the index keys, it's possible
+# to schedule concurrent INSERT ON CONFLICTs so that there will always
+# be a speculative conflict.
+
+setup
+{
+	SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+	DROP TABLE IF EXISTS tbl1;
+	CREATE TABLE tbl1 (a INT, b TEXT);
+	ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+	CREATE TABLE tbl2 (a INT);
+
+    CREATE OR REPLACE FUNCTION blurt_and_lock(int) RETURNS int IMMUTABLE LANGUAGE plpgsql AS $$
+    BEGIN
+        RAISE NOTICE 'blurt_and_lock() called for % in session %', $1, current_setting('spec.session')::int;
+
+	-- depending on lock state, wait for lock 2 or 3
+        IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
+            RAISE NOTICE 'acquiring advisory lock on 2';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
+        ELSE
+            RAISE NOTICE 'acquiring advisory lock on 3';
+            PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 3);
+        END IF;
+    RETURN $1;
+    END;$$;
+
+	CREATE UNIQUE INDEX idx on tbl1(blurt_and_lock(a));
+
+	-- consume DDL
+	SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+}
+
+teardown
+{
+    DROP TABLE tbl1;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "controller"
+setup
+{
+    SET default_transaction_isolation = 'read committed';
+    SET application_name = 'isolation/insert-specconflict-controller';
+}
+step "controller_locks" {SELECT pg_advisory_lock(sess, lock), sess, lock FROM generate_series(1, 2) a(sess), generate_series(1,3) b(lock);}
+step "controller_unlock_1_1" { SELECT pg_advisory_unlock(1, 1); }
+step "controller_unlock_2_1" { SELECT pg_advisory_unlock(2, 1); }
+step "controller_unlock_1_2" { SELECT pg_advisory_unlock(1, 2); }
+step "controller_unlock_2_2" { SELECT pg_advisory_unlock(2, 2); }
+step "controller_unlock_1_3" { SELECT pg_advisory_unlock(1, 3); }
+step "controller_unlock_2_3" { SELECT pg_advisory_unlock(2, 3); }
+step "controller_show_count" { SELECT COUNT(*) FROM tbl1; }
+step "controller_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+
+session "s1"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 1;
+    SET application_name = 'isolation/insert-specconflict-s1';
+}
+
+step "s1_begin"  { BEGIN; }
+step "s1_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+step "s1_insert_other" { INSERT INTO tbl2 VALUES(1); }
+step "s1_commit"  { COMMIT; }
+
+session "s2"
+setup
+{
+	SET synchronous_commit=on;
+    SET default_transaction_isolation = 'read committed';
+    SET spec.session = 2;
+    SET application_name = 'isolation/insert-specconflict-s2';
+}
+step "s2_insert_toast" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ON CONFLICT DO NOTHING; }
+
+
+# Test logical decoding of speculative aborts for toast insertion followed by
+# insertion into a different table which doesn't have a toast.
+permutation
+   # acquire a number of locks, to control execution flow - the
+   # blurt_and_lock function acquires advisory locks that allow us to
+   # continue after a) the optimistic conflict probe b) after the
+   # insertion of the speculative tuple.
+   "controller_locks"
+   "controller_show_count"
+   "s1_begin"
+   "s1_insert_toast" "s2_insert_toast"
+   "controller_show_count"
+   # Switch both sessions to wait on the other lock next time (the speculative insertion)
+   "controller_unlock_1_1" "controller_unlock_2_1"
+   # Allow both sessions to continue
+   "controller_unlock_1_3" "controller_unlock_2_3"
+   "controller_show_count"
+   # Allow the second session to finish insertion
+   "controller_unlock_2_2"
+   # This should now show a successful insertion
+   "controller_show_count"
+   # Allow the first session to speculative abort
+   "controller_unlock_1_2"
+   # Insert into other table from s1 and commit
+   "s1_insert_other" "s1_commit"
+   # Get the changes
+   "controller_get_changes"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c2e5e3a..4985c2a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -800,19 +800,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5251932..e86eb8c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -397,6 +397,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -467,6 +470,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1677,8 +1681,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1720,6 +1724,32 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -1827,16 +1857,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -2640,6 +2662,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2747,6 +2770,7 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -3032,6 +3056,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 019bd38..676491d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -46,10 +46,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and abort
+ * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
+ * logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -62,6 +62,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1
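
The reorderbuffer.c change in the patch above can be illustrated with a toy replay loop. This is only a sketch under stated assumptions, not the PostgreSQL code: `ToyChangeType`, `ToyTxnState`, `toy_toast_reset` and `toy_replay` are hypothetical names, the toast hash is reduced to a counter, and the stashed `specinsert` record is reduced to a flag. It shows why a queued spec abort change lets the cleanup happen at the abort itself rather than depending on whatever change follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the ReorderBufferChangeType values. */
typedef enum ToyChangeType
{
	TOY_CHANGE_INSERT,			/* ordinary insert into a table without toast */
	TOY_CHANGE_TOAST_CHUNK,		/* insert into the toast relation */
	TOY_CHANGE_SPEC_INSERT,		/* speculative insert into the main table */
	TOY_CHANGE_SPEC_CONFIRM,	/* speculative insert was confirmed */
	TOY_CHANGE_SPEC_ABORT		/* speculative insert was super-deleted */
} ToyChangeType;

typedef struct ToyTxnState
{
	int			toast_chunks;	/* stands in for the per-transaction toast hash */
	bool		have_specinsert;	/* stands in for the stashed specinsert record */
	int			emitted;		/* rows handed to the output plugin */
} ToyTxnState;

/* Stand-in for ReorderBufferToastReset(): drop all collected chunks. */
static void
toy_toast_reset(ToyTxnState *st)
{
	st->toast_chunks = 0;
}

/* Replay a transaction's changes in order, as the commit-time loop does. */
static void
toy_replay(ToyTxnState *st, const ToyChangeType *changes, size_t nchanges)
{
	assert(st != NULL);

	for (size_t i = 0; i < nchanges; i++)
	{
		switch (changes[i])
		{
			case TOY_CHANGE_TOAST_CHUNK:
				st->toast_chunks++;
				break;
			case TOY_CHANGE_SPEC_INSERT:
				/* Hold the tuple back until we learn its fate. */
				st->have_specinsert = true;
				break;
			case TOY_CHANGE_SPEC_CONFIRM:
				if (st->have_specinsert)
				{
					st->emitted++;
					toy_toast_reset(st);
					st->have_specinsert = false;
				}
				break;
			case TOY_CHANGE_SPEC_ABORT:
				/*
				 * Only the first abort after a spec insert does any work;
				 * the later ones (for the toast entries) are no-ops.
				 */
				if (st->have_specinsert)
				{
					toy_toast_reset(st);
					st->have_specinsert = false;
				}
				break;
			case TOY_CHANGE_INSERT:
				st->emitted++;
				break;
		}
	}
}
```

Replaying two toast chunks, a speculative insert, three spec aborts and then a plain insert leaves `toast_chunks` at zero and no stashed record. Replaying the same sequence without the spec aborts leaves both behind, which is the situation the patch is meant to prevent.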

#47

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#46)

Re: Decoding speculative insert with toast leaks memory

On Fri, Jun 11, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 10, 2021 at 7:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Please find the patch for HEAD attached. Can you please prepare the
patch for back-branches by doing all the changes I have done in the
patch for HEAD?

Done

Thanks, the patch looks good to me. I'll push these early next week
(Tuesday) unless someone has any other comments or suggestions.

--
With Regards,
Amit Kapila.

#48

[1]: /messages/by-id/20210613073407.GA768908@rfd.leadboat.com

amit.kapila16@gmail.com

over 4 years ago

In reply to: Amit Kapila (#47)

Re: Decoding speculative insert with toast leaks memory

On Fri, Jun 11, 2021 at 7:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 11, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 10, 2021 at 7:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Please find the patch for HEAD attached. Can you please prepare the
patch for back-branches by doing all the changes I have done in the
patch for HEAD?

Done

Thanks, the patch looks good to me. I'll push these early next week
(Tuesday) unless someone has any other comments or suggestions.

I think the test in this patch is quite similar to the one Noah has
pointed out in the nearby thread [1] as failing intermittently. Can
you please verify this? If we can expect similar failures here, we
might want to consider dropping the test from this patch for now. We
can always come back to it once we find a good solution to make it
pass consistently.

--
With Regards,
Amit Kapila.

#49

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Kapila (#48)

Re: Decoding speculative insert with toast leaks memory

On Mon, Jun 14, 2021 at 8:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think the test in this patch is quite similar to the one Noah has
pointed out in the nearby thread [1] as failing intermittently. Can
you please verify this? If we can expect similar failures here, we
might want to consider dropping the test from this patch for now. We
can always come back to it once we find a good solution to make it
pass consistently.

test insert-conflict-do-nothing ... ok 646 ms
test insert-conflict-do-nothing-2 ... ok 1994 ms
test insert-conflict-do-update ... ok 1786 ms
test insert-conflict-do-update-2 ... ok 2689 ms
test insert-conflict-do-update-3 ... ok 851 ms
test insert-conflict-specconflict ... FAILED 3695 ms
test delete-abort-savept ... ok 1238 ms

Yeah, this is the same test that we used as the base for our test, so
let's not push this test until it becomes stable.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#50

dilipbalaut@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#49)

6 attachment(s)

Re: Decoding speculative insert with toast leaks memory

On Mon, Jun 14, 2021 at 9:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Jun 14, 2021 at 8:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think the test in this patch is quite similar to the one Noah has
pointed out in the nearby thread [1] as failing intermittently. Can
you please verify this? If we can expect similar failures here, we
might want to consider dropping the test from this patch for now. We
can always come back to it once we find a good solution to make it
pass consistently.

test insert-conflict-do-nothing ... ok 646 ms
test insert-conflict-do-nothing-2 ... ok 1994 ms
test insert-conflict-do-update ... ok 1786 ms
test insert-conflict-do-update-2 ... ok 2689 ms
test insert-conflict-do-update-3 ... ok 851 ms
test insert-conflict-specconflict ... FAILED 3695 ms
test delete-abort-savept ... ok 1238 ms

Yeah, this is the same test that we used as the base for our test, so
let's not push this test until it becomes stable.

Patches without test case.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v8-0001-96-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From e73d20545d7f1725dc424de3d9168269bc20ad33 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 11 Jun 2021 10:08:42 +0530
Subject: [PATCH v8] 96-Fix decoding of speculative aborts.

During decoding for speculative inserts, we were relying for cleaning
toast hash on confirmation records or next change records. But that
could lead to multiple problems (a) memory leak if there is neither a
confirmation record nor any other record after toast insertion for a
speculative insert in the transaction, (b) error and assertion failures
if the next operation is not an insert/update on the same table.

The fix is to start queuing spec abort change and clean up toast hash
and change record during its processing. Currently, we are queuing the
spec aborts for both toast and main table even though we perform cleanup
while processing the main table's spec abort record. Later, if we have a
way to distinguish between the spec abort record of toast and the main
table, we can avoid queuing the change for spec aborts of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 src/backend/replication/logical/decode.c        | 14 ++++----
 src/backend/replication/logical/reorderbuffer.c | 48 ++++++++++++++++++-------
 src/include/replication/reorderbuffer.h         | 11 +++---
 3 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 1300902..571a901 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -778,19 +778,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index f0de337..1cd0bbd 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -364,6 +364,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	/* check whether to put into the slab cache */
 	if (rb->nr_cached_transactions < max_cached_transactions)
 	{
@@ -449,6 +452,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			break;
 			/* no data in addition to the struct itself */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1674,8 +1678,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1717,6 +1721,32 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_MESSAGE:
 					rb->message(rb, txn, change->lsn, true,
 								change->data.msg.prefix,
@@ -1792,16 +1822,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -2476,6 +2498,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2764,6 +2787,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 			/* the base struct contains all the data, easy peasy */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index e085088..ddba4bd 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -44,10 +44,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and abort
+ * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
+ * logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -59,7 +59,8 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
-	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT
 };
 
 /*
-- 
1.8.3.1
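
The decode.c hunk in the patch above stops discarding super-deletion records and queues them as spec abort changes instead. A minimal sketch of that classification follows, assuming a made-up flag bit and hypothetical names (`TOY_DELETE_IS_SUPER`, `ToyAction`, `toy_classify_delete`); the real `XLH_DELETE_IS_SUPER` flag and `DecodeDelete()` live in the PostgreSQL sources and do considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit only; not taken from PostgreSQL's headers. */
#define TOY_DELETE_IS_SUPER 0x08

typedef enum ToyAction
{
	TOY_ACTION_SKIPPED,			/* record is dropped, nothing is queued */
	TOY_ACTION_DELETE,			/* queued as an ordinary delete */
	TOY_ACTION_SPEC_ABORT		/* queued as an internal spec abort change */
} ToyAction;

/*
 * Decide what a delete record turns into.  With patched == false this
 * mimics the old early return for super deletions; with patched == true
 * it mimics the new behavior of queuing a spec abort.
 */
static ToyAction
toy_classify_delete(uint8_t flags, bool patched)
{
	if (flags & TOY_DELETE_IS_SUPER)
		return patched ? TOY_ACTION_SPEC_ABORT : TOY_ACTION_SKIPPED;

	return TOY_ACTION_DELETE;
}
```

Calling `toy_classify_delete(TOY_DELETE_IS_SUPER, true)` yields `TOY_ACTION_SPEC_ABORT`, while the same flags with `patched` set to false yield `TOY_ACTION_SKIPPED`; ordinary deletes are unaffected either way.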

v8-0001-v12-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From f2b9240a860764934cf26f06ad4fd9b73a616a1f Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 11 Jun 2021 11:09:02 +0530
Subject: [PATCH v8] v12-Fix decoding of speculative aborts.

During decoding for speculative inserts, we were relying for cleaning
toast hash on confirmation records or next change records. But that
could lead to multiple problems (a) memory leak if there is neither a
confirmation record nor any other record after toast insertion for a
speculative insert in the transaction, (b) error and assertion failures
if the next operation is not an insert/update on the same table.

The fix is to start queuing spec abort change and clean up toast hash
and change record during its processing. Currently, we are queuing the
spec aborts for both toast and main table even though we perform cleanup
while processing the main table's spec abort record. Later, if we have a
way to distinguish between the spec abort record of toast and the main
table, we can avoid queuing the change for spec aborts of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 src/backend/replication/logical/decode.c        | 14 ++++----
 src/backend/replication/logical/reorderbuffer.c | 48 ++++++++++++++++++-------
 src/include/replication/reorderbuffer.h         |  9 ++---
 3 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index eef2f88..ff18861 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -803,19 +803,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 941ca9a..3a922b7 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -359,6 +359,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -426,6 +429,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1628,8 +1632,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1703,6 +1707,32 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 						break;
 					}
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_MESSAGE:
 					rb->message(rb, txn, change->lsn, true,
 								change->data.msg.prefix,
@@ -1778,16 +1808,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -2483,6 +2505,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2797,6 +2820,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 49df12a..1ced41f 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -44,10 +44,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and abort
+ * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
+ * logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -60,6 +60,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1
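
For readers skimming the decode.c hunk of this patch, the change in DecodeDelete can be pictured with the small sketch below. It is only an illustration, not the PostgreSQL code: the enum, the flag value and the function name are hypothetical stand-ins for the change action, XLH_DELETE_IS_SUPER and the assignment made inside DecodeDelete. The point is that a super-deletion is no longer dropped but queued under its own internal action.

```c
#include <assert.h>

/* Hypothetical stand-ins; the real definitions live in PostgreSQL headers. */
enum sketch_action
{
	SKETCH_DELETE,				/* ordinary delete, shown to output plugins */
	SKETCH_SPEC_ABORT			/* internal: a speculative insert was undone */
};

/* Assumed bit value, chosen only for this illustration. */
#define SKETCH_DELETE_IS_SUPER (1u << 0)

/*
 * Before the patch a super-deletion returned early and queued nothing.
 * After the patch every delete record is queued and only the action differs.
 */
static enum sketch_action
sketch_classify_delete(unsigned int flags)
{
	if (flags & SKETCH_DELETE_IS_SUPER)
		return SKETCH_SPEC_ABORT;
	return SKETCH_DELETE;
}
```

Calling sketch_classify_delete with the super flag set yields the abort action; any other delete stays an ordinary delete.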

v8-0001-v10-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From 070ddba30178a5e239b19be83b0e7828a502ab6c Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 11 Jun 2021 10:45:49 +0530
Subject: [PATCH v8] v10-Fix decoding of speculative aborts.

During decoding of speculative inserts, we relied on confirmation records
or the next change record to clean up the toast hash. That could lead to
multiple problems: (a) a memory leak if there is neither a confirmation
record nor any other record after the toast insertion for a speculative
insert in the transaction, and (b) errors and assertion failures if the
next operation is not an insert/update on the same table.

The fix is to start queuing a spec abort change and to clean up the toast
hash and the change record while processing it. Currently, we queue spec
aborts for both the toast and the main table, even though we perform the
cleanup only while processing the main table's spec abort record. Later,
if we have a way to distinguish between the spec abort records of the
toast and the main table, we can avoid queuing the change for spec aborts
of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 src/backend/replication/logical/decode.c        | 14 ++++----
 src/backend/replication/logical/reorderbuffer.c | 48 ++++++++++++++++++-------
 src/include/replication/reorderbuffer.h         | 11 +++---
 3 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index d7a6f5d..3778ea9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -778,19 +778,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 165ba8f..0963b23 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -353,6 +353,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -413,6 +416,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			break;
 			/* no data in addition to the struct itself */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1621,8 +1625,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1664,6 +1668,32 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_MESSAGE:
 					rb->message(rb, txn, change->lsn, true,
 								change->data.msg.prefix,
@@ -1739,16 +1769,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -2423,6 +2445,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2716,6 +2739,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 			/* the base struct contains all the data, easy peasy */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5530c4f..d455567 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -44,10 +44,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and abort
+ * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
+ * logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -59,7 +59,8 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
-	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT
 };
 
 /*
-- 
1.8.3.1
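
The reorderbuffer.c part of the patch is easier to follow as a small state machine. The sketch below is a simplified model under stated assumptions, not the actual ReorderBufferCommit loop: the type and function names are hypothetical, the toast hash is reduced to a chunk counter, and the pending specinsert change is reduced to a flag. It shows why the new abort case only acts while a speculative insert is pending, so the extra abort changes queued for toast rows become no-ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified change kinds; not the PostgreSQL enum. */
typedef enum
{
	SK_TOAST_CHUNK,				/* insert into the toast table: buffers a chunk */
	SK_SPEC_INSERT,				/* speculative insert into the main table */
	SK_SPEC_CONFIRM,			/* speculation succeeded */
	SK_SPEC_ABORT				/* super-deletion: speculation was undone */
} SkChange;

typedef struct
{
	int			toast_chunks;	/* stands in for the per-transaction toast hash */
	bool		spec_pending;	/* stands in for the specinsert pointer */
	int			emitted;		/* rows handed to the output plugin */
} SkTxn;

static void
sk_apply(SkTxn *txn, SkChange c)
{
	switch (c)
	{
		case SK_TOAST_CHUNK:
			txn->toast_chunks++;
			break;
		case SK_SPEC_INSERT:
			txn->spec_pending = true;
			break;
		case SK_SPEC_CONFIRM:
			if (txn->spec_pending)
			{
				txn->emitted++;			/* the row becomes visible */
				txn->toast_chunks = 0;	/* chunks were consumed */
				txn->spec_pending = false;
			}
			break;
		case SK_SPEC_ABORT:
			/* Aborts for toast rows arrive too; act only on the first one. */
			if (txn->spec_pending)
			{
				txn->toast_chunks = 0;	/* drop the orphaned chunks */
				txn->spec_pending = false;
			}
			break;
	}
}

/* Replay a change stream and return the final state of the transaction. */
static SkTxn
sk_replay(const SkChange *changes, size_t n)
{
	SkTxn		txn = {0, false, 0};

	for (size_t i = 0; i < n; i++)
		sk_apply(&txn, changes[i]);
	return txn;
}
```

Replaying two toast chunks, a speculative insert and then three aborts leaves no buffered chunks and no pending insert, which mirrors the Assert(!specinsert) that the patch places at the end of the loop.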

v8-0001-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From 912f84100e5ba0820b349ff791075209f35e2513 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 10 Jun 2021 18:29:31 +0530
Subject: [PATCH v8] Fix decoding of speculative aborts.

During decoding of speculative inserts, we relied on confirmation records
or the next change record to clean up the toast hash. That could lead to
multiple problems: (a) a memory leak if there is neither a confirmation
record nor any other record after the toast insertion for a speculative
insert in the transaction, and (b) errors and assertion failures if the
next operation is not an insert/update on the same table.

The fix is to start queuing a spec abort change and to clean up the toast
hash and the change record while processing it. Currently, we queue spec
aborts for both the toast and the main table, even though we perform the
cleanup only while processing the main table's spec abort record. Later,
if we have a way to distinguish between the spec abort records of the
toast and the main table, we can avoid queuing the change for spec aborts
of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 src/backend/replication/logical/decode.c        | 14 +++----
 src/backend/replication/logical/reorderbuffer.c | 49 +++++++++++++++++++------
 src/include/replication/reorderbuffer.h         |  9 +++--
 3 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7067016..453efc5 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -1040,19 +1040,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2d9e127..d905f32 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -443,6 +443,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -520,6 +523,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change,
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -2211,8 +2215,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -2254,6 +2258,32 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert, true);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -2360,16 +2390,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert, true);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -3754,6 +3776,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4017,6 +4040,7 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -4315,6 +4339,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 0c6e9d1..ba257d8 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -46,10 +46,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and abort
+ * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
+ * logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1
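
The patch also resets the toast hash in ReorderBufferReturnTXN, which acts as a backstop when a transaction is released. The sketch below illustrates that idea under the assumption that the reset is a no-op when no hash exists, so it can safely run both from the abort path and again at release time. All names here are hypothetical, and a plain heap allocation stands in for the hash table and its memory context.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the transaction struct; only the hash matters. */
typedef struct SketchTxn
{
	void	   *toast_hash;		/* NULL when nothing is buffered */
} SketchTxn;

/* Free the buffered toast data if present; calling it twice is harmless. */
static void
sketch_toast_reset(SketchTxn *txn)
{
	if (txn->toast_hash == NULL)
		return;
	free(txn->toast_hash);
	txn->toast_hash = NULL;
}

/* Release the transaction, resetting the toast data first as a backstop. */
static void
sketch_return_txn(SketchTxn *txn)
{
	sketch_toast_reset(txn);
	free(txn);
}
```

Because the reset checks for NULL first, a transaction whose toast data was already dropped while processing a spec abort can still go through sketch_return_txn without a double free.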

v8-0001-v11-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From 3b3433301386ac5f64af2647bd1e9cbac6cec784 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 11 Jun 2021 10:58:59 +0530
Subject: [PATCH v8] v11-Fix decoding of speculative aborts.

During decoding of speculative inserts, we relied on confirmation records
or the next change record to clean up the toast hash. That could lead to
multiple problems: (a) a memory leak if there is neither a confirmation
record nor any other record after the toast insertion for a speculative
insert in the transaction, and (b) errors and assertion failures if the
next operation is not an insert/update on the same table.

The fix is to start queuing a spec abort change and to clean up the toast
hash and the change record while processing it. Currently, we queue spec
aborts for both the toast and the main table, even though we perform the
cleanup only while processing the main table's spec abort record. Later,
if we have a way to distinguish between the spec abort records of the
toast and the main table, we can avoid queuing the change for spec aborts
of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 src/backend/replication/logical/decode.c        | 14 ++++----
 src/backend/replication/logical/reorderbuffer.c | 48 ++++++++++++++++++-------
 src/include/replication/reorderbuffer.h         |  9 ++---
 3 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index d7da3d6..676f921 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -804,19 +804,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 1a4b87c..244d203 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -351,6 +351,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -418,6 +421,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1620,8 +1624,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1695,6 +1699,32 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 						break;
 					}
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_MESSAGE:
 					rb->message(rb, txn, change->lsn, true,
 								change->data.msg.prefix,
@@ -1770,16 +1800,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -2475,6 +2497,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2778,6 +2801,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index c41f362..ffb5a94 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -44,10 +44,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and abort
+ * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
+ * logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -60,6 +60,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1

v8-0001-v13-Fix-decoding-of-speculative-aborts.patch (text/x-patch; charset=US-ASCII)

From 7b1fbfa9d2fe4365c5c2cfb54794ca34008343b5 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 11 Jun 2021 11:17:34 +0530
Subject: [PATCH v8] v13-Fix decoding of speculative aborts.

During decoding of speculative inserts, we relied on confirmation records
or the next change record to clean up the toast hash. That could lead to
multiple problems: (a) a memory leak if there is neither a confirmation
record nor any other record after the toast insertion for a speculative
insert in the transaction, and (b) errors and assertion failures if the
next operation is not an insert/update on the same table.

The fix is to start queuing a spec abort change and to clean up the toast
hash and the change record while processing it. Currently, we queue spec
aborts for both the toast and the main table, even though we perform the
cleanup only while processing the main table's spec abort record. Later,
if we have a way to distinguish between the spec abort records of the
toast and the main table, we can avoid queuing the change for spec aborts
of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
---
 src/backend/replication/logical/decode.c        | 14 +++----
 src/backend/replication/logical/reorderbuffer.c | 49 +++++++++++++++++++------
 src/include/replication/reorderbuffer.h         |  9 +++--
 3 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c2e5e3a..4985c2a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -800,19 +800,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5251932..e86eb8c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -397,6 +397,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	pfree(txn);
 }
 
@@ -467,6 +470,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1677,8 +1681,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 					/*
-					 * Either speculative insertion was confirmed, or it was
-					 * unsuccessful and the record isn't needed anymore.
+					 * If speculative insertion was confirmed, the record isn't
+					 * needed anymore.
 					 */
 					if (specinsert != NULL)
 					{
@@ -1720,6 +1724,32 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 					specinsert = change;
 					break;
 
+				case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+					/*
+					 * Abort for speculative insertion arrived. So cleanup the
+					 * specinsert tuple and toast hash.
+					 *
+					 * Note that we get the spec abort change for each toast
+					 * entry but we need to perform the cleanup only the first
+					 * time we get it for the main table.
+					 */
+					if (specinsert != NULL)
+					{
+						/*
+						 * We must clean the toast hash before processing a
+						 * completely new tuple to avoid confusion about the
+						 * previous tuple's toast chunks.
+						 */
+						Assert(change->data.tp.clear_toast_afterwards);
+						ReorderBufferToastReset(rb, txn);
+
+						/* We don't need this record anymore. */
+						ReorderBufferReturnChange(rb, specinsert);
+						specinsert = NULL;
+					}
+					break;
+
 				case REORDER_BUFFER_CHANGE_TRUNCATE:
 					{
 						int			i;
@@ -1827,16 +1857,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			}
 		}
 
-		/*
-		 * There's a speculative insertion remaining, just clean in up, it
-		 * can't have been successful, otherwise we'd gotten a confirmation
-		 * record.
-		 */
-		if (specinsert)
-		{
-			ReorderBufferReturnChange(rb, specinsert);
-			specinsert = NULL;
-		}
+		/* speculative insertion record must be freed by now */
+		Assert(!specinsert);
 
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
@@ -2640,6 +2662,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -2747,6 +2770,7 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			/* ReorderBufferChange contains everything important */
@@ -3032,6 +3056,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				break;
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 019bd38..676491d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -46,10 +46,10 @@ typedef struct ReorderBufferTupleBuf
  * changes. Users of the decoding facilities will never see changes with
  * *_INTERNAL_* actions.
  *
- * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM changes concern
- * "speculative insertions", and their confirmation respectively.  They're
- * used by INSERT .. ON CONFLICT .. UPDATE.  Users of logical decoding don't
- * have to care about these.
+ * The INTERNAL_SPEC_INSERT and INTERNAL_SPEC_CONFIRM, and INTERNAL_SPEC_ABORT
+ * changes concern "speculative insertions", their confirmation, and abort
+ * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
+ * logical decoding don't have to care about these.
  */
 enum ReorderBufferChangeType
 {
@@ -62,6 +62,7 @@ enum ReorderBufferChangeType
 	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
 	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+	REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
 	REORDER_BUFFER_CHANGE_TRUNCATE
 };
 
-- 
1.8.3.1
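
For readers following the patch above, the handling of a speculative insert can be sketched roughly as below. This is a simplified model, not the reorderbuffer code: `TxnState`, `apply_change`, `toast_reset` and the `CHANGE_*` names are hypothetical stand-ins, and toast data is modelled as a plain counter. It only illustrates the idea that an abort record, like a confirm record, releases both the pending speculative insert and any toast state collected for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified change types (not the real enum). */
typedef enum
{
	CHANGE_INSERT,
	CHANGE_SPEC_INSERT,
	CHANGE_SPEC_CONFIRM,
	CHANGE_SPEC_ABORT
} ChangeType;

/* Hypothetical stand-in for the per-transaction decoding state. */
typedef struct
{
	bool		have_specinsert;	/* is a speculative insert pending? */
	int			toast_chunks;		/* toast chunks collected so far */
	int			emitted;			/* rows handed to the output plugin */
} TxnState;

/* Drop all toast data collected for the current tuple. */
static void
toast_reset(TxnState *txn)
{
	txn->toast_chunks = 0;
}

static void
apply_change(TxnState *txn, ChangeType type, int nchunks)
{
	switch (type)
	{
		case CHANGE_INSERT:
			/* ordinary insert: emit it and forget its toast chunks */
			txn->emitted++;
			toast_reset(txn);
			break;

		case CHANGE_SPEC_INSERT:
			/* remember the tuple, but do not emit it yet */
			txn->have_specinsert = true;
			txn->toast_chunks += nchunks;
			break;

		case CHANGE_SPEC_CONFIRM:
			/* confirmed: emit the pending tuple, then clean up */
			if (txn->have_specinsert)
			{
				txn->emitted++;
				txn->have_specinsert = false;
				toast_reset(txn);
			}
			break;

		case CHANGE_SPEC_ABORT:
			/* aborted: nothing is emitted, but the state is still freed */
			if (txn->have_specinsert)
			{
				txn->have_specinsert = false;
				toast_reset(txn);
			}
			break;
	}
}
```

In this model a repeated abort is a no-op because the pending flag has already been cleared, which mirrors the patch comment about performing the cleanup only the first time the abort is seen.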

#51

amit.kapila16@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#50)

Re: Decoding speculative insert with toast leaks memory

On Mon, Jun 14, 2021 at 12:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Jun 14, 2021 at 9:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Jun 14, 2021 at 8:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think the test in this patch is quite similar to the one Noah has
pointed out in the nearby thread [1] as failing intermittently. Can
you please verify that as well? If we can expect similar failures
here, then we might want to consider dropping the test in this patch
for now. We can always come back to it once we find a good solution
to make it pass consistently.

test insert-conflict-do-nothing ... ok 646 ms
test insert-conflict-do-nothing-2 ... ok 1994 ms
test insert-conflict-do-update ... ok 1786 ms
test insert-conflict-do-update-2 ... ok 2689 ms
test insert-conflict-do-update-3 ... ok 851 ms
test insert-conflict-specconflict ... FAILED 3695 ms
test delete-abort-savept ... ok 1238 ms

Yeah, this is the same test that we have used as the base for our
test, so let's not push this test until it becomes stable.

Patches without test case.

Pushed!

--
With Regards,
Amit Kapila.

#52

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Amit Kapila (#51)

Re: Decoding speculative insert with toast leaks memory

Amit Kapila <amit.kapila16@gmail.com> writes:

Pushed!

skink reports that this has valgrind issues:

2021-06-16 01:20:13.344 UTC [2198271][4/0:0] LOG: received replication command: IDENTIFY_SYSTEM
2021-06-16 01:20:13.384 UTC [2198271][4/0:0] LOG: received replication command: START_REPLICATION SLOT "sub2" LOGICAL 0/0 (proto_version '1', publication_names '"pub2"')
2021-06-16 01:20:13.454 UTC [2198271][4/0:0] LOG: starting logical decoding for slot "sub2"
2021-06-16 01:20:13.454 UTC [2198271][4/0:0] DETAIL: Streaming transactions committing after 0/157C828, reading WAL from 0/157C7F0.
2021-06-16 01:20:13.488 UTC [2198271][4/0:0] LOG: logical decoding found consistent point at 0/157C7F0
2021-06-16 01:20:13.488 UTC [2198271][4/0:0] DETAIL: There are no running transactions.
...
==2198271== VALGRINDERROR-BEGIN
==2198271== Conditional jump or move depends on uninitialised value(s)
==2198271== at 0x80EF890: rel_sync_cache_relation_cb (pgoutput.c:833)
==2198271== by 0x666AEB: LocalExecuteInvalidationMessage (inval.c:595)
==2198271== by 0x53A423: ReceiveSharedInvalidMessages (sinval.c:90)
==2198271== by 0x666026: AcceptInvalidationMessages (inval.c:683)
==2198271== by 0x53FBDD: LockRelationOid (lmgr.c:136)
==2198271== by 0x1D3943: relation_open (relation.c:56)
==2198271== by 0x26F21F: table_open (table.c:43)
==2198271== by 0x66D97F: ScanPgRelation (relcache.c:346)
==2198271== by 0x674644: RelationBuildDesc (relcache.c:1059)
==2198271== by 0x674BE8: RelationClearRelation (relcache.c:2568)
==2198271== by 0x675064: RelationFlushRelation (relcache.c:2736)
==2198271== by 0x6750A6: RelationCacheInvalidateEntry (relcache.c:2797)
==2198271== Uninitialised value was created by a heap allocation
==2198271== at 0x6AC308: MemoryContextAlloc (mcxt.c:826)
==2198271== by 0x68A8D9: DynaHashAlloc (dynahash.c:283)
==2198271== by 0x68A94B: element_alloc (dynahash.c:1675)
==2198271== by 0x68AA58: get_hash_entry (dynahash.c:1284)
==2198271== by 0x68B23E: hash_search_with_hash_value (dynahash.c:1057)
==2198271== by 0x68B3D4: hash_search (dynahash.c:913)
==2198271== by 0x80EE855: get_rel_sync_entry (pgoutput.c:681)
==2198271== by 0x80EEDA5: pgoutput_truncate (pgoutput.c:530)
==2198271== by 0x4E48A2: truncate_cb_wrapper (logical.c:797)
==2198271== by 0x4EFDDB: ReorderBufferCommit (reorderbuffer.c:1777)
==2198271== by 0x4E1DBE: DecodeCommit (decode.c:637)
==2198271== by 0x4E1F31: DecodeXactOp (decode.c:245)
==2198271==
==2198271== VALGRINDERROR-END

regards, tom lane

#53

amit.kapila16@gmail.com

over 4 years ago

In reply to: Tom Lane (#52)

Re: Decoding speculative insert with toast leaks memory

On Wed, Jun 16, 2021 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

Pushed!

skink reports that this has valgrind issues:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-06-15%2020%3A49%3A26

The problem happens at line:
rel_sync_cache_relation_cb()
{
..
if (entry->map)
..

I think the reason is that an invalidation is processed before we
initialize 'entry->map' in get_rel_sync_entry(); when the callback
then tries to clean up the entry, it accesses an uninitialized
value. Note that this won't happen in HEAD, as we initialize
'entry->map' before we get to process any invalidation. We fixed a
similar issue in HEAD some time back as part of commit 69bd60672a, so
we need to make a similar change in PG-13 as well.

This problem was introduced by commit d250568121 (Fix memory leak due
to RelationSyncEntry.map.), not by the patch in this thread, so I am
keeping Amit L and Osumi-San in the loop.
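
The ordering problem described above can be illustrated with a small sketch. It is a toy model under stated assumptions: `SyncEntry`, `get_entry`, `expensive_lookup` and `invalidation_cb` are hypothetical names, a single static slot stands in for the hash table, and the `memset` only imitates the garbage a freshly allocated entry may contain. The point is that every field the callback reads is set before any step that might run the callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down cache entry. */
typedef struct
{
	bool		in_use;
	bool		schema_sent;
	bool		replicate_valid;
	void	   *map;
} SyncEntry;

static SyncEntry cache_slot;	/* one-entry stand-in for the hash table */
static int	maps_freed = 0;

/* Invalidation callback: resets the entry and frees its map, if any. */
static void
invalidation_cb(void)
{
	if (!cache_slot.in_use)
		return;
	cache_slot.schema_sent = false;
	if (cache_slot.map != NULL)	/* would read garbage if never set */
	{
		free(cache_slot.map);
		cache_slot.map = NULL;
		maps_freed++;
	}
}

/* Stand-in for catalog access, during which invalidations may be processed. */
static void
expensive_lookup(void)
{
	invalidation_cb();
}

static SyncEntry *
get_entry(void)
{
	SyncEntry  *entry = &cache_slot;
	bool		found = entry->in_use;

	if (!found)
	{
		/* imitate a new allocation whose contents are garbage */
		memset(entry, 0x7F, sizeof(*entry));
		entry->in_use = true;

		/* make the entry valid enough for the callback right away */
		entry->schema_sent = false;
		entry->replicate_valid = false;
		entry->map = NULL;
	}

	if (!entry->replicate_valid)
	{
		expensive_lookup();		/* the callback may fire here */
		entry->replicate_valid = true;
	}
	return entry;
}
```

If the `entry->map = NULL` assignment were moved to after `expensive_lookup()`, the callback in this model would test a garbage pointer, which is the kind of access valgrind flagged.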

--
With Regards,
Amit Kapila.

#54

Amit Langote

amitlangote09@gmail.com

over 4 years ago

In reply to: Amit Kapila (#53)

Re: Decoding speculative insert with toast leaks memory

On Thu, Jun 17, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 16, 2021 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

Pushed!

skink reports that this has valgrind issues:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-06-15%2020%3A49%3A26

The problem happens at line:
rel_sync_cache_relation_cb()
{
..
if (entry->map)
..

I think the reason is that an invalidation is processed before we
initialize 'entry->map' in get_rel_sync_entry(); when the callback
then tries to clean up the entry, it accesses an uninitialized
value. Note that this won't happen in HEAD, as we initialize
'entry->map' before we get to process any invalidation. We fixed a
similar issue in HEAD some time back as part of commit 69bd60672a, so
we need to make a similar change in PG-13 as well.

This problem was introduced by commit d250568121 (Fix memory leak due
to RelationSyncEntry.map.), not by the patch in this thread, so I am
keeping Amit L and Osumi-San in the loop.

Thanks.

Maybe not sufficient as a fix, but I wonder if
rel_sync_cache_relation_cb() should really also check that
replicate_valid is true in the following condition:

/*
* Reset schema sent status as the relation definition may have changed.
* Also free any objects that depended on the earlier definition.
*/
if (entry != NULL)
{

If the problem is with HEAD, I don't quite understand why the last
statement of the following code block doesn't suffice:

/* Not found means schema wasn't sent */
if (!found)
{
/* immediately make a new entry valid enough to satisfy callbacks */
entry->schema_sent = false;
entry->streamed_txns = NIL;
entry->replicate_valid = false;
entry->pubactions.pubinsert = entry->pubactions.pubupdate =
entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false;
entry->publish_as_relid = InvalidOid;
entry->map = NULL; /* will be set by maybe_send_schema() if needed */
}

Do we need the same statement at the end of the following block?

/* Validate the entry */
if (!entry->replicate_valid)
{

--
Amit Langote
EDB: http://www.enterprisedb.com

#55

amit.kapila16@gmail.com

over 4 years ago

In reply to: Amit Langote (#54)

Re: Decoding speculative insert with toast leaks memory

On Thu, Jun 17, 2021 at 10:39 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Thu, Jun 17, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 16, 2021 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

Pushed!

skink reports that this has valgrind issues:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-06-15%2020%3A49%3A26

The problem happens at line:
rel_sync_cache_relation_cb()
{
..
if (entry->map)
..

I think the reason is that an invalidation is processed before we
initialize 'entry->map' in get_rel_sync_entry(); when the callback
then tries to clean up the entry, it accesses an uninitialized
value. Note that this won't happen in HEAD, as we initialize
'entry->map' before we get to process any invalidation. We fixed a
similar issue in HEAD some time back as part of commit 69bd60672a, so
we need to make a similar change in PG-13 as well.

This problem was introduced by commit d250568121 (Fix memory leak due
to RelationSyncEntry.map.), not by the patch in this thread, so I am
keeping Amit L and Osumi-San in the loop.

Thanks.

Maybe not sufficient as a fix, but I wonder if
rel_sync_cache_relation_cb() should really also check that
replicate_valid is true in the following condition:

I don't think that is required because we initialize the entry in the
"if (!found)" case in HEAD.

/*
* Reset schema sent status as the relation definition may have changed.
* Also free any objects that depended on the earlier definition.
*/
if (entry != NULL)
{

If the problem is with HEAD,

The problem occurs only in PG-13. So, we need to make the PG-13 code
similar to HEAD as far as initialization of the entry is concerned.

--
With Regards,
Amit Kapila.

#56

Amit Langote

amitlangote09@gmail.com

over 4 years ago

In reply to: Amit Kapila (#55)

1 attachment(s)

Re: Decoding speculative insert with toast leaks memory

On Thu, Jun 17, 2021 at 3:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 17, 2021 at 10:39 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Thu, Jun 17, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 16, 2021 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

Pushed!

skink reports that this has valgrind issues:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-06-15%2020%3A49%3A26

The problem happens at line:
rel_sync_cache_relation_cb()
{
..
if (entry->map)
..

I think the reason is that an invalidation is processed before we
initialize 'entry->map' in get_rel_sync_entry(); when the callback
then tries to clean up the entry, it accesses an uninitialized
value. Note that this won't happen in HEAD, as we initialize
'entry->map' before we get to process any invalidation. We fixed a
similar issue in HEAD some time back as part of commit 69bd60672a, so
we need to make a similar change in PG-13 as well.

This problem was introduced by commit d250568121 (Fix memory leak due
to RelationSyncEntry.map.), not by the patch in this thread, so I am
keeping Amit L and Osumi-San in the loop.

Thanks.

Maybe not sufficient as a fix, but I wonder if
rel_sync_cache_relation_cb() should really also check that
replicate_valid is true in the following condition:

I don't think that is required because we initialize the entry in the
"if (!found)" case in HEAD.

Yeah, I see that. If we can be sure that the callback can't get
called between hash_search() allocating the entry and the above code
block making the entry look valid, which appears to be the case, then
I guess we don't need to worry.

/*
* Reset schema sent status as the relation definition may have changed.
* Also free any objects that depended on the earlier definition.
*/
if (entry != NULL)
{

If the problem is with HEAD,

The problem occurs only in PG-13. So, we need to make the PG-13 code
similar to HEAD as far as initialization of the entry is concerned.

Oh I missed that the problem report is for the PG13 branch.

How about the attached patch then?

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

pg13-init-RelationSyncEntry-properly.patch

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 5b33b31515..30cb4e03aa 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -685,7 +685,22 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 	Assert(entry != NULL);
 
 	/* Not found means schema wasn't sent */
-	if (!found || !entry->replicate_valid)
+	if (!found)
+	{
+		/*
+		 * Make the new entry valid enough for the callbacks to look at, in
+		 * case any of them get invoked during the more complicated
+		 * initialization steps below.
+		 */
+		entry->schema_sent = false;
+		entry->replicate_valid = false;
+		entry->pubactions.pubinsert = entry->pubactions.pubupdate =
+			entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false;
+		entry->publish_as_relid = InvalidOid;
+		entry->map = NULL;	/* will be set by maybe_send_schema() if needed */
+	}
+
+	if (!entry->replicate_valid)
 	{
 		List	   *pubids = GetRelationPublications(relid);
 		ListCell   *lc;
@@ -782,13 +797,9 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 		list_free(pubids);
 
 		entry->publish_as_relid = publish_as_relid;
-		entry->map = NULL;	/* will be set by maybe_send_schema() if needed */
 		entry->replicate_valid = true;
 	}
 
-	if (!found)
-		entry->schema_sent = false;
-
 	return entry;
 }

#57

dilipbalaut@gmail.com

over 4 years ago

In reply to: Amit Langote (#56)

Re: Decoding speculative insert with toast leaks memory

On Thu, Jun 17, 2021 at 12:52 PM Amit Langote <amitlangote09@gmail.com> wrote:

Oh I missed that the problem report is for the PG13 branch.

How about the attached patch then?

Looks good, one minor comment, how about making the below comment,
same as on the head?

- if (!found || !entry->replicate_valid)
+ if (!found)
+ {
+ /*
+ * Make the new entry valid enough for the callbacks to look at, in
+ * case any of them get invoked during the more complicated
+ * initialization steps below.
+ */

On head:
if (!found)
{
/* immediately make a new entry valid enough to satisfy callbacks */

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#58

Amit Langote

amitlangote09@gmail.com

over 4 years ago

In reply to: Dilip Kumar (#57)

2 attachment(s)

Re: Decoding speculative insert with toast leaks memory

Hi Dilip,

On Thu, Jun 17, 2021 at 4:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 17, 2021 at 12:52 PM Amit Langote <amitlangote09@gmail.com> wrote:

Oh I missed that the problem report is for the PG13 branch.

How about the attached patch then?

Looks good,

Thanks for checking.

one minor comment, how about making the below comment,
same as on the head?
- if (!found || !entry->replicate_valid)
+ if (!found)
+ {
+ /*
+ * Make the new entry valid enough for the callbacks to look at, in
+ * case any of them get invoked during the more complicated
+ * initialization steps below.
+ */
On head:
if (!found)
{
/* immediately make a new entry valid enough to satisfy callbacks */

Agree it's better to have the same comment in both branches.

Though, I think it should be "the new entry", not "a new entry". I
find the sentence I wrote a bit more enlightening, but I am fine with
just fixing the aforementioned problem with the existing comment.

I've updated the patch. Also, attaching a patch for HEAD for the
s/a/the change. While at it, I also capitalized "immediately".

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

pg13-init-RelationSyncEntry-properly_v2.patch

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 5b33b31515..da82cafa72 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -685,7 +685,20 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 	Assert(entry != NULL);
 
 	/* Not found means schema wasn't sent */
-	if (!found || !entry->replicate_valid)
+	if (!found)
+	{
+		/*
+		 * Immediately make the new entry valid enough to satisfy callbacks.
+		 */
+		entry->schema_sent = false;
+		entry->replicate_valid = false;
+		entry->pubactions.pubinsert = entry->pubactions.pubupdate =
+			entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false;
+		entry->publish_as_relid = InvalidOid;
+		entry->map = NULL;	/* will be set by maybe_send_schema() if needed */
+	}
+
+	if (!entry->replicate_valid)
 	{
 		List	   *pubids = GetRelationPublications(relid);
 		ListCell   *lc;
@@ -782,13 +795,9 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 		list_free(pubids);
 
 		entry->publish_as_relid = publish_as_relid;
-		entry->map = NULL;	/* will be set by maybe_send_schema() if needed */
 		entry->replicate_valid = true;
 	}
 
-	if (!found)
-		entry->schema_sent = false;
-
 	return entry;
 }

HEAD-fix-get_rel_sync_entry-comment.patch

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 63f108f960..3117308fb7 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1024,7 +1024,9 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 	/* Not found means schema wasn't sent */
 	if (!found)
 	{
-		/* immediately make a new entry valid enough to satisfy callbacks */
+		/*
+		 * Immediately make the new entry valid enough to satisfy callbacks.
+		 */
 		entry->schema_sent = false;
 		entry->streamed_txns = NIL;
 		entry->replicate_valid = false;

#59

amit.kapila16@gmail.com

over 4 years ago

In reply to: Amit Langote (#58)

Re: Decoding speculative insert with toast leaks memory

On Thu, Jun 17, 2021 at 1:35 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Dilip,

On Thu, Jun 17, 2021 at 4:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 17, 2021 at 12:52 PM Amit Langote <amitlangote09@gmail.com> wrote:

Oh I missed that the problem report is for the PG13 branch.

How about the attached patch then?

Looks good,

Thanks for checking.

one minor comment, how about making the below comment,
same as on the head?
- if (!found || !entry->replicate_valid)
+ if (!found)
+ {
+ /*
+ * Make the new entry valid enough for the callbacks to look at, in
+ * case any of them get invoked during the more complicated
+ * initialization steps below.
+ */
On head:
if (!found)
{
/* immediately make a new entry valid enough to satisfy callbacks */

Agree it's better to have the same comment in both branches.

Though, I think it should be "the new entry", not "a new entry". I
find the sentence I wrote a bit more enlightening, but I am fine with
just fixing the aforementioned problem with the existing comment.

I've updated the patch. Also, attaching a patch for HEAD for the
s/a/the change. While at it, I also capitalized "immediately".

Your patch looks good to me as well. I would like to retain the
comment as it is from master for now. I'll do some testing and push it
tomorrow unless there are additional comments.

--
With Regards,
Amit Kapila.

#60

amit.kapila16@gmail.com

over 4 years ago

In reply to: Amit Kapila (#59)

Re: Decoding speculative insert with toast leaks memory

On Thu, Jun 17, 2021 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Your patch looks good to me as well. I would like to retain the
comment as it is from master for now. I'll do some testing and push it
tomorrow unless there are additional comments.

Pushed!

--
With Regards,
Amit Kapila.

#61

tomas.vondra@enterprisedb.com

over 4 years ago

In reply to: Amit Kapila (#60)

Re: Decoding speculative insert with toast leaks memory

Hi,

On 6/18/21 5:50 AM, Amit Kapila wrote:

On Thu, Jun 17, 2021 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Your patch looks good to me as well. I would like to retain the
comment as it is from master for now. I'll do some testing and push it
tomorrow unless there are additional comments.

Pushed!

While rebasing a patch broken by 4daa140a2f5, I noticed that the patch
does this:

@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
        REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
        REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
        REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+       REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
        REORDER_BUFFER_CHANGE_TRUNCATE
 };

I understand adding the ABORT right after CONFIRM

Isn't that an undesirable ABI break for extensions? It changes the value
of the TRUNCATE item, so if an extension refers to it somehow,
it'd suddenly start failing (until it gets rebuilt). And the failures
would be pretty confusing and would seemingly contradict the code.

FWIW I don't know how likely it is for an extension to depend on the
TRUNCATE value (it'd be far worse for INSERT/UPDATE/DELETE), but it
seems that moving the new element to the end would solve this.
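
The ABI concern can be seen in a small sketch. The enums below are shortened, hypothetical versions of ReorderBufferChangeType, with prefixes added only so that all three layouts can coexist in one file. The sketch shows that a member inserted in the middle shifts the value of every later member, while a member appended at the end leaves the existing values alone.

```c
#include <assert.h>

/* Layout an already-built extension was compiled against. */
enum ChangeTypeOld
{
	OLD_SPEC_INSERT,
	OLD_SPEC_CONFIRM,
	OLD_TRUNCATE
};

/* New member inserted in the middle: TRUNCATE moves from 2 to 3. */
enum ChangeTypeMiddle
{
	MID_SPEC_INSERT,
	MID_SPEC_CONFIRM,
	MID_SPEC_ABORT,
	MID_TRUNCATE
};

/* New member appended at the end: TRUNCATE keeps the value 2. */
enum ChangeTypeAppend
{
	APP_SPEC_INSERT,
	APP_SPEC_CONFIRM,
	APP_TRUNCATE,
	APP_SPEC_ABORT
};

/* Returns 1 if an old binary's idea of TRUNCATE still matches. */
static int
truncate_stable_with_middle_insert(void)
{
	return (int) OLD_TRUNCATE == (int) MID_TRUNCATE;
}

static int
truncate_stable_with_append(void)
{
	return (int) OLD_TRUNCATE == (int) APP_TRUNCATE;
}
```

An extension built against the old layout has the old integer compiled in, so with the middle-insert layout it would compare against the wrong member until it is rebuilt.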

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#62

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Tomas Vondra (#61)

Re: Decoding speculative insert with toast leaks memory

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

While rebasing a patch broken by 4daa140a2f5, I noticed that the patch
does this:

@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
        REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
        REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
        REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+       REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
        REORDER_BUFFER_CHANGE_TRUNCATE
 };

Isn't that an undesirable ABI break for extensions?

I think it's OK in HEAD. I agree we shouldn't do it like that
in the back branches.

regards, tom lane

#63

amit.kapila16@gmail.com

over 4 years ago

In reply to: Tom Lane (#62)

Re: Decoding speculative insert with toast leaks memory

On Wed, Jun 23, 2021 at 8:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

While rebasing a patch broken by 4daa140a2f5, I noticed that the patch
does this:

@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
        REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
        REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
        REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+       REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
        REORDER_BUFFER_CHANGE_TRUNCATE
 };

Isn't that an undesirable ABI break for extensions?

I think it's OK in HEAD. I agree we shouldn't do it like that
in the back branches.

Okay, I'll change this in back branches and HEAD to keep the code
consistent, or do you think it is better to retain the order in HEAD
as it is and just change it for back-branches?

--
With Regards,
Amit Kapila.

#64

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Amit Kapila (#63)

Re: Decoding speculative insert with toast leaks memory

Amit Kapila <amit.kapila16@gmail.com> writes:

I think it's OK in HEAD. I agree we shouldn't do it like that
in the back branches.

Okay, I'll change this in back branches and HEAD to keep the code
consistent, or do you think it is better to retain the order in HEAD
as it is and just change it for back-branches?

As I said, I'd keep the natural ordering in HEAD.

regards, tom lane

#65

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Tom Lane (#64)

Re: Decoding speculative insert with toast leaks memory

On Thu, Jun 24, 2021 at 12:25:15AM -0400, Tom Lane wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

Okay, I'll change this in back branches and HEAD to keep the code
consistent, or do you think it is better to retain the order in HEAD
as it is and just change it for back-branches?

As I said, I'd keep the natural ordering in HEAD.

Yes, please keep the items in alphabetical order on HEAD, and just
have the new item at the bottom of the enum in the back-branches.
That's the usual practice.
--
Michael

#66