[BUG] failed assertion in EnsurePortalSnapshotExists()
Hi hackers,
I recently observed a failed assertion in EnsurePortalSnapshotExists().
The steps to reproduce the issue on the master branch are:
create table bdt (a int primary key);
insert into bdt values (1),(2);
create table bdt2 (a int);
insert into bdt2 values (1);
Then launching:
DO $$
BEGIN
FOR i IN 1..2 LOOP
BEGIN
INSERT INTO bdt (a) VALUES (i);
exception when unique_violation then update bdt2 set a = i;
COMMIT;
END;
END LOOP;
END;
$$;
Would produce:
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost
Due to:
#2 0x0000000000b2ffcb in ExceptionalCondition (conditionName=0xd0d598
"portal->portalSnapshot == NULL", errorType=0xd0d0b3 "FailedAssertion",
fileName=0xd0d174 "pquery.c", lineNumber=1785) at assert.c:69
#3 0x000000000099e666 in EnsurePortalSnapshotExists () at pquery.c:1785
From what i have seen, we end up having ActiveSnapshot set to NULL in
AtSubAbort_Snapshot() (while we still have ActivePortal->portalSnapshot
not being NULL and not set to NULL later on).
That leads to ActiveSnapshotSet() not being true in the next call to
EnsurePortalSnapshotExists() and leads to the failed assertion (checking
that ActivePortal->portalSnapshot is NULL) later on in the code.
Based on this, i have created the attached patch (which fixes the issue
mentioned in the repro) but I have the feeling that I may have missed
something more important here (that would not be addressed with the
attached patch), thoughts?
Thanks
Bertrand
Attachments:
v1-0001-EnsurePortalSnapshotExists-failed-assertion.patchtext/plain; charset=UTF-8; name=v1-0001-EnsurePortalSnapshotExists-failed-assertion.patchDownload+5-0
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
I recently observed a failed assertion in EnsurePortalSnapshotExists().
Hmm, interesting. If I take out the "update bdt2" step, so that the
exception clause is just COMMIT, then I get something different:
ERROR: portal snapshots (1) did not account for all active snapshots (0)
CONTEXT: PL/pgSQL function inline_code_block line 8 at COMMIT
I think perhaps plpgsql's exception handling needs a bit of adjustment,
but not sure what yet.
regards, tom lane
Hi,
On 9/27/21 9:44 PM, Tom Lane wrote:
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
I recently observed a failed assertion in EnsurePortalSnapshotExists().
Hmm, interesting.
Thanks for looking at it!
If I take out the "update bdt2" step, so that the
exception clause is just COMMIT, then I get something different:ERROR: portal snapshots (1) did not account for all active snapshots (0)
CONTEXT: PL/pgSQL function inline_code_block line 8 at COMMIT
FWIW, I just gave it a try and it looks like this is also "fixed" by the
proposed patch.
Does it make sense (as it is currently) to set the ActiveSnapshot to
NULL and not ensuring the same is done for ActivePortal->portalSnapshot?
Or does it mean we should not reach a state where we set ActiveSnapshot
to NULL while ActivePortal->portalSnapshot is not already NULL?
Thanks
Bertrand
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
Does it make sense (as it is currently) to set the ActiveSnapshot to
NULL and not ensuring the same is done for ActivePortal->portalSnapshot?
I think that patch is just a kluge :-(
After tracing through this I've figured out what is happening, and
why you need to put the exception block inside a loop to make it
happen. The first iteration goes fine, and we end up with an empty
ActiveSnapshot stack (and no portalSnapshots) because that's how
the COMMIT leaves it. In the second iteration, we try to
re-establish the portal snapshot, but at the point where
EnsurePortalSnapshotExists is called (from the first INSERT
command) we are already inside a subtransaction thanks to the
plpgsql exception block. This means that the stacked ActiveSnapshot
has as_level = 2, although the Portal owning it belongs to the
outer transaction. So at the next exception, AtSubAbort_Snapshot
zaps the stacked ActiveSnapshot, but the Portal stays alive and
now it has a dangling portalSnapshot pointer.
Basically there seem to be two ways to fix this, both requiring
API additions to snapmgr.c (hence, cc'ing Alvaro for opinion):
1. Provide a variant of PushActiveSnapshot that allows the caller
to specify the as_level to use, and then have
EnsurePortalSnapshotExists, as well as other places that create
Portal-associated snapshots, use this with as_level equal to the
Portal's createSubid. The main hazard I see here is that this could
theoretically allow the ActiveSnapshot stack to end up with
out-of-order as_level values, causing issues. For the moment we
could probably just add assertions to check that the passed as_level
is >= next level, or maybe even that this variant is only called with
empty ActiveSnapshot stack.
2. Provide a way for AtSubAbort_Portals to detect whether a
portalSnapshot pointer points to a snap that's going to be
deleted by AtSubAbort_Snapshot, and then just have it clear any
portalSnapshots that are about to become dangling. (This'd amount
to accepting the possibility that portalSnapshot is of a different
subxact level from the portal, and dealing with the situation.)
I initially thought #2 would be the way to go, but it turns out
to be a bit messy since what we have is a Snapshot pointer not an
ActiveSnapshotElt pointer. We'd have to do something like search the
ActiveSnapshot stack looking for pointer equality to the caller's
Snapshot pointer, which seems fragile --- do we assume as_snap is
unique for any other purpose?
That being the case, I'm now leaning to #1. Thoughts?
regards, tom lane
Hi,
On 9/28/21 6:50 PM, Tom Lane wrote:
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
Does it make sense (as it is currently) to set the ActiveSnapshot to
NULL and not ensuring the same is done for ActivePortal->portalSnapshot?I think that patch is just a kluge :-(
right, sounded too simple to be "true".
After tracing through this I've figured out what is happening, and
why you need to put the exception block inside a loop to make it
happen. The first iteration goes fine, and we end up with an empty
ActiveSnapshot stack (and no portalSnapshots) because that's how
the COMMIT leaves it. In the second iteration, we try to
re-establish the portal snapshot, but at the point where
EnsurePortalSnapshotExists is called (from the first INSERT
command) we are already inside a subtransaction thanks to the
plpgsql exception block. This means that the stacked ActiveSnapshot
has as_level = 2, although the Portal owning it belongs to the
outer transaction. So at the next exception, AtSubAbort_Snapshot
zaps the stacked ActiveSnapshot, but the Portal stays alive and
now it has a dangling portalSnapshot pointer.
Thanks for the explanation!
Basically there seem to be two ways to fix this, both requiring
API additions to snapmgr.c (hence, cc'ing Alvaro for opinion):1. Provide a variant of PushActiveSnapshot that allows the caller
to specify the as_level to use, and then have
EnsurePortalSnapshotExists, as well as other places that create
Portal-associated snapshots, use this with as_level equal to the
Portal's createSubid. The main hazard I see here is that this could
theoretically allow the ActiveSnapshot stack to end up with
out-of-order as_level values, causing issues. For the moment we
could probably just add assertions to check that the passed as_level
is >= next level, or maybe even that this variant is only called with
empty ActiveSnapshot stack.
I think we may get a non empty ActiveSnapshot stack with prepared
statements, so tempted to do the assertion on the levels.
2. Provide a way for AtSubAbort_Portals to detect whether a
portalSnapshot pointer points to a snap that's going to be
deleted by AtSubAbort_Snapshot, and then just have it clear any
portalSnapshots that are about to become dangling. (This'd amount
to accepting the possibility that portalSnapshot is of a different
subxact level from the portal, and dealing with the situation.)I initially thought #2 would be the way to go, but it turns out
to be a bit messy since what we have is a Snapshot pointer not an
ActiveSnapshotElt pointer. We'd have to do something like search the
ActiveSnapshot stack looking for pointer equality to the caller's
Snapshot pointer, which seems fragile --- do we assume as_snap is
unique for any other purpose?That being the case, I'm now leaning to #1. Thoughts?
I'm also inclined to #1.
Please find attached a patch proposal for #1 that also adds a new test.
Thanks
Bertrand
Attachments:
v2-0001-EnsurePortalSnapshotExists-failed-assertion.patchtext/plain; charset=UTF-8; name=v2-0001-EnsurePortalSnapshotExists-failed-assertion.patchDownload+88-9
Em qua., 29 de set. de 2021 às 06:55, Drouvot, Bertrand <bdrouvot@amazon.com>
escreveu:
Hi,
On 9/28/21 6:50 PM, Tom Lane wrote:
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
Does it make sense (as it is currently) to set the ActiveSnapshot to
NULL and not ensuring the same is done for ActivePortal->portalSnapshot?I think that patch is just a kluge :-(
right, sounded too simple to be "true".
After tracing through this I've figured out what is happening, and
why you need to put the exception block inside a loop to make it
happen. The first iteration goes fine, and we end up with an empty
ActiveSnapshot stack (and no portalSnapshots) because that's how
the COMMIT leaves it. In the second iteration, we try to
re-establish the portal snapshot, but at the point where
EnsurePortalSnapshotExists is called (from the first INSERT
command) we are already inside a subtransaction thanks to the
plpgsql exception block. This means that the stacked ActiveSnapshot
has as_level = 2, although the Portal owning it belongs to the
outer transaction. So at the next exception, AtSubAbort_Snapshot
zaps the stacked ActiveSnapshot, but the Portal stays alive and
now it has a dangling portalSnapshot pointer.Thanks for the explanation!
Basically there seem to be two ways to fix this, both requiring
API additions to snapmgr.c (hence, cc'ing Alvaro for opinion):1. Provide a variant of PushActiveSnapshot that allows the caller
to specify the as_level to use, and then have
EnsurePortalSnapshotExists, as well as other places that create
Portal-associated snapshots, use this with as_level equal to the
Portal's createSubid. The main hazard I see here is that this could
theoretically allow the ActiveSnapshot stack to end up with
out-of-order as_level values, causing issues. For the moment we
could probably just add assertions to check that the passed as_level
is >= next level, or maybe even that this variant is only called with
empty ActiveSnapshot stack.I think we may get a non empty ActiveSnapshot stack with prepared
statements, so tempted to do the assertion on the levels.2. Provide a way for AtSubAbort_Portals to detect whether a
portalSnapshot pointer points to a snap that's going to be
deleted by AtSubAbort_Snapshot, and then just have it clear any
portalSnapshots that are about to become dangling. (This'd amount
to accepting the possibility that portalSnapshot is of a different
subxact level from the portal, and dealing with the situation.)I initially thought #2 would be the way to go, but it turns out
to be a bit messy since what we have is a Snapshot pointer not an
ActiveSnapshotElt pointer. We'd have to do something like search the
ActiveSnapshot stack looking for pointer equality to the caller's
Snapshot pointer, which seems fragile --- do we assume as_snap is
unique for any other purpose?That being the case, I'm now leaning to #1. Thoughts?
I'm also inclined to #1.
I have a stupid question, why duplicate PushActiveSnapshot?
Wouldn't one function be better?
PushActiveSnapshot(Snapshot snap, int as_level);
Sample calls:
PushActiveSnapshot(GetTransactionSnapshot(),
GetCurrentTransactionNestLevel());
PushActiveSnapshot(queryDesc->snapshot, GetCurrentTransactionNestLevel());
PushActiveSnapshot(GetTransactionSnapshot(), portal->createSubid);
regards,
Ranier Vilela
Hi,
On 9/29/21 12:59 PM, Ranier Vilela wrote:
Em qua., 29 de set. de 2021 às 06:55, Drouvot, Bertrand
<bdrouvot@amazon.com> escreveu:I'm also inclined to #1.
I have a stupid question, why duplicate PushActiveSnapshot?
Wouldn't one function be better?PushActiveSnapshot(Snapshot snap, int as_level);
Sample calls:
PushActiveSnapshot(GetTransactionSnapshot(),
GetCurrentTransactionNestLevel());
PushActiveSnapshot(queryDesc->snapshot,
GetCurrentTransactionNestLevel());
PushActiveSnapshot(GetTransactionSnapshot(), portal->createSubid);
I would say because that could "break" existing extensions for example.
Adding a new function prevents "updating" existing extensions making use
of PushActiveSnapshot().
Thanks
Bertrand
Em qua., 29 de set. de 2021 às 08:12, Drouvot, Bertrand <bdrouvot@amazon.com>
escreveu:
Hi,
On 9/29/21 12:59 PM, Ranier Vilela wrote:Em qua., 29 de set. de 2021 às 06:55, Drouvot, Bertrand <
bdrouvot@amazon.com> escreveu:I'm also inclined to #1.
I have a stupid question, why duplicate PushActiveSnapshot?
Wouldn't one function be better?PushActiveSnapshot(Snapshot snap, int as_level);
Sample calls:
PushActiveSnapshot(GetTransactionSnapshot(),
GetCurrentTransactionNestLevel());
PushActiveSnapshot(queryDesc->snapshot, GetCurrentTransactionNestLevel());
PushActiveSnapshot(GetTransactionSnapshot(), portal->createSubid);I would say because that could "break" existing extensions for example.
Adding a new function prevents "updating" existing extensions making use
of PushActiveSnapshot().
Valid argument of course.
But the extensions should also fit the core code.
Duplicating functions is very bad for maintenance and bloats the code
unnecessarily, IMHO.
regards,
Ranier Vilela
On 9/29/21 1:23 PM, Ranier Vilela wrote:
Em qua., 29 de set. de 2021 às 08:12, Drouvot, Bertrand
<bdrouvot@amazon.com> escreveu:Hi,
On 9/29/21 12:59 PM, Ranier Vilela wrote:
Em qua., 29 de set. de 2021 às 06:55, Drouvot, Bertrand
<bdrouvot@amazon.com> escreveu:I'm also inclined to #1.
I have a stupid question, why duplicate PushActiveSnapshot?
Wouldn't one function be better?PushActiveSnapshot(Snapshot snap, int as_level);
Sample calls:
PushActiveSnapshot(GetTransactionSnapshot(),
GetCurrentTransactionNestLevel());
PushActiveSnapshot(queryDesc->snapshot,
GetCurrentTransactionNestLevel());
PushActiveSnapshot(GetTransactionSnapshot(), portal->createSubid);I would say because that could "break" existing extensions for
example.Adding a new function prevents "updating" existing extensions
making use of PushActiveSnapshot().Valid argument of course.
But the extensions should also fit the core code.
Duplicating functions is very bad for maintenance and bloats the code
unnecessarily, IMHO.
Right. I don't have a strong opinion about this.
Let's see what Tom, Alvaro or others arguments/opinions are (should they
also want to go with option #1).
Thanks
Bertrand
On 2021-Sep-28, Tom Lane wrote:
1. Provide a variant of PushActiveSnapshot that allows the caller
to specify the as_level to use, and then have
EnsurePortalSnapshotExists, as well as other places that create
Portal-associated snapshots, use this with as_level equal to the
Portal's createSubid. The main hazard I see here is that this could
theoretically allow the ActiveSnapshot stack to end up with
out-of-order as_level values, causing issues. For the moment we
could probably just add assertions to check that the passed as_level
is >= next level, or maybe even that this variant is only called with
empty ActiveSnapshot stack.
I don't see anything wrong with this idea offhand.
I didn't try to create scenarios with out-of-order as_level active
snapshots, but with all the creativity out there in the world, and
considering that it's possible to write procedures in C, I think that
asserting that the order is maintained is warranted.
Now if we do meet a case with out-of-order levels, what do we do? I
suppose we'd need to update AtSubCommit_Snapshot and AtSubAbort_Snapshot
to cope with that (but by all means let's wait until we have a test case
where that happens ...)
2. Provide a way for AtSubAbort_Portals to detect whether a
portalSnapshot pointer points to a snap that's going to be
deleted by AtSubAbort_Snapshot, and then just have it clear any
portalSnapshots that are about to become dangling. (This'd amount
to accepting the possibility that portalSnapshot is of a different
subxact level from the portal, and dealing with the situation.)I initially thought #2 would be the way to go, but it turns out
to be a bit messy since what we have is a Snapshot pointer not an
ActiveSnapshotElt pointer. We'd have to do something like search the
ActiveSnapshot stack looking for pointer equality to the caller's
Snapshot pointer, which seems fragile --- do we assume as_snap is
unique for any other purpose?
I don't remember what patch it was, but I remember contemplating
sometime during the past year the possibility of snapshots being used
twice in the active stack. Now maybe this is not possible in practice
because in most cases we create a copy, but I couldn't swear that that's
always the case. I wouldn't rely on that.
--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)
On 2021-Sep-29, Ranier Vilela wrote:
Em qua., 29 de set. de 2021 às 08:12, Drouvot, Bertrand <bdrouvot@amazon.com>
escreveu:
Adding a new function prevents "updating" existing extensions making use
of PushActiveSnapshot().Valid argument of course.
But the extensions should also fit the core code.
Duplicating functions is very bad for maintenance and bloats the code
unnecessarily, IMHO.
Well, there are 42 calls of PushActiveSnapshot currently, and only 6 are
updated in the patch. Given that six sevenths of the calls continue to
use the existing function and that it is less verbose than the new one,
that seems sufficient argument to keep it.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep." (Robert Davidson)
http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2021-Sep-29, Ranier Vilela wrote:
Em qua., 29 de set. de 2021 às 08:12, Drouvot, Bertrand <bdrouvot@amazon.com>
escreveu:
Duplicating functions is very bad for maintenance and bloats the code
unnecessarily, IMHO.
Well, there are 42 calls of PushActiveSnapshot currently, and only 6 are
updated in the patch. Given that six sevenths of the calls continue to
use the existing function and that it is less verbose than the new one,
that seems sufficient argument to keep it.
Seeing that we have to back-patch this, changing the ABI of
PushActiveSnapshot seems like a complete non-starter.
The idea I'd had to avoid code duplication was to make
PushActiveSnapshot a wrapper for the extended function:
void
PushActiveSnapshot(Snapshot snap)
{
PushActiveSnapshotWithLevel(snap, GetCurrentTransactionNestLevel());
}
This would add one function call to the common code path, but there
are enough function calls in PushActiveSnapshot that I don't think
that's a big concern.
Another point is that this'd also add the as_level ordering assertion
to the common code path, but on the whole I think that's good not bad.
BTW, this is not great code:
+ if (ActiveSnapshot != NULL && ActiveSnapshot->as_next != NULL)
+ Assert(as_level >= ActiveSnapshot->as_next->as_level);
You want it all wrapped in the Assert, so that there's not any code
left in a non-assert build (which the compiler may or may not optimize
away, perhaps after complaining about a side-effect-free statement).
Actually, it's plain wrong, because you should be looking at the
top as_level not the next one. So more like
Assert(ActiveSnapshot == NULL ||
snap_level >= ActiveSnapshot->as_level);
regards, tom lane
Em qua., 29 de set. de 2021 às 15:01, Tom Lane <tgl@sss.pgh.pa.us> escreveu:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2021-Sep-29, Ranier Vilela wrote:
Em qua., 29 de set. de 2021 às 08:12, Drouvot, Bertrand <
bdrouvot@amazon.com>
escreveu:
Duplicating functions is very bad for maintenance and bloats the code
unnecessarily, IMHO.Well, there are 42 calls of PushActiveSnapshot currently, and only 6 are
updated in the patch. Given that six sevenths of the calls continue to
use the existing function and that it is less verbose than the new one,
that seems sufficient argument to keep it.Seeing that we have to back-patch this, changing the ABI of
PushActiveSnapshot seems like a complete non-starter.The idea I'd had to avoid code duplication was to make
PushActiveSnapshot a wrapper for the extended function:void
PushActiveSnapshot(Snapshot snap)
{
PushActiveSnapshotWithLevel(snap, GetCurrentTransactionNestLevel());
}Much better.
regards,
Ranier Vilela
On 9/29/21 8:01 PM, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2021-Sep-29, Ranier Vilela wrote:
Em qua., 29 de set. de 2021 às 08:12, Drouvot, Bertrand <bdrouvot@amazon.com>
escreveu:
Duplicating functions is very bad for maintenance and bloats the code
unnecessarily, IMHO.Well, there are 42 calls of PushActiveSnapshot currently, and only 6 are
updated in the patch. Given that six sevenths of the calls continue to
use the existing function and that it is less verbose than the new one,
that seems sufficient argument to keep it.Seeing that we have to back-patch this, changing the ABI of
PushActiveSnapshot seems like a complete non-starter.The idea I'd had to avoid code duplication was to make
PushActiveSnapshot a wrapper for the extended function:void
PushActiveSnapshot(Snapshot snap)
{
PushActiveSnapshotWithLevel(snap, GetCurrentTransactionNestLevel());
}
Implemented into the new attached patch.
So more like
Assert(ActiveSnapshot == NULL ||
snap_level >= ActiveSnapshot->as_level);
Implemented into the new attached patch.
But make check is now failing on join_hash.sql, I have been able to
repro with:
create table bdt (a int);
begin;
savepoint a;
rollback to a;
explain select count(*) from bdt;
Which triggers a failed assertion on the new one:
TRAP: FailedAssertion("ActiveSnapshot == NULL || as_level >=
ActiveSnapshot->as_level"
because we have as_level = 2 while ActiveSnapshot->as_level = 3.
Thanks
Bertrand
Attachments:
v2-0002-EnsurePortalSnapshotExists-failed-assertion.patchtext/plain; charset=UTF-8; name=v2-0002-EnsurePortalSnapshotExists-failed-assertion.patchDownload+63-11
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
But make check is now failing on join_hash.sql, I have been able to
repro with:
Oh, duh, should have thought a bit harder. createSubid is a sequential
subtransaction number; it's not the same as the as_level nesting level.
Probably the most effective way to handle this is to add a subtransaction
nesting-level field to struct Portal, so we can pass that. I don't recall
that xact.c provides any easy way to extract the nesting level of a
subtransaction that's not the most closely nested one.
regards, tom lane
On 9/29/21 10:11 PM, Tom Lane wrote:
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
But make check is now failing on join_hash.sql, I have been able to
repro with:Oh, duh, should have thought a bit harder. createSubid is a sequential
subtransaction number; it's not the same as the as_level nesting level.
Oh right, thanks for the explanation.
Probably the most effective way to handle this is to add a subtransaction
nesting-level field to struct Portal, so we can pass that.
Agree, done that way in the new attached patch.
Thanks
Bertrand
Attachments:
v2-0003-EnsurePortalSnapshotExists-failed-assertion.patchtext/plain; charset=UTF-8; name=v2-0003-EnsurePortalSnapshotExists-failed-assertion.patchDownload+72-14
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
[ v2-0003-EnsurePortalSnapshotExists-failed-assertion.patch ]
Looking through this, I think you were overenthusiastic about applying
PushActiveSnapshotWithLevel. We don't really need to use it except in
the places where we're setting portalSnapshot, because other than those
cases we don't have a risk of portalSnapshot becoming a dangling pointer.
Also, I'm quite nervous about applying PushActiveSnapshotWithLevel to
snapshots that aren't created by the portal machinery itself, because
we don't know very much about where passed-in snapshots came from or
what the caller thinks their lifespan is.
The attached revision therefore backs off to only using the new code
in the two places where we really need it. I made a number of
more-cosmetic revisions too. Notably, I think it's useful to frame
the testing shortcoming as "we were not testing COMMIT/ROLLBACK
inside a plpgsql exception block". So I moved the test code to the
plpgsql tests and made it check ROLLBACK too.
regards, tom lane
PS: Memo to self: in the back branches, the new field has to be
added at the end of struct Portal.
Attachments:
v3-EnsurePortalSnapshotExists-failed-assertion.patchtext/x-diff; charset=us-ascii; name=v3-EnsurePortalSnapshotExists-failed-assertion.patchDownload+93-8
On 9/30/21 7:16 PM, Tom Lane wrote:
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
[ v2-0003-EnsurePortalSnapshotExists-failed-assertion.patch ]
Looking through this, I think you were overenthusiastic about applying
PushActiveSnapshotWithLevel. We don't really need to use it except in
the places where we're setting portalSnapshot, because other than those
cases we don't have a risk of portalSnapshot becoming a dangling pointer.
Also, I'm quite nervous about applying PushActiveSnapshotWithLevel to
snapshots that aren't created by the portal machinery itself, because
we don't know very much about where passed-in snapshots came from or
what the caller thinks their lifespan is.
Oh right, I did not think about it, thanks!
The attached revision therefore backs off to only using the new code
in the two places where we really need it.
Ok, so in PortalRunUtility() and EnsurePortalSnapshotExists().
I made a number of
more-cosmetic revisions too.
thanks!
Notably, I think it's useful to frame
the testing shortcoming as "we were not testing COMMIT/ROLLBACK
inside a plpgsql exception block". So I moved the test code to the
plpgsql tests and made it check ROLLBACK too.
Indeed, makes sense.
regards, tom lane
PS: Memo to self: in the back branches, the new field has to be
added at the end of struct Portal.
out of curiosity, why?
Thanks
Bertrand
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
On 9/30/21 7:16 PM, Tom Lane wrote:
PS: Memo to self: in the back branches, the new field has to be
added at the end of struct Portal.
out of curiosity, why?
Sticking it into the middle would create an ABI break for any
extension code that's looking at struct Portal, due to movement
of existing field offsets. In HEAD that's fine, so we should
put the field where it makes the most sense. But we have to
be careful about changing globally-visible structs in released
branches.
regards, tom lane
On 9/30/21 8:25 PM, Tom Lane wrote:
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
On 9/30/21 7:16 PM, Tom Lane wrote:
PS: Memo to self: in the back branches, the new field has to be
added at the end of struct Portal.out of curiosity, why?
Sticking it into the middle would create an ABI break for any
extension code that's looking at struct Portal, due to movement
of existing field offsets. In HEAD that's fine, so we should
put the field where it makes the most sense. But we have to
be careful about changing globally-visible structs in released
branches.
Got it, thanks!
Bertrand