Parallel vacuum workers prevent the oldest xmin from advancing
Hi all,
A customer reported that during a parallel index vacuum, the oldest xmin
doesn't advance. Normally, the calculation of the oldest xmin
(ComputeXidHorizons()) ignores the xmin/xid of processes that have the
PROC_IN_VACUUM flag set in MyProc->statusFlags. But since parallel vacuum
workers don't set their statusFlags, the xmin of a parallel vacuum
worker is taken into account when calculating the oldest xmin. This
issue exists since PG13, where parallel vacuum was introduced. I think
it's a bug.
Moreover, the same problem also occurs in the CREATE/REINDEX
CONCURRENTLY case in PG14 and later, for the same reason (lack of the
PROC_IN_SAFE_IC flag).
To fix it, I first thought we'd change the create index code and the
vacuum code so that each parallel worker sets its status flags
according to the leader's. But ISTM it'd be better to copy the
leader's status flags to workers in ParallelWorkerMain(). I've
attached a patch for HEAD.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
copy_status_flags.patch (application/octet-stream, +20/-0)
On 10/6/21, 12:13 AM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote:
A customer reported that during parallel index vacuum, the oldest xmin
doesn't advance. Normally, the calculation of oldest xmin
(ComputeXidHorizons()) ignores xmin/xid of processes having
PROC_IN_VACUUM flag in MyProc->statusFlags. But since parallel vacuum
workers don’t set their statusFlags, the xmin of the parallel vacuum
worker is considered to calculate the oldest xmin. This issue happens
from PG13 where the parallel vacuum was introduced. I think it's a
bug.
+1
To fix it, I thought that we change the create index code and the
vacuum code so that the individual parallel worker sets its status
flags according to the leader’s one. But ISTM it’d be better to copy
the leader’s status flags to workers in ParallelWorkerMain(). I've
attached a patch for HEAD.
The patch seems reasonable to me.
Nathan
On Wed, Oct 6, 2021 at 12:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Hi all,
A customer reported that during parallel index vacuum, the oldest xmin
doesn't advance. Normally, the calculation of oldest xmin
(ComputeXidHorizons()) ignores xmin/xid of processes having
PROC_IN_VACUUM flag in MyProc->statusFlags. But since parallel vacuum
workers don’t set their statusFlags, the xmin of the parallel vacuum
worker is considered to calculate the oldest xmin. This issue happens
from PG13 where the parallel vacuum was introduced. I think it's a
bug.
I agree. Your patch seems to be going in the right direction, but I
haven't tested it yet. Feel free to register it in the next CF so it
doesn't get forgotten.
--
With Regards,
Amit Kapila.
On 2021-Oct-06, Masahiko Sawada wrote:
Hi all,
A customer reported that during parallel index vacuum, the oldest xmin
doesn't advance. Normally, the calculation of oldest xmin
(ComputeXidHorizons()) ignores xmin/xid of processes having
PROC_IN_VACUUM flag in MyProc->statusFlags. But since parallel vacuum
workers don’t set their statusFlags, the xmin of the parallel vacuum
worker is considered to calculate the oldest xmin. This issue happens
from PG13 where the parallel vacuum was introduced. I think it's a
bug.
Augh, yeah, I agree this is a pretty serious problem.
But ISTM it’d be better to copy the leader’s status flags to workers
in ParallelWorkerMain(). I've attached a patch for HEAD.
Hmm, this affects not only PROC_IN_VACUUM and PROC_IN_SAFE_IC (the bug
you're fixing), but also:
* PROC_IS_AUTOVACUUM. That seems reasonable to me -- should a parallel
worker for autovacuum be considered autovacuum too? AFAICS it's only
used by the deadlock detector, so it should be okay. However, in the
normal path, that flag is set much earlier.
* PROC_VACUUM_FOR_WRAPAROUND. Should be innocuous I think, since the
"parent" process already has this flag and thus shouldn't be cancelled.
* PROC_IN_LOGICAL_DECODING. Surely not set for parallel vacuum workers,
so not a problem.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem." (Tom Lane)
On Fri, Oct 8, 2021 at 8:13 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Oct-06, Masahiko Sawada wrote:
A customer reported that during parallel index vacuum, the oldest xmin
doesn't advance. Normally, the calculation of oldest xmin
(ComputeXidHorizons()) ignores xmin/xid of processes having
PROC_IN_VACUUM flag in MyProc->statusFlags. But since parallel vacuum
workers don’t set their statusFlags, the xmin of the parallel vacuum
worker is considered to calculate the oldest xmin. This issue happens
from PG13 where the parallel vacuum was introduced. I think it's a
bug.
Augh, yeah, I agree this is a pretty serious problem.
So is this comparable problem, which happens to be much older:
/messages/by-id/CAH2-WzkjrK556enVtFLmyXEdw91xGuwiyZVep2kp5yQT_-3JDg@mail.gmail.com
In both cases we see bugs (or implementation deficiencies) that
accidentally block ComputeXidHorizons() for hours, when that isn't
truly necessary. In general, practically no user is sure whether or
not VACUUM already behaves like a long-running transaction, so we
shouldn't be surprised that it takes so long for us to hear about
issues like this.
I think that we should try to find a way of making this whole class of
problems easier to identify in production. There needs to be greater
visibility into what process holds back VACUUM, and how long that
lasts -- something easy to use, and *obvious*. That would be a very
useful feature in general. It would also make catching these issues
early far more likely. It's just *not okay* that you have to follow long
and complicated instructions [1] to get just some of this information.
How can something this important just be an afterthought?
[1]: https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/
--
Peter Geoghegan
On Sat, Oct 9, 2021 at 12:13 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Oct-06, Masahiko Sawada wrote:
Hi all,
A customer reported that during parallel index vacuum, the oldest xmin
doesn't advance. Normally, the calculation of oldest xmin
(ComputeXidHorizons()) ignores xmin/xid of processes having
PROC_IN_VACUUM flag in MyProc->statusFlags. But since parallel vacuum
workers don’t set their statusFlags, the xmin of the parallel vacuum
worker is considered to calculate the oldest xmin. This issue happens
from PG13 where the parallel vacuum was introduced. I think it's a
bug.
Augh, yeah, I agree this is a pretty serious problem.
But ISTM it’d be better to copy the leader’s status flags to workers
in ParallelWorkerMain(). I've attached a patch for HEAD.
Hmm, this affects not only PROC_IN_VACUUM and PROC_IN_SAFE_IC (the bug
you're fixing), but also:
* PROC_IS_AUTOVACUUM. That seems reasonable to me -- should a parallel
worker for autovacuum be considered autovacuum too? AFAICS it's only
used by the deadlock detector, so it should be okay. However, in the
normal path, that flag is set much earlier.
* PROC_VACUUM_FOR_WRAPAROUND. Should be innocuous I think, since the
"parent" process already has this flag and thus shouldn't be cancelled.
Currently, we don't support parallel vacuum for autovacuum. So
parallel workers for vacuum don't have these two flags.
* PROC_IN_LOGICAL_DECODING. Surely not set for parallel vacuum workers,
so not a problem.
Agreed.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Oct 11, 2021 at 09:23:32AM +0900, Masahiko Sawada wrote:
On Sat, Oct 9, 2021 at 12:13 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
* PROC_VACUUM_FOR_WRAPAROUND. Should be innocuous I think, since the
"parent" process already has this flag and thus shouldn't be cancelled.
Currently, we don't support parallel vacuum for autovacuum. So
parallel workers for vacuum don't have these two flags.
That's something that should IMO be marked in the code as a comment as
something to worry about once/if someone begins playing with parallel
autovacuum. If the change involving autovacuum is simple, I see no
reason to not add this part now, though.
--
Michael
On Wed, Oct 6, 2021 at 6:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
To fix it, I thought that we change the create index code and the
vacuum code so that the individual parallel worker sets its status
flags according to the leader’s one. But ISTM it’d be better to copy
the leader’s status flags to workers in ParallelWorkerMain(). I've
attached a patch for HEAD.
+1 The fix looks reasonable to me too.
Is it possible for the patch to add test cases for the two identified
problem scenarios? (PROC_IN_VACUUM, PROC_IN_SAFE_IC)
Regards,
Greg Nancarrow
Fujitsu Australia
On Mon, Oct 11, 2021 at 3:01 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
On Wed, Oct 6, 2021 at 6:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
To fix it, I thought that we change the create index code and the
vacuum code so that the individual parallel worker sets its status
flags according to the leader’s one. But ISTM it’d be better to copy
the leader’s status flags to workers in ParallelWorkerMain(). I've
attached a patch for HEAD.
+1 The fix looks reasonable to me too.
Is it possible for the patch to add test cases for the two identified
problem scenarios? (PROC_IN_VACUUM, PROC_IN_SAFE_IC)
Not sure we can add stable tests for this. The test infrastructure has
no way to make parallel workers suspend and resume, etc., and the
oldest xmin can vary depending on the situation. Probably we can add
an assertion to ensure that a parallel worker for vacuum or create
index has PROC_IN_VACUUM or PROC_IN_SAFE_IC, respectively.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Oct 11, 2021 at 9:51 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Oct 11, 2021 at 09:23:32AM +0900, Masahiko Sawada wrote:
On Sat, Oct 9, 2021 at 12:13 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
* PROC_VACUUM_FOR_WRAPAROUND. Should be innocuous I think, since the
"parent" process already has this flag and thus shouldn't be cancelled.
Currently, we don't support parallel vacuum for autovacuum. So
parallel workers for vacuum don't have these two flags.
That's something that should IMO be marked in the code as a comment as
something to worry about once/if someone begins playing with parallel
autovacuum. If the change involving autovacuum is simple, I see no
reason to not add this part now, though.
Agreed. I added the comment. Also, I added an assertion to ensure that
a parallel worker for vacuum has PROC_IN_VACUUM flag (and doesn't have
other flags). But we cannot do that for parallel workers for building
btree index as they don’t know whether or not the CONCURRENTLY option
is specified.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
copy_status_flags_v2.patch (application/octet-stream, +26/-0)
Hmm, I think this should happen before the transaction snapshot is
established in the worker; perhaps immediately after calling
StartParallelWorkerTransaction(), or anyway not after
SetTransactionSnapshot. In fact, since SetTransactionSnapshot receives
a 'sourceproc' argument, why not do it exactly there? ISTM that
ProcArrayInstallRestoredXmin() is where this should happen.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On 2021-Oct-19, Alvaro Herrera wrote:
Hmm, I think this should happen before the transaction snapshot is
established in the worker; perhaps immediately after calling
StartParallelWorkerTransaction(), or anyway not after
SetTransactionSnapshot. In fact, since SetTransactionSnapshot receives
a 'sourceproc' argument, why not do it exactly there? ISTM that
ProcArrayInstallRestoredXmin() is where this should happen.
... and there is a question about the lock strength used for
ProcArrayLock. The current routine uses LW_SHARED, but there's no
clarity that we can modify proc->statusFlags and ProcGlobal->statusFlags
without LW_EXCLUSIVE.
Maybe we can change ProcArrayInstallRestoredXmin so that if it sees that
proc->statusFlags is not zero, then it grabs LW_EXCLUSIVE (and copies),
otherwise it keeps using LW_SHARED as it does now (and does not copy.)
(This also suggests that using LW_EXCLUSIVE unconditionally for all
cases as your patch does is not great. OTOH it's just once at every
bgworker start, so it's not *that* frequent.)
Initially, I was a bit nervous about copying flags willy-nilly. Do we
need to be more careful? I mean, have a way for the code to specify
flags to copy, maybe something like
MyProc->statusFlags |= proc->statusFlags & copyableFlags;
ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
with this coding,
1. we do not unset flags that the bgworker already has for whatever
reason
2. we do not copy flags that may be unrelated to the effect we desire.
The problem, and it's something I don't have an answer for, is how to
specify copyableFlags. This code is the generic ParallelWorkerMain()
and there's little-to-no chance to pass stuff from the process that
requested the bgworker. So maybe Sawada-san's original coding of just
copying everything is okay.
--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
On Wed, Oct 20, 2021 at 3:07 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Oct-19, Alvaro Herrera wrote:
Thank you for the comment.
Hmm, I think this should happen before the transaction snapshot is
established in the worker; perhaps immediately after calling
StartParallelWorkerTransaction(), or anyway not after
SetTransactionSnapshot. In fact, since SetTransactionSnapshot receives
a 'sourceproc' argument, why not do it exactly there? ISTM that
ProcArrayInstallRestoredXmin() is where this should happen.
... and there is a question about the lock strength used for
ProcArrayLock. The current routine uses LW_SHARED, but there's no
clarity that we can modify proc->statusFlags and ProcGlobal->statusFlags
without LW_EXCLUSIVE.
Maybe we can change ProcArrayInstallRestoredXmin so that if it sees that
proc->statusFlags is not zero, then it grabs LW_EXCLUSIVE (and copies),
otherwise it keeps using LW_SHARED as it does now (and does not copy.)
Initially, I considered copying statusFlags in
ProcArrayInstallRestoredXmin(), but I hesitated to do that because
statusFlags is not relevant to xmin and snapshot stuff. But I agree
that copying statusFlags should happen before restoring the snapshot.
If we copy statusFlags in ProcArrayInstallRestoredXmin(), isn't there
still a small window in which the restored snapshot holds back the
oldest xmin? If so, it would be better to call
ProcArrayCopyStatusFlags() right after StartParallelWorker().
(This also suggests that using LW_EXCLUSIVE inconditionally for all
cases as your patch does is not great. OTOH it's just once at every
bgworker start, so it's not *that* frequent.)
Agreed.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Wed, Oct 20, 2021 at 9:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Oct 20, 2021 at 3:07 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Oct-19, Alvaro Herrera wrote:
Thank you for the comment.
Hmm, I think this should happen before the transaction snapshot is
established in the worker; perhaps immediately after calling
StartParallelWorkerTransaction(), or anyway not after
SetTransactionSnapshot. In fact, since SetTransactionSnapshot receives
a 'sourceproc' argument, why not do it exactly there? ISTM that
ProcArrayInstallRestoredXmin() is where this should happen.
... and there is a question about the lock strength used for
ProcArrayLock. The current routine uses LW_SHARED, but there's no
clarity that we can modify proc->statusFlags and ProcGlobal->statusFlags
without LW_EXCLUSIVE.
Maybe we can change ProcArrayInstallRestoredXmin so that if it sees that
proc->statusFlags is not zero, then it grabs LW_EXCLUSIVE (and copies),
otherwise it keeps using LW_SHARED as it does now (and does not copy.)
Initially, I've considered copying statusFlags in
ProcArrayInstallRestoredXmin() but I hesitated to do that because
statusFlags is not relevant with xmin and snapshot stuff. But I agree
that copying statusFlags should happen before restoring the snapshot.
If we copy statusFlags in ProcArrayInstallRestoredXmin() there is
still little window that the restored snapshot holds back the oldest
xmin?
That's wrong, I'd misunderstood.
I agree to copy statusFlags in ProcArrayInstallRestoredXmin(). I've
updated the patch accordingly.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
copy_status_flags_v3.patch (application/octet-stream, +37/-1)
On Fri, 22 Oct 2021 at 07:38, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Oct 20, 2021 at 9:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Oct 20, 2021 at 3:07 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Oct-19, Alvaro Herrera wrote:
Thank you for the comment.
Hmm, I think this should happen before the transaction snapshot is
established in the worker; perhaps immediately after calling
StartParallelWorkerTransaction(), or anyway not after
SetTransactionSnapshot. In fact, since SetTransactionSnapshot receives
a 'sourceproc' argument, why not do it exactly there? ISTM that
ProcArrayInstallRestoredXmin() is where this should happen.
... and there is a question about the lock strength used for
ProcArrayLock. The current routine uses LW_SHARED, but there's no
clarity that we can modify proc->statusFlags and ProcGlobal->statusFlags
without LW_EXCLUSIVE.
Maybe we can change ProcArrayInstallRestoredXmin so that if it sees that
proc->statusFlags is not zero, then it grabs LW_EXCLUSIVE (and copies),
otherwise it keeps using LW_SHARED as it does now (and does not copy.)
Initially, I've considered copying statusFlags in
ProcArrayInstallRestoredXmin() but I hesitated to do that because
statusFlags is not relevant with xmin and snapshot stuff. But I agree
that copying statusFlags should happen before restoring the snapshot.
If we copy statusFlags in ProcArrayInstallRestoredXmin() there is
still little window that the restored snapshot holds back the oldest
xmin?
That's wrong, I'd misunderstood.
I agree to copy statusFlags in ProcArrayInstallRestoredXmin(). I've
updated the patch accordingly.
I've tested this patch, and it correctly fixes the issue of blocking
xmin from advancing, and also fixes an issue of retreating the
observed *_oldest_nonremovable in XidHorizons through parallel
workers.
There are still some other soundness issues with xmin handling (see
[0]), but that should not prevent this patch from landing in the
relevant branches.
[0]: /messages/by-id/17257-1e46de26bec11433@postgresql.org
Kind regards,
Matthias van de Meent
On Fri, Oct 22, 2021 at 11:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Oct 20, 2021 at 9:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I agree to copy statusFlags in ProcArrayInstallRestoredXmin(). I've
updated the patch accordingly.
1.
@@ -2663,7 +2677,16 @@ ProcArrayInstallRestoredXmin(TransactionId
xmin, PGPROC *proc)
TransactionIdIsNormal(xid) &&
TransactionIdPrecedesOrEquals(xid, xmin))
{
+ /* restore xmin */
MyProc->xmin = TransactionXmin = xmin;
+
+ /* copy statusFlags */
+ if (flags != 0)
+ {
+ MyProc->statusFlags = proc->statusFlags;
+ ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ }
Is there a reason to tie the logic of copying status flags with the
last two transaction-related conditions?
2.
LWLockAcquire(ProcArrayLock, LW_SHARED);
+ flags = proc->statusFlags;
+
+ /*
+ * If the source xact has any statusFlags, we re-grab ProcArrayLock
+ * on exclusive mode so we can copy it to MyProc->statusFlags.
+ */
+ if (flags != 0)
+ {
+ LWLockRelease(ProcArrayLock);
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ }
This looks a bit odd to me. It would have been better if we know when
to acquire an exclusive lock without first acquiring the shared lock.
I see why it could be a good idea to do this stuff in
ProcArrayInstallRestoredXmin() but seeing the patch it seems better to
do this separately for the parallel worker as is done in your previous
patch version but do it after we call
StartParallelWorkerTransaction(). I am also not very sure if the other
callers of this code path will expect ProcArrayInstallRestoredXmin()
to do this assignment and also the function name appears to be very
specific to what it is currently doing.
--
With Regards,
Amit Kapila.
On Fri, Nov 5, 2021 at 8:16 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
On Fri, 22 Oct 2021 at 07:38, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Oct 20, 2021 at 9:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Oct 20, 2021 at 3:07 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Oct-19, Alvaro Herrera wrote:
Thank you for the comment.
Hmm, I think this should happen before the transaction snapshot is
established in the worker; perhaps immediately after calling
StartParallelWorkerTransaction(), or anyway not after
SetTransactionSnapshot. In fact, since SetTransactionSnapshot receives
a 'sourceproc' argument, why not do it exactly there? ISTM that
ProcArrayInstallRestoredXmin() is where this should happen.
... and there is a question about the lock strength used for
ProcArrayLock. The current routine uses LW_SHARED, but there's no
clarity that we can modify proc->statusFlags and ProcGlobal->statusFlags
without LW_EXCLUSIVE.
Maybe we can change ProcArrayInstallRestoredXmin so that if it sees that
proc->statusFlags is not zero, then it grabs LW_EXCLUSIVE (and copies),
otherwise it keeps using LW_SHARED as it does now (and does not copy.)
Initially, I've considered copying statusFlags in
ProcArrayInstallRestoredXmin() but I hesitated to do that because
statusFlags is not relevant with xmin and snapshot stuff. But I agree
that copying statusFlags should happen before restoring the snapshot.
If we copy statusFlags in ProcArrayInstallRestoredXmin() there is
still little window that the restored snapshot holds back the oldest
xmin?
That's wrong, I'd misunderstood.
I agree to copy statusFlags in ProcArrayInstallRestoredXmin(). I've
updated the patch accordingly.
I've tested this patch, and it correctly fixes the issue of blocking
xmin from advancing, and also fixes an issue of retreating the
observed *_oldest_nonremovable in XidHorizons through parallel
workers.
There are still some other soundness issues with xmin handling (see
[0]), but that should not prevent this patch from landing in the
relevant branches.
AFAICU, in the thread referred by you, it seems that the main reported
issue will be resolved by this patch but there is a discussion about
xmin moving backward which seems to be the case with the current code
as per code comments mentioned by Andres. Is my understanding correct?
--
With Regards,
Amit Kapila.
On Wed, 10 Nov 2021 at 11:51, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 5, 2021 at 8:16 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:AFAICU, in the thread referred by you, it seems that the main reported
issue will be resolved by this patch but there is a discussion about
xmin moving backward which seems to be the case with the current code
as per code comments mentioned by Andres. Is my understanding correct?
That is correct.
On Wed, Nov 10, 2021 at 6:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 22, 2021 at 11:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Oct 20, 2021 at 9:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I agree to copy statusFlags in ProcArrayInstallRestoredXmin(). I've
updated the patch accordingly.
1.
@@ -2663,7 +2677,16 @@ ProcArrayInstallRestoredXmin(TransactionId
xmin, PGPROC *proc)
TransactionIdIsNormal(xid) &&
TransactionIdPrecedesOrEquals(xid, xmin))
{
+ /* restore xmin */
MyProc->xmin = TransactionXmin = xmin;
+
+ /* copy statusFlags */
+ if (flags != 0)
+ {
+ MyProc->statusFlags = proc->statusFlags;
+ ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ }
Is there a reason to tie the logic of copying status flags with the
last two transaction-related conditions?
My mistake. It should not be tied.
2.
LWLockAcquire(ProcArrayLock, LW_SHARED);
+ flags = proc->statusFlags;
+
+ /*
+ * If the source xact has any statusFlags, we re-grab ProcArrayLock
+ * on exclusive mode so we can copy it to MyProc->statusFlags.
+ */
+ if (flags != 0)
+ {
+ LWLockRelease(ProcArrayLock);
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ }
This looks a bit odd to me. It would have been better if we know when
to acquire an exclusive lock without first acquiring the shared lock.
I think we should acquire an exclusive lock only if status flags are
not empty. But to check the status flags we need to acquire a shared
lock. No?
I see why it could be a good idea to do this stuff in
ProcArrayInstallRestoredXmin() but seeing the patch it seems better to
do this separately for the parallel worker as is done in your previous
patch version but do it after we call
StartParallelWorkerTransaction(). I am also not very sure if the other
callers of this code path will expect ProcArrayInstallRestoredXmin()
to do this assignment and also the function name appears to be very
specific to what it is currently doing.
Fair enough. I was also concerned about that, but since
ProcArrayInstallRestoredXmin() is a convenient place to set status
flags, I changed the patch accordingly. As you pointed out, doing that
separately for the parallel worker is clearer.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Hi,
On 2021-11-11 12:22:42 +0900, Masahiko Sawada wrote:
2.
LWLockAcquire(ProcArrayLock, LW_SHARED);
+ flags = proc->statusFlags;
+
+ /*
+ * If the source xact has any statusFlags, we re-grab ProcArrayLock
+ * on exclusive mode so we can copy it to MyProc->statusFlags.
+ */
+ if (flags != 0)
+ {
+ LWLockRelease(ProcArrayLock);
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ }
This looks a bit odd to me. It would have been better if we know when
to acquire an exclusive lock without first acquiring the shared lock.
I think we should acquire an exclusive lock only if status flags are
not empty. But to check the status flags we need to acquire a shared
lock. No?
This seems like an unnecessary optimization. ProcArrayInstallRestoredXmin()
only happens in the context of much more expensive operations.
I think it might be worth asserting that the set of flags we're copying is a
known subset of the flags that are valid to copy from the source.
Greetings,
Andres Freund