A assert failure when initdb with track_commit_timestamp=on
Hi,
When working with the commit_ts module, I find the following issue:
After configure with --enable-cassert option, then initdb with:
initdb -D x2 -c track_commit_timestamp=on
Then we can get the following core dump:
0 in raise of /lib/x86_64-linux-gnu/libc.so.6
1 in abort of /lib/x86_64-linux-gnu/libc.so.6
2 in ExceptionalCondition of assert.c:66
3 in TransactionIdSetCommitTs of commit_ts.c:257
4 in SetXidCommitTsInPage of commit_ts.c:236
5 in TransactionTreeSetCommitTsData of commit_ts.c:192
6 in RecordTransactionCommit of xact.c:1468
7 in CommitTransaction of xact.c:2365
8 in CommitTransactionCommandInternal of xact.c:3202
9 in CommitTransactionCommand of xact.c:3163
10 in BootstrapModeMain of bootstrap.c:390
11 in main of main.c:210
The reason are TransactionIdSetCommitTs think the given xid must be
normal
static void
TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
RepOriginId nodeid, int slotno)
{
...
Assert(TransactionIdIsNormal(xid));
}
However this is not true in BootstrapMode, this failure is masked by
default because TransactionTreeSetCommitTsData returns fast when
track_commit_timestamp is off.
void
TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids,
TransactionId *subxids, TimestampTz timestamp,
RepOriginId nodeid)
{
/*
* No-op if the module is not active.
*
*/
if (!commitTsShared->commitTsActive)
return;
..
}
I can't think out a meaningful reason to record the commit timestamp for a
BootstrapTransactionId or FrozenTransactionId, so I think bypass it in
TransactionTreeSetCommitTsData could be a solution. Another solution is
just removing the Assert in TransactionIdSetCommitTs, it works during
initdb test at least.
I include both fixes in the attachment, I think just one of them could
be adopted however.
--
Best Regards
Andy Fan
Attachments:
v1-0001-Fix-the-Assert-failure-when-initdb-with-track_com.patchtext/x-diffDownload+7-3
On Wed, Jul 02, 2025 at 12:38:01AM +0000, Andy Fan wrote:
However this is not true in BootstrapMode, this failure is masked by
default because TransactionTreeSetCommitTsData returns fast when
track_commit_timestamp is off.
Agreed that there is no point in registering a commit timestamp in
the cases of a frozen and bootstrap XIDs. I would recommend to keep
the assertion in TransactionIdSetCommitTs(), though, that still looks
useful to me for the many callers of this routine, at least as a
sanity check.
I did not check, but usually we apply filters based on
IsBootstrapProcessingMode() for code paths that we do not want to
reach while in bootstrap mode. Could the same be done here?
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
Hi,
On Wed, Jul 02, 2025 at 12:38:01AM +0000, Andy Fan wrote:
However this is not true in BootstrapMode, this failure is masked by
default because TransactionTreeSetCommitTsData returns fast when
track_commit_timestamp is off.Agreed that there is no point in registering a commit timestamp in
the cases of a frozen and bootstrap XIDs. I would recommend to keep
the assertion in TransactionIdSetCommitTs(), though, that still looks
useful to me for the many callers of this routine, at least as a
sanity check.
Yes, The assert also guard the InvalidTransactionId. So I removed this
solution in v2. Another reason for this is: if we allowed
BooststrapTransactionId in the commit_ts, it introduces something new to
this module when initdb with track_commit_timestamp=on. This risk might
be very low, but it can be avoided easily with the another solution.
I did not check, but usually we apply filters based on
IsBootstrapProcessingMode() for code paths that we do not want to
reach while in bootstrap mode. Could the same be done here?
I think you are right. so I used IsBootstrapProcessingMode in v2.
--
Best Regards
Andy Fan
Attachments:
v2-0001-Don-t-record-commit_ts-in-bootstarp-mode.patchtext/x-diffDownload+6-1
Hi,
On Wed, Jul 02, 2025 at 01:38:18AM +0000, Andy Fan wrote:
Michael Paquier <michael@paquier.xyz> writes:
Hi,
On Wed, Jul 02, 2025 at 12:38:01AM +0000, Andy Fan wrote:
However this is not true in BootstrapMode, this failure is masked by
default because TransactionTreeSetCommitTsData returns fast when
track_commit_timestamp is off.Agreed that there is no point in registering a commit timestamp in
the cases of a frozen and bootstrap XIDs. I would recommend to keep
the assertion in TransactionIdSetCommitTs(), though, that still looks
useful to me for the many callers of this routine, at least as a
sanity check.Yes, The assert also guard the InvalidTransactionId. So I removed this
solution in v2. Another reason for this is: if we allowed
BooststrapTransactionId in the commit_ts, it introduces something new to
this module when initdb with track_commit_timestamp=on. This risk might
be very low, but it can be avoided easily with the another solution.I did not check, but usually we apply filters based on
IsBootstrapProcessingMode() for code paths that we do not want to
reach while in bootstrap mode. Could the same be done here?I think you are right. so I used IsBootstrapProcessingMode in v2.
Thanks for the report and the patch.
Yeah, I also think that making use of IsBootstrapProcessingMode() is the
right thing to do here.
=== 1
+ * Don't bother to record commit_ts for Booststrap mode.
typo: s/Booststrap/Bootstrap/
Also, grep on "Bootstrap mode" and "bootstrap mode" gives much more occurrences
for the later, so maybe use "bootstrap mode" instead?
=== 2
The FrozenTransactionId case is missing in v2, is that on purpose?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Jul 02, 2025 at 09:03:35AM +0000, Bertrand Drouvot wrote:
+ * Don't bother to record commit_ts for Booststrap mode.
typo: s/Booststrap/Bootstrap/
Also, grep on "Bootstrap mode" and "bootstrap mode" gives much more occurrences
for the later, so maybe use "bootstrap mode" instead?=== 2
The FrozenTransactionId case is missing in v2, is that on purpose?
There may be an argument about bypassing the frozen case as well, but
I cannot get excited about that because it can also be very useful to
detect problems as calling this routine with a frozen XID would be
really wrong. The assert in TransactionIdSetCommitTs() is enough for
this purpose, no need to add something else. As of now
TransactionTreeSetCommitTsData() is only called with an active XID at
commit, but if somebody has too much imagination in a module..
As a whole I think that the addition of IsBootstrapProcessingMode() in
TransactionTreeSetCommitTsData() is enough. There may be an argument
about backpatching this change. However, nobody has complained about
the idea of enabling the commit_ts module while bootstrapping, so I'd
keep it as a HEAD-only thing and be conservative.
A second thing I would suggest is some test coverage, with the
addition of extra => [ '-c', "track_commit_timestamp=on=on" ] in an
init() method called in one of the script tests of
src/test/modules/commit_ts/t. Just changing 001_base.pl should be
enough to trigger your original failure, making sure that this never
breaks again.
--
Michael
On 2025/07/02 9:38, Andy Fan wrote:
Hi,
When working with the commit_ts module, I find the following issue:
After configure with --enable-cassert option, then initdb with:
initdb -D x2 -c track_commit_timestamp=on
Then we can get the following core dump:
Is this the same issue that was discussed in [1]/messages/by-id/OSCPR01MB14966FF9E4C4145F37B937E52F5102@OSCPR01MB14966.jpnprd01.prod.outlook.com?
Regards,
[1]: /messages/by-id/OSCPR01MB14966FF9E4C4145F37B937E52F5102@OSCPR01MB14966.jpnprd01.prod.outlook.com
--
Fujii Masao
NTT DATA Japan Corporation
On Thu, Jul 03, 2025 at 09:48:40AM +0900, Fujii Masao wrote:
Is this the same issue that was discussed in [1]?
[1] /messages/by-id/OSCPR01MB14966FF9E4C4145F37B937E52F5102@OSCPR01MB14966.jpnprd01.prod.outlook.com
Ah, indeed, so it was reported a couple of months ago. I am not sure
that the argument about all the other GUCs potentially impacted holds
much value; we are talking about a specific code path.
--
Michael
Dear Michael, Fujii-san,
Ah, indeed, so it was reported a couple of months ago. I am not sure
that the argument about all the other GUCs potentially impacted holds
much value; we are talking about a specific code path.
Yeah, I did report but sadly it was missed by others :-(. To clarify,
The current patch looks good to me.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> writes:
Dear Michael, Fujii-san,
Ah, indeed, so it was reported a couple of months ago. I am not sure
that the argument about all the other GUCs potentially impacted holds
much value; we are talking about a specific code path.Yeah, I did report but sadly it was missed by others :-(. To clarify,
The current patch looks good to me.
Then I'd thank Michael to watch the maillist closely this time.
I checked the fix suggested by Hayato, I think his patch is better than
me because his patch checks at the startup time while my patch checks at
each time of RecordTransactionCommit. So v3 takes his patch. v3 also
added the testcase suggested by Michael for test coverage, it clearly
proves the bug is fixed now.
--
Best Regards
Andy Fan
Attachments:
v3-0001-Avoid-activating-commit_ts-while-bootstrap.patchtext/x-diffDownload+8-2
On 2025/07/03 22:31, Andy Fan wrote:
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> writes:
Dear Michael, Fujii-san,
Ah, indeed, so it was reported a couple of months ago. I am not sure
that the argument about all the other GUCs potentially impacted holds
much value; we are talking about a specific code path.Yeah, I did report but sadly it was missed by others :-(. To clarify,
The current patch looks good to me.Then I'd thank Michael to watch the maillist closely this time.
I checked the fix suggested by Hayato, I think his patch is better than
me because his patch checks at the startup time while my patch checks at
each time of RecordTransactionCommit. So v3 takes his patch. v3 also
added the testcase suggested by Michael for test coverage, it clearly
proves the bug is fixed now.
The patch looks good to me.
By the way, although it's a separate issue, I noticed that running
initdb -c transaction_timeout=1 causes an assertion failure:
running bootstrap script ... TRAP: failed Assert("all_timeouts_initialized"), File: "timeout.c", Line: 164, PID: 22057
0 postgres 0x00000001105d9d02 ExceptionalCondition + 178
1 postgres 0x0000000110612af7 enable_timeout + 55
2 postgres 0x0000000110612aa9 enable_timeout_after + 73
3 postgres 0x000000010fead8e0 StartTransaction + 816
4 postgres 0x000000010fead4a1 StartTransactionCommand + 65
5 postgres 0x000000010fef01de BootstrapModeMain + 1518
6 postgres 0x0000000110167ef4 main + 676
7 dyld 0x00007ff805092530 start + 3056
child process was terminated by signal 6: Abort trap: 6
This happens because enable_timeout() tries to activate the transaction
timeout before InitializeTimeouts() has been called — in other words,
the timeout system hasn't been initialized yet. To fix this, we might
need to forcibly set transaction_timeout to 0 during bootstrap.
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
On Fri, Jul 04, 2025 at 01:26:19AM +0900, Fujii Masao wrote:
On 2025/07/03 22:31, Andy Fan wrote:
I checked the fix suggested by Hayato, I think his patch is better than
me because his patch checks at the startup time while my patch checks at
each time of RecordTransactionCommit. So v3 takes his patch. v3 also
added the testcase suggested by Michael for test coverage, it clearly
proves the bug is fixed now.The patch looks good to me.
I was wondering if we should backpatch that, and decided towards a
yes here as it can be annoying. 3e51b278db6a has introduced the -c
switch in initdb in v16, but that could be reached as well if one
touches at some initialization path, perhaps in a fork. Done, down
to v13.
Let's tackle the rest separately.
--
Michael
Dear Fujii-san,
By the way, although it's a separate issue, I noticed that running
initdb -c transaction_timeout=1 causes an assertion failure:
I feel it may be able to discuss in other places but let me say one comment.
running bootstrap script ... TRAP: failed Assert("all_timeouts_initialized"), File:
"timeout.c", Line: 164, PID: 22057
0 postgres 0x00000001105d9d02
ExceptionalCondition + 178
1 postgres 0x0000000110612af7
enable_timeout + 55
2 postgres 0x0000000110612aa9
enable_timeout_after + 73
3 postgres 0x000000010fead8e0
StartTransaction + 816
4 postgres 0x000000010fead4a1
StartTransactionCommand + 65
5 postgres 0x000000010fef01de
BootstrapModeMain + 1518
6 postgres 0x0000000110167ef4 main + 676
7 dyld 0x00007ff805092530 start + 3056
child process was terminated by signal 6: Abort trap: 6This happens because enable_timeout() tries to activate the transaction
timeout before InitializeTimeouts() has been called ? in other words,
the timeout system hasn't been initialized yet. To fix this, we might
need to forcibly set transaction_timeout to 0 during bootstrap.
If more GUCs were found which cannot be set during the bootstrap mode, how about
introducing a new flag like GUC_DEFAULT_WHILE_BOOTSTRAPPING for GUC variables?
If the flag is set all setting can be ignored when
IsBootstrapProcessingMode() = true.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On 2025/07/04 16:29, Hayato Kuroda (Fujitsu) wrote:
Dear Fujii-san,
By the way, although it's a separate issue, I noticed that running
initdb -c transaction_timeout=1 causes an assertion failure:I feel it may be able to discuss in other places
OK, I've started a new thread for this issue at [1]/messages/by-id/a68fae7d-f45a-4c70-8d90-2a2cd3bdcfca@oss.nttdata.com.
If more GUCs were found which cannot be set during the bootstrap mode, how about
introducing a new flag like GUC_DEFAULT_WHILE_BOOTSTRAPPING for GUC variables?
If the flag is set all setting can be ignored when
IsBootstrapProcessingMode() = true.
If there are many GUCs that behave incorrectly during bootstrap,
a general mechanism like that might be worth considering. But if
only a few GUCs are affected, as I believe is the case,
then such a mechanism may be overkill.
In that case, IMO it should be sufficient to disable the problematic
GUCs individually, for example by calling SetConfigOption(..., PGC_S_OVERRIDE).
Regards,
[1]: /messages/by-id/a68fae7d-f45a-4c70-8d90-2a2cd3bdcfca@oss.nttdata.com
--
Fujii Masao
NTT DATA Japan Corporation
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
On 2025/07/04 16:29, Hayato Kuroda (Fujitsu) wrote:
If more GUCs were found which cannot be set during the bootstrap mode, how about
introducing a new flag like GUC_DEFAULT_WHILE_BOOTSTRAPPING for GUC variables?
If the flag is set all setting can be ignored when
IsBootstrapProcessingMode() = true.
If there are many GUCs that behave incorrectly during bootstrap,
a general mechanism like that might be worth considering. But if
only a few GUCs are affected, as I believe is the case,
then such a mechanism may be overkill.
As I remarked in the other thread, I don't like inventing a different
solution for each GUC. So if there are even two that need something
done, I think Hayato-san's idea has merit.
The core of the patch could be as little as
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 667df448732..43f289924e6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3464,6 +3464,15 @@ set_config_with_handle(const char *name, config_handle *handle,
return 0;
}
+ /*
+ * Certain GUCs aren't safe to enable during bootstrap mode. Silently
+ * ignore attempts to set them to non-default values.
+ */
+ if (unlikely(IsBootstrapProcessingMode()) &&
+ (record->flags & GUC_IGNORE_IN_BOOTSTRAP) &&
+ source != PGC_S_DEFAULT)
+ changeVal = false;
+
/*
* Check if the option can be set at this time. See guc.h for the precise
* rules.
If we went this way, we'd presumably revert 5a6c39b6d in favor
of marking track_commit_timestamp with this flag.
regards, tom lane
On 2025/07/05 0:30, Tom Lane wrote:
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
On 2025/07/04 16:29, Hayato Kuroda (Fujitsu) wrote:
If more GUCs were found which cannot be set during the bootstrap mode, how about
introducing a new flag like GUC_DEFAULT_WHILE_BOOTSTRAPPING for GUC variables?
If the flag is set all setting can be ignored when
IsBootstrapProcessingMode() = true.If there are many GUCs that behave incorrectly during bootstrap,
a general mechanism like that might be worth considering. But if
only a few GUCs are affected, as I believe is the case,
then such a mechanism may be overkill.As I remarked in the other thread, I don't like inventing a different
solution for each GUC. So if there are even two that need something
done, I think Hayato-san's idea has merit.The core of the patch could be as little as
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 667df448732..43f289924e6 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -3464,6 +3464,15 @@ set_config_with_handle(const char *name, config_handle *handle, return 0; }+ /* + * Certain GUCs aren't safe to enable during bootstrap mode. Silently + * ignore attempts to set them to non-default values. + */ + if (unlikely(IsBootstrapProcessingMode()) && + (record->flags & GUC_IGNORE_IN_BOOTSTRAP) && + source != PGC_S_DEFAULT) + changeVal = false; + /* * Check if the option can be set at this time. See guc.h for the precise * rules.
This code seems to assume that the processing mode is switched to bootstrap before
GUC parameters are processed. But is that actually the case?
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
On 2025/07/05 0:30, Tom Lane wrote:
As I remarked in the other thread, I don't like inventing a different
solution for each GUC. So if there are even two that need something
done, I think Hayato-san's idea has merit.
This code seems to assume that the processing mode is switched to bootstrap before
GUC parameters are processed. But is that actually the case?
Oh, good point. But there doesn't seem to be any ill effect from
making BootstrapModeMain set BootstrapProcessing a bit earlier.
Attached is a proof-of-concept that I've actually tested.
However, what I find with this POC is that
initdb -c transaction_timeout=10s
goes through fine, but (at least on my machine)
initdb -c transaction_timeout=1
yields
...
running bootstrap script ... ok
performing post-bootstrap initialization ... 2025-07-04 13:08:04.225 EDT [261836] FATAL: terminating connection due to transaction timeout
child process exited with exit code 1
because 1ms is not enough time to complete the post-bootstrap run.
I would argue that that's pilot error and we did exactly what the
user demanded, but is there anyone who wants to say that we should
suppress such GUCs during post-bootstrap too?
regards, tom lane
Attachments:
poc-survive-transaction_timeout-in-initdb.patchtext/x-diff; charset=us-ascii; name=poc-survive-transaction_timeout-in-initdb.patchDownload+12-3
On Fri, Jul 04, 2025 at 11:30:17AM -0400, Tom Lane wrote:
As I remarked in the other thread, I don't like inventing a different
solution for each GUC. So if there are even two that need something
done, I think Hayato-san's idea has merit.+ /* + * Certain GUCs aren't safe to enable during bootstrap mode. Silently + * ignore attempts to set them to non-default values. + */ + if (unlikely(IsBootstrapProcessingMode()) && + (record->flags & GUC_IGNORE_IN_BOOTSTRAP) && + source != PGC_S_DEFAULT) + changeVal = false; + /* * Check if the option can be set at this time. See guc.h for the precise * rules.
This is assuming that the default value assigned to a GUC will always
take the right decision in the bootstrap case, which is perhaps OK
anyway in most cases, or we would know about that during initdb. I'd
be OK with something among these lines, yes. I have doubts if we can
really safely place the initial GUC loading to happen after the
postmaster is switched to bootstrap mode, and if it's a viable
strategy in the long-term..
If we went this way, we'd presumably revert 5a6c39b6d in favor
of marking track_commit_timestamp with this flag.
Makes sense, on HEAD.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
This is assuming that the default value assigned to a GUC will always
take the right decision in the bootstrap case, which is perhaps OK
anyway in most cases, or we would know about that during initdb.
Yeah, I've been wondering about whether the code ought to accept
source == PGC_S_DYNAMIC_DEFAULT. It doesn't matter until/unless
we need to set this flag on a GUC that has code to compute a
dynamic default, so any decision we made right now would be made
in a vacuum ... but perhaps the right guess is to allow it.
If we went this way, we'd presumably revert 5a6c39b6d in favor
of marking track_commit_timestamp with this flag.
Makes sense, on HEAD.
Well, you back-patched 5a6c39b6d, so it's not clear to me why
we wouldn't want to back-patch something to fix any other GUC
suffering from a comparable problem. I don't see that adding
another possible GUC flag is an ABI break, especially when it's
a flag that no extension could have a need to set.
regards, tom lane
On 2025/07/05 2:17, Tom Lane wrote:
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
On 2025/07/05 0:30, Tom Lane wrote:
As I remarked in the other thread, I don't like inventing a different
solution for each GUC. So if there are even two that need something
done, I think Hayato-san's idea has merit.This code seems to assume that the processing mode is switched to bootstrap before
GUC parameters are processed. But is that actually the case?Oh, good point. But there doesn't seem to be any ill effect from
making BootstrapModeMain set BootstrapProcessing a bit earlier.
Maybe. But I noticed that your patch also moves the line "IgnoreSystemIndexes = true;"
earlier. Why did you make this change?
This could cause initdb to fail with a PANIC error when run with ignore_system_indexes=off,
like this:
$ initdb -D data -c ignore_system_indexes=off
...
FATAL: could not open relation with OID 2703
PANIC: cannot abort transaction 1, it was already committed
So perhaps "IgnoreSystemIndexes = true;" should be placed after GUCs are processed?
Or GUC ignore_system_indexes also should be treated in the same way
as transaction_timeout?
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
On 2025/07/05 2:17, Tom Lane wrote:
Oh, good point. But there doesn't seem to be any ill effect from
making BootstrapModeMain set BootstrapProcessing a bit earlier.
Maybe. But I noticed that your patch also moves the line "IgnoreSystemIndexes = true;"
earlier. Why did you make this change?
It just seemed to go with the bootstrap-mode setting. But your
example shows differently:
This could cause initdb to fail with a PANIC error when run with ignore_system_indexes=off,
like this:
$ initdb -D data -c ignore_system_indexes=off
...
FATAL: could not open relation with OID 2703
PANIC: cannot abort transaction 1, it was already committed
So perhaps "IgnoreSystemIndexes = true;" should be placed after GUCs are processed?
Yeah, we should do it like that (and probably also have a comment...)
Or GUC ignore_system_indexes also should be treated in the same way
as transaction_timeout?
Yes, I'd say we ought to mark that GUC as don't-accept-in-bootstrap
too. I've not done any research about what other GUCs can break
initdb, but now I'm starting to suspect there are several.
BTW, I now realize that this is only an issue starting from v16.
Before that initdb didn't have a -c switch, so there was not a
way for people to shove random settings into it.
regards, tom lane