Add checkpoint and redo LSN to LogCheckpointEnd log message
Hi,
It is useful (for debugging purposes) if the checkpoint end message
has the checkpoint LSN and REDO LSN [1]2021-12-23 14:58:54.694 UTC [1965649] LOG: checkpoint starting: shutdown immediate 2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB; LSN=0/14D0AD0, REDO LSN=0/14D0AD0. It gives more context while
analyzing checkpoint-related issues. The pg_controldata gives the last
checkpoint LSN and REDO LSN, but having this info alongside the log
message helps analyze issues that happened previously, connect the
dots and identify the root cause.
Attaching a small patch herewith. Thoughts?
[1]: 2021-12-23 14:58:54.694 UTC [1965649] LOG: checkpoint starting: shutdown immediate 2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB; LSN=0/14D0AD0, REDO LSN=0/14D0AD0
2021-12-23 14:58:54.694 UTC [1965649] LOG: checkpoint starting:
shutdown immediate
2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote
0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB;
LSN=0/14D0AD0, REDO LSN=0/14D0AD0
Regards,
Bharath Rupireddy.
Attachments:
v1-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchapplication/octet-stream; name=v1-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchDownload+10-5
At Thu, 23 Dec 2021 20:35:54 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
Hi,
It is useful (for debugging purposes) if the checkpoint end message
has the checkpoint LSN and REDO LSN [1]. It gives more context while
analyzing checkpoint-related issues. The pg_controldata gives the last
checkpoint LSN and REDO LSN, but having this info alongside the log
message helps analyze issues that happened previously, connect the
dots and identify the root cause.Attaching a small patch herewith. Thoughts?
A big +1 from me. I thought about proposing the same in the past.
[1]
2021-12-23 14:58:54.694 UTC [1965649] LOG: checkpoint starting:
shutdown immediate
2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote
0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB;
LSN=0/14D0AD0, REDO LSN=0/14D0AD0
I thougt about something like the following, but your proposal may be
clearer.
WAL range=[0/14D0340, 0/14D0AD0]
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Dec 24, 2021 at 11:21 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Thu, 23 Dec 2021 20:35:54 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
Hi,
It is useful (for debugging purposes) if the checkpoint end message
has the checkpoint LSN and REDO LSN [1]. It gives more context while
analyzing checkpoint-related issues. The pg_controldata gives the last
checkpoint LSN and REDO LSN, but having this info alongside the log
message helps analyze issues that happened previously, connect the
dots and identify the root cause.Attaching a small patch herewith. Thoughts?
A big +1 from me. I thought about proposing the same in the past.
Thanks for the review. I've added a CF entry to not lose track -
https://commitfest.postgresql.org/36/3474/.
[1]
2021-12-23 14:58:54.694 UTC [1965649] LOG: checkpoint starting:
shutdown immediate
2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote
0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB;
LSN=0/14D0AD0, REDO LSN=0/14D0AD0I thougt about something like the following, but your proposal may be
clearer.WAL range=[0/14D0340, 0/14D0AD0]
Yeah the proposed in the v1 is clear saying checkpoint/restartpoint
LSN and REDO LSN.
Regards,
Bharath Rupireddy.
On Fri, Dec 24, 2021 at 02:51:34PM +0900, Kyotaro Horiguchi wrote:
I thougt about something like the following, but your proposal may be
clearer.
+ "LSN=%X/%X, REDO LSN=%X/%X",
This could be rather confusing for the average user, even if I agree
that this is some information that only an advanced user could
understand. Could it be possible to define those fields in a more
deterministic way? For one, it is hard to understand the relationship
between both fields without looking at the code, particulary if both
share the same value. It is at least rather.. Well, mostly, easy to
guess what each other field means in this context, which is not the
case of what you are proposing here. One idea could be use also
"start point" for REDO, for example.
--
Michael
On Fri, Dec 24, 2021 at 5:42 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Dec 24, 2021 at 02:51:34PM +0900, Kyotaro Horiguchi wrote:
I thougt about something like the following, but your proposal may be
clearer.+ "LSN=%X/%X, REDO LSN=%X/%X",
This could be rather confusing for the average user, even if I agree
that this is some information that only an advanced user could
understand. Could it be possible to define those fields in a more
deterministic way? For one, it is hard to understand the relationship
between both fields without looking at the code, particulary if both
share the same value. It is at least rather.. Well, mostly, easy to
guess what each other field means in this context, which is not the
case of what you are proposing here. One idea could be use also
"start point" for REDO, for example.
How about "location=%X/%X, REDO start location=%X/%X"? The entire log
message can look like below:
2021-12-24 12:20:19.140 UTC [1977834] LOG: checkpoint
complete:location=%X/%X, REDO start location=%X/%X; wrote 7 buffers
(0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s,
sync=0.007 s, total=0.192 s; sync files=5, longest=0.005 s,
average=0.002 s; distance=293 kB, estimate=56584 kB
Another variant:
2021-12-24 12:20:19.140 UTC [1977834] LOG: checkpoint completed at
location=%X/%X with REDO start location=%X/%X: wrote 7 buffers (0.0%);
0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.007
s, total=0.192 s; sync files=5, longest=0.005 s, average=0.002 s;
distance=293 kB, estimate=56584 kB
2021-12-24 12:20:19.140 UTC [1977834] LOG: restartpoint completed at
location=%X/%X with REDO start location=%X/%X: wrote 7 buffers (0.0%);
0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.007
s, total=0.192 s; sync files=5, longest=0.005 s, average=0.002 s;
distance=293 kB, estimate=56584 kB
Thoughts?
Regards,
Bharath Rupireddy.
On Fri, Dec 24, 2021 at 5:54 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, Dec 24, 2021 at 5:42 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Dec 24, 2021 at 02:51:34PM +0900, Kyotaro Horiguchi wrote:
I thougt about something like the following, but your proposal may be
clearer.+ "LSN=%X/%X, REDO LSN=%X/%X",
This could be rather confusing for the average user, even if I agree
that this is some information that only an advanced user could
understand. Could it be possible to define those fields in a more
deterministic way? For one, it is hard to understand the relationship
between both fields without looking at the code, particulary if both
share the same value. It is at least rather.. Well, mostly, easy to
guess what each other field means in this context, which is not the
case of what you are proposing here. One idea could be use also
"start point" for REDO, for example.How about "location=%X/%X, REDO start location=%X/%X"? The entire log
message can look like below:2021-12-24 12:20:19.140 UTC [1977834] LOG: checkpoint
complete:location=%X/%X, REDO start location=%X/%X; wrote 7 buffers
(0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s,
sync=0.007 s, total=0.192 s; sync files=5, longest=0.005 s,
average=0.002 s; distance=293 kB, estimate=56584 kBAnother variant:
2021-12-24 12:20:19.140 UTC [1977834] LOG: checkpoint completed at
location=%X/%X with REDO start location=%X/%X: wrote 7 buffers (0.0%);
0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.007
s, total=0.192 s; sync files=5, longest=0.005 s, average=0.002 s;
distance=293 kB, estimate=56584 kB
2021-12-24 12:20:19.140 UTC [1977834] LOG: restartpoint completed at
location=%X/%X with REDO start location=%X/%X: wrote 7 buffers (0.0%);
0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.007
s, total=0.192 s; sync files=5, longest=0.005 s, average=0.002 s;
distance=293 kB, estimate=56584 kB
Here are the 2 patches.
one(v2-1-XXX.patch) adding the info as:
2021-12-28 02:44:34.870 UTC [2384386] LOG: checkpoint complete:
location=0/1B03040, REDO start location=0/1B03008; wrote 466 buffers
(2.8%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.014 s,
sync=0.038 s, total=0.072 s; sync files=21, longest=0.022 s,
average=0.002 s; distance=6346 kB, estimate=6346 kB
another(v2-2-XXX.patch) adding the info as:
2021-12-28 02:52:24.464 UTC [2394396] LOG: checkpoint completed at
location=0/212FFC8 with REDO start location=0/212FF90: wrote 451
buffers (2.8%); 0 WAL file(s) added, 0 removed, 1 recycled;
write=0.012 s, sync=0.032 s, total=0.071 s; sync files=6,
longest=0.022 s, average=0.006 s; distance=6272 kB, estimate=6272 kB
attaching v1-0001-XXX from the initial mail again just for the sake of
completion:
2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote
0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB;
LSN=0/14D0AD0, REDO LSN=0/14D0AD0
Thoughts?
Regards,
Bharath Rupireddy.
Attachments:
v2-1-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchapplication/octet-stream; name=v2-1-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchDownload+8-3
v2-2-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchapplication/octet-stream; name=v2-2-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchDownload+8-3
v1-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchapplication/octet-stream; name=v1-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchDownload+10-5
At Tue, 28 Dec 2021 08:26:28 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
On Fri, Dec 24, 2021 at 5:54 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:On Fri, Dec 24, 2021 at 5:42 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Dec 24, 2021 at 02:51:34PM +0900, Kyotaro Horiguchi wrote:
I thougt about something like the following, but your proposal may be
clearer.+ "LSN=%X/%X, REDO LSN=%X/%X",
This could be rather confusing for the average user, even if I agree
that this is some information that only an advanced user could
understand. Could it be possible to define those fields in a more
deterministic way? For one, it is hard to understand the relationship
between both fields without looking at the code, particulary if both
share the same value. It is at least rather.. Well, mostly, easy to
guess what each other field means in this context, which is not the
case of what you are proposing here. One idea could be use also
"start point" for REDO, for example.How about "location=%X/%X, REDO start location=%X/%X"? The entire log
message can look like below:
..
Thoughts?
It seems to me "LSN" or just "location" is more confusing or
mysterious than "REDO LSN" for the average user. If we want to avoid
being technically too detailed, we would use just "start LSN=%X/%X,
end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]"..
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi,
On Tue, Dec 28, 2021 at 10:56 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
attaching v1-0001-XXX from the initial mail again just for the sake of
completion:
Unfortunately this breaks the cfbot as it tries to apply this patch
too: http://cfbot.cputube.org/patch_36_3474.log.
For this kind of situation I think that the usual solution is to use a
.txt extension to make sure that the cfbot won't try to apply it.
Julien Rouhaud <rjuju123@gmail.com> writes:
For this kind of situation I think that the usual solution is to use a
.txt extension to make sure that the cfbot won't try to apply it.
Yeah ... this has come up before. Is there a documented way to
attach files that the cfbot will ignore?
Two specific scenarios seem to be interesting:
1. You are attaching patch(es) plus some non-patch files
2. You are attaching some random files, and would like to not
displace the cfbot's idea of the latest patchset.
I don't know exactly how to do either of those.
regards, tom lane
On Wed, Jan 12, 2022 at 01:19:22AM -0500, Tom Lane wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
For this kind of situation I think that the usual solution is to use a
.txt extension to make sure that the cfbot won't try to apply it.Yeah ... this has come up before. Is there a documented way to
attach files that the cfbot will ignore?
Not that I know of unfortunately. I think the apply part is done by
https://github.com/macdice/cfbot/blob/master/cfbot_patchburner_chroot_ctl.sh#L93-L120,
which seems reasonable. So basically any extension outside of those could be
used, excluding of course anything clearly suspicious that would be rejected by
many email providers.
Two specific scenarios seem to be interesting:
1. You are attaching patch(es) plus some non-patch files
2. You are attaching some random files, and would like to not
displace the cfbot's idea of the latest patchset.
I'm assuming that someone wanting to send an additional patch to be applied on
top of the OP patchset is part of 2?
Julien Rouhaud <rjuju123@gmail.com> writes:
On Wed, Jan 12, 2022 at 01:19:22AM -0500, Tom Lane wrote:
2. You are attaching some random files, and would like to not
displace the cfbot's idea of the latest patchset.
I'm assuming that someone wanting to send an additional patch to be applied on
top of the OP patchset is part of 2?
AFAIK, if you're submitting a patch then you have to attach a complete
patchset, or the cfbot will be totally lost. Getting the bot to
understand incremental patches would be useful for sure ... but it's
outside the scope of what I'm asking for now, which is just clear
documentation of what the bot can do already.
regards, tom lane
On Wed, Jan 12, 2022 at 2:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
AFAIK, if you're submitting a patch then you have to attach a complete
patchset, or the cfbot will be totally lost. Getting the bot to
understand incremental patches would be useful for sure ... but it's
outside the scope of what I'm asking for now, which is just clear
documentation of what the bot can do already.
Right, but the use case I'm mentioning is a bit different: provide
another patch while *not* triggering the cfbot. I've seen cases in
the past where people want to share some code to the OP and it seems
reasonable to allow that without risking the trigger the cfbot, at
least not without the OP validating or adapting the changes.
On Wed, Jan 12, 2022 at 7:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
On Wed, Jan 12, 2022 at 01:19:22AM -0500, Tom Lane wrote:
2. You are attaching some random files, and would like to not
displace the cfbot's idea of the latest patchset.I'm assuming that someone wanting to send an additional patch to be applied on
top of the OP patchset is part of 2?AFAIK, if you're submitting a patch then you have to attach a complete
patchset, or the cfbot will be totally lost. Getting the bot to
understand incremental patches would be useful for sure ... but it's
outside the scope of what I'm asking for now, which is just clear
documentation of what the bot can do already.
By way of documentation, I've just now tried to answer these question
in the new FAQ at:
Thomas Munro <thomas.munro@gmail.com> writes:
By way of documentation, I've just now tried to answer these question
in the new FAQ at:
https://wiki.postgresql.org/wiki/Cfbot
Thanks!
regards, tom lane
On Wed, Jan 12, 2022 at 2:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:
By way of documentation, I've just now tried to answer these question
in the new FAQ at:
Great! Thanks a lot!
On Wed, Jan 12, 2022 at 11:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
On Tue, Dec 28, 2021 at 10:56 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:attaching v1-0001-XXX from the initial mail again just for the sake of
completion:Unfortunately this breaks the cfbot as it tries to apply this patch
too: http://cfbot.cputube.org/patch_36_3474.log.For this kind of situation I think that the usual solution is to use a
.txt extension to make sure that the cfbot won't try to apply it.
Thanks. IMO, the following format of logging is better, so attaching
the v2-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch as
.patch
2021-12-28 02:52:24.464 UTC [2394396] LOG: checkpoint completed at
location=0/212FFC8 with REDO start location=0/212FF90: wrote 451
buffers (2.8%); 0 WAL file(s) added, 0 removed, 1 recycled;
write=0.012 s, sync=0.032 s, total=0.071 s; sync files=6,
longest=0.022 s, average=0.006 s; distance=6272 kB, estimate=6272 kB
Others are attached as .txt files.
Regards,
Bharath Rupireddy.
Attachments:
v2-0002-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.txttext/plain; charset=US-ASCII; name=v2-0002-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.txtDownload+8-3
v2-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchapplication/octet-stream; name=v2-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchDownload+8-3
v2-0003-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.txttext/plain; charset=US-ASCII; name=v2-0003-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.txtDownload+8-3
On Thu, Jan 13, 2022 at 11:50 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Thanks. IMO, the following format of logging is better, so attaching
the v2-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch as
.patch2021-12-28 02:52:24.464 UTC [2394396] LOG: checkpoint completed at
location=0/212FFC8 with REDO start location=0/212FF90: wrote 451
buffers (2.8%); 0 WAL file(s) added, 0 removed, 1 recycled;
write=0.012 s, sync=0.032 s, total=0.071 s; sync files=6,
longest=0.022 s, average=0.006 s; distance=6272 kB, estimate=6272 kB
One of the test cases was failing with the above style of the log
message, changing "checkpoint complete" to "checkpoint completed at
location" doesn't seem to be a better idea. It looks like the users or
the log monitoring tools might be using the same text "checkpoint
complete", therefore I don't want to break that. Here's the v3 patch
that I think will work better. Please review.
Regards,
Bharath Rupireddy.
Attachments:
v3-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchapplication/x-patch; name=v3-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patchDownload+8-3
On 1/3/22, 5:52 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
It seems to me "LSN" or just "location" is more confusing or
mysterious than "REDO LSN" for the average user. If we want to avoid
being technically too detailed, we would use just "start LSN=%X/%X,
end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]"..
My first instinct was that this should stay aligned with
pg_controldata, but that would mean using "location=%X/%X, REDO
location=%X/%X," which doesn't seem terribly descriptive. IIUC the
"checkpoint location" is the LSN of the WAL record for the checkpoint,
and the "checkpoint's REDO location" is the LSN where checkpoint
creation began (i.e., what you must retain for crash recovery). My
vote is for "start=%X/%X, end=%X/%X."
Nathan
At Thu, 20 Jan 2022 00:36:32 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
On 1/3/22, 5:52 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
It seems to me "LSN" or just "location" is more confusing or
mysterious than "REDO LSN" for the average user. If we want to avoid
being technically too detailed, we would use just "start LSN=%X/%X,
end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]"..My first instinct was that this should stay aligned with
pg_controldata, but that would mean using "location=%X/%X, REDO
location=%X/%X," which doesn't seem terribly descriptive. IIUC the
"checkpoint location" is the LSN of the WAL record for the checkpoint,
and the "checkpoint's REDO location" is the LSN where checkpoint
creation began (i.e., what you must retain for crash recovery). My
vote is for "start=%X/%X, end=%X/%X."
+1. Works for me. %X/%X itself expresses it is an LSN.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Thu, Jan 20, 2022 at 6:06 AM Bossart, Nathan <bossartn@amazon.com> wrote:
On 1/3/22, 5:52 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
It seems to me "LSN" or just "location" is more confusing or
mysterious than "REDO LSN" for the average user. If we want to avoid
being technically too detailed, we would use just "start LSN=%X/%X,
end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]"..My first instinct was that this should stay aligned with
pg_controldata, but that would mean using "location=%X/%X, REDO
location=%X/%X," which doesn't seem terribly descriptive. IIUC the
"checkpoint location" is the LSN of the WAL record for the checkpoint,
and the "checkpoint's REDO location" is the LSN where checkpoint
creation began (i.e., what you must retain for crash recovery). My
vote is for "start=%X/%X, end=%X/%X."
I'm still not clear how the REDO location can be treated as a start
LSN? Can someone throw some light one what this checkpoint's REDO
location is?
Regards,
Bharath Rupireddy.