[PATCH] Teach pg_waldump to extract FPIs from the WAL

Started by David Christensenalmost 4 years ago66 messageshackers
Jump to latest
#1David Christensen
david.christensen@crunchydata.com

Hi -hackers,

Enclosed is a patch to allow extraction/saving of FPI from the WAL
stream via pg_waldump.

Description from the commit:

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not
exist. These images are subject to the same filtering rules as normal
display in pg_waldump, which
means that you can isolate the full page writes to a target relation,
among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things
somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

It's noteworthy that the raw images do not have the current LSN stored
with them in the WAL
stream (as would be true for on-heap versions of the blocks), nor
would the checksum be valid in
them (though WAL itself has checksums, so there is some protection
there). This patch chooses to
place the LSN and calculate the proper checksum (if non-zero in the
source image) in the outputted
block. (This could perhaps be a targetted flag if we decide we don't
always want this.)

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect`
suite of tools to perform detailed analysis on the pages in question,
based on historical
information, and may come in handy for forensics work.

Best,

David

Attachments:

v1-pg_waldump-save-fpi.patchapplication/octet-stream; name=v1-pg_waldump-save-fpi.patchDownload+188-2
#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Christensen (#1)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On 2022-Apr-22, David Christensen wrote:

Hi -hackers,

Enclosed is a patch to allow extraction/saving of FPI from the WAL
stream via pg_waldump.

I already wrote and posted a patch to do exactly this, and found it the
only way to fix a customer problem, so +1 on having this feature. I
haven't reviewed David's patch.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#3Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: David Christensen (#1)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Sat, 23 Apr 2022 at 00:51, David Christensen
<david.christensen@crunchydata.com> wrote:

Hi -hackers,

Enclosed is a patch to allow extraction/saving of FPI from the WAL
stream via pg_waldump.

Description from the commit:

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not
exist. These images are subject to the same filtering rules as normal
display in pg_waldump, which
means that you can isolate the full page writes to a target relation,
among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things

Regardless of my (lack of) opinion on the inclusion of this patch in
PG (I did not significantly review this patch); I noticed that you do
not yet identify the 'fork' of the FPI in the file name.

A lack of fork identifier in the exported file names would make
debugging much more difficult due to the relatively difficult to
identify data contained in !main forks, so I think this oversight
should be fixed, be it through `_forkname` postfix like normal fork
segments, or be it through `.<forknum>` numerical in- or postfix in
the filename.

-Matthias

#4David Christensen
david.christensen@crunchydata.com
In reply to: Matthias van de Meent (#3)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Sat, Apr 23, 2022 at 9:49 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Regardless of my (lack of) opinion on the inclusion of this patch in
PG (I did not significantly review this patch); I noticed that you do
not yet identify the 'fork' of the FPI in the file name.

A lack of fork identifier in the exported file names would make
debugging much more difficult due to the relatively difficult to
identify data contained in !main forks, so I think this oversight
should be fixed, be it through `_forkname` postfix like normal fork
segments, or be it through `.<forknum>` numerical in- or postfix in
the filename.

-Matthias

Hi Matthias, great point. Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Best,

David

Attachments:

v2-pg_waldump-save-fpi.patchapplication/octet-stream; name=v2-pg_waldump-save-fpi.patchDownload+208-2
#5Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#4)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:

Hi Matthias, great point. Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Like Alvaro, I have seen cases where this would have been really
handy. So +1 from me, as well, to have more tooling like what you are
proposing. Fine for me to use one file for each block with a name
like what you are suggesting for each one of them.

+       /* we accept an empty existing directory */
+       if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+       {
I don't think that there is any need to rely on a new logic if there
is already some code in place able to do the same work.  See
verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
that relies on pg_check_dir().  I think that you'd better rely at
least on what pgcheckdir.c offers.
+       {"raw-fpi", required_argument, NULL, 'W'},
I think that we'd better rename this option.  "fpi", that is not used
much in the user-facing docs, is additionally not adapted when we have
an other option called -w/--fullpage.  I can think of
--save-fullpage.
+                   PageSetLSN(page, record->ReadRecPtr);
+                   /* if checksum field is non-zero then we have checksums enabled,
+                    * so recalculate the checksum with new LSN (yes, this is a hack)
+                    */
Yeah, that looks like a hack, but putting in place a page on a cluster
that has checksums enabled would be more annoying with
zero_damaged_pages enabled if we don't do that, so that's fine by me
as-is.  Perhaps you should mention that FPWs don't have their
pd_checksum updated when written.
+       /* we will now extract the fullpage image from the XLogRecord and save
+        * it to a calculated filename */
The format of this comment is incorrect.
+            <entry>The LSN of the record with this block, formatted
+            as <literal>%08x-%08X</literal> instead of the
+            conventional <literal>%X/%X</literal> due to filesystem naming
+            limits</entry>
The last part of the sentence about %X/%X could just be removed.  That
could be confusing, at worse.

+ PageSetLSN(page, record->ReadRecPtr);
Why is pd_lsn set?

git diff --check complains a bit.

This stuff should include some tests. With --end, the tests can
be cheap.
--
Michael

#6Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#5)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

Hi,

On 4/25/22 8:11 AM, Michael Paquier wrote:

On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:

Hi Matthias, great point. Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Like Alvaro, I have seen cases where this would have been really
handy. So +1 from me, as well, to have more tooling like what you are
proposing.

+1 on the idea.

FWIW, there is an extension doing this [1]https://github.com/bdrouvot/pg_wal_fp_extract but having the feature
included in pg_waldump would be great.

Bertrand

[1]: https://github.com/bdrouvot/pg_wal_fp_extract

#7Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#1)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Sat, Apr 23, 2022 at 4:21 AM David Christensen
<david.christensen@crunchydata.com> wrote:

Hi -hackers,

Enclosed is a patch to allow extraction/saving of FPI from the WAL
stream via pg_waldump.

Description from the commit:

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not
exist. These images are subject to the same filtering rules as normal
display in pg_waldump, which
means that you can isolate the full page writes to a target relation,
among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things
somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

It's noteworthy that the raw images do not have the current LSN stored
with them in the WAL
stream (as would be true for on-heap versions of the blocks), nor
would the checksum be valid in
them (though WAL itself has checksums, so there is some protection
there). This patch chooses to
place the LSN and calculate the proper checksum (if non-zero in the
source image) in the outputted
block. (This could perhaps be a targetted flag if we decide we don't
always want this.)

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect`
suite of tools to perform detailed analysis on the pages in question,
based on historical
information, and may come in handy for forensics work.

Thanks for working on this. I'm just thinking if we can use these FPIs
to repair the corrupted pages? I would like to understand more
detailed usages of the FPIs other than inspecting with pageinspect.

Given that others have realistic use-cases (of course I would like to
know more about those), +1 for the idea. However, I would suggest
adding a function to extract raw FPI data to the pg_walinspect
extension that got recently committed in PG 15, the out of which can
directly be fed to pageinspect functions or

Few comments:
1) I think it's good to mention the stored file name format.
+ printf(_("  -W, --raw-fpi=path     save found full page images to
given path\n"));
2)
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
+
+ if (XLogRecHasBlockImage(record, block_id))

I think we need XLogRecHasBlockRef to be true to check
XLogRecHasBlockImage otherwise, we will see some build farms failing,
recently I've seen this failure for pg_walinspect..

for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
{
if (!XLogRecHasBlockRef(record, block_id))
continue;

        if (XLogRecHasBlockImage(record, block_id))
            *fpi_len += XLogRecGetBlock(record, block_id)->bimg_len;
    }
3) Please correct the commenting format:
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
4) Usually we start errors with lower case letters "could not ....."
+ pg_fatal("Couldn't open file for output: %s", filename);
+ pg_fatal("Couldn't write out complete FPI to file: %s", filename);
And the variable name too:
+ FILE *OPF;
5) Not sure how the FPIs of TOASTed tables get stored, but it would be
good to check.
6) Good to specify the known usages of FPIs in the documentation.
7) Isn't it good to emit an error if RestoreBlockImage returns false?
+ if (RestoreBlockImage(record, block_id, page))
+ {
8) I think I don't mind if a non-empty directory is specified - IMO
better usability is this - if the directory is non-empty, just go add
the FPI files if FPI file exists just replace it, if the directory
isn't existing, create and write the FPI files.
+ /* we accept an empty existing directory */
+ if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+ {
 9) Instead of following:
+ if (XLogRecordHasFPW(xlogreader_state))
+ XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
I will just do this in XLogRecordSaveFPWs:
    for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
    {
        if (!XLogRecHasBlockRef(record, block_id))
            continue;

if (XLogRecHasBlockImage(record, block_id))
{

}
}
10) Along with pg_pwrite(), can we also fsync the files (of course
users can choose it optionally) so that the writes will be durable for
the OS crashes?

Regards,
Bharath Rupireddy.

#8David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#5)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:

Hi Matthias, great point. Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Like Alvaro, I have seen cases where this would have been really
handy. So +1 from me, as well, to have more tooling like what you are
proposing. Fine for me to use one file for each block with a name
like what you are suggesting for each one of them.

+       /* we accept an empty existing directory */
+       if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+       {
I don't think that there is any need to rely on a new logic if there
is already some code in place able to do the same work.  See
verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
that relies on pg_check_dir().  I think that you'd better rely at
least on what pgcheckdir.c offers.

Yeah, though I am tending towards what another user had suggested and
just accepting an existing directory rather than requiring it to be
empty, so thinking I might just skip this one. (Will review the file
you've pointed out to see if there is a relevant function though.)

+       {"raw-fpi", required_argument, NULL, 'W'},
I think that we'd better rename this option.  "fpi", that is not used
much in the user-facing docs, is additionally not adapted when we have
an other option called -w/--fullpage.  I can think of
--save-fullpage.

Makes sense.

+                   PageSetLSN(page, record->ReadRecPtr);
+                   /* if checksum field is non-zero then we have checksums enabled,
+                    * so recalculate the checksum with new LSN (yes, this is a hack)
+                    */
Yeah, that looks like a hack, but putting in place a page on a cluster
that has checksums enabled would be more annoying with
zero_damaged_pages enabled if we don't do that, so that's fine by me
as-is.  Perhaps you should mention that FPWs don't have their
pd_checksum updated when written.

Can make a mention; you thinking just a comment in the code is
sufficient, or want there to be user-visible docs as well?

+       /* we will now extract the fullpage image from the XLogRecord and save
+        * it to a calculated filename */
The format of this comment is incorrect.
+            <entry>The LSN of the record with this block, formatted
+            as <literal>%08x-%08X</literal> instead of the
+            conventional <literal>%X/%X</literal> due to filesystem naming
+            limits</entry>
The last part of the sentence about %X/%X could just be removed.  That
could be confusing, at worse.

Makes sense.

+ PageSetLSN(page, record->ReadRecPtr);
Why is pd_lsn set?

This was to make the page as extracted equivalent to the effect of
applying the WAL record block on replay (the LSN and checksum both);
since recovery calls this function to mark the LSN where the page came
from this is why I did this as well. (I do see perhaps a case for
--raw output that doesn't munge the page whatsoever, just made
comparisons easier.)

git diff --check complains a bit.

Can look into this.

This stuff should include some tests. With --end, the tests can
be cheap.

Yeah, makes sense, will include some in the next version.

David

#9David Christensen
david.christensen@crunchydata.com
In reply to: Bertrand Drouvot (#6)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 2:00 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi,

On 4/25/22 8:11 AM, Michael Paquier wrote:

On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:

Hi Matthias, great point. Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Like Alvaro, I have seen cases where this would have been really
handy. So +1 from me, as well, to have more tooling like what you are
proposing.

+1 on the idea.

FWIW, there is an extension doing this [1] but having the feature
included in pg_waldump would be great.

Cool, glad to see there is some interest; definitely some overlap in
forensics inside and outside the database both, as there are different
use cases for both.

David

#10David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#7)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks for working on this. I'm just thinking if we can use these FPIs
to repair the corrupted pages? I would like to understand more
detailed usages of the FPIs other than inspecting with pageinspect.

My main use case was for being able to look at potential corruption,
either in the WAL stream, on heap, or in tools associated with the WAL
stream. I suppose you could use the page images to replace corrupted
on-disk pages (and in fact I think I've heard of a tool or two that
try to do that), though don't know that I consider this the primary
purpose (and having toast tables and the list, as well as clog would
make it potentially hard to just drop-in a page version without
issues). Might help in extreme situations though.

Given that others have realistic use-cases (of course I would like to
know more about those), +1 for the idea. However, I would suggest
adding a function to extract raw FPI data to the pg_walinspect
extension that got recently committed in PG 15, the out of which can
directly be fed to pageinspect functions or

Yeah, makes sense to have some overlap here; will review what is there
and see if there is some shared code base we can utilize. (ISTR some
work towards getting these two tools using more of the same code, and
this seems like another such instance.)

Few comments:
1) I think it's good to mention the stored file name format.
+ printf(_(" -W, --raw-fpi=path save found full page images to
given path\n"));

+1, though I've also thought there could be uses to have multiple
possible output formats here (most immediately, there may be cases
where we want *each* FPI for a block vs the *latest*, so files name
with/without the LSN component seem the easiest way forward here).
That would introduce some additional complexity though, so might need
to see if others think that makes any sense.

2)
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
+
+ if (XLogRecHasBlockImage(record, block_id))

I think we need XLogRecHasBlockRef to be true to check
XLogRecHasBlockImage otherwise, we will see some build farms failing,
recently I've seen this failure for pg_walinspect..

for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
{
if (!XLogRecHasBlockRef(record, block_id))
continue;

if (XLogRecHasBlockImage(record, block_id))
*fpi_len += XLogRecGetBlock(record, block_id)->bimg_len;
}

Good point; my previous patch that got committed here (127aea2a65)
probably also needed this treatment.

3) Please correct the commenting format:
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */

Ack.

4) Usually we start errors with lower case letters "could not ....."
+ pg_fatal("Couldn't open file for output: %s", filename);
+ pg_fatal("Couldn't write out complete FPI to file: %s", filename);
And the variable name too:
+ FILE *OPF;

Ack.

5) Not sure how the FPIs of TOASTed tables get stored, but it would be
good to check.

What would be different here? Are there issues you can think of, or
just more from the pageinspect side of things?

6) Good to specify the known usages of FPIs in the documentation.

Ack. Prob good to get additional info/use cases from others, as mine
is fairly short. :-)

7) Isn't it good to emit an error if RestoreBlockImage returns false?
+ if (RestoreBlockImage(record, block_id, page))
+ {

Ack.

8) I think I don't mind if a non-empty directory is specified - IMO
better usability is this - if the directory is non-empty, just go add
the FPI files if FPI file exists just replace it, if the directory
isn't existing, create and write the FPI files.
+ /* we accept an empty existing directory */
+ if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+ {

Agreed; was mainly trying to prevent accidental expansion inside
`pg_wal` when an earlier version of the patch implied `.` as the
current dir with an optional path, but I've since made the path
non-optional and agree that this is unnecessarily restrictive.

9) Instead of following:
+ if (XLogRecordHasFPW(xlogreader_state))
+ XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
I will just do this in XLogRecordSaveFPWs:
for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
{
if (!XLogRecHasBlockRef(record, block_id))
continue;

if (XLogRecHasBlockImage(record, block_id))
{

}
}

Yeah, a little redundant.

10) Along with pg_pwrite(), can we also fsync the files (of course
users can choose it optionally) so that the writes will be durable for
the OS crashes?

Can add; you thinking a separate flag to disable this with default true?

Best,

David

#11Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#10)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 10:24:52AM -0500, David Christensen wrote:

On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks for working on this. I'm just thinking if we can use these FPIs
to repair the corrupted pages? I would like to understand more
detailed usages of the FPIs other than inspecting with pageinspect.

My main use case was for being able to look at potential corruption,
either in the WAL stream, on heap, or in tools associated with the WAL
stream. I suppose you could use the page images to replace corrupted
on-disk pages (and in fact I think I've heard of a tool or two that
try to do that), though don't know that I consider this the primary
purpose (and having toast tables and the list, as well as clog would
make it potentially hard to just drop-in a page version without
issues). Might help in extreme situations though.

You could do a bunch of things with those images, even make things
worse if you are not careful enough.

10) Along with pg_pwrite(), can we also fsync the files (of course
users can choose it optionally) so that the writes will be durable for
the OS crashes?

Can add; you thinking a separate flag to disable this with default true?

We expect data generated by tools like pg_dump, pg_receivewal
(depending on the use --synchronous) or pg_basebackup to be consistent
when we exit from the call. FWIW, flushing this data does not seem
like a strong requirement for something aimed at being used page-level
chirurgy or lookups, because the WAL segments should still be around
even if the host holding the archives is unplugged.
--
Michael

#12Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#8)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 10:11:10AM -0500, David Christensen wrote:

On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier <michael@paquier.xyz> wrote:

I don't think that there is any need to rely on a new logic if there
is already some code in place able to do the same work. See
verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
that relies on pg_check_dir(). I think that you'd better rely at
least on what pgcheckdir.c offers.

Yeah, though I am tending towards what another user had suggested and
just accepting an existing directory rather than requiring it to be
empty, so thinking I might just skip this one. (Will review the file
you've pointed out to see if there is a relevant function though.)

OK. FWIW, pg_check_dir() is used in initdb and pg_basebackup because
these care about the behavior to use when specifying a target path.
You could reuse it, but use a different policy depending on its
returned result for the needs of what you see fit in this case.

+                   PageSetLSN(page, record->ReadRecPtr);
+                   /* if checksum field is non-zero then we have checksums enabled,
+                    * so recalculate the checksum with new LSN (yes, this is a hack)
+                    */
Yeah, that looks like a hack, but putting in place a page on a cluster
that has checksums enabled would be more annoying with
zero_damaged_pages enabled if we don't do that, so that's fine by me
as-is.  Perhaps you should mention that FPWs don't have their
pd_checksum updated when written.

Can make a mention; you thinking just a comment in the code is
sufficient, or want there to be user-visible docs as well?

I was thinking about a comment, at least.

This was to make the page as extracted equivalent to the effect of
applying the WAL record block on replay (the LSN and checksum both);
since recovery calls this function to mark the LSN where the page came
from this is why I did this as well. (I do see perhaps a case for
--raw output that doesn't munge the page whatsoever, just made
comparisons easier.)

Hm. Okay. The argument goes both ways, I guess, depending on what we
want to do with those raw pages. Still you should not need pd_lsn if
the point is to be able to stick the pages back in place to attempt to
get back as much data as possible when loading it back to the shared
buffers?
--
Michael

#13David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#12)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 9:54 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Apr 25, 2022 at 10:11:10AM -0500, David Christensen wrote:

On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier <michael@paquier.xyz> wrote:

I don't think that there is any need to rely on a new logic if there
is already some code in place able to do the same work. See
verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
that relies on pg_check_dir(). I think that you'd better rely at
least on what pgcheckdir.c offers.

Yeah, though I am tending towards what another user had suggested and
just accepting an existing directory rather than requiring it to be
empty, so thinking I might just skip this one. (Will review the file
you've pointed out to see if there is a relevant function though.)

OK. FWIW, pg_check_dir() is used in initdb and pg_basebackup because
these care about the behavior to use when specifying a target path.
You could reuse it, but use a different policy depending on its
returned result for the needs of what you see fit in this case.

I have a new version of the patch (pending tests) that uses
pg_check_dir's return value to handle things appropriately, so at
least some code reuse now. It did end up simplifying a lot.

+                   PageSetLSN(page, record->ReadRecPtr);
+                   /* if checksum field is non-zero then we have checksums enabled,
+                    * so recalculate the checksum with new LSN (yes, this is a hack)
+                    */
Yeah, that looks like a hack, but putting in place a page on a cluster
that has checksums enabled would be more annoying with
zero_damaged_pages enabled if we don't do that, so that's fine by me
as-is.  Perhaps you should mention that FPWs don't have their
pd_checksum updated when written.

Can make a mention; you thinking just a comment in the code is
sufficient, or want there to be user-visible docs as well?

I was thinking about a comment, at least.

New patch version has significantly more comments.

This was to make the page as extracted equivalent to the effect of
applying the WAL record block on replay (the LSN and checksum both);
since recovery calls this function to mark the LSN where the page came
from this is why I did this as well. (I do see perhaps a case for
--raw output that doesn't munge the page whatsoever, just made
comparisons easier.)

Hm. Okay. The argument goes both ways, I guess, depending on what we
want to do with those raw pages. Still you should not need pd_lsn if
the point is to be able to stick the pages back in place to attempt to
get back as much data as possible when loading it back to the shared
buffers?

Yeah, I can see that too; I think there's at least enough of an
argument for a flag to apply the fixups or just extract only the raw
page pre-modification. Not sure which should be the "default"
behavior; either `--raw` or `--fixup-metadata` or something could
work. (Naming suggestions welcomed.)

David

#14David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#11)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 9:42 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Apr 25, 2022 at 10:24:52AM -0500, David Christensen wrote:

On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks for working on this. I'm just thinking if we can use these FPIs
to repair the corrupted pages? I would like to understand more
detailed usages of the FPIs other than inspecting with pageinspect.

My main use case was for being able to look at potential corruption,
either in the WAL stream, on heap, or in tools associated with the WAL
stream. I suppose you could use the page images to replace corrupted
on-disk pages (and in fact I think I've heard of a tool or two that
try to do that), though don't know that I consider this the primary
purpose (and having toast tables and the list, as well as clog would
make it potentially hard to just drop-in a page version without
issues). Might help in extreme situations though.

You could do a bunch of things with those images, even make things
worse if you are not careful enough.

True. :-) This does seem like a tool geared towards "expert mode", so
maybe we just assume if you need it you know what you're doing?

10) Along with pg_pwrite(), can we also fsync the files (of course
users can choose it optionally) so that the writes will be durable for
the OS crashes?

Can add; you thinking a separate flag to disable this with default true?

We expect data generated by tools like pg_dump, pg_receivewal
(depending on the use --synchronous) or pg_basebackup to be consistent
when we exit from the call. FWIW, flushing this data does not seem
like a strong requirement for something aimed at being used page-level
chirurgy or lookups, because the WAL segments should still be around
even if the host holding the archives is unplugged.

I have added the fsync to the latest path (forthcoming), but I have no
strong preferences here as to the correct/expected behavior.

Best,

David

#15Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#14)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Tue, Apr 26, 2022 at 01:15:05PM -0500, David Christensen wrote:

True. :-) This does seem like a tool geared towards "expert mode", so
maybe we just assume if you need it you know what you're doing?

This is definitely an expert mode toy.
--
Michael

#16David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#13)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

Enclosed is v3 of this patch; this adds two modes for this feature,
one with the raw page `--save-fullpage/-W` and one with the
LSN+checksum fixups `--save-fullpage-fixup/-X`.

I've added at least some basic sanity-checking of the underlying
feature, as well as run the test file and the changes to pg_waldump.c
through pgindent/perltidy to make them adhere to project standards.
Threw in a rebase as well.

Would appreciate any additional feedback here.

Best,

David

Attachments:

v3-pg_waldump-save-fpi.patchapplication/octet-stream; name=v3-pg_waldump-save-fpi.patchDownload+379-26
#17David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#16)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

...and pushing a couple fixups pointed out by cfbot, so here's v4.

On Mon, May 2, 2022 at 8:42 AM David Christensen
<david.christensen@crunchydata.com> wrote:

Show quoted text

Enclosed is v3 of this patch; this adds two modes for this feature,
one with the raw page `--save-fullpage/-W` and one with the
LSN+checksum fixups `--save-fullpage-fixup/-X`.

I've added at least some basic sanity-checking of the underlying
feature, as well as run the test file and the changes to pg_waldump.c
through pgindent/perltidy to make them adhere to project standards.
Threw in a rebase as well.

Would appreciate any additional feedback here.

Best,

David

Attachments:

v4-pg_waldump-save-fpi.patchapplication/octet-stream; name=v4-pg_waldump-save-fpi.patchDownload+385-26
#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#15)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On 2022-Apr-27, Michael Paquier wrote:

On Tue, Apr 26, 2022 at 01:15:05PM -0500, David Christensen wrote:

True. :-) This does seem like a tool geared towards "expert mode", so
maybe we just assume if you need it you know what you're doing?

This is definitely an expert mode toy.

I remember Greg Mullane posted a tool that attempts to correct page CRC
mismatches[1]https://github.com/turnstep/pg_healer. This new tool might be useful to feed healing attempts,
too. (It's of course not in any way a *solution*, because the page
might have been modified by other WAL records since the last FPI, but it
could be a start towards building a solution that scoops page contents
from WAL.)

[1]: https://github.com/turnstep/pg_healer

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Subversion to GIT: the shortest path to happiness I've ever heard of
(Alexey Klyukin)

#19Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#18)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Tue, May 03, 2022 at 10:34:41AM +0200, Alvaro Herrera wrote:

I remember Greg Mullane posted a tool that attempts to correct page CRC
mismatches[1]. This new tool might be useful to feed healing attempts,
too. (It's of course not in any way a *solution*, because the page
might have been modified by other WAL records since the last FPI, but it
could be a start towards building a solution that scoops page contents
from WAL.)

[1] https://github.com/turnstep/pg_healer

Fun. Thanks for mentioning that.
--
Michael

#20Ian Lawrence Barwick
barwick@gmail.com
In reply to: David Christensen (#17)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022年5月3日(火) 8:45 David Christensen <david.christensen@crunchydata.com>:

...and pushing a couple fixups pointed out by cfbot, so here's v4.

Hi

cfbot reports the patch no longer applies [1]http://cfbot.cputube.org/patch_40_3628.log. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

[1]: http://cfbot.cputube.org/patch_40_3628.log

Thanks

Ian Barwick

#21Justin Pryzby
pryzby@telsasoft.com
In reply to: Ian Lawrence Barwick (#20)
#22David Christensen
david.christensen@crunchydata.com
In reply to: Justin Pryzby (#21)
#23Justin Pryzby
pryzby@telsasoft.com
In reply to: David Christensen (#22)
#24David Christensen
david.christensen@crunchydata.com
In reply to: Justin Pryzby (#23)
#25David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#24)
#26Justin Pryzby
pryzby@telsasoft.com
In reply to: David Christensen (#25)
#27David Christensen
david.christensen@crunchydata.com
In reply to: Justin Pryzby (#26)
#28David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#27)
#29kato-sho@fujitsu.com
kato-sho@fujitsu.com
In reply to: David Christensen (#28)
#30Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#28)
#31Justin Pryzby
pryzby@telsasoft.com
In reply to: Bharath Rupireddy (#30)
#32Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#31)
#33David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#30)
#34David Christensen
david.christensen@crunchydata.com
In reply to: Justin Pryzby (#31)
#35David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#34)
#36Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#33)
#37David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#36)
#38Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#37)
#39Justin Pryzby
pryzby@telsasoft.com
In reply to: David Christensen (#35)
#40David Christensen
david.christensen@crunchydata.com
In reply to: Justin Pryzby (#39)
#41David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#38)
#42David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#41)
#43Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#42)
#44David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#43)
#45David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#44)
#46Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#45)
#47David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#46)
#48Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#47)
#49David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#48)
#50Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#49)
#51David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#50)
#52Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#51)
#53Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#51)
#54David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#53)
#55David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#52)
#56David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#54)
#57Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#56)
#58Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#57)
#59Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#58)
#60David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#58)
#61Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#58)
#62Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#61)
#63Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#59)
#64Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#59)
#65Andrey Borodin
amborodin@acm.org
In reply to: Michael Paquier (#62)
#66Michael Paquier
michael@paquier.xyz
In reply to: Andrey Borodin (#65)