Fix pg_checksums progress report

Started by Nonamealmost 5 years ago8 messages
#1Noname
Shinya11.Kato@nttdata.com
1 attachment(s)

Hi,

I found a problem with the pg_checksums.c.

The total_size is calculated by scanning the directory.
The current_size is calculated by scanning the files, but the current_size does not include the size of NewPages.

This may cause pg_checksums progress report to not be 100%.
I have attached a patch that fixes this.

Regards,
Shinya Kato

Attachments:

fix_pg_checksums_progress_report.patchapplication/octet-stream; name=fix_pg_checksums_progress_report.patchDownload
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3b4751e3f 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -229,12 +229,14 @@ scan_file(const char *fn, BlockNumber segmentno)
 		}
 		blocks++;
 
+		current_size += r;
+
 		/* New pages have no checksum yet */
 		if (PageIsNew(header))
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		current_size += r;
+
 		if (mode == PG_MODE_CHECK)
 		{
 			if (csum != header->pd_checksum)
#2Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Noname (#1)
Re: Fix pg_checksums progress report

On 2021/04/02 14:23, Shinya11.Kato@nttdata.com wrote:

Hi,

I found a problem with the pg_checksums.c.

The total_size is calculated by scanning the directory.
The current_size is calculated by scanning the files, but the current_size does not include the size of NewPages.

This may cause pg_checksums progress report to not be 100%.
I have attached a patch that fixes this.

Thanks for the report and patch!

I could reproduce this issue and confirmed that the patch fixes it.

Regarding the patch, I think that it's better to add the comment about
why current_size needs to be counted including new pages.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#3Noname
Shinya11.Kato@nttdata.com
In reply to: Fujii Masao (#2)
1 attachment(s)
RE: Fix pg_checksums progress report

-----Original Message-----
From: Fujii Masao <masao.fujii@oss.nttdata.com>
Sent: Friday, April 2, 2021 2:39 PM
To: Shinya11.Kato@nttdata.com; pgsql-hackers@postgresql.org
Subject: Re: Fix pg_checksums progress report

On 2021/04/02 14:23, Shinya11.Kato@nttdata.com wrote:

Hi,

I found a problem with the pg_checksums.c.

The total_size is calculated by scanning the directory.
The current_size is calculated by scanning the files, but the current_size does

not include the size of NewPages.

This may cause pg_checksums progress report to not be 100%.
I have attached a patch that fixes this.

Thanks for the report and patch!

I could reproduce this issue and confirmed that the patch fixes it.

Regarding the patch, I think that it's better to add the comment about why
current_size needs to be counted including new pages.

Thanks for your review.
I added a comment to the patch, and attached the new patch.

Regards,
Shinya Kato

Attachments:

fix_pg_checksums_progress_report_v2.patchapplication/octet-stream; name=fix_pg_checksums_progress_report_v2.patchDownload
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..2d078d8fbb 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -229,12 +229,17 @@ scan_file(const char *fn, BlockNumber segmentno)
 		}
 		blocks++;
 
+		/*
+		 * The current_size is calculated before checking if header is a
+		 * new page, because total_size includes the size of new pages.
+		 */
+		current_size += r;
+
 		/* New pages have no checksum yet */
 		if (PageIsNew(header))
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		current_size += r;
 		if (mode == PG_MODE_CHECK)
 		{
 			if (csum != header->pd_checksum)
#4Michael Paquier
michael@paquier.xyz
In reply to: Noname (#3)
Re: Fix pg_checksums progress report

On Fri, Apr 02, 2021 at 07:30:32AM +0000, Shinya11.Kato@nttdata.com wrote:

I added a comment to the patch, and attached the new patch.

Hmm. This looks to come from 280e5f14 that introduced the progress
reports so this would need a backpatch down to 12. I have not looked
in details and have not looked at the patch yet, though. Fujii-san,
are you planning to take care of that? That was my stuff originally,
so I am fine to look at it. But not now, on a Friday afternoon :)
--
Michael

#5Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Michael Paquier (#4)
Re: Fix pg_checksums progress report

On 2021/04/02 16:47, Michael Paquier wrote:

On Fri, Apr 02, 2021 at 07:30:32AM +0000, Shinya11.Kato@nttdata.com wrote:

I added a comment to the patch, and attached the new patch.

Thanks for updating the patch!

+		/*
+		 * The current_size is calculated before checking if header is a
+		 * new page, because total_size includes the size of new pages.
+		 */
+		current_size += r;

I'd like to comment more. What about the following?

---------------------------
Since the file size is counted as total_size for progress status information, the sizes of all pages including new ones in the file should be counted as current_size. Otherwise the progress reporting calculated using those counters may not reach 100%.
---------------------------

Hmm. This looks to come from 280e5f14 that introduced the progress
reports so this would need a backpatch down to 12.

Yes.

I have not looked
in details and have not looked at the patch yet, though. Fujii-san,
are you planning to take care of that?

Yes, I will. Thanks for the consideration!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#6Noname
Shinya11.Kato@nttdata.com
In reply to: Fujii Masao (#5)
1 attachment(s)
RE: Fix pg_checksums progress report

-----Original Message-----
From: Fujii Masao <masao.fujii@oss.nttdata.com>
Sent: Friday, April 2, 2021 6:03 PM
To: Michael Paquier <michael@paquier.xyz>; Shinya11.Kato@nttdata.com
Cc: pgsql-hackers@postgresql.org
Subject: Re: Fix pg_checksums progress report

On 2021/04/02 16:47, Michael Paquier wrote:

On Fri, Apr 02, 2021 at 07:30:32AM +0000, Shinya11.Kato@nttdata.com wrote:

I added a comment to the patch, and attached the new patch.

Thanks for updating the patch!

+		/*
+		 * The current_size is calculated before checking if header is a
+		 * new page, because total_size includes the size of new
pages.
+		 */
+		current_size += r;

I'd like to comment more. What about the following?

---------------------------
Since the file size is counted as total_size for progress status information, the
sizes of all pages including new ones in the file should be counted as
current_size. Otherwise the progress reporting calculated using those counters
may not reach 100%.
---------------------------

Thanks for your review!
I updated the patch, and attached it.

Regards,
Shinya Kato

Attachments:

fix_pg_checksums_progress_report_v3.patchapplication/octet-stream; name=fix_pg_checksums_progress_report_v3.patchDownload
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..831cf42d3a 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -229,12 +229,19 @@ scan_file(const char *fn, BlockNumber segmentno)
 		}
 		blocks++;
 
+		/*
+		 * Since the file size is counted as total_size for progress status
+		 * information, the sizes of all pages including new ones in the file
+		 * should be counted as current_size. Otherwise the progress reporting
+		 * calculated using those counters may not reach 100%.
+		 */
+		current_size += r;
+
 		/* New pages have no checksum yet */
 		if (PageIsNew(header))
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		current_size += r;
 		if (mode == PG_MODE_CHECK)
 		{
 			if (csum != header->pd_checksum)
#7Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#5)
Re: Fix pg_checksums progress report

On Fri, Apr 02, 2021 at 06:03:21PM +0900, Fujii Masao wrote:

On 2021/04/02 16:47, Michael Paquier wrote:

I have not looked
in details and have not looked at the patch yet, though. Fujii-san,
are you planning to take care of that?

Yes, I will. Thanks for the consideration!

OK, thanks!
--
Michael

#8Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Noname (#6)
Re: Fix pg_checksums progress report

On 2021/04/02 18:19, Shinya11.Kato@nttdata.com wrote:

Thanks for your review!
I updated the patch, and attached it.

Thanks for updating the patch! Pushed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION