Progress reporting for pg_verify_checksums

Started by Michael Banckover 7 years ago65 messageshackers
Jump to latest
#1Michael Banck
michael.banck@credativ.de

Hi,

my colleague Bernd Helmle recently added progress reporting to our
pg_checksums application[1]https://github.com/credativ/pg_checksums/commit/2b691 -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de. I have now forward ported this to
pg_verify_checksums for the September commitfest, please see the
attached patch.

Here's the description:

This optionally prints the progress of pg_verify_checksums via read
kilobytes to the terminal with the new command line argument -P.

If -P was forgotten and pg_verify_checksums operates on a large cluster,
the caller can send SIGUSR1 to pg_verify_checksums to turn progress
status reporting on during runtime.

Michael

[1]: https://github.com/credativ/pg_checksums/commit/2b691 -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de
--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachments:

pg_verify_checksums_progress.patchtext/x-patch; charset=UTF-8; name=pg_verify_checksums_progress.patchDownload+146-9
#2Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#1)
Re: Progress reporting for pg_verify_checksums

Hallo Michael,

my colleague Bernd Helmle recently added progress reporting to our
pg_checksums application[1]. I have now forward ported this to
pg_verify_checksums for the September commitfest, please see the
attached patch.

It seems that the patch does not apply cleanly on master, neither with
"git apply" nor "patch". Could you rebase it?

Here's the description:

This optionally prints the progress of pg_verify_checksums via read
kilobytes

MB? GB?

to the terminal with the new command line argument -P.

If -P was forgotten and pg_verify_checksums operates on a large cluster,
the caller can send SIGUSR1 to pg_verify_checksums to turn progress
status reporting on during runtime.

Hmmm. I cannot say I like the signal feature much. Would it make sense for
the progress to be on by default, and to have a quiet option instead?

--
Fabien.

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Fabien COELHO (#2)
Re: Progress reporting for pg_verify_checksums

On 2018-Sep-01, Fabien COELHO wrote:

If -P was forgotten and pg_verify_checksums operates on a large cluster,
the caller can send SIGUSR1 to pg_verify_checksums to turn progress
status reporting on during runtime.

Hmmm. I cannot say I like the signal feature much. Would it make sense for
the progress to be on by default, and to have a quiet option instead?

Hmm, I recall this technique being used elsewhere and is sometimes
useful. Can't remember where though -- by manpages, it's not rsync nor
pv ...

How about making it a toggle? Default off, enable-able by option,
toggleable by signal. (If you enable it via the signal, what's the rate
to report at?)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Michael Banck
michael.banck@credativ.de
In reply to: Alvaro Herrera (#3)
Re: Progress reporting for pg_verify_checksums

Hi,

On Mon, Sep 03, 2018 at 11:21:32AM -0300, Alvaro Herrera wrote:

On 2018-Sep-01, Fabien COELHO wrote:

If -P was forgotten and pg_verify_checksums operates on a large cluster,
the caller can send SIGUSR1 to pg_verify_checksums to turn progress
status reporting on during runtime.

Hmmm. I cannot say I like the signal feature much. Would it make sense for
the progress to be on by default, and to have a quiet option instead?

Hmm, I recall this technique being used elsewhere and is sometimes
useful. Can't remember where though -- by manpages, it's not rsync nor
pv ...

e2fsck does it. And I think it would be useful for pg_basebackup as
well, but that's for another time.

In any case, pg_basebackup has the same option and it is off by default
as well. I guess the rationale for not having it on by default for
pg_basebackup is the fact is does not have any other output normally.

pg_verify_checksums writes a report at the end however, but maybe that
one could be demoted to verbose mode now that we have it?

How about making it a toggle? Default off, enable-able by option,
toggleable by signal.

That sounds like a good idea.

(If you enable it via the signal, what's the rate to report at?)

You mean how often it reports progress (that seems independent of the
signal)? From testing, it's roughly every second or so.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB M�nchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 M�nchengladbach
Gesch�ftsf�hrung: Dr. Michael Meskes, J�rg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

#5Michael Banck
michael.banck@credativ.de
In reply to: Michael Banck (#1)
Re: Progress reporting for pg_verify_checksums

Hi,

Am Freitag, den 31.08.2018, 14:50 +0200 schrieb Michael Banck:

my colleague Bernd Helmle recently added progress reporting to our
pg_checksums application[1]. I have now forward ported this to
pg_verify_checksums for the September commitfest, please see the
attached patch.

Here's the description:

This optionally prints the progress of pg_verify_checksums via read
kilobytes to the terminal with the new command line argument -P.

If -P was forgotten and pg_verify_checksums operates on a large cluster,
the caller can send SIGUSR1 to pg_verify_checksums to turn progress
status reporting on during runtime.

Version 2 of the patch is attached. This is rebased to master as of
422952ee and changes the signal handling to be a toggle as suggested by
Alvaro.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachments:

pg_verify_checksums_progress_V2.patchtext/x-patch; charset=UTF-8; name=pg_verify_checksums_progress_V2.patchDownload+148-10
#6Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#5)
Re: Progress reporting for pg_verify_checksums

Hallo Michael,

This optionally prints the progress of pg_verify_checksums via read
kilobytes to the terminal with the new command line argument -P.

If -P was forgotten and pg_verify_checksums operates on a large cluster,
the caller can send SIGUSR1 to pg_verify_checksums to turn progress
status reporting on during runtime.

Version 2 of the patch is attached. This is rebased to master as of
422952ee and changes the signal handling to be a toggle as suggested by
Alvaro.

Patch applies cleanly and compiles.

About tests: "make check" is okay, but ITSM that the command is not
started once, ever, in any test:-( It is unclear whether any test triggers
checksums anyway...

The time() granularity to the second makes the result awkward on small
tests:

8/1554552 kB (0%, 8 kB/s)
1040856/1554552 kB (66%, 1040856 kB/s)
1554552/1554552 kB (100%, 1554552 kB/s)

I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
much better precision.

The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying
1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are
about storage which is usually measured in powers of 1,000, so I'd suggest
to use thousands.

Another reserve I have is that on any hardware it is likely that there
will be a lot of kilos, so maybe megas (MB) should be used instead.

I'm wondering whether the bandwidth display should be local (from the last
display) or global (from the start of the command), but for the last
forced one. This is an open question.

Maybe it would be nice to show elapsed time and expected completion time
based on the current speed.

I would be in favor or having this turned on by default, and silenced with
some option. I'm not sure other people would agree, though, so it is an
open question as well.

About the code:

  +               if (show_progress)
  +                       show_progress = false;
  +               else
  +                       show_progress = true;

Why not a simple:

/* invert current state */
show_progress = !show_progress;

I do not see much advantage in using intermediate string buffers for
values:

  + snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
  +          total_size / 1024);
  + snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
  +          current_size / 1024);
  + snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
  +          (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
  + fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
  +          currentstr, totalstr, total_percent, currspeedstr);

ISTM that just one simple fprintf would be fine:

fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...",
formulas for each slot...);

ISTM that this line overwriting trick deserves a comment:

  + if (isatty(fileno(stderr)))
  +   fprintf(stderr, "\r");
  + else
  +   fprintf(stderr, "\n");

And it runs a little amok with "-v".

+ memset(&act, '\0', sizeof(act));

pg uses 0 instead of '\0' everywhere else.

   +   /* Make sure we just report at least once a second */
   +   if ((now == last_progress_update) && !force) return;

The comments seems contradictory, I understand the code makes sure that it
is "at most" once a second. Pgbench uses "-P XXX" to strigger a progress
report every XXX second. I'm unsure whether it would be useful to allow
the user to change the 1 second display interval.

I'm not sure why you removed one unrelated line:

#include "storage/checksum.h"
#include "storage/checksum_impl.h"

-
static int64 files = 0;
static int64 blocks = 0;
static int64 badblocks = 0;

There is a problem in the scan_file code: The added sizeonly parameter is
not used. It should be removed.

--
Fabien.

#7Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#6)
Re: Progress reporting for pg_verify_checksums

Hi,

thanks for the review!

On Wed, Sep 19, 2018 at 05:17:05PM +0200, Fabien COELHO wrote:

This optionally prints the progress of pg_verify_checksums via read
kilobytes to the terminal with the new command line argument -P.

If -P was forgotten and pg_verify_checksums operates on a large cluster,
the caller can send SIGUSR1 to pg_verify_checksums to turn progress
status reporting on during runtime.

Version 2 of the patch is attached. This is rebased to master as of
422952ee and changes the signal handling to be a toggle as suggested by
Alvaro.

Patch applies cleanly and compiles.

About tests: "make check" is okay, but ITSM that the command is not started
once, ever, in any test:-( It is unclear whether any test triggers checksums
anyway...

Yeah, this was discussed on another thread and some basic tap tests for
pg_verify_checksums were proposed (also by me), but this hasn't been
further addressed.

Personally, I think this would be a good thing to add to v11 even.

In any case, this particular feature might not be very easy to tap test,
but I am open to suggestions, of course.

The time() granularity to the second makes the result awkward on small
tests:

8/1554552 kB (0%, 8 kB/s)
1040856/1554552 kB (66%, 1040856 kB/s)
1554552/1554552 kB (100%, 1554552 kB/s)

I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
much better precision.

The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying
1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are
about storage which is usually measured in powers of 1,000, so I'd suggest
to use thousands.

Another reserve I have is that on any hardware it is likely that there will
be a lot of kilos, so maybe megas (MB) should be used instead.

The display is exactly the same as in pg_basebackup (except the
bandwith is shown as well), so right now I think it is more useful to be
consistent here. So if we change that, pg_basebackup should be changed
as well I think.

Maybe the code could be factored out to some common file in the future.

I'm wondering whether the bandwidth display should be local (from the last
display) or global (from the start of the command), but for the last forced
one. This is an open question.

Maybe it would be nice to show elapsed time and expected completion time
based on the current speed.

Maybe; this could be added to the factored out common code I mentioned
above.

I would be in favor or having this turned on by default, and silenced with
some option. I'm not sure other people would agree, though, so it is an open
question as well.

If this runs in a script, it would be pretty annoying, so everybody
would have to add --no-progress so I am not convinced. Also, both
pg_basebackup and pgbench don't show progress by default.

About the code:

+               if (show_progress)
+                       show_progress = false;
+               else
+                       show_progress = true;

Why not a simple:

/* invert current state */
show_progress = !show_progress;

Fair enough.

I do not see much advantage in using intermediate string buffers for values:

+ snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
+          total_size / 1024);
+ snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
+          current_size / 1024);
+ snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
+          (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
+ fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
+          currentstr, totalstr, total_percent, currspeedstr);

ISTM that just one simple fprintf would be fine:

fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...",
formulas for each slot...);

That code is adopted from pg_basebackup.c and the comment there says:

| * Separate step to keep platform-dependent format code out of
| * translatable strings. And we only test for INT64_FORMAT
| * availability in snprintf, not fprintf.

So that sounds legit to me.

ISTM that this line overwriting trick deserves a comment:

+ if (isatty(fileno(stderr)))
+   fprintf(stderr, "\r");
+ else
+   fprintf(stderr, "\n");

I agree a comment should be in there. But that code is also taken
verbatim from pg_basebackup.c (but this time, there's no comment there,
either).

How about this:

| * If we are reporting to a terminal, send a carriage return so that
| * we stay on the same line. If not, send a newline.

And it runs a little amok with "-v".

Do you suggest we should make those mutually exlusive? I agree that in
general, -P is not very useful if -v is on, but if you have a really big
table, it should still be, no?

+ memset(&act, '\0', sizeof(act));

pg uses 0 instead of '\0' everywhere else.

Ok.

+   /* Make sure we just report at least once a second */
+   if ((now == last_progress_update) && !force) return;

The comments seems contradictory, I understand the code makes sure that it
is "at most" once a second.

I guess you're right as the identical code in pg_basebackup.c has a
comment saying "Max once per second".

Pgbench uses "-P XXX" to strigger a progress
report every XXX second. I'm unsure whether it would be useful to allow the
user to change the 1 second display interval.

I think pgbench writes a new line for each report? In that case, having
it every second for longer runs might be annoying and I can see the
point in having in configurable.

In the case of pg_basebackup/pg_verify_checksums, I guess 1 second is
fine.

I'm not sure why you removed one unrelated line:

#include "storage/checksum.h"
#include "storage/checksum_impl.h"

-
static int64 files = 0;
static int64 blocks = 0;
static int64 badblocks = 0;

Merge error on my side I guess, will fix.

There is a problem in the scan_file code: The added sizeonly parameter is
not used. It should be removed.

Right, this might have been a leftover from an earlier version of the
code, I'll check back with Bernd to make sure that was not a
porting/merge error on my side.

I've attached V3 of the patch, addressing some of your comments.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB M�nchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 M�nchengladbach
Gesch�ftsf�hrung: Dr. Michael Meskes, J�rg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachments:

pg_verify_checksums_progress_V3.patchtext/x-diff; charset=us-asciiDownload+146-8
#8Thomas Munro
thomas.munro@gmail.com
In reply to: Alvaro Herrera (#3)
Re: Progress reporting for pg_verify_checksums

On Tue, Sep 4, 2018 at 2:21 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2018-Sep-01, Fabien COELHO wrote:

If -P was forgotten and pg_verify_checksums operates on a large cluster,
the caller can send SIGUSR1 to pg_verify_checksums to turn progress
status reporting on during runtime.

Hmmm. I cannot say I like the signal feature much. Would it make sense for
the progress to be on by default, and to have a quiet option instead?

Hmm, I recall this technique being used elsewhere and is sometimes
useful. Can't remember where though -- by manpages, it's not rsync nor
pv ...

On BSD-derived systems including macOS, you can hit ^T to deliver
SIGINFO to the foreground process (much like ^C delivers SIGINT).
Many well behaved programs including rsync (at least on FreeBSD) then
show you some kind of progress or status report:

munro@asterix $ sleep 60
load: 0.11 cmd: sleep 62234 [nanslp] 1.22r 0.00u 0.00s 0% 2032k
sleep: about 58 second(s) left out of the original 60

I've contemplated various crackpot ideas about teaching psql to show
information about what my backend is doing when I hit ^T (perhaps via
a second connection, similar to the way it cancels queries?)...

There was an interesting thread[1]/messages/by-id/20140623101501.GN16260@awork2.anarazel.de about dumping various things from
backends on (multiplexed) SIGUSR1. Also if memory serves, JVMs also
dump internal stuff when you signal them, so you can confirm that they
really did eat all 64GB of your RAM. But none of these things are
examples of enabling/disabling an on-going status reporting mode,
which is perhaps what you meant, they're more like one-off pokes to
dump information.

[1]: /messages/by-id/20140623101501.GN16260@awork2.anarazel.de

--
Thomas Munro
http://www.enterprisedb.com

#9Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#7)
Re: Progress reporting for pg_verify_checksums

Hallo Michael,

About patch v3: applies cleanly and compiles.

The xml documentation should be updated! (It is kind of hard to notice
what is not there:-)

See "doc/src/sgml/ref/pg_verify_checksums.sgml".

The time() granularity to the second makes the result awkward on small
tests:

8/1554552 kB (0%, 8 kB/s)
1040856/1554552 kB (66%, 1040856 kB/s)
1554552/1554552 kB (100%, 1554552 kB/s)

I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
much better precision.

I still think it would make sense to use that instead of low-precision
time().

The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying
1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are
about storage which is usually measured in powers of 1,000, so I'd suggest
to use thousands.

Another reserve I have is that on any hardware it is likely that there will
be a lot of kilos, so maybe megas (MB) should be used instead.

The display is exactly the same as in pg_basebackup (except the
bandwith is shown as well), so right now I think it is more useful to be
consistent here.

Hmmm... units are units, and the display is wrong. The fact that other
commands are wrong as well does not look like a good argument to persist
in an error.

So if we change that, pg_basebackup should be changed as well I think.

I'm okay with fixing pg_basebackup as well! I'm unsure about the best
place to put such a function, though. Probably under "src/common/" where
there is already some front-end specific code ("fe_memutils.c").

Maybe the code could be factored out to some common file in the future.

I would be okay with a progress printing function shared between some
commands. It just needs some place. pg_rewind also has a --rewind option.

Maybe it would be nice to show elapsed time and expected completion time
based on the current speed.

Maybe; this could be added to the factored out common code I mentioned
above.

Yep.

I would be in favor or having this turned on by default, and silenced with
some option. I'm not sure other people would agree, though, so it is an open
question as well.

If this runs in a script, it would be pretty annoying, so everybody
would have to add --no-progress so I am not convinced. Also, both
pg_basebackup and pgbench don't show progress by default.

Ok.

I do not see much advantage in using intermediate string buffers for values:

+ snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
+          total_size / 1024);
+ snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
+          current_size / 1024);
+ snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
+          (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
+ fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
+          currentstr, totalstr, total_percent, currspeedstr);

ISTM that just one simple fprintf would be fine:

fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...",
formulas for each slot...);

That code is adopted from pg_basebackup.c and the comment there says:

| * Separate step to keep platform-dependent format code out of
| * translatable strings. And we only test for INT64_FORMAT
| * availability in snprintf, not fprintf.

So that sounds legit to me.

Hmmm. Translation. Not sure I like much the idea of translating units,
but why not.

| * If we are reporting to a terminal, send a carriage return so that
| * we stay on the same line. If not, send a newline.

And it runs a little amok with "-v".

Do you suggest we should make those mutually exlusive? I agree that in
general, -P is not very useful if -v is on, but if you have a really big
table, it should still be, no?

Yep. I was just mentionning that they interfere on a terminal, but I agree
that there is no obvious fix.

+ memset(&act, '\0', sizeof(act));

pg uses 0 instead of '\0' everywhere else.

Ok.

Not '0', plain 0 (the integer of value zero).

--
Fabien.

#10Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#9)
Re: Progress reporting for pg_verify_checksums

Hi,

On Wed, Sep 26, 2018 at 10:46:06AM +0200, Fabien COELHO wrote:

The xml documentation should be updated! (It is kind of hard to notice what
is not there:-)

See "doc/src/sgml/ref/pg_verify_checksums.sgml".

Right, I've added a paragraph.

The time() granularity to the second makes the result awkward on small
tests:

8/1554552 kB (0%, 8 kB/s)
1040856/1554552 kB (66%, 1040856 kB/s)
1554552/1554552 kB (100%, 1554552 kB/s)

I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
much better precision.

I still think it would make sense to use that instead of low-precision
time().

As the use-case of this is not for small tests, I don't think it is
useful to make the code more complicated for this.

The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying
1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are
about storage which is usually measured in powers of 1,000, so I'd suggest
to use thousands.

Another reserve I have is that on any hardware it is likely that there will
be a lot of kilos, so maybe megas (MB) should be used instead.

The display is exactly the same as in pg_basebackup (except the
bandwith is shown as well), so right now I think it is more useful to be
consistent here.

Hmmm... units are units, and the display is wrong. The fact that other
commands are wrong as well does not look like a good argument to persist in
an error.

I've had a look around, and "kB" seems to be a common unit for 1024
bytes, e.g. in pg_test_fsync etc., unless I am mistaken?

So if we change that, pg_basebackup should be changed as well I think.

I'm okay with fixing pg_basebackup as well! I'm unsure about the best place
to put such a function, though. Probably under "src/common/" where there is
already some front-end specific code ("fe_memutils.c").

Maybe the code could be factored out to some common file in the future.

I would be okay with a progress printing function shared between some
commands. It just needs some place. pg_rewind also has a --rewind option.

I guess you mean pg_rewind also has a --progress option.

I do agree it makes sense to refactor that, but I don't think this
should be part of this patch.

+ memset(&act, '\0', sizeof(act));

pg uses 0 instead of '\0' everywhere else.

Ok.

Not '0', plain 0 (the integer of value zero).

Oops, thanks for spotting that.

I've attached v4 of the patch.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB M�nchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 M�nchengladbach
Gesch�ftsf�hrung: Dr. Michael Meskes, J�rg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachments:

pg_verify_checksums_progress_V4.patchtext/x-diff; charset=us-asciiDownload+159-8
#11Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#10)
Re: Progress reporting for pg_verify_checksums

The time() granularity to the second makes the result awkward on small
tests:

8/1554552 kB (0%, 8 kB/s)
1040856/1554552 kB (66%, 1040856 kB/s)
1554552/1554552 kB (100%, 1554552 kB/s)

I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
much better precision.

I still think it would make sense to use that instead of low-precision
time().

As the use-case of this is not for small tests,

Even for a long run, the induce error by the 1 second imprecision on both
points is significant at the beginning of the scan.

I don't think it is useful to make the code more complicated for this.

I do not think that it would be much more complicated to use the
portability macros to get a precise time.

The display is exactly the same as in pg_basebackup (except the
bandwith is shown as well), so right now I think it is more useful to be
consistent here.

Hmmm... units are units, and the display is wrong. The fact that other
commands are wrong as well does not look like a good argument to persist in
an error.

I've had a look around, and "kB" seems to be a common unit for 1024
bytes, e.g. in pg_test_fsync etc., unless I am mistaken?

I can only repeat my above comment: the fact that postgres is wrong
elsewhere is not a good reason to persist in reproducing an error.

See the mother of all knowledge https://en.wikipedia.org/wiki/Kilobyte

- SI (decimal, 1000): kB, MB, GB, ...
- IEC (binary 1024): KiB, MiB, GiB, ...
- JEDEC (binary 1024, used for memory): KB, MB, GB.

Summary:
- 1 kB = 1000 bytes
- 1 KB = 1 KiB = 1024 bytes

Decimal is used for storage (HDD, SSD), binary for memory. That is life,
and IMHO Postgres code is not the place to invent new units.

Moreover, I still think that the progress should use MB defined as 1000000
bytes for showing the progress.

I would be okay with a progress printing function shared between some
commands. It just needs some place. pg_rewind also has a --rewind option.

I guess you mean pg_rewind also has a --progress option.

Indeed.

I do agree it makes sense to refactor that, but I don't think this
should be part of this patch.

That's a point. I'd suggest to put the new progress function directly in
the common part, and other patches could take advantage of it for other
commands when someone feels like it.

Other comments:

function toggle_progress_report has empty lines after { and before },
but this style is not used elsewhere, I think that they should be removed.

The report_progress API should be ready to be used by another client
application, which suggest that global variables should be avoided.

Maybe:

void report_progress(current, total, force)

and the scan start and last times could be a static variable inside the
function initialized on the first call, which would suggest to call the
function at the beginning of the scan, probably with current == 0.

Or maybe:

time_type report_progress(current, total, start, last, force)

Which would return the last time, and the caller has responsability for
initializing a start time.

--
Fabien.

#12Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Banck (#10)
Re: Progress reporting for pg_verify_checksums

On Sat, Sep 29, 2018 at 1:07 AM Michael Banck <michael.banck@credativ.de> wrote:

I've attached v4 of the patch.

Hi Michael,

Windows doesn't like sigaction:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15189

I'm not sure if we classify this as a "frontend" program. Should it
be using pqsignal() from src/port/pqsignal.c? Or perhaps just
sigaction as you have it (pqsignal.c says that we require sigaction on
all Unices), but #ifndef WIN32 around that stuff, since SIGUSR1 is
never going to work anyway.

--
Thomas Munro
http://www.enterprisedb.com

#13Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Thomas Munro (#12)
Re: Progress reporting for pg_verify_checksums

On Wed, Oct 3, 2018 at 12:51 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Sat, Sep 29, 2018 at 1:07 AM Michael Banck <michael.banck@credativ.de> wrote:

I've attached v4 of the patch.

Hi Michael,

Windows doesn't like sigaction:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15189

I'm not sure if we classify this as a "frontend" program. Should it
be using pqsignal() from src/port/pqsignal.c? Or perhaps just
sigaction as you have it (pqsignal.c says that we require sigaction on
all Unices), but #ifndef WIN32 around that stuff, since SIGUSR1 is
never going to work anyway.

Unfortunately, patch also has some conflicts with the current master. I'll move
it to the next CF.

#14Michael Banck
michael.banck@credativ.de
In reply to: Thomas Munro (#12)
Re: Progress reporting for pg_verify_checksums

Hi,

On Wed, Oct 03, 2018 at 11:50:36AM +1300, Thomas Munro wrote:

On Sat, Sep 29, 2018 at 1:07 AM Michael Banck <michael.banck@credativ.de> wrote:
Windows doesn't like sigaction:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15189

Thanks for the report and sorry for taking so long to reply.

I'm not sure if we classify this as a "frontend" program. Should it
be using pqsignal() from src/port/pqsignal.c? Or perhaps just
sigaction as you have it (pqsignal.c says that we require sigaction on
all Unices), but #ifndef WIN32 around that stuff, since SIGUSR1 is
never going to work anyway.

I've used pqsignal now, and disabled that feature on Windows.

I've also updated one comment and added an additional regression test.

V5 attached.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB M�nchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 M�nchengladbach
Gesch�ftsf�hrung: Dr. Michael Meskes, J�rg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachments:

pg_verify_checksums_progress_V5.patchtext/x-diff; charset=us-asciiDownload+150-9
#15Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#14)
Re: Progress reporting for pg_verify_checksums

Hallo Michael,

V5 attached.

Patch applies cleanly, compiles, global & local make check are ok.

Given that the specific output is not checked, I do not think that the -P
check deserves a test on its own, I think that the -P option could simply
be added to any of the existing tests.

I'm still unhappy that "kB" is used for 1024 bytes in the output, contrary
to all existing standards (1 kB = 1000 bytes -- SI, 1 KB = 1 KiB = 1024
bytes -- IEC & JEDEC). The fact that this is also used wrongly elsewhere
in pg is not relevant, these are bugs to be fixed, not to be replicated.

Given the speed of verifying checksums and its storage-oriented status, I
also still think that a (possibly fractional) MB (1,000,000 bytes), or
even GB, is the right unit to use for reporting this progress. On my
laptop (SSD), verifying runs at least at 1.26 GB/s (on one small test),
there is no point in displaying kilobytes progress.

I still think that using a more precise time than time(), eg with existing
macros from "instr_time.h", would not cost anything more and result in a
better precision output. It would also allow to remove the check used to
avoid a division-by-zero by switching to double.

If the check is performed while online (other patch in queue), then the
size may change thus it may not reach or go beyond 100%. No big deal.

I'd consider inverting the sizeonly boolean, so that true does the check
and false does only the size collection. It seems more logical to me if it
performs more with true than with false.

--
Fabien.

#16Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fabien COELHO (#15)
Re: Progress reporting for pg_verify_checksums

Given the speed of verifying checksums and its storage-oriented status, I
also still think that a (possibly fractional) MB (1,000,000 bytes), or even
GB, is the right unit to use for reporting this progress. On my laptop (SSD),
verifying runs at least at 1.26 GB/s (on one small test), there is no point
in displaying kilobytes progress.

Obviously the file is cached by the system at such speed, but still most
disks should provides dozens of MB per second of read bandwidth. If GB is
used, it should use fractional display (eg 1.25 GB) though.

--
Fabien.

#17Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#16)
Re: Progress reporting for pg_verify_checksums

Hi,

Am Dienstag, den 25.12.2018, 12:12 +0100 schrieb Fabien COELHO:

Given the speed of verifying checksums and its storage-oriented status, I
also still think that a (possibly fractional) MB (1,000,000 bytes), or even
GB, is the right unit to use for reporting this progress. On my laptop (SSD),
verifying runs at least at 1.26 GB/s (on one small test), there is no point
in displaying kilobytes progress.

Obviously the file is cached by the system at such speed, but still most
disks should provides dozens of MB per second of read bandwidth. If GB is
used, it should use fractional display (eg 1.25 GB) though.

I think MB indeed makes more sense than kB, so I have changed that now
in V7, per attached.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachments:

pg_verify_checksums_progress_V6.patchtext/x-patch; charset=UTF-8; name=pg_verify_checksums_progress_V6.patchDownload+151-10
#18Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#17)
Re: Progress reporting for pg_verify_checksums

I think MB indeed makes more sense than kB, so I have changed that now
in V7, per attached.

You use 1024ᅵ bytes. What about 1000000 bytes per MB, as the unit is about
stored files?

Also, you did not answer to my other points:
- use "instr_time.h" for better precision
- invert sizeonly
- reuse a test

--
Fabien.

#19Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#18)
Re: Progress reporting for pg_verify_checksums

On Tue, Dec 25, 2018 at 07:05:30PM +0100, Fabien COELHO wrote:

You use 1024² bytes. What about 1000000 bytes per MB, as the unit is about
stored files?

Also, you did not answer to my other points:
- use "instr_time.h" for better precision
- invert sizeonly
- reuse a test

It seems to me that the SIGUSR1 business is not necessary for the
basic layer of this feature, so I would rather rip that off now. If
necessary we could always discuss that later on. My take about the
option is that --progress should not be the default, but that reports
should only be provided if the caller wants them.

--quiet may have some value by itself, but that seems like a separate
discussion to me.

+   /*
+    * If we are reporting to a terminal, send a carriage return so that we
+    * stay on the same line.  If not, send a newline.
+    */
+   if (isatty(fileno(stderr)))
+       fprintf(stderr, "\r");
+   else
+       fprintf(stderr, "\n");
This bit is not really elegant, why not just '\r'?
+   /* The same for total size */
+   if (current_size > total_size)
+       total_size = current_size / 1048576;
Let's use that in a variable and not hardcode the number.

What's introduced here is very similar to what progress_report() does
in pg_rewind/logging.c. The report depends on the context so this
cannot be a common routine logic but perhaps we could keep a
consistent output for both tools?
--
Michael

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#19)
Re: Progress reporting for pg_verify_checksums

On 2018-Dec-26, Michael Paquier wrote:

+   /*
+    * If we are reporting to a terminal, send a carriage return so that we
+    * stay on the same line.  If not, send a newline.
+    */
+   if (isatty(fileno(stderr)))
+       fprintf(stderr, "\r");
+   else
+       fprintf(stderr, "\n");
This bit is not really elegant, why not just '\r'?

Umm, this is established coding pattern in pg_basebackup.c.
Stylistically I'd change all those cases to "fprintf(stderr,
isatty(fileno(stderr)) ? "\r" : "\n")" but leave the string alone, since
AFAIR it took some time to figure out what to do. (I'd also make the
comment one line instead of four, say "Stay on the same line if
reporting to a terminal". That makes the whole stanza two lines rather
than eight, which is the appropriate amount of space for it).

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#20)
#22Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#19)
#23Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#22)
#24Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#23)
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Banck (#22)
#26Michael Banck
michael.banck@credativ.de
In reply to: Alvaro Herrera (#25)
#27Bernd Helmle
mailings@oopsware.de
In reply to: Michael Banck (#26)
#28Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bernd Helmle (#27)
#29Michael Banck
michael.banck@credativ.de
In reply to: Alvaro Herrera (#28)
#30Fabien COELHO
fabien.coelho@mines-paristech.fr
In reply to: Michael Banck (#29)
#31Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#30)
#32Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#31)
#33Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#32)
#34Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#33)
#35Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#34)
#36Michael Banck
michael.banck@credativ.de
In reply to: Kyotaro Horiguchi (#34)
#37Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#36)
#38Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fabien COELHO (#37)
#39Michael Banck
michael.banck@credativ.de
In reply to: Kyotaro Horiguchi (#38)
#40Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#39)
#41Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#40)
#42Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#41)
#43Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#42)
#44Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#43)
#45Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#44)
#46Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#45)
#47Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#46)
#48Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#47)
#49Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#48)
#50Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#49)
#51Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#50)
#52Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#51)
#53Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#51)
#54Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#53)
#55Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#52)
#56Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#55)
#57Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#54)
#58Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#57)
#59Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#56)
#60Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#59)
#61Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#60)
#62Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#61)
#63Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#61)
#64Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#63)
#65Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#62)