gitlab post-mortem: pg_basebackup waiting for checkpoint

Started by Michael Banck almost 9 years ago, 30 messages
#1 Michael Banck
michael.banck@credativ.de
1 attachment(s)

Hi,

one take-away from the Gitlab Post-Mortem[1] appears to be that after
their secondary lost replication, they were confused about what
pg_basebackup was doing when they tried to rebuild it. It just sat there
and did nothing (even with --verbose), so they assumed something was
wrong with either the primary or the connection, and restarted it
several times.

AFAICT, it turns out the checkpoint was written on the master (they
probably did not use -c fast), but this wasn't obvious to them:

"One of the engineers went to the secondary and wiped the data
directory, then ran pg_basebackup. Unfortunately pg_basebackup would
hang, producing no meaningful output, despite the --verbose option being
set."

[...]

"Unfortunately this did not resolve the problem of pg_basebackup not
starting replication immediately. One of the engineers decided to run it
with strace to see what it was blocking on. strace showed that
pg_basebackup was hanging in a poll call, but that did not provide any
other meaningful information that might have explained why."

[...]

"It would later be revealed by another engineer (who wasn't around at
the time) that this is normal behavior: pg_basebackup will wait for the
primary to start sending over replication data and it will sit and wait
silently until that time. Unfortunately this was not clearly documented
in our engineering runbooks nor in the official pg_basebackup document."

ISTM that even with WAL streaming, nothing would be written on the
client server until the checkpoint is complete, as do_pg_start_backup()
runs the checkpoint and only returns the starting WAL location
afterwards.

The attached (untested) patch is to kick off a discussion on how to
improve the situation, it is supposed to mention the checkpoint when
--verbose is used and adds a paragraph about the checkpoint being run to
the Notes section of the documentation.

Michael

[1]: https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/

--
Michael Banck
Project Lead / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachments:

pg_basebackup.patch (text/x-patch; charset=UTF-8)
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c9dd62c..a298e5c 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -660,6 +660,14 @@ PostgreSQL documentation
   <title>Notes</title>
 
   <para>
+   At the beginning of the backup, a checkpoint needs to be written on the
+   server the backup is taken from.  Especially if the option
+   <literal>--checkpoint=fast</literal> is not used, this can take some time
+   during which <application>pg_basebackup</application> will be idle on the
+   server it is running on.
+  </para>
+
+  <para>
    The backup will include all files in the data directory and tablespaces,
    including the configuration files and any additional files placed in the
    directory by third parties, except certain temporary files managed by
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index b6463fa..ae18c16 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1754,6 +1754,9 @@ BaseBackup(void)
 	if (maxrate > 0)
 		maxrate_clause = psprintf("MAX_RATE %u", maxrate);
 
+	if (verbose)
+		fprintf(stderr, "%s: initiating base backup, waiting for checkpoint to complete\n", progname);
+
 	basebkp =
 		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s",
 				 escaped_label,
@@ -1771,6 +1774,9 @@ BaseBackup(void)
 		disconnect_and_exit(1);
 	}
 
+	if (verbose)
+		fprintf(stderr, "%s: checkpoint completed\n", progname);
+
 	/*
 	 * Get the starting xlog position
 	 */
#2 Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#1)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Sat, Feb 11, 2017 at 10:38 AM, Michael Banck <michael.banck@credativ.de>
wrote:

Hi,

one take-away from the Gitlab Post-Mortem[1] appears to be that after
their secondary lost replication, they were confused about what
pg_basebackup was doing when they tried to rebuild it. It just sat there
and did nothing (even with --verbose), so they assumed something was
wrong with either the primary or the connection, and restarted it
several times.

AFAICT, it turns out the checkpoint was written on the master (they
probably did not use -c fast), but this wasn't obvious to them:

Yeah, I've seen this happen to a number of people. I think that sounds like
what's happened here as well. I've considered things along the lines of the
patch you posted, but never got around to actually doing anything about it.

ISTM that even with WAL streaming, nothing would be written on the
client server until the checkpoint is complete, as do_pg_start_backup()
runs the checkpoint and only returns the starting WAL location
afterwards.

The attached (untested) patch is to kick off a discussion on how to
improve the situation, it is supposed to mention the checkpoint when
--verbose is used and adds a paragraph about the checkpoint being run to
the Notes section of the documentation.

Docs look good to me, other than claiming that pg_basebackup runs on a
server (it can run anywhere). I would just say "during which pg_basebackup
will appear idle". How does that sound to you?

As for the code, while I haven't tested it, isn't the "checkpoint
completed" message in the wrong place? Doesn't PQsendQuery() complete
immediately, and the check needs to be put *after* the PQgetResult() call?
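
The ordering issue described above can be modeled with a small sketch. This is a toy model in C, not libpq: `FakeConn`, `fake_send_query`, `fake_get_result` and `message_is_truthful` are hypothetical stand-ins for illustration. It assumes only what the message states, namely that the send call returns immediately and the result call is what waits for the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a connection; NOT a libpq type. */
typedef struct FakeConn
{
	int		checkpoint_ticks_left;	/* server-side work still pending */
	bool	query_sent;				/* has the command been dispatched? */
} FakeConn;

/* Stand-in for the send call: dispatches the command and returns at once. */
bool
fake_send_query(FakeConn *conn)
{
	assert(conn != NULL);
	conn->query_sent = true;
	return true;
}

/* Stand-in for the result call: "blocks" until the checkpoint is done. */
bool
fake_get_result(FakeConn *conn)
{
	assert(conn != NULL);
	if (!conn->query_sent)
		return false;
	while (conn->checkpoint_ticks_left > 0)
		conn->checkpoint_ticks_left--;
	return true;
}

/*
 * Returns true if the checkpoint really was complete at the moment a
 * "checkpoint completed" message would have been printed.
 */
bool
message_is_truthful(bool report_after_get_result, int ticks)
{
	FakeConn	conn = {ticks, false};
	bool		done_when_reported;

	fake_send_query(&conn);
	if (report_after_get_result)
	{
		fake_get_result(&conn);
		done_when_reported = (conn.checkpoint_ticks_left == 0);
	}
	else
	{
		done_when_reported = (conn.checkpoint_ticks_left == 0);
		fake_get_result(&conn);
	}
	return done_when_reported;
}
```

With five ticks of pending work, `message_is_truthful(false, 5)` is false and `message_is_truthful(true, 5)` is true, which matches the suggestion to print the message only after the result has been received.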

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#3 Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#2)
1 attachment(s)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

Hi,

On Saturday, 11.02.2017 at 11:07 +0100, Magnus Hagander wrote:

As for the code, while I haven't tested it, isn't the "checkpoint
completed" message in the wrong place? Doesn't PQsendQuery() complete
immediately, and the check needs to be put *after* the PQgetResult()
call?

I guess you're right, I've moved it further down. There is in fact a
message about the xlog location (unless you switch off wal entirely),
but having another one right before that mentioning the completed
checkpoint sounds ok to me.

There are also some inconsistencies around which messages are prepended
with "pg_basebackup: " and which are translatable; I guess all messages
printed on --verbose should be translatable? Also, as almost all
messages have a "pg_basebackup: " prefix, I've added it to the rest.
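
One way to keep the prefix consistent is to route all verbose output through a single helper. What follows is a minimal sketch, assuming a hypothetical function `format_verbose_msg` that is not part of pg_basebackup or of the attached patch. Translation via `_()` is left out to keep the example self-contained.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: when verbose is true, write "progname: " followed by
 * the formatted message into buf.  Returns the length the full message has
 * (like snprintf, this may exceed buflen - 1 if the body was truncated),
 * or 0 if nothing was written.
 */
int
format_verbose_msg(char *buf, size_t buflen, bool verbose,
				   const char *progname, const char *fmt,...)
{
	va_list		ap;
	int			prefix_len;
	int			body_len;

	assert(buf != NULL && progname != NULL && fmt != NULL);

	if (!verbose || buflen == 0)
		return 0;

	/* Every message gets the same "progname: " prefix. */
	prefix_len = snprintf(buf, buflen, "%s: ", progname);
	if (prefix_len < 0 || (size_t) prefix_len >= buflen)
		return 0;

	va_start(ap, fmt);
	body_len = vsnprintf(buf + prefix_len, buflen - (size_t) prefix_len,
						 fmt, ap);
	va_end(ap);
	if (body_len < 0)
		return 0;

	return prefix_len + body_len;
}
```

A caller would pass the result to `fputs(buf, stderr)`. Keeping the prefix in one place means no individual message can forget it.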

Michael

Attachments:

pg_basebackup_v2.patch (text/x-patch; charset=UTF-8)
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c9dd62c..a298e5c 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -660,6 +660,14 @@ PostgreSQL documentation
   <title>Notes</title>
 
   <para>
+   At the beginning of the backup, a checkpoint needs to be written on the
+   server the backup is taken from.  Especially if the option
+   <literal>--checkpoint=fast</literal> is not used, this can take some time
+   during which <application>pg_basebackup</application> will be idle on the
+   server it is running on.
+  </para>
+
+  <para>
    The backup will include all files in the data directory and tablespaces,
    including the configuration files and any additional files placed in the
    directory by third parties, except certain temporary files managed by
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index b6463fa..874b6d6 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1754,6 +1754,11 @@ BaseBackup(void)
 	if (maxrate > 0)
 		maxrate_clause = psprintf("MAX_RATE %u", maxrate);
 
+	if (verbose)
+		fprintf(stderr,
+				_("%s: initiating base backup, waiting for checkpoint to complete\n"),
+				progname);
+
 	basebkp =
 		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s",
 				 escaped_label,
@@ -1791,6 +1796,9 @@ BaseBackup(void)
 
 	strlcpy(xlogstart, PQgetvalue(res, 0, 0), sizeof(xlogstart));
 
+	if (verbose)
+		fprintf(stderr, _("%s: checkpoint completed\n"), progname);
+
 	/*
 	 * 9.3 and later sends the TLI of the starting point. With older servers,
 	 * assume it's the same as the latest timeline reported by
@@ -1804,8 +1812,8 @@ BaseBackup(void)
 	MemSet(xlogend, 0, sizeof(xlogend));
 
 	if (verbose && includewal != NO_WAL)
-		fprintf(stderr, _("transaction log start point: %s on timeline %u\n"),
-				xlogstart, starttli);
+		fprintf(stderr, _("%s: transaction log start point: %s on timeline %u\n"),
+				progname, xlogstart, starttli);
 
 	/*
 	 * Get the header
@@ -1907,7 +1915,7 @@ BaseBackup(void)
 	}
 	strlcpy(xlogend, PQgetvalue(res, 0, 0), sizeof(xlogend));
 	if (verbose && includewal != NO_WAL)
-		fprintf(stderr, "transaction log end point: %s\n", xlogend);
+		fprintf(stderr, _("%s: transaction log end point: %s\n", progname, xlogend);
 	PQclear(res);
 
 	res = PQgetResult(conn);
@@ -2048,7 +2056,7 @@ BaseBackup(void)
 	}
 
 	if (verbose)
-		fprintf(stderr, "%s: base backup completed\n", progname);
+		fprintf(stderr, _("%s: base backup completed\n)", progname);
 }
 
 
#4 Michael Banck
michael.banck@credativ.de
In reply to: Michael Banck (#3)
1 attachment(s)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

Hi,

On Saturday, 11.02.2017 at 11:25 +0100, Michael Banck wrote:

On Saturday, 11.02.2017 at 11:07 +0100, Magnus Hagander wrote:

As for the code, while I haven't tested it, isn't the "checkpoint
completed" message in the wrong place? Doesn't PQsendQuery() complete
immediately, and the check needs to be put *after* the PQgetResult()
call?

I guess you're right, I've moved it further down. There is in fact a
message about the xlog location (unless you switch off wal entirely),
but having another one right before that mentioning the completed
checkpoint sounds ok to me.

There are also some inconsistencies around which messages are prepended
with "pg_basebackup: " and which are translatable; I guess all messages
printed on --verbose should be translatable? Also, as almost all
messages have a "pg_basebackup: " prefix, I've added it to the rest.

Sorry, there were two typos in the last patch, I've attached a fixed
one.

Michael

Attachments:

pg_basebackup_v3.patch (text/x-patch; charset=UTF-8)
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c9dd62c..a298e5c 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -660,6 +660,14 @@ PostgreSQL documentation
   <title>Notes</title>
 
   <para>
+   At the beginning of the backup, a checkpoint needs to be written on the
+   server the backup is taken from.  Especially if the option
+   <literal>--checkpoint=fast</literal> is not used, this can take some time
+   during which <application>pg_basebackup</application> will be idle on the
+   server it is running on.
+  </para>
+
+  <para>
    The backup will include all files in the data directory and tablespaces,
    including the configuration files and any additional files placed in the
    directory by third parties, except certain temporary files managed by
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index b6463fa..60200a9 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1754,6 +1754,11 @@ BaseBackup(void)
 	if (maxrate > 0)
 		maxrate_clause = psprintf("MAX_RATE %u", maxrate);
 
+	if (verbose)
+		fprintf(stderr,
+				_("%s: initiating base backup, waiting for checkpoint to complete\n"),
+				progname);
+
 	basebkp =
 		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s",
 				 escaped_label,
@@ -1791,6 +1796,9 @@ BaseBackup(void)
 
 	strlcpy(xlogstart, PQgetvalue(res, 0, 0), sizeof(xlogstart));
 
+	if (verbose)
+		fprintf(stderr, _("%s: checkpoint completed\n"), progname);
+
 	/*
 	 * 9.3 and later sends the TLI of the starting point. With older servers,
 	 * assume it's the same as the latest timeline reported by
@@ -1804,8 +1812,8 @@ BaseBackup(void)
 	MemSet(xlogend, 0, sizeof(xlogend));
 
 	if (verbose && includewal != NO_WAL)
-		fprintf(stderr, _("transaction log start point: %s on timeline %u\n"),
-				xlogstart, starttli);
+		fprintf(stderr, _("%s: transaction log start point: %s on timeline %u\n"),
+				progname, xlogstart, starttli);
 
 	/*
 	 * Get the header
@@ -1907,7 +1915,7 @@ BaseBackup(void)
 	}
 	strlcpy(xlogend, PQgetvalue(res, 0, 0), sizeof(xlogend));
 	if (verbose && includewal != NO_WAL)
-		fprintf(stderr, "transaction log end point: %s\n", xlogend);
+		fprintf(stderr, _("%s: transaction log end point: %s\n"), progname, xlogend);
 	PQclear(res);
 
 	res = PQgetResult(conn);
@@ -2048,7 +2056,7 @@ BaseBackup(void)
 	}
 
 	if (verbose)
-		fprintf(stderr, "%s: base backup completed\n", progname);
+		fprintf(stderr, _("%s: base backup completed\n"), progname);
 }
 
 
#5 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Michael Banck (#4)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On 2/11/17 4:36 AM, Michael Banck wrote:

I guess you're right, I've moved it further down. There is in fact a
message about the xlog location (unless you switch off wal entirely),
but having another one right before that mentioning the completed
checkpoint sounds ok to me.

1) I don't think this should be verbose output. Having a program sit
there "doing nothing" for no apparent reason is just horrible UI design.

2) I think it'd be useful to have a way to get the status of a running
checkpoint. The checkpointer already has that info, and I think it might
even be in shared memory already. If there was a function that reported
checkpoint status pg_basebackup could poll that to provide users with
live status. That should be a separate patch though.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6 Magnus Hagander
magnus@hagander.net
In reply to: Jim Nasby (#5)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Mon, Feb 13, 2017 at 3:29 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 2/11/17 4:36 AM, Michael Banck wrote:

I guess you're right, I've moved it further down. There is in fact a
message about the xlog location (unless you switch off wal entirely),
but having another one right before that mentioning the completed
checkpoint sounds ok to me.

1) I don't think this should be verbose output. Having a program sit there
"doing nothing" for no apparent reason is just horrible UI design.

That would include much of Unix then.. For example if I run "cp" on a large
file it sits around "doing nothing". Same if I do "tar". No?

2) I think it'd be useful to have a way to get the status of a running
checkpoint. The checkpointer already has that info, and I think it might
even be in shared memory already. If there was a function that reported
checkpoint status pg_basebackup could poll that to provide users with live
status. That should be a separate patch though.

I agree that this would definitely be useful. But it might be something
that's better exposed as a server-side view?

(and if pg_basebackup could poll it it would probably still not be included
by default -- only if -P was given).

#7 Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#6)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

Hi,

On Monday, 13.02.2017 at 09:31 +0100, Magnus Hagander wrote:

On Mon, Feb 13, 2017 at 3:29 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

On 2/11/17 4:36 AM, Michael Banck wrote:

I guess you're right, I've moved it further down. There is in fact a
message about the xlog location (unless you switch off wal entirely),
but having another one right before that mentioning the completed
checkpoint sounds ok to me.

1) I don't think this should be verbose output. Having a program sit
there "doing nothing" for no apparent reason is just horrible UI design.

That would include much of Unix then.. For example if I run "cp" on a
large file it sits around "doing nothing". Same if I do "tar". No?

The expectation for all three commands is that, even if there is no
output on stdout, they will write data to the local machine. So you can
easily monitor the progress of cp and tar by running du or something in
a different terminal.

With pg_basebackup, nothing is happening on the local machine until the
checkpoint on the remote is finished; while this is obvious to somebody
familiar with how basebackups work internally, it appears to be not
clear at all to some users.

So I think notifying the user that something is happening remotely while
the local process waits would be useful, but on the other hand,
pg_basebackup does not print anything unless (i) --verbose is set or
(ii) there is an error, so I think having it mention the checkpoint in
--verbose mode only is acceptable.

Michael

#8 Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#7)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Mon, Feb 13, 2017 at 10:33 AM, Michael Banck <michael.banck@credativ.de>
wrote:

Hi,

On Monday, 13.02.2017 at 09:31 +0100, Magnus Hagander wrote:

On Mon, Feb 13, 2017 at 3:29 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

On 2/11/17 4:36 AM, Michael Banck wrote:

I guess you're right, I've moved it further down. There is in fact a
message about the xlog location (unless you switch off wal entirely),
but having another one right before that mentioning the completed
checkpoint sounds ok to me.

1) I don't think this should be verbose output. Having a program sit
there "doing nothing" for no apparent reason is just horrible UI design.

That would include much of Unix then.. For example if I run "cp" on a
large file it sits around "doing nothing". Same if I do "tar". No?

The expectation for all three commands is that, even if there is no
output on stdout, they will write data to the local machine. So you can
easily monitor the progress of cp and tar by running du or something in
a different terminal.

With pg_basebackup, nothing is happening on the local machine until the
checkpoint on the remote is finished; while this is obvious to somebody
familiar with how basebackups work internally, it appears to be not
clear at all to some users.

True.

However, outputting this info by default will make it show up in things like
everybody's cronjobs by default. Right now a successful pg_basebackup run
will come out with no output at all, which is how most Unix commands work,
and brings its own advantages. If we change that people will have to send
all the output to /dev/null, resulting in missing the things that are
actually important in any regard.

So I think notifying the user that something is happening remotely while
the local process waits would be useful, but on the other hand,
pg_basebackup does not print anything unless (i) --verbose is set or
(ii) there is an error, so I think having it mention the checkpoint in
--verbose mode only is acceptable.

Yeah, that's my view as well. I'm all for including it in verbose mode.

*Iff* we can get a progress indicator through the checkpoint we could
include that in --progress mode. But that's a different patch, of course,
but it shouldn't be included in the default output even if we find it.

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#8)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Tue, Feb 14, 2017 at 12:06 PM, Magnus Hagander <magnus@hagander.net> wrote:

However, outputting this info by default will make it show up in things like
everybody's cronjobs by default. Right now a successful pg_basebackup run
will come out with no output at all, which is how most Unix commands work,
and brings its own advantages. If we change that people will have to send
all the output to /dev/null, resulting in missing the things that are
actually important in any regard.

I agree with that. I think having this show up in verbose mode is a
really good idea - when something just hangs, users don't know what's
going on, and that's bad. But showing it all the time seems like a
bridge too far. As the postmortem linked above shows, people will
think of things like "hey, let's try --verbose mode" when the obvious
thing doesn't work. What is really irritating to them is when
--verbose mode fails to be, uh, verbose.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#9)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

Robert Haas wrote:

On Tue, Feb 14, 2017 at 12:06 PM, Magnus Hagander <magnus@hagander.net> wrote:

However, outputting this info by default will make it show up in things like
everybody's cronjobs by default. Right now a successful pg_basebackup run
will come out with no output at all, which is how most Unix commands work,
and brings its own advantages. If we change that people will have to send
all the output to /dev/null, resulting in missing the things that are
actually important in any regard.

I agree with that. I think having this show up in verbose mode is a
really good idea - when something just hangs, users don't know what's
going on, and that's bad. But showing it all the time seems like a
bridge too far. As the postmortem linked above shows, people will
think of things like "hey, let's try --verbose mode" when the obvious
thing doesn't work. What is really irritating to them is when
--verbose mode fails to be, uh, verbose.

I'd rather have a --quiet mode instead. If you're running it by hand,
you're likely to omit the switch, whereas when writing the cron job
you're going to notice lack of switch even before you let the job run
once.

I think progress reporting ought to go to stderr anyway.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11 Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#10)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I'd rather have a --quiet mode instead. If you're running it by hand,
you're likely to omit the switch, whereas when writing the cron job
you're going to notice lack of switch even before you let the job run
once.

Well, that might've been a better way to design it, but changing it
now would break backward compatibility and I'm not really sure that's
a good idea. Even if it is, it's a separate concern from whether or
not in the less-quiet mode we should point out that we're waiting for
a checkpoint on the server side.

#12 Jeff Janes
jeff.janes@gmail.com
In reply to: Magnus Hagander (#8)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Tue, Feb 14, 2017 at 9:06 AM, Magnus Hagander <magnus@hagander.net>
wrote:

Yeah, that's my view as well. I'm all for including it in verbose mode.

*Iff* we can get a progress indicator through the checkpoint we could
include that in --progress mode. But that's a different patch, of course,
but it shouldn't be included in the default output even if we find it.

I think it should show up in --progress mode. It would be great if we
could show fine-grained progress reports on the checkpoint, but if we can't
do that we should still report as fine as we are able to, which is that a
checkpoint is in progress. Otherwise we are setting the perfect as the
enemy of the good.

Cheers,

Jeff

#13 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#11)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On 2/14/17 5:18 PM, Robert Haas wrote:

On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I'd rather have a --quiet mode instead. If you're running it by hand,
you're likely to omit the switch, whereas when writing the cron job
you're going to notice lack of switch even before you let the job run
once.

Well, that might've been a better way to design it, but changing it
now would break backward compatibility and I'm not really sure that's

Meh... it's really only going to affect cronjobs or scripts, which are
easy enough to fix, and you're not going to have that many of them (or
if you do you certainly have an automated way to push the update).

a good idea. Even if it is, it's a separate concern from whether or
not in the less-quiet mode we should point out that we're waiting for
a checkpoint on the server side.

Well, --quiet was suggested because of confusion from pg_basebackup
twiddling its thumbs...

#14 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Jim Nasby (#13)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On 02/17/2017 08:17 PM, Jim Nasby wrote:

On 2/14/17 5:18 PM, Robert Haas wrote:

On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I'd rather have a --quiet mode instead. If you're running it by hand,
you're likely to omit the switch, whereas when writing the cron job
you're going to notice lack of switch even before you let the job run
once.

Well, that might've been a better way to design it, but changing it
now would break backward compatibility and I'm not really sure that's

Meh... it's really only going to affect cronjobs or scripts, which are
easy enough to fix, and you're not going to have that many of them (or
if you do you certainly have an automated way to push the update).

I think you're underestimating the breakage and overestimating how easy
it's going to be to fix it. It's true we'd only change this in a major
version, so people should assume possible breakage and test.

a good idea. Even if it is, it's a separate concern from whether or
not in the less-quiet mode we should point out that we're waiting for
a checkpoint on the server side.

Well, --quiet was suggested because of confusion from pg_basebackup
twiddling its thumbs...

I'm in favor of the '--verbose' route. People are used to that when
investigating issues, and it does not break existing cron jobs. I can
live with --quiet though, as long as we don't resort to some craziness
along the lines "if there's tty be verbose, otherwise be quiet".

I have my doubts about this actually addressing gitlab-like mistakes,
though, because it's a helluva jump from "It's waiting and not doing
anything," to "We need to remove the datadir." (One of the reasons being
that non-empty directory is a local issue, and there's no reason why the
tool should wait instead of just reporting an error.)

FWIW before messing with the pg_basebackup code, perhaps we should
improve the documentation and explain clearly the meaning of 'fast' and
'spread' checkpoint modes. Right now, pg_basebackup docs only say this:

Sets checkpoint mode to fast or spread (default) (see Section 24.3.3).

which is pretty damn useless, when you're investigating an issue. And
the referenced section (Making a Base Backup Using the Low Level API)
does not clearly explain how this maps to pg_start_backup(_,?).

What about adding a paragraph into pg_basebackup docs, explaining that
with 'fast' it does immediate checkpoint, while with 'spread' it'll wait
for a spread checkpoint.

regards

-- Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15 David G. Johnston
david.g.johnston@gmail.com
In reply to: Tomas Vondra (#14)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Fri, Feb 17, 2017 at 4:22 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

What about adding a paragraph into pg_basebackup docs, explaining that
with 'fast' it does immediate checkpoint, while with 'spread' it'll wait
for a spread checkpoint.

I agree that a better, and self-contained, explanation of the behaviors
that fast and spread invoke on the server should be included directly in
the pg_basebackup docs.

Additionally, a primary benefit of pg_basebackup is hiding the low-level
details from the user and in that spirit the cross-reference link to
Section 25.3.3 "Making a Base Backup Using the Low Level API" should be
removed. If there is specific information there that a user of
pg_basebackup needs it should be presented properly in the application
documentation.

The top of pg_basebackup points to the entire 25.3 chapter but the flow
from there is solid - coverage of pg_basebackup occurs and points out the
low level API for those whose needs are not fully served by the bundled
application. If one uses pg_basebackup they should be able to stop at that
point, go back to the app page, and continue reading and skip all of 25.3.3

The term "spread checkpoint" isn't actually a defined term in our
docs...and aside from the word spread itself describing how a checkpoint
works, it isn't used outside of pg_basebackup docs. So "it will wait for a
spread checkpoint" doesn't really work - "it will start and then wait for a
normal checkpoint to complete" does.

More holistically (i.e., feel free to skip)

This paragraph from 25.3.3:

"""
This is because it performs a checkpoint, and the I/O required for the
checkpoint will be spread out over a significant period of time, by default
half your inter-checkpoint interval (see the configuration parameter
checkpoint_completion_target). This is usually what you want, because it
minimizes the impact on query processing. If you want to start the backup
as soon as possible, change the second parameter to true.
"""

is good but buried, and seems like it would be more visible in Chapter 30,
"Reliability and the Write-Ahead Log". Both the internals and pg_basebackup
pages could then point the reader there. There isn't a chapter dedicated
to checkpoints - nor does there need to be - but a section in 30 seems
warranted as being the official reference. Right now you have to skim the
configuration variables and "WAL Configuration" and "CHECKPOINT" and "base
backup API and pg_basebackup" to cover everything. A checkpoint chapter
with that paragraph as a focus would allow the other items to simply say
"immediate or normal checkpoint" as needed and redirect the reader for
additional context as to the trade-offs of each - whether done manually or
during some form of backup script.
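
To put a number on "half your inter-checkpoint interval" from the paragraph quoted above: assuming the defaults of the releases discussed here (checkpoint_timeout = 5min, checkpoint_completion_target = 0.5), a timed spread checkpoint is paced to take roughly 150 seconds. The function below is only a back-of-the-envelope sketch with a hypothetical name; it ignores checkpoints triggered by WAL volume and any I/O stalls, so the real wait can be shorter or longer.

```c
/*
 * Hypothetical helper: estimate the target duration of a timed spread
 * checkpoint in seconds. The checkpointer paces its writes so that they
 * finish within checkpoint_timeout * checkpoint_completion_target.
 */
double
spread_checkpoint_seconds(double checkpoint_timeout_s,
                          double completion_target)
{
    return checkpoint_timeout_s * completion_target;
}
```

For example, `spread_checkpoint_seconds(300.0, 0.5)` gives 150 seconds, which is about how long pg_basebackup may sit silently before any data arrives if a fast checkpoint was not requested.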

David J.

#16Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#14)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Sat, Feb 18, 2017 at 4:52 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I have my doubts about this actually addressing gitlab-like mistakes,
though, because it's a helluva jump from "It's waiting and not doing
anything," to "We need to remove the datadir." (One of the reasons being
that non-empty directory is a local issue, and there's no reason why the
tool should wait instead of just reporting an error.)

It's pretty clear that the gitlab postmortem involves multiple people
making multiple serious errors, including failing to test that the
ostensible backups could actually be restored. I was taught that rule
#1 as far as backups are concerned is to test that you can restore
them, so that seems like a big miss. However, I don't think the fact
they made other mistakes is a reason not to improve the things we can
improve and, certainly, having some way for pg_basebackup to tell you
that it's waiting for the master to checkpoint will help the next
person who is confused by that particular thing. That person may go
on to be confused by something else, but then again maybe not.
Improving the reporting in this case stands on its own merits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#17Michael Banck
michael.banck@credativ.de
In reply to: Robert Haas (#11)
2 attachment(s)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

Hi,

On Tuesday, 14 February 2017, at 18:18 -0500, Robert Haas wrote:

On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I'd rather have a --quiet mode instead. If you're running it by hand,
you're likely to omit the switch, whereas when writing the cron job
you're going to notice lack of switch even before you let the job run
once.

Well, that might've been a better way to design it, but changing it
now would break backward compatibility and I'm not really sure that's
a good idea. Even if it is, it's a separate concern from whether or
not in the less-quiet mode we should point out that we're waiting for
a checkpoint on the server side.

ISTM the consensus is that there should be no output in regular mode,
but a message should be displayed in verbose and progress mode.

So I went forth and also added a message in progress mode (unless
verbose messages are requested anyway).
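
A minimal sketch of that rule as a standalone C function, for readers who want the decision in one place. `checkpoint_wait_message` is a hypothetical helper used only for illustration; the attached patch prints directly from BaseBackup() instead.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: choose what to report before waiting for the
 * server-side checkpoint. Returns NULL when nothing should be printed.
 */
const char *
checkpoint_wait_message(bool verbose, bool showprogress)
{
    /* Verbose output takes precedence over the progress message. */
    if (verbose)
        return "initiating base backup, waiting for checkpoint to complete";

    /* Progress mode alone gets the short message. */
    if (showprogress)
        return "waiting for checkpoint";

    /* Regular mode stays silent. */
    return NULL;
}
```

The caller would print the returned string to stderr when it is not NULL, so regular mode keeps producing no output.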

Regarding the documentation, I tried to clarify the difference between
the checkpoint types a bit more, but I think any further action is
probably a larger rewrite of the documentation on this topic.

So attached are two patches: I've split the change into a documentation
part and a code output part. I'll add it as one commitfest entry in the
"Clients" section though, as it's not really a big patch, unless
somebody thinks it should have a second entry in "Documentation"?

Michael

--
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachments:

0001-Documentation-updates-regarding-checkpoints-for-base.patch (text/x-patch; charset=UTF-8)
From bcbe19855f9f94eadf9e47a7f3b9a920a7f2a616 Mon Sep 17 00:00:00 2001
From: Michael Banck <michael.banck@credativ.de>
Date: Sun, 26 Feb 2017 18:06:40 +0100
Subject: [PATCH 1/2] Documentation updates regarding checkpoints for
 basebackups.

Mention that fast and immediate checkpoints are the same, and add a paragraph to
the pg_basebackup documentation about the checkpoint taken on the remote server.
---
 doc/src/sgml/backup.sgml            |  3 ++-
 doc/src/sgml/ref/pg_basebackup.sgml | 10 +++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 5f009ee..9485d87 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -862,7 +862,8 @@ SELECT pg_start_backup('label', false, false);
      <xref linkend="guc-checkpoint-completion-target">).  This is
      usually what you want, because it minimizes the impact on query
      processing.  If you want to start the backup as soon as
-     possible, change the second parameter to <literal>true</>.
+     possible, change the second parameter to <literal>true</>, which will
+     issue an immediate checkpoint using as much I/O as possible.
     </para>
 
     <para>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c9dd62c..c197630 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -419,7 +419,7 @@ PostgreSQL documentation
       <term><option>--checkpoint=<replaceable class="parameter">fast|spread</replaceable></option></term>
       <listitem>
        <para>
-        Sets checkpoint mode to fast or spread (default) (see <xref linkend="backup-lowlevel-base-backup">).
+        Sets checkpoint mode to fast (immediate) or spread (default) (see <xref linkend="backup-lowlevel-base-backup">).
        </para>
       </listitem>
      </varlistentry>
@@ -660,6 +660,14 @@ PostgreSQL documentation
   <title>Notes</title>
 
   <para>
+   At the beginning of the backup, a checkpoint needs to be written on the
+   server the backup is taken from.  Especially if the option
+   <literal>--checkpoint=fast</literal> is not used, this can take some time
+   during which <application>pg_basebackup</application> will be idle on the
+   server it is running on.
+  </para>
+
+  <para>
    The backup will include all files in the data directory and tablespaces,
    including the configuration files and any additional files placed in the
    directory by third parties, except certain temporary files managed by
-- 
2.1.4

0002-Mention-initial-checkpoint-in-pg_basebackup-for-verb.patch (text/x-patch; charset=UTF-8)
From 1e4051dff9710382b6b4f63373a304c6ce70c4ac Mon Sep 17 00:00:00 2001
From: Michael Banck <michael.banck@credativ.de>
Date: Sun, 26 Feb 2017 20:23:21 +0100
Subject: [PATCH 2/2] Mention initial checkpoint in pg_basebackup for
 verbose/progress output.

Before the actual data directory contents are streamed, a checkpoint is
taken on the remote server. Especially if no fast checkpoint is
requested, this can take quite a while during which the pg_basebackup
command apparently sits idle doing nothing.

To alert the user that work is being done on the remote server, mention
the checkpoint if verbose or progress output has been requested.  As
pg_basebackup does not output anything during regular operation, no
additional output is printed in this case.

Also harmonize some other verbose messages in passing.
---
 src/bin/pg_basebackup/pg_basebackup.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index bc997dc..4b75e76 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1753,6 +1753,14 @@ BaseBackup(void)
 	if (maxrate > 0)
 		maxrate_clause = psprintf("MAX_RATE %u", maxrate);
 
+	if (verbose)
+		fprintf(stderr,
+				_("%s: initiating base backup, waiting for checkpoint to complete\n"),
+				progname);
+
+	if (showprogress && !verbose)
+		fprintf(stderr, "waiting for checkpoint\n");
+
 	basebkp =
 		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s",
 				 escaped_label,
@@ -1790,6 +1798,9 @@ BaseBackup(void)
 
 	strlcpy(xlogstart, PQgetvalue(res, 0, 0), sizeof(xlogstart));
 
+	if (verbose)
+		fprintf(stderr, _("%s: checkpoint completed\n"), progname);
+
 	/*
 	 * 9.3 and later sends the TLI of the starting point. With older servers,
 	 * assume it's the same as the latest timeline reported by
@@ -1803,8 +1814,8 @@ BaseBackup(void)
 	MemSet(xlogend, 0, sizeof(xlogend));
 
 	if (verbose && includewal != NO_WAL)
-		fprintf(stderr, _("transaction log start point: %s on timeline %u\n"),
-				xlogstart, starttli);
+		fprintf(stderr, _("%s: transaction log start point: %s on timeline %u\n"),
+				progname, xlogstart, starttli);
 
 	/*
 	 * Get the header
@@ -1906,7 +1917,7 @@ BaseBackup(void)
 	}
 	strlcpy(xlogend, PQgetvalue(res, 0, 0), sizeof(xlogend));
 	if (verbose && includewal != NO_WAL)
-		fprintf(stderr, "transaction log end point: %s\n", xlogend);
+		fprintf(stderr, _("%s: transaction log end point: %s\n"), progname, xlogend);
 	PQclear(res);
 
 	res = PQgetResult(conn);
@@ -2047,7 +2058,7 @@ BaseBackup(void)
 	}
 
 	if (verbose)
-		fprintf(stderr, "%s: base backup completed\n", progname);
+		fprintf(stderr, _("%s: base backup completed\n"), progname);
 }
 
 
-- 
2.1.4

#18Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#17)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <michael.banck@credativ.de>
wrote:

ISTM the consensus is that there should be no output in regular mode,
but a message should be displayed in verbose and progress mode.

So I went forth and also added a message in progress mode (unless
verbose messages are requested anyway).

Regarding the documentation, I tried to clarify the difference between
the checkpoint types a bit more, but I think any further action is
probably a larger rewrite of the documentation on this topic.

So attached are two patches: I've split the change into a documentation
part and a code output part. I'll add it as one commitfest entry in the
"Clients" section though, as it's not really a big patch, unless
somebody thinks it should have a second entry in "Documentation"?

Agreed, and applied as one patch. Except I noticed you also fixed a couple
of entries which were missing the progname in the messages -- I broke those
out to a separate patch instead.

Made a small change to "using as much I/O as available" rather than "as
possible", which I think is a better wording, along with the change of the
idle wording I suggested before. (but feel free to point it out to me if
that's wrong).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#19Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#18)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

Hi,

On Sunday, 26 February 2017, at 21:32 +0100, Magnus Hagander wrote:

On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck
<michael.banck@credativ.de> wrote:

Agreed, and applied as one patch. Except I noticed you also fixed a
couple of entries which were missing the progname in the messages -- I
broke those out to a separate patch instead.

Thanks!

Made a small change to "using as much I/O as available" rather than
"as possible", which I think is a better wording, along with the
change of the idle wording I suggested before. (but feel free to point
it out to me if that's wrong).

LGTM, I apparently missed your suggestion when I re-read the thread.

I am just wondering whether this could/should be back-patched, maybe? It
is not a bug fix, of course, but OTOH is rather small and probably
helpful to some users on current releases.

Michael


#20Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#19)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Sun, Feb 26, 2017 at 9:53 PM, Michael Banck <michael.banck@credativ.de>
wrote:

LGTM, I apparently missed your suggestion when I re-read the thread.

I am just wondering whether this could/should be back-patched, maybe? It
is not a bug fix, of course, but OTOH is rather small and probably
helpful to some users on current releases.

Good point. We should definitely back-patch the documentation updates.

Not 100% sure about the others, as it's a small behaviour change. But since
it's only in verbose mode, I doubt it is very likely to break anybody's
scripts relying on certain output or so.

What do others think?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#18)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

Magnus Hagander <magnus@hagander.net> writes:

On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <michael.banck@credativ.de>
wrote:

ISTM the consensus is that there should be no output in regular mode,
but a message should be displayed in verbose and progress mode.

Agreed, and applied as one patch.

Is there an argument for back-patching this?

regards, tom lane


#22Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#21)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <michael.banck@credativ.de>
wrote:

ISTM the consensus is that there should be no output in regular mode,
but a message should be displayed in verbose and progress mode.

Agreed, and applied as one patch.

Is there an argument for back-patching this?

Seems you were typing that at the same time as we did.

I'm considering it, but not swayed in either direction. Should I take your
comment as a vote that we should back-patch it?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#22)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

Magnus Hagander <magnus@hagander.net> writes:

On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Is there an argument for back-patching this?

I'm considering it, but not swayed in either direction. Should I take your
comment as a vote that we should back-patch it?

Yeah, I'd vote for it.

regards, tom lane


#24Simon Riggs
simon@2ndquadrant.com
In reply to: Magnus Hagander (#20)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On 26 February 2017 at 20:55, Magnus Hagander <magnus@hagander.net> wrote:

What do others think?

Changing the output behaviour of a command isn't something we usually
do as a backpatch.

This change doesn't affect the default behaviour so probably wouldn't
make a difference to the outcome of the situation that generated this
thread.

Having said that, if it helps others to avoid mistakes in the future
then it's worth doing, so +1 to backpatch.

I've looked into changing the actual underlying behaviour and I don't
think it's feasible, so making this change will at least allow some
responsiveness from us. Thanks Michael, Magnus.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#25Jeff Janes
jeff.janes@gmail.com
In reply to: Magnus Hagander (#18)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Sun, Feb 26, 2017 at 12:32 PM, Magnus Hagander <magnus@hagander.net>
wrote:

Agreed, and applied as one patch. Except I noticed you also fixed a couple
of entries which were missing the progname in the messages -- I broke those
out to a separate patch instead.

Made a small change to "using as much I/O as available" rather than "as
possible", which I think is a better wording, along with the change of the
idle wording I suggested before. (but feel free to point it out to me if
that's wrong).

Should the below fprintf end in a \r rather than a \n, so that the
progress message gets overwritten once the checkpoint is done and we have
moved on?

    if (showprogress && !verbose)
        fprintf(stderr, "waiting for checkpoint\n");

That would seem more in keeping with how the other progress messages
operate.
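
A small sketch of the difference, under the assumption that the output goes to a terminal: a trailing \r moves the cursor back to the start of the line without advancing, so the next line written replaces the message, whereas \n leaves it on screen. `format_transient` is a hypothetical helper, not code from pg_basebackup.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a transient status line. The trailing
 * carriage return puts the cursor back at column 0, so whatever is
 * printed next overwrites this message on a terminal.
 */
int
format_transient(char *buf, size_t buflen, const char *msg)
{
    return snprintf(buf, buflen, "%s\r", msg);
}
```

Two caveats apply: when stderr is redirected to a file both messages remain in it, and a later line that is shorter than the message leaves the tail of the old text visible.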

Cheers,

Jeff

#26Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#22)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

Hi,

On Monday, 27 February 2017, at 16:20 +0100, Magnus Hagander wrote:

On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Is there an argument for back-patching this?

Seems you were typing that at the same time as we did.

I'm considering it, but not swayed in either direction. Should I take
your comment as a vote that we should back-patch it?

I've checked back into this thread, and there seems to be a +1 from Tom
and a +(0.5-1) from Simon for backpatching, and no obvious -1s. Did you
decide against it in the end, or is this still an open item?

Michael


#27Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#26)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Wed, Mar 29, 2017 at 1:05 PM, Michael Banck <michael.banck@credativ.de>
wrote:

I've checked back into this thread, and there seems to be a +1 from Tom
and a +(0.5-1) from Simon for backpatching, and no obvious -1s. Did you
decide against it in the end, or is this still an open item?

No, I plan to work on it, so it's still an open item. I've been backlogged
with other things, but I will try to get to it soon.

(This also includes considering Jeff's note)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#28Magnus Hagander
magnus@hagander.net
In reply to: Jeff Janes (#25)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Mon, Feb 27, 2017 at 7:46 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Should the below fprintf end in a \r rather than a \n, so that the
progress message gets overwritten once the checkpoint is done and we have
moved on?

    if (showprogress && !verbose)
        fprintf(stderr, "waiting for checkpoint\n");

That would seem more in keeping with how the other progress messages
operate.

Agreed, that makes more sense. I've pushed a patch that does this.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#29Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#27)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Fri, Mar 31, 2017 at 8:59 AM, Magnus Hagander <magnus@hagander.net>
wrote:

No, I plan to work on it, so it's still an open item. I've been backlogged
with other things, but I will try to get to it soon.

(This also includes considering Jeff's note)

I've applied a backpatch to 9.4. Prior to that pretty much the entire patch
is a conflict, so it would need a full rewrite.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#30Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#29)
Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

On Saturday, 1 April 2017, at 17:29 +0200, Magnus Hagander wrote:

I've applied a backpatch to 9.4. Prior to that pretty much the entire
patch is a conflict, so it would need a full rewrite.

Thanks!

Michael
