Usability improvements for pg_stop_backup()

Started by Josh Berkus, over 11 years ago, 2 messages
#1 Josh Berkus
josh@agliodbs.com

Hackers,

Since Gabrielle has improved archiving with pg_stat_archiver in 9.4, I'd
like to go further and improve the usability of pg_stop_backup().
However, based on my IRC discussion with Vik, there might not be
consensus on what the right behavior *should* be. This is for 9.5, of
course.

Currently, if archive_command is failing, pg_stop_backup() will hang
forever. The only way to figure out what's wrong with pg_stop_backup()
is to tail the PostgreSQL logs. This is difficult for users to
troubleshoot, and strongly resists any kind of automation.

Yes, we can work around this by setting statement_timeout, but that has
two issues: (a) the user has to remember to do it before the problem
occurs, and (b) it won't differentiate between archive failure and other
reasons it might time out.
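
To illustrate point (b): after a timeout, a client would have to work out the
cause on its own, for example by comparing two snapshots of pg_stat_archiver
taken before and after the call. The following is only a rough sketch under
that assumption. The helper name classify_timeout and the dict-based snapshots
are hypothetical; the caller is assumed to have fetched the view's
archived_count and failed_count columns, and a stats reset between the two
snapshots is not handled.

```python
def classify_timeout(before, after):
    """Guess why pg_stop_backup() timed out from two pg_stat_archiver snapshots.

    `before` and `after` are dicts holding the view's `archived_count` and
    `failed_count` columns, fetched by the caller (hypothetical helper).
    """
    new_failures = after["failed_count"] - before["failed_count"]
    new_archived = after["archived_count"] - before["archived_count"]
    if new_failures > 0 and new_archived == 0:
        # Archiving was attempted and never succeeded during the wait.
        return "archive_failure"
    if new_failures > 0:
        # Some segments went through, some attempts failed.
        return "archive_flaky"
    # No new failures recorded: the timeout had some other cause.
    return "other"
```

Even this much requires the client to remember to take the first snapshot
ahead of time, which is the same problem as (a).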

As such, I propose that pg_stop_backup() should error with an
appropriate error message ("Could not archive WAL segments") after three
archiving attempts. We could also add an optional parameter to raise
the number of attempts from the default of three.
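
The intended behavior can be sketched as a small simulation. This is not
backend code and nothing here exists in PostgreSQL: the names ArchiveError,
stop_backup_with_retries and max_attempts are made up for illustration, and
try_archive stands in for a single run of archive_command.

```python
class ArchiveError(Exception):
    """Stand-in for the error the proposal would raise (hypothetical)."""


def stop_backup_with_retries(try_archive, max_attempts=3):
    """Simulate the proposal: give up after `max_attempts` failed attempts.

    `try_archive` is a caller-supplied callable returning True on success;
    it stands in for one run of archive_command.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if try_archive():
            return attempt  # number of attempts it took to succeed
    # Every attempt failed: report it instead of waiting forever.
    raise ArchiveError("Could not archive WAL segments")
```

A caller who expects a slow or flaky archive could pass a larger value, e.g.
stop_backup_with_retries(try_archive, max_attempts=10), which corresponds to
the optional parameter mentioned above.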

An alternative, if we were doing this from scratch, would be for
pg_stop_backup to return false or -1 or something if it couldn't
archive; there are reasons why a user might not care that
archive_command was failing (shared storage comes to mind). However,
that would be a surprising break with backwards compatibility, since
currently users don't check the result value of pg_stop_backup().

Thoughts?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Kevin Grittner
kgrittn@ymail.com
In reply to: Josh Berkus (#1)
Re: Usability improvements for pg_stop_backup()

Josh Berkus <josh@agliodbs.com> wrote:

> Currently, if archive_command is failing, pg_stop_backup() will hang
> forever.  The only way to figure out what's wrong with pg_stop_backup()
> is to tail the PostgreSQL logs.  This is difficult for users to
> troubleshoot, and strongly resists any kind of automation.

That is bad.

> Yes, we can work around this by setting statement_timeout, but that has
> two issues: (a) the user has to remember to do it before the problem
> occurs, and (b) it won't differentiate between archive failure and other
> reasons it might time out.

Clearly not a long-term solution.

> As such, I propose that pg_stop_backup() should error with an
> appropriate error message ("Could not archive WAL segments") after three
> archiving attempts.  We could also add an optional parameter to raise
> the number of attempts from the default of three.

That sounds sane to me.

> An alternative, if we were doing this from scratch, would be for
> pg_stop_backup to return false or -1 or something if it couldn't
> archive; there are reasons why a user might not care that
> archive_command was failing (shared storage comes to mind).  However,
> that would be a surprising break with backwards compatibility, since
> currently users don't check the result value of pg_stop_backup().

Some might, which is a stronger argument against changing what gets
returned.  Even in a green field though, I would argue that
pg_stop_backup() should return information about the minimum range
of WAL files needed to perform a consistent recovery -- or possibly
duplicate everything in the backup history file.  An error seems
much more appropriate to indicate that the user does not have a
valid backup.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
