Improve shutdown during online backup

Started by Albe Laurenzalmost 18 years ago9 messages
#1Albe Laurenz
laurenz.albe@wien.gv.at
2 attachment(s)

This follows up on the discussion in
http://archives.postgresql.org/pgsql-hackers/2008-03/msg01033.php

- pg_ctl will refuse a smart shutdown during online backup.
- The postmaster will also refuse to shutdown in smart mode
in that case and log a message to that effect.
- In fast shutdown mode, the server will rename "backup_label"
after successfully shutting down and log the fact.

Yours,
Laurenz Albe

Attachments:

backup-shut.doc.patchapplication/octet-stream; name=backup-shut.doc.patchDownload
diff -r -c postgresql-8.4.orig/doc/src/sgml/backup.sgml postgresql-8.4/doc/src/sgml/backup.sgml
*** postgresql-8.4.orig/doc/src/sgml/backup.sgml	2008-04-01 15:18:50.000000000 +0200
--- postgresql-8.4/doc/src/sgml/backup.sgml	2008-04-01 15:23:27.000000000 +0200
***************
*** 871,876 ****
--- 871,879 ----
      file.  In case of confusion it will
      therefore be possible to look inside a backup dump file and determine
      exactly which backup session the dump file came from.
+     Do not shutdown the server during an online backup. The server will refuse
+     a smart shutdown and terminate backup mode on a fast shutdown, as
+     described in <xref linkend="server-shutdown">.
     </para>
  
     <para>
Only in postgresql-8.4/doc/src/sgml: backup.sgml.orig
diff -r -c postgresql-8.4.orig/doc/src/sgml/ref/pg_ctl-ref.sgml postgresql-8.4/doc/src/sgml/ref/pg_ctl-ref.sgml
*** postgresql-8.4.orig/doc/src/sgml/ref/pg_ctl-ref.sgml	2008-04-01 15:18:50.000000000 +0200
--- postgresql-8.4/doc/src/sgml/ref/pg_ctl-ref.sgml	2008-04-01 15:23:27.000000000 +0200
***************
*** 134,141 ****
     the specified data directory is shut down.  Three different
     shutdown methods can be selected with the <option>-m</option>
     option: <quote>Smart</quote> mode waits for all the clients to
!    disconnect.  This is the default.  <quote>Fast</quote> mode does
!    not wait for clients to disconnect.  All active transactions are
     rolled back and clients are forcibly disconnected, then the
     server is shut down.  <quote>Immediate</quote> mode will abort
     all server processes without a clean shutdown.  This will lead to 
--- 134,143 ----
     the specified data directory is shut down.  Three different
     shutdown methods can be selected with the <option>-m</option>
     option: <quote>Smart</quote> mode waits for all the clients to
!    disconnect and will not shutdown a server in online backup
!    mode.  This is the default.  <quote>Fast</quote> mode does
!    not wait for clients to disconnect and will terminate an
!    online backup in progress.  All active transactions are
     rolled back and clients are forcibly disconnected, then the
     server is shut down.  <quote>Immediate</quote> mode will abort
     all server processes without a clean shutdown.  This will lead to 
diff -r -c postgresql-8.4.orig/doc/src/sgml/runtime.sgml postgresql-8.4/doc/src/sgml/runtime.sgml
*** postgresql-8.4.orig/doc/src/sgml/runtime.sgml	2008-04-01 15:18:50.000000000 +0200
--- postgresql-8.4/doc/src/sgml/runtime.sgml	2008-04-01 15:23:27.000000000 +0200
***************
*** 1309,1315 ****
         After receiving <systemitem>SIGTERM</systemitem>, the server
         disallows new connections, but lets existing sessions end their
         work normally. It shuts down only after all of the sessions
!        terminate normally. This is the <firstterm>Smart
         Shutdown</firstterm>.
        </para>
       </listitem>
--- 1309,1316 ----
         After receiving <systemitem>SIGTERM</systemitem>, the server
         disallows new connections, but lets existing sessions end their
         work normally. It shuts down only after all of the sessions
!        terminate normally.  If the server is in online backup mode,
!        it will refuse to shut down.  This is the <firstterm>Smart
         Shutdown</firstterm>.
        </para>
       </listitem>
***************
*** 1322,1328 ****
         The server disallows new connections and sends all existing
         server processes <systemitem>SIGTERM</systemitem>, which will cause them
         to abort their current transactions and exit promptly. It then
!        waits for the server processes to exit and finally shuts down. This is the
         <firstterm>Fast Shutdown</firstterm>.
        </para>
       </listitem>
--- 1323,1331 ----
         The server disallows new connections and sends all existing
         server processes <systemitem>SIGTERM</systemitem>, which will cause them
         to abort their current transactions and exit promptly. It then
!        waits for the server processes to exit and finally shuts down.
!        If the server is in online backup mode, backup mode will be
!        terminated, rendering the backup useless.  This is the
         <firstterm>Fast Shutdown</firstterm>.
        </para>
       </listitem>
backup-shut.patchapplication/octet-stream; name=backup-shut.patchDownload
diff -r -c postgresql-8.4.orig/src/backend/postmaster/postmaster.c postgresql-8.4/src/backend/postmaster/postmaster.c
*** postgresql-8.4.orig/src/backend/postmaster/postmaster.c	2008-04-01 15:18:52.000000000 +0200
--- postgresql-8.4/src/backend/postmaster/postmaster.c	2008-04-01 15:23:22.000000000 +0200
***************
*** 286,291 ****
--- 286,295 ----
  extern int	optreset;
  #endif
  
+ /* files for online backup */
+ #define BACKUP_LABEL_FILE	"backup_label"
+ #define BACKUP_LABEL_OLD	"backup_label.old"
+ 
  /*
   * postmaster.c - function prototypes
   */
***************
*** 1933,1938 ****
--- 1937,1943 ----
  pmdie(SIGNAL_ARGS)
  {
  	int			save_errno = errno;
+ 	struct stat		statbuf;
  
  	PG_SETMASK(&BlockSig);
  
***************
*** 1955,1960 ****
--- 1960,1975 ----
  			ereport(LOG,
  					(errmsg("received smart shutdown request")));
  
+ 			/* refuse to shut down if "backup_label" exists */
+ 			if (0 == stat(BACKUP_LABEL_FILE, &statbuf))
+ 			{
+ 				ereport(LOG,
+ 					(errmsg("online backup in progress, shutdown refused"),
+ 					errdetail_log("Execute \"pg_stop_backup()\" to leave backup mode before shutting down.")));
+ 				Shutdown = NoShutdown;
+ 				break;
+ 			}
+ 
  			if (pmState == PM_RUN)
  			{
  				/* autovacuum workers are told to shut down immediately */
***************
*** 2011,2016 ****
--- 2026,2047 ----
  			 * PostmasterStateMachine will take the next step.
  			 */
  			PostmasterStateMachine();
+ 
+ 			/*
+ 			 * Rename "backup_label" file if it exists to avoid
+ 			 * recovery after a clean fast shutdown.
+ 			 */
+ 			if (0 == stat(BACKUP_LABEL_FILE, &statbuf))
+ 			{
+ 				unlink(BACKUP_LABEL_OLD);
+ 				if (0 == rename(BACKUP_LABEL_FILE, BACKUP_LABEL_OLD))
+ 				{
+ 					ereport(LOG,
+ 						(errmsg("online backup cancelled, \"%s\" renamed to \"%s\"",
+ 							BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));
+ 				}
+ 			}
+ 
  			break;
  
  		case SIGQUIT:
Only in postgresql-8.4/src/backend/postmaster: postmaster.c.orig
diff -r -c postgresql-8.4.orig/src/bin/pg_ctl/pg_ctl.c postgresql-8.4/src/bin/pg_ctl/pg_ctl.c
*** postgresql-8.4.orig/src/bin/pg_ctl/pg_ctl.c	2008-04-01 15:18:55.000000000 +0200
--- postgresql-8.4/src/bin/pg_ctl/pg_ctl.c	2008-04-01 15:23:22.000000000 +0200
***************
*** 144,149 ****
--- 144,150 ----
  static char postopts_file[MAXPGPATH];
  static char pid_file[MAXPGPATH];
  static char conf_file[MAXPGPATH];
+ static char backup_file[MAXPGPATH];
  
  #if defined(HAVE_GETRLIMIT) && defined(RLIMIT_CORE)
  static void unlimit_core_size(void);
***************
*** 731,736 ****
--- 732,738 ----
  {
  	int			cnt;
  	pgpid_t		pid;
+ 	struct stat	statbuf;
  
  	pid = get_pgpid();
  
***************
*** 749,754 ****
--- 751,766 ----
  		exit(1);
  	}
  
+ 	if (SMART_MODE == shutdown_mode)
+ 	{
+ 		if (0 == stat(backup_file, &statbuf))
+ 		{
+ 			write_stderr(_("%s: smart shutdown refused, online backup in progress\n"),
+ 				progname);
+ 			exit(1);
+ 		}
+ 	}
+ 
  	if (kill((pid_t) pid, sig) != 0)
  	{
  		write_stderr(_("%s: could not send stop signal (PID: %ld): %s\n"), progname, pid,
***************
*** 799,804 ****
--- 811,817 ----
  {
  	int			cnt;
  	pgpid_t		pid;
+ 	struct stat	statbuf;
  
  	pid = get_pgpid();
  
***************
*** 826,831 ****
--- 839,854 ----
  
  	if (postmaster_is_alive((pid_t) pid))
  	{
+ 		if (SMART_MODE == shutdown_mode)
+ 		{
+ 			if (0 == stat(backup_file, &statbuf))
+ 			{
+ 				write_stderr(_("%s: cannot stop server; online backup in progress\n"),
+ 					progname);
+ 				exit(1);
+ 			}
+ 		}
+ 
  		if (kill((pid_t) pid, sig) != 0)
  		{
  			write_stderr(_("%s: could not send stop signal (PID: %ld): %s\n"), progname, pid,
***************
*** 1883,1888 ****
--- 1906,1912 ----
  		snprintf(postopts_file, MAXPGPATH, "%s/postmaster.opts", pg_data);
  		snprintf(pid_file, MAXPGPATH, "%s/postmaster.pid", pg_data);
  		snprintf(conf_file, MAXPGPATH, "%s/postgresql.conf", pg_data);
+ 		snprintf(backup_file, MAXPGPATH, "%s/backup_label", pg_data);
  	}
  
  	switch (ctl_command)
#2Simon Riggs
simon@2ndquadrant.com
In reply to: Albe Laurenz (#1)
Re: Improve shutdown during online backup

On Tue, 2008-04-01 at 15:34 +0200, Albe Laurenz wrote:

This follows up on the discussion in
http://archives.postgresql.org/pgsql-hackers/2008-03/msg01033.php

- pg_ctl will refuse a smart shutdown during online backup.
- The postmaster will also refuse to shutdown in smart mode
in that case and log a message to that effect.
- In fast shutdown mode, the server will rename "backup_label"
after successfully shutting down and log the fact.

Looks good.

Few comments:

* smart shutdown waits for sessions to complete, yet this just ignores
smart shutdowns which is something a little different. I think we
should wait for the backup to complete and then shutdown.

* when we say "online backup cancelled" I think we should say something
more like "online backup mode cancelled". All we are doing is removing
the backup label file, we're not actually cancelling the physical backup
since it is external to the database anyway.

* The #defines at top of postmaster.c are duplicated from xlog.c
If we can't agree on a common header file then we should at least add a
comment to mention they are duplicated (in both locations).

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

#3Simon Riggs
simon@2ndquadrant.com
In reply to: Simon Riggs (#2)
Re: Improve shutdown during online backup

On Tue, 2008-04-01 at 17:42 +0100, Simon Riggs wrote:

Few comments:

* smart shutdown waits for sessions to complete, yet this just ignores
smart shutdowns which is something a little different. I think we
should wait for the backup to complete and then shutdown.

* The #defines at top of postmaster.c are duplicated from xlog.c
If we can't agree on a common header file then we should at least add a
comment to mention they are duplicated (in both locations).

If we add a function called something like BackupInProgress() to xlog.c,
exported via miscadmin.h then we can use it within the
PostmasterStateMachine() function like this

if (pmState == PM_WAIT_BACKENDS)
{
if (CountChildren() == 0 &&
StartupPID == 0 &&
(BgWriterPID == 0 || !FatalError) &&
WalWriterPID == 0 &&
AutoVacPID == 0 &&
!BackupInProgress()) <---- new line

so that the postmaster doesn't need to know about how we do backups.

That way you don't need any of the special cases in your patch, nor is
there any need to duplicate the #defines.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

#4Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Simon Riggs (#3)
Re: Improve shutdown during online backup

Simon Riggs wrote:

Few comments:

* smart shutdown waits for sessions to complete, yet this just ignores
smart shutdowns which is something a little different. I think we
should wait for the backup to complete and then shutdown.

That would be more consistent, I agree.

I'll undo my changes to pg_ctl as well, as they make no more sense then.

* The #defines at top of postmaster.c are duplicated from xlog.c
If we can't agree on a common header file then we should at least add a
comment to mention they are duplicated (in both locations).

If we add a function called something like BackupInProgress()
to xlog.c,
exported via miscadmin.h then we can use it within the
PostmasterStateMachine() function like this

if (pmState == PM_WAIT_BACKENDS)
{
if (CountChildren() == 0 &&
StartupPID == 0 &&
(BgWriterPID == 0 || !FatalError) &&
WalWriterPID == 0 &&
AutoVacPID == 0 &&
!BackupInProgress()) <---- new line

so that the postmaster doesn't need to know about how we do backups.

That way you don't need any of the special cases in your patch, nor is
there any need to duplicate the #defines.

I realized that duplicating the #defines was ugly, and will do it
like that.

Thanks for the hints.

Yours,
Laurenz Albe

#5Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Simon Riggs (#3)
Re: Improve shutdown during online backup

Simon Riggs wrote:

Few comments:

* smart shutdown waits for sessions to complete, yet this just ignores
smart shutdowns which is something a little different. I think we
should wait for the backup to complete and then shutdown.

If we add a function called something like BackupInProgress() to xlog.c,
exported via miscadmin.h then we can use it within the
PostmasterStateMachine() function like this

if (pmState == PM_WAIT_BACKENDS)
{
if (CountChildren() == 0 &&
StartupPID == 0 &&
(BgWriterPID == 0 || !FatalError) &&
WalWriterPID == 0 &&
AutoVacPID == 0 &&
!BackupInProgress()) <---- new line

so that the postmaster doesn't need to know about how we do backups.

That way you don't need any of the special cases in your patch, nor is
there any need to duplicate the #defines.

I looked at that, and it won't work, for these reasons:

PostmasterStateMachine() is called once after a smart shutdown.
If there are children or a backup is in progress, pmState will remain
PM_WAIT_BACKENDS.

Now whenever a child exits, the reaper() will be called, which in turn
calls PostmasterStateMachine() again and advances pmState if appropriate.
This won't work for backups though, because removal of backup_label will
not send a SIGCHLD to the postmaster.

Moreover, if Shutdown == SmartShutdown, new connections won't be accepted,
and nobody can connect and call pg_stop_backup().
So even if I'd add a check for
(pmState == PM_WAIT_BACKENDS) && !BackupInProgress() somewhere in the
ServerLoop(), it wouldn't do much good, because the only way for somebody
to cancel online backup mode would be to manually remove the file.

So the only reasonable thing to do on smart shutdown during an online
backup is to have the shutdown request fail, right? The only alternative being
that a smart shutdown request should interrupt online backup mode.

So - unless you point out a flaw in my reasoning - I'll implement it
that way, but will put all code that handles backup_label files into
xlog.c.

Yours,
Laurenz Albe

#6Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Albe Laurenz (#5)
Re: Improve shutdown during online backup

Albe Laurenz wrote:

Moreover, if Shutdown == SmartShutdown, new connections won't be accepted,
and nobody can connect and call pg_stop_backup().
So even if I'd add a check for
(pmState == PM_WAIT_BACKENDS) && !BackupInProgress() somewhere in the
ServerLoop(), it wouldn't do much good, because the only way for somebody
to cancel online backup mode would be to manually remove the file.

Good point.

So the only reasonable thing to do on smart shutdown during an online
backup is to have the shutdown request fail, right? The only alternative being
that a smart shutdown request should interrupt online backup mode.

Or we can add another state, PM_WAIT_BACKUP, before PM_WAIT_BACKENDS,
that allows new connections, and waits until the backup ends.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#7Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Heikki Linnakangas (#6)
Re: Improve shutdown during online backup

[what should happen if a smart shutdown request is received during online backup mode?
I'll cc: the hackers list, maybe others have something to say to this]

Heikki Linnakangas wrote:

Albe Laurenz wrote:

Moreover, if Shutdown == SmartShutdown, new connections won't be accepted,
and nobody can connect and call pg_stop_backup().
So even if I'd add a check for
(pmState == PM_WAIT_BACKENDS) && !BackupInProgress() somewhere in the
ServerLoop(), it wouldn't do much good, because the only way for somebody
to cancel online backup mode would be to manually remove the file.

Good point.

So the only reasonable thing to do on smart shutdown during an online
backup is to have the shutdown request fail, right? The only alternative being
that a smart shutdown request should interrupt online backup mode.

Or we can add another state, PM_WAIT_BACKUP, before PM_WAIT_BACKENDS,
that allows new connections, and waits until the backup ends.

That's an option. Maybe it is possible to restrict connections to superusers
(who are the only ones who can call pg_stop_backup() anyway).

Or, we could allow superuser connections in state PM_WAIT_BACKENDS...

Opinions?

Yours,
Laurenz Albe

#8Simon Riggs
simon@2ndquadrant.com
In reply to: Albe Laurenz (#7)
Re: Improve shutdown during online backup

On Tue, 2008-04-08 at 09:16 +0200, Albe Laurenz wrote:

Heikki Linnakangas wrote:

Albe Laurenz wrote:

Moreover, if Shutdown == SmartShutdown, new connections won't be accepted,
and nobody can connect and call pg_stop_backup().
So even if I'd add a check for
(pmState == PM_WAIT_BACKENDS) && !BackupInProgress() somewhere in the
ServerLoop(), it wouldn't do much good, because the only way for somebody
to cancel online backup mode would be to manually remove the file.

Good point.

So the only reasonable thing to do on smart shutdown during an online
backup is to have the shutdown request fail, right? The only alternative being
that a smart shutdown request should interrupt online backup mode.

Or we can add another state, PM_WAIT_BACKUP, before PM_WAIT_BACKENDS,
that allows new connections, and waits until the backup ends.

That's an option. Maybe it is possible to restrict connections to superusers
(who are the only ones who can call pg_stop_backup() anyway).

Or, we could allow superuser connections in state PM_WAIT_BACKENDS...

That sounds right.

Completely unrelated to backups, if you issue a smart shutdown and it
doesn't, you probably would like to connect and see what is happening
and why. The reason may not be a backup-in-progress.

Personally, I think "smart" shutdown could be even smarter. It should
kick off unwanted sessions, such as an idle pgAdmin session - maybe a
rule like "anything that has been idle for >30 seconds".

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#9Gregory Stark
stark@enterprisedb.com
In reply to: Simon Riggs (#8)
Re: Improve shutdown during online backup

"Simon Riggs" <simon@2ndquadrant.com> writes:

Personally, I think "smart" shutdown could be even smarter. It should
kick off unwanted sessions, such as an idle pgAdmin session - maybe a
rule like "anything that has been idle for >30 seconds".

That's not a bad idea in itself but I don't think it's something the server
should be in the business of doing. One big reason is that the server
shouldn't be imposing arbitrary policy. That should be something the person
running the shutdown is in control over.

What you could do is have a separate program (I would write a client but a
server-side function would work too) to kick off users based on various
criteria you can specify.

Then you can put in your backup scripts two commands, one to kick off idle
users and then do a smart shutdown.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!