WAL "low watermark" during base backup

Started by Magnus Hagander over 14 years ago, 14 messages
#1 Magnus Hagander
magnus@hagander.net
1 attachment(s)

Attached patch implements a "low watermark wal location" in the
walsender shmem array. Setting this value in a walsender prevents
transaction log removal prior to this point - similar to how
wal_keep_segments works, except with an absolute number rather than
relative. For now, this is set when running a base backup with WAL
included - to prevent the required WAL from being recycled away while the
backup is running, without having to guesstimate the value for
wal_keep_segments. (There could be other ways added to set it in the
future, but that's the only one I've done for now.)
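To make the difference between the relative rule (wal_keep_segments) and the absolute rule (the watermark) concrete, here is a minimal sketch in C of how the two cutoffs could be combined. It assumes 16 MB segments and a flat 64-bit segment number (hypothetical WalPos/SegNo types) instead of the (logId, logSeg) pair that KeepLogSeg() works with in the patch; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumptions (hypothetical, for illustration only): 16 MB segments and a
 * flat 64-bit segment number instead of the (logId, logSeg) pair.
 */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

typedef uint64_t WalPos;        /* byte position in the WAL stream; 0 = unset */
typedef uint64_t SegNo;         /* WAL segment number */

static SegNo
pos_to_seg(WalPos pos)
{
	return pos / WAL_SEG_SIZE;
}

/*
 * Return the oldest segment that must be kept. cur_seg is the segment the
 * server would otherwise keep from; keep_segments plays the role of
 * wal_keep_segments (relative) and lowwater that of the watermark
 * (absolute, 0 meaning "not set"). Whichever cutoff is older wins.
 */
static SegNo
oldest_segment_to_keep(SegNo cur_seg, uint64_t keep_segments, WalPos lowwater)
{
	SegNo		cutoff = cur_seg;

	/* relative rule: keep the last keep_segments segments */
	if (keep_segments > 0)
		cutoff = cur_seg > keep_segments ? cur_seg - keep_segments : 0;

	/* absolute rule: never remove the segment holding the watermark */
	if (lowwater != 0 && pos_to_seg(lowwater) < cutoff)
		cutoff = pos_to_seg(lowwater);

	return cutoff;
}
```

For example, with the cutoff otherwise at segment 100 and wal_keep_segments set to 10, segments from 90 onward are kept; a watermark that falls in segment 50 pushes the cutoff back to 50, while one in segment 99 changes nothing, mirroring the "only if earlier" check in the patch.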

It obviously needs some documentation updates as well, but I wanted to
get some comments on the way it's done before I work on those.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachments:

wal_low_watermark.patch (text/x-patch; charset=US-ASCII)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 8194,8199 **** CreateRestartPoint(int flags)
--- 8194,8206 ----
   * Calculate the last segment that we need to retain because of
   * wal_keep_segments, by subtracting wal_keep_segments from
   * the given xlog location, recptr.
+  *
+  * Also check if there is any in-progress base backup that has set
+  * a low watermark preventing us from removing it.
+  *
+  * NOTE! If the last segment calculated is later than the one
+  * passed in through logId and logSeg, do *not* update the
+  * values.
   */
  static void
  KeepLogSeg(XLogRecPtr recptr, uint32 *logId, uint32 *logSeg)
***************
*** 8202,8211 **** KeepLogSeg(XLogRecPtr recptr, uint32 *logId, uint32 *logSeg)
--- 8209,8260 ----
  	uint32		seg;
  	int			d_log;
  	int			d_seg;
+ 	XLogRecPtr	lowwater = {0,0};
+ 	uint32		lowwater_log = 0;
+ 	uint32		lowwater_seg = 0;
+ 
+ 	if (max_wal_senders > 0)
+ 	{
+ 		int i;
+ 
+ 		/* Check if there is a WAL sender with a low watermark */
+ 		for (i = 0; i < max_wal_senders; i++)
+ 		{
+ 			/* use volatile pointer to prevent code rearrangement */
+ 			volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+ 			XLogRecPtr	this_lowwater;
+ 
+ 			if (walsnd->pid == 0)
+ 				continue;
+ 
+ 			SpinLockAcquire(&walsnd->mutex);
+ 			this_lowwater = walsnd->lowwater;
+ 			SpinLockRelease(&walsnd->mutex);
+ 
+ 			if (XLByteLT(lowwater, this_lowwater))
+ 				lowwater = this_lowwater;
+ 		}
+ 
+ 		XLByteToSeg(lowwater, lowwater_log, lowwater_seg);
+ 	}
  
  	if (wal_keep_segments == 0)
+ 	{
+ 		/* No wal_keep_segments, so let low watermark decide */
+ 		if (lowwater_log == 0 && lowwater_seg == 0)
+ 			return;
+ 
+ 		if (lowwater_log < *logId || (lowwater_log == *logId && lowwater_seg < *logSeg))
+ 		{
+ 			*logId = lowwater_log;
+ 			*logSeg = lowwater_seg;
+ 		}
  		return;
+ 	}
  
+ 	/*
+ 	 * Calculate the cutoff point caused by wal_keep_segments
+ 	 */
  	XLByteToSeg(recptr, log, seg);
  
  	d_seg = wal_keep_segments % XLogSegsPerFile;
***************
*** 8226,8231 **** KeepLogSeg(XLogRecPtr recptr, uint32 *logId, uint32 *logSeg)
--- 8275,8293 ----
  	else
  		log = log - d_log;
  
+ 	/*
+ 	 * If the low watermark is earlier than wal_keep_segments, let
+ 	 * it decide if we keep or not.
+ 	 */
+ 	if (lowwater_log > 0 || lowwater_seg > 0)
+ 	{
+ 		if (lowwater_log < log || (lowwater_log == log && lowwater_seg < seg))
+ 		{
+ 			log = lowwater_log;
+ 			seg = lowwater_seg;
+ 		}
+ 	}
+ 
  	/* don't delete WAL segments newer than the calculated segment */
  	if (log < *logId || (log == *logId && seg < *logSeg))
  	{
*** a/src/backend/replication/basebackup.c
--- b/src/backend/replication/basebackup.c
***************
*** 96,101 **** perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
--- 96,115 ----
  	startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &labelfile);
  	SendXlogRecPtrResult(startptr);
  
+ 	/*
+ 	 * If we are including WAL, set a low watermark so that ordinary
+ 	 * WAL rotation won't remove the files for us.
+ 	 */
+ 	if (opt->includewal)
+ 	{
+ 		/* use volatile pointer to prevent code rearrangement */
+ 		volatile WalSnd *walsnd = MyWalSnd;
+ 
+ 		SpinLockAcquire(&walsnd->mutex);
+ 		walsnd->lowwater = startptr;
+ 		SpinLockRelease(&walsnd->mutex);
+ 	}
+ 
  	PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
  	{
  		List	   *tablespaces = NIL;
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 899,904 **** InitWalSnd(void)
--- 899,905 ----
  			 */
  			walsnd->pid = MyProcPid;
  			MemSet(&walsnd->sentPtr, 0, sizeof(XLogRecPtr));
+ 			MemSet(&walsnd->lowwater, 0, sizeof(XLogRecPtr));
  			walsnd->state = WALSNDSTATE_STARTUP;
  			SpinLockRelease(&walsnd->mutex);
  			/* don't need the lock anymore */
*** a/src/include/replication/walsender.h
--- b/src/include/replication/walsender.h
***************
*** 46,51 **** typedef struct WalSnd
--- 46,57 ----
  	XLogRecPtr	flush;
  	XLogRecPtr	apply;
  
+ 	/*
+ 	 * Prevent xlog rotation prior to the low watermark (used during base
+ 	 * backups that include the transaction log)
+ 	 */
+ 	XLogRecPtr	lowwater;
+ 
  	/* Protects shared variables shown above. */
  	slock_t		mutex;
  
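When more than one walsender has a watermark set (for example, two concurrent base backups), the WAL that must be retained is the WAL needed by the oldest watermark, so the effective cutoff is the smallest non-zero value. Note that the loop in KeepLogSeg() above replaces lowwater whenever XLByteLT(lowwater, this_lowwater) is true, which keeps the newest value instead. A minimal sketch, assuming a flat 64-bit WAL position (hypothetical WalPos type, 0 meaning unset) rather than XLogRecPtr, with the per-slot spinlock omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t WalPos;        /* hypothetical flat WAL position; 0 = unset */

/*
 * Hypothetical stand-in for the walsender slots: one watermark per slot.
 * In the patch each value is read under the slot's spinlock; that is
 * omitted here. Returns the oldest watermark set, or 0 if none is set.
 */
static WalPos
effective_lowwater(const WalPos *lowwater, size_t nslots)
{
	WalPos		result = 0;

	for (size_t i = 0; i < nslots; i++)
	{
		if (lowwater[i] == 0)
			continue;			/* this walsender has no watermark */
		if (result == 0 || lowwater[i] < result)
			result = lowwater[i];	/* keep the oldest one seen so far */
	}
	return result;
}
```

Skipping zero entries matters: without that check, any walsender without a watermark would drag the minimum down to 0 and no slot would ever constrain recycling.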
#2 Jaime Casanova
jaime@2ndquadrant.com
In reply to: Magnus Hagander (#1)
Re: WAL "low watermark" during base backup

On Fri, Sep 2, 2011 at 12:52 PM, Magnus Hagander <magnus@hagander.net> wrote:

> Attached patch implements a "low watermark wal location" in the
> walsender shmem array. Setting this value in a walsender prevents
> transaction log removal prior to this point - similar to how
> wal_keep_segments works, except with an absolute number rather than
> relative.

Cool! Just a question: shouldn't we clear the value after the base
backup has finished?
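The reset Jaime asks about amounts to clearing the watermark on every exit path of the backup: on normal completion and on error (the patch already registers base_backup_cleanup() through PG_ENSURE_ERROR_CLEANUP, which would be one natural place for the error case). A minimal sketch with hypothetical names, where a return code stands in for PostgreSQL's error handling and the slot's spinlock is omitted:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t WalPos;        /* hypothetical flat WAL position; 0 = unset */

/*
 * Hypothetical stand-in for one walsender slot. In PostgreSQL the field
 * is read and written under the slot's spinlock; locking is omitted here.
 */
typedef struct
{
	WalPos		lowwater;
} WalSndSlot;

/* Hypothetical backup body: returns 0 on success, -1 on failure. */
typedef int (*BackupBody) (WalSndSlot *slot);

/*
 * Pin WAL from startptr for the duration of the backup, then clear the
 * watermark whether the body succeeded or failed, so a finished backup
 * no longer holds back WAL recycling.
 */
static int
run_backup_with_lowwater(WalSndSlot *slot, WalPos startptr, BackupBody body)
{
	int			rc;

	slot->lowwater = startptr;
	rc = body(slot);
	slot->lowwater = 0;			/* the reset step, success or failure */
	return rc;
}

/* Example body that fails partway through (illustration only). */
static int
failing_body(WalSndSlot *slot)
{
	/* while the backup runs, the watermark is in place */
	assert(slot->lowwater != 0);
	return -1;
}
```

Calling run_backup_with_lowwater(&slot, 1234, failing_body) returns -1 and leaves slot.lowwater back at 0.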

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training

#3 Magnus Hagander
magnus@hagander.net
In reply to: Jaime Casanova (#2)
Re: WAL "low watermark" during base backup

On Fri, Sep 2, 2011 at 20:12, Jaime Casanova <jaime@2ndquadrant.com> wrote:

> On Fri, Sep 2, 2011 at 12:52 PM, Magnus Hagander <magnus@hagander.net> wrote:

>> Attached patch implements a "low watermark wal location" in the
>> walsender shmem array. Setting this value in a walsender prevents
>> transaction log removal prior to this point - similar to how
>> wal_keep_segments works, except with an absolute number rather than
>> relative.

> Cool! Just a question: shouldn't we clear the value after the base
> backup has finished?

We should. Thanks, will fix!

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#1)
Re: WAL "low watermark" during base backup

Magnus Hagander <magnus@hagander.net> writes:

> Attached patch implements a "low watermark wal location" in the
> walsender shmem array. Setting this value in a walsender prevents
> transaction log removal prior to this point - similar to how
> wal_keep_segments works, except with an absolute number rather than
> relative. For now, this is set when running a base backup with WAL
> included - to prevent the required WAL from being recycled away while the
> backup is running, without having to guesstimate the value for
> wal_keep_segments. (There could be other ways added to set it in the
> future, but that's the only one I've done for now.)

I agree with that parenthetical remark, ie that we'll probably consider
other uses for this in future, so I'd suggest changing this one comment:

> +  * Also check if there is any in-progress base backup that has set
> +  * a low watermark preventing us from removing it.

Just say "if any WAL sender has a low watermark that prevents us from
removing it".

Looks reasonably sane otherwise, modulo Jaime's comment about the
missing reset step.

regards, tom lane

#5 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Magnus Hagander (#1)
Re: WAL "low watermark" during base backup

Magnus Hagander <magnus@hagander.net> writes:

> Attached patch implements a "low watermark wal location" in the
> walsender shmem array. Setting this value in a walsender prevents
> transaction log removal prior to this point - similar to how
> wal_keep_segments works, except with an absolute number rather than

Cool. The first use case that comes to my mind is deciding when to clean
up old WAL files when using multiple standby servers. Will it help here?

> relative. For now, this is set when running a base backup with WAL
> included - to prevent the required WAL from being recycled away while the
> backup is running, without having to guesstimate the value for
> wal_keep_segments.

I would have guessed that if you stream WAL in parallel with the backup,
and begin streaming before you call pg_start_backup(), you don't need
anything more. Is that wrong?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL: Expertise, Training and Support

#6 Simon Riggs
simon@2ndQuadrant.com
In reply to: Magnus Hagander (#1)
Re: WAL "low watermark" during base backup

On Fri, Sep 2, 2011 at 6:52 PM, Magnus Hagander <magnus@hagander.net> wrote:

> Attached patch implements a "low watermark wal location" in the
> walsender shmem array. Setting this value in a walsender prevents
> transaction log removal prior to this point - similar to how
> wal_keep_segments works, except with an absolute number rather than
> relative. For now, this is set when running a base backup with WAL
> included - to prevent the required WAL from being recycled away while the
> backup is running, without having to guesstimate the value for
> wal_keep_segments. (There could be other ways added to set it in the
> future, but that's the only one I've done for now.)

> It obviously needs some documentation updates as well, but I wanted to
> get some comments on the way it's done before I work on those.

I'm not yet fully available for a discussion on this, but I'm not sure I like it.

You don't have to guess the setting of wal_keep_segments; you
calculate it exactly from the size of your WAL disk. No other
calculation is easy or accurate.

This patch implements "fill disk until primary croaks" behaviour, which
means you are making a wild and risky guess as to whether it will
work. If it does not, you are hosed.
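The trade-off Simon describes can be put in numbers. A fixed wal_keep_segments derived from the space reserved for WAL caps retention, whereas the WAL pinned by a watermark grows roughly with the WAL write rate times the backup's duration, with no cap. A minimal sketch, assuming 16 MB segments; the function names and the headroom figure are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)	/* assumed 16 MB */

/*
 * Fixed-budget approach: how many whole segments fit in the space set
 * aside for WAL, after reserving headroom_segments for the segments the
 * server needs anyway (headroom is a hypothetical, user-chosen figure).
 */
static uint64_t
keep_segments_for_budget(uint64_t wal_budget_bytes, uint64_t headroom_segments)
{
	uint64_t	fit = wal_budget_bytes / WAL_SEG_SIZE;

	return fit > headroom_segments ? fit - headroom_segments : 0;
}

/*
 * Watermark approach: WAL pinned by a running backup grows with the
 * write rate and the backup's duration; nothing bounds it.
 */
static uint64_t
segments_pinned_by_backup(uint64_t wal_bytes_per_sec, uint64_t backup_secs)
{
	uint64_t	bytes = wal_bytes_per_sec * backup_secs;

	return (bytes + WAL_SEG_SIZE - 1) / WAL_SEG_SIZE;	/* round up */
}
```

For instance, a 10 GiB WAL budget with 40 segments of headroom gives 600 segments, while a one-hour backup on a server writing 1 MiB/s of WAL pins 225 segments (about 3.5 GiB); the second figure depends entirely on how long the backup runs.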

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#7 Magnus Hagander
magnus@hagander.net
In reply to: Simon Riggs (#6)
Re: WAL "low watermark" during base backup

On Sun, Sep 4, 2011 at 19:02, Simon Riggs <simon@2ndquadrant.com> wrote:

> On Fri, Sep 2, 2011 at 6:52 PM, Magnus Hagander <magnus@hagander.net> wrote:

>> Attached patch implements a "low watermark wal location" in the
>> walsender shmem array. Setting this value in a walsender prevents
>> transaction log removal prior to this point - similar to how
>> wal_keep_segments works, except with an absolute number rather than
>> relative. For now, this is set when running a base backup with WAL
>> included - to prevent the required WAL from being recycled away while the
>> backup is running, without having to guesstimate the value for
>> wal_keep_segments. (There could be other ways added to set it in the
>> future, but that's the only one I've done for now.)

>> It obviously needs some documentation updates as well, but I wanted to
>> get some comments on the way it's done before I work on those.

> I'm not yet fully available for a discussion on this, but I'm not sure I like it.

> You don't have to guess the setting of wal_keep_segments; you
> calculate it exactly from the size of your WAL disk. No other
> calculation is easy or accurate.

Uh, no. What about the (very large number of) cases where pg is just
sitting on one partition, possibly shared with a whole lot of other
services? You'd need to set it to all-of-your-disk, which is something
that will change over time.

Maybe I wasn't entirely clear in the submission, but if it wasn't
obvious: the use case for this is the small and simple installations
that need a simple way of doing a reliable online backup. This is the
"pg_basebackup -x" use case altogether - for example, anybody "bigger"
likely has archive logging set up already, in which case this
functionality is not interesting at all.

> This patch implements "fill disk until primary croaks" behaviour, which
> means you are making a wild and risky guess as to whether it will
> work. If it does not, you are hosed.

Replace "primary" with "server" - remember that this is about backups
and not replication primarily.

That said, you are correct, it does implement that. But then again,
logging into the database and opening a transaction and just leaving
it around for $forever will have similar problems - yet, we allow
users to do that.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#8 Simon Riggs
simon@2ndQuadrant.com
In reply to: Magnus Hagander (#7)
Re: WAL "low watermark" during base backup

On Mon, Sep 5, 2011 at 11:38 AM, Magnus Hagander <magnus@hagander.net> wrote:

> On Sun, Sep 4, 2011 at 19:02, Simon Riggs <simon@2ndquadrant.com> wrote:

>> On Fri, Sep 2, 2011 at 6:52 PM, Magnus Hagander <magnus@hagander.net> wrote:

>>> Attached patch implements a "low watermark wal location" in the
>>> walsender shmem array. Setting this value in a walsender prevents
>>> transaction log removal prior to this point - similar to how
>>> wal_keep_segments works, except with an absolute number rather than
>>> relative. For now, this is set when running a base backup with WAL
>>> included - to prevent the required WAL from being recycled away while the
>>> backup is running, without having to guesstimate the value for
>>> wal_keep_segments. (There could be other ways added to set it in the
>>> future, but that's the only one I've done for now.)

>>> It obviously needs some documentation updates as well, but I wanted to
>>> get some comments on the way it's done before I work on those.

>> I'm not yet fully available for a discussion on this, but I'm not sure I like it.

>> You don't have to guess the setting of wal_keep_segments; you
>> calculate it exactly from the size of your WAL disk. No other
>> calculation is easy or accurate.

> Uh, no. What about the (very large number of) cases where pg is just
> sitting on one partition, possibly shared with a whole lot of other
> services? You'd need to set it to all-of-your-disk, which is something
> that will change over time.

> Maybe I wasn't entirely clear in the submission, but if it wasn't
> obvious: the use case for this is the small and simple installations
> that need a simple way of doing a reliable online backup. This is the
> "pg_basebackup -x" use case altogether - for example, anybody "bigger"
> likely has archive logging set up already, in which case this
> functionality is not interesting at all.

I understand the need for a reliable backup; the problem is they won't get
one like this.

If your disk fills, the backup cannot end correctly, so you must
somehow avoid the disk filling while the backup is taken.

Removing the safety that prevents the disk from filling doesn't
actually prevent it filling.

If you must have this then make pg_basebackup copy xlog files
regularly during the backup. That way your backup can take forever and
your primary disk won't fill up. In many cases it actually will take
forever, but at least we don't take down the primary.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#9 Magnus Hagander
magnus@hagander.net
In reply to: Simon Riggs (#8)
Re: WAL "low watermark" during base backup

On Tue, Sep 6, 2011 at 22:35, Simon Riggs <simon@2ndquadrant.com> wrote:

> On Mon, Sep 5, 2011 at 11:38 AM, Magnus Hagander <magnus@hagander.net> wrote:

>> On Sun, Sep 4, 2011 at 19:02, Simon Riggs <simon@2ndquadrant.com> wrote:

>>> On Fri, Sep 2, 2011 at 6:52 PM, Magnus Hagander <magnus@hagander.net> wrote:

>>>> Attached patch implements a "low watermark wal location" in the
>>>> walsender shmem array. Setting this value in a walsender prevents
>>>> transaction log removal prior to this point - similar to how
>>>> wal_keep_segments works, except with an absolute number rather than
>>>> relative. For now, this is set when running a base backup with WAL
>>>> included - to prevent the required WAL from being recycled away while the
>>>> backup is running, without having to guesstimate the value for
>>>> wal_keep_segments. (There could be other ways added to set it in the
>>>> future, but that's the only one I've done for now.)

>>>> It obviously needs some documentation updates as well, but I wanted to
>>>> get some comments on the way it's done before I work on those.

>>> I'm not yet fully available for a discussion on this, but I'm not sure I like it.

>>> You don't have to guess the setting of wal_keep_segments; you
>>> calculate it exactly from the size of your WAL disk. No other
>>> calculation is easy or accurate.

>> Uh, no. What about the (very large number of) cases where pg is just
>> sitting on one partition, possibly shared with a whole lot of other
>> services? You'd need to set it to all-of-your-disk, which is something
>> that will change over time.

>> Maybe I wasn't entirely clear in the submission, but if it wasn't
>> obvious: the use case for this is the small and simple installations
>> that need a simple way of doing a reliable online backup. This is the
>> "pg_basebackup -x" use case altogether - for example, anybody "bigger"
>> likely has archive logging set up already, in which case this
>> functionality is not interesting at all.

> I understand the need for a reliable backup; the problem is they won't get
> one like this.

> If your disk fills, the backup cannot end correctly, so you must
> somehow avoid the disk filling while the backup is taken.

The same thing will happen if your archive_command stops working - the
disk fills up. There are plenty of scenarios whereby the disk can fill
up.

There are a lot of cases where this really isn't a risk, and I believe
this would be a helpful feature in many of those *simple* cases.

> Removing the safety that prevents the disk from filling doesn't
> actually prevent it filling.

> If you must have this then make pg_basebackup copy xlog files
> regularly during the backup. That way your backup can take forever and
> your primary disk won't fill up. In many cases it actually will take
> forever, but at least we don't take down the primary.

There is a patch to do something like that as well sitting on the CF
page. I don't believe one necessarily excludes the other.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#10 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Magnus Hagander (#9)
Re: WAL "low watermark" during base backup

Magnus Hagander <magnus@hagander.net> writes:

>> If you must have this then make pg_basebackup copy xlog files
>> regularly during the backup. That way your backup can take forever and
>> your primary disk won't fill up. In many cases it actually will take
>> forever, but at least we don't take down the primary.

> There is a patch to do something like that as well sitting on the CF
> page. I don't believe one necessarily excludes the other.

I'm not getting why we need the later one when we have this older one?

--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL: Expertise, Training and Support

#11 Magnus Hagander
magnus@hagander.net
In reply to: Dimitri Fontaine (#10)
Re: WAL "low watermark" during base backup

On Fri, Sep 9, 2011 at 13:40, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:

> Magnus Hagander <magnus@hagander.net> writes:

>>> If you must have this then make pg_basebackup copy xlog files
>>> regularly during the backup. That way your backup can take forever and
>>> your primary disk won't fill up. In many cases it actually will take
>>> forever, but at least we don't take down the primary.

>> There is a patch to do something like that as well sitting on the CF
>> page. I don't believe one necessarily excludes the other.

> I'm not getting why we need the later one when we have this older one?

One of them is for the simple case. It requires a single connection to
the server, and it supports things like writing to tarfiles and
compression.

The other one is more complex. It uses multiple connections (one for
the base, one for the xlog), and as such doesn't support writing to
files, only directories.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#12 Florian Pflug
fgp@phlo.org
In reply to: Magnus Hagander (#11)
Re: WAL "low watermark" during base backup

On Sep 9, 2011, at 13:48, Magnus Hagander wrote:

> On Fri, Sep 9, 2011 at 13:40, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:

>> Magnus Hagander <magnus@hagander.net> writes:

>>>> If you must have this then make pg_basebackup copy xlog files
>>>> regularly during the backup. That way your backup can take forever and
>>>> your primary disk won't fill up. In many cases it actually will take
>>>> forever, but at least we don't take down the primary.

>>> There is a patch to do something like that as well sitting on the CF
>>> page. I don't believe one necessarily excludes the other.

>> I'm not getting why we need the later one when we have this older one?

> One of them is for the simple case. It requires a single connection to
> the server, and it supports things like writing to tarfiles and
> compression.

> The other one is more complex. It uses multiple connections (one for
> the base, one for the xlog), and as such doesn't support writing to
> files, only directories.

I guess the real question is: why can't we stream the WAL as it is
generated, instead of at the end, even over a single connection and when
writing tarfiles?

Couldn't we send all available WAL after each single data-file instead
of waiting for all data files to be transferred before sending WAL?
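Florian's suggestion can be sketched as a planning function for a single stream: after each data file, queue every WAL segment the server has finished writing, and record how far a watermark could then be advanced. This is a minimal sketch with hypothetical names (plan_interleaved_stream, StreamItem) and an assumed input, cur_seg_after[i], giving the segment being written once file i has been sent; it is not how pg_basebackup is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t SegNo;         /* hypothetical flat WAL segment number */

typedef enum
{
	ITEM_DATA_FILE,
	ITEM_WAL_SEGMENT
} ItemKind;

typedef struct
{
	ItemKind	kind;
	uint64_t	id;             /* data-file index or WAL segment number */
} StreamItem;

/*
 * Plan one stream for a backup of nfiles data files. cur_seg_after[i] is
 * the segment the server is writing once file i has been sent (assumed
 * non-decreasing and >= start_seg). After each file, every segment
 * finished so far is queued, and lowwater_seg[i] records how far the
 * watermark could then be advanced. out needs room for
 * nfiles + (cur_seg_after[nfiles - 1] - start_seg) items. WAL from the
 * last lowwater_seg up to the backup's end location would still have to
 * be sent when the backup stops; that is not shown.
 * Returns the number of items written to out.
 */
static size_t
plan_interleaved_stream(const SegNo *cur_seg_after, size_t nfiles,
						SegNo start_seg, StreamItem *out, SegNo *lowwater_seg)
{
	size_t		n = 0;
	SegNo		next_seg = start_seg;	/* oldest segment not yet queued */

	for (size_t i = 0; i < nfiles; i++)
	{
		out[n++] = (StreamItem) {ITEM_DATA_FILE, i};

		/* queue every segment the server has finished writing */
		while (next_seg < cur_seg_after[i])
			out[n++] = (StreamItem) {ITEM_WAL_SEGMENT, next_seg++};

		/* the client now holds everything before next_seg */
		lowwater_seg[i] = next_seg;
	}
	return n;
}
```

With three files, starting at segment 10, and the server moving on to segment 12 while the last file is sent, the plan is file 0, file 1, file 2, segment 10, segment 11, and the watermark could move from 10 to 12. Advancing it this way would keep the WAL the server has to retain closer to what is generated while a single file is sent, which bears on the disk-filling concern raised earlier in the thread.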

best regards,
Florian Pflug

#13 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Florian Pflug (#12)
Re: WAL "low watermark" during base backup

Florian Pflug <fgp@phlo.org> writes:

> Couldn't we send all available WAL after each single data-file instead
> of waiting for all data files to be transferred before sending WAL?

+1 (or maybe not at the file boundary but rather driven by archive
command with some internal hooking, as the backend needs some new
provisions here anyway).

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL: Expertise, Training and Support

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#11)
Re: WAL "low watermark" during base backup

Magnus Hagander <magnus@hagander.net> writes:

> On Fri, Sep 9, 2011 at 13:40, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:

>> I'm not getting why we need the later one when we have this older one?

> One of them is for the simple case. It requires a single connection to
> the server, and it supports things like writing to tarfiles and
> compression.

> The other one is more complex. It uses multiple connections (one for
> the base, one for the xlog), and as such doesn't support writing to
> files, only directories.

I'm with Dimitri on this one: let's not invent two different ways to do
the same thing. Let's pick the better one, or meld them somehow, so
we only have one implementation to support going forward.

regards, tom lane