commit dfda6ebaec67 versus wal_keep_segments

Started by Jeff Janesalmost 13 years ago5 messages

jeff.janes@gmail.com

almost 13 years ago

This commit introduced a problem with wal_keep_segments:

commit dfda6ebaec6763090fb78b458a979b558c50b39b
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun Jun 24 18:06:38 2012 +0300

Don't waste the last segment of each 4GB logical log file.

in a side window do: watch "ls -lrt /tmp/data/pg_xlog"

dfda6ebaec/bin/initdb -D /tmp/data
dfda6ebaec/bin/pg_ctl -D /tmp/data -l logfile restart -o "--fsync=off
--wal_keep_segments=20"
createdb
pgbench -i -s10
pgbench -T3600

xlogs accumulate until there are about 26 of them. Then all of a sudden
they drop down to 11 of them. Then it builds back up to around 26, and
seems to stay there permanently.

At some point when it is over-pruning and recycling, it recyles the log
files that are still needed for recovery, and if the database crashes at
that point it will not recover because it can't find either the primary
secondary checkpoint records.

Cheers,

Jeff

Jeff Janes

jeff.janes@gmail.com

almost 13 years ago

In reply to: Jeff Janes (#1)

Re: commit dfda6ebaec67 versus wal_keep_segments

On Tue, Apr 2, 2013 at 10:08 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

This commit introduced a problem with wal_keep_segments:

commit dfda6ebaec6763090fb78b458a979b558c50b39b

The problem seems to be that the underflow warned about is happening,
because the check to guard it was checking the wrong thing. However, I
don't really understand KeepLogSeg. It seems like segno, and hence recptr,
don't actually serve any purpose.

Cheers,

Jeff

diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 2f9209f..3643be8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8264,7 +8264,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
        XLByteToSeg(recptr, segno);

        /* avoid underflow, don't go below 1 */
-       if (segno <= wal_keep_segments)
+       if (*logSegNo <= wal_keep_segments)
                segno = 1;
        else
                segno = *logSegNo - wal_keep_segments;

Heikki Linnakangas

hlinnakangas@vmware.com

almost 13 years ago

In reply to: Jeff Janes (#2)

1 attachment(s)

Re: commit dfda6ebaec67 versus wal_keep_segments

On 03.04.2013 18:58, Jeff Janes wrote:

On Tue, Apr 2, 2013 at 10:08 PM, Jeff Janes<jeff.janes@gmail.com> wrote:

This commit introduced a problem with wal_keep_segments:

commit dfda6ebaec6763090fb78b458a979b558c50b39b

The problem seems to be that the underflow warned about is happening,
because the check to guard it was checking the wrong thing. However, I
don't really understand KeepLogSeg. It seems like segno, and hence recptr,
don't actually serve any purpose.

Hmm, the check is actually correct, but the assignment in the
else-branch isn't. The idea of KeepLogSeg is to calculate recptr -
wal_keep_segments, and assign that to *logSegNo. But only if *logSegNo
is not already < than the calculated value. Does the attached look
correct to you?

At some point when it is over-pruning and recycling, it recyles the log
files that are still needed for recovery, and if the database crashes at
that point it will not recover because it can't find either the primary
secondary checkpoint records.

So, KeepLogSeg incorrectly sets *logSegNo to 0, and CreateCheckPoint
decrements it, causing it to underflow to 2^64-1. Now RemoveOldXlogFiles
feels free to remove every WAL segment.

- Heikki

Attachments:

fix-KeepLogSeg-1.patchtext/x-diff; name=fix-KeepLogSeg-1.patchDownload

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3227d4c..2f79af6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7518,9 +7518,9 @@ CreateRestartPoint(int flags)
 }
 
 /*
- * Calculate the last segment that we need to retain because of
- * wal_keep_segments, by subtracting wal_keep_segments from
- * the given xlog location, recptr.
+ * Retreat *logSegNo to the last segment that we need to retain because of
+ * wal_keep_segments. This is calculated by by subtracting wal_keep_segments
+ * from the given xlog location, recptr.
  */
 static void
 KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
@@ -7536,7 +7536,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 	if (segno <= wal_keep_segments)
 		segno = 1;
 	else
-		segno = *logSegNo - wal_keep_segments;
+		segno = segno - wal_keep_segments;
 
 	/* don't delete WAL segments newer than the calculated segment */
 	if (segno < *logSegNo)

Jeff Janes

jeff.janes@gmail.com

almost 13 years ago

In reply to: Heikki Linnakangas (#3)

Re: commit dfda6ebaec67 versus wal_keep_segments

On Wed, Apr 3, 2013 at 11:14 AM, Heikki Linnakangas <hlinnakangas@vmware.com

wrote:

On 03.04.2013 18:58, Jeff Janes wrote:

On Tue, Apr 2, 2013 at 10:08 PM, Jeff Janes<jeff.janes@gmail.com> wrote:

This commit introduced a problem with wal_keep_segments:

commit dfda6ebaec6763090fb78b458a979b**558c50b39b

The problem seems to be that the underflow warned about is happening,
because the check to guard it was checking the wrong thing. However, I
don't really understand KeepLogSeg. It seems like segno, and hence
recptr,
don't actually serve any purpose.

Hmm, the check is actually correct, but the assignment in the else-branch
isn't. The idea of KeepLogSeg is to calculate recptr - wal_keep_segments,
and assign that to *logSegNo. But only if *logSegNo is not already < than
the calculated value. Does the attached look correct to you?

Let me describe what I think is going on. My description is "On start,
recptr is the redo location of the just-completed checkpoint, and logSegNo
is the redo location segment of the checkpoint before that one. We want to
keep the previous-checkpoint redo location, and we also want to keep
wal_keep_segments before the current-checkpoint redo location, so we take
whichever is earlier."

If my understanding is now correct, then I think your patch looks correct.
(Also, applying it fixed the problem I was having.)

Why do we keep wal_keep_segments before the just-finished checkpoint,
rather than keeping that many before the previous checkpoint? I seems like
it would be more intuitive (to the DBA) for that parameter to mean "keep
this many more segments than you otherwise would". I'm not proposing we
change it, I'm just curious about why it is done that way.

Thanks,

Jeff

Heikki Linnakangas

hlinnakangas@vmware.com

almost 13 years ago

In reply to: Jeff Janes (#4)

Re: commit dfda6ebaec67 versus wal_keep_segments

On 03.04.2013 22:50, Jeff Janes wrote:

On Wed, Apr 3, 2013 at 11:14 AM, Heikki Linnakangas<hlinnakangas@vmware.com

wrote:

On 03.04.2013 18:58, Jeff Janes wrote:

On Tue, Apr 2, 2013 at 10:08 PM, Jeff Janes<jeff.janes@gmail.com> wrote:

This commit introduced a problem with wal_keep_segments:

commit dfda6ebaec6763090fb78b458a979b**558c50b39b

The problem seems to be that the underflow warned about is happening,
because the check to guard it was checking the wrong thing. However, I
don't really understand KeepLogSeg. It seems like segno, and hence
recptr,
don't actually serve any purpose.

Hmm, the check is actually correct, but the assignment in the else-branch
isn't. The idea of KeepLogSeg is to calculate recptr - wal_keep_segments,
and assign that to *logSegNo. But only if *logSegNo is not already< than
the calculated value. Does the attached look correct to you?

Let me describe what I think is going on. My description is "On start,
recptr is the redo location of the just-completed checkpoint, and logSegNo
is the redo location segment of the checkpoint before that one. We want to
keep the previous-checkpoint redo location, and we also want to keep
wal_keep_segments before the current-checkpoint redo location, so we take
whichever is earlier."

If my understanding is now correct, then I think your patch looks correct.
(Also, applying it fixed the problem I was having.)

Ok, thanks, applied.

Why do we keep wal_keep_segments before the just-finished checkpoint,
rather than keeping that many before the previous checkpoint? I seems like
it would be more intuitive (to the DBA) for that parameter to mean "keep
this many more segments than you otherwise would". I'm not proposing we
change it, I'm just curious about why it is done that way.

It feels more intuitive to me the way it is. wal_keep_log_segments means
"make sure there are always this many old WAL segments available in the
server, regardless of any other settings". If you have a standby, it
means that you don't need a new base backup as long as you don't fall
behind the master by more than wal_keep_segments segments.

On 03.04.2013 21:33, Alvaro Herrera wrote:

"by by"

Fixed, thanks.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers