BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

Started by Laurence Roweover 10 years ago87 messageshackersbugs
Jump to latest
#1Laurence Rowe
l@lrowe.co.uk
hackersbugs

The following bug has been logged on the website:

Bug reference: 13685
Logged by: Laurence Rwoe
Email address: l@lrowe.co.uk
PostgreSQL version: 9.4.5
Operating system: Mac OS X 10.10
Description:

I'm seeing Postgres 9.4.5 archive while idle every archive_timeout when I
set ``wal_level hot_standby``:

$ initdb testpg
$ cat << 'EOF' >> testpg/postgresql.conf

wal_level = hot_standby
archive_mode = on
archive_timeout = 10
checkpoint_timeout = 1h
archive_command = 'echo $(date) archive %p'
log_checkpoints = on
EOF

$ postgres -D testpg
LOG: database system was shut down at 2015-10-13 11:58:45 PDT
LOG: MultiXact member wraparound protections are now enabled
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
Tue Oct 13 12:00:47 PDT 2015 archive pg_xlog/000000010000000000000001
Tue Oct 13 12:00:57 PDT 2015 archive pg_xlog/000000010000000000000002
LOG: checkpoint starting: xlog
Tue Oct 13 12:01:07 PDT 2015 archive pg_xlog/000000010000000000000003
LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s)
added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.026 s;
sync files=0, longest=0.000 s, average=0.000 s
Tue Oct 13 12:01:17 PDT 2015 archive pg_xlog/000000010000000000000004
Tue Oct 13 12:01:27 PDT 2015 archive pg_xlog/000000010000000000000005
LOG: checkpoint starting: xlog
Tue Oct 13 12:01:38 PDT 2015 archive pg_xlog/000000010000000000000006
LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s)
added, 0 removed, 3 recycled; write=0.000 s, sync=0.000 s, total=0.027 s;
sync files=0, longest=0.000 s, average=0.000 s
Tue Oct 13 12:01:48 PDT 2015 archive pg_xlog/000000010000000000000007
Tue Oct 13 12:01:58 PDT 2015 archive pg_xlog/000000010000000000000008
LOG: checkpoint starting: xlog
LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s)
added, 0 removed, 3 recycled; write=0.000 s, sync=0.000 s, total=0.001 s;
sync files=0, longest=0.000 s, average=0.000 s

At ``wal_level archive`` I only see archiving every checkpoint_timeout (that
it archives every checkpoint_timeout is a known limitation, see
/messages/by-id/1407389876762-5813999.post@n5.nabble.com):

$ initdb testpg
$ cat << 'EOF' >> testpg/postgresql.conf

wal_level = archive
archive_mode = on
archive_timeout = 10
checkpoint_timeout = 60
archive_command = 'echo $(date) archive %p'
log_checkpoints = on
EOF

# lrowe@Laurences-iMac ~/scratch
$ postgres -D testpg
LOG: database system was shut down at 2015-10-13 12:25:38 PDT
LOG: MultiXact member wraparound protections are now enabled
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
Tue Oct 13 12:25:49 PDT 2015 archive pg_xlog/000000010000000000000001
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s)
added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.029 s;
sync files=0, longest=0.000 s, average=0.000 s
Tue Oct 13 12:26:39 PDT 2015 archive pg_xlog/000000010000000000000002
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s)
added, 0 removed, 1 recycled; write=0.000 s, sync=0.000 s, total=0.030 s;
sync files=0, longest=0.000 s, average=0.000 s
Tue Oct 13 12:27:39 PDT 2015 archive pg_xlog/000000010000000000000003

I think this additional archiving at wal_level hot_standby is a bug.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Michael Paquier
michael@paquier.xyz
In reply to: Laurence Rowe (#1)
hackersbugs
Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Sat, Oct 17, 2015 at 5:30 AM, <l@lrowe.co.uk> wrote:

I'm seeing Postgres 9.4.5 archive while idle every archive_timeout when I
set ``wal_level hot_standby``.
At ``wal_level archive`` I only see archiving every checkpoint_timeout

(that

it archives every checkpoint_timeout is a known limitation, see

/messages/by-id/1407389876762-5813999.post@n5.nabble.com
):

I think this additional archiving at wal_level hot_standby is a bug.

Agreed. There is indeed a difference between the way 9.3 and 9.4 behave.
When wal_level = hot_standby, with 9.4 a segment will be archived depending
on archive_timeout as you mention, and that's not the case of 9.3. There is
definitely a regression here: we should not archive a segment if there is
no activity.

If I look at the contents of the segments with 9.4 when there is no
activity, I am seeing that actually a record XLOG_RUNNING_XACTS is
generated all the time after switching a segment, leading to the archiving
of this segment because server thinks that there is new data, and actually
there is, so the segment will be archived... Here is for example the output
of pg_xlogdump in this case:
$ pg_xlogdump 000000010000000000000018
rmgr: Standby len (rec/tot): 24/ 56, tx: 0, lsn:
0/18000028, prev 0/17000060, bkp: 0000, desc: running xacts: nextXid 1001
latestCompletedXid 1000 oldestRunningXid 1001
rmgr: XLOG len (rec/tot): 0/ 32, tx: 0, lsn:
0/18000060, prev 0/18000028, bkp: 0000, desc: xlog switch
[end of records for this segment]

A little bit of debugging is directing me to the bgwriter process,
LogStandbySnapshot() being called by BackgroundWriterMain@bgwriter.c,
generating those WAL records even if a system is idle. I am adding Robert
and Andres in CC, as this is caused by commit ed46758 which is a new thing
of 9.4.

I think that a simple idea would be to not call LogStandbySnapshot() when
we are still at the beginning of a new segment. We know that the first page
of a WAL segment has a size of SizeOfXLogLongPHD, so just having a check on
that sounds enough to me. Per se the patch attached that should be applied
down to 9.4. This fixes the regression reported by Laurence for me.
Regards,
--
Michael

Attachments:

20151017_archive_idle.patchapplication/x-patch; name=20151017_archive_idle.patchDownload+11-6
#3Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#2)
hackersbugs
Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Sat, Oct 17, 2015 at 11:10 PM, Michael Paquier
<michael.paquier@gmail.com>wrote:

I think that a simple idea would be to not call LogStandbySnapshot() when
we are still at the beginning of a new segment. We know that the first page
of a WAL segment has a size of SizeOfXLogLongPHD, so just having a check on
that sounds enough to me. Per se the patch attached that should be applied
down to 9.4. This fixes the regression reported by Laurence for me.

This bug fix is registered in next CF so as we do not lose track of it;
https://commitfest.postgresql.org/7/398/
--
Michael

#4Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#3)
hackersbugs
Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Tue, Oct 20, 2015 at 2:21 AM, Michael Paquier wrote:

On Sat, Oct 17, 2015 at 11:10 PM, Michael Paquier wrote:

I think that a simple idea would be to not call LogStandbySnapshot() when
we are still at the beginning of a new segment. We know that the first page
of a WAL segment has a size of SizeOfXLogLongPHD, so just having a check on
that sounds enough to me. Per se the patch attached that should be applied
down to 9.4. This fixes the regression reported by Laurence for me.

This bug fix is registered in next CF so as we do not lose track of it;
https://commitfest.postgresql.org/7/398/

Andres has mentioned me during Postgres Europe that instead of looking
at if the last inserted record was at the beginning of a new segment,
we had better check if some progress has been done since the last
checkpoint that happened. Attached is a patch adding some logic to
track the LSN position of the last checkpoint record created, and log
the standby activity only if some progress has been done since. It
seems that we had better be sure that the last checkpoint record is
neither the last inserted record, nor the one previously inserted
before logging the standby activity, as a XLOG_STANDBY_LOCK record may
be added in between.
Regards,
--
Michael

Attachments:

20151102_archive_idle_v2.patchtext/x-patch; charset=US-ASCII; name=20151102_archive_idle_v2.patchDownload+55-7
#5Jeff Janes
jeff.janes@gmail.com
In reply to: Michael Paquier (#4)
hackersbugs
Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Simple patch, applies and makes cleanly, does what it says and says what it does.

If a transaction holding locks aborts on an otherwise idle server, perhaps it will take a very long time for a log-shipping standby to realize this. But I have hard time believing that anyone who cares about that would be using log-shipping (rather than streaming) anyway.

Marking it ready for committer.

The new status of this patch is: Ready for Committer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Michael Paquier
michael@paquier.xyz
In reply to: Jeff Janes (#5)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Mon, Nov 2, 2015 at 2:58 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Simple patch, applies and makes cleanly, does what it says and says what it does.

If a transaction holding locks aborts on an otherwise idle server, perhaps it will take a very long time for a log-shipping standby to realize this. But I have hard time believing that anyone who cares about that would be using log-shipping (rather than streaming) anyway.

Marking it ready for committer.

The new status of this patch is: Ready for Committer

Thanks! That was deadly fast.

Just wondering: shouldn't we keep the discussion around this patch on
-bugs instead? Not saying you are wrong, Jeff, I am just not sure what
would be the best practice regarding patches related to bugs. I would
think that it is at least necessary to keep the person who reported
the bug in CC to let him know the progress though.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Jeff Janes
jeff.janes@gmail.com
In reply to: Michael Paquier (#6)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Sun, Nov 1, 2015 at 11:09 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Mon, Nov 2, 2015 at 2:58 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Simple patch, applies and makes cleanly, does what it says and says what it does.

If a transaction holding locks aborts on an otherwise idle server, perhaps it will take a very long time for a log-shipping standby to realize this. But I have hard time believing that anyone who cares about that would be using log-shipping (rather than streaming) anyway.

Marking it ready for committer.

The new status of this patch is: Ready for Committer

Thanks! That was deadly fast.

Just wondering: shouldn't we keep the discussion around this patch on
-bugs instead? Not saying you are wrong, Jeff, I am just not sure what
would be the best practice regarding patches related to bugs. I would
think that it is at least necessary to keep the person who reported
the bug in CC to let him know the progress though.

I wasn't sure about -bugs vs -hackers either, but in this case I used
the review form built into the commit-fest app, and the app is what
sent the email. As far as I know I can't change its destination or
its CC list, even if I had thought ahead to do so.

I think the bug reporter should certainly be CCed when the bug is
closed, I don't know about intermediate steps in the "sausage making"
process. Something to think about for a bug-tracker we might
implement in the future. I think most bugs are summarily handled by
committers, so don't go through the commitfest process at all.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#5)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Mon, Nov 2, 2015 at 12:58 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

If a transaction holding locks aborts on an otherwise idle server, perhaps it will take a very long time for a log-shipping standby to realize this. But I have hard time believing that anyone who cares about that would be using log-shipping (rather than streaming) anyway.

I'm sure other people here understand this better than me, but I
wonder if it wouldn't make more sense to somehow log this data only if
something material has changed in the data being logged. This seems
to be trying to log something only if something else has been written
to WAL, which I'm not sure is the right test.

Also, this check here:

+                               if (last_snapshot_lsn != insert_lsn &&
+                                       checkpoint_lsn != insert_lsn &&
+                                       checkpoint_lsn != previous_lsn)

...seems like it will fire if there have been 0 or 1 WAL records since
the last checkpoint, regardless of what they are. I'm not sure that's
the right test, and it'll break again the minute we have a third thing
we want to log only if the system is non-idle.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#8)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On 2015-11-03 10:23:35 -0500, Robert Haas wrote:

On Mon, Nov 2, 2015 at 12:58 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

If a transaction holding locks aborts on an otherwise idle server, perhaps it will take a very long time for a log-shipping standby to realize this. But I have hard time believing that anyone who cares about that would be using log-shipping (rather than streaming) anyway.

I'm sure other people here understand this better than me, but I
wonder if it wouldn't make more sense to somehow log this data only if
something material has changed in the data being logged.

Phew. That doesn't seem easy to measure. I'm doubtful that it's worth
comparing the snapshot and such, especially in the back
branches.

We could maybe add something that we only log a snapshot if XXX
megabytes have been logged or something. But I don't know which number
to pick here - and if there's other write activity the price of a
snapshot record really isn't high.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#9)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Wed, Nov 4, 2015 at 12:43 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-11-03 10:23:35 -0500, Robert Haas wrote:

On Mon, Nov 2, 2015 at 12:58 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

If a transaction holding locks aborts on an otherwise idle server, perhaps it will take a very long time for a log-shipping standby to realize this. But I have hard time believing that anyone who cares about that would be using log-shipping (rather than streaming) anyway.

I'm sure other people here understand this better than me, but I
wonder if it wouldn't make more sense to somehow log this data only if
something material has changed in the data being logged.

Phew. That doesn't seem easy to measure. I'm doubtful that it's worth
comparing the snapshot and such, especially in the back
branches.

Well, I guess that's why I thought it would be more simple to check if
we are at the beginning of a segment at first sight. This has no
chance to break if anything else like that is being added in the
future as it doesn't depend on the record types, though new similar
records added on a timely manner would need a similar check. Perhaps
this could be coupled by a check on the last XLOG_SWITCH_XLOG record
instead of checkpoint activity though.

We could maybe add something that we only log a snapshot if XXX
megabytes have been logged or something. But I don't know which number
to pick here - and if there's other write activity the price of a
snapshot record really isn't high.

On a completely idle system, I don't think we should log any standby
records. This is what ~9.3 does.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#10)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On November 4, 2015 12:37:02 AM GMT+01:00, Michael Paquier <michael.paquier@gmail.com> wrote:

On Wed, Nov 4, 2015 at 12:43 AM, Andres Freund <andres@anarazel.de>
wrote:

On 2015-11-03 10:23:35 -0500, Robert Haas wrote:

On Mon, Nov 2, 2015 at 12:58 AM, Jeff Janes <jeff.janes@gmail.com>

wrote:

If a transaction holding locks aborts on an otherwise idle server,

perhaps it will take a very long time for a log-shipping standby to
realize this. But I have hard time believing that anyone who cares
about that would be using log-shipping (rather than streaming) anyway.

I'm sure other people here understand this better than me, but I
wonder if it wouldn't make more sense to somehow log this data only

if

something material has changed in the data being logged.

Phew. That doesn't seem easy to measure. I'm doubtful that it's worth
comparing the snapshot and such, especially in the back
branches.

Well, I guess that's why I thought it would be more simple to check if
we are at the beginning of a segment at first sight. This has no
chance to break if anything else like that is being added in the
future as it doesn't depend on the record types, though new similar
records added on a timely manner would need a similar check. Perhaps
this could be coupled by a check on the last XLOG_SWITCH_XLOG record
instead of checkpoint activity though.

We could maybe add something that we only log a snapshot if XXX
megabytes have been logged or something. But I don't know which

number

to pick here - and if there's other write activity the price of a
snapshot record really isn't high.

On a completely idle system, I don't think we should log any standby
records. This is what ~9.3 does.

Are you sure? I think it'll around checkpoints, no? I thought Heikki had fixed that, but looking sound that doesn't seem to be the case.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#11)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Wed, Nov 4, 2015 at 8:39 AM, Andres Freund <andres@anarazel.de> wrote:

On November 4, 2015 12:37:02 AM GMT+01:00, Michael Paquier wrote:

On a completely idle system, I don't think we should log any standby
records. This is what ~9.3 does.

Are you sure? I think it'll around checkpoints, no? I thought Heikki had fixed that, but looking sound that doesn't seem to be the case.

Er, yes, sorry. I should have used clearer words: I meant idle system
with something running nothing including internal checkpoints. An
instance indeed generates a XLOG_RUNNING_XACTS record before a
checkpoint record on an idle system.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#9)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Wed, Nov 4, 2015 at 12:43 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-11-03 10:23:35 -0500, Robert Haas wrote:

On Mon, Nov 2, 2015 at 12:58 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

If a transaction holding locks aborts on an otherwise idle server, perhaps it will take a very long time for a log-shipping standby to realize this. But I have hard time believing that anyone who cares about that would be using log-shipping (rather than streaming) anyway.

I'm sure other people here understand this better than me, but I
wonder if it wouldn't make more sense to somehow log this data only if
something material has changed in the data being logged.

Phew. That doesn't seem easy to measure. I'm doubtful that it's worth
comparing the snapshot and such, especially in the back
branches.

We could maybe add something that we only log a snapshot if XXX
megabytes have been logged or something. But I don't know which number
to pick here - and if there's other write activity the price of a
snapshot record really isn't high.

My first guess on the matter is that we would like to have an extra
condition that depends on max_wal_size with at least a minimum number
of segments generated since the last standby snapshot, perhaps
max_wal_size / 16, but this coefficient is clearly a rule of thumb.
With the default configuration of 1GB, that would be waiting for 4
segments to be generated before logging in a standby snapshot.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#12)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On 2015-11-04 16:01:28 +0900, Michael Paquier wrote:

On Wed, Nov 4, 2015 at 8:39 AM, Andres Freund <andres@anarazel.de> wrote:

On November 4, 2015 12:37:02 AM GMT+01:00, Michael Paquier wrote:

On a completely idle system, I don't think we should log any standby
records. This is what ~9.3 does.

Are you sure? I think it'll around checkpoints, no? I thought Heikki had fixed that, but looking sound that doesn't seem to be the case.

Er, yes, sorry. I should have used clearer words: I meant idle system
with something running nothing including internal checkpoints.

Uh, but you'll always have checkpoints happen on wal_level =
hot_standby, even in 9.3? Maybe I'm not parsing your sentence right.

As soon as a single checkpoint ever happened the early-return logic in
CreateCheckPoint() will fail to take the LogStandbySnapshot() in
CreateCheckPoint() into account. The test is:
if (curInsert == ControlFile->checkPoint +
MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
ControlFile->checkPoint == ControlFile->checkPointCopy.redo)
which obviously doesn't work if there's been a WAL record logged after
the redo pointer has been determined etc.

The reason that a single checkpoint is needed to "jumpstart" the
pointless checkpoints is that otherwise we'll never have issued a
LogStandbySnapshot() and thus the above code block works if we started
from a proper shutdown checkpoint.

Independent of the idle issue, it seems to me that the location of the
LogStandbySnapshot() is actually rather suboptimal - it really should
really be before the CheckPointGuts(), not afterwards. As closer it's to
the redo pointer of the checkpoint a hot standby node starts up from,
the sooner that node can reach consistency. There's no difference for
the first time a node starts from a basebackup (since we gotta replay
that checkpoint anyway before we're consistent), but if we start from a
restartpoint...

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#14)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Wed, Nov 4, 2015 at 7:33 PM, Andres Freund <andres@anarazel.de> wrote:

On 2015-11-04 16:01:28 +0900, Michael Paquier wrote:

On Wed, Nov 4, 2015 at 8:39 AM, Andres Freund <andres@anarazel.de> wrote:

On November 4, 2015 12:37:02 AM GMT+01:00, Michael Paquier wrote:

On a completely idle system, I don't think we should log any standby
records. This is what ~9.3 does.

Are you sure? I think it'll around checkpoints, no? I thought Heikki had fixed that, but looking sound that doesn't seem to be the case.

Er, yes, sorry. I should have used clearer words: I meant idle system
with something running nothing including internal checkpoints.

Uh, but you'll always have checkpoints happen on wal_level =
hot_standby, even in 9.3? Maybe I'm not parsing your sentence right.

Reading again my previous sentence I cannot get the meaning of it
myself :) Well, I just meant that in ~9.3 LogStandbySnapshot() is
called at each checkpoint, checkpoints occurring after
checkpoint_timeout even if the system is idle.

As soon as a single checkpoint ever happened the early-return logic in
CreateCheckPoint() will fail to take the LogStandbySnapshot() in
CreateCheckPoint() into account. The test is:
if (curInsert == ControlFile->checkPoint +
MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
ControlFile->checkPoint == ControlFile->checkPointCopy.redo)
which obviously doesn't work if there's been a WAL record logged after
the redo pointer has been determined etc.

Yes. If segment switches are enforced at a pace faster than
checkpoint_timeout, this check considers that a checkpoint needs to
happen because a SWITCH_XLOG record is in-between. I am a bit
surprised that this should happen actually. The segment switch
triggers a checkpoint record, and vice-versa, even for idle systems.
Shouldn't we make this check a bit smarter then?

The reason that a single checkpoint is needed to "jumpstart" the
pointless checkpoints is that otherwise we'll never have issued a
LogStandbySnapshot() and thus the above code block works if we started
from a proper shutdown checkpoint.

Independent of the idle issue, it seems to me that the location of the
LogStandbySnapshot() is actually rather suboptimal - it really should
really be before the CheckPointGuts(), not afterwards. As closer it's to
the redo pointer of the checkpoint a hot standby node starts up from,
the sooner that node can reach consistency. There's no difference for
the first time a node starts from a basebackup (since we gotta replay
that checkpoint anyway before we're consistent), but if we start from a
restartpoint...

Agreed. LogStandbySnapshot() is called after CheckPointGuts() since
its introduction in efc16ea5. This may save time. This would surely be
a master-only optimization though.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#15)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Thu, Nov 5, 2015 at 3:00 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Nov 4, 2015 at 7:33 PM, Andres Freund wrote:

As soon as a single checkpoint ever happened the early-return logic in
CreateCheckPoint() will fail to take the LogStandbySnapshot() in
CreateCheckPoint() into account. The test is:
if (curInsert == ControlFile->checkPoint +
MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
ControlFile->checkPoint == ControlFile->checkPointCopy.redo)
which obviously doesn't work if there's been a WAL record logged after
the redo pointer has been determined etc.

Yes. If segment switches are enforced at a pace faster than
checkpoint_timeout, this check considers that a checkpoint needs to
happen because a SWITCH_XLOG record is in-between. I am a bit
surprised that this should happen actually. The segment switch
triggers a checkpoint record, and vice-versa, even for idle systems.
Shouldn't we make this check a bit smarter then?

Ah, the documentation clearly explains that setting archive_timeout
will enforce a segment switch if any activity occurred, including a
checkpoint:
http://www.postgresql.org/docs/devel/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVING
I missed that, sorry.

I have as well thought a bit about adding a space-related constraint
on the standby snapshot generated by the bgwriter, so as to not rely
entirely on the interval of 15s. I finished with the attached that
uses a check based on CheckPointSegments / 8 to be sure that at least
this number of segments has been generated since the last checkpoint
before logging a new snapshot. I guess that's less brittle than the
last patch. Thoughts?
--
Michael

Attachments:

20151106_archive_idle_v3.patchtext/x-patch; charset=US-ASCII; name=20151106_archive_idle_v3.patchDownload+35-4
#17Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#16)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Fri, Nov 6, 2015 at 2:47 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I have as well thought a bit about adding a space-related constraint
on the standby snapshot generated by the bgwriter, so as to not rely
entirely on the interval of 15s. I finished with the attached that
uses a check based on CheckPointSegments / 8 to be sure that at least
this number of segments has been generated since the last checkpoint
before logging a new snapshot. I guess that's less brittle than the
last patch. Thoughts?

I can't see why that would be a good idea. My understanding is that
the logical decoding code needs to get those messages pretty
regularly, and I don't see why that need would be reduced on systems
where CheckPointSegments is large.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#17)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On 2015-11-06 11:42:56 -0500, Robert Haas wrote:

On Fri, Nov 6, 2015 at 2:47 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I have as well thought a bit about adding a space-related constraint
on the standby snapshot generated by the bgwriter, so as to not rely
entirely on the interval of 15s. I finished with the attached that
uses a check based on CheckPointSegments / 8 to be sure that at least
this number of segments has been generated since the last checkpoint
before logging a new snapshot. I guess that's less brittle than the
last patch. Thoughts?

I can't see why that would be a good idea. My understanding is that
the logical decoding code needs to get those messages pretty
regularly, and I don't see why that need would be reduced on systems
where CheckPointSegments is large.

Precisely.

What I'm thinking of right now is a marker somewhere in shared memory,
that tells whether anything worthwhile has happened since the last
determination of the redo pointer. Where standby snapshots don't
count. That seems like it'd be to maintain going forward than doing
precise size calculations like CreateCheckPoint() already does, and
would additionally need to handle its own standby snapshot, not to speak
of the background ones.

Seems like it'd be doable in ReserveXLogInsertLocation().

Whether it's actually worthwhile I'm not all that sure tho.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#18)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On Fri, Nov 6, 2015 at 11:52 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-11-06 11:42:56 -0500, Robert Haas wrote:

On Fri, Nov 6, 2015 at 2:47 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I have as well thought a bit about adding a space-related constraint
on the standby snapshot generated by the bgwriter, so as to not rely
entirely on the interval of 15s. I finished with the attached that
uses a check based on CheckPointSegments / 8 to be sure that at least
this number of segments has been generated since the last checkpoint
before logging a new snapshot. I guess that's less brittle than the
last patch. Thoughts?

I can't see why that would be a good idea. My understanding is that
the logical decoding code needs to get those messages pretty
regularly, and I don't see why that need would be reduced on systems
where CheckPointSegments is large.

Precisely.

What I'm thinking of right now is a marker somewhere in shared memory,
that tells whether anything worthwhile has happened since the last
determination of the redo pointer. Where standby snapshots don't
count. That seems like it'd be to maintain going forward than doing
precise size calculations like CreateCheckPoint() already does, and
would additionally need to handle its own standby snapshot, not to speak
of the background ones.

Good idea.

Seems like it'd be doable in ReserveXLogInsertLocation().

Whether it's actually worthwhile I'm not all that sure tho.

Why not?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#19)
hackersbugs
Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

On November 6, 2015 6:21:50 PM GMT+01:00, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Nov 6, 2015 at 11:52 AM, Andres Freund <andres@anarazel.de>
wrote:

Seems like it'd be doable in ReserveXLogInsertLocation().

Whether it's actually worthwhile I'm not all that sure tho.

Why not?

Adds another instruction in one of the hottest spinlock protected sections of PG. Probably won't be significant, but...

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#20)
hackersbugs
#22Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#18)
hackersbugs
#23Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#22)
hackersbugs
#24Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#23)
hackersbugs
#25Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#24)
hackersbugs
#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#24)
hackersbugs
#27Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#26)
hackersbugs
#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#27)
hackersbugs
#29Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#28)
hackersbugs
#30Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#29)
hackersbugs
#31Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#30)
hackersbugs
#32Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#31)
hackersbugs
#33Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#25)
hackersbugs
#34Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#33)
hackersbugs
#35Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#32)
hackersbugs
#36Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#34)
hackersbugs
#37Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#35)
hackersbugs
#38Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#37)
hackersbugs
#39Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#38)
hackersbugs
#40Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#39)
hackersbugs
#41Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#40)
hackersbugs
#42Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#41)
hackersbugs
#43Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#41)
hackersbugs
#44Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#43)
hackersbugs
#45Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#43)
hackersbugs
#46Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#44)
hackersbugs
#47Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#46)
hackersbugs
#48Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#47)
hackersbugs
#49Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#47)
hackersbugs
#50Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#49)
hackersbugs
#51Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#50)
hackersbugs
#52Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#51)
hackersbugs
#53Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#51)
hackersbugs
#54Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#53)
hackersbugs
#55Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#54)
hackersbugs
#56Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#53)
hackersbugs
#57Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#56)
hackersbugs
#58Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#57)
hackersbugs
#59Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#58)
hackersbugs
#60Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#59)
hackersbugs
#61Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#60)
hackersbugs
#62Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#61)
hackersbugs
#63Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#62)
hackersbugs
#64Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#63)
hackersbugs
#65Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#64)
hackersbugs
#66Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#65)
hackersbugs
#67Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#66)
hackersbugs
#68Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#67)
hackersbugs
#69Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#68)
hackersbugs
#70Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#69)
hackersbugs
#71Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#70)
hackersbugs
#72Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#54)
hackersbugs
#73Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#72)
hackersbugs
#74Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#73)
hackersbugs
#75Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#74)
hackersbugs
#76Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#75)
hackersbugs
#77Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#71)
hackersbugs
#78Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#77)
hackersbugs
#79Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#76)
hackersbugs
#80Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#79)
hackersbugs
#81Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#80)
hackersbugs
#82David Steele
david@pgmasters.net
In reply to: Michael Paquier (#80)
hackersbugs
#83Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#82)
hackersbugs
#84Simon Riggs
simon@2ndQuadrant.com
In reply to: David Steele (#82)
hackersbugs
#85Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#84)
hackersbugs
#86Michael Paquier
michael@paquier.xyz
In reply to: Simon Riggs (#85)
hackersbugs
#87Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#86)
hackersbugs