Postgres 8.4: archive_timeout vs. checkpoint_timeout
Hi folks,
I want to refer to a question Rob did back in 2008 at
http://archives.postgresql.org/pgsql-general/2008-07/msg01167.php as we
are currently running into a similar question:
We are using warm standby via PITR using a shared drive between master
and slave node.
Our setup currently is set to archive_timeout = 60s and
checkpoint_timeout = 600s.
We expected that now every minute a WAL-file is written to the share,
but somehow we might misunderstood some part of the documentation as in
periods with low traffic on database the interval between WAL files is
1min up to ten minutes.
However, the goal was to have a WAL file every minute so disaster
recovering can be done fast with a minimum of lost data.
Question is: What did we miss? Do we need to put checkpoint_timeout also
to 60s and does this makes sense at all?
Cheers,
Frank
On Thu, Oct 6, 2011 at 3:47 AM, Frank Lanitz <frank@frank.uvena.de> wrote:
Hi folks,
I want to refer to a question Rob did back in 2008 at
http://archives.postgresql.org/pgsql-general/2008-07/msg01167.php as we
are currently running into a similar question:
We are using warm standby via PITR using a shared drive between master
and slave node.Our setup currently is set to archive_timeout = 60s and
checkpoint_timeout = 600s.We expected that now every minute a WAL-file is written to the share,
but somehow we might misunderstood some part of the documentation as in
periods with low traffic on database the interval between WAL files is1min up to ten minutes.
The 8.4 docs lack this detail, but the 9.0 docs explain this. I don't
believe it's a behavior change; I think it's just more clarification in the
documents (
http://www.postgresql.org/docs/9.0/interactive/runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT)
" When this parameter is greater than zero, the server will switch to a new
segment file whenever this many seconds have elapsed since the last segment
file switch, ***and there has been any database activity, including a single
checkpoint.***" (emphasis mine)
Tom said something similar in the thread you referenced:
http://archives.postgresql.org/pgsql-general/2008-07/msg01166.php
"One possible connection is that an xlog file switch will not actually
happen unless some xlog output has been generated since the last switch.
If you were watching an otherwise-idle system then maybe the checkpoint
records are needed to make it look like a switch is needed. OTOH if
it's *that* idle then the checkpoints should be no-ops too."
However, the goal was to have a WAL file every minute so disaster
recovering can be done fast with a minimum of lost data.
If there was any data, it's existence in the transaction log would trigger
the archive_timeout behavior. With no database activity, you aren't missing
anything.
Question is: What did we miss? Do we need to put checkpoint_timeout also
to 60s and does this makes sense at all?
You are getting what you need (maximum 60s between data and the
corresponding data being sent through archive_command), just not exactly
what you thought you asked for.
If you absolutely must have a file every in order to sleep well, you can
lower checkpoint_timeout. Keep in mind the cost of checkpoints.
Derrick
Hi,
Thanks for your response.
Am 07.10.2011 22:05, schrieb Derrick Rice:
On Thu, Oct 6, 2011 at 3:47 AM, Frank Lanitz <frank@frank.uvena.de
<mailto:frank@frank.uvena.de>> wrote:Hi folks,
I want to refer to a question Rob did back in 2008 at
http://archives.postgresql.org/pgsql-general/2008-07/msg01167.php as we
are currently running into a similar question:
We are using warm standby via PITR using a shared drive between master
and slave node.Our setup currently is set to archive_timeout = 60s and
checkpoint_timeout = 600s.We expected that now every minute a WAL-file is written to the share,
but somehow we might misunderstood some part of the documentation as in
periods with low traffic on database the interval between WAL files is1min up to ten minutes.
The 8.4 docs lack this detail, but the 9.0 docs explain this. I don't
believe it's a behavior change; I think it's just more clarification in
the documents (
http://www.postgresql.org/docs/9.0/interactive/runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT
)" When this parameter is greater than zero, the server will switch to a
new segment file whenever this many seconds have elapsed since the last
segment file switch, ***and there has been any database activity,
including a single checkpoint.***" (emphasis mine)Tom said something similar in the thread you referenced:
http://archives.postgresql.org/pgsql-general/2008-07/msg01166.php
"One possible connection is that an xlog file switch will not actually
happen unless some xlog output has been generated since the last switch.
If you were watching an otherwise-idle system then maybe the checkpoint
records are needed to make it look like a switch is needed. OTOH if
it's *that* idle then the checkpoints should be no-ops too."
We are recognizing import failures on slave after we lower the
archive_timeout below the checkpoint_timeout. Did I understand it
correctly that these errors might get caused by this?
However, the goal was to have a WAL file every minute so disaster
recovering can be done fast with a minimum of lost data.If there was any data, it's existence in the transaction log would
trigger the archive_timeout behavior. With no database activity, you
aren't missing anything.Question is: What did we miss? Do we need to put checkpoint_timeout also
to 60s and does this makes sense at all?You are getting what you need (maximum 60s between data and the
corresponding data being sent through archive_command), just not exactly
what you thought you asked for.If you absolutely must have a file every in order to sleep well, you can
lower checkpoint_timeout. Keep in mind the cost of checkpoints.
We will have to think about this.
Cheers,
Frank