Re: pg_standby and build farm

Started by Doug Knight about 19 years ago, 5 messages

dknight@wsi.com

about 19 years ago

Hi all,
I'm new to the forums, so bear with me on my questions. I've set up an
auto-archive and auto-recover pair of databases using pg_standby, with
which I'm prototyping various products for high availability. I've
noticed that when I attempt to fail over from the primary archiver to
the secondary recovery db using the pg_standby trigger file, the
secondary detects the trigger file, flags that it couldn't read the
current WAL file pg_standby was waiting on, then attempts to read in the
previous WAL file. I use the -m option in pg_standby, so the previous
WAL file no longer exists, which causes the secondary postgres to
"panic" on not being able to open the previous WAL file and terminate.
Is there a way to prevent it from looking for the previous WAL file, or
to preserve that file, so that when the trigger file is detected, the
secondary will come all the way up, completing its recovery mode?

Thanks,
Doug Knight

Simon Riggs

simon@2ndquadrant.com

about 19 years ago

In reply to: Doug Knight (#1)

On Tue, 2006-12-26 at 10:34 -0500, Doug Knight wrote:

I'm new to the forums, so bear with me on my questions. I've set up an
auto-archive and auto-recover pair of databases using pg_standby,
with which I'm prototyping various products for high availability.

Thanks for the feedback. pg_standby was written as a by-product of some
WAL recovery testing, so it's in use, but only by me so far.

I've noticed that when I attempt to fail over from the primary archiver
to the secondary recovery db using the pg_standby trigger file, the
secondary detects the trigger file, flags that it couldn't read the
current WAL file pg_standby was waiting on, then attempts to read in
the previous WAL file. I use the -m option in pg_standby, so the
previous WAL file no longer exists, which causes the secondary
postgres to "panic" on not being able to open the previous WAL file
and terminate. Is there a way to prevent it from looking for the
previous WAL file, or to preserve that file, so that when the trigger
file is detected, the secondary will come all the way up, completing
its recovery mode?

Well, on a separate point I realised today that the -m option is faster
but just doesn't work perfectly with restartable recovery, so I had
already decided before reading this post that the move option needs to
be removed. Doubly so, now. Apologies for my slip.

Use the default copy mode and it should work fine.
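
As a rough illustration of why move mode conflicts with a recovery that
asks for an earlier file again, the sketch below models the restore step
in Python. It is a toy model, not pg_standby's code: the restore() and
simulate() helpers, the WAL_1/WAL_2 file names, and the single
RECOVERYXLOG restore target are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def restore(archive: Path, wal_name: str, dest: Path, mode: str) -> bool:
    """Toy restore step: fetch wal_name from the archive into dest.

    mode is "copy" or "move"; returns False if the file is not in the archive.
    """
    src = archive / wal_name
    if not src.exists():
        return False
    if mode == "move":
        shutil.move(str(src), str(dest))  # the archive copy is gone afterwards
    else:
        shutil.copy2(src, dest)           # the archive copy stays in place
    return True


def simulate(mode: str) -> list:
    """Request two WAL files in order, then re-request the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "archive"
        archive.mkdir()
        for name in ("WAL_1", "WAL_2"):    # hypothetical file names
            (archive / name).write_text(name)
        dest = Path(tmp) / "RECOVERYXLOG"  # assumed single restore target
        requests = ("WAL_1", "WAL_2", "WAL_1")
        return [restore(archive, name, dest, mode) for name in requests]


copy_results = simulate("copy")
move_results = simulate("move")
print("copy:", copy_results)
print("move:", move_results)
```

In copy mode the third request still succeeds because the archive keeps
its copy. In move mode it fails, which mirrors the missing previous WAL
file reported at the start of the thread.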

The reason for the -m option was performance. Recovery is I/O-bound,
with 50% of the CPU it does use coming from IsRecordValid() - which is
where the CRC checking takes place. (I can add an option to recover
without CRC checks, if anyone wants it, once I've rejigged the option
parsing for recovery.conf.)

Should be able to use symbolic links, i.e. ln -f -s /archivepath/%f %p
instead. I'll test that tomorrow then issue a new version.

I'll also add a signal handler for SIGINT and SIGQUIT.

Comments?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Simon Riggs

simon@2ndquadrant.com

about 19 years ago

In reply to: Simon Riggs (#2)

On Wed, 2006-12-27 at 20:09 +0000, Simon Riggs wrote:

The reason for the -m option was performance. Recovery is I/O-bound,
with 50% of the CPU it does use coming from IsRecordValid() - which is
where the CRC checking takes place. (I can add an option to recover
without CRC checks, if anyone wants it, once I've rejigged the option
parsing for recovery.conf.)

Make that 70% of the CPU, for long running recoveries, but the CPU only
gets as high as 20% on my tests, so still I/O bound.
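
Putting the two figures together, and assuming both come from the same
test run: if CRC checking takes about 70% of the CPU that recovery uses,
and recovery uses at most about 20% of the CPU, then CRC checking
accounts for roughly 14% of total CPU capacity. A small sketch of that
arithmetic (the variable names are illustrative only):

```python
# Share of recovery's own CPU time spent in IsRecordValid() (CRC checks).
crc_share_of_recovery_cpu = 0.70

# Peak overall CPU utilisation observed during recovery.
peak_cpu_utilisation = 0.20

# CRC checking as a fraction of total CPU capacity.
crc_share_of_total_cpu = crc_share_of_recovery_cpu * peak_cpu_utilisation

print(f"CRC checks use about {crc_share_of_total_cpu:.0%} of total CPU")
```

On these numbers, skipping CRC checks would free only a modest slice of
CPU, which fits the conclusion that recovery is still I/O-bound.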

Should be able to use symbolic links, i.e. ln -f -s /archivepath/%f %p
instead. I'll test that tomorrow then issue a new version.

The ln works, and helps, but not that much. I'll remove the -m option
and replace it with an -l option. Must be careful to use the -f option.
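
One plausible reason the -f option matters (an assumption, not stated in
the thread): if the restore target still exists from the previous file,
a plain ln -s refuses to overwrite it. The Python sketch below mimics
that behaviour. The force_symlink() helper, the WAL_1/WAL_2 file names,
and the RECOVERYXLOG target are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def force_symlink(src: Path, dest: Path) -> None:
    """Rough equivalent of `ln -f -s src dest`: replace dest if it exists."""
    if dest.is_symlink() or dest.exists():
        dest.unlink()
    os.symlink(src, dest)


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "archive"
    archive.mkdir()
    for name in ("WAL_1", "WAL_2"):          # hypothetical file names
        (archive / name).write_text(name)
    dest = Path(tmp) / "RECOVERYXLOG"        # assumed restore target

    force_symlink(archive / "WAL_1", dest)   # first restore creates the link
    try:
        os.symlink(archive / "WAL_2", dest)  # like `ln -s` without -f
        plain_link_failed = False
    except FileExistsError:
        plain_link_failed = True

    force_symlink(archive / "WAL_2", dest)   # like `ln -f -s`
    linked_content = dest.read_text()

print(plain_link_failed, linked_content)
```

Without the forced replacement, the second link attempt fails because
the link from the first restore is still in place.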

The majority of the I/O comes from writing dirty buffers, so enabling
the bgwriter during recovery would probably be more helpful.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Doug Knight

dknight@wsi.com

about 19 years ago

In reply to: Simon Riggs (#3)

Thanks for the response, Simon. I'm continuing to use your script, so if
there's anything I can help you with (testing, ideas, etc), just let me
know.

Doug

On Thu, 2006-12-28 at 13:40 +0000, Simon Riggs wrote:

On Wed, 2006-12-27 at 20:09 +0000, Simon Riggs wrote:

The reason for the -m option was performance. Recovery is I/O-bound,
with 50% of the CPU it does use coming from IsRecordValid() - which is
where the CRC checking takes place. (I can add an option to recover
without CRC checks, if anyone wants it, once I've rejigged the option
parsing for recovery.conf.)

Make that 70% of the CPU, for long running recoveries, but the CPU only
gets as high as 20% on my tests, so still I/O bound.

Should be able to use symbolic links, i.e. ln -f -s /archivepath/%f %p
instead. I'll test that tomorrow then issue a new version.

The ln works, and helps, but not that much. I'll remove the -m option
and replace it with an -l option. Must be careful to use the -f option.

The majority of the I/O comes from writing dirty buffers, so enabling
the bgwriter during recovery would probably be more helpful.

Simon Riggs

simon@2ndquadrant.com

about 19 years ago

In reply to: Doug Knight (#4)

On Thu, 2006-12-28 at 08:45 -0500, Doug Knight wrote:

Thanks for the response, Simon. I'm continuing to use your script, so
if there's anything I can help you with (testing, ideas, etc), just
let me know.

My earlier comments about mv were not correct; when fully retesting
things, I noted another error instead and have corrected that.

pg_standby now supports cp, mv and ln - all 3 now tested on Linux.

Posting new version to patches; no signal handling (yet).

Any other comments gratefully received.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com