Archiver not picking up changes to archive_command

Started by bricklen over 15 years ago, 7 messages
#1 bricklen
bricklen@gmail.com

Hi,

I'm stumped by an issue we are experiencing at the moment. We have
been successfully archiving logs to two standby sites for many months
now using the following command:

rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
--bwlimit=1250 -az %p postgres@14.121.70.98:/WAL_Archive/

Due to some heavy processing today, we have been falling behind on
shipping log files (by about 1000 logs or so), so we wanted to raise
our bwlimit like so:

rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
--bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

The db is showing the change.
SHOW archive_command:
rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
--bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

Yet, the running processes never get above the original bwlimit of
1250. Have I missed a step? Would "kill -HUP <archiver pid>" help?
(I'm leery of trying that untested though)

ps aux | grep rsync
postgres 27704 0.0 0.0 63820 1068 ? S 16:55 0:00 sh -c
rsync -a pg_xlog/000000010000071700000070
postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1250 -az
pg_xlog/000000010000071700000070 postgres@14.121.70.98:/WAL_Archive/
postgres 27714 37.2 0.0 68716 1612 ? S 16:55 0:01 rsync
--bwlimit=1250 -az pg_xlog/000000010000071700000070
postgres@14.121.70.98:/WAL_Archive/
postgres 27715 3.0 0.0 60764 5648 ? S 16:55 0:00 ssh
-l postgres 14.121.70.98 rsync --server -logDtprz --bwlimit=1250 .
/WAL_Archive/

Thanks,

bricklen

#2 bricklen
bricklen@gmail.com
In reply to: bricklen (#1)
Re: Archiver not picking up changes to archive_command

Sorry, version: PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled
by GCC gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: bricklen (#1)
Re: Archiver not picking up changes to archive_command

bricklen <bricklen@gmail.com> writes:

Due to some heavy processing today, we have been falling behind on
shipping log files (by about a 1000 logs or so), so wanted to up our
bwlimit like so:

rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
--bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

The db is showing the change.
SHOW archive_command:
rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
--bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

Yet, the running processes never get above the original bwlimit of
1250. Have I missed a step? Would "kill -HUP <archiver pid>" help?
(I'm leery of trying that untested though)

A look at the code shows that the archiver only notices SIGHUP once
per outer loop, so the change would only take effect once you catch up,
which is not going to help much in this case. Possibly we should change
it to check for SIGHUP after each archive_command execution.

If you kill -9 the archiver process, the postmaster will just start
a new one, but realize that this would result in two concurrent
rsyncs. It might work OK to kill -9 the archiver and the current
rsync in the same command.

regards, tom lane
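
To make the timing concrete, here is a toy shell model of the behavior
described above. It is not PostgreSQL code: simulate_archiver, its
arguments, and its variable names are invented for illustration, and the
two bwlimit values are simply the ones from the original post. The "outer"
mode re-reads settings only after the backlog is drained; the "inner" mode
re-reads them before every segment, as the proposed change would.

```shell
# Toy model only (assumption: one reload arrives while a backlog is shipped).
#   $1 = number of segments in the backlog
#   $2 = segment during which the reload (SIGHUP) arrives
#   $3 = "outer": settings re-read once the backlog is done
#        "inner": settings re-read before each segment
simulate_archiver() {
    backlog="$1"
    reload_at="$2"
    mode="$3"
    active=1250      # value the archiver is currently using
    pending=1250     # value stored in the configuration
    got_sighup=0
    seg=1
    while [ "$seg" -le "$backlog" ]; do
        # Inner-loop check: only performed in the proposed behavior.
        if [ "$mode" = inner ] && [ "$got_sighup" -eq 1 ]; then
            active="$pending"
            got_sighup=0
        fi
        echo "segment $seg bwlimit=$active"
        # The reload lands while this segment is being shipped.
        if [ "$seg" -eq "$reload_at" ]; then
            pending=1875
            got_sighup=1
        fi
        seg=$((seg + 1))
    done
    # Back in the outer loop, the flag is noticed in either mode.
    if [ "$got_sighup" -eq 1 ]; then
        active="$pending"
    fi
    echo "after backlog bwlimit=$active"
}
```

Calling simulate_archiver 3 1 outer ships all three segments at 1250 and
only switches afterwards, while simulate_archiver 3 1 inner switches to
1875 from the second segment on.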

#4 Greg Smith
greg@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: Archiver not picking up changes to archive_command

Tom Lane wrote:

A look at the code shows that the archiver only notices SIGHUP once
per outer loop, so the change would only take effect once you catch up,
which is not going to help much in this case. Possibly we should change
it to check for SIGHUP after each archive_command execution.

I never considered this a really important issue to sort out because I
tell everybody it's unwise to put something complicated directly into
archive_command. It is much better to call a script that gets passed
%f and %p and let that script do all the work; then you don't even have
to touch the server config if you need to fix something. The lack of
error checking that you get when just writing some shell commands
directly in the archive_command itself horrifies me in a production
environment.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
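
A minimal sketch of the kind of wrapper script suggested above, assuming
it is installed with archive_command = '/path/to/archive_wal.sh %p %f'.
The script name, the ARCHIVE_WAL_CONF variable, and the
/etc/archive_wal.conf location are hypothetical; the rsync options and
destinations are copied from the original post. Settings are re-read on
every call, so a changed bwlimit would apply to the next segment without
a server reload.

```shell
#!/bin/sh
# archive_wal.sh (hypothetical name): ship one WAL segment to two standbys.
# Usage: archive_wal.sh <path-to-segment (%p)> <segment-file-name (%f)>

# Assumed location of an optional settings file, e.g. containing BWLIMIT=1875
CONF="${ARCHIVE_WAL_CONF:-/etc/archive_wal.conf}"

archive_wal() {
    wal_path="$1"
    wal_name="$2"

    # Defaults taken from the thread; the settings file may override them.
    LOCAL_DEST="postgres@192.168.80.174:/WAL_Archive/"
    REMOTE_DEST="postgres@14.121.70.98:/WAL_Archive/"
    BWLIMIT=1250
    if [ -r "$CONF" ]; then
        . "$CONF"
    fi

    if [ -z "$wal_path" ] || [ ! -f "$wal_path" ]; then
        echo "archive_wal: WAL file not found: $wal_path" >&2
        return 1
    fi

    if ! rsync -a "$wal_path" "$LOCAL_DEST"; then
        echo "archive_wal: copy of $wal_name to $LOCAL_DEST failed" >&2
        return 1
    fi

    if ! rsync --bwlimit="$BWLIMIT" -az "$wal_path" "$REMOTE_DEST"; then
        echo "archive_wal: copy of $wal_name to $REMOTE_DEST failed" >&2
        return 1
    fi

    return 0
}

# Run only when called with arguments, as archive_command would do.
if [ "$#" -gt 0 ]; then
    archive_wal "$@"
    exit $?
fi
```

A non-zero exit status tells the server that the segment was not
archived, so it will be retried later; the messages on stderr give the
error reporting that a bare chain of shell commands lacks.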

#5 bricklen
bricklen@gmail.com
In reply to: Tom Lane (#3)
Re: Archiver not picking up changes to archive_command

On Mon, May 10, 2010 at 5:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

A look at the code shows that the archiver only notices SIGHUP once
per outer loop, so the change would only take effect once you catch up,
which is not going to help much in this case.  Possibly we should change
it to check for SIGHUP after each archive_command execution.

If you kill -9 the archiver process, the postmaster will just start
a new one, but realize that that would result in two concurrent
rsync's.  It might work ok to kill -9 the archiver and the current
rsync in the same command.

                       regards, tom lane

I think I'll just wait it out, then sighup.

Thanks for looking into this.
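
While waiting, the backlog can be watched by counting the .ready files
the server keeps in pg_xlog/archive_status for segments that have not
been archived yet. A small sketch, assuming PGDATA points at the data
directory; pending_wal_count is a hypothetical helper name.

```shell
# Count WAL segments still waiting to be archived.
#   $1 = path to the archive_status directory
pending_wal_count() {
    status_dir="$1"
    count=0
    for f in "$status_dir"/*.ready; do
        # The glob stays unexpanded when nothing matches, so test each entry.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Report the backlog only if a data directory is configured (assumption:
# PGDATA is set in the environment of whoever runs this).
if [ -n "${PGDATA:-}" ] && [ -d "$PGDATA/pg_xlog/archive_status" ]; then
    echo "pending segments: $(pending_wal_count "$PGDATA/pg_xlog/archive_status")"
fi
```

Once the count reaches zero the archiver has caught up, which is the
point at which, per the explanation above, it notices the reload.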

#6 bricklen
bricklen@gmail.com
In reply to: Greg Smith (#4)
Re: Archiver not picking up changes to archive_command

On Mon, May 10, 2010 at 6:12 PM, Greg Smith <greg@2ndquadrant.com> wrote:

Tom Lane wrote:

A look at the code shows that the archiver only notices SIGHUP once
per outer loop, so the change would only take effect once you catch up,
which is not going to help much in this case.  Possibly we should change
it to check for SIGHUP after each archive_command execution.

I never considered this a really important issue to sort out because I tell
everybody it's unwise to put something complicated directly into
archive_command.  Much better to call a script that gets passed %f/%p, then
let that script do all the work; don't even have to touch the server config
if you need to fix something then.  The lack of error checking that you get
when just writing some shell commands directly in the archive_command itself
horrifies me in a production environment.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Thanks Greg, that's a good idea. I'll revise that series of commands
into a script, and add some error handling as you suggest.

Cheers,

Bricklen

#7 Fujii Masao
masao.fujii@gmail.com
In reply to: Tom Lane (#3)
1 attachment(s)
Re: Archiver not picking up changes to archive_command

On Tue, May 11, 2010 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

bricklen <bricklen@gmail.com> writes:

Due to some heavy processing today, we have been falling behind on
shipping log files (by about a 1000 logs or so), so wanted to up our
bwlimit like so:

rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
--bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

The db is showing the change.
SHOW archive_command:
rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
--bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

Yet, the running processes never get above the original bwlimit of
1250. Have I missed a step? Would "kill -HUP <archiver pid>" help?
(I'm leery of trying that untested though)

A look at the code shows that the archiver only notices SIGHUP once
per outer loop, so the change would only take effect once you catch up,
which is not going to help much in this case.  Possibly we should change
it to check for SIGHUP after each archive_command execution.

+1

Here is the simple patch to do so.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

pgarch_check_sighup_v1.patch (application/octet-stream)
*** a/src/backend/postmaster/pgarch.c
--- b/src/backend/postmaster/pgarch.c
***************
*** 430,435 **** pgarch_ArchiverCopyLoop(void)
--- 430,442 ----
  
  		for (;;)
  		{
+ 			/* Check for config update */
+ 			if (got_SIGHUP)
+ 			{
+ 				got_SIGHUP = false;
+ 				ProcessConfigFile(PGC_SIGHUP);
+ 			}
+ 
  			/*
  			 * Do not initiate any more archive commands after receiving
  			 * SIGTERM, nor after the postmaster has died unexpectedly. The