pg_standby stuck on a wal file size <16MB

Started by Vladimir Kosilovalmost 18 years ago10 messagesgeneral
Jump to latest
#1Vladimir Kosilov
vkosilov@gmail.com

I've been having problem with pgsql-8.2.5 master/slave warm standby
replication setup where occasionally master node generates a wal file
smaller then expected 16MB. pg_standby on slave gets stuck on such short
files, and replication halts from that moment on. we have to do
pg_start_backup/ rsync of data to slave / pg_stop_backup and restart
slave in order to recover. database I'm replicating is write mostly.

this is process on slave that is waiting on log file which is smaller in
size:
postgres 14277 11857 0 03:34 ? 00:00:00 sh -c
/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t
/usr/local/recovery_control/quit_recovery /usr/local/wal_archive
000000010000034000000020 pg_xlog/RECOVERYXLOG 2>>
/usr/local/pgsql/data/standby.log
postgres 14278 14277 0 03:34 ? 00:00:00
/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t
/usr/local/recovery_control/quit_recovery /usr/local/wal_archive
000000010000034000000020 pg_xlog/RECOVERYXLOG

here is a sample list of wal_archive directory slave, note the size of
000000010000034000000020 expected wal file is less then 16MB:

...
-rw------- 1 postgres postgres 16777216 May 17 03:19
000000010000034000000017
-rw------- 1 postgres postgres 16777216 May 17 03:21
000000010000034000000018
-rw------- 1 postgres postgres 16777216 May 17 03:22
000000010000034000000019
-rw------- 1 postgres postgres 16777216 May 17 03:24
00000001000003400000001A
-rw------- 1 postgres postgres 16777216 May 17 03:26
00000001000003400000001B
-rw------- 1 postgres postgres 16777216 May 17 03:27
00000001000003400000001C
-rw------- 1 postgres postgres 16777216 May 17 03:29
00000001000003400000001D
-rw------- 1 postgres postgres 16777216 May 17 03:30
00000001000003400000001E
-rw------- 1 postgres postgres 16777216 May 17 03:32
00000001000003400000001F

-rw------- 1 postgres postgres 13746176 May 17 03:34
000000010000034000000020

-rw------- 1 postgres postgres 16777216 May 17 03:35
000000010000034000000021
-rw------- 1 postgres postgres 16777216 May 17 03:37
000000010000034000000022
-rw------- 1 postgres postgres 16777216 May 17 03:38
000000010000034000000023
-rw------- 1 postgres postgres 16777216 May 17 03:40
000000010000034000000024
-rw------- 1 postgres postgres 16777216 May 17 03:41
000000010000034000000025
-rw------- 1 postgres postgres 16777216 May 17 03:43
000000010000034000000026
-rw------- 1 postgres postgres 16777216 May 17 03:45
000000010000034000000027
...

skipping a bit further I see there is at least one other instance where
wal file is shorter then normal 16MB:

-rw------- 1 postgres postgres 16777216 May 17 05:42
00000001000003400000006F
-rw------- 1 postgres postgres 16777216 May 17 05:44
000000010000034000000070
-rw------- 1 postgres postgres 16777216 May 17 05:46
000000010000034000000071
-rw------- 1 postgres postgres 16777216 May 17 05:47
000000010000034000000072

-rw------- 1 postgres postgres 16384 May 17 05:50
000000010000034000000073

-rw------- 1 postgres postgres 16777216 May 17 05:51
000000010000034000000074
-rw------- 1 postgres postgres 16777216 May 17 05:52
000000010000034000000075
-rw------- 1 postgres postgres 16777216 May 17 05:54
000000010000034000000076

why would a master node create a wal file smaller then normal
checkpoint_segment size and how can this be avoided. I need a reliable
replication mechanism even at a cost of longer recovery on standby.

master's postgresql.conf:
..
fsync = on
wal_sync_method = open_sync
wal_buffers = 128
checkpoint_segments = 64
archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
/usr/local/wal_archive_local/%f'

archive files are then moved on master to standby every other minute:

rsync -aq --remove-sent-files /usr/local/wal_archive_local/
slave::wal_archive/

slave's recovery.conf:
restore_command = '/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10
-w 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive
%f %p 2>> /u
sr/local/pgsql/data/standby.log'

both servers are identical Dell PE1950 servers with 4 sas hd w/hardware
RAID 1+0 running:
2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007 x86_64 x86_64
x86_64 GNU/Linux

Thanks much in advance
V.

#2Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Vladimir Kosilov (#1)
Re: pg_standby stuck on a wal file size <16MB

On Sun, May 18, 2008 at 12:17 AM, Vladimir Kosilov <vkosilov@gmail.com>
wrote:

I've been having problem with pgsql-8.2.5 master/slave warm standby
replication setup where occasionally master node generates a wal file
smaller then expected 16MB. pg_standby on slave gets stuck on such short
files, and replication halts from that moment on. we have to do
pg_start_backup/ rsync of data to slave / pg_stop_backup and restart slave
in order to recover. database I'm replicating is write mostly.

this is process on slave that is waiting on log file which is smaller in
size:
postgres 14277 11857 0 03:34 ? 00:00:00 sh -c
/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t
/usr/local/recovery_control/quit_recovery /usr/local/wal_archive
000000010000034000000020 pg_xlog/RECOVERYXLOG 2>>
/usr/local/pgsql/data/standby.log
postgres 14278 14277 0 03:34 ? 00:00:00
/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t
/usr/local/recovery_control/quit_recovery /usr/local/wal_archive
000000010000034000000020 pg_xlog/RECOVERYXLOG

here is a sample list of wal_archive directory slave, note the size of
000000010000034000000020 expected wal file is less then 16MB:

...
-rw------- 1 postgres postgres 16777216 May 17 03:19
000000010000034000000017
-rw------- 1 postgres postgres 16777216 May 17 03:21
000000010000034000000018
-rw------- 1 postgres postgres 16777216 May 17 03:22
000000010000034000000019
-rw------- 1 postgres postgres 16777216 May 17 03:24
00000001000003400000001A
-rw------- 1 postgres postgres 16777216 May 17 03:26
00000001000003400000001B
-rw------- 1 postgres postgres 16777216 May 17 03:27
00000001000003400000001C
-rw------- 1 postgres postgres 16777216 May 17 03:29
00000001000003400000001D
-rw------- 1 postgres postgres 16777216 May 17 03:30
00000001000003400000001E
-rw------- 1 postgres postgres 16777216 May 17 03:32
00000001000003400000001F

-rw------- 1 postgres postgres 13746176 May 17 03:34
000000010000034000000020

-rw------- 1 postgres postgres 16777216 May 17 03:35
000000010000034000000021
-rw------- 1 postgres postgres 16777216 May 17 03:37
000000010000034000000022
-rw------- 1 postgres postgres 16777216 May 17 03:38
000000010000034000000023
-rw------- 1 postgres postgres 16777216 May 17 03:40
000000010000034000000024
-rw------- 1 postgres postgres 16777216 May 17 03:41
000000010000034000000025
-rw------- 1 postgres postgres 16777216 May 17 03:43
000000010000034000000026
-rw------- 1 postgres postgres 16777216 May 17 03:45
000000010000034000000027
...

skipping a bit further I see there is at least one other instance where wal
file is shorter then normal 16MB:

-rw------- 1 postgres postgres 16777216 May 17 05:42
00000001000003400000006F
-rw------- 1 postgres postgres 16777216 May 17 05:44
000000010000034000000070
-rw------- 1 postgres postgres 16777216 May 17 05:46
000000010000034000000071
-rw------- 1 postgres postgres 16777216 May 17 05:47
000000010000034000000072

-rw------- 1 postgres postgres 16384 May 17 05:50
000000010000034000000073

-rw------- 1 postgres postgres 16777216 May 17 05:51
000000010000034000000074
-rw------- 1 postgres postgres 16777216 May 17 05:52
000000010000034000000075
-rw------- 1 postgres postgres 16777216 May 17 05:54
000000010000034000000076

why would a master node create a wal file smaller then normal
checkpoint_segment size and how can this be avoided. I need a reliable
replication mechanism even at a cost of longer recovery on standby.

master's postgresql.conf:
..
fsync = on
wal_sync_method = open_sync
wal_buffers = 128
checkpoint_segments = 64
archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
/usr/local/wal_archive_local/%f'

archive files are then moved on master to standby every other minute:

rsync -aq --remove-sent-files /usr/local/wal_archive_local/
slave::wal_archive/

slave's recovery.conf:
restore_command = '/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w
0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive %f %p
2>> /u
sr/local/pgsql/data/standby.log'

both servers are identical Dell PE1950 servers with 4 sas hd w/hardware
RAID 1+0 running:
2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007 x86_64 x86_64 x86_64
GNU/Linux

Thanks much in advance
V.

Any reason why you are not using rsync command in the archive_command in the
first place?

gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

#3Ioannis Tambouras
ioannis@akroninc.net
In reply to: Gurjeet Singh (#2)
Re: pg_standby stuck on a wal file size <16MB

archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
/usr/local/wal_archive_local/%f'

The archive command tests if the wal segment exists and is a file,
but it does not check if the file is still being written. You need to
copy the file after writing has finished (it reached 16777216 bytes).
I don't have sources of pg_standby near me, but I remember in the
C code checks for complete segment sizes.

thanks
Ioannis Tambouras

#4Vlad Kosilov
vkosilov@contigo.com
In reply to: Gurjeet Singh (#2)
Re: pg_standby stuck on a wal file size <16MB

Gurjeet Singh wrote:

archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
/usr/local/wal_archive_local/%f'

archive files are then moved on master to standby every other minute:

rsync -aq --remove-sent-files /usr/local/wal_archive_local/
slave::wal_archive/

slave's recovery.conf:
restore_command = '/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w
0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive %f %p
2>> /u
sr/local/pgsql/data/standby.log'

both servers are identical Dell PE1950 servers with 4 sas hd w/hardware
RAID 1+0 running:
2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007 x86_64 x86_64 x86_64
GNU/Linux

Thanks much in advance
V.

Any reason why you are not using rsync command in the archive_command in the
first place?

gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

I wanted to remove dependency on networking between the nodes and
standby availability at what seems to be a very critical moment -
getting a good archive of a wal file. we tried coping to nfs mounts
before with mixed results on a writes busy db and linux/nfs

Thanks,
V.

--
________________________________________
Vladimir (Vlad) Kosilov
Senior Systems Administrator
Contigo Systems Inc.
604.683.3106 (phone)
604.648.9886 (fax)
vkosilov@contigo.com
www.contigo.com
________________________________________

#5Vlad Kosilov
vkosilov@contigo.com
In reply to: Ioannis Tambouras (#3)
Re: pg_standby stuck on a wal file size <16MB

is this wrong of me to expect that postgres would not make a wal file
available to archive_command unless it was completely ready to let go of
that wal file?
thanks,
V.
Ioannis Tambouras wrote:

archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
/usr/local/wal_archive_local/%f'

The archive command tests if the wal segment exists and is a file,
but it does not check if the file is still being written. You need to
copy the file after writing has finished (it reached 16777216 bytes).
I don't have sources of pg_standby near me, but I remember in the
C code checks for complete segment sizes.

thanks
Ioannis Tambouras

--
________________________________________
Vladimir (Vlad) Kosilov
Senior Systems Administrator
Contigo Systems Inc.
604.683.3106 (phone)
604.648.9886 (fax)
vkosilov@contigo.com
www.contigo.com
________________________________________

#6Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Vlad Kosilov (#5)
Re: pg_standby stuck on a wal file size <16MB

On Sun, May 18, 2008 at 5:48 AM, Vlad Kosilov <vkosilov@contigo.com> wrote:

is this wrong of me to expect that postgres would not make a wal file
available to archive_command unless it was completely ready to let go of
that wal file?

thats a perfectly reasonable expectation; and thats exactly how it works.

Ioannis Tambouras wrote:

archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p

/usr/local/wal_archive_local/%f'

The archive command tests if the wal segment exists and is a file, but it
does not check if the file is still being written. You need to copy the file
after writing has finished (it reached 16777216 bytes). I don't have sources
of pg_standby near me, but I remember in the
C code checks for complete segment sizes.

thanks
Ioannis Tambouras

--
________________________________________
Vladimir (Vlad) Kosilov
Senior Systems Administrator
Contigo Systems Inc.
604.683.3106 (phone)
604.648.9886 (fax)
vkosilov@contigo.com
www.contigo.com
________________________________________

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

#7Greg Smith
gsmith@gregsmith.com
In reply to: Ioannis Tambouras (#3)
Re: pg_standby stuck on a wal file size <16MB

On Sat, 17 May 2008, Ioannis Tambouras wrote:

The archive command tests if the wal segment exists and is a file,
but it does not check if the file is still being written.

That's because it doesn't have to; the archive command doesn't get called
until the writing is done.

I don't have sources of pg_standby near me, but I remember in the
C code checks for complete segment sizes.

That's on the receiving side, to make sure it's not trying to process
files that haven't finished copying to the standby yet. You don't have to
do any of that in the archive_command.

Anyway, back to the original question:

archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
/usr/local/wal_archive_local/%f'
archive files are then moved on master to standby every other minute:
rsync -aq --remove-sent-files /usr/local/wal_archive_local/
slave::wal_archive/

I don't see any mechanism here to keep rsync from copying over partial
files to the standby before they've finished copying to the
wal_archive_local directory. That's my guess for where the small files
are coming from, rsync before the cp is done. If you're going to buffer
in a transfer directory, you need some sort of test or locking to make
sure the file is complete with exactly 16MB before it gets rsync'd over.
I suspect no amount of poking at the standby will root out the issue
because it's happening on the primary.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#8Vlad Kosilov
vkosilov@contigo.com
In reply to: Greg Smith (#7)
Re: pg_standby stuck on a wal file size <16MB

Greg, I think you might be onto something here. your suggestion prompts
to get rid off --remove-sent-files in my rsync command, and clean up wal
files from master by some other means, this way even if master's rsync
picks partial file on the first run, it should update and complete file
once it's finished being copied by archive_command to my buffer directory.
Thanks! I'll get this in asap, if this does not solve the problem I'll
re-post
V.
Greg Smith wrote:

On Sat, 17 May 2008, Ioannis Tambouras wrote:

The archive command tests if the wal segment exists and is a file,
but it does not check if the file is still being written.

That's because it doesn't have to; the archive command doesn't get
called until the writing is done.

I don't have sources of pg_standby near me, but I remember in the
C code checks for complete segment sizes.

That's on the receiving side, to make sure it's not trying to process
files that haven't finished copying to the standby yet. You don't
have to do any of that in the archive_command.

Anyway, back to the original question:

archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
/usr/local/wal_archive_local/%f'
archive files are then moved on master to standby every other minute:
rsync -aq --remove-sent-files /usr/local/wal_archive_local/
slave::wal_archive/

I don't see any mechanism here to keep rsync from copying over partial
files to the standby before they've finished copying to the
wal_archive_local directory. That's my guess for where the small
files are coming from, rsync before the cp is done. If you're going
to buffer in a transfer directory, you need some sort of test or
locking to make sure the file is complete with exactly 16MB before it
gets rsync'd over. I suspect no amount of poking at the standby will
root out the issue because it's happening on the primary.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
________________________________________
Vladimir (Vlad) Kosilov
Senior Systems Administrator
Contigo Systems Inc.
604.683.3106 (phone)
604.648.9886 (fax)
vkosilov@contigo.com
www.contigo.com
________________________________________

#9Vlad Kosilov
vkosilov@contigo.com
In reply to: Greg Smith (#7)
Re: pg_standby stuck on a wal file size <16MB

as Greg pointed out: my use of rsync --remove-sent-files option had
contributed to a short sized wal log file on standby.
changing master's postgres crontab to the following helped to resolve
the issue:

# ship logs to standby:
*/2 * * * * rsync -aq /wal_archive_local/ 10.10.10.12::wal_archive/
# remove files older then remove_check file mtime
*/5 * * * * find /wal_archive_local/ ! -newer
/wal_archive_local/remove_check -exec rm -f {} \; && touch
/wal_archive_local/remove_check

Thank you!
V.

Greg Smith wrote:

On Sat, 17 May 2008, Ioannis Tambouras wrote:

The archive command tests if the wal segment exists and is a file,
but it does not check if the file is still being written.

That's because it doesn't have to; the archive command doesn't get
called until the writing is done.

I don't have sources of pg_standby near me, but I remember in the
C code checks for complete segment sizes.

That's on the receiving side, to make sure it's not trying to process
files that haven't finished copying to the standby yet. You don't
have to do any of that in the archive_command.

Anyway, back to the original question:

archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
/usr/local/wal_archive_local/%f'
archive files are then moved on master to standby every other minute:
rsync -aq --remove-sent-files /usr/local/wal_archive_local/
slave::wal_archive/

I don't see any mechanism here to keep rsync from copying over partial
files to the standby before they've finished copying to the
wal_archive_local directory. That's my guess for where the small
files are coming from, rsync before the cp is done. If you're going
to buffer in a transfer directory, you need some sort of test or
locking to make sure the file is complete with exactly 16MB before it
gets rsync'd over. I suspect no amount of poking at the standby will
root out the issue because it's happening on the primary.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
________________________________________
Vladimir (Vlad) Kosilov
Senior Systems Administrator
Contigo Systems Inc.
604.683.3106 (phone)
604.648.9886 (fax)
vkosilov@contigo.com
www.contigo.com
________________________________________

#10Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Vlad Kosilov (#9)
Re: pg_standby stuck on a wal file size <16MB

On Wed, May 21, 2008 at 4:14 AM, Vlad Kosilov <vkosilov@contigo.com> wrote:

as Greg pointed out: my use of rsync --remove-sent-files option had
contributed to a short sized wal log file on standby.
changing master's postgres crontab to the following helped to resolve the
issue:

# ship logs to standby:
*/2 * * * * rsync -aq /wal_archive_local/ 10.10.10.12::wal_archive/
# remove files older then remove_check file mtime
*/5 * * * * find /wal_archive_local/ ! -newer
/wal_archive_local/remove_check -exec rm -f {} \; && touch
/wal_archive_local/remove_check

Hmmm, nice trick.

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device