Issue with rsync based incremental backup : postgres: startup process waiting for 0000000100000001000000D2

Started by Sambaalmost 14 years ago3 messagesgeneral
Jump to latest
#1Samba
saasira@gmail.com

Hi,

I'm trying to make a periodic (daily) incremental backup using rsync but
noticing a message that postgres startup process is waiting on so and so
XLOG file indefinitely.

I intend to schedule a cron job to periodically execute this rsync backup
process by : a) stop postgres server on backup server b) rsync the data
directory of original server to backup server c) start postgres on
secondary d) ensure that postgres starts up with out any issue e) execute
some test cases so as to ensure that there no data integrity issues f)
execute more tests to ensure that data is valid from business logic
perspective.

One may ask why am i using rsync backup instead of the log shipping bases
streaming replication; well, my answer would be that our system would be
having a lot of updates and fewer additions and since the transactions
would be piling up for each update, the WAL files accumulate so much that
we would be needing to archive much more data than the actual changes.
Further, transferring such huge amount of logs is going to consume lot of
time and bandwidth. Instead, I suppose that with rsync based backup we just
need to sync the actual data files and need to worry about archiving the
older WAL files, thus saving on the time of transfer, bandwidth costs, and
WAL archive management.

I would like to know if syncing the $PGDATA directory with rsync would be
good enough for incremental backups or that would create
data integrity issues.

Everytime i run the rsync command, I could successfully start the backup
server and am able to see the latest data from the original server, but am
a bit worried about this message which may be hinting that the backup is
not complete and is missing some essential WAL files.

Is there any additional steps or rsync configuration options that i need to
specify in order to make this sync complete?

Please let me know what can be done to make the rsync process complete so
that the backup server does not complain about waiting for missing WAL
files.

Here are the steps i executed on master and slave:

On Backup Server
root@backup ~]#
root@backup ~]# service postgresql stop
Stopping postgresql service: [ OK ]
root@backup ~]#

On Original Server
root@original ~]#
root@original ~]# su -l postgres -s /bin/bash -c "psql -d test"
test=#
test=# SELECT pg_start_backup('rsync_backup');
test=#\q
root@original ~]#
root@original ~]# rsync --archive --checksum --compress --progress
--exclude postgresql.conf --exclude pg_hba.conf --exclude server.crt
--exclude server.key /var/lib/pgsql/data/* root@backup.server
:/var/lib/pgsql/data/
root@original ~]# su -l postgres -s /bin/bash -c "psql -d test"
test=#
test=# SELECT pg_stop_backup();
test=#\q
root@original~]#

On Backup Server
root@backup ~]#
root@backup ~]# service postgresql start
Starting postgresql service: [ OK ]
root@backup ~]#
root@backup ~]# ps aux | grep postgres
postgres 18210 5.3 1.5 1811408 88540 ? S 20:43 0:00
/usr/pgsql-9.1/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 18220 0.0 0.0 115496 1144 ? Ss 20:43 0:00 postgres:
logger process
*postgres 18221 0.0 0.0 1812252 2328 ? Ss 20:43 0:00 postgres:
startup process waiting for 0000000100000001000000D2*
postgres 18222 0.0 0.0 1812180 2280 ? Ss 20:43 0:00 postgres:
writer process
postgres 18223 0.0 0.0 117744 1388 ? Ss 20:43 0:00 postgres:
stats collector process
postgres 18337 0.6 0.1 1814308 8092 ? Ss 20:43 0:00 postgres:
avaya_system_app_user avmgmt 127.0.0.1(34207) idle
postgres 18406 0.4 0.1 1815792 9272 ? Ss 20:43 0:00 postgres:
avaya_system_app_user avmgmt 127.0.0.1(34217) idle
postgres 18477 0.0 0.1 1813684 6680 ? Ss 20:43 0:00 postgres:
avaya_sseuser avmgmt 127.0.0.1(34231) idle
root 18479 0.0 0.0 61160 728 pts/1 R+ 20:43 0:00 grep
postgres
root@backup ~]#

Please advise me if what i'm doing is makes sense and is an accepted
mechanism for taking backups or if there is any other procedure that i can
emplpoy to avoid unnecessarily archiving gigabytes of WAL logs which may be
growing many times the size of the actual data directory.

Thanks and Regards,
Samba

#2Michael Nolan
htfoot@gmail.com
In reply to: Samba (#1)
Re: Issue with rsync based incremental backup : postgres: startup process waiting for 0000000100000001000000D2

On Thu, May 3, 2012 at 11:49 AM, Samba <saasira@gmail.com> wrote:

Hi,

Please advise me if what i'm doing is makes sense and is an accepted
mechanism for taking backups or if there is any other procedure that i can
emplpoy to avoid unnecessarily archiving gigabytes of WAL logs which may be
growing many times the size of the actual data directory.

Thanks and Regards,
Samba

The problem is that rsync isn't copying all the xlog files created during
the time the rsync is taking place, which is why it is complaining that
there are files missing.

There may be other logical flaws with your process as well.

Something similar to the steps given in "Starting Replication with only a
Quick Master Restart" as laid out in the wiki tutorial on binary
replication might give you a way to make this work. (You probably won't
need the restart of the master, since you're not actually setting up
replication, so you won't be changing the postgresql.conf file on your
master.)

This uses a two-step process. First you copy all the files EXCEPT the ones
on pg_xlog, then you copy those files, so you have a complete set.

See http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
--
Mike Nolan

#3Samba
saasira@gmail.com
In reply to: Michael Nolan (#2)
Re: Issue with rsync based incremental backup : postgres: startup process waiting for 0000000100000001000000D2

Thanks Mike,

It worked!

I had even setup streaming replication after doing incremental replication,
without needing to stop postgres on the primary server.

Here is the script i came up with :

#!/bin/bash

if [ $# -ne 1 ]; then
echo "you must specify the hostname of the backup server";
exit 0;
fi;
BACKUP_SERVER=$1
PGDATA=/var/lib/pgsql/data
PGXLOG=$PGDATA/pg_xlog
PGEXEC="sudo su -l postgres -s /bin/bash -c"
RSYNC="rsync"
OPTIONS="--archive --checksum --compress --progress"
EXCLUDES="--exclude postmaster.pid --exclude postgresql.conf --exclude
pg_hba.conf --exclude server.crt --exclude server.key"
ROLLOVER=32
SSH="ssh -q -o StrictHostKeyChecking=no -o BatchMode=yes $BACKUP_SERVER"
REPLICATION_CHECK="$SSH ps aux | grep postgres | grep wal | grep receiver"
#On BACKUP_SERVER
if [ -n "$(service postgresql status | grep "pid[:blank:]*[0-9]*")" ];
then
$SSH "service postgresql stop"
fi;
#On PRIMARY
echo "Running VACUUM"
$PGEXEC "psql -c \"VACUUM FULL;\""
echo "VACUUM completed"

for f in $(ls -tr $PGXLOG | head -n ${ROLLOVER}); do
$RSYNC $OPTIONS $PGXLOG/$f $BACKUP_SERVER:$PGXLOG/
done;
$PGEXEC "psql -c \"SELECT pg_start_backup('incremental_backup',true);\""
$RSYNC $OPTIONS $EXCLUDES --exclude pg_xlog $PGDATA $BACKUP_SERVER:$PGDATA
$PGEXEC "psql -c \"SELECT pg_stop_backup();\""
$RSYNC $OPTIONS $PGXLOG $BACKUP_SERVER:$PGXLOG
$RSYNC $OPTIONS $PGXLOG $BACKUP_SERVER:$PGXLOG
#On BACKUP_SERVER
$SSH "service postgresql start"
if [ -z "$(service postgresql status | grep "pid[:blank:]*[0-9]*")" ];
then
echo "Failed to start database on backup server"
echo "Look into the postgres logs for more details"
echo "exiting..."
exit 1;
fi;
#need to improve this delay-check to wait until the backup server has
finished recovery and started into streaming mode
sleep 30
if [ -n "$(${REPLICATION_CHECK})" ] ; then
echo "SUCCESS in synching BACKUP_SERVER with the latest data from
Primary";
#On BACKUP_SERVER
$SSH "service postgresql stop"
echo "Stopped the backup server in good state; it will get updated in the
next scheduled incremental backup"
else
echo "FAILED to sync backup server with Primary";
echo "Leaving the backup server running in the failed state for
further debugging"
exit 1;
fi;
exit 0;

I hope this would help others in need...

Thanks and Regards,
Samba

----------------------------------------------------------------------------------------------------------------------------------------------------------------
On Thu, May 3, 2012 at 11:55 PM, Michael Nolan <htfoot@gmail.com> wrote:

Show quoted text

On Thu, May 3, 2012 at 11:49 AM, Samba <saasira@gmail.com> wrote:

Hi,

Please advise me if what i'm doing is makes sense and is an accepted
mechanism for taking backups or if there is any other procedure that i can
emplpoy to avoid unnecessarily archiving gigabytes of WAL logs which may be
growing many times the size of the actual data directory.

Thanks and Regards,
Samba

The problem is that rsync isn't copying all the xlog files created during
the time the rsync is taking place, which is why it is complaining that
there are files missing.

There may be other logical flaws with your process as well.

Something similar to the steps given in "Starting Replication with only a
Quick Master Restart" as laid out in the wiki tutorial on binary
replication might give you a way to make this work. (You probably won't
need the restart of the master, since you're not actually setting up
replication, so you won't be changing the postgresql.conf file on your
master.)

This uses a two-step process. First you copy all the files EXCEPT the
ones on pg_xlog, then you copy those files, so you have a complete set.

See http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
--
Mike Nolan