Streaming Replication: Observations, Questions and Comments
Hi all,
We have a postgres-9.0 streaming replication set up where we keep the WAL
segments on the master amounting to 10 GB so that we can survive longer
periods of disconnect between master and slave. We do not use any shared
storage space for archiving WAL logs. (the shared disk server may turn out
to be another point of failure, which we would want to avoid)
Here are our basic configuration parameters:
postgresql.conf on master:
wal_keep_segments = 640     # previously 32; in logfile segments, min 1, 16MB each
archive_mode = on           # allows archiving to be done (change requires restart)
archive_command = 'cp -v %p /archives/data/pgsql/pg_xlog/%f'
postgresql.conf on slave:
wal_level = hot_standby
hot_standby = on
recovery.conf on slave:
standby_mode = 'on'
primary_conninfo = 'host=ip.add.ress port=5432 user=repman'
trigger_file = '/var/lib/pgsql/data/stop.replication'
restore_command = 'cp -i /archives/data/pgsql/pg_xlog/%f "%p"'
The master and slave servers are separated by thousands of miles, and the link
between them is just an ordinary 1 Mbps DSL line. Both the master and the
slave have the /archives partition mounted, kept in sync between the two
systems with csync2. I'm not sure if this is the correct way of configuring
streaming replication, but I will explain what worked for us and what we are
still left wanting:
Under heavy inserts/updates/deletes on the master (load generated by stored
procedures), we noticed that the slave fell far behind the master, which
resulted in replication breaking. Hence we raised wal_keep_segments from 32
to 640, which corresponds to 10 GB, so that we can survive either very heavy
spikes of load or even a week's disconnection of the slave (alarms would, of
course, be raised in the meantime).
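For what it's worth, lag like this can be watched before it grows past the retained WAL; a minimal check using the 9.0 monitoring functions (run from any suitably privileged connection) might look like this:

```
-- On the master: where WAL is currently being written
SELECT pg_current_xlog_location();

-- On the standby: the last WAL positions received and replayed
SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();
```

Comparing the two shows how far the standby trails; once the gap approaches wal_keep_segments * 16 MB, the standby is about to fall off the end of the master's retained WAL.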
One strange thing I noticed is that the WAL segments in pg_xlog on the master
have outgrown the actual data stored in the database by at least 3-4 times,
which was quite surprising. I'm not sure if 'restore_command' has anything to
do with it. I did not understand why transaction logs would need to be so
many times larger than the actual size of the database; have I done something
wrong somewhere?
Another interesting fact we noticed is that once replication has been broken
for some longer time (the walsender and walreceiver processes have died by
then), we had to restart not only the slave server but also the master, which
was quite strange. Shouldn't the master start a walsender process (if none is
running) the moment it receives a request for streaming? Similarly, why
should the slave be restarted just to resume replication? Why can't these two
processes be started and stopped independently by the postmaster as and when
the need to replicate arises?
Another thing I noticed was that the slave server logged that it was out of
sync and hence closing replication, but the master said nothing about this
breakage of replication.
So summing up the above, I would like to have some pointers to understand
the following, which I think will benefit many others as well:
- Do I need to 'archive' at all, given that I'm already keeping enough
logfile segments to sustain a disconnect of almost a week?
- Why did the xlog data accumulate to 3-4 times the actual growth of the
database over the same period?
- Why do we need to restart the postgres service on the slave and/or master
to rejoin a slave to replication after a long disconnect?
- Why does the master not complain about the loss of replication? (The slave,
of course, did complain about the disconnect and about its inability to
continue accepting data from the master on account of a difference in xlog
location.)
Some of the above might be due to wrong configuration, while some may give
hints for future enhancements. I hope this will start a healthy discussion on
the areas where streaming replication needs to be improved and strengthened.
Thanks and Regards,
Samba
On 08/24/2011 11:33 AM, Samba wrote:
One strange thing I noticed is that the pg_xlogs on the master have
outsized the actual data stored in the database by at least 3-4 times,
which was quite surprising. I'm not sure if 'restore_command' has
anything to do with it. I did not understand why transaction logs
would need to be so many times larger than the actual size of the
database, have I done something wrong somewhere?
That's common to see. Systems that regularly UPDATE the same rows can easily
end up with a WAL stream much larger than the database. The WAL data contains
enough information to replay every point in time from the base backup until
the current time; that can be significantly larger than the database, which
just holds the latest copy of the data.
One of the biggest things that makes your WAL large is full-page writes,
which protect against incomplete writes. See the "question regarding
full_page_writes" thread happening on this list recently for details. The
first time you touch a page after each checkpoint, a full copy of that page
is written out.
What I have to do in a lot of cases is significantly decrease the number
of checkpoints in order to keep this overhead under control. The
default config has a checkpoint every checkpoint_segments of work, and
every checkpoint_timeout of time. That makes for a checkpoint every 5
minutes, and even more often under heavy load.
If you increase checkpoint_segments a whole lot, all of your checkpoints
will be based on the timeout instead. Then you can see how WAL load
decreases as you increase checkpoint_timeout. I've had to set
checkpoint_timeout as high as 30 minutes before on busy systems, to
lower the WAL overhead.
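A sketch of the kind of settings described above (the values are illustrative assumptions, not recommendations; tune them against your own write load):

```
# postgresql.conf -- illustrative values only
checkpoint_segments = 64            # large enough that segment count rarely triggers a checkpoint
checkpoint_timeout = 30min          # checkpoints then happen on the timer instead
checkpoint_completion_target = 0.9  # spread each checkpoint's writes out over the interval
```

With fewer checkpoints, each page incurs its full-page write less often, so total WAL volume per unit of work drops.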
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On August 24, 2011 08:33:17 AM Samba wrote:
One strange thing I noticed is that the pg_xlogs on the master have
outsized the actual data stored in the database by at least 3-4 times,
which was quite surprising. I'm not sure if 'restore_command' has anything
to do with it. I did not understand why transaction logs would need to be
so many times larger than the actual size of the database, have I done
something wrong somewhere?
If you archive them instead of keeping them in pg_xlog, you can gzip them.
They compress reasonably well.
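For example (an untested sketch; the paths follow the configuration quoted earlier, and note that archive_command must return non-zero on failure, which a failed gzip pipeline does):

```
# postgresql.conf on the master -- compress each segment as it is archived
archive_command = 'gzip < %p > /archives/data/pgsql/pg_xlog/%f.gz'

# recovery.conf on the standby -- decompress on restore
restore_command = 'gunzip < /archives/data/pgsql/pg_xlog/%f.gz > "%p"'
```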
The problem with maintaining a separate archive is that one needs to write
additional scripts to periodically remove older log files from the archive,
and that gets complicated in a setup with one master and multiple slaves.
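The kind of cleanup script mentioned above could be as simple as the following sketch; the path, the retention window, and the idea of pruning purely by age are assumptions (with multiple slaves you would really want to prune by the oldest position any slave still needs):

```shell
#!/bin/sh
# Hypothetical archive-pruning helper: remove archived WAL files in a
# directory that are older than the given number of days.
prune_archive() {
    dir=$1
    days=$2
    # -mtime +N matches files whose age exceeds N whole days
    find "$dir" -type f -mtime +"$days" -print -delete
}

# Example: keep a little more than the one-week disconnect window above.
# prune_archive /archives/data/pgsql/pg_xlog 8
```

Run from cron on whichever host owns /archives, this keeps the archive bounded without touching pg_xlog itself.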
I think it would be better to build compression and cleanup into the core
itself, maybe in a later release. A better approach to cleanup would be for
the walsender process to decide when to clean up a particular logfile based
on feedback from all the registered slaves. If a slave is unreachable or
falls behind for too long, that slave could be banned from the setup (logging
the event in pg_replication.log ???). The replication status of each slave
could be maintained in something like a pg_slave_replica_status catalog
table.
When it comes to compression, the walsender could compress each chunk of data
that it streams (increasing the streaming delay may improve the compression
ratio, so a balance has to be struck between compression and the amount of
data loss that is sustainable in case of failure).
Although I can see that this design would be much better than leaving it to
external utilities, I'm not that good at C and hence am only proposing a
design, not a patch. I hope my suggestion will be received in good spirit.
Thanks and Regards,
Samba
PS:
I wrongly stated that the master server had to be restarted in case of long
disconnects; sorry, that was not true. But I still feel that requiring a
restart of the standby server to resume replication should be avoided, if
possible.
And, I strongly feel that a breakage in replication must be logged by both
master server and the concerned slave servers.
On Wed, Aug 24, 2011 at 11:03 PM, Alan Hodgson <ahodgson@simkin.ca> wrote:
> If you archive them instead of keeping them in pg_xlog, you can gzip them.
> They compress reasonably well.