[GENERAL]: Postgresql-9.1.1 synchronous replication issue
Hello,
I was testing PostgreSQL 9.1.1 synchronous streaming replication on our
UAT system.
Without synchronous replication, everything was working fine.
But when I enabled synchronous_standby_names='*', "CREATE TABLE"
started hanging for a long time.
When I pressed Ctrl+C, I got the following message -
Cancel request sent
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have
been replicated to the standby.
CREATE TABLE
Can someone please help us?
Thanks
VB
What is the value of synchronous_commit?
---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/
synchronous_commit is "on"
Thanks
VB
On Wednesday, February 01, 2012 10:51:44 pm Venkat Balaji wrote:
> Hello,
> I was testing PostgreSQL 9.1.1 synchronous streaming replication on our
> UAT system. Without synchronous replication, everything was working fine.
> But when I enabled synchronous_standby_names='*', "CREATE TABLE"
> started hanging for a long time.
Only the CREATE TABLE statement, or all statements?
In general terms, synchronous replication moves at the speed of the connection
between the primary and standby, or does not occur at all if the standby cannot
be found. So what is the state of the connection between the primary and standby?
--
Adrian Klaver
adrian.klaver@gmail.com
On Thu, Feb 2, 2012 at 8:37 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> Only the CREATE TABLE statement, or all statements?
> In general terms, synchronous replication moves at the speed of the connection
> between the primary and standby, or does not occur at all if the standby cannot
> be found. So what is the state of the connection between the primary and standby?
The connection is working fine between primary and standby: ping works and
WAL archive file transfer is working without any issues.
I tried CREATE TABLE and CREATE DATABASE; both were hanging.
Apart from the regular streaming replication settings, I did the following on
the primary to enable synchronous replication -
synchronous_standby_names='*'
Commands started hanging after that. Is there anything else I need to do?
Thanks
VB
On Thursday, February 02, 2012 10:21:28 pm Venkat Balaji wrote:
> The connection is working fine between primary and standby: ping works and
> WAL archive file transfer is working without any issues.
> I tried CREATE TABLE and CREATE DATABASE; both were hanging.
> Apart from the regular streaming replication settings, I did the following on
> the primary to enable synchronous replication -
> synchronous_standby_names='*'
> Commands started hanging after that. Is there anything else I need to do?
From here:
http://www.postgresql.org/docs/9.1/interactive/runtime-config-replication.html
"
synchronous_standby_names (string)
... The synchronous standby will be the first standby named in this list that is
both currently connected and streaming data in real-time (as shown by a state of
streaming in the pg_stat_replication view). Other standby servers appearing
later in this list represent potential synchronous standbys....
The name of a standby server for this purpose is the application_name setting of
the standby, as set in the primary_conninfo of the standby's walreceiver. There
is no mechanism to enforce uniqueness. In case of duplicates one of the matching
standbys will be chosen to be the synchronous standby, though exactly which one
is indeterminate. The special entry * matches any application_name, including
the default application name of walreceiver.
"
So I would check the pg_stat_replication view to see if Postgres is seeing the
standby as streaming.
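
To make the quoted rule concrete, here is a minimal Python sketch of how the synchronous standby is chosen. It only models the documentation text above and is not PostgreSQL's implementation. The function name pick_sync_standby and the dict layout (mirroring the application_name and state columns of pg_stat_replication) are assumptions for illustration, and names are compared as exact strings for simplicity.

```python
from typing import Dict, List, Optional


def pick_sync_standby(synchronous_standby_names: str,
                      standbys: List[Dict[str, str]]) -> Optional[str]:
    """Return the application_name of the standby that would be synchronous.

    `standbys` is a hypothetical snapshot of pg_stat_replication rows, each a
    dict with "application_name" and "state" keys.
    """
    # Only standbys that are connected and streaming are eligible.
    streaming = [s["application_name"] for s in standbys
                 if s["state"] == "streaming"]
    # Names are tried in list order; earlier entries take precedence.
    for name in (n.strip() for n in synchronous_standby_names.split(",")):
        if not name:
            continue
        for app in streaming:
            # '*' matches any application_name, including 'walreceiver'.
            if name == "*" or name == app:
                return app
    # No eligible standby was found.
    return None
```

With synchronous_standby_names='*' and no standby in the streaming state, the sketch returns None, which corresponds to the situation where commits sit waiting for a synchronous standby.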
--
Adrian Klaver
adrian.klaver@gmail.com
This issue is resolved!
The statements are no longer hanging on production :)
The suspected problem was -
Our brand new production server did not have port 5432 open.
I opened the port using the "iptables" command and everything started
working.
Synchronous replication is fast and awesome.
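
A side note on why ping looked fine earlier: ping uses ICMP and says nothing about whether a TCP port is reachable. A minimal Python sketch for testing the port itself, run from the standby host, could look like the following. The function name port_is_open and the host name in the usage line are hypothetical; 5432 is the port mentioned above.

```python
import socket


def port_is_open(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "primary.example.com" is a placeholder for the primary's address.
    print(port_is_open("primary.example.com"))
```

A False result while ping still succeeds would point at a firewall rule or a server that is not listening on that address.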
Thanks
VB
Hello,
Disaster recovery testing for the synchronous replication setup -
When the standby site is down, transactions at the production site start
hanging (this is after the successful setup of synchronous replication).
We changed synchronous_commit to 'local' to overcome this situation.
- No transactions hang at the production site even when the standby
is down
- The standby automatically gets synced when it is back up again.
Can someone let us know if there are any negative effects of setting
synchronous_commit='local'?
I am assuming that this is as good as setting "synchronous_commit=on" on a
stand-alone system.
We need to get this setup live on production shortly.
Thanks
VB
On Tuesday, February 14, 2012 4:21:22 am Venkat Balaji wrote:
> Hello,
> Disaster recovery testing for the synchronous replication setup -
> When the standby site is down, transactions at the production site start
> hanging (this is after the successful setup of synchronous replication).
> We changed synchronous_commit to 'local' to overcome this situation.
> - No transactions hang at the production site even when the standby
> is down
> - The standby automatically gets synced when it is back up again.
> Can someone let us know if there are any negative effects of setting
> synchronous_commit='local'?
> I am assuming that this is as good as setting "synchronous_commit=on" on a
> stand-alone system.
It would seem you are really after streaming replication (which is asynchronous)
more than synchronous replication. I have not used synchronous replication
enough to be sure, but I think by setting synchronous_commit='local' you are
basically turning the system into a straight streaming (asynchronous) system
anyway.
--
Adrian Klaver
adrian.klaver@gmail.com
On Tue, Feb 14, 2012 at 8:09 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> It would seem you are really after streaming replication (which is asynchronous)
> more than synchronous replication. I have not used synchronous replication
> enough to be sure, but I think by setting synchronous_commit='local' you are
> basically turning the system into a straight streaming (asynchronous) system
> anyway.
Yeah. It's a kind of asynchronous. All I wanted is as follows -
1
On Wed, Feb 15, 2012 at 11:01 AM, Venkat Balaji <venkat.balaji@verse.in> wrote:
> On Tue, Feb 14, 2012 at 8:09 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>> It would seem you are really after streaming replication (which is asynchronous)
>> more than synchronous replication. I have not used synchronous replication
>> enough to be sure, but I think by setting synchronous_commit='local' you are
>> basically turning the system into a straight streaming (asynchronous) system
>> anyway.
Sorry. Ignore my earlier message -
Yeah. It's a kind of asynchronous (at the transaction level, NOT WAL based).
All I wanted to achieve is as follows -
1. Synchronous replication - which would perform transactions
simultaneously on production and standby.
2. Ideally, if the commit does not occur at the standby site, then it would
not commit at production either, which will cause the production site
to hang. I do not want the production site to hang if the standby site is
down or not accessible.
3. I need the commit to occur on production, and the production apps
should not be disturbed if the standby fails to respond. To achieve this,
I have set synchronous_commit='local' to ensure that transactions are
committed at the production site first.
We do have streaming replication (of PG-9.0) set up on our other production
boxes, which is asynchronous and is WAL based.
Thanks
VB
In short, I would like to understand whether I am achieving the same
asynchronous streaming replication by setting synchronous_commit='local' -
I understand that streaming replication is record-based log shipping.
Below is what shows up on our primary test server, where we are testing
synchronous replication -
*1. Synchronous setup enabled with synchronous_commit='local'*
postgres=# select * from pg_stat_replication ;
 procpid | usesysid | usename  | application_name | client_addr  | client_hostname | client_port |        backend_start         |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
---------+----------+----------+------------------+--------------+-----------------+-------------+------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
   24099 |       10 | postgres | walreceiver      | <ip-address> |                 |       56432 | 2012-02-15 12:55:39.65663+03 | streaming | 0/E000078     | 0/E000078      | 0/E000078      | 0/E000078       |             1 | sync
(1 row)
postgres=# show synchronous_commit ;
synchronous_commit
--------------------
 local
(1 row)
postgres=# show synchronous_standby_names ;
synchronous_standby_names
---------------------------------------------
*
(1 row)
Does this mean that the system is still replicating synchronously? If yes,
by what means?
*Below is our actual production setup in 9.1.1 with asynchronous
replication setup -*
*2. Asynchronous enabled with synchronous_commit='on'*
psql (9.1.1)
Type "help" for help.
postgres=# select * from pg_stat_replication;
 procpid | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
---------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
    3159 |       10 | postgres | walreceiver      | <ipaddress> | <hostname>      |       40165 | 2012-02-08 12:41:51.858897+03 | streaming | 1/86F83B50    | 1/86F83B50     | 1/86F83B50     | 1/86F83B50      |             0 | async
(1 row)
postgres=# show synchronous_commit ;
synchronous_commit
--------------------
on
(1 row)
postgres=# show synchronous_standby_names ;
synchronous_standby_names
---------------------------
(1 row)
Operation-wise, I am not seeing much difference when inserting a few thousand rows.
It's almost the same behavior in both async and sync rep.
Thanks,
VB
On Wednesday, February 15, 2012 2:15:34 am Venkat Balaji wrote:
> In short, I would like to understand whether I am achieving the same
> asynchronous streaming replication by setting synchronous_commit='local' -
> I understand that streaming replication is record-based log shipping.
> Operation-wise, I am not seeing much difference when inserting a few thousand rows.
> It's almost the same behavior in both async and sync rep.
First, sync replication is just an advanced form of streaming replication.
http://www.postgresql.org/docs/9.1/interactive/warm-standby.html#STREAMING-REPLICATION
"
Streaming replication allows a standby server to stay more up-to-date than is
possible with file-based log shipping. The standby connects to the primary, which
streams WAL records to the standby as they're generated, without waiting for the
WAL file to be filled.
"
In both cases WAL information is being used, just one record at a time,
so an entire WAL file does not have to be shipped over. In the case of sync
replication a commit on the master is not complete until the standby confirms
that it has also flushed the commit to disk. This is the default case, where
synchronous_commit=on and synchronous_standby_names has valid names. In your
case you are using synchronous_commit=local and per:
http://www.postgresql.org/docs/9.1/interactive/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
synchronous_commit
....
"However, the special value local is available for transactions that wish to
wait for local flush to disk, but not synchronous replication."
So in this case you are not waiting for confirmation of the commit being flushed
to disk on the standby. In that case you are bypassing the primary reason for
sync replication. The plus is that transactions on the master will complete
faster and will do so even in the absence of the standby. The minus is that you
are in a sort of in-between state: the standby is listed as synchronous, but a
commit acknowledged on the master is not guaranteed to have reached it.
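
The interaction of the two settings can be summarised as a small Python sketch. It is a simplified model of the 9.1 behaviour described in the documentation quoted above, not PostgreSQL code; the function name commit_wait and the returned labels are made up for illustration.

```python
def commit_wait(synchronous_commit: str, synchronous_standby_names: str) -> str:
    """Describe what a COMMIT waits for before returning to the client."""
    # The three values documented for synchronous_commit in 9.1.
    if synchronous_commit not in ("on", "local", "off"):
        raise ValueError(f"unexpected synchronous_commit: {synchronous_commit!r}")
    if synchronous_commit == "off":
        # Returns before the WAL has been flushed locally.
        return "nothing"
    if synchronous_commit == "local" or not synchronous_standby_names.strip():
        # Durable on the primary only; no wait for any standby.
        return "local flush"
    # synchronous_commit = on with at least one standby name configured.
    return "local flush + standby flush"


# The two setups shown earlier in this thread:
print(commit_wait("local", "*"))  # test server
print(commit_wait("on", ""))      # production (plain streaming replication)
```

Both calls give the same answer, which fits the observation that the two systems behaved almost identically. The sync_state column in pg_stat_replication describes the standby's role, not whether a particular transaction waited for it.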
Personally, I take sync replication to be basically an all or nothing
proposition. By setting it up you are saying you want, at minimum, two database
clusters to be in sync at any point in time all the time (except for start up).
If that is not possible then you are really looking for async replication.
--
Adrian Klaver
adrian.klaver@gmail.com
Adrian,
Thanks a lot !
> So in this case you are not waiting for confirmation of the commit being flushed
> to disk on the standby. In that case you are bypassing the primary reason for
> sync replication. The plus is that transactions on the master will complete
> faster and will do so even in the absence of the standby. The minus is that you
> are in a sort of in-between state.
I understand. My worry and requirement is to ensure the master is not disturbed
for any reason.
In sync rep, the biggest worry is that if the standby server is unavailable and
is down for a longer time, the master hangs and stays in that state until the
standby comes back up, or replication must be broken temporarily (until the
standby comes back up) so that the master runs without interruption. This is a
costly exercise on production from a downtime perspective.
> Personally, I take sync replication to be basically an all or nothing
> proposition. By setting it up you are saying you want, at minimum, two database
> clusters to be in sync at any point in time all the time (except for start up).
> If that is not possible then you are really looking for async replication.
Yeah. We will need to make a decision accordingly.
Thanks again,
VB
On Wednesday, February 15, 2012 10:21:02 pm Venkat Balaji wrote:
> I understand. My worry and requirement is to ensure the master is not disturbed
> for any reason.
> In sync rep, the biggest worry is that if the standby server is unavailable and
> is down for a longer time, the master hangs and stays in that state until the
> standby comes back up, or replication must be broken temporarily (until the
> standby comes back up) so that the master runs without interruption. This is a
> costly exercise on production from a downtime perspective.
So just use regular streaming replication without sync rep. You get record-based
transaction shipping without having to wait for the standby. You will need to
make sure that wal_keep_segments is big enough to cover any downtime on the
standby (you would need that for sync rep also).
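
As a rough way to size wal_keep_segments, the arithmetic can be sketched in Python. This is only a back-of-the-envelope estimate: the WAL generation rate has to be measured on the real system, the headroom factor of 1.5 is an arbitrary assumption, and the function name is made up for illustration. The 16 MB figure is the default WAL segment size.

```python
import math

WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (16 MB)


def wal_keep_segments_for(outage_seconds: float,
                          wal_bytes_per_second: float,
                          headroom: float = 1.5) -> int:
    """Estimate a wal_keep_segments value that covers a standby outage."""
    # WAL produced during the outage, padded by the headroom factor.
    wal_bytes = outage_seconds * wal_bytes_per_second * headroom
    # Round up to whole segments.
    return math.ceil(wal_bytes / WAL_SEGMENT_BYTES)


# Example: a one-hour outage while the primary writes about 1 MB of WAL per second.
print(wal_keep_segments_for(3600, 1024 * 1024))
```

Each retained segment takes 16 MB in pg_xlog on the primary, so the disk space there needs to be budgeted as well.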
--
Adrian Klaver
adrian.klaver@gmail.com
On Thu, Feb 16, 2012 at 8:14 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> So just use regular streaming replication without sync rep. You get record-based
> transaction shipping without having to wait for the standby. You will need to
> make sure that wal_keep_segments is big enough to cover any downtime on the
> standby (you would need that for sync rep also).
As we already have streaming replication configured, we have rolled back
the plan to set up synchronous replication.
Thanks,
VB