Support for N synchronous standby servers
Hi all,
Please find attached a patch to add support of synchronous replication
for multiple standby servers. This is controlled by the addition of a
new GUC parameter called synchronous_standby_num, which makes the server
wait for transaction commit on the first N standbys defined in
synchronous_standby_names. The implementation is really
straightforward, needing only a couple of modifications in
walsender.c for pg_stat_get_wal_senders and in syncrep.c.
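For reference, the proposed setup would look something like this in postgresql.conf (the standby names below are hypothetical, and the syntax is a sketch of the patch's intent, not its documented form):

```ini
# Wait for commit confirmation from the first 2 standbys
# listed in synchronous_standby_names.
synchronous_standby_names = 'standby1,standby2,standby3'
synchronous_standby_num = 2
```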
When a commit is cancelled manually by the user, or when
ProcDiePending shows up, the message returned to the user does not show
the list of walsenders on which the commit has not been confirmed, as
it is only partially confirmed. I have not done anything for that, but
let me know if it would be useful. It would need a scan of the
walsenders to get their application_name.
Thanks,
--
Michael
Attachments:
0001-syncrep_multi_standbys.patch (text/x-diff, +175/-42)
On Sat, Aug 9, 2014 at 3:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Hi all,
Please find attached a patch to add support of synchronous replication
for multiple standby servers. This is controlled by the addition of a
new GUC parameter called synchronous_standby_num, that makes server
wait for transaction commit on the first N standbys defined in
synchronous_standby_names. The implementation is really
straight-forward, and has just needed a couple of modifications in
walsender.c for pg_stat_get_wal_senders and syncrep.c.
Great! This is really a feature that I want.
Though I forgot why we missed this feature when
we added the synchronous replication feature,
maybe it's worth reading the old discussion, which
may suggest the potential problems of N sync standbys.
I just tested this feature with synchronous_standby_num = 2.
I started up only one synchronous standby and ran
a write transaction. The transaction completed
successfully, i.e., it didn't wait for two standbys. Probably
this is a bug in the patch.
And, you forgot to add the line of synchronous_standby_num
to postgresql.conf.sample.
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Aug 11, 2014 at 1:31 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sat, Aug 9, 2014 at 3:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Great! This is really the feature which I really want.
Though I forgot why we missed this feature when
we had added the synchronous replication feature,
maybe it's worth reading the old discussion which
may suggest the potential problem of N sync standbys.
Sure, I'll double check. Thanks for your comments.
I just tested this feature with synchronous_standby_num = 2.
I started up only one synchronous standby and ran
the write transaction. Then the transaction was successfully
completed, i.e., it didn't wait for two standbys. Probably
this is a bug of the patch.
Oh OK, yes this is a bug in what I did. The number of standbys to wait
for takes precedence over the number of standbys found in the list of
active WAL senders. I changed the patch to take that behavior into
account. So for example if you have only one sync standby connected,
and synchronous_standby_num = 2, the client waits indefinitely.
And, you forgot to add the line of synchronous_standby_num
to postgresql.conf.sample.
Yep, right.
On top of that, I refactored the code in such a way that
pg_stat_get_wal_senders and SyncRepReleaseWaiters rely on a single API
to get the list of synchronous standbys found. This reduces code
duplication, duplication that already exists in HEAD...
Regards,
--
Michael
Attachments:
20140811_multi_syncrep_v2.patch (text/x-patch, +259/-141)
On Mon, Aug 11, 2014 at 11:54 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Aug 11, 2014 at 1:31 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sat, Aug 9, 2014 at 3:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Great! This is really the feature which I really want.
Though I forgot why we missed this feature when
we had added the synchronous replication feature,
maybe it's worth reading the old discussion which
may suggest the potential problem of N sync standbys.
Sure, I'll double check. Thanks for your comments.
I just tested this feature with synchronous_standby_num = 2.
I started up only one synchronous standby and ran
the write transaction. Then the transaction was successfully
completed, i.e., it didn't wait for two standbys. Probably
this is a bug of the patch.
Oh OK, yes this is a bug of what I did. The number of standbys to wait
for takes precedence on the number of standbys found in the list of
active WAL senders. I changed the patch to take into account that
behavior. So for example if you have only one sync standby connected,
and synchronous_standby_num = 2, client waits indefinitely.
Thanks for updating the patch! Again I tested the feature and found something
wrong. I set synchronous_standby_num to 2 and started three standbys. Two of
them are included in synchronous_standby_names, i.e., they are synchronous
standbys; the other standby is always asynchronous. When
I shut down one of the synchronous standbys and executed a write transaction,
the transaction completed successfully. So the transaction doesn't wait for
two sync standbys in that case. Probably this is a bug.
Regards,
--
Fujii Masao
On Mon, Aug 11, 2014 at 1:26 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
Thanks for updating the patch! Again I tested the feature and found something
wrong. I set synchronous_standby_num to 2 and started three standbys. Two of
them are included in synchronous_standby_names, i.e., they are synchronous
standbys. That is, the other one standby is always asynchronous. When
I shutdown one of synchronous standbys and executed the write transaction,
the transaction was successfully completed. So the transaction doesn't wait for
two sync standbys in that case. Probably this is a bug.
Well, that's working in my case :)
Please see below with 4 nodes: 1 master and 3 standbys on same host. Master
listens to 5432, other nodes to 5433, 5434 and 5435. Each standby's
application_name is node_$PORT
=# show synchronous_standby_names ;
synchronous_standby_names
---------------------------
node_5433,node_5434
(1 row)
=# show synchronous_standby_num ;
synchronous_standby_num
-------------------------
2
(1 row)
=# SELECT application_name,
pg_xlog_location_diff(sent_location, flush_location) AS replay_delta,
sync_priority,
sync_state
FROM pg_stat_replication ORDER BY replay_delta ASC, application_name;
application_name | replay_delta | sync_priority | sync_state
------------------+--------------+---------------+------------
node_5433 | 0 | 1 | sync
node_5434 | 0 | 2 | sync
node_5435 | 0 | 0 | async
(3 rows)
=# create table aa (a int);
CREATE TABLE
[...]
-- Stopped node with port 5433:
[...]
=# SELECT application_name,
pg_xlog_location_diff(sent_location, flush_location) AS replay_delta,
sync_priority,
sync_state
FROM pg_stat_replication ORDER BY replay_delta ASC, application_name;
application_name | replay_delta | sync_priority | sync_state
------------------+--------------+---------------+------------
node_5434 | 0 | 2 | sync
node_5435 | 0 | 0 | async
(2 rows)
=# create table ab (a int);
^CCancel request sent
WARNING: 01000: canceling wait for synchronous replication due to user
request
DETAIL: The transaction has already committed locally, but might not have
been replicated to the standby(s).
LOCATION: SyncRepWaitForLSN, syncrep.c:227
CREATE TABLE
Regards,
--
Michael
On Mon, Aug 11, 2014 at 2:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Aug 11, 2014 at 1:26 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
Thanks for updating the patch! Again I tested the feature and found
something
wrong. I set synchronous_standby_num to 2 and started three standbys. Two
of
them are included in synchronous_standby_names, i.e., they are synchronous
standbys. That is, the other one standby is always asynchronous. When
I shutdown one of synchronous standbys and executed the write transaction,
the transaction was successfully completed. So the transaction doesn't wait for
two sync standbys in that case. Probably this is a bug.
Well, that's working in my case :)
Oh, that worked on my machine, too, this time... I did something wrong.
Sorry for the noise.
Regards,
--
Fujii Masao
On Mon, Aug 11, 2014 at 4:26 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Aug 11, 2014 at 2:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Oh, that worked in my machine, too, this time... I did something wrong.
Sorry for the noise.
No problem, thanks for spending time testing.
--
Michael
On Mon, Aug 11, 2014 at 4:38 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Aug 11, 2014 at 4:26 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Aug 11, 2014 at 2:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Oh, that worked in my machine, too, this time... I did something wrong.
Sorry for the noise.
No problem, thanks for spending time testing.
Probably I have hit a similar but different problem. I set synchronous_standby_num
to 2 and started up two synchronous standbys. When I ran write transactions,
they completed successfully. That's OK.
I sent the SIGSTOP signal to the walreceiver process in one of the sync standbys,
and then ran write transactions again. In this case, they must not complete,
because their WAL cannot be replicated to the standby whose walreceiver
was stopped. But they completed successfully.
Regards,
--
Fujii Masao
On Wed, Aug 13, 2014 at 2:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
I sent the SIGSTOP signal to the walreceiver process in one of sync standbys,
and then ran write transactions again. In this case, they must not be completed
because their WAL cannot be replicated to the standby that its walreceiver
was stopped. But they were successfully completed.
At the end of SyncRepReleaseWaiters, SYNC_REP_WAIT_WRITE and
SYNC_REP_WAIT_FLUSH in walsndctl could be updated with only one WAL
sender in sync, making backends wake up even if the other standbys had not
caught up. Instead, we need to scan all the synchronous WAL senders, find
the minimum write and flush positions, and update walsndctl with those
values. Well, that's a code path I forgot to cover.
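The fix described here, scanning all synchronous WAL senders for the minimum write and flush positions, can be sketched as follows. The struct and function names are simplified stand-ins, not the patch's actual code, and locking is omitted for brevity:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's types: XLogRecPtr is a 64-bit
 * WAL position, and WalSndStub models only the fields relevant here.
 * (The real WalSnd lives in walsender_private.h.) */
typedef uint64_t XLogRecPtr;

typedef struct WalSndStub
{
    XLogRecPtr write;   /* latest WAL position written by the standby */
    XLogRecPtr flush;   /* latest WAL position flushed by the standby */
} WalSndStub;

/* Scan all synchronous walsenders and compute the minimum write and
 * flush positions. Waiters may only be released up to these values;
 * otherwise a backend could wake up before every sync standby has
 * caught up, which is exactly the bug described above. */
static void
sync_min_positions(const WalSndStub *senders, int num_sync,
                   XLogRecPtr *min_write, XLogRecPtr *min_flush)
{
    int i;

    *min_write = senders[0].write;
    *min_flush = senders[0].flush;
    for (i = 1; i < num_sync; i++)
    {
        if (senders[i].write < *min_write)
            *min_write = senders[i].write;
        if (senders[i].flush < *min_flush)
            *min_flush = senders[i].flush;
    }
}
```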
Attached is an updated patch fixing the problem you reported.
Regards,
--
Michael
Attachments:
20140813_multi_syncrep_v3.patch (text/x-patch, +292/-172)
On Wed, Aug 13, 2014 at 4:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Wed, Aug 13, 2014 at 2:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
I sent the SIGSTOP signal to the walreceiver process in one of sync standbys,
and then ran write transactions again. In this case, they must not be completed
because their WAL cannot be replicated to the standby that its walreceiver
was stopped. But they were successfully completed.
At the end of SyncRepReleaseWaiters, SYNC_REP_WAIT_WRITE and
SYNC_REP_WAIT_FLUSH in walsndctl were able to update with only one wal
sender in sync, making backends wake up even if other standbys did not
catch up. But we need to scan all the synchronous wal senders and find
the minimum write and flush positions and update walsndctl with those
values. Well that's a code path I forgot to cover.
Attached is an updated patch fixing the problem you reported.
+ At any one time there will be at a number of active synchronous standbys
+ defined by <varname>synchronous_standby_num</>; transactions waiting
It's better to use <xref linkend="guc-synchronous-standby-num">, instead.
+ for commit will be allowed to proceed after those standby servers
+ confirms receipt of their data. The synchronous standbys will be
Typo: confirms -> confirm
+ <para>
+ Specifies the number of standbys that support
+ <firstterm>synchronous replication</>, as described in
+ <xref linkend="synchronous-replication">, and listed as the first
+ elements of <xref linkend="guc-synchronous-standby-names">.
+ </para>
+ <para>
+ Default value is 1.
+ </para>
synchronous_standby_num is defined with PGC_SIGHUP. So the following
should be added into the document.
This parameter can only be set in the postgresql.conf file or on
the server command line.
The name of the parameter "synchronous_standby_num" sounds to me like
the transaction must wait for its WAL to be replicated to s_s_num standbys.
But that's not true in your patch. If s_s_names is empty, replication works
asynchronously whatever the value of s_s_num is. I'm afraid that's confusing.
The description of s_s_num is not sufficient. I'm afraid that users can easily
misunderstand and think they get a quorum commit feature by using s_s_names
and s_s_num, that is, that the transaction waits for its WAL to be replicated to
any s_s_num standbys listed in s_s_names.
When s_s_num is set to a larger value than max_wal_senders, should we warn about that?
+ for (i = 0; i < num_sync; i++)
+ {
+ volatile WalSnd *walsndloc = &WalSndCtl->walsnds[sync_standbys[i]];
+
+ if (min_write_pos > walsndloc->write)
+ min_write_pos = walsndloc->write;
+ if (min_flush_pos > walsndloc->flush)
+ min_flush_pos = walsndloc->flush;
+ }
I don't think that it's safe to see those shared values without spinlock.
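The point about reading shared walsender state under a lock can be illustrated with a self-contained sketch. PostgreSQL's actual code takes the per-walsender spinlock (SpinLockAcquire on walsnd->mutex); a pthread mutex stands in for it here, and WalSndStub/read_walsnd_positions are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Stand-in for WalSnd: in PostgreSQL the write/flush fields are
 * protected by a per-walsender spinlock; a pthread mutex plays that
 * role in this sketch. */
typedef struct WalSndStub
{
    pthread_mutex_t mutex;
    XLogRecPtr write;
    XLogRecPtr flush;
} WalSndStub;

/* Take a consistent snapshot of one walsender's positions. Reading
 * the two 64-bit fields without the lock could observe a torn or
 * mutually inconsistent pair, e.g. on platforms without atomic
 * 64-bit loads. */
static void
read_walsnd_positions(WalSndStub *walsnd,
                      XLogRecPtr *write, XLogRecPtr *flush)
{
    pthread_mutex_lock(&walsnd->mutex);
    *write = walsnd->write;
    *flush = walsnd->flush;
    pthread_mutex_unlock(&walsnd->mutex);
}
```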
Regards,
--
Fujii Masao
On Thu, Aug 14, 2014 at 8:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
+ At any one time there will be at a number of active synchronous standbys
+ defined by <varname>synchronous_standby_num</>; transactions waiting
It's better to use <xref linkend="guc-synchronous-standby-num">, instead.
Fixed.
+ for commit will be allowed to proceed after those standby servers
+ confirms receipt of their data. The synchronous standbys will be
Typo: confirms -> confirm
Fixed.
+ <para>
+ Specifies the number of standbys that support
+ <firstterm>synchronous replication</>, as described in
+ <xref linkend="synchronous-replication">, and listed as the first
+ elements of <xref linkend="guc-synchronous-standby-names">.
+ </para>
+ <para>
+ Default value is 1.
+ </para>
synchronous_standby_num is defined with PGC_SIGHUP. So the following
should be added into the document.
This parameter can only be set in the postgresql.conf file or on
the server command line.
Fixed.
The name of the parameter "synchronous_standby_num" sounds to me that
the transaction must wait for its WAL to be replicated to s_s_num standbys.
But that's not true in your patch. If s_s_names is empty, replication works
asynchronously whether the value of s_s_num is. I'm afraid that it's confusing.
The description of s_s_num is not sufficient. I'm afraid that users can easily
misunderstand that they can use quorum commit feature by using s_s_names
and s_s_num. That is, the transaction waits for its WAL to be replicated to
any s_s_num standbys listed in s_s_names.
I reworked the docs to mention all that. Yes, things are a bit
different from any quorum commit facility (how could we parametrize
that simply without a parameter mapping one-to-one to the items of
s_s_names?), as this facility relies on the order of the items in
s_s_names and on which standbys are connected at a given time.
When s_s_num is set to larger value than max_wal_senders, we should warn that?
Actually I have done a bit more than that by forbidding setting
s_s_num to a value higher than max_wal_senders. Thoughts?
Now that we are discussing the interactions with other parameters, another
thing that I am wondering about is: what should we do if
s_s_num is set to a number higher than the number of elements in s_s_names?
Currently, the patch gives priority to s_s_num; in short, if we set
s_s_num to 100, the server will wait for 100 servers to confirm commit
even if there are fewer than 100 elements in s_s_names. I chose this
way because it looks saner, particularly if s_s_names = '*'. Thoughts
once again?
+ for (i = 0; i < num_sync; i++)
+ {
+ volatile WalSnd *walsndloc = &WalSndCtl->walsnds[sync_standbys[i]];
+
+ if (min_write_pos > walsndloc->write)
+ min_write_pos = walsndloc->write;
+ if (min_flush_pos > walsndloc->flush)
+ min_flush_pos = walsndloc->flush;
+ }
I don't think that it's safe to see those shared values without spinlock.
Looking at walsender.c you are right. I have updated the code to use
the mutex lock of the walsender whose values are being read from.
Regards,
--
Michael
--
Michael
Attachments:
20140815_multi_syncrep_v4.patch (text/x-patch, +341/-174)
On Fri, Aug 15, 2014 at 4:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Thu, Aug 14, 2014 at 8:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
+ At any one time there will be at a number of active synchronous standbys
+ defined by <varname>synchronous_standby_num</>; transactions waiting
It's better to use <xref linkend="guc-synchronous-standby-num">, instead.
Fixed.
+ for commit will be allowed to proceed after those standby servers
+ confirms receipt of their data. The synchronous standbys will be
Typo: confirms -> confirm
Fixed.
+ <para>
+ Specifies the number of standbys that support
+ <firstterm>synchronous replication</>, as described in
+ <xref linkend="synchronous-replication">, and listed as the first
+ elements of <xref linkend="guc-synchronous-standby-names">.
+ </para>
+ <para>
+ Default value is 1.
+ </para>
synchronous_standby_num is defined with PGC_SIGHUP. So the following
should be added into the document.
This parameter can only be set in the postgresql.conf file or on
the server command line.
Fixed.
The name of the parameter "synchronous_standby_num" sounds to me that
the transaction must wait for its WAL to be replicated to s_s_num standbys.
But that's not true in your patch. If s_s_names is empty, replication works
asynchronously whether the value of s_s_num is. I'm afraid that it's confusing.
The description of s_s_num is not sufficient. I'm afraid that users can easily
misunderstand that they can use quorum commit feature by using s_s_names
and s_s_num. That is, the transaction waits for its WAL to be replicated to
any s_s_num standbys listed in s_s_names.
I reworked the docs to mention all that. Yes things are a bit
different than any quorum commit facility (how to parametrize that
simply without a parameter mapping one to one the items of
s_s_names?), as this facility relies on the order of the items of
s_s_names and the fact that standbys are connected at a given time.
When s_s_num is set to larger value than max_wal_senders, we should warn that?
Actually I have done a bit more than that by forbidding setting
s_s_num to a value higher than max_wal_senders. Thoughts?
You added check_synchronous_standby_num() as the GUC check function for
synchronous_standby_num, and checked that there. But that seems to be wrong.
You can easily see the following error messages even if synchronous_standby_num
is smaller than max_wal_senders. The point is that synchronous_standby_num
should be located before max_wal_senders in postgresql.conf.
LOG: invalid value for parameter "synchronous_standby_num": 0
DETAIL: synchronous_standby_num cannot be higher than max_wal_senders.
Now that we discuss the interactions with other parameters. Another
thing that I am wondering about now is: what should we do if we
specify s_s_num to a number higher than the elements in s_s_names?
Currently, the patch gives the priority to s_s_num, in short if we set
s_s_num to 100, server will wait for 100 servers to confirm commit
even if there are less than 100 elements in s_s_names. I chose this
way because it looks saner particularly if s_s_names = '*'. Thoughts
once again?
I'm fine with this. As you gave an example, the number of entries in s_s_names
can be smaller than the number of actual active sync standbys. For example,
when s_s_names is set to 'hoge', more than one standbys with the name 'hoge'
can connect to the server with sync mode.
I still think that it's strange that replication can be async even when
s_s_num is larger than zero. That is, I think that the transaction must
wait for s_s_num sync standbys whether s_s_names is empty or not.
OTOH, if s_s_num is zero, replication must be async whether s_s_names
is empty or not. At least for me, it's intuitive to use s_s_num primarily
to control the sync mode. Of course, other hackers may have different
thoughts, so we need to keep our ear open for them.
In the above design, one problem is that the number of parameters
that must be changed by those who want to set up just one sync standby is
incremented by one: they need to change s_s_num additionally.
If we are really concerned about this, we can treat a value of -1 in
s_s_num as a special value, which allows us to control sync replication
only by s_s_names as we do now. That is, if s_s_names is empty,
replication would be async. Otherwise, only the standby with the
highest priority in s_s_names becomes the sync one. Probably the default of
s_s_num should be -1. Thoughts?
The source code comments at the top of syncrep.c need to be updated.
It's worth checking whether there are other comments to be updated.
Regards,
--
Fujii Masao
On Fri, Aug 15, 2014 at 8:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Now that we discuss the interactions with other parameters. Another
thing that I am wondering about now is: what should we do if we
specify s_s_num to a number higher than the elements in s_s_names?
Currently, the patch gives the priority to s_s_num, in short if we set
s_s_num to 100, server will wait for 100 servers to confirm commit
even if there are less than 100 elements in s_s_names. I chose this
way because it looks saner particularly if s_s_names = '*'. Thoughts
once again?
I'm fine with this. As you gave an example, the number of entries in s_s_names
can be smaller than the number of actual active sync standbys. For example,
when s_s_names is set to 'hoge', more than one standbys with the name 'hoge'
can connect to the server with sync mode.
This is a bit tricky. Suppose there is one standby connected which
has reached the relevant WAL position. We then lose that connection,
and a new standby connects. When or if the second standby is known to
have reached the relevant WAL position, can we release waiters? It
depends. If the old and new connections are to two different standbys
that happen to have the same name, yes. But if it's the same standby
reconnecting, then no.
I still think that it's strange that replication can be async even when
s_s_num is larger than zero. That is, I think that the transaction must
wait for s_s_num sync standbys whether s_s_names is empty or not.
+1.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Aug 15, 2014 at 9:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
You added check_synchronous_standby_num() as the GUC check function for
synchronous_standby_num, and checked that there. But that seems to be wrong.
You can easily see the following error messages even if synchronous_standby_num
is smaller than max_wal_senders. The point is that synchronous_standby_num
should be located before max_wal_senders in postgresql.conf.
LOG: invalid value for parameter "synchronous_standby_num": 0
DETAIL: synchronous_standby_num cannot be higher than max_wal_senders.
I am not sure what I can do here, so I am removing this check from the code
and simply adding a note in the docs that a value of _num higher than
max_wal_senders does not have much meaning.
I still think that it's strange that replication can be async even when
s_s_num is larger than zero. That is, I think that the transaction must
wait for s_s_num sync standbys whether s_s_names is empty or not.
OTOH, if s_s_num is zero, replication must be async whether s_s_names
is empty or not. At least for me, it's intuitive to use s_s_num primarily
to control the sync mode. Of course, other hackers may have different
thoughts, so we need to keep our ear open for them.
Sure, the compromise looks to be what you propose, and I am fine with that.
In the above design, one problem is that the number of parameters
that those who want to set up only one sync replication need to change is
incremented by one. That is, they need to change s_s_num additionally.
If we are really concerned about this, we can treat a value of -1 in
s_s_num as the special value, which allows us to control sync replication
only by s_s_names as we do now. That is, if s_s_names is empty,
replication would be async. Otherwise, only one standby with
high-priority in s_s_names becomes sync one. Probably the default of
s_s_num should be -1. Thought?
Taking those comments into account, attached is a patch doing the following
things depending on the values of _num and _names:
- If _num = -1 and _names is empty, all the standbys are considered
async (same behavior as 9.1~, and the default).
- If _num = -1 and _names has at least one item, wait for one standby, even
if it is not connected at the time of commit. If one node is found to be sync,
other standbys listed in _names with higher priority than the sync one are
in potential state (same as the existing behavior).
- If _num = 0, all the standbys are async, whatever the values in _names.
Priority is enforced to 0 for all the standbys, and SyncStandbysDefined is set
to false in this case.
- If _num > 0, we must wait for _num standbys, whatever the values in _names.
The default value of _num is -1. Documentation has been updated
accordingly.
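The decision table above can be sketched as a tiny function. This is a minimal model of the described semantics, assuming them as stated; the function and parameter names are hypothetical, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the proposed semantics for synchronous_standby_num
 * (s_s_num) combined with synchronous_standby_names. Returns how
 * many standbys a committing backend must wait for; 0 means fully
 * asynchronous replication. */
static int
standbys_to_wait_for(int s_s_num, bool names_defined)
{
    if (s_s_num == 0)
        return 0;                       /* async, whatever _names says */
    if (s_s_num == -1)
        return names_defined ? 1 : 0;   /* pre-patch behavior */
    return s_s_num;                     /* wait for exactly s_s_num */
}
```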
The source code comments at the top of syncrep.c need to be updated.
It's worth checking whether there are other comments to be updated.
Done. I have updated some comments in other places than the header.
Regards,
--
Michael
Attachments:
20140821_multi_syncrep_v5.patch (text/x-patch, +384/-204)
On 09 August 2014 11:33, Michael Paquier wrote:
Please find attached a patch to add support of synchronous replication
for multiple standby servers. This is controlled by the addition of a
new GUC parameter called synchronous_standby_num, that makes server
wait for transaction commit on the first N standbys defined in
synchronous_standby_names. The implementation is really
straight-forward, and has just needed a couple of modifications in
walsender.c for pg_stat_get_wal_senders and syncrep.c.
I have just started looking into this patch.
Please find below my first observations on the patch:
1. The memory for sync_nodes in SyncRepGetSynchronousNodes should be allocated based on allowed_sync_nodes instead of max_wal_senders, as we are never going to store more sync standbys than allowed_sync_nodes.
sync_nodes = (int *) palloc(allowed_sync_nodes * sizeof(int));
2. The logic for deciding the highest-priority standbys seems to be incorrect.
Assume, s_s_num = 3, s_s_names = 3,4,2,1
standby nodes are in order as: 1,2,3,4,5,6,7
As per the logic in the patch, node 4 with priority 2 will not be added to the list, whereas 1, 2, 3 will be added.
The problem is that the priority used for the next comparison is not the worst priority in the list as of that iteration; it is just the priority of the last node added to the list. So it may happen that a node with a higher priority is still in the list while we are comparing against some other, smaller priority.
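The corrected selection can be sketched like this: keep a candidate list of at most s_s_num entries and, once it is full, compare each new walsender against the worst priority currently selected rather than the last one added (a smaller priority value means higher priority; 0 means async). The names are illustrative, not the patch's actual code:

```c
#include <assert.h>

/* Select up to num_sync walsenders with the lowest (best) priority
 * values. priorities[i] is the sync priority of walsender i; 0 marks
 * an async walsender. Indexes of the selected walsenders are written
 * to selected[]; the number selected is returned. */
static int
select_sync_standbys(const int *priorities, int num_walsenders,
                     int num_sync, int *selected)
{
    int count = 0;

    for (int i = 0; i < num_walsenders; i++)
    {
        if (priorities[i] <= 0)
            continue;               /* async walsender, skip */
        if (count < num_sync)
            selected[count++] = i;  /* list not yet full */
        else
        {
            /* Find the worst (highest) priority currently selected:
             * this, not the last-added priority, is the right value
             * to compare against. */
            int worst = 0;

            for (int j = 1; j < count; j++)
                if (priorities[selected[j]] > priorities[selected[worst]])
                    worst = j;
            if (priorities[i] < priorities[selected[worst]])
                selected[worst] = i;
        }
    }
    return count;
}
```

With the example above (s_s_names = 3,4,2,1 giving nodes 1..4 priorities 4, 3, 1, 2 and nodes 5..7 async), this correctly picks the nodes with priorities 1, 2 and 3.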
3. Can we optimize SyncRepGetSynchronousNodes so that it gets the number of standby nodes from s_s_names itself? That would make it possible to stop scanning as soon as we find the first s_s_num potential standbys.
Thanks and Regards,
Kumar Rajeev Rastogi
On Fri, Aug 22, 2014 at 7:14 PM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:
I have just started looking into this patch.
Please find below my first level of observation from the patch:
Thanks! Updated patch attached.
1. Allocation of memory for sync_nodes in function
SyncRepGetSynchronousNodes should be based on allowed_sync_nodes
instead of max_wal_senders, as we are never going to store more sync standbys
than allowed_sync_nodes.
sync_nodes = (int *) palloc(allowed_sync_nodes *
sizeof(int));
Fixed.
2. The logic for deciding the highest-priority standbys seems to be incorrect.
Assume s_s_num = 3 and s_s_names = 3,4,2,1,
with standby nodes connecting in the order 1,2,3,4,5,6,7.
As per the logic in the patch, node 4 with priority 2 will not be
added to the list, whereas 1,2,3 will be.
The problem is that the priority tracked for the next comparison is not
the highest priority seen so far; it is just the priority of the last node
added to the list. So it may happen that a node with higher priority
is still in the list while we are comparing against some other, smaller
priority.
Fixed. Nice catch!
3. Can we optimize the function SyncRepGetSynchronousNodes in such a way
that it gets the number of standby nodes from s_s_names itself. With this
it will be usful to stop scanning the moment we get first s_s_num potential
standbys.
By doing so, we would need to scan the WAL sender array more than once (or
once if we can find N sync nodes with a name matching the first entry, something
unlikely to happen). We would also need to recalculate, for a given item
in s_s_names, what its priority is and compare it with the existing
entries in the WAL sender list. So this is not worth the shot.
Also, using the priority instead of s_s_names is more solid, as s_s_names is
now used only in SyncRepGetStandbyPriority to calculate the priority for a
given WAL sender, a function only called by a WAL sender itself when
it initializes.
Regards,
--
Michael
Attachments:
20140822_multi_syncrep_v6.patch (text/x-patch)
On Fri, Aug 22, 2014 at 11:42 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
2. The logic for deciding the highest-priority standbys seems to be incorrect.
Assume s_s_num = 3 and s_s_names = 3,4,2,1,
with standby nodes connecting in the order 1,2,3,4,5,6,7. As per the logic in the patch, node 4 with priority 2 will not be added to the list, whereas 1,2,3 will be.
The problem is that the priority tracked for the next comparison is not the highest priority seen so far; it is just the priority of the last node added to the list. So it may happen that a node with higher priority is still in the list while we are comparing against some other, smaller priority.
Fixed. Nice catch!
Actually by re-reading the code I wrote yesterday I found that the fix
in v6 for that is not correct. That's really fixed with v7 attached.
Regards,
--
Michael
Attachments:
20140823_multi_syncrep_v7.patch (text/x-patch)
On 23 August 2014 11:22, Michael Paquier Wrote:
2. The logic for deciding the highest-priority standbys seems to be incorrect.
Assume s_s_num = 3 and s_s_names = 3,4,2,1,
with standby nodes connecting in the order 1,2,3,4,5,6,7. As per the logic in the patch, node 4 with priority 2 will not
be added to the list, whereas 1,2,3 will be.
The problem is that the priority tracked for the next comparison is
not the highest priority seen so far; it is just the priority of
the last node added to the list. So it may happen that a node with
higher priority is still in the list while we are comparing against some
other, smaller priority.
Fixed. Nice catch!
Actually by re-reading the code I wrote yesterday I found that the fix
in v6 for that is not correct. That's really fixed with v7 attached.
I have done some more review, below are my comments:
1. There are currently two loops on *num_sync. Can we simplify the function SyncRepGetSynchronousNodes by moving the priority calculation inside the upper loop?
if (*num_sync == allowed_sync_nodes)
{
for (j = 0; j < *num_sync; j++)
{
Anyway, we require the priority only if the *num_sync == allowed_sync_nodes condition matches.
So in this loop itself, we can calculate the priority as well as handle the assignment of new standbys with lower priority.
Let me know if you see any issue with this.
2. Comment inside the function SyncRepReleaseWaiters,
/*
* We should have found ourselves at least, except if it is not expected
* to find any synchronous nodes.
*/
Assert(num_sync > 0);
I think "except if it is not expected to find any synchronous nodes" is not required,
because reaching this point means that at least this node is synchronous.
3. The documentation says that s_s_num should be less than max_wal_senders, but code-wise there is no protection for this.
IMHO, s_s_num should be less than or equal to max_wal_senders; otherwise COMMIT will never return control to the user without
any indication of why.
I see that some discussion has happened regarding this, but I think just adding documentation for it is not enough.
I am not sure what issue was observed in adding a check during GUC initialization, but if there is an unavoidable issue during GUC initialization
then can't we try to add the check at a later point?
4. Similarly, for the interaction between the parameters s_s_names and s_s_num: I see some discussion has happened regarding this, and it is acceptable
to have s_s_num more than s_s_names. But I was thinking, should we not give at least some notice message to the user for such a case, along with
some documentation?
config.sgml
5. "At any one time there will be at a number of active synchronous standbys": this sentence is not proper.
6. When this parameter is set to <literal>0</>, all the standby
nodes will be considered as asynchronous.
Can we make this as
When this parameter is set to <literal>0</>, all the standby
nodes will be considered as asynchronous irrespective of value of synchronous_standby_names.
7. Are considered as synchronous the first elements of
<xref linkend="guc-synchronous-standby-names"> in number of
<xref linkend="guc-synchronous-standby-num"> that are
connected.
Starting of this sentence looks to be incomplete.
high-availability.sgml
8. Standbys listed after this will take over the role
of synchronous standby if the first one should fail.
Should not we make it as:
Standbys listed after this will take over the role
of synchronous standby if any of the first synchronous-standby-num standby fails.
Let me know in case something is not clear.
Thanks and Regards,
Kumar Rajeev Rastogi.
On Wed, Aug 27, 2014 at 2:46 PM, Rajeev rastogi
<rajeev.rastogi@huawei.com> wrote:
I have done some more review, below are my comments:
Thanks!
1. There are currently two loops on *num_sync. Can we simplify the function SyncRepGetSynchronousNodes by moving the priority calculation inside the upper loop?
if (*num_sync == allowed_sync_nodes)
{
for (j = 0; j < *num_sync; j++)
{
Anyway, we require the priority only if the *num_sync == allowed_sync_nodes condition matches.
So in this loop itself, we can calculate the priority as well as handle the assignment of new standbys with lower priority.
Let me know if you see any issue with this.
OK, I see; yes, this can minimize processing a bit, so I refactored the
code by integrating the second loop into the first. This required the
removal of the break portion, as we need to find the highest priority
value among the nodes already determined as synchronous.
2. Comment inside the function SyncRepReleaseWaiters,
/*
* We should have found ourselves at least, except if it is not expected
* to find any synchronous nodes.
*/
Assert(num_sync > 0);
I think "except if it is not expected to find any synchronous nodes" is not required,
because reaching this point means that at least this node is synchronous.
Yes, removed.
3. The documentation says that s_s_num should be less than max_wal_senders, but code-wise there is no protection for this.
IMHO, s_s_num should be less than or equal to max_wal_senders; otherwise COMMIT will never return control to the user without
any indication of why.
I see that some discussion has happened regarding this, but I think just adding documentation for it is not enough.
I am not sure what issue was observed in adding a check during GUC initialization, but if there is an unavoidable issue during GUC initialization then can't we try to add the check at a later point?
The trick here is that you cannot really return a warning at GUC
loading level to the user as a warning could be easily triggered if
for example s_s_num is present before max_wal_senders in
postgresql.conf. I am open to any solutions if there are any (like an
error when initializing WAL senders?!). Documentation seems enough for
me to warn the user.
4. Similarly, for the interaction between the parameters s_s_names and s_s_num: I see some discussion has happened regarding this, and it is acceptable
to have s_s_num more than s_s_names. But I was thinking, should we not give at least some notice message to the user for such a case, along with
some documentation?
Done. I added the following in the paragraph "Server will wait":
Hence it is recommended to not set <varname>synchronous_standby_num</>
to a value higher than the number of elements in
<varname>synchronous_standby_names</>.
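As a concrete illustration of that recommendation (a sketch with made-up node names, not text from the patch), a configuration following it could look like:

```
# Illustrative postgresql.conf excerpt (parameter names as in this patch):
# wait for the 2 highest-priority standbys among the 4 listed names.
# Keep synchronous_standby_num <= number of elements in
# synchronous_standby_names, and less than max_wal_senders.
max_wal_senders = 5
synchronous_standby_names = 'node_a,node_b,node_c,node_d'
synchronous_standby_num = 2
```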
5. "At any one time there will be at a number of active synchronous standbys": this sentence is not proper.
What about that:
"At any one time there can be a number of active synchronous standbys
up to the number defined by <xref
linkend="guc-synchronous-standby-num">"
6. When this parameter is set to <literal>0</>, all the standby
nodes will be considered as asynchronous.
Can we make this as
When this parameter is set to <literal>0</>, all the standby
nodes will be considered as asynchronous irrespective of value of synchronous_standby_names.
Done. This seems proper for the user as we do not care at all about
s_s_names if _num = 0.
7. Are considered as synchronous the first elements of
<xref linkend="guc-synchronous-standby-names"> in number of
<xref linkend="guc-synchronous-standby-num"> that are
connected.
Starting of this sentence looks to be incomplete.
OK, I reworked this part as well. I hope it is clearer.
8. Standbys listed after this will take over the role
of synchronous standby if the first one should fail.
Should not we make it as:
Standbys listed after this will take over the role
of synchronous standby if any of the first synchronous-standby-num standby fails.
Fixed as proposed.
At the same time I found a bug in pg_stat_get_wal_senders caused by a
NULL pointer that was freed when s_s_num = 0. An updated patch addressing
the comments is attached. On top of that, the documentation has been
reworked a bit by replacing the too-high number of <xref> blocks with
<varname>, having a link to a given variable specified only once.
Regards,
--
Michael
Attachments:
20140828_multi_syncrep_v8.patch (text/x-diff)
On 08/28/2014 10:10 AM, Michael Paquier wrote:
+ #synchronous_standby_num = -1 # number of standbys servers using sync rep
To be honest, that's a horrible name for the GUC. Back when synchronous
replication was implemented, we had looong discussions on this feature.
It was called "quorum commit" back then. I'd suggest using the "quorum"
term in this patch, too, that's a fairly well-known term in distributed
computing for this.
When synchronous replication was added, quorum was left out to keep
things simple; the current feature set was the most we could all agree
on to be useful. If you search the archives for "quorum commit" you'll
see what I mean. There was a lot of confusion on what is possible and
what is useful, but regarding this particular patch: people wanted to be
able to describe more complicated scenarios. For example, imagine that
you have a master and two standbys in the primary data center, and
two more standbys in a different data center. It should be possible to
specify that you must get acknowledgment from at least one standby in
both data centers. Maybe you could hack that by giving the standbys in
the same data center the same name, but it gets ugly, and it still won't
scale to even more complex scenarios.
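Purely as a hypothetical illustration of the kind of per-data-center quorum described here (no such syntax exists in this patch or in the server at this point), a richer specification might look something like:

```
# Hypothetical syntax, NOT implemented: require one acknowledgment
# from each data center before a commit is released.
synchronous_standby_names = 'ANY 1 (dc1_a, dc1_b), ANY 1 (dc2_a, dc2_b)'
```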
Maybe that's OK - we don't necessarily need to solve all scenarios at
once. But it's worth considering.
BTW, how does this patch behave if there are multiple standbys connected
with the same name?
- Heikki