Connection slots reserved for replication
Hello hackers,
At the moment it is possible to reserve some number of connection slots for
superusers; this behavior is controlled by the
superuser_reserved_connections configuration parameter, whose default
value is 3.
If all non-reserved connection slots are busy, a replica fails to
open a new connection and start streaming from the primary. Such behavior
is very bad if you want to run PostgreSQL HA clusters.
Initially (in 9.0), replication connections required superuser privileges
and therefore they were deliberately excluded from
superuser_reserved_connections.
Basically that means it has never been possible to reserve some connection
slots for replication connections.
Later (in 9.1) it became possible to create a user with the REPLICATION and
NOSUPERUSER options, but a comment in postinit.c still says that
superuser is required.
Now I think it is time to go further: we should make it possible
to reserve some connection slots for replication, in a manner similar to
superuser connections.
How should it work:
1. If we know that we got a replication connection, we should just make
sure that at least superuser_reserved_connections free connection slots
are available.
2. If we know that this is neither a superuser nor a replication connection,
we should check that at least (superuser_reserved_connections +
NumWalSenders() - max_wal_senders) connection slots are available.
And the last question is how to control the number of reserved slots for
replication. There are two options:
1. We can introduce a new GUC for that: replication_reserved_connections
2. Or we can just use the value of max_wal_senders
Personally, I like the second option more.
The attached patch implements the functionality described above.
Feedback is very appreciated.
Regards,
--
Alexander Kukushkin
Attachments:
replication_reserved_connections.patch (text/x-patch; charset=US-ASCII, +37/-10)
On Wed, Aug 1, 2018 at 9:30 PM, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> Hello hackers,
> at the moment it is possible to reserve some amount of connection slots for
> superusers and this behavior is controlled by superuser_reserved_connections
> configuration parameter with the default value = 3.
>
> In case if all non-reserved connection slots are busy, replica fails to open
> a new connection and start streaming from the primary. Such behavior is very
> bad if you want to run postgresql HA clusters
Yeah, that's also bad if we want to use pg_basebackup in that situation.
> Initially, replication connections required superuser privileges (in 9.0)
> and therefore they were deliberately excluded from
> superuser_reserved_connections. Basically that means it has never been
> possible to reserve some connection slots for replication connections.
>
> Later (9.1) it became possible to create a user with REPLICATION and
> NOSUPERUSER options, but comment in the postinit.c still tells that
> superuser is required.
>
> Now I think it is time to go further, and we should make it possible
> to reserve some connection slots for replication in a manner similar to
> superuser connections.
+1
> How should it work:
> 1. If we know that we got the replication connection, we just should make
> sure that there are at least superuser_reserved_connections free connection
> slots available.
> 2. If we know that this is neither superuser nor replication connection, we
> should check that there are at least (superuser_reserved_connections +
> NumWalSenders() - max_wal_senders) connection slots available.
Did you mean (superuser_reserved_connections + max_wal_senders -
NumWalSenders()) in the second point?
> And the last question how to control the number of reserved slots for
> replication. There are two options:
> 1. We can introduce a new GUC for that: replication_reserved_connections
> 2. Or we can just use the value of max_wal_senders
>
> Personally, I more like the second option.
One arguable point about the second option could be that it breaks
backward compatibility of the parameter configuration. That is, existing
systems would need to re-configure max_connections. So it might be better
to take the first option, with replication_reserved_connections = 0 by
default.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,
2018-09-14 12:23 GMT+02:00 Masahiko Sawada <sawada.mshk@gmail.com>:
>> 2. If we know that this is neither superuser nor replication connection, we
>> should check that there are at least (superuser_reserved_connections +
>> NumWalSenders() - max_wal_senders) connection slots available.
>
> You wanted to mean (superuser_reserved_connections + max_wal_senders -
> NumWalSenders()) in the second point?
Sure, my bad. I made a mistake when writing the email, but in the attached
file it looks correct.
> One arguable point of the second option could be that it breaks
> backward compatibility of the parameter configurations. That is, the
> existing systems need to re-configure the max_connections. So it might
> be better to take the first option with
> replication_reserved_connections = 0 by default.
Please find attached the new version of the patch, which introduces the
replication_reserved_connections GUC.
Regards,
--
Alexander Kukushkin
Attachments:
replication_reserved_connections-v2.patch (text/x-patch; charset=US-ASCII, +86/-10)
Hello.
I agree with the objective.
At Mon, 17 Sep 2018 14:25:56 +0200, Alexander Kukushkin <cyberdemn@gmail.com> wrote in <CAFh8B=nbh4gbFCiT-jpjth60QJC1pKoWkvgke+7di-FgAduGLQ@mail.gmail.com>
> Please find attached the new version of the patch, which introduces
> the replication_reserved_connections GUC.
With this patch, a non-superuser connection is rejected if fewer than
superuser_reserved_connections + (the remainder of
replication_reserved_connections) slots are free. That gives the same
effect for walsenders as just sharing superuser_reserved_connections
between superusers and walsenders.
Instead, we can literally "reserve" connection slots for this
specific use by providing ProcGlobal->walsenderFreeProcs. These
slots are never stolen for other usage. Letting additional
walsenders, once all these slots are filled, grab an
available "normal" slot makes it work the same as the current
behavior when walsender_reserved_connections = 0.
What do you think about this?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi,
On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Instead, we can literally "reserve" connection slots for the
> specific use by providing ProcGlobal->walsenderFreeProcs. The
> slots are never stolen for other usage. Allowing additional
> walsenders, after all the slots are filled, to grab an
> available "normal" slot works the same as the current
> behavior when walsender_reserved_connections = 0.
>
> What do you think about this?
Sounds reasonable, please see the updated patch.
Regards,
--
Alexander Kukushkin
Attachments:
replication_reserved_connections-v3.patch (text/x-patch; charset=US-ASCII, +62/-4)
Hi,
Attached is the rebased version of the patch against the current HEAD; I have also created a commitfest entry.
Attachments:
replication_reserved_connections-v4.patch (text/x-patch; charset=US-ASCII, +62/-4)
Hello. Thank you for the new version.
At Thu, 1 Nov 2018 11:58:52 +0100, Alexander Kukushkin <cyberdemn@gmail.com> wrote in <CAFh8B=kaX0uWdyZXn3xZPgRqhHrbiOWwFhWStdG0fvJ4is21iA@mail.gmail.com>
> Attached rebased version patch to the current HEAD and created commit
> fest entry.
InitializeMaxBackends()
MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
- max_worker_processes;
+ max_worker_processes + replication_reserved_connections;
This means a walsender doesn't consume a connection, which is
different from the current behavior. We should reserve a part of
MaxConnections for walsenders. (In PostmasterMain,
max_wal_senders is counted as a part of MaxConnections.)
+ if (am_walsender && replication_reserved_connections < max_wal_senders
+ && *procgloballist == NULL)
+ procgloballist = &ProcGlobal->freeProcs;
Currently an excessive number of walsenders is rejected in
InitWalSenderSlot, which emits the following error:
ereport(FATAL,
(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
errmsg("number of requested standby connections "
"exceeds max_wal_senders (currently %d)",
max_wal_senders)));
With this patch, if max_wal_senders =
replication_reserved_connections = 3 and a fourth walreceiver
comes, we will get "FATAL: sorry, too many clients already"
instead. It should be fixed.
When r_r_conn = 2, max_wal_senders = 3, and three walsenders
are active, in an extreme case where a new replication
connection comes at the same time another is exiting, we could
end up using two normal slots even though one slot is vacant among
the reserved slots. If we want to avoid that case, we need to limit
the number of normal slots to use. I don't think it is acceptable
as is, but basically something like the attached would do that.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
replication_reserved_connections_limitnormal.patch (text/x-patch; charset=us-ascii, +62/-30)
On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> InitializeMaxBackends()
> 	MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> -		max_worker_processes;
> +		max_worker_processes + replication_reserved_connections;
>
> This means walsender doesn't consume a connection, which is
> different from the current behavior. We should reserve a part of
> MaxConnections for walsenders. (in PostmasterMain,
> max_wal_senders is counted as a part of MaxConnections)
Yes. We can force replication_reserved_connections <= max_wal_senders,
and then the connections reserved for replication would be a part of
MaxConnections.
> + if (am_walsender && replication_reserved_connections < max_wal_senders
> +     && *procgloballist == NULL)
> +     procgloballist = &ProcGlobal->freeProcs;
>
> Currently excessive number of walsenders are rejected in
> InitWalSenderSlot and emit the following error.
>
>     ereport(FATAL,
>             (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
>              errmsg("number of requested standby connections "
>                     "exceeds max_wal_senders (currently %d)",
>                     max_wal_senders)));
>
> With this patch, if max_wal_senders =
> replication_reserved_connections = 3 and the fourth walreceiver
> comes, we will get "FATAL: sorry, too many clients already"
> instead. It should be fixed.
>
> When r_r_conn = 2 and max_wal_senders = 3 and the three
> walsenders are active, in an extreme case where a new replication
> connection comes at the same time another is exiting, we could
> end up using two normal slots despite that one slot is vacant in
> reserved slots.
Doesn't the max_wal_senders prevent the case?
WAL senders can get a connection if we have more free procs than
(MaxConnections - reserved for superusers). So I think for normal
users the connection must be refused if (MaxConnections - (reserved
for superuser and replication) > # of freeprocs), and for WAL senders
the connection must also be refused if (MaxConnections - (reserved for
superuser) > # of freeprocs). I'm not sure we need such a trick in
InitWalSenderSlot().
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hello.
At Wed, 7 Nov 2018 19:31:00 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoASCq808+iqcFoVuLu-+i8kon=6wN3+sY=EVKGm-56qig@mail.gmail.com>
>> When r_r_conn = 2 and max_wal_senders = 3 and the three
>> walsenders are active, in an extreme case where a new replication
>> connection comes at the same time another is exiting, we could
>> end up using two normal slots despite that one slot is vacant in
>> reserved slots.
>
> Doesn't the max_wal_senders prevent the case?
Currently the variable doesn't work that way. We first accept the
connection request, search for a vacant slot in InitWalSenderSlot,
and reject the connection if no room is available. Even with this
patch, we don't count the exact number of active walsenders (for
performance reasons). Once the reserved slots are filled, there is no
way other than to accept the connection using a non-reserved slot as
long as r_r_conn < max_wal_senders. Suppose one of the active
walsenders goes away between the moment we allocate a non-reserved
connection slot and the moment InitWalSenderSlot starts searching the
walsnds[] array: the new walsender ends up activated on the unreserved
slot, and one reserved slot is left empty. So this is "an extreme
case"; we could ignore it.
I doubt that we should allow a setting where r_r_conn <
max_wal_senders, or even r_r_conn != max_wal_senders. We don't
have a problem like this if we don't allow those cases.
> Wal senders can get connection if we have free procs more than
> (MaxConnections - reserved for superusers). So I think for normal
> users the connection must be refused if (MaxConnections - (reserved
> for superuser and replication) > # of freeprocs) and for wal senders
> the connection also must be refused if (MaxConnections - (reserved for
> superuser) > # of freeprocs). I'm not sure we need such trick in
> InitWalSenderSlot().
(For clarity, I don't mean my previous patch is a good solution.)
It works as long as we accept that some reserved slots can be left
unused even though some walsenders are using normal slots. (Just a
walsender exiting from a reserved slot causes this, but that slot is
usually occupied by a walsender that comes later.)
Another idea is to acquire a walsnd[] slot before getting a
connection slot.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Thu, Nov 8, 2018 at 9:30 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> It works as far as we accept that some reserved slots can be left
> unused despite of some walsenders are using normal slots. (Just
> exiting a walsender using reserved slot causes this but it is
> usually occupied by walsenders comes later)
>
> Another idea is we acquire a walsnd[] slot before getting a
> connection slot.
After more thought, I'm inclined to agree to reserve max_wal_senders
slots and not to have a replication_reserved_connections parameter.
superuser_reserved_connections actually works so that we reliably
reserve slots for superusers when the slots are almost full,
regardless of who is using the other slots, including superusers
themselves. But replication connections require a different behavior,
as they have another limit (max_wal_senders). If we have
replication_reserved_connections < max_wal_senders, we could end up
with the same issue as originally reported on this thread.
Therefore many users would set replication_reserved_connections =
max_wal_senders.
On the other hand, if we always reserve max_wal_senders slots, the
slots available for normal backends will decrease in the next
release, which requires users to re-configure max_connections.
But this behavior seems more natural than the current one, so I
think the re-configuration is acceptable for users.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Fri, Nov 9, 2018 at 2:02 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
> After more thought, I'm inclined to agree to reserve max_wal_senders
> slots and not to have replication_reserved_connections parameter.
>
> On the other hand, if we always reserve max_wal_senders slots,
> available slots for normal backends will get decreased in the next
> release, which requires users to re-configure max_connections.
> But I felt this behavior seems more natural than the current one, so I
> think the re-configuration can be acceptable for users.
Maybe what we should do instead is not consider max_wal_senders a part of
the total number of connections, and instead size the things that need to
be sized by them using max_connections + max_wal_senders. That seems more
logical given how the parameters are named as well.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Greetings,
* Magnus Hagander (magnus@hagander.net) wrote:
> Maybe what we should do instead is not consider max_wal_senders a part of
> the total number of connections, and instead size the things that needs to
> be sized by them by max_connections + max_wal_senders. That seems more
> logical given how the parameters are named as well.
I tend to agree with having max_connections + max_wal_senders.
Thanks!
Stephen
Hi,
Attaching the new version of the patch.
Now it simply reserves max_wal_senders slots in ProcGlobal, which
guarantees that only walsender processes can use them.
Regards,
--
Alexander Kukushkin
Attachments:
replication_reserved_connections-v5.patch (text/x-patch; charset=US-ASCII, +21/-12)
On 30. Nov 2018, at 13:58, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> attaching the new version of the patch.
> Now it simply reserves max_wal_senders slots in the ProcGlobal, which
> guarantees that only walsender processes can use them.
With this patch it looks like InitProcess will trigger the generic 'too many clients' error before the more specific error message in InitWalSenderSlot when exceeding the number of max_wal_senders.
Does excluding WAL senders from the max_connections limit and including max_wal_senders in MaxBackends also imply that we need to add max_wal_senders to the list in xlog.c's CheckRequiredParameterValues, requiring its value on the replica to be no lower than the one on the primary?
Cheers,
Oleksii
On 05/12/2018 15:33, Oleksii Kliukin wrote:
> Does excluding WAL senders from the max_connections limit and including
> max_wal_senders in MaxBackends also imply that we need to add
> max_wal_senders to the list at xlog.c: CheckRequiredParameterValues,
> requiring its value on the replica to be not lower than the one on the
> primary?
I think it does; we need the proc slots for walsenders on the standby
the same way we do for normal backends.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On Thu, 6 Dec 2018 at 00:55, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>> Does excluding WAL senders from the max_connections limit and including
>> max_wal_senders in MaxBackends also imply that we need to add
>> max_wal_senders to the list at xlog.c: CheckRequiredParameterValues,
>> requiring its value on the replica to be not lower than the one on the
>> primary?
>
> I think it does, we need the proc slots for walsenders on the standby
> same way we do for normal backends.
You are absolutely right. Attaching the new version of the patch.
Regards,
--
Alexander Kukushkin
Attachments:
replication_reserved_connections-v6.patch (text/x-patch; charset=US-ASCII, +54/-24)
On Mon, Dec 17, 2018, at 14:10, Alexander Kukushkin wrote:
> You are absolutely right. Attaching the new version of the patch.
Thank you. I've checked that with v6 the replica correctly complains when its value of max_wal_senders is lower than the one on the primary.
As stated in my previous comment, I think we should retain the specific error message on exceeding max_wal_senders, instead of showing the generic "too many clients already". Attached is a patch that fixes this small thing. I've also rebased it against master and took the liberty of naming it v7. It makes me wonder why we don't apply the same level of detail to the regular out-of-connections message and show the actual value of max_connections in the error text.
The code diff to the previous version is rather small:
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index d1a8113cb6..df073673f5 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2273,8 +2273,8 @@ InitWalSenderSlot(void)
Assert(MyWalSnd == NULL);
/*
- * Find a free walsender slot and reserve it. If this fails, we must be
- * out of WalSnd structures.
+ * Find a free walsender slot and reserve it. This must not fail due
+ * to the prior check for free walsenders at InitProcess.
*/
for (i = 0; i < max_wal_senders; i++)
{
@@ -2310,13 +2310,7 @@ InitWalSenderSlot(void)
break;
}
}
- if (MyWalSnd == NULL)
- ereport(FATAL,
- (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
- errmsg("number of requested standby connections "
- "exceeds max_wal_senders (currently %d)",
- max_wal_senders)));
-
+ Assert(MyWalSnd != NULL);
/* Arrange to clean up at walsender exit */
on_shmem_exit(WalSndKill, 0);
}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 33387fb71b..fe33fc42e3 100644
@@ -341,6 +353,12 @@ InitProcess(void)
* in the autovacuum case?
*/
SpinLockRelease(ProcStructLock);
+ if (am_walsender)
+ ereport(FATAL,
+ (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
+ errmsg("number of requested standby connections "
+ "exceeds max_wal_senders (currently %d)",
+ max_wal_senders)));
ereport(FATAL,
(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
errmsg("sorry, too many clients already")));
Cheers,
Oleksii
Attachments:
Attachment: replication_reserved_connections_v7.patch (text/plain, +63 -33)
Hi,
On 02/01/2019 10:21, Oleksii Kliukin wrote:
On Mon, Dec 17, 2018, at 14:10, Alexander Kukushkin wrote:
Hi,
On Thu, 6 Dec 2018 at 00:55, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
Does excluding WAL senders from the max_connections limit and including max_wal_senders in MaxBackends also imply that we need to add max_wal_senders to the list at xlog.c: CheckRequiredParameterValues, requiring its value on the replica to be not lower than the one on the primary?
I think it does, we need the proc slots for walsenders on the standby
same way we do for normal backends.
You are absolutely right. Attaching the new version of the patch.
Thank you. I've checked that with v6 the replica correctly complains when its value of max_wal_senders is lower than the one on the primary.
As stated in my previous comment, I think we should retain the specific error message on exceeding max_wal_senders, instead of showing the generic "too many clients already". Attached is the patch that fixes this small thing. I've also rebased it against master and took the liberty of naming it v7. It makes me wonder why we don't apply the same level of detail to the regular out-of-connections message and show the actual value of max_connections in the error text.
+1
The patch generally looks good, but the documentation of max_wal_senders
needs updating. In config.sgml we say:
WAL sender processes count towards the total number
of connections, so this parameter's value must be less than
<xref linkend="guc-max-connections"/> minus
<xref linkend="guc-superuser-reserved-connections"/>.
This is now misleading.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 2, 2019, at 11:02, Petr Jelinek wrote:
The patch generally looks good, but the documentation of max_wal_senders
needs updating. In config.sgml we say:
WAL sender processes count towards the total number
of connections, so this parameter's value must be less than
<xref linkend="guc-max-connections"/> minus
<xref linkend="guc-superuser-reserved-connections"/>.
This is now misleading.
Thank you. Attached is the new version (called v8) with the following changes to the documentation:
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e94b305add..6634ce8cdc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -697,8 +697,7 @@ include_dir 'conf.d'
<para>
The default value is three connections. The value must be less
- than <varname>max_connections</varname> minus
- <xref linkend="guc-max-wal-senders"/>.
+ than <varname>max_connections</varname>.
This parameter can only be set at server start.
</para>
</listitem>
@@ -3453,24 +3452,25 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
</term>
<listitem>
<para>
- Specifies the maximum number of concurrent connections from
- standby servers or streaming base backup clients (i.e., the
- maximum number of simultaneously running WAL sender
- processes). The default is 10. The value 0 means replication is
- disabled. WAL sender processes count towards the total number
- of connections, so this parameter's value must be less than
- <xref linkend="guc-max-connections"/> minus
- <xref linkend="guc-superuser-reserved-connections"/>.
- Abrupt streaming client disconnection might leave an orphaned
- connection slot behind until
- a timeout is reached, so this parameter should be set slightly
- higher than the maximum number of expected clients so disconnected
- clients can immediately reconnect. This parameter can only
- be set at server start.
+ Specifies the maximum number of concurrent connections from standby
+ servers or streaming base backup clients (i.e., the maximum number of
+ simultaneously running WAL sender processes). The default is 10. The
+ value 0 means replication is disabled. Abrupt streaming client
+ disconnection might leave an orphaned connection slot behind until a
+ timeout is reached, so this parameter should be set slightly higher
+ than the maximum number of expected clients so disconnected clients
+ can immediately reconnect. This parameter can only be set at server
+ start.
Also, <varname>wal_level</varname> must be set to
<literal>replica</literal> or higher to allow connections from standby
servers.
</para>
+
+ <para>
+ When running a standby server, you must set this parameter to the
+ same or higher value than on the master server. Otherwise, queries
+ will not be allowed in the standby server.
+ </para>
</listitem>
</varlistentry>
Cheers,
Oleksii
Attachments:
Attachment: replication_reserved_connections_v8.patch (text/plain, +79 -49)
Hi,
On Wed, 2 Jan 2019 at 12:17, Oleksii Kliukin <alexk@hintbits.com> wrote:
Thank you. Attached is the new version (called v8) with the following changes to the documentation:
Thank you for jumping on it. Your changes look good to me.
I was also thinking about bumping PG_CONTROL_VERSION, because we added a new field to the control file, but decided to leave that change to the committer.
Regards,
--
Alexander Kukushkin