Support load balancing in libpq
Load balancing connections across multiple read replicas is a pretty
common way of scaling out read queries. There are two main ways of doing
so, both with their own advantages and disadvantages:
1. Load balancing at the client level
2. Load balancing by connecting to an intermediary load balancer
Option 1 has been supported by JDBC (Java) for 8 years and Npgsql (C#)
merged support about a year ago. This patch adds the same functionality
to libpq. The way it's implemented is the same as the implementation of
JDBC, and contains two levels of load balancing:
1. The given hosts are randomly shuffled, before resolving them
one-by-one.
2. Once a host's addresses are resolved, those addresses are shuffled,
before trying to connect to them one-by-one.
Attachments:
Attachment: 0001-Support-load-balancing-in-libpq.patch (application/octet-stream, +203 -33)
Hi Jelte,
Load balancing connections across multiple read replicas is a pretty
common way of scaling out read queries. There are two main ways of doing
so, both with their own advantages and disadvantages:
1. Load balancing at the client level
2. Load balancing by connecting to an intermediary load balancer
Option 1 has been supported by JDBC (Java) for 8 years and Npgsql (C#)
merged support about a year ago. This patch adds the same functionality
to libpq. The way it's implemented is the same as the implementation of
JDBC, and contains two levels of load balancing:
1. The given hosts are randomly shuffled, before resolving them
one-by-one.
2. Once a host's addresses are resolved, those addresses are shuffled,
before trying to connect to them one-by-one.
Thanks for the patch.
I don't mind the feature but I believe the name is misleading. Unless
I missed something, the patch merely allows choosing a random host
from the provided list. By load balancing people generally expect
something more elaborate - e.g. sending read-only queries to replicas
and read/write queries to the primary, or perhaps using weights
proportional to the server throughput/performance.
Randomization would be a better term for what the patch does.
--
Best regards,
Aleksander Alekseev
I tried to stay in line with the naming of this same option in JDBC and
Npgsql, where it's called "loadBalanceHosts" and "Load Balance Hosts"
respectively. So, to be more in line, the libpq option should be
called "load_balance_hosts" (not "loadbalance" like
in the previous patch). I attached a new patch with the name of the
option changed to this.
I also don't think the name is misleading. Randomization of hosts will
automatically result in balancing the load across multiple hosts. This is
assuming more than a single connection is made using the connection
string, either on the same client node or on different client nodes, which I
think is a fair assumption to make. Also note that this patch does not load
balance queries, it load balances connections. This is because libpq works
at the connection level, not query level, due to session level state.
I agree it is indeed fairly simplistic load balancing. But many dedicated load
balancers often use simplistic load balancing too. Round-robin, random and
hash+modulo based load balancing are all very commonly used load balancer
strategies. Using this patch you should even be able to implement the
weighted load balancing that you suggest, by supplying the same host + port
pair multiple times in the list of hosts.
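For example (hypothetical host names; syntax following the patch's connection option), a connection string like the following would send roughly twice as many connections to the bigger server:

```
host=big-node,big-node,small-node load_balance_hosts=1
```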
My preference would be to use load_balance_hosts for the option name.
However, if the name of the option becomes the main point of contention
I would be fine with changing the option to "randomize_hosts". I think
in the end it comes down to what we want the name of the option to reflect:
1. load_balance_hosts reflects what you (want to) achieve by enabling it
2. randomize_hosts reflects how it is achieved
Jelte
Attachments:
Attachment: 0001-Support-load-balancing-in-libpq.patch (application/octet-stream, +202 -33)
On Fri, Jun 10, 2022 at 10:01 PM Jelte Fennema
<Jelte.Fennema@microsoft.com> wrote:
Load balancing connections across multiple read replicas is a pretty
common way of scaling out read queries. There are two main ways of doing
so, both with their own advantages and disadvantages:
1. Load balancing at the client level
2. Load balancing by connecting to an intermediary load balancer
Option 1 has been supported by JDBC (Java) for 8 years and Npgsql (C#)
merged support about a year ago. This patch adds the same functionality
to libpq. The way it's implemented is the same as the implementation of
JDBC, and contains two levels of load balancing:
1. The given hosts are randomly shuffled, before resolving them
one-by-one.
2. Once a host's addresses are resolved, those addresses are shuffled,
before trying to connect to them one-by-one.
Thanks for the patch. +1 for the general idea of redirecting connections.
I'm quoting a previous attempt by Satyanarayana Narlapuram on this
topic [1], it also has a patch set.
IMO, rebalancing of the load must be based on parameters (as also
suggested by Aleksander Alekseev in this thread) such as the
read-only/write queries, CPU/IO/Memory utilization of the
primary/standby, network distance etc. We may not have to go the extra
mile to determine all of these parameters dynamically during query
authentication time, but we can let users provide a list of standby
hosts based on "some" priority (Satya's thread [1] attempts to do
this, in a way, with users specifying the hosts via pg_hba.conf file).
If required, randomization in choosing the hosts can be optional.
Also, IMO, the solution must have a fallback mechanism if the
standby/chosen host isn't reachable.
Few thoughts on the patch:
1) How are we determining if the submitted query is read-only or write?
2) What happens for explicit transactions? The queries related to the
same txn get executed on the same host right? How are we guaranteeing
this?
3) Isn't it good to provide a way to test the patch?
[1]: /messages/by-id/CY1PR21MB00246DE1F9E9C58455A78A37915C0@CY1PR21MB0024.namprd21.prod.outlook.com
Regards,
Bharath Rupireddy.
I'm quoting a previous attempt by Satyanarayana Narlapuram on this
topic [1], it also has a patch set.
Thanks for sharing that. It's indeed a different approach to solve the
same problem. I think my approach is much simpler, since it only
requires minimal changes to the libpq client and none to the postgres
server or the postgres protocol.
However, that linked patch is more flexible due to allowing redirection
based on users and databases. With my patch something similar could
still be achieved by using different hostname lists for different databases
or users at the client side.
To be completely clear on the core difference between the patches IMO:
In this patch a DNS server (or a hardcoded hostname/IP list at the client
side) is used to determine what host to connect to. In the linked
patch instead the Postgres server starts being the source of truth of what
to connect to, thus essentially becoming something similar to a DNS server.
We may not have to go the extra
mile to determine all of these parameters dynamically during query
authentication time, but we can let users provide a list of standby
hosts based on "some" priority (Satya's thread [1] attempts to do
this, in a way, with users specifying the hosts via pg_hba.conf file).
If required, randomization in choosing the hosts can be optional.
I'm not sure if you read my response to Aleksander. I feel like I
addressed part of this at least. But maybe I was not clear enough,
or added too much fluff. So, I'll re-iterate the important part:
By specifying the same host multiple times in the DNS response or
the hostname/IP list you can achieve weighted load balancing.
Few thoughts on the patch:
1) How are we determining if the submitted query is read-only or write?
This is not part of this patch. libpq, and thus this patch, works at the
connection level, not at the query level, so determining whether a query is
read-only or a write is not possible without large changes.
However, libpq already has a target_session_attrs [1] connection option. This can be
used to open connections specifically to read-only or writable servers. However,
once a read-only connection is opened it is then the responsibility of the client
not to send write queries over this read-only connection, otherwise they will fail.
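For instance, a connection string combining the two options might look like this (hypothetical host names; target_session_attrs=read-only is the existing libpq option, load_balance_hosts the one added by this patch):

```
host=primary,replica1,replica2 target_session_attrs=read-only load_balance_hosts=1
```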
2) What happens for explicit transactions? The queries related to the
same txn get executed on the same host right? How are we guaranteeing
this?
We're load balancing connections, not queries. Once a connection is made
all queries on that connection will be executed on the same host.
3) Isn't it good to provide a way to test the patch?
The way I tested it myself was by setting up a few databases on my local machine
listening on 127.0.0.1, 127.0.0.2, 127.0.0.3 and then putting all those in the connection
string. Then looking at the connection attempts in the servers' logs showed that
the client was indeed connecting to a random one (by using log_connections=true
in postgresql.conf).
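Put together, the manual test described above amounts to a connection string like the following (assuming three local servers with log_connections=on in each postgresql.conf):

```
host=127.0.0.1,127.0.0.2,127.0.0.3 load_balance_hosts=1
```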
I would definitely like to have some automated tests for this, but I couldn't find tests
for libpq that were connecting to multiple postgres servers. If they exist, any pointers
are appreciated. If they don't exist, pointers to similar tests are also appreciated.
[1]: https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNECT-TARGET-SESSION-ATTRS
Dear Jelte,
I like your idea. But do we have to sort randomly even if target_session_attrs is set to 'primary' or 'read-write'?
I think this parameter can be used when all listed servers have the same data,
and we can assume that one of the members is a primary and the others are
secondaries. In this case the user may add the primary host at the top of the
list, so shuffling may increase the time needed to establish a connection.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
we can assume that one of the members is a primary and the others are secondaries.
With plain Postgres this assumption is probably correct. But the main reason
I'm interested in this patch was because I would like to be able to load
balance across the workers in a Citus cluster. These workers are all primaries.
Similar usage would likely be possible with BDR (bidirectional replication).
In this case the user may add the primary host at the top of the list,
so shuffling may increase the time needed to establish a connection.
If the user takes such care when building their host list, they could simply
not add the load_balance_hosts=true option to the connection string.
If you know there's only one primary in the list and you're looking for
the primary, then there's no reason to use load_balance_hosts=true.
Dear Jelte,
With plain Postgres this assumption is probably correct. But the main reason
I'm interested in this patch was because I would like to be able to load
balance across the workers in a Citus cluster. These workers are all primaries.
Similar usage would likely be possible with BDR (bidirectional replication).
I agree this feature is useful for BDR-like solutions.
If the user takes such care when building their host list, they could simply
not add the load_balance_hosts=true option to the connection string.
If you know there's only one primary in the list and you're looking for
the primary, then there's no reason to use load_balance_hosts=true.
You meant that it is the user's responsibility to set this correctly, right?
That seems reasonable, because libpq is just a library for connecting to a
server and should not be too smart.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Hi,
the patch no longer applies cleanly, please rebase (it's trivial).
I don't like the provided commit message very much, I think the
discussion about pgJDBC having had load balancing for a while belongs
elsewhere.
On Wed, Jun 22, 2022 at 07:54:19AM +0000, Jelte Fennema wrote:
I tried to stay in line with the naming of this same option in JDBC and
Npgsql, where it's called "loadBalanceHosts" and "Load Balance Hosts"
respectively. So, actually to be more in line it should be the option for
libpq should be called "load_balance_hosts" (not "loadbalance" like
in the previous patch). I attached a new patch with the name of the
option changed to this.
Maybe my imagination is not so great, but what else than hosts could we
possibly load-balance? I don't mind calling it load_balance, but I also
don't feel very strongly one way or the other and this is clearly
bikeshed territory.
I also don't think the name is misleading. Randomization of hosts will
automatically result in balancing the load across multiple hosts. This is
assuming more than a single connection is made using the connection
string, either on the same client node or on different client nodes, which I
think is a fair assumption to make. Also note that this patch does not load
balance queries, it load balances connections. This is because libpq works
at the connection level, not query level, due to session level state.
I agree.
Also, I think the scope is ok for a first implementation (but see
below). You or others could possibly further enhance the algorithm in
the future, but it seems to be useful as-is.
I agree it is indeed fairly simplistic load balancing.
If I understand correctly, you've added DNS-based load balancing on top
of just shuffling the provided hostnames. This makes sense if a
hostname is backed by more than one IP address in the context of load
balancing, but it also complicates the patch. So I'm wondering how much
shorter the patch would be if you leave that out for now?
On the other hand, I believe pgJDBC keeps track of which hosts are up or
down and only load balances among the ones which are up (maybe
rechecking after a timeout? I don't remember), is this something you're
doing, or did you consider it?
Some quick remarks on the patch:
/* OK, scan this addrlist for a working server address */
- conn->addr_cur = conn->addrlist;
reset_connection_state_machine = true;
conn->try_next_host = false;
The comment might need to be updated.
+ int naddr; /* # of addrs returned by getaddrinfo */
This is spelt "number of" in several other places in the file, and we
still have enough space to spell it out here as well without a
line-wrap.
Michael
Attached is an updated patch with the following changes:
1. rebased (including solved merge conflict)
2. fixed failing tests in CI
3. changed the commit message a little bit
4. addressed the two remarks from Michael
5. changed the prng_state from a global to a connection level value for thread-safety
6. use pg_prng_uint64_range
Maybe my imagination is not so great, but what else than hosts could we
possibly load-balance? I don't mind calling it load_balance, but I also
don't feel very strongly one way or the other and this is clearly
bikeshed territory.
I agree, which is why I called it load_balance in my original patch. But I also
think it's useful to match the naming for the already existing implementations
in the PG ecosystem around this. But like you I don't really feel strongly either
way. It's a tradeoff between short name and consistency in the ecosystem.
If I understand correctly, you've added DNS-based load balancing on top
of just shuffling the provided hostnames. This makes sense if a
hostname is backed by more than one IP address in the context of load
balancing, but it also complicates the patch. So I'm wondering how much
shorter the patch would be if you leave that out for now?
Yes, that's correct and indeed the patch would be simpler without, i.e. all the
addrinfo changes would become unnecessary. But IMHO the behaviour of
the added option would be very unexpected if it didn't load balance across
multiple IPs in a DNS record. libpq currently makes no real distinction in
handling of provided hosts and handling of their resolved IPs. If load balancing
would only apply to the host list that would start making a distinction
between the two.
Apart from that the load balancing across IPs is one of the main reasons
for my interest in this patch. The reason is that it allows expanding or reducing
the number of nodes that are being load balanced across transparently to the
application. Which means that there's no need to re-deploy applications with
new connection strings when changing the number of hosts.
On the other hand, I believe pgJDBC keeps track of which hosts are up or
down and only load balances among the ones which are up (maybe
rechecking after a timeout? I don't remember), is this something you're
doing, or did you consider it?
I don't think it's possible to do this in libpq without huge changes to its
architecture, since normally a PGconn will only
create a single connection. The reason pgJDBC can do this is because
it's actually a connection pooler, so it will open more than one connection
and can thus keep some global state about the different hosts.
Jelte
Attachments:
Attachment: 0001-Support-load-balancing-in-libpq.patch (application/octet-stream, +210 -35)
+1 for the overall idea of load balancing via random host selection.
For the patch itself, I think it is better to use a more precise time
function in libpq_prng_init or call it only once.
Though it is a special corner case, imagine all the connection attempts in
the first second being seeded with the same value, i.e. all attempting to
connect to the same host. I think this is not what we want to achieve.
And the "hostorder" option should be free'd in freePGconn.
Also, IMO, the solution must have a fallback mechanism if the
standby/chosen host isn't reachable.
Yeah, I think it should. I'm not insisting on a particular name of options
here, but in my view, the overall idea may be next:
- we have two libpq's options: "load_balance_hosts" and "failover_timeout";
- the "load_balance_hosts" should be "sequential" or "random";
- the "failover_timeout" is a time period within which, if a connection to
the server is not established, we switch to the next address or host.
While writing this text, I started thinking that load balancing is a
combination of two parameters above.
3) Isn't it good to provide a way to test the patch?
Good idea too. I think we should add a TAP test here.
--
Best regards,
Maxim Orlov.
Hi,
On Mon, Sep 12, 2022 at 02:16:56PM +0000, Jelte Fennema wrote:
Attached is an updated patch with the following changes:
1. rebased (including solved merge conflict)
2. fixed failing tests in CI
3. changed the commit message a little bit
4. addressed the two remarks from Michael
5. changed the prng_state from a global to a connection level value for thread-safety
6. use pg_prng_uint64_range
Thanks!
I tested this some more, and found it somewhat surprising that at least
when looking at it on a microscopic level, some hosts are chosen more
often than the others for a while.
I basically ran
while true; do psql -At "host=pg1,pg2,pg3 load_balance_hosts=1" -c
"SELECT inet_server_addr()"; sleep 1; done
and the initial output was:
10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.240
10.0.3.240
10.0.3.240
10.0.3.240
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.240
10.0.3.109
10.0.3.60
I.e. the second host (pg2/10.0.3.60) was only hit after 19 iterations.
Once significantly more than a hundred iterations are run, the hosts
somewhat even out, but it is maybe surprising to users:
             50  100  250  500  1000  10000
10.0.3.60     9   24   77  165   328   3317
10.0.3.109   25   42   88  178   353   3372
10.0.3.240   16   34   85  157   319   3311
Or maybe my test setup is skewed? When I choose a two-second timeout
between psql calls, I get a more even distribution initially, but it
then diverges after 100 iterations:
             50  100  250  500  1000
10.0.3.60    19   36   98  199   374
10.0.3.109   13   33   80  150   285
10.0.3.240   18   31   72  151   341
Could just be bad luck...
I also switched one host to have two IP addresses in /etc/hosts:
10.0.3.109 pg1
10.0.3.60 pg1
10.0.3.240 pg3
And this resulted in this (one second timeout again):
First run:
             50  100  250  500  1000
10.0.3.60    10   18   56  120   255
10.0.3.109   14   30   67  139   278
10.0.3.240   26   52  127  241   467
Second run:
             50  100  250  500  1000
10.0.3.60    20   31   77  138   265
10.0.3.109    9   20   52  116   245
10.0.3.240   21   49  121  246   490
So it looks like it load-balances between pg1 and pg3, and not between
the three IPs - is this expected?
If I switch from "host=pg1,pg3" to "host=pg1,pg1,pg3", each IP address is
hit roughly equally.
So I guess this is how it should work, but in that case I think the
documentation should be more explicit about what is to be expected if a
host has multiple IP addresses or hosts are specified multiple times in
the connection string.
Maybe my imagination is not so great, but what else than hosts could we
possibly load-balance? I don't mind calling it load_balance, but I also
don't feel very strongly one way or the other and this is clearly
bikeshed territory.
I agree, which is why I called it load_balance in my original patch.
But I also think it's useful to match the naming for the already
existing implementations in the PG ecosystem around this.
But like you I don't really feel strongly either way. It's a tradeoff
between short name and consistency in the ecosystem.
I don't think consistency is an extremely valid concern. As a
counterpoint, pgJDBC had targetServerType some time before Postgres, and
the libpq parameter was then named somewhat differently when it was
introduced, namely target_session_attrs.
If I understand correctly, you've added DNS-based load balancing on top
of just shuffling the provided hostnames. This makes sense if a
hostname is backed by more than one IP address in the context of load
balancing, but it also complicates the patch. So I'm wondering how much
shorter the patch would be if you leave that out for now?
Yes, that's correct and indeed the patch would be simpler without, i.e. all the
addrinfo changes would become unnecessary. But IMHO the behaviour of
the added option would be very unexpected if it didn't load balance across
multiple IPs in a DNS record. libpq currently makes no real distinction in
handling of provided hosts and handling of their resolved IPs. If load balancing
would only apply to the host list that would start making a distinction
between the two.
Fair enough, I agree.
Apart from that the load balancing across IPs is one of the main reasons
for my interest in this patch. The reason is that it allows expanding or reducing
the number of nodes that are being load balanced across transparently to the
application. Which means that there's no need to re-deploy applications with
new connection strings when changing the number of hosts.
That's a good point as well.
On the other hand, I believe pgJDBC keeps track of which hosts are up or
down and only load balances among the ones which are up (maybe
rechecking after a timeout? I don't remember), is this something you're
doing, or did you consider it?
I don't think it's possible to do this in libpq without huge changes to its
architecture, since normally a PGconn will only
create a single connection. The reason pgJDBC can do this is because
it's actually a connection pooler, so it will open more than one connection
and can thus keep some global state about the different hosts.
Ok.
Michael
Hi,
On Wed, Sep 14, 2022 at 05:53:48PM +0300, Maxim Orlov wrote:
Also, IMO, the solution must have a fallback mechanism if the
standby/chosen host isn't reachable.
Yeah, I think it should. I'm not insisting on a particular name of options
here, but in my view, the overall idea may be next:
- we have two libpq's options: "load_balance_hosts" and "failover_timeout";
- the "load_balance_hosts" should be "sequential" or "random";
- the "failover_timeout" is a time period, within which, if connection to
the server is not established, we switch to the next address or host.
Isn't this exactly what connect_timeout is providing? In my tests, it
worked exactly as I would expect it, i.e. after connect_timeout seconds,
libpq was re-shuffling and going for another host.
If you specify only one host (or all are down), you get an error.
In any case, I am not sure what failover has to do with it if we are
talking about initiating connections - usually failover is for already
established connections that suddenly go away for one reason or
another.
Or maybe I'm just not understanding where you're getting at?
While writing this text, I started thinking that load balancing is a
combination of two parameters above.
So I guess what you are saying is that if load_balance_hosts is set,
not setting connect_timeout would be a hazard, because it would stall the
connection attempt even though other hosts would be available.
That's right, but I guess it's already a hazard if you put multiple
hosts in there, and the connection is not immediately failed (because
the host doesn't exist or it rejects the connection) but stalled by a
firewall for one host, while other hosts later on in the list would be
happy to accept connections.
So maybe this is something to think about, but just changing the
default of connect_timeout to something else when load balancing is on
would be very surprising. In any case, I don't think this absolutely
needs to be addressed by the initial feature, it could be expanded upon
later on if needed, the feature is useful on its own, along with
connect_timeout.
Michael
I attached a new patch which does the following:
1. adds tap tests
2. adds random_seed parameter to libpq (required for tap tests)
3. frees conn->loadbalance in freePGconn
4. adds more expansive docs on the feature's behaviour
Apart from bike shedding on the name of the option I think it's pretty good now.
Isn't this exactly what connect_timeout is providing? In my tests, it
worked exactly as I would expect it, i.e. after connect_timeout seconds,
libpq was re-shuffling and going for another host.
Yes, this was the main purpose of multiple hosts previously. This patch
doesn't change that, and it indeed continues to work when enabling
load balancing too. I included this in the tap tests.
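A connection string combining the two, as discussed (host names hypothetical), so that an unresponsive node is skipped after two seconds and another randomly chosen host is tried:

```
host=pg1,pg2,pg3 load_balance_hosts=1 connect_timeout=2
```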
I tested this some more, and found it somewhat surprising that at least
when looking at it on a microscopic level, some hosts are chosen more
often than the others for a while.
That does seem surprising, but it looks like it might simply be bad luck.
Did you compile with OpenSSL support? Otherwise, the strong random
source might not be used.
So it looks like it load-balances between pg1 and pg3, and not between
the three IPs - is this expected?
If I switch from "host=pg1,pg3" to "host=pg1,pg1,pg3", each IP address is
hit roughly equally.
So I guess this is how it should work, but in that case I think the
documentation should be more explicit about what is to be expected if a
host has multiple IP addresses or hosts are specified multiple times in
the connection string.
Yes, this behaviour is expected. I tried to make that clearer in the newest
version of the docs.
For the patch itself, I think it is better to use a more precise time
function in libpq_prng_init or call it only once.
Though it is a special corner case, imagine all the connection attempts in
the first second being seeded with the same value
I agree that using microseconds would probably be preferable. But that seems
like a separate patch, since I took this initialization code from the InitProcessGlobals
function. Also, it shouldn't be a big issue in practice, since usually the strong random
source will be used.
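A sketch of what microsecond-granularity seeding could look like, as suggested. The helper name is hypothetical; this is only relevant for the fallback path when no strong random source (e.g. OpenSSL) is available, and the actual fallback code in the patch follows InitProcessGlobals, as mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical fallback seed: mix the microsecond part of the timestamp
 * (and the PID) into the seed so that connection attempts started within
 * the same second do not all get the same PRNG stream and thus all pick
 * the same host.
 */
static uint64_t
fallback_prng_seed(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return ((uint64_t) tv.tv_sec << 32)
		^ ((uint64_t) tv.tv_usec << 12)
		^ (uint64_t) getpid();
}
```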
Attachments:
Attachment: 0001-Support-load-balancing-in-libpq.patch (application/octet-stream, +413 -34)
Attached is a new version with the tests cleaned up a bit (more comments mostly).
@Michael, did you have a chance to look at the last version? Because I feel that the
patch is pretty much ready for a committer to look at, at this point.
Attachments:
Attachment: v5-0001-Support-load-balancing-in-libpq.patch (application/octet-stream, +446 -34)
Hi,
On Tue, Nov 29, 2022 at 02:57:08PM +0000, Jelte Fennema wrote:
Attached is a new version with the tests cleaned up a bit (more
comments mostly).
@Michael, did you have a chance to look at the last version? Because I
feel that the patch is pretty much ready for a committer to look at,
at this point.
I had another look; it still applies and tests pass. I still find the
distribution over three hosts a bit skewed (even when OpenSSL is
enabled, which wasn't the case when I first tested it), but couldn't
find a general fault and it worked well enough in my testing.
I wonder whether making the parameter a boolean will paint us into a
corner, and whether maybe additional modes could be envisioned in the
future, but I can't think of some right now (you can pretty neatly
restrict load-balancing to standbys by setting
target_session_attrs=standby in addition). Maybe a way to change the
behaviour if a DNS hostname is backed by multiple entries?
Some further (mostly nitpicking) comments on the patch:
From 6e20bb223012b666161521b5e7249c066467a5f3 Mon Sep 17 00:00:00 2001
From: Jelte Fennema <github-tech@jeltef.nl>
Date: Mon, 12 Sep 2022 09:44:06 +0200
Subject: [PATCH v5] Support load balancing in libpq
Load balancing connections across multiple read replicas is a pretty
common way of scaling out read queries. There are two main ways of doing
so, both with their own advantages and disadvantages:
1. Load balancing at the client level
2. Load balancing by connecting to an intermediary load balancer
Both JDBC (Java) and Npgsql (C#) already support client level load
balancing (option #1). This patch implements client level load balancing
for libpq as well. To stay consistent with the JDBC and Npgsql part of
the ecosystem, a similar implementation and name for the option are
used.
I still think all of the above has no business in the commit message,
though maybe the first paragraph can stay as introduction.
It contains two levels of load balancing:
1. The given hosts are randomly shuffled, before resolving them
one-by-one.
2. Once a host's addresses are resolved, those addresses are shuffled,
before trying to connect to them one-by-one.
That's fine.
What should be in the commit message is at least a mention of what the
new connection parameter is called and possibly what is done to
accomplish it.
But the committer will pick this up if needed I guess.
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index f9558dec3b..6ce7a0c9cc 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1316,6 +1316,54 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
       </listitem>
      </varlistentry>

+     <varlistentry id="libpq-load-balance-hosts" xreflabel="load_balance_hosts">
+      <term><literal>load_balance_hosts</literal></term>
+      <listitem>
+       <para>
+        Controls whether the client load balances connections across hosts and
+        adresses. The default value is 0, meaning off, this means that hosts are
+        tried in order they are provided and addresses are tried in the order
+        they are received from DNS or a hosts file. If this value is set to 1,
+        meaning on, the hosts and addresses that they resolve to are tried in
+        random order. Subsequent queries once connected will still be sent to
+        the same server. Setting this to 1, is mostly useful when opening
+        multiple connections at the same time, possibly from different machines.
+        This way connections can be load balanced across multiple Postgres
+        servers.
+       </para>
+       <para>
+        When providing multiple hosts, these hosts are resolved in random order.
+        Then if that host resolves to multiple addresses, these addresses are
+        connected to in random order. Only once all addresses for a single host
+        have been tried, the addresses for the next random host will be
+        resolved. This behaviour can lead to non-uniform address selection in
+        certain cases. Such as when not all hosts resolve to the same number of
+        addresses, or when multiple hosts resolve to the same address. So if you
+        want uniform load balancing, this is something to keep in mind. However,
+        non-uniform load balancing also has usecases, e.g. providing the
+        hostname of a larger server multiple times in the host string so it gets
+        more requests.
+       </para>
+       <para>
+        When using this setting it's recommended to also configure a reasonable
+        value for <xref linkend="libpq-connect-connect-timeout"/>. Because then,
+        if one of the nodes that are used for load balancing is not responding,
+        a new node will be tried.
+       </para>
+      </listitem>
+     </varlistentry>
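As a usage sketch of the documented behaviour (the hostnames and database name here are hypothetical, not from the patch), a connection string enabling load balancing across three replicas might look like:

```
host=replica1,replica2,replica3 port=5432 dbname=app load_balance_hosts=1 connect_timeout=5
```

With this string, each new connection shuffles the three hosts before resolving and trying them, and the short connect_timeout ensures an unresponsive replica doesn't stall the attempt.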
I think this whole section is generally fine, but needs some
copyediting.
+     <varlistentry id="libpq-random-seed" xreflabel="random_seed">
+      <term><literal>random_seed</literal></term>
+      <listitem>
+       <para>
+        Sets the random seed that is used by <xref linkend="libpq-load-balance-hosts"/>
+        to randomize the host order. This option is mostly useful when running
+        tests that require a stable random order.
+       </para>
+      </listitem>
+     </varlistentry>
I wonder whether this needs to be documented if it is mostly a
development/testing parameter?
diff --git a/src/include/libpq/pqcomm.h b/src/include/libpq/pqcomm.h
index fcf68df39b..39e93b1392 100644
--- a/src/include/libpq/pqcomm.h
+++ b/src/include/libpq/pqcomm.h
@@ -27,6 +27,12 @@ typedef struct
 	socklen_t	salen;
 } SockAddr;

+typedef struct
+{
+	int			family;
+	SockAddr	addr;
+} AddrInfo;
That last line looks weirdly indented compared to SockAddr; in the
struct above.
/* Configure the UNIX socket location for the well known port. */
 #define UNIXSOCK_PATH(path, port, sockdir) \

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index f88d672c6c..b4d3613713 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -241,6 +241,14 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 		"Fallback-Application-Name", "", 64,
 	offsetof(struct pg_conn, fbappname)},

+	{"load_balance_hosts", NULL, NULL, NULL,
+		"Load-Balance", "", 1,	/* should be just '0' or '1' */
+	offsetof(struct pg_conn, loadbalance)},
+
+	{"random_seed", NULL, NULL, NULL,
+		"Random-Seed", "", 10,	/* strlen(INT32_MAX) == 10 */
+	offsetof(struct pg_conn, randomseed)},
+
 	{"keepalives", NULL, NULL, NULL,
 		"TCP-Keepalives", "", 1,	/* should be just '0' or '1' */
 	offsetof(struct pg_conn, keepalives)},

@@ -379,6 +387,7 @@ static bool fillPGconn(PGconn *conn, PQconninfoOption *connOptions);
 static void freePGconn(PGconn *conn);
 static void closePGconn(PGconn *conn);
 static void release_conn_addrinfo(PGconn *conn);
+static bool store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist);
 static void sendTerminateConn(PGconn *conn);
 static PQconninfoOption *conninfo_init(PQExpBuffer errorMessage);
 static PQconninfoOption *parse_connection_string(const char *connstr,

@@ -424,6 +433,9 @@ static void pgpassfileWarning(PGconn *conn);
 static void default_threadlock(int acquire);
 static bool sslVerifyProtocolVersion(const char *version);
 static bool sslVerifyProtocolRange(const char *min, const char *max);
+static int	loadBalance(PGconn *conn);
+static bool parse_int_param(const char *value, int *result, PGconn *conn,
+							const char *context);

/* global variable because fe-auth.c needs to access it */
@@ -1007,6 +1019,46 @@ parse_comma_separated_list(char **startptr, bool *more)
return p;
 }

+/*
+ * Initializes the prng_state field of the connection. We want something
+ * unpredictable, so if possible, use high-quality random bits for the
+ * seed. Otherwise, fall back to a seed based on timestamp and PID.
+ */
+static bool
+libpq_prng_init(PGconn *conn)
+{
+	if (unlikely(conn->randomseed))
+	{
+		int			rseed;
+
+		if (!parse_int_param(conn->randomseed, &rseed, conn, "random_seed"))
+		{
+			return false;
+		};
I think it's project policy to drop the braces for single statements in
if blocks.
+		pg_prng_seed(&conn->prng_state, rseed);
+	}
+	else if (unlikely(!pg_prng_strong_seed(&conn->prng_state)))
+	{
+		uint64		rseed;
+		time_t		now = time(NULL);
+
+		/*
+		 * Since PIDs and timestamps tend to change more frequently in their
+		 * least significant bits, shift the timestamp left to allow a larger
+		 * total number of seeds in a given time period. Since that would
+		 * leave only 20 bits of the timestamp that cycle every ~1 second,
+		 * also mix in some higher bits.
+		 */
+		rseed = ((uint64) getpid()) ^
+			((uint64) now << 12) ^
+			((uint64) now >> 20);
+
+		pg_prng_seed(&conn->prng_state, rseed);
+	}
+	return true;
+}
+
+
 /*
Additional newline.
@@ -1164,6 +1217,36 @@ connectOptions2(PGconn *conn)
}
 }

+	if (loadbalancehosts < 0)
+	{
+		appendPQExpBufferStr(&conn->errorMessage,
+							 libpq_gettext("loadbalance parameter must be an integer\n"));
+		return false;
+	}
+
+	if (loadbalancehosts)
+	{
+		if (!libpq_prng_init(conn))
+		{
+			return false;
+		}
+
+		/*
+		 * Shuffle connhost with a Durstenfeld/Knuth version of the
+		 * Fisher-Yates shuffle. Source:
+		 * https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm
+		 */
+		for (i = conn->nconnhost - 1; i > 0; i--)
+		{
+			int			j = pg_prng_uint64_range(&conn->prng_state, 0, i);
+			pg_conn_host temp = conn->connhost[j];
+
+			conn->connhost[j] = conn->connhost[i];
+			conn->connhost[i] = temp;
+		}
+	}
+
+
 /*
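The shuffle used in the patch is the standard Durstenfeld variant of Fisher-Yates. A minimal standalone sketch, using `rand()` as a stand-in for libpq's `pg_prng` and a plain int array in place of `pg_conn_host`:

```c
#include <stdlib.h>

/* Durstenfeld/Knuth Fisher-Yates: walk backwards through the array,
 * swapping each element with a uniformly chosen element at or before it.
 * Every permutation is equally likely (given a uniform PRNG). */
static void shuffle(int *arr, int n)
{
	int			i;

	for (i = n - 1; i > 0; i--)
	{
		/* stand-in for pg_prng_uint64_range(&state, 0, i) */
		int			j = rand() % (i + 1);
		int			tmp = arr[j];

		arr[j] = arr[i];
		arr[i] = tmp;
	}
}
```

Note that the swap happens in place and each index is visited exactly once, which is why the algorithm is O(n) and needs no extra allocation; the patch applies exactly this loop to `conn->connhost`.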
Additional newline.
@@ -1726,6 +1809,27 @@ connectFailureMessage(PGconn *conn, int errorno)
libpq_append_conn_error(conn, "\tIs the server running on that host and accepting TCP/IP connections?");
 }

+/*
+ * Should we load balance across hosts? Returns 1 if yes, 0 if no, and -1 if
+ * conn->loadbalance is set to a value which is not parseable as an integer.
+ */
+static int
+loadBalance(PGconn *conn)
+{
+	char	   *ep;
+	int			val;
+
+	if (conn->loadbalance == NULL)
+	{
+		return 0;
+	}
Another case of additional braces.
+	val = strtol(conn->loadbalance, &ep, 10);
+	if (*ep)
+		return -1;
+	return val != 0 ? 1 : 0;
+}
+
+
 /*
Additional newline.
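The strtol-based parsing in loadBalance() can be exercised in isolation. A sketch of the same logic as a free-standing function, detached from PGconn (the function name here is mine):

```c
#include <stdlib.h>

/* Same contract as the patch's loadBalance(): 1 = on, 0 = off,
 * -1 = value not parseable as an integer. NULL counts as off. */
static int parse_load_balance(const char *value)
{
	char	   *ep;
	long		val;

	if (value == NULL)
		return 0;
	val = strtol(value, &ep, 10);
	if (*ep)
		return -1;			/* trailing garbage, e.g. "1x" or "yes" */
	return val != 0 ? 1 : 0;
}
```

The `*ep` check is what distinguishes "0"/"1" from non-numeric strings: strtol leaves `ep` pointing at the first unconsumed character, so any leftover text signals a parse error.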
@@ -4041,6 +4154,63 @@ freePGconn(PGconn *conn)
free(conn);
 }

+
+/*
Additional newline.
+ * Copies over the AddrInfos from addrlist to the PGconn.
+ */
+static bool
+store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist)
+{
+	struct addrinfo *ai = addrlist;
+
+	conn->whichaddr = 0;
+
+	conn->naddr = 0;
+	while (ai)
+	{
+		ai = ai->ai_next;
+		conn->naddr++;
+	}
+
+	conn->addr = calloc(conn->naddr, sizeof(AddrInfo));
+	if (conn->addr == NULL)
+	{
+		return false;
+	}
Additional braces.
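The count-then-allocate pattern in store_conn_addrinfo() can be sketched on its own, with a simplified stand-in for the addrinfo singly linked list (names here are illustrative, not from the patch):

```c
#include <stddef.h>

/* Simplified stand-in for struct addrinfo's ai_next chain */
struct node
{
	struct node *next;
};

/* First pass of the pattern: walk the list once just to count entries,
 * so a flat array of exactly that size can be calloc'd, then a second
 * pass copies each entry into the array. */
static int count_nodes(struct node *head)
{
	int			n = 0;
	struct node *p;

	for (p = head; p != NULL; p = p->next)
		n++;
	return n;
}
```

Converting the linked list to a flat array is what makes the later address shuffling cheap: swapping two AddrInfo slots by index is trivial, whereas swapping linked-list nodes is not.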
@@ -4048,11 +4218,10 @@ freePGconn(PGconn *conn)
 static void
 release_conn_addrinfo(PGconn *conn)
 {
-	if (conn->addrlist)
+	if (conn->addr)
 	{
-		pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
-		conn->addrlist = NULL;
-		conn->addr_cur = NULL;	/* for safety */
+		free(conn->addr);
+		conn->addr = NULL;
 	}
 }

diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 512762f999..76ee988038 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -82,6 +82,8 @@ typedef struct
 #endif
 #endif							/* USE_OPENSSL */

+#include "common/pg_prng.h"
+
 /*
  * POSTGRES backend dependent Constants.
  */

@@ -373,6 +375,8 @@ struct pg_conn
 	char	   *pgpassfile;		/* path to a file containing password(s) */
 	char	   *channel_binding;	/* channel binding mode
 									 * (require,prefer,disable) */
+	char	   *loadbalance;	/* load balance over hosts */
+	char	   *randomseed;		/* seed for randomization of load balancing */
 	char	   *keepalives;		/* use TCP keepalives? */
A bit unclear why you put the variables at this point in the list, but
the list doesn't seem to be ordered strictly anyway; still, maybe they
would fit best at the bottom below target_session_attrs?
 	char	   *keepalives_idle;	/* time between TCP keepalives */
 	char	   *keepalives_interval;	/* time between TCP keepalive

@@ -461,8 +465,10 @@ struct pg_conn
 	PGTargetServerType target_server_type;	/* desired session properties */
 	bool		try_next_addr;	/* time to advance to next address/host? */
 	bool		try_next_host;	/* time to advance to next connhost[]? */
-	struct addrinfo *addrlist;	/* list of addresses for current connhost */
-	struct addrinfo *addr_cur;	/* the one currently being tried */
+	int			naddr;			/* number of addrs returned by getaddrinfo */
+	int			whichaddr;		/* the addr currently being tried */
Address(es) is always spelt out in the comments; those two addr(s)
should be too, I think.
diff --git a/src/interfaces/libpq/t/003_loadbalance.pl b/src/interfaces/libpq/t/003_loadbalance.pl
new file mode 100644
index 0000000000..07eddbe9cc
--- /dev/null
+++ b/src/interfaces/libpq/t/003_loadbalance.pl
@@ -0,0 +1,167 @@
+# Copyright (c) 2022, PostgreSQL Global Development Group
Copyright bump needed.
Cheers,
Michael
Attached is an updated patch which should address your feedback; I also
updated the commit message.
I wonder whether making the parameter a boolean will paint us into a
corner
I made it a string option, just like target_session_attrs. I'm pretty sure a
round-robin load balancing policy could be implemented in the future
given certain constraints, like connections being made within the same
process. I adjusted the docs accordingly.
+typedef struct
+{
+	int			family;
+	SockAddr	addr;
+} AddrInfo;

That last line looks weirdly indented compared to SockAddr; in the
struct above.
Yes I agree, but for some reason pgindent really badly wants it formatted
that way. I now undid the changes made by pgindent manually.
I wonder whether this needs to be documented if it is mostly a
development/testing parameter?
I also wasn't sure whether it should be documented or not. I'm fine with
either, I'll leave it in for now and let a committer decide if it's wanted or not.
A bit unclear why you put the variables at this point in the list, but
the list doesn't seem to be ordered strictly anyway; still, maybe they
would fit best at the bottom below target_session_attrs?
Good point, I added them after target_session_attrs now and also moved
the docs/parsing accordingly. This makes sense conceptually to me as
well, since target_session_attrs and load_balance_hosts are closely
related contextually too.
P.S. I also attached the same pgindent run patch that I added in
/messages/by-id/AM5PR83MB0178D3B31CA1B6EC4A8ECC42F7529@AM5PR83MB0178.EURPRD83.prod.outlook.com
On Wed, Sep 14, 2022 at 7:54 AM Maxim Orlov <orlovmg@gmail.com> wrote:
For the patch itself, I think it is better to use a more precise time function in libpq_prng_init or call it only once.
Though it is a special corner case, imagine all the connection attempts in the first second being seeded with the same
value, i.e. all attempting to connect to the same host. I think this is not what we want to achieve.
Just a quick single-issue review, but I agree with Maxim that having
one PRNG, seeded once, would be simpler -- with the tangible benefit
that it would eliminate weird behavior on simultaneous connections
when the client isn't using OpenSSL. (I'm guessing a simple lock on a
global PRNG would be less overhead than the per-connection
strong_random machinery, too, but I have no data to back that up.) The
test seed could then be handled globally as well (envvar?) so that you
don't have to introduce a debug-only option into the connection
string.
Overall, I like the concept.
--Jacob
Just a quick single-issue review, but I agree with Maxim that having
one PRNG, seeded once, would be simpler
I don't agree that it's simpler. Because now there's a mutex you have
to manage, and honestly cross-platform threading in C is not simple.
However, I attached two additional patches that implement this
approach on top of the previous patchset. Just to make sure that
this patch is not blocked on this.
with the tangible benefit that it would eliminate weird behavior on
simultaneous connections when the client isn't using OpenSSL.
This is true, but still I think in practice very few people have a libpq
that's compiled without OpenSSL support.
I'm guessing a simple lock on a
global PRNG would be less overhead than the per-connection
strong_random machinery, too, but I have no data to back that up.
It might very well have less overhead, but neither of them should take
up any significant amount of time during connection establishment.
The test seed could then be handled globally as well (envvar?) so that you
don't have to introduce a debug-only option into the connection string.
Why is a debug-only envvar any better than a debug-only connection option?
For now I kept the connection option approach, since to me they seem pretty
much equivalent.
Attachments:
v7-0003-Make-mutexes-easier-to-use-in-libpq.patchapplication/x-patch; name=v7-0003-Make-mutexes-easier-to-use-in-libpq.patchDownload+64-49
v7-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patchapplication/x-patch; name=v7-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patchDownload+149-150
v7-0004-Share-prng-state-between-all-PGconns-in-process.patchapplication/x-patch; name=v7-0004-Share-prng-state-between-all-PGconns-in-process.patchDownload+32-7
v7-0002-Support-load-balancing-in-libpq.patchapplication/x-patch; name=v7-0002-Support-load-balancing-in-libpq.patchDownload+460-34
On Fri, Jan 13, 2023 at 9:10 AM Jelte Fennema <postgres@jeltef.nl> wrote:
Just a quick single-issue review, but I agree with Maxim that having
one PRNG, seeded once, would be simpler

I don't agree that it's simpler. Because now there's a mutex you have
to manage, and honestly cross-platform threading in C is not simple.
to manage, and honestly cross-platform threading in C is not simple.
However, I attached two additional patches that implement this
approach on top of the previous patchset. Just to make sure that
this patch is not blocked on this.
It hadn't been my intention to block the patch on it, sorry. Just
registering a preference.
I also didn't intend to make you refactor the locking code -- my
assumption was that you could use the existing pglock_thread() to
handle it, since it didn't seem like the additional contention would
hurt too much. Maybe that's not actually performant enough, in which
case my suggestion loses an advantage.
The test seed could then be handled globally as well (envvar?) so that you
don't have to introduce a debug-only option into the connection string.Why is a debug-only envvar any better than a debug-only connection option?
For now I kept the connection option approach, since to me they seem pretty
much equivalent.
I guess I worry less about envvar namespace pollution than pollution
of the connection options. And my thought was that the one-time
initialization could be moved to a place that doesn't need to know the
connection options at all, to make it easier to reason about the
architecture. Say, next to the WSAStartup machinery.
But as it is now, I agree that the implementation hasn't really lost
any complexity compared to the original, and I don't feel particularly
strongly about it. If it doesn't help to make the change, then it
doesn't help.
Thanks,
--Jacob