Add max_wal_replay_size connection parameter to libpq
Hi,
When connecting with target_session_attrs=standby (or prefer-standby,
read-only, any) and multiple standbys are available, libpq currently
selects the first acceptable candidate without regard for how "current"
its data is. A standby configured with recovery_min_apply_delay,
experiencing slow I/O, or otherwise lagging is treated the same as one
that is fully caught up.
I would like to propose a new libpq connection parameter,
max_wal_replay_size, that allows clients to skip standby servers whose
WAL replay backlog exceeds a given threshold.
Example:
psql "host=host1,host2,host3 port=5111,5222,5333 \
target_session_attrs=standby max_wal_replay_size=16MB"
When this parameter is set, libpq executes a small query during
connection establishment to evaluate:
pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())
on the standby. If the result exceeds the specified threshold, the
server is skipped and the next host in the list is tried. The check is
skipped entirely when target_session_attrs is set to primary or
read-write, since those modes already exclude standbys.
If pg_last_wal_receive_lsn() is NULL (e.g. no active WAL receiver due to
missing primary_conninfo or a disconnected upstream), the backlog cannot
be determined. In that case, the standby is treated as exceeding the
threshold and is skipped.
This parameter measures only the apply lag on the standby itself, i.e.,
how much already-received WAL remains to be replayed. It does not
attempt to measure how far the standby is behind the primary. In
particular, a standby that is slow to receive WAL but fast to replay it
may report a small backlog here while still being significantly behind.
The attached PoC patch may make the behaviour clearer.
Any feedback on this approach would be appreciated.
Best, Jim
Attachments:
v1-0001-Add-libpq-connection-parameter-max_wal_replay_siz.patchtext/x-patch; charset=UTF-8; name=v1-0001-Add-libpq-connection-parameter-max_wal_replay_siz.patchDownload+459-4
Hi Jim,
On Sun, Mar 29, 2026 at 10:56 AM Jim Jones <jim.jones@uni-muenster.de>
wrote:
Hi,
When connecting with target_session_attrs=standby (or prefer-standby,
read-only, any) and multiple standbys are available, libpq currently
selects the first acceptable candidate without regard for how "current"
its data is. A standby configured with recovery_min_apply_delay,
experiencing slow I/O, or otherwise lagging is treated the same as one
that is fully caught up.I would like to propose a new libpq connection parameter,
max_wal_replay_size, that allows clients to skip standby servers whose
WAL replay backlog exceeds a given threshold.Example:
psql "host=host1,host2,host3 port=5111,5222,5333 \
target_session_attrs=standby max_wal_replay_size=16MB"When this parameter is set, libpq executes a small query during
connection establishment to evaluate:pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())
on the standby. If the result exceeds the specified threshold, the
server is skipped and the next host in the list is tried. The check is
skipped entirely when target_session_attrs is set to primary or
read-write, since those modes already exclude standbys.
What if none of them meets the criteria? You fail the connection? Wouldn't
it cause an availability issue?
If pg_last_wal_receive_lsn() is NULL (e.g. no active WAL receiver due to
missing primary_conninfo or a disconnected upstream), the backlog cannot
be determined. In that case, the standby is treated as exceeding the
threshold and is skipped.
When a standby is replaying archiving log, it can still be caught up. This
doesn't seem right to me.
This parameter measures only the apply lag on the standby itself, i.e.,
how much already-received WAL remains to be replayed. It does not
attempt to measure how far the standby is behind the primary. In
particular, a standby that is slow to receive WAL but fast to replay it
may report a small backlog here while still being significantly behind.
IMHO, this change appears to not meet the objective of routing
connections/queries to the most up-to-date standby.
Thanks,
Satya
On 29/03/2026 20:31, SATYANARAYANA NARLAPURAM wrote:
What if none of them meets the criteria? You fail the connection?
Wouldn't it cause an availability issue?
Yes, the connection fails if no host meets the threshold. This is
intentional, and it is consistent with the existing behaviour of
target_session_attrs: if you set target_session_attrs=standby and no
standby is reachable, the connection fails too.
If pg_last_wal_receive_lsn() is NULL (e.g. no active WAL receiver due to
missing primary_conninfo or a disconnected upstream), the backlog cannot
be determined. In that case, the standby is treated as exceeding the
threshold and is skipped.When a standby is replaying archiving log, it can still be caught up.
This doesn't seem right to me.
I totally see your point here. The issue is that
pg_last_wal_receive_lsn() returns NULL when there is no WAL receiver
process -- regardless of how current the data actually is. Without a
receive LSN, the metric this parameter is based on (receive_lsn -
replay_lsn) is simply undefined for that standby.
Please let me know if I am missing something here.
This parameter measures only the apply lag on the standby itself, i.e.,
how much already-received WAL remains to be replayed. It does not
attempt to measure how far the standby is behind the primary. In
particular, a standby that is slow to receive WAL but fast to replay it
may report a small backlog here while still being significantly behind.IMHO, this change appears to not meet the objective of routing
connections/queries to the most up-to-date standby.
The parameter's objective is not to route to the most up-to-date
standby; it is to skip standbys whose apply lag exceeds a given threshold.
Thanks for the quick feedback. Much appreciated!
Best, Jim
Hi,
On Sun, Mar 29, 2026 at 11:53 AM Jim Jones <jim.jones@uni-muenster.de>
wrote:
On 29/03/2026 20:31, SATYANARAYANA NARLAPURAM wrote:
What if none of them meets the criteria? You fail the connection?
Wouldn't it cause an availability issue?Yes, the connection fails if no host meets the threshold. This is
intentional, and it is consistent with the existing behaviour of
target_session_attrs: if you set target_session_attrs=standby and no
standby is reachable, the connection fails too.If pg_last_wal_receive_lsn() is NULL (e.g. no active WAL receiver
due to
missing primary_conninfo or a disconnected upstream), the backlog
cannot
be determined. In that case, the standby is treated as exceeding the
threshold and is skipped.When a standby is replaying archiving log, it can still be caught up.
This doesn't seem right to me.I totally see your point here. The issue is that
pg_last_wal_receive_lsn() returns NULL when there is no WAL receiver
process -- regardless of how current the data actually is. Without a
receive LSN, the metric this parameter is based on (receive_lsn -
replay_lsn) is simply undefined for that standby.Please let me know if I am missing something here.
This parameter measures only the apply lag on the standby itself,
i.e.,
how much already-received WAL remains to be replayed. It does not
attempt to measure how far the standby is behind the primary. In
particular, a standby that is slow to receive WAL but fast to replayit
may report a small backlog here while still being significantly
behind.
IMHO, this change appears to not meet the objective of routing
connections/queries to the most up-to-date standby.The parameter's objective is not to route to the most up-to-date
standby; it is to skip standbys whose apply lag exceeds a given threshold.
What is the expectation from such a routing? Is it for freshness of data
for the client or
freeing up the standby from user connections so that it can catch up with
primary?
The paragraph described originally was talking about the freshness.
Thanks,
Satya
Hi Satya
On 30/03/2026 01:51, SATYANARAYANA NARLAPURAM wrote:
What is the expectation from such a routing? Is it for freshness of data
for the client or
freeing up the standby from user connections so that it can catch up
with primary?
The paragraph described originally was talking about the freshness.
The motivation is closer to the former, but in a specific sense. The
concrete situations I have in mind are:
* Standbys that are I/O-bound and falling behind on replaying received WAL
* Standbys catching up after a pause
* Standbys running with recovery_min_apply_delay (intentionally delayed
replication).
In these cases, the client would read data older than what the standby
has already received. Routing around them is what the parameter enables.
The parameter does not measure how far the standby is behind the primary
-- it only measures locally buffered WAL that hasn't been applied yet. I
see I should have been clearer about that in the original email.
Let me know if I am missing something here.
Thanks!
Best, Jim