Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

Started by SATYANARAYANA NARLAPURAM over 4 years ago | 36 messages | hackers
#1 SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com

Hi Hackers,

I am considering implementing RPO (recovery point objective) enforcement
feature for Postgres where the WAL writes on the primary are stalled when
the WAL distance between the primary and standby exceeds the configured
(replica_lag_in_bytes) threshold. This feature is useful particularly in
the disaster recovery setups where primary and standby are in different
regions and synchronous replication can't be set up for latency and
performance reasons yet requires some level of RPO enforcement.

The idea here is to calculate the lag between the primary and the standby
(Async?) server during XLogInsert and block the caller until the lag is
less than the threshold value. We can calculate the max lag by iterating
over ReplicationSlotCtl->replication_slots. If this is not something we
want to do in core, at least adding a hook for XLogInsert would be of
great value.
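
A minimal sketch of the lag check described above, written as standalone C rather than against the server's headers. WalPosition stands in for XLogRecPtr, and SlotInfo, max_replica_lag_bytes and should_block_wal_insert are hypothetical names used only for illustration; a real implementation would read the slot array in shared memory under the appropriate locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for XLogRecPtr: a 64-bit byte position in the WAL stream. */
typedef uint64_t WalPosition;

/* Hypothetical, simplified view of one replication slot. */
typedef struct SlotInfo
{
    bool        in_use;         /* slot is allocated */
    WalPosition confirmed_lsn;  /* WAL position this standby is known to have */
} SlotInfo;

/* Largest distance in bytes between the primary and any in-use slot. */
static uint64_t
max_replica_lag_bytes(WalPosition primary_lsn, const SlotInfo *slots,
                      size_t nslots)
{
    uint64_t    max_lag = 0;

    for (size_t i = 0; i < nslots; i++)
    {
        uint64_t    lag;

        if (!slots[i].in_use)
            continue;
        /* A slot at or ahead of the primary position contributes no lag. */
        if (slots[i].confirmed_lsn >= primary_lsn)
            continue;

        lag = primary_lsn - slots[i].confirmed_lsn;
        if (lag > max_lag)
            max_lag = lag;
    }
    return max_lag;
}

/* True when the caller would have to wait under the proposed policy. */
static bool
should_block_wal_insert(WalPosition primary_lsn, const SlotInfo *slots,
                        size_t nslots, uint64_t replica_lag_in_bytes)
{
    assert(slots != NULL || nslots == 0);
    return max_replica_lag_bytes(primary_lsn, slots, nslots) >
        replica_lag_in_bytes;
}
```

Taking the maximum means the slowest standby determines whether writers wait; whether all slots or only a configured subset should count is left open in the proposal.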

A few other scenarios I can think of with the hook are:

1. Enforcing RPO as described above
2. Enforcing rate limit and slow throttling when sync standby is falling
behind (could be flush lag or replay lag)
3. Transactional log rate governance - useful for cloud providers to
provide SKU sizes based on allowed WAL writes.

Thoughts?

Thanks,
Satya

#2 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#1)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

Hi Hackers,

I am considering implementing RPO (recovery point objective) enforcement feature for Postgres where the WAL writes on the primary are stalled when the WAL distance between the primary and standby exceeds the configured (replica_lag_in_bytes) threshold. This feature is useful particularly in the disaster recovery setups where primary and standby are in different regions and synchronous replication can't be set up for latency and performance reasons yet requires some level of RPO enforcement.

+1 for the idea in general. However, blocking writes on the primary seems
like an extremely radical idea. Replicas can fall behind transiently at
times, and blocking writes on the primary may cause applications to stall
or fail during those transient periods. This is not a problem if the
applications have retry logic for their writes. How about blocking writes on
the primary only if the replicas stay behind for a certain period of time?

The idea here is to calculate the lag between the primary and the standby (Async?) server during XLogInsert and block the caller until the lag is less than the threshold value. We can calculate the max lag by iterating over ReplicationSlotCtl->replication_slots.

The "falling behind" can also be quantified by the number of
write-transactions on the primary. I think it's good to have the users
choose what the "falling behind" means for them. We can have something
like the "recovery_target" param with different options name, xid,
time, lsn.

If this is not something we want to do in core, at least adding a hook for XLogInsert would be of great value.

IMHO, this feature may not be needed by everyone, so the hook approach seems
reasonable: Postgres vendors can provide different implementations (for
instance, they can write an extension that implements this hook, which can
block writes on the primary, write some log messages, inform some service
layer of the replicas falling behind the primary, etc.). If we were to have
the hook in XLogInsert, which is a hot path and gets called very frequently,
the hook really should do as little work as possible; otherwise the latency
of write operations may increase.

A few other scenarios I can think of with the hook are:

Enforcing RPO as described above
Enforcing rate limit and slow throttling when sync standby is falling behind (could be flush lag or replay lag)
Transactional log rate governance - useful for cloud providers to provide SKU sizes based on allowed WAL writes.

Thoughts?

The hook can help to achieve the above objectives but where to place
it and what parameters it should take as input (or what info it should
emit out of the server via the hook) are important too.

Having said all that, the RPO feature can also be implemented outside of
Postgres. A simple implementation could be: get the primary's current WAL
LSN using pg_current_wal_lsn() and the restart_lsn of each replica from
pg_replication_slots; if they differ by more than a certain amount, issue
the ALTER SYSTEM SET READ ONLY command [1] on the primary. This requires
connections to the server and proper access rights. The feature can also be
implemented as an extension (without the hook), which doesn't require any
connections to the server yet can access the required info (the primary's
current WAL LSN, the restart_lsn of the replication slots, etc.), but the
RPO enforcement may not be immediate, as the server doesn't have any hooks
in XLogInsert or some other area.
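
For the external-monitoring variant, the client has to turn the textual LSNs it reads into a byte distance. LSNs are printed as two hexadecimal numbers separated by a slash (high 32 bits, then low 32 bits). The sketch below does that conversion in standalone C; parse_lsn and wal_lag_bytes are hypothetical helper names, and the parsing is deliberately simplified (it relies on sscanf and does not reject every malformed input).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse an LSN in its textual "XXXXXXXX/XXXXXXXX" form into 64 bits. */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi;
    unsigned int lo;
    int         consumed = 0;

    assert(text != NULL && lsn != NULL);

    if (sscanf(text, "%X/%X%n", &hi, &lo, &consumed) != 2)
        return false;
    if (text[consumed] != '\0')
        return false;           /* trailing garbage after the LSN */

    *lsn = ((uint64_t) hi << 32) | lo;
    return true;
}

/*
 * Byte distance from a slot's LSN to the primary's current LSN.
 * Returns false if either string cannot be parsed.
 */
static bool
wal_lag_bytes(const char *primary_text, const char *slot_text, uint64_t *lag)
{
    uint64_t    primary;
    uint64_t    slot;

    if (!parse_lsn(primary_text, &primary) || !parse_lsn(slot_text, &slot))
        return false;

    *lag = (primary > slot) ? primary - slot : 0;
    return true;
}
```

A monitoring script that stays in SQL can skip the parsing and use pg_wal_lsn_diff() for the same subtraction.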

[1]: /messages/by-id/CAAJ_b967uKBiW6gbHr5aPzweURYjEGv333FHVHxvJmMhanwHXA@mail.gmail.com

Regards,
Bharath Rupireddy.

#3 Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: SATYANARAYANA NARLAPURAM (#1)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

Hi Hackers,

I am considering implementing RPO (recovery point objective) enforcement feature for Postgres where the WAL writes on the primary are stalled when the WAL distance between the primary and standby exceeds the configured (replica_lag_in_bytes) threshold. This feature is useful particularly in the disaster recovery setups where primary and standby are in different regions and synchronous replication can't be set up for latency and performance reasons yet requires some level of RPO enforcement.

Limiting transaction rate when the standby falls behind is a good feature ...

The idea here is to calculate the lag between the primary and the standby (Async?) server during XLogInsert and block the caller until the lag is less than the threshold value. We can calculate the max lag by iterating over ReplicationSlotCtl->replication_slots. If this is not something we want to do in core, at least adding a hook for XLogInsert would be of great value.

but doing it in XLogInsert does not seem to be a good idea. It's a
common point for all kinds of logging, including VACUUM. We could
accidentally stall a critical VACUUM operation because of that.

As Bharath described, this would be better handled through application-level monitoring.

--
Best Wishes,
Ashutosh Bapat

#4 SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Ashutosh Bapat (#3)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Thu, Dec 23, 2021 at 5:18 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:

On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

Hi Hackers,

I am considering implementing RPO (recovery point objective) enforcement
feature for Postgres where the WAL writes on the primary are stalled when
the WAL distance between the primary and standby exceeds the configured
(replica_lag_in_bytes) threshold. This feature is useful particularly in
the disaster recovery setups where primary and standby are in different
regions and synchronous replication can't be set up for latency and
performance reasons yet requires some level of RPO enforcement.

Limiting transaction rate when the standby falls behind is a good feature
...

The idea here is to calculate the lag between the primary and the
standby (Async?) server during XLogInsert and block the caller until the
lag is less than the threshold value. We can calculate the max lag by
iterating over ReplicationSlotCtl->replication_slots. If this is not
something we want to do in core, at least adding a hook for
XLogInsert would be of great value.

but doing it in XLogInsert does not seem to be a good idea.

Isn't XLogInsert the best place to throttle/govern in a simple and fair
way, particularly for long-running transactions on the server?

It's a common point for all kinds of logging, including VACUUM. We could
accidentally stall a critical VACUUM operation because of that.

Agreed, but again this is a policy decision that the DBA can relax or
enforce. I expect the RPO to be in the range of a few hundred MB to several
GB, and on a healthy system the lag typically never comes close to this
value. The hook implementation can take care of the nitty-gritty details of
policy enforcement based on the needs: for example, not throttling some
backend processes such as vacuum and the checkpointer; throttling based on
roles, for example not throttling superuser connections; and throttling
based on replay lag, write lag, checkpoints taking longer, or the disk
getting close to full. Each of these can be easily translated into GUCs.
Depending on the direction of the thread on the hook vs a feature in core, I
can add more implementation details.

As Bharath described, it better be handled at the application level
monitoring.

Both RPO based WAL throttling and application level monitoring can co-exist
as each one has its own merits and challenges. Each application developer
has to implement their own throttling logic and often times it is hard to
get it right.


--
Best Wishes,
Ashutosh Bapat

#5 SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#1)
Fwd: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

Please find the attached draft patch.

On Thu, Dec 23, 2021 at 2:47 AM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

Hi Hackers,

I am considering implementing RPO (recovery point objective) enforcement
feature for Postgres where the WAL writes on the primary are stalled when
the WAL distance between the primary and standby exceeds the configured
(replica_lag_in_bytes) threshold. This feature is useful particularly in
the disaster recovery setups where primary and standby are in different
regions and synchronous replication can't be set up for latency and
performance reasons yet requires some level of RPO enforcement.

+1 for the idea in general. However, blocking writes on primary seems
an extremely radical idea. The replicas can fall behind transiently at
times and blocking writes on the primary may stop applications failing
for these transient times. This is not a problem if the applications
have retry logic for the writes. How about blocking writes on primary
if the replicas fall behind the primary for a certain period of time?

My proposal is to block the caller from writing until the lag situation
improves. I don't want to throw any errors and fail the transaction. I think
we are aligned?

The idea here is to calculate the lag between the primary and the
standby (Async?) server during XLogInsert and block the caller until the
lag is less than the threshold value. We can calculate the max lag by
iterating over ReplicationSlotCtl->replication_slots.

The "falling behind" can also be quantified by the number of
write-transactions on the primary. I think it's good to have the users
choose what the "falling behind" means for them. We can have something
like the "recovery_target" param with different options name, xid,
time, lsn.

The transactions can be of arbitrary size and length and these options may
not provide the desired results. Time is a worthy option to add.

If this is not something we want to do in core, at least
adding a hook for XLogInsert would be of great value.

IMHO, this feature may not be needed by everyone, the hook-way seems
reasonable so that the postgres vendors can provide different
implementations (for instance they can write an extension that
implements this hook which can block writes on primary, write some log
messages, inform some service layer of the replicas falling behind the
primary etc.). If we were to have the hook in XLogInsert which gets
called so frequently or XLogInsert is a hot-path, the hook really
should do as little work as possible, otherwise the write operations
latency may increase.

A Hook is a good start. If there is enough interest then an extension can
be added to the contrib module.

A few other scenarios I can think of with the hook are:

Enforcing RPO as described above
Enforcing rate limit and slow throttling when sync standby is falling behind (could be flush lag or replay lag)
Transactional log rate governance - useful for cloud providers to provide SKU sizes based on allowed WAL writes.

Thoughts?

The hook can help to achieve the above objectives but where to place
it and what parameters it should take as input (or what info it should
emit out of the server via the hook) are important too.

XLogInsert in my opinion is the best place to call it and the hook can be
something like this "void xlog_insert_hook(NULL)" as all the throttling
logic required is the current flush position which can be obtained
from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.

Having said all, the RPO feature can also be implemented outside of
the postgres, a simple implementation could be - get the primary
current wal lsn using pg_current_wal_lsn and all the replicas
restart_lsn using pg_replication_slot, if they differ by certain
amount, then issue ALTER SYSTEM SET READ ONLY command [1] on the
primary, this requires the connections to the server and proper access
rights. This feature can also be implemented as an extension (without
the hook) which doesn't require any connections to the server yet can
access the required info primary current_wal_lsn, restart_lsn of the
replication slots etc, but the RPO enforcement may not be immediate as
the server doesn't have any hooks in XLogInsert or some other area.

READ ONLY is a decent choice, but it can fail the writes or not take
effect until the end of the transaction?


[1] -
/messages/by-id/CAAJ_b967uKBiW6gbHr5aPzweURYjEGv333FHVHxvJmMhanwHXA@mail.gmail.com

Regards,
Bharath Rupireddy.

Attachments:

0001-Add-xlog_insert_hook-to-give-control-to-the-plugins.patch (application/octet-stream, +14 -1)

#6 Dilip Kumar
dilipbalaut@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#5)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

XLogInsert in my opinion is the best place to call it and the hook can be
something like this "void xlog_insert_hook(NULL)" as all the throttling
logic required is the current flush position which can be obtained
from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.

IMHO, it is not a good idea to call an external hook function inside a
critical section. Generally, we ensure that we do not call any code path
within a critical section which can throw an error and if we start calling
the external hook then we lose that control. It should be blocked at the
operation level itself e.g. ALTER TABLE READ ONLY, or by some other hook at
a little higher level.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#7 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Dilip Kumar (#6)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Fri, Dec 24, 2021 at 4:43 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:

XLogInsert in my opinion is the best place to call it and the hook can be something like this "void xlog_insert_hook(NULL)" as all the throttling logic required is the current flush position which can be obtained from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.

IMHO, it is not a good idea to call an external hook function inside a critical section. Generally, we ensure that we do not call any code path within a critical section which can throw an error and if we start calling the external hook then we lose that control. It should be blocked at the operation level itself e.g. ALTER TABLE READ ONLY, or by some other hook at a little higher level.

Yeah, good point. It's not advisable to give control to an
external module in the critical section. For instance, memory
allocation isn't allowed (see [1]) and an ereport(ERROR,....) would
transform to PANIC inside the critical section (see [2], [3]).
Moreover, a critical section is meant to be short, i.e. executing
as little code as possible. There's no guarantee that an external
module would follow these rules.

I suggest we do it at the level of transaction start i.e. when a txnid
is getting allocated i.e. in AssignTransactionId(). If we do this,
when the limit for the throttling is exceeded, the current txn (even
if it is a long running txn) continues to do the WAL insertions, the
next txns would get blocked. But this is okay and can be conveyed to
the users via documentation if need be. We do block txnid assignments
for parallel workers in this function, so this is a good choice IMO.

Thoughts?

[1]:
/*
 * You should not do memory allocations within a critical section, because
 * an out-of-memory error will be escalated to a PANIC. To enforce that
 * rule, the allocation functions Assert that.
 */
#define AssertNotInCriticalSection(context) \
    Assert(CritSectionCount == 0 || (context)->allowInCritSection)

[2]:
/*
 * If we are inside a critical section, all errors become PANIC
 * errors. See miscadmin.h.
 */
if (CritSectionCount > 0)
    elevel = PANIC;

[3]:
 * A related, but conceptually distinct, mechanism is the "critical section"
 * mechanism. A critical section not only holds off cancel/die interrupts,
 * but causes any ereport(ERROR) or ereport(FATAL) to become ereport(PANIC)
 * --- that is, a system-wide reset is forced. Needless to say, only really
 * *critical* code should be marked as a critical section! Currently, this
 * mechanism is only used for XLOG-related code.

Regards,
Bharath Rupireddy.

#8 SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Dilip Kumar (#6)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Fri, Dec 24, 2021 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

XLogInsert in my opinion is the best place to call it and the hook can be
something like this "void xlog_insert_hook(NULL)" as all the throttling
logic required is the current flush position which can be obtained
from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.

IMHO, it is not a good idea to call an external hook function inside a
critical section. Generally, we ensure that we do not call any code path
within a critical section which can throw an error and if we start calling
the external hook then we lose that control.

Thank you for the comment. XLogInsertRecord is inside a critical section
but not XLogInsert. Am I missing something?

It should be blocked at the operation level itself e.g. ALTER TABLE READ
ONLY, or by some other hook at a little higher level.

There is a lot of maintenance overhead with a custom implementation at
individual databases and tables level. This doesn't provide the necessary
control that I am looking for.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#9 Dilip Kumar
dilipbalaut@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#8)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Sun, Dec 26, 2021 at 3:52 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

On Fri, Dec 24, 2021 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

XLogInsert in my opinion is the best place to call it and the hook can
be something like this "void xlog_insert_hook(NULL)" as all the throttling
logic required is the current flush position which can be obtained
from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.

IMHO, it is not a good idea to call an external hook function inside a
critical section. Generally, we ensure that we do not call any code path
within a critical section which can throw an error and if we start calling
the external hook then we lose that control.

Thank you for the comment. XLogInsertRecord is inside a critical section
but not XLogInsert. Am I missing something?

Actually all the WAL insertions are done under a critical section (with a
few exceptions). That means that if you look at all the references to
XLogInsert(), it is always called under the critical section, and that is my
main worry about hooking at the XLogInsert level.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#10 SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Dilip Kumar (#9)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Sat, Dec 25, 2021 at 6:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sun, Dec 26, 2021 at 3:52 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

On Fri, Dec 24, 2021 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com>
wrote:

On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

XLogInsert in my opinion is the best place to call it and the hook can
be something like this "void xlog_insert_hook(NULL)" as all the throttling
logic required is the current flush position which can be obtained
from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.

IMHO, it is not a good idea to call an external hook function inside a
critical section. Generally, we ensure that we do not call any code path
within a critical section which can throw an error and if we start calling
the external hook then we lose that control.

Thank you for the comment. XLogInsertRecord is inside a critical section
but not XLogInsert. Am I missing something?

Actually all the WAL insertions are done under a critical section (except
few exceptions), that means if you see all the references of XLogInsert(),
it is always called under the critical section and that is my main worry
about hooking at XLogInsert level.

Got it, understood the concern. But can we document the limitations of the
hook and let the hook take care of it? I don't expect an error to be thrown
here since we are not planning to allocate memory or make file system calls
but instead look at the shared memory state and add delays when required.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#11 Julien Rouhaud
rjuju123@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#10)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Sun, Dec 26, 2021 at 1:06 PM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

Got it, understood the concern. But can we document the limitations of the hook and let the hook take care of it? I don't expect an error to be thrown here since we are not planning to allocate memory or make file system calls but instead look at the shared memory state and add delays when required.

It wouldn't work. You can't make any assumption about how long it
would take for the replication lag to resolve, so you may have to wait
for a very long time. It means that at the very least the sleep has
to be interruptible and therefore can raise an error. In general
there isn't much you can do in a critical section, so this approach
doesn't seem sensible to me.
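
The argument can be made concrete with a toy model of one iteration of such a wait loop. It is standalone C, not server code: WaitAction and next_wait_action are hypothetical names, and crit_section_count merely mimics the role that CritSectionCount plays in the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Possible outcomes of one iteration of a hypothetical throttling loop. */
typedef enum WaitAction
{
    WAIT_DONE,      /* lag is back under the limit, caller may proceed */
    WAIT_SLEEP,     /* still lagging, sleep briefly and check again */
    WAIT_ERROR,     /* cancel/die request honoured by raising an error */
    WAIT_PANIC      /* the same error raised inside a critical section */
} WaitAction;

static WaitAction
next_wait_action(uint64_t lag_bytes, uint64_t limit_bytes,
                 bool interrupt_pending, int crit_section_count)
{
    assert(crit_section_count >= 0);

    if (lag_bytes <= limit_bytes)
        return WAIT_DONE;

    /*
     * The wait has no upper bound, so it has to honour interrupts. Doing so
     * means raising an error, and inside a critical section every error is
     * promoted to PANIC.
     */
    if (interrupt_pending)
        return (crit_section_count > 0) ? WAIT_PANIC : WAIT_ERROR;

    return WAIT_SLEEP;
}
```

In this model the only safe combinations are the ones where crit_section_count is zero, which is why the placement of the wait matters more than what the hook itself does.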

#12 Dilip Kumar
dilipbalaut@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#10)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

Actually all the WAL insertions are done under a critical section (except
few exceptions), that means if you see all the references of XLogInsert(),
it is always called under the critical section and that is my main worry
about hooking at XLogInsert level.

Got it, understood the concern. But can we document the limitations of the
hook and let the hook take care of it? I don't expect an error to be thrown
here since we are not planning to allocate memory or make file system calls
but instead look at the shared memory state and add delays when required.

Yet another problem is that if we are in XLogInsert(), that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait, then we would also block any
read operations that need to read from those buffers. I haven't thought about
what could be a better way to do this, but this is certainly not good.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#13 SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Dilip Kumar (#12)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

Actually all the WAL insertions are done under a critical section
(except few exceptions), that means if you see all the references of
XLogInsert(), it is always called under the critical section and that is my
main worry about hooking at XLogInsert level.

Got it, understood the concern. But can we document the limitations of
the hook and let the hook take care of it? I don't expect an error to be
thrown here since we are not planning to allocate memory or make file
system calls but instead look at the shared memory state and add delays
when required.

Yet another problem is that if we are in XlogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
what could be better way to do this but this is certainly not good.

Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush? All the other backends will be waiting behind the
WALWriteLock. The process that is performing the write enters into a busy
loop with small delays until the criteria are met. Inability to process the
interrupts inside the critical section is a challenge in both approaches.
Any other thoughts?


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#14 Stephen Frost
sfrost@snowman.net
In reply to: SATYANARAYANA NARLAPURAM (#13)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

Greetings,

* SATYANARAYANA NARLAPURAM (satyanarlapuram@gmail.com) wrote:

On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

Actually all the WAL insertions are done under a critical section
(except few exceptions), that means if you see all the references of
XLogInsert(), it is always called under the critical section and that is my
main worry about hooking at XLogInsert level.

Got it, understood the concern. But can we document the limitations of
the hook and let the hook take care of it? I don't expect an error to be
thrown here since we are not planning to allocate memory or make file
system calls but instead look at the shared memory state and add delays
when required.

Yet another problem is that if we are in XlogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
what could be better way to do this but this is certainly not good.

Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush? All the other backends will be waiting behind the
WALWriteLock. The process that is performing the write enters into a busy
loop with small delays until the criteria are met. Inability to process the
interrupts inside the critical section is a challenge in both approaches.
Any other thoughts?

Why not have this work the exact same way sync replicas do, except that
it's based off of some byte/time lag for some set of async replicas?
That is, in RecordTransactionCommit(), perhaps right after the
SyncRepWaitForLSN() call, or maybe even add this to that function? Sure
seems like there's a lot of similarity.
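
A rough sketch of what the commit-time condition might look like, assuming the lag is measured in bytes against the slowest tracked async replica. commit_may_return and its parameters are hypothetical names for illustration only; nothing here is taken from SyncRepWaitForLSN().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A commit whose record ends at commit_lsn may return to the client once the
 * slowest tracked replica is no more than allowed_lag_bytes behind it.
 */
static bool
commit_may_return(uint64_t commit_lsn, uint64_t slowest_replica_lsn,
                  uint64_t allowed_lag_bytes)
{
    /* A replica already at or past the commit record never holds it up. */
    if (slowest_replica_lsn >= commit_lsn)
        return true;

    return (commit_lsn - slowest_replica_lsn) <= allowed_lag_bytes;
}
```

With allowed_lag_bytes set to zero the condition requires the replica to have reached the commit record itself, which is the similarity to synchronous replication pointed out above.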

Thanks,

Stephen

#15 SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Stephen Frost (#14)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

Stephen, thank you!

On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* SATYANARAYANA NARLAPURAM (satyanarlapuram@gmail.com) wrote:

On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

Actually all the WAL insertions are done under a critical section
(except few exceptions), that means if you see all the references of
XLogInsert(), it is always called under the critical section and that is my
main worry about hooking at XLogInsert level.

Got it, understood the concern. But can we document the limitations of
the hook and let the hook take care of it? I don't expect an error to be
thrown here since we are not planning to allocate memory or make file
system calls but instead look at the shared memory state and add delays
when required.

Yet another problem is that if we are in XlogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
what could be better way to do this but this is certainly not good.

Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush? All the other backends will be waiting behind the
WALWriteLock. The process that is performing the write enters into a busy
loop with small delays until the criteria are met. Inability to process the
interrupts inside the critical section is a challenge in both approaches.
Any other thoughts?

Why not have this work the exact same way sync replicas do, except that
it's based off of some byte/time lag for some set of async replicas?
That is, in RecordTransactionCommit(), perhaps right after the
SyncRepWaitForLSN() call, or maybe even add this to that function? Sure
seems like there's a lot of similarity.

I was thinking of achieving log governance (throttling WAL MB/sec) and also
providing RPO guarantees. In this model, it is hard to throttle WAL
generation of a long running transaction (for example copy/select into).
However, this meets my RPO needs. Are you in support of adding a hook or
the actual change? IMHO, the hook allows more creative options. I can go
ahead and make a patch accordingly.
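
To make the log-governance case (throttling WAL MB/sec) concrete, here is a minimal standalone sketch in C of the rate arithmetic only. The function name wal_throttle_delay_us() and its parameters are hypothetical and do not exist in PostgreSQL. The sketch deliberately says nothing about where in the WAL code path such a delay could safely be applied, which is the open question in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define USECS_PER_SEC UINT64_C(1000000)

/*
 * Hypothetical helper: given the WAL bytes a backend has generated in the
 * current accounting window and the time elapsed in that window, return how
 * many microseconds it would need to pause so that its average rate does not
 * exceed max_bytes_per_sec. Returns 0 if it is already under the limit.
 */
uint64_t
wal_throttle_delay_us(uint64_t bytes_written, uint64_t elapsed_us,
                      uint64_t max_bytes_per_sec)
{
    assert(max_bytes_per_sec > 0);

    /*
     * Minimum duration the window must span at the allowed rate. The
     * division is split into quotient and remainder parts so the
     * intermediate product stays small for large byte counts.
     */
    uint64_t required_us =
        (bytes_written / max_bytes_per_sec) * USECS_PER_SEC +
        (bytes_written % max_bytes_per_sec) * USECS_PER_SEC / max_bytes_per_sec;

    return required_us > elapsed_us ? required_us - elapsed_us : 0;
}
```

A check like this applied only at commit time would see the whole byte count of a long-running COPY at once, which is why the commit-time model makes it hard to throttle such a transaction while it is still generating WAL.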

Thanks,

Stephen

#16Stephen Frost
sfrost@snowman.net
In reply to: SATYANARAYANA NARLAPURAM (#15)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

Greetings,

On Wed, Dec 29, 2021 at 14:04 SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

Stephen, thank you!

On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* SATYANARAYANA NARLAPURAM (satyanarlapuram@gmail.com) wrote:

On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

Actually all the WAL insertions are done under a critical section
(with a few exceptions), that means if you see all the references of
XLogInsert(), it is always called under the critical section and that is my
main worry about hooking at XLogInsert level.

Got it, understood the concern. But can we document the limitations of
the hook and let the hook take care of it? I don't expect an error to be
thrown here since we are not planning to allocate memory or make file
system calls but instead look at the shared memory state and add delays
when required.

Yet another problem is that if we are in XLogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
about what could be a better way to do this but this is certainly not good.

Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush? All the other backends will be waiting behind the
WALWriteLock. The process that is performing the write enters into a busy
loop with small delays until the criteria are met. Inability to process the
interrupts inside the critical section is a challenge in both approaches.
Any other thoughts?

Why not have this work the exact same way sync replicas do, except that
it's based off of some byte/time lag for some set of async replicas?
That is, in RecordTransactionCommit(), perhaps right after the
SyncRepWaitForLSN() call, or maybe even add this to that function? Sure
seems like there's a lot of similarity.

I was thinking of achieving log governance (throttling WAL MB/sec) and
also providing RPO guarantees. In this model, it is hard to throttle WAL
generation of a long running transaction (for example copy/select into).

Long running transactions have a lot of downsides and are best discouraged.
I don’t know that we should be designing this for that case specifically,
particularly given the complications it would introduce as discussed on
this thread already.

However, this meets my RPO needs. Are you in support of adding a hook or
the actual change? IMHO, the hook allows more creative options. I can go
ahead and make a patch accordingly.

I would think this would make more sense as part of core rather than a
hook, as that then requires an extension and additional setup to get going,
which raises the bar quite a bit when it comes to actually being used.

Thanks,

Stephen

#17SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Stephen Frost (#16)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Wed, Dec 29, 2021 at 11:16 AM Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

On Wed, Dec 29, 2021 at 14:04 SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

Stephen, thank you!

On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* SATYANARAYANA NARLAPURAM (satyanarlapuram@gmail.com) wrote:

On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:

Actually all the WAL insertions are done under a critical section
(with a few exceptions), that means if you see all the references of
XLogInsert(), it is always called under the critical section and that is my
main worry about hooking at XLogInsert level.

Got it, understood the concern. But can we document the limitations of
the hook and let the hook take care of it? I don't expect an error to be
thrown here since we are not planning to allocate memory or make file
system calls but instead look at the shared memory state and add delays
when required.

Yet another problem is that if we are in XLogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
about what could be a better way to do this but this is certainly not good.

Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush? All the other backends will be waiting behind the
WALWriteLock. The process that is performing the write enters into a busy
loop with small delays until the criteria are met. Inability to process the
interrupts inside the critical section is a challenge in both approaches.
Any other thoughts?

Why not have this work the exact same way sync replicas do, except that
it's based off of some byte/time lag for some set of async replicas?
That is, in RecordTransactionCommit(), perhaps right after the
SyncRepWaitForLSN() call, or maybe even add this to that function? Sure
seems like there's a lot of similarity.

I was thinking of achieving log governance (throttling WAL MB/sec) and
also providing RPO guarantees. In this model, it is hard to throttle WAL
generation of a long running transaction (for example copy/select into).

Long running transactions have a lot of downsides and are best
discouraged. I don’t know that we should be designing this for that case
specifically, particularly given the complications it would introduce as
discussed on this thread already.

However, this meets my RPO needs. Are you in support of adding a hook or
the actual change? IMHO, the hook allows more creative options. I can go
ahead and make a patch accordingly.

I would think this would make more sense as part of core rather than a
hook, as that then requires an extension and additional setup to get going,
which raises the bar quite a bit when it comes to actually being used.

Sounds good, I will work on making the changes accordingly.

Thanks,

Stephen

#18Andres Freund
andres@anarazel.de
In reply to: SATYANARAYANA NARLAPURAM (#13)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

Hi,

On 2021-12-27 16:40:28 -0800, SATYANARAYANA NARLAPURAM wrote:

Yet another problem is that if we are in XLogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
about what could be a better way to do this but this is certainly not good.

Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush?

That's pretty much the same - XLogInsert() can trigger an
XLogWrite()/Flush().

I think it's a complete no-go to add throttling to these places. It's quite
possible that it'd cause new deadlocks, and it's almost guaranteed to have
unintended consequences (e.g. replication falling back further because
XLogFlush() is being throttled).

I also don't think it's a sane thing to add hooks to these places. It's
complicated enough as-is, adding the chance for random other things to happen
during such crucial operations will make it even harder to maintain.

Greetings,

Andres Freund

#19SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Andres Freund (#18)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

On Wed, Dec 29, 2021 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-12-27 16:40:28 -0800, SATYANARAYANA NARLAPURAM wrote:

Yet another problem is that if we are in XLogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
about what could be a better way to do this but this is certainly not good.

Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush?

That's pretty much the same - XLogInsert() can trigger an
XLogWrite()/Flush().

I think it's a complete no-go to add throttling to these places. It's quite
possible that it'd cause new deadlocks, and it's almost guaranteed to have
unintended consequences (e.g. replication falling back further because
XLogFlush() is being throttled).

I also don't think it's a sane thing to add hooks to these places. It's
complicated enough as-is, adding the chance for random other things to happen
during such crucial operations will make it even harder to maintain.

Andres, thanks for the comments. Agreed on this based on the previous
discussions on this thread. Could you please share your thoughts on adding
it after SyncRepWaitForLSN()?

Greetings,

Andres Freund

#20Andres Freund
andres@anarazel.de
In reply to: SATYANARAYANA NARLAPURAM (#19)
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

Hi,

On 2021-12-29 11:34:53 -0800, SATYANARAYANA NARLAPURAM wrote:

On Wed, Dec 29, 2021 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
Andres, thanks for the comments. Agreed on this based on the previous
discussions on this thread. Could you please share your thoughts on adding
it after SyncRepWaitForLSN()?

I don't think that's good either - you're delaying transaction commit
(i.e. xact becoming visible / locks being released). That also has the danger
of increasing lock contention (albeit more likely to be heavyweight locks /
serializable state). It'd have to be after the transaction actually committed.

Greetings,

Andres Freund

#21Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andres Freund (#20)
#22SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Dilip Kumar (#21)
#23Dilip Kumar
dilipbalaut@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#22)
#24Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Dilip Kumar (#23)
#25Dilip Kumar
dilipbalaut@gmail.com
In reply to: Bharath Rupireddy (#24)
#26SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Dilip Kumar (#25)
#27Andres Freund
andres@anarazel.de
In reply to: SATYANARAYANA NARLAPURAM (#22)
#28Ashwin Agrawal
aagrawal@pivotal.io
In reply to: SATYANARAYANA NARLAPURAM (#1)
#29Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#18)
#30Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andres Freund (#29)
#31SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Andres Freund (#29)
#32Dilip Kumar
dilipbalaut@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#31)
#33SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Dilip Kumar (#32)
#34Nathan Bossart
nathandbossart@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#33)
#35Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Nathan Bossart (#34)
#36Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Konstantin Knizhnik (#35)