Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Hi Hackers,
I am considering implementing an RPO (recovery point objective) enforcement
feature for Postgres, where WAL writes on the primary are stalled when
the WAL distance between the primary and standby exceeds the configured
(replica_lag_in_bytes) threshold. This feature is particularly useful in
disaster recovery setups where the primary and standby are in different
regions and synchronous replication can't be set up for latency and
performance reasons, yet some level of RPO enforcement is required.
The idea here is to calculate the lag between the primary and the standby
(Async?) server during XLogInsert and block the caller until the lag is
less than the threshold value. We can calculate the max lag by iterating
over ReplicationSlotCtl->replication_slots. If this is not something we
want to do in core, at least adding a hook for XLogInsert would be of
great value.
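As a rough illustration of the check described above, here is a minimal, self-contained C sketch. It does not use the real ReplicationSlotCtl structures: DemoSlot, max_replica_lag_bytes and should_throttle are hypothetical names, LSNs are modelled as plain byte offsets, and treating a threshold of 0 as "disabled" is an assumption rather than something proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a replication slot; NOT the PostgreSQL struct. */
typedef struct DemoSlot
{
    bool     in_use;     /* slot is allocated */
    uint64_t flush_lsn;  /* WAL position the standby has confirmed, in bytes */
} DemoSlot;

/* Largest distance, in bytes, between the primary and any in-use slot. */
uint64_t
max_replica_lag_bytes(uint64_t primary_lsn, const DemoSlot *slots, size_t nslots)
{
    uint64_t max_lag = 0;

    assert(slots != NULL || nslots == 0);
    for (size_t i = 0; i < nslots; i++)
    {
        uint64_t lag;

        if (!slots[i].in_use)
            continue;
        /* A slot at or past the primary position contributes no lag. */
        if (slots[i].flush_lsn >= primary_lsn)
            continue;
        lag = primary_lsn - slots[i].flush_lsn;
        if (lag > max_lag)
            max_lag = lag;
    }
    return max_lag;
}

/* True when the caller should be stalled; a threshold of 0 disables the check
 * (an assumption made for this sketch). */
bool
should_throttle(uint64_t primary_lsn, const DemoSlot *slots, size_t nslots,
                uint64_t replica_lag_in_bytes)
{
    if (replica_lag_in_bytes == 0)
        return false;
    return max_replica_lag_bytes(primary_lsn, slots, nslots) > replica_lag_in_bytes;
}
```

Taking the maximum over all slots means the slowest standby decides whether writers are stalled; a real implementation would also need to decide which slots count (for example, only physical slots of DR standbys).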
A few other scenarios I can think of with the hook are:
1. Enforcing RPO as described above
2. Enforcing rate limit and slow throttling when sync standby is falling
behind (could be flush lag or replay lag)
3. Transactional log rate governance - useful for cloud providers to
provide SKU sizes based on allowed WAL writes.
Thoughts?
Thanks,
Satya
On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
Hi Hackers,
I am considering implementing RPO (recovery point objective) enforcement feature for Postgres where the WAL writes on the primary are stalled when the WAL distance between the primary and standby exceeds the configured (replica_lag_in_bytes) threshold. This feature is useful particularly in the disaster recovery setups where primary and standby are in different regions and synchronous replication can't be set up for latency and performance reasons yet requires some level of RPO enforcement.
+1 for the idea in general. However, blocking writes on the primary seems
an extremely radical idea. Replicas can fall behind transiently at
times, and blocking writes on the primary may cause applications to fail
during these transient periods. This is not a problem if the applications
have retry logic for writes. How about blocking writes on the primary only
if the replicas stay behind the primary for a certain period of time?
The idea here is to calculate the lag between the primary and the standby (Async?) server during XLogInsert and block the caller until the lag is less than the threshold value. We can calculate the max lag by iterating over ReplicationSlotCtl->replication_slots.
The "falling behind" can also be quantified by the number of
write transactions on the primary. I think it's good to let the users
choose what "falling behind" means for them. We can have something
like the "recovery_target" parameters, with different options: name, xid,
time, lsn.
If this is not something we don't want to do in the core, at least adding a hook for XlogInsert is of great value.
IMHO, this feature may not be needed by everyone, so the hook way seems
reasonable: postgres vendors can provide different
implementations (for instance, they can write an extension that
implements this hook, which can block writes on the primary, write some log
messages, inform some service layer of the replicas falling behind the
primary, etc.). If we were to have the hook in XLogInsert, which is a
frequently called hot path, the hook really
should do as little work as possible; otherwise the latency of write
operations may increase.
A few other scenarios I can think of with the hook are:
Enforcing RPO as described above
Enforcing rate limit and slow throttling when sync standby is falling behind (could be flush lag or replay lag)
Transactional log rate governance - useful for cloud providers to provide SKU sizes based on allowed WAL writes.
Thoughts?
The hook can help to achieve the above objectives but where to place
it and what parameters it should take as input (or what info it should
emit out of the server via the hook) are important too.
Having said all that, the RPO feature can also be implemented outside of
postgres. A simple implementation could be: get the primary's current WAL
LSN using pg_current_wal_lsn() and the restart_lsn of all the replicas
from pg_replication_slots; if they differ by a certain amount, issue the
ALTER SYSTEM SET READ ONLY command [1] on the primary. This requires
connections to the server and proper access rights. This feature can also
be implemented as an extension (without the hook), which doesn't require
any connections to the server yet can access the required info (the
primary's current_wal_lsn, the restart_lsn of the replication slots, etc.),
but the RPO enforcement may not be immediate as the server doesn't have
any hooks in XLogInsert or some other area.
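The external check described above boils down to comparing two LSNs. Here is a minimal sketch in C, assuming the monitor has already fetched both values as text (for example the primary's current WAL LSN and a slot's restart_lsn). parse_lsn, lsn_lag_bytes and lag_exceeds are hypothetical helper names; the sketch relies on the textual LSN form being two hexadecimal numbers separated by a slash, holding the high and low 32 bits of a byte position.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse the textual LSN form "HI/LO" (two hex numbers) into a byte position. */
bool
parse_lsn(const char *text, uint64_t *out)
{
    uint32_t hi, lo;
    char     trailing;

    assert(text != NULL && out != NULL);
    /* The extra %c catches trailing garbage: a valid LSN yields exactly 2 items. */
    if (sscanf(text, "%" SCNx32 "/%" SCNx32 "%c", &hi, &lo, &trailing) != 2)
        return false;
    *out = ((uint64_t) hi << 32) | lo;
    return true;
}

/* Bytes by which the replica trails the primary; 0 if it is not behind. */
uint64_t
lsn_lag_bytes(uint64_t primary, uint64_t replica)
{
    return primary > replica ? primary - replica : 0;
}

/* Returns 1 if the lag exceeds limit_bytes, 0 if not, -1 on a parse error. */
int
lag_exceeds(const char *primary_text, const char *replica_text, uint64_t limit_bytes)
{
    uint64_t primary, replica;

    if (!parse_lsn(primary_text, &primary) || !parse_lsn(replica_text, &replica))
        return -1;
    return lsn_lag_bytes(primary, replica) > limit_bytes ? 1 : 0;
}
```

A monitor built this way only sees the lag at its polling interval, which is the "enforcement may not be immediate" caveat mentioned above.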
[1]: /messages/by-id/CAAJ_b967uKBiW6gbHr5aPzweURYjEGv333FHVHxvJmMhanwHXA@mail.gmail.com
Regards,
Bharath Rupireddy.
On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
Hi Hackers,
I am considering implementing RPO (recovery point objective) enforcement feature for Postgres where the WAL writes on the primary are stalled when the WAL distance between the primary and standby exceeds the configured (replica_lag_in_bytes) threshold. This feature is useful particularly in the disaster recovery setups where primary and standby are in different regions and synchronous replication can't be set up for latency and performance reasons yet requires some level of RPO enforcement.
Limiting transaction rate when the standby falls behind is a good feature ...
The idea here is to calculate the lag between the primary and the standby (Async?) server during XLogInsert and block the caller until the lag is less than the threshold value. We can calculate the max lag by iterating over ReplicationSlotCtl->replication_slots. If this is not something we don't want to do in the core, at least adding a hook for XlogInsert is of great value.
but doing it in XLogInsert does not seem to be a good idea. It's a
common point for all kinds of logging, including VACUUM. We could
accidentally stall a critical VACUUM operation because of that.
As Bharath described, it is better handled by application-level monitoring.
--
Best Wishes,
Ashutosh Bapat
On Thu, Dec 23, 2021 at 5:18 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:
On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
Hi Hackers,
I am considering implementing RPO (recovery point objective) enforcement
feature for Postgres where the WAL writes on the primary are stalled when
the WAL distance between the primary and standby exceeds the configured
(replica_lag_in_bytes) threshold. This feature is useful particularly in
the disaster recovery setups where primary and standby are in different
regions and synchronous replication can't be set up for latency and
performance reasons yet requires some level of RPO enforcement.
Limiting transaction rate when the standby falls behind is a good feature ...
The idea here is to calculate the lag between the primary and the
standby (Async?) server during XLogInsert and block the caller until the
lag is less than the threshold value. We can calculate the max lag by
iterating over ReplicationSlotCtl->replication_slots. If this is not
something we don't want to do in the core, at least adding a hook for
XlogInsert is of great value.
but doing it in XLogInsert does not seem to be a good idea.
Isn't XLogInsert the best place to throttle/govern in a simple and fair
way, particularly for long-running transactions on the server?
It's a common point for all kinds of logging including VACUUM. We could
accidentally stall a critical VACUUM operation because of that.
Agreed, but again this is a policy decision that DBA can relax/enforce. I
expect RPO is in the range of a few 100MBs to GBs and on a healthy system
typically lag never comes close to this value. The Hook implementation can
take care of nitty-gritty details on the policy enforcement based on the
needs, for example, not throttling some backend processes like vacuum,
checkpointer; throttling based on the roles, for example not to throttle
superuser connections; and throttling based on replay lag, write lag,
checkpoint taking longer, closer to disk full. Each of these can be easily
translated into GUCs. Depending on the direction of the thread on the hook
vs a feature in the Core, I can add more implementation details.
As Bharath described, it better be handled at the application level
monitoring.
Both RPO based WAL throttling and application level monitoring can co-exist
as each one has its own merits and challenges. Each application developer
has to implement their own throttling logic and often times it is hard to
get it right.
--
Best Wishes,
Ashutosh Bapat
Please find the attached draft patch.
On Thu, Dec 23, 2021 at 2:47 AM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
Hi Hackers,
I am considering implementing RPO (recovery point objective) enforcement
feature for Postgres where the WAL writes on the primary are stalled when
the WAL distance between the primary and standby exceeds the configured
(replica_lag_in_bytes) threshold. This feature is useful particularly in
the disaster recovery setups where primary and standby are in different
regions and synchronous replication can't be set up for latency and
performance reasons yet requires some level of RPO enforcement.
+1 for the idea in general. However, blocking writes on primary seems
an extremely radical idea. The replicas can fall behind transiently at
times and blocking writes on the primary may stop applications failing
for these transient times. This is not a problem if the applications
have retry logic for the writes. How about blocking writes on primary
if the replicas fall behind the primary for a certain period of time?
My proposal is to block the caller from writing until the lag situation has
improved. I don't want to throw any errors and fail the transaction. I think
we are aligned?
The idea here is to calculate the lag between the primary and the
standby (Async?) server during XLogInsert and block the caller until the
lag is less than the threshold value. We can calculate the max lag by
iterating over ReplicationSlotCtl->replication_slots.
The "falling behind" can also be quantified by the number of
write-transactions on the primary. I think it's good to have the users
choose what the "falling behind" means for them. We can have something
like the "recovery_target" param with different options name, xid,
time, lsn.
The transactions can be of arbitrary size and length and these options may
not provide the desired results. Time is a worthy option to add.
If this is not something we don't want to do in the core, at least
adding a hook for XlogInsert is of great value.
IMHO, this feature may not be needed by everyone, the hook-way seems
reasonable so that the postgres vendors can provide different
implementations (for instance they can write an extension that
implements this hook which can block writes on primary, write some log
messages, inform some service layer of the replicas falling behind the
primary etc.). If we were to have the hook in XLogInsert which gets
called so frequently or XLogInsert is a hot-path, the hook really
should do as little work as possible, otherwise the write operations
latency may increase.
A Hook is a good start. If there is enough interest then an extension can
be added to the contrib module.
A few other scenarios I can think of with the hook are:
Enforcing RPO as described above
Enforcing rate limit and slow throttling when sync standby is falling behind (could be flush lag or replay lag)
Transactional log rate governance - useful for cloud providers to
provide SKU sizes based on allowed WAL writes.
Thoughts?
The hook can help to achieve the above objectives but where to place
it and what parameters it should take as input (or what info it should
emit out of the server via the hook) are important too.
XLogInsert in my opinion is the best place to call it and the hook can be
something like this "void xlog_insert_hook(NULL)" as all the throttling
logic required is the current flush position which can be obtained
from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.
Having said all, the RPO feature can also be implemented outside of
the postgres, a simple implementation could be - get the primary
current wal lsn using pg_current_wal_lsn and all the replicas
restart_lsn using pg_replication_slot, if they differ by certain
amount, then issue ALTER SYSTEM SET READ ONLY command [1] on the
primary, this requires the connections to the server and proper access
rights. This feature can also be implemented as an extension (without
the hook) which doesn't require any connections to the server yet can
access the required info primary current_wal_lsn, restart_lsn of the
replication slots etc, but the RPO enforcement may not be immediate as
the server doesn't have any hooks in XLogInsert or some other area.
READ ONLY is a decent choice, but can't it fail the writes, or not take
effect until the end of the transaction?
[1] - /messages/by-id/CAAJ_b967uKBiW6gbHr5aPzweURYjEGv333FHVHxvJmMhanwHXA@mail.gmail.com
Regards,
Bharath Rupireddy.
Attachments:
0001-Add-xlog_insert_hook-to-give-control-to-the-plugins.patch (application/octet-stream, +14 -1)
On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
XLogInsert in my opinion is the best place to call it and the hook can be
something like this "void xlog_insert_hook(NULL)" as all the throttling
logic required is the current flush position which can be obtained
from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.
IMHO, it is not a good idea to call an external hook function inside a
critical section. Generally, we ensure that we do not call any code path
within a critical section which can throw an error and if we start calling
the external hook then we lose that control. It should be blocked at the
operation level itself e.g. ALTER TABLE READ ONLY, or by some other hook at
a little higher level.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Dec 24, 2021 at 4:43 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
XLogInsert in my opinion is the best place to call it and the hook can be something like this "void xlog_insert_hook(NULL)" as all the throttling logic required is the current flush position which can be obtained from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.
IMHO, it is not a good idea to call an external hook function inside a critical section. Generally, we ensure that we do not call any code path within a critical section which can throw an error and if we start calling the external hook then we lose that control. It should be blocked at the operation level itself e.g. ALTER TABLE READ ONLY, or by some other hook at a little higher level.
Yeah, good point. It's not advisable to give control to an
external module in a critical section. For instance, memory
allocation isn't allowed (see [1]) and an ereport(ERROR,....) would
be transformed into a PANIC inside the critical section (see [2], [3]).
Moreover, a critical section should be short-spanned, i.e. execute
as little code as possible. There's no guarantee that an external
module would follow these rules.
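To make the escalation rule concrete, here is a small standalone C model of it. This is not PostgreSQL code: DemoLevel, demo_crit_section_count and the demo_* functions are hypothetical names that only mirror the role of CritSectionCount and the check quoted in [2], and the numeric values of the levels are illustrative.

```c
#include <assert.h>

/* Severity levels for this model only; ordering matters, values do not. */
typedef enum DemoLevel
{
    DEMO_WARNING,
    DEMO_ERROR,
    DEMO_FATAL,
    DEMO_PANIC
} DemoLevel;

/* Nesting depth of critical sections, playing the role of CritSectionCount. */
int demo_crit_section_count = 0;

void
demo_start_crit_section(void)
{
    demo_crit_section_count++;
}

void
demo_end_crit_section(void)
{
    assert(demo_crit_section_count > 0);
    demo_crit_section_count--;
}

/* Level at which a report would actually be raised. */
DemoLevel
demo_effective_level(DemoLevel requested)
{
    /* Inside a critical section, anything at ERROR or above becomes PANIC. */
    if (requested >= DEMO_ERROR && demo_crit_section_count > 0)
        return DEMO_PANIC;
    return requested;
}
```

In this model a hook that raises an ordinary error while the counter is non-zero takes the whole system down, which is why handing control to arbitrary extension code at that point is risky.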
I suggest we do it at the level of transaction start, i.e. when a txnid
is getting allocated in AssignTransactionId(). If we do this,
when the limit for the throttling is exceeded, the current txn (even
if it is a long running txn) continues to do WAL insertions, while the
next txns get blocked. But this is okay and can be conveyed to
the users via documentation if need be. We do block txnid assignments
for parallel workers in this function, so this is a good choice IMO.
Thoughts?
[1]:
/*
 * You should not do memory allocations within a critical section, because
 * an out-of-memory error will be escalated to a PANIC. To enforce that
 * rule, the allocation functions Assert that.
 */
#define AssertNotInCriticalSection(context) \
    Assert(CritSectionCount == 0 || (context)->allowInCritSection)
[2]:
/*
 * If we are inside a critical section, all errors become PANIC
 * errors. See miscadmin.h.
 */
if (CritSectionCount > 0)
    elevel = PANIC;
[3]:
* A related, but conceptually distinct, mechanism is the "critical section"
* mechanism. A critical section not only holds off cancel/die interrupts,
* but causes any ereport(ERROR) or ereport(FATAL) to become ereport(PANIC)
* --- that is, a system-wide reset is forced. Needless to say, only really
* *critical* code should be marked as a critical section! Currently, this
* mechanism is only used for XLOG-related code.
Regards,
Bharath Rupireddy.
On Fri, Dec 24, 2021 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
XLogInsert in my opinion is the best place to call it and the hook can be
something like this "void xlog_insert_hook(NULL)" as all the throttling
logic required is the current flush position which can be obtained
from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.
IMHO, it is not a good idea to call an external hook function inside a
critical section. Generally, we ensure that we do not call any code path
within a critical section which can throw an error and if we start calling
the external hook then we lose that control.
Thank you for the comment. XLogInsertRecord is inside a critical section
but not XLogInsert. Am I missing something?
It should be blocked at the operation level itself e.g. ALTER TABLE READ
ONLY, or by some other hook at a little higher level.
There is a lot of maintenance overhead with a custom implementation at
individual databases and tables level. This doesn't provide the necessary
control that I am looking for.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Sun, Dec 26, 2021 at 3:52 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
On Fri, Dec 24, 2021 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
XLogInsert in my opinion is the best place to call it and the hook can
be something like this "void xlog_insert_hook(NULL)" as all the throttling
logic required is the current flush position which can be obtained
from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.
IMHO, it is not a good idea to call an external hook function inside a
critical section. Generally, we ensure that we do not call any code path
within a critical section which can throw an error and if we start calling
the external hook then we lose that control.
Thank you for the comment. XLogInsertRecord is inside a critical section
but not XLogInsert. Am I missing something?
Actually, all WAL insertions are done under a critical section (with a
few exceptions). That means if you look at all the references to XLogInsert(),
it is always called under a critical section, and that is my main worry
about hooking at the XLogInsert level.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Sat, Dec 25, 2021 at 6:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Sun, Dec 26, 2021 at 3:52 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
On Fri, Dec 24, 2021 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
XLogInsert in my opinion is the best place to call it and the hook can
be something like this "void xlog_insert_hook(NULL)" as all the throttling
logic required is the current flush position which can be obtained
from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.
IMHO, it is not a good idea to call an external hook function inside a
critical section. Generally, we ensure that we do not call any code path
within a critical section which can throw an error and if we start calling
the external hook then we lose that control.
Thank you for the comment. XLogInsertRecord is inside a critical section
but not XLogInsert. Am I missing something?
Actually all the WAL insertions are done under a critical section (except
few exceptions), that means if you see all the references of XLogInsert(),
it is always called under the critical section and that is my main worry
about hooking at XLogInsert level.
Got it, understood the concern. But can we document the limitations of the
hook and let the hook take care of it? I don't expect an error to be thrown
here since we are not planning to allocate memory or make file system calls
but instead look at the shared memory state and add delays when required.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Sun, Dec 26, 2021 at 1:06 PM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
Got it, understood the concern. But can we document the limitations of the hook and let the hook take care of it? I don't expect an error to be thrown here since we are not planning to allocate memory or make file system calls but instead look at the shared memory state and add delays when required.
It wouldn't work. You can't make any assumption about how long it
would take for the replication lag to resolve, so you may have to wait
for a very long time. It means that at the very least the sleep has
to be interruptible and therefore can raise an error. In general
there isn't much you can do in a critical section, so this approach
doesn't seem sensible to me.
On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
Actually all the WAL insertions are done under a critical section (except
few exceptions), that means if you see all the references of XLogInsert(),
it is always called under the critical section and that is my main worry
about hooking at XLogInsert level.
Got it, understood the concern. But can we document the limitations of the
hook and let the hook take care of it? I don't expect an error to be thrown
here since we are not planning to allocate memory or make file system calls
but instead look at the shared memory state and add delays when required.
Yet another problem is that if we are in XLogInsert(), that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait, then we would also block any
read operations that need to read from those buffers. I haven't thought about
what could be a better way to do this, but this is certainly not good.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
Actually all the WAL insertions are done under a critical section
(except few exceptions), that means if you see all the references of
XLogInsert(), it is always called under the critical section and that is my
main worry about hooking at XLogInsert level.
Got it, understood the concern. But can we document the limitations of
the hook and let the hook take care of it? I don't expect an error to be
thrown here since we are not planning to allocate memory or make file
system calls but instead look at the shared memory state and add delays
when required.
Yet another problem is that if we are in XlogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
what could be better way to do this but this is certainly not good.
Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush? All the other backends will be waiting behind the
WALWriteLock. The process that is performing the write enters into a busy
loop with small delays until the criteria are met. Inability to process the
interrupts inside the critical section is a challenge in both approaches.
Any other thoughts?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Greetings,
* SATYANARAYANA NARLAPURAM (satyanarlapuram@gmail.com) wrote:
On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
Actually all the WAL insertions are done under a critical section
(except few exceptions), that means if you see all the references of
XLogInsert(), it is always called under the critical section and that is my
main worry about hooking at XLogInsert level.
Got it, understood the concern. But can we document the limitations of
the hook and let the hook take care of it? I don't expect an error to be
thrown here since we are not planning to allocate memory or make file
system calls but instead look at the shared memory state and add delays
when required.
Yet another problem is that if we are in XlogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
what could be better way to do this but this is certainly not good.
Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush? All the other backends will be waiting behind the
WALWriteLock. The process that is performing the write enters into a busy
loop with small delays until the criteria are met. Inability to process the
interrupts inside the critical section is a challenge in both approaches.
Any other thoughts?
Why not have this work the exact same way sync replicas do, except that
it's based off of some byte/time lag for some set of async replicas?
That is, in RecordTransactionCommit(), perhaps right after the
SyncRepWaitForLSN() call, or maybe even add this to that function? Sure
seems like there's a lot of similarity.
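One way to read this suggestion: at commit time the backend does not need the replica to have reached the commit record itself, only to be within the allowed byte budget of it. Here is a minimal C sketch of that arithmetic, with LSNs modelled as plain byte offsets; required_replica_lsn and commit_may_proceed are hypothetical names, not existing server functions, and the time-based variant is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Lowest replica position that keeps a commit at commit_lsn within budget. */
uint64_t
required_replica_lsn(uint64_t commit_lsn, uint64_t max_lag_bytes)
{
    /* Clamp at 0 so a budget larger than the WAL written so far never blocks. */
    return commit_lsn > max_lag_bytes ? commit_lsn - max_lag_bytes : 0;
}

/* True when the committing backend may return without waiting. */
bool
commit_may_proceed(uint64_t commit_lsn, uint64_t replica_flush_lsn,
                   uint64_t max_lag_bytes)
{
    uint64_t required = required_replica_lsn(commit_lsn, max_lag_bytes);

    assert(required <= commit_lsn);  /* invariant of the clamped subtraction */
    return replica_flush_lsn >= required;
}
```

With a budget of 0 the condition becomes "the replica has flushed the commit record", so in this model the lag-based wait is a relaxed form of the synchronous wait rather than a separate mechanism.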
Thanks,
Stephen
Stephen, thank you!
On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* SATYANARAYANA NARLAPURAM (satyanarlapuram@gmail.com) wrote:
On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com>
wrote:
On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
Actually all the WAL insertions are done under a critical section
(except few exceptions), that means if you see all the references of
XLogInsert(), it is always called under the critical section and that is my
main worry about hooking at XLogInsert level.
Got it, understood the concern. But can we document the limitations of
the hook and let the hook take care of it? I don't expect an error to be
thrown here since we are not planning to allocate memory or make file
system calls but instead look at the shared memory state and add delays
when required.
Yet another problem is that if we are in XlogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
what could be better way to do this but this is certainly not good.
Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush? All the other backends will be waiting behind the
WALWriteLock. The process that is performing the write enters into a busy
loop with small delays until the criteria are met. Inability to process the
interrupts inside the critical section is a challenge in both approaches.
Any other thoughts?
Why not have this work the exact same way sync replicas do, except that
it's based off of some byte/time lag for some set of async replicas?
That is, in RecordTransactionCommit(), perhaps right after the
SyncRepWaitForLSN() call, or maybe even add this to that function? Sure
seems like there's a lot of similarity.
I was thinking of achieving log governance (throttling WAL MB/sec) and also
providing RPO guarantees. In this model, it is hard to throttle WAL
generation of a long running transaction (for example copy/select into).
However, this meets my RPO needs. Are you in support of adding a hook or
the actual change? IMHO, the hook allows more creative options. I can go
ahead and make a patch accordingly.
Thanks,
Stephen
Greetings,
On Wed, Dec 29, 2021 at 14:04 SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
Stephen, thank you!
On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* SATYANARAYANA NARLAPURAM (satyanarlapuram@gmail.com) wrote:
On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com>
wrote:
On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:Actually all the WAL insertions are done under a critical section
(except few exceptions), that means if you see all the references of
XLogInsert(), it is always called under the critical section andthat is my
main worry about hooking at XLogInsert level.
Got it, understood the concern. But can we document the limitations
of
the hook and let the hook take care of it? I don't expect an error
to be
thrown here since we are not planning to allocate memory or make file
system calls but instead look at the shared memory state and adddelays
when required.
Yet another problem is that if we are in XlogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
what could be better way to do this but this is certainly not good.
Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush? All the other backends will be waiting behind the
WALWriteLock. The process that is performing the write enters into a busy
loop with small delays until the criteria are met. Inability to process the
interrupts inside the critical section is a challenge in both approaches.
Any other thoughts?
Why not have this work the exact same way sync replicas do, except that
it's based off of some byte/time lag for some set of async replicas?
That is, in RecordTransactionCommit(), perhaps right after the
SyncRepWaitForLSN() call, or maybe even add this to that function? Sure
seems like there's a lot of similarity.
I was thinking of achieving log governance (throttling WAL MB/sec) and
also providing RPO guarantees. In this model, it is hard to throttle WAL
generation of a long running transaction (for example copy/select into).
Long running transactions have a lot of downsides and are best discouraged.
I don’t know that we should be designing this for that case specifically,
particularly given the complications it would introduce as discussed on
this thread already.
However, this meets my RPO needs. Are you in support of adding a hook or
the actual change? IMHO, the hook allows more creative options. I can go
ahead and make a patch accordingly.
I would think this would make more sense as part of core rather than a
hook, as that then requires an extension and additional setup to get going,
which raises the bar quite a bit when it comes to actually being used.
Thanks,
Stephen
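
A minimal sketch of the byte-lag check discussed above, written as standalone C rather than PostgreSQL code. Only replica_lag_in_bytes comes from the proposal at the top of this thread; `WalPos`, `ReplicaState`, `max_replica_lag_bytes` and `commit_should_wait` are hypothetical names for illustration. Taking the maximum lag follows the original proposal, and treating a threshold of 0 as "disabled" is an assumption. A real patch would presumably read these positions from shared replication state under the appropriate locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (a byte offset in the WAL stream). */
typedef uint64_t WalPos;

/* Hypothetical per-replica state; not a PostgreSQL struct. */
typedef struct ReplicaState
{
    bool   in_use;      /* does this entry track an active async replica? */
    WalPos flushed_pos; /* last WAL position the replica reported as flushed */
} ReplicaState;

/* Return the largest byte lag behind primary_pos among active replicas. */
static uint64_t
max_replica_lag_bytes(const ReplicaState *replicas, size_t n, WalPos primary_pos)
{
    uint64_t max_lag = 0;

    assert(replicas != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (!replicas[i].in_use)
            continue;
        /* A replica at or past the primary position contributes no lag. */
        if (replicas[i].flushed_pos >= primary_pos)
            continue;

        uint64_t lag = primary_pos - replicas[i].flushed_pos;

        if (lag > max_lag)
            max_lag = lag;
    }
    return max_lag;
}

/*
 * Decide whether a committing backend should wait.
 * Assumption: a threshold of 0 disables the check.
 */
static bool
commit_should_wait(const ReplicaState *replicas, size_t n,
                   WalPos primary_pos, uint64_t replica_lag_in_bytes)
{
    if (replica_lag_in_bytes == 0)
        return false;
    return max_replica_lag_bytes(replicas, n, primary_pos) > replica_lag_in_bytes;
}
```

With replicas flushed to positions 900 and 400 and the primary at 1000, the maximum lag is 600 bytes, so a threshold of 500 would make the committer wait while a threshold of 600 would not.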
On Wed, Dec 29, 2021 at 11:16 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
On Wed, Dec 29, 2021 at 14:04 SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
Stephen, thank you!
On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* SATYANARAYANA NARLAPURAM (satyanarlapuram@gmail.com) wrote:
On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com>
wrote:
On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
satyanarlapuram@gmail.com> wrote:
Actually all the WAL insertions are done under a critical section
(except few exceptions), that means if you see all the references of
XLogInsert(), it is always called under the critical section and that is my
main worry about hooking at XLogInsert level.
Got it, understood the concern. But can we document the limitations of
the hook and let the hook take care of it? I don't expect an error to be
thrown here since we are not planning to allocate memory or make file
system calls but instead look at the shared memory state and add delays
when required.
Yet another problem is that if we are in XlogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
what could be better way to do this but this is certainly not good.
Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush? All the other backends will be waiting behind the
WALWriteLock. The process that is performing the write enters into a busy
loop with small delays until the criteria are met. Inability to process the
interrupts inside the critical section is a challenge in both approaches.
Any other thoughts?
Why not have this work the exact same way sync replicas do, except that
it's based off of some byte/time lag for some set of async replicas?
That is, in RecordTransactionCommit(), perhaps right after the
SyncRepWaitForLSN() call, or maybe even add this to that function? Sure
seems like there's a lot of similarity.
I was thinking of achieving log governance (throttling WAL MB/sec) and
also providing RPO guarantees. In this model, it is hard to throttle WAL
generation of a long running transaction (for example copy/select into).
Long running transactions have a lot of downsides and are best discouraged.
I don’t know that we should be designing this for that case specifically,
particularly given the complications it would introduce as discussed on
this thread already.
However, this meets my RPO needs. Are you in support of adding a hook or
the actual change? IMHO, the hook allows more creative options. I can go
ahead and make a patch accordingly.
I would think this would make more sense as part of core rather than a
hook, as that then requires an extension and additional setup to get going,
which raises the bar quite a bit when it comes to actually being used.
Sounds good, I will work on making the changes accordingly.
Thanks,
Stephen
Hi,
On 2021-12-27 16:40:28 -0800, SATYANARAYANA NARLAPURAM wrote:
Yet another problem is that if we are in XlogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
what could be better way to do this but this is certainly not good.
Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush?
That's pretty much the same - XLogInsert() can trigger an
XLogWrite()/Flush().
I think it's a complete no-go to add throttling to these places. It's quite
possible that it'd cause new deadlocks, and it's almost guaranteed to have
unintended consequences (e.g. replication falling back further because
XLogFlush() is being throttled).
I also don't think it's a sane thing to add hooks to these places. It's
complicated enough as-is, adding the chance for random other things to happen
during such crucial operations will make it even harder to maintain.
Greetings,
Andres Freund
On Wed, Dec 29, 2021 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-12-27 16:40:28 -0800, SATYANARAYANA NARLAPURAM wrote:
Yet another problem is that if we are in XlogInsert() that means we are
holding the buffer locks on all the pages we have modified, so if we add a
hook at that level which can make it wait then we would also block any of
the read operations needed to read from those buffers. I haven't thought
what could be better way to do this but this is certainly not good.
Yes, this is a problem. The other approach is adding a hook at
XLogWrite/XLogFlush?
That's pretty much the same - XLogInsert() can trigger an
XLogWrite()/Flush().
I think it's a complete no-go to add throttling to these places. It's quite
possible that it'd cause new deadlocks, and it's almost guaranteed to have
unintended consequences (e.g. replication falling back further because
XLogFlush() is being throttled).
I also don't think it's a sane thing to add hooks to these places. It's
complicated enough as-is, adding the chance for random other things to happen
during such crucial operations will make it even harder to maintain.
Andres, thanks for the comments. Agreed on this based on the previous
discussions on this thread. Could you please share your thoughts on adding
it after SyncRepWaitForLSN()?
Greetings,
Andres Freund
Hi,
On 2021-12-29 11:34:53 -0800, SATYANARAYANA NARLAPURAM wrote:
On Wed, Dec 29, 2021 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
Andres, thanks for the comments. Agreed on this based on the previous
discussions on this thread. Could you please share your thoughts on adding
it after SyncRepWaitForLSN()?
I don't think that's good either - you're delaying transaction commit
(i.e. xact becoming visible / locks being released). That also has the danger
of increasing lock contention (albeit more likely to be heavyweight locks /
serializable state). It'd have to be after the transaction actually committed.
Greetings,
Andres Freund