Long running query causing XID limit breach

Started by sud · almost 2 years ago · 34 messages · general
#1 sud
suds1434@gmail.com

Hello,
This is RDS Postgres version 15.4. We suddenly saw "MaximumUsedTransactionIDs"
reach ~1.5 billion and were alerted by team members, who mentioned that the
database will shut down/hang if this value reaches ~2 billion and won't be able
to serve any incoming transactions. It was a panic situation.

I had heard of this before: because of the way Postgres works, the XID is a
32-bit integer and can only represent (2^32)/2 = ~2 billion transactions.
However, since RDS performs autovacuum, we thought we need not worry about this
issue. It seems we were wrong. We found one ad hoc "SELECT" query that had been
running on the reader instance for the last couple of days, and when it was
killed, the max XID (MaximumUsedTransactionIDs) dropped to ~50 million
immediately.

So I have a few questions:

1) This system is going to run 24/7 and will eventually process ~500 million
business transactions/day, i.e. ~4-5 billion rows/day inserted across multiple
tables. As I understand it, each row has an XID allocated. Does that mean we
will need ~(5 billion/24) = ~200 million XIDs/hour, and thus, if any legitimate
application "SELECT" query keeps running for ~10 hours (thereby holding the
historical XID horizon alive), it can saturate "MaximumUsedTransactionIDs" and
bring the database to a standstill in 2 billion/200 million = ~10 hrs? Is this
understanding correct? It seems we are prone to hit this limit sooner going
forward.
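As a side note, the same number can be watched from inside Postgres (not only via the RDS CloudWatch metric) as the age of each database's datfrozenxid; a minimal sketch:

```sql
-- How many XIDs old the oldest unfrozen row in each database is;
-- this is essentially what MaximumUsedTransactionIDs tracks.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```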

2) We have some legitimate cases where reporting queries can run for 5-6 hrs.
In such cases, if the SELECT query on table TAB1 starts at the 100th XID, then
none of the transactions that happen after that time, across all the other
tables (table2, table3, etc.) in the database, can be vacuumed until that
SELECT query finishes (as the database will try to keep that same 100th-XID
snapshot), and the XID will just keep incrementing for new transactions,
eventually reaching the max limit. Is my understanding correct here?

3) Although RDS runs autovacuum by default, should we also consider doing a
manual vacuum, without impacting ongoing transactions? Something like the
options below:
VACUUM FREEZE tab1;
VACUUM FREEZE;
VACUUM;
VACUUM ANALYZE tab1;
VACUUM tab1;

4) I worked in the past on Oracle databases, where the similar transaction
identifier is called the "system change number", but I never encountered it
being exhausted. Oracle also keeps UNDO records: if a SELECT query needs data
beyond a certain limit (the undo_retention parameter), the query fails with a
"snapshot too old" error, without impacting any write transactions. In Postgres
it seems nothing like that happens, and every SELECT query will try to run to
completion without any such failure, until it gets killed by someone. Is my
understanding correct?

And in that case, it seems we have to mandatorily set "statement_timeout" to
some value, e.g. 4 hrs (also, I am not seeing a way to set it at a specific
user level, so it will be set for all queries including application-level
ones), and also "idle_in_transaction_session_timeout" to 5 minutes, on all the
prod and non-prod databases, to restrict long-running transactions/queries and
avoid such issues in future. Correct me if I'm wrong.
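For reference, both settings can also be scoped narrower than globally; a sketch, with placeholder role and database names:

```sql
-- Per-role: applies to new sessions of this role only ("reporting_ro" is a placeholder).
ALTER ROLE reporting_ro SET statement_timeout = '4h';

-- Per-database: applies to new connections to this database ("appdb" is a placeholder).
ALTER DATABASE appdb SET idle_in_transaction_session_timeout = '5min';
```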

Regards
Sud

#2 Muhammad Salahuddin Manzoor
salahuddin.m@bitnine.net
In reply to: sud (#1)
Re: Long running query causing XID limit breach

Greetings,

In high-transaction environments like yours, it may be necessary to supplement
autovacuum with manual vacuuming.

A few recommendations:

- Monitor long-running queries and try to optimize them.
- Tune autovacuum.
- Use partitioning.
- Run manual vacuums after peak hours.

Salahuddin (살라후딘)


#3 sud
suds1434@gmail.com
In reply to: Muhammad Salahuddin Manzoor (#2)
Re: Long running query causing XID limit breach


We already have these big tables partitioned. So will "VACUUM table_name" end
up scanning the whole table, or just the latest/live partition which is
currently being loaded? And do you mean running "VACUUM table_name;" frequently
on selected tables that are experiencing heavy DML? I hope this won't lock the
table in any way, because data will be written to/read from these tables 24/7.

When you say "optimize autovacuum", do you mean setting higher values for
"autovacuum_max_workers" and "autovacuum_freeze_max_age"?

Considering we have ~4 billion rows inserted daily and there is a limit of
~2 billion on "MaximumUsedTransactionIDs", what threshold should we set for
alerting so that we have enough time at hand to fix this issue?

#4 Muhammad Salahuddin Manzoor
salahuddin.m@bitnine.net
In reply to: sud (#3)
Re: Long running query causing XID limit breach

Greetings,

Running `VACUUM table_name;` on a partitioned table will vacuum each
partition individually, not the whole table as a single unit.

Yes, running `VACUUM table_name;` frequently on tables or partitions with
heavy DML is recommended.

Regular `VACUUM` does not lock the table for reads or writes, so it won't
disrupt ongoing 24/7 data operations.

"Optimize autovacuum": yes. Adjust the following parameters per your
system/environment requirements:
autovacuum_max_workers
autovacuum_freeze_max_age
autovacuum_vacuum_cost_delay

The following should first be tested thoroughly in a test environment.

Recommended alert threshold
Alert at 50% usage: set the alert threshold at 1 billion used XIDs. This
provides a significant buffer, giving you ample time to take corrective
action before reaching the critical limit.

Calculation rationale
Daily XID usage: approximately 4 billion rows per day implies high XID
consumption.
Buffer time: at 1 billion XIDs used, you would still have 1 billion XIDs
remaining, giving you roughly 5 hours to address the issue if your system
consumes 200 million XIDs per hour.
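Such an alert can also be computed inside Postgres itself; a sketch (the 2-billion divisor reflects the wraparound limit discussed above, and the exact threshold is yours to tune):

```sql
-- Percentage of the ~2-billion XID headroom consumed, per database.
-- Alerting when pct_towards_wraparound passes ~50 matches the 1-billion threshold.
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2000000000, 1) AS pct_towards_wraparound
FROM pg_database
ORDER BY xid_age DESC;
```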

Salahuddin (살라후딘)


#5 sud
suds1434@gmail.com
In reply to: Muhammad Salahuddin Manzoor (#4)
Re: Long running query causing XID limit breach


Thank you so much, that helps.
So apart from setting these alerts on "MaximumUsedTransactionIDs" and tuning
vacuum via the parameters above, should we also have monitoring in place to
ensure that vacuum is not taking longer than its normal runtime, and that it
isn't getting blocked or failing? For example, in our case the SELECT query was
running for a long time, so vacuum must not have been able to succeed every
time it attempted; is it really worth having that level of alerting? And how
can we tell whether vacuum is failing or not succeeding, so as to avoid such
issues in the future?

#6 sud
suds1434@gmail.com
In reply to: sud (#5)
Re: Long running query causing XID limit breach

Also, if I am getting it correctly, it means we should not run any transaction
(even a legitimate one, e.g. a big reporting "SELECT" query) beyond ~10 hrs, as
the ongoing writes would meanwhile consume 10 * 200 million XIDs per hour = the
full 2 billion XID budget, causing system failure. I hope my understanding is
correct here.


#7 Muhammad Salahuddin Manzoor
salahuddin.m@bitnine.net
In reply to: sud (#6)
Re: Long running query causing XID limit breach

Greetings,

Yes, monitoring and alerting for VACUUM operations are crucial.

Track VACUUM Duration and Success:

SELECT pid, state, query_start, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE query ILIKE '%vacuum%'  -- catches manual VACUUM and "autovacuum: ..." entries
ORDER BY duration DESC;

Check Autovacuum Activity:

SELECT relname, last_vacuum, last_autovacuum, vacuum_count, autovacuum_count
FROM pg_stat_all_tables
WHERE relname = 'your_table_name';

Log and Monitor VACUUM Failures:

log_autovacuum_min_duration = 0 # Log all autovacuum actions
log_min_messages = 'WARNING' # Ensure warnings and above are logged

Use tools like pgBadger to analyze PostgreSQL logs and identify any issues
with autovacuum operations.

Set Up Alerts for Long-Running VACUUMs:

Use monitoring tools such as pgMonitor, Nagios, Prometheus with Grafana, or
New Relic to set up alerts for long-running VACUUM processes.

Yes, your understanding is correct. In a high-transaction environment like
yours, long-running transactions, including legitimate reporting queries,
can hold back the advancement of the transaction ID (XID) horizon. This can
prevent VACUUM from properly cleaning up old XIDs, leading to the risk of
XID wraparound and potential system failure.

Use mitigation strategies to handle long-running queries: set transaction
timeouts, monitor and kill long-running queries, optimize query performance,
and schedule heavy reporting queries during periods of lower transactional
activity.
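For example, a sketch of finding and (if justified) stopping queries that have run longer than a chosen limit; the 4-hour cutoff is an assumption to adjust:

```sql
-- Sessions whose current statement has run longer than 4 hours.
SELECT pid, usename, state,
       now() - query_start AS runtime,
       left(query, 60) AS query_snippet
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '4 hours'
ORDER BY runtime DESC;

-- To cancel the statement (graceful), or terminate the whole backend:
-- SELECT pg_cancel_backend(<pid>);
-- SELECT pg_terminate_backend(<pid>);
```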

Salahuddin (살라후딘)


#8 yudhi s
learnerdatabase99@gmail.com
In reply to: sud (#5)
Re: Long running query causing XID limit breach


The OP mentioned that the number of business transactions is initially around
500 million, but the rows inserted across many tables total ~4-5 billion per
day. So doesn't that mean XID consumption will happen per transaction rather
than per row? Say, for example, ~4 billion rows are loaded in batches of ~1000
using bulk load: that is ~4 million transactions, so it should use ~4 million
XIDs, not 4 billion. Thus, processing transactions in batches rather than row
by row minimizes XID usage. Correct?

#9 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: sud (#1)
Re: Long running query causing XID limit breach

On Thu, 2024-05-23 at 02:46 +0530, sud wrote:

It's RDS postgres version 15.4. We suddenly saw the "MaximumUsedTransactionIDs"
reach to ~1.5billion and got alerted by team members who mentioned the database
is going to be in shutdown/hung if this value reaches to ~2billion and won't be
able to serve any incoming transactions. It was a panic situation.

I have heard of it before , because of the way postgres works and the XID being
a datatype of length 32 bit integer can only represent (2^32)/2=~2 billion
transactions. However, as RDS performs the auto vacuum , we thought that we need
not worry about this issue. But it seems we were wrong. And we found one adhoc
"SELECT '' query was running on the reader instance since the last couple of
days and when that was killed, the max xid (MaximumUsedTransactionIDs) dropped
to 50million immediately.

This has nothing to do with autovacuum running.
PostgreSQL won't freeze any rows above the xmin horizon (see the "backend_xmin"
column in "pg_stat_activity").
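For reference, the session holding back that horizon can be spotted directly; a sketch:

```sql
-- Sessions ordered by how far back they pin the xmin horizon;
-- the top row is the one blocking freezing/cleanup.
SELECT pid, datname, usename, state,
       backend_xmin, age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xmin_age DESC;
```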

So I have few questions,

1)This system is going to be a 24/7 up and running system which will process
~500million business transactions/day in future i.e. ~4-5billion rows/day
inserted across multiple tables each day. And as I understand each row will
have XID allocated. So in that case , does it mean that, we will need
(5billion/24)=~200million XID/hour and thus , if any such legitimate
application "SELECT" query keeps running for ~10 hours (and thus keep the
historical XID alive) , then it can saturate the "MaximumUsedTransactionIDs"
and make the database standstill in 2billion/200million=~10hrs. Is this
understanding correct? Seems we are prone to hit this limit sooner going forward.

Yes, that is correct. You cannot run such long-running queries with a
transaction rate like that.

2)We have some legitimate cases where the reporting queries can run for 5-6hrs.
So in such cases if the start of this SELECT query happen at 100th XID on
table TAB1, then whatever transactions happen after that time, across all
other tables(table2, table3 etc) in the database won't get vacuum until that
SELECT query on table1 get vacuumed(as database will try to keep that same
100th XID image) and the XID will just keep incrementing for new transaction,
eventually reaching the max limit. Is my understanding correct here?

Again correct. PostgreSQL cannot tell which tables the query will use in the
future, so VACUUM cannot clean up old row versions in any table in the database.
This is irrelevant for transaction ID wraparound, though: you'll get into trouble
even if only a single table holds an unfrozen row that is old enough.

3)Although RDS does the auto vacuum by default. but should we also consider
doing manual vacuum without impacting ongoing transactions?

That won't help. Autovacuum is running, but cannot freeze the rows, and a manual
VACUUM cannot do it either.

4)Had worked in past in oracle database [...]

Oracle implements all this radically differently.

 And in that case, It seems we have to mandatorily set "statement_timeout" to
some value e.g. 4hrs(also i am not seeing a way to set it for any specific
user level, so it will be set for all queries including application level)
and also "idle_in_transaction_session_timeout" to 5minutes, even on all the
prod and non prod databases, to restrict the long running transactions/queries
and avoid such issues in future. Correct me if I'm wrong.

That looks right.

One thing you could consider is running the long-running queries on a standby
server. Replication will get delayed, and you have to keep all the WAL
around for the standby to catch up once the query is done, but it should work.
You'd set "max_standby_streaming_delay" to -1 on the standby.

Yours,
Laurenz Albe

#10 sud
suds1434@gmail.com
In reply to: Laurenz Albe (#9)
Re: Long running query causing XID limit breach

On Thu, May 23, 2024 at 1:22 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:


Yes, that is correct. You cannot run such long-running queries with a
transaction rate like that.

When you say transaction, does that mean one commit? For example, if it's
inserting and committing ~1000 rows in one batch, will all 1000 rows be marked
with one XID rather than 1000 different XIDs? So we should look at batch
processing rather than row-by-row processing. Is that understanding correct?

One thing you could consider is running the long-running queries on a standby
server. Replication will get delayed, and you have to keep all the WAL around
for the standby to catch up once the query is done, but it should work.
You'd set "max_standby_streaming_delay" to -1 on the standby.

We have the SELECT query running on a reader instance, but the writer instance
was still showing "MaximumUsedTransactionIDs" reaching 1.5 billion. Does that
mean both instances are part of the same cluster and share the same XIDs, and,
per your suggestion, we should run this on a separate standby cluster
altogether which does not share the same XIDs? Is this understanding correct?
Or can it be handled even with another reader instance, by tweaking some other
parameter so that they won't share the same XIDs?

#11 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: sud (#10)
Re: Long running query causing XID limit breach

On Thu, 2024-05-23 at 13:41 +0530, sud wrote:

Yes, that is correct.  You cannot run such long-running queries with a
transaction rate like that.

When you mean transaction ,does it mean one commit ? For example if it's
inserting+committing ~1000 rows in one batch then all the 1000 rows will
be marked as one XID rather than 1000 different XID. and so we should look
for batch processing rather than row by row types processing.
Is the understanding correct?

Yes, that would help.

One thing you could consider is running the long-running queries on a standby
server.  Replication will get delayed, and you have to keep all the WAL
around for the standby to catch up once the query is done, but it should work.
You'd set "max_standby_streaming_delay" to -1 on the standby.

We have the "Select query" running on a reader instance , but still the writer
instance was showing up "MaximumUsedTransactionIDs" reaching 1.5billion, so it
means both the instance as part of same cluster so sharing same XIDs

If a long running query on the standby influences the primary, that means that
you have "hot_standby_feedback" set to "on". Set it to "off".
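On RDS this is changed through the DB parameter group; on vanilla Postgres the equivalent would be (a sketch):

```sql
-- On the standby: stop reporting its xmin back to the primary,
-- so long reads there no longer hold back vacuum on the primary.
ALTER SYSTEM SET hot_standby_feedback = off;
SELECT pg_reload_conf();
```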

Yours,
Laurenz Albe

#12 sud
suds1434@gmail.com
In reply to: Laurenz Albe (#11)
Re: Long running query causing XID limit breach

On Thu, May 23, 2024 at 1:45 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

If a long running query on the standby influences the primary, that means
that
you have "hot_standby_feedback" set to "on". Set it to "off".

Will setting "hot_standby_feedback" to off cause the reader instance to give
incorrect query results or unexpected query failures, i.e. a potential
inconsistency between the writer and reader instances, since those XIDs can be
removed/cleaned by the writer node even while a query on the reader instance is
reading them? And it can mean more replication lag. So I'm wondering whether
this setup is an advisable one.

#13 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: sud (#12)
Re: Long running query causing XID limit breach

On Thu, 2024-05-23 at 18:15 +0530, sud wrote:

On Thu, May 23, 2024 at 1:45 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

If a long running query on the standby influences the primary, that means that
you have "hot_standby_feedback" set to "on".  Set it to "off".

Will the setting up of "hot_standby_feedback" value to OFF will cause the
reader instance to give incorrect query results or unexpected query failure
which will be potential inconsistency between the writer and reader instance,
as because those XID's can be removed/cleaned by the writer node even if its
being read by the reader instance query. And it can have more replication lag.

There will never be incorrect query results.

It can happen that a query on the standby gets canceled if you don't set
"max_standby_streaming_delay" to -1, but that can happen even if
"hot_standby_feedback" is "on". It just happens less often.

The effect of setting "max_standby_streaming_delay" to -1 will often be a
replication delay if you run a long query. That's what you have to
accept if you want to execute long-running queries.

You will never be able to have both of the following:
- queries never get canceled
- there is no replication delay

So I'm wondering , if this setup is advisable one?

I'd say yes. Anyway, it doesn't look like you have an alternative if
you want to run queries that take longer than it takes your transaction
ID counter to wrap around.

Yours,
Laurenz Albe

#14 sud
suds1434@gmail.com
In reply to: Laurenz Albe (#13)
Re: Long running query causing XID limit breach

On Thu, May 23, 2024 at 8:11 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:


There will never be incorrect query results.

It can happen that a query on the standby gets canceled if you don't set
"max_standby_streaming_delay" to -1, but that can happen even if
"hot_standby_feedback" is "on". It just happens less often.

The effect of setting "max_standby_streaming_delay" to -1 will often be a
replication delay if you run a long query. That's what you have to
accept if you want to execute long-running queries.

You will never be able to have both of the following:
- queries never get canceled
- there is no replication delay

I am trying to understand these two parameters, and each time it looks a bit
confusing to me whether these two parameters complement or conflict with each
other.

Say, for example, if we set hot_standby_feedback to ON (which is currently
the default, by the way), it will make the primary hold back vacuum until the
query on the standby completes, and that can cause the XID age
(MaximumUsedTransactionIDs) to climb steeply in scenarios where the query on
the standby runs for days (as happens in our current scenario). So we were
thinking of setting it to OFF, to avoid the transaction ID wraparound issue.

But you also mentioned setting "max_standby_streaming_delay" to -1 (it is
currently set to 14 seconds in our case); that will make the standby wait
indefinitely, until the query completes on the standby, and not apply the WAL
that would remove the row versions the standby query is reading. But won't
this same behaviour happen while "hot_standby_feedback" is set to "ON"?

But again, for HA, if the primary goes down we should not have a big lag on
the standby, so we want the standby to have minimal lag as well. And as you
mentioned, there will never be incorrect results but at most a query
cancellation. So I was thinking: is it fine to just keep
"hot_standby_feedback" OFF and leave "max_standby_streaming_delay" as it is,
at 14 sec? Let me know your thoughts.

Basically, below are the combinations I am confused between:

hot_standby_feedback ON and max_standby_streaming_delay=-1
or
hot_standby_feedback OFF and max_standby_streaming_delay=-1
or
hot_standby_feedback ON and max_standby_streaming_delay=14 sec
or
hot_standby_feedback OFF and max_standby_streaming_delay=14 sec

#15yudhi s
learnerdatabase99@gmail.com
In reply to: sud (#14)
Re: Long running query causing XID limit breach

On Fri, May 24, 2024 at 10:34 AM sud <suds1434@gmail.com> wrote:


As per my understanding here, this would be the behaviour. Others may
comment.

*hot_standby_feedback ON and max_standby_streaming_delay = -1:*
Ensures that long-running queries on the standby are not interrupted:
feedback keeps the primary from vacuuming row versions the standby queries
still need, and the standby waits indefinitely before applying conflicting
WAL.
But it can lead to significant replication lag and a growing XID age on the
primary, potentially causing transaction ID wraparound issues.

*hot_standby_feedback OFF and max_standby_streaming_delay = -1:*
Ensures long-running queries on the standby are not interrupted. No feedback
is sent to the primary, so vacuum there is not held back and the risk of XID
wraparound is reduced.
But the standby may fall significantly behind the primary, resulting in high
replication lag.

*hot_standby_feedback ON and max_standby_streaming_delay = 14 seconds:*
Feedback keeps the primary from vacuuming row versions needed by standby
queries, reducing query cancellations on the standby. Replication lag is
limited to about 14 seconds.
But standby queries that stall WAL replay for more than 14 seconds may still
be cancelled, and the primary can still see a growing XID age.

*hot_standby_feedback OFF and max_standby_streaming_delay = 14 seconds:*
Limits replication lag to about 14 seconds and keeps the primary's XID age
under control.
But standby queries that conflict with WAL replay for more than 14 seconds
are cancelled, and conflicts are more likely because no feedback reaches the
primary.
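The effects described in these scenarios can be watched directly on the primary; a small sketch using standard catalog views (the interpretation thresholds are up to you):

```sql
-- On the primary: which standby, if any, is holding back vacuum via
-- hot_standby_feedback, and how old its reported xmin is:
SELECT application_name, backend_xmin,
       age(backend_xmin) AS xmin_age
FROM pg_stat_replication;

-- How close each database is to transaction ID wraparound
-- (roughly what RDS surfaces as MaximumUsedTransactionIDs):
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```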

#16sud
suds1434@gmail.com
In reply to: yudhi s (#15)
Re: Long running query causing XID limit breach

On Sun, May 26, 2024 at 2:24 AM yudhi s <learnerdatabase99@gmail.com> wrote:


Thank you so much.
Does it mean that the last one is what we should go for, i.e.
(*hot_standby_feedback OFF and max_standby_streaming_delay = 14 seconds*)?
Because high availability is also a key requirement in any production
environment, keeping a 14-second lag is kind of okay, and at the same time
keeping hot_standby_feedback OFF will make sure transaction ID wraparound
won't happen because of a long-running query on the standby, as vacuum on the
primary won't wait for the standby's feedback.

But I have one question here: does max_standby_streaming_delay = 14 mean
that queries on the standby will get cancelled after 14 seconds?

#17David HJ
chuxiongzhong@gmail.com
In reply to: sud (#1)
Re: Long running query causing XID limit breach

anyone know how to unsubscribe from this mailing list?

On Thu, May 23, 2024 at 5:16 AM sud <suds1434@gmail.com> wrote:

Show quoted text


#18Torsten Förtsch
tfoertsch123@gmail.com
In reply to: sud (#16)
Re: Long running query causing XID limit breach

On Sat, May 25, 2024 at 11:00 PM sud <suds1434@gmail.com> wrote:

But i have one question here , does max_standby_streaming_delay = 14 ,
means the queries on the standby will get cancelled after 14 seconds?

No, your query gets cancelled when it stalls replication for >14 sec. If
your master is idle and does not send any WAL and the replica has
caught up, the query can take as long as it wants.
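To see how often such conflict cancellations actually occur, the standby keeps per-database counters; a minimal sketch:

```sql
-- Run on the standby: cumulative counts of queries cancelled due to
-- recovery conflicts, broken down by cause.  confl_snapshot is the
-- counter most relevant to vacuum-cleanup conflicts.
SELECT datname, confl_snapshot, confl_lock,
       confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;
```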

#19sud
suds1434@gmail.com
In reply to: Torsten Förtsch (#18)
Re: Long running query causing XID limit breach

On Sun, May 26, 2024 at 1:43 PM Torsten Förtsch <tfoertsch123@gmail.com>
wrote:

On Sat, May 25, 2024 at 11:00 PM sud <suds1434@gmail.com> wrote:

But i have one question here , does max_standby_streaming_delay = 14 ,
means the queries on the standby will get cancelled after 14 seconds?

No, your query gets cancelled when it stalls replication for >14 sec. If
your master is idle and does not send any WAL and the replica has
caught up, the query can take as long as it wants.

Thank you so much.
For example, in the below scenario: if I have an INSERT query running on the
primary instance against the 25th May partition of TABLE1, and at the same
time we are selecting data from the 24th May partition, then with
"max_standby_streaming_delay = 14" the SELECT query can run for any duration
without restriction, even if WAL is applied on the standby regularly, because
the WAL records of the INSERT queries on the primary do not conflict with the
exact rows being read on the standby. Is my understanding correct here?

However, if an UPDATE/DELETE query runs on the primary against the 25th May
partition of TABLE1, on the exact same set of rows being read by the SELECT
query on the standby, then the application of such WAL records on the standby
can wait at most 14 seconds, and those SELECT queries are prone to be
cancelled after 14 seconds. Is this understanding correct?

If the above is true then it doesn't look good, because in an OLTP system
there will be a lot of DML happening on the writer instance, and there may be
many queries on the reader/standby instances that are meant to run for hours.
And if making those SELECT queries run for hours means compromising an hour
of "high availability"/RPO, or a lag of an hour between primary and standby,
that doesn't look good. Please correct me if I am missing something here.
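The two cases discussed above can be sketched as SQL run on the primary (the partition names are hypothetical, for illustration only; this is a rough sketch, not verified against a live cluster):

```sql
-- Plain INSERTs into another partition generate WAL that normally does
-- not conflict with a standby SELECT reading old rows:
INSERT INTO table1_p20240525 VALUES (1, now());  -- usually no conflict

-- An UPDATE/DELETE by itself does not cancel the standby query either;
-- the conflict typically arises when vacuum (or page pruning) later
-- removes the dead row versions the standby SELECT still needs, and
-- that cleanup WAL is replayed on the standby:
DELETE FROM table1_p20240524 WHERE id = 42;
VACUUM table1_p20240524;  -- cleanup records from this can trigger a
                          -- snapshot conflict on the standby
```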

#20Adrian Klaver
adrian.klaver@aklaver.com
In reply to: David HJ (#17)
Re: Long running query causing XID limit breach

On 5/25/24 22:56, David HJ wrote:

anyone know how to unsubscribe from this mailing list?

See here:

https://lists.postgresql.org/unsubscribe/

--
Adrian Klaver
adrian.klaver@aklaver.com

#21Torsten Förtsch
tfoertsch123@gmail.com
In reply to: sud (#19)
#22sud
suds1434@gmail.com
In reply to: Torsten Förtsch (#21)
#23Torsten Förtsch
tfoertsch123@gmail.com
In reply to: sud (#22)
#24sud
suds1434@gmail.com
In reply to: Torsten Förtsch (#23)
#25Laurenz Albe
laurenz.albe@cybertec.at
In reply to: yudhi s (#15)
#26yudhi s
learnerdatabase99@gmail.com
In reply to: Laurenz Albe (#25)
#27Laurenz Albe
laurenz.albe@cybertec.at
In reply to: yudhi s (#26)
#28sud
suds1434@gmail.com
In reply to: Laurenz Albe (#27)
#29Simon Elbaz
elbazsimon9@gmail.com
In reply to: sud (#28)
#30sud
suds1434@gmail.com
In reply to: Simon Elbaz (#29)
#31Laurenz Albe
laurenz.albe@cybertec.at
In reply to: sud (#30)
#32yudhi s
learnerdatabase99@gmail.com
In reply to: Laurenz Albe (#31)
#33sud
suds1434@gmail.com
In reply to: yudhi s (#32)
#34yudhi s
learnerdatabase99@gmail.com
In reply to: sud (#33)