Experimental patch for terminating VACUUM freeze blockers

Started by wenhui qiu12 days ago4 messageshackers
Jump to latest
#1wenhui qiu
qiuwenhuifx@gmail.com

Hi all,

When a table’s age reaches vacuum_failsafe_age, VACUUM enters failsafe mode
and bypasses nonessential throttling such as autovacuum_vacuum_cost_delay so
it can make progress as aggressively as possible.

However, if VACUUM’s freeze horizon is blocked by an old transaction, the
failsafe behavior alone cannot reduce the table age. In that case, VACUUM
may run at full speed but still be unable to advance relfrozenxid.

I have an experimental patch to explore handling this situation. The patch
adds a GUC, vacuum_freeze_terminate_blockers_pid, which allows VACUUM to
terminate regular client backends whose transaction horizon blocks VACUUM
from advancing its freeze cutoff. The intended targets are
idle-in-transaction sessions and long-running active transactions that are
holding an old xmin or assigned XID.

One possible question is why not rely on idle_in_transaction_session_timeout.
In practice, this parameter is often not configured, and even when it is,
it only addresses idle-in-transaction sessions. It does not help with
active long-running transactions that can also hold back VACUUM’s freeze
horizon.

The patch deliberately does not try to handle other causes of freeze
horizon retention, such as replication slots, hot standby feedback, or
prepared transactions.

This is experimental and intended for discussion.

Attachments:

vacuum_freeze_terminate_blockers_pid.patchapplication/octet-stream; name=vacuum_freeze_terminate_blockers_pid.patchDownload+394-3
#2Nathan Bossart
nathandbossart@gmail.com
In reply to: wenhui qiu (#1)
Re: Experimental patch for terminating VACUUM freeze blockers

On Wed, May 13, 2026 at 07:56:43PM +0800, wenhui qiu wrote:

I have an experimental patch to explore handling this situation. The patch
adds a GUC, vacuum_freeze_terminate_blockers_pid, which allows VACUUM to
terminate regular client backends whose transaction horizon blocks VACUUM
from advancing its freeze cutoff. The intended targets are
idle-in-transaction sessions and long-running active transactions that are
holding an old xmin or assigned XID.

Thanks for sharing. I certainly agree that this area has room for
improvement in Postgres.

The patch deliberately does not try to handle other causes of freeze
horizon retention, such as replication slots, hot standby feedback, or
prepared transactions.

My experience is a bit dated, but I remember the two main issues being
replication slots and temporary tables. We now have
idle_replication_slot_timeout, and there's a somewhat active thread on an
XID version of that parameter [0]/messages/by-id/CA+-JvFsMHckBMzsu5Ov9HCG3AFbMh056hHy1FiXazBRtZ9pFBg@mail.gmail.com, but I'm not aware of any recent ideas
about how to deal with stranded temporary tables. I'd encourage you to
think about these problems, too.

[0]: /messages/by-id/CA+-JvFsMHckBMzsu5Ov9HCG3AFbMh056hHy1FiXazBRtZ9pFBg@mail.gmail.com

--
nathan

#3Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#2)
Re: Experimental patch for terminating VACUUM freeze blockers

On Wed, May 13, 2026 at 07:56:43PM +0800, wenhui qiu wrote:

I have an experimental patch to explore handling this situation. The patch
adds a GUC, vacuum_freeze_terminate_blockers_pid, which allows VACUUM to
terminate regular client backends whose transaction horizon blocks VACUUM
from advancing its freeze cutoff. The intended targets are
idle-in-transaction sessions and long-running active transactions that are
holding an old xmin or assigned XID.

Thanks for sharing. I certainly agree that this area has room for
improvement in Postgres.

My 2c. Using something like the proposed
vacuum_freeze_terminate_blockers_pid (GUC name is misleading, since
it's a bool )
seems backwards to me. It does not address the root cause, which is
the long-running
transaction, etc and attempts to deal with the symptom rather than the problem.
This also means a poor configuration of this parameter will more
likely lead to a system
silently getting into wraparound, as a DBA may relax a bit on monitoring, maybe.

I do think we need better visibility into what is blocking vacuum, which was
discussed here [1]/messages/by-id/CAOzEurSgy-gDtwFmEbj5+R9PL0_G3qYB6nnzJtNStyuf87VSVg@mail.gmail.com, but ultimately it is up to the DBA to properly monitor
and mitigate workloads that are impacting their vacuum.

[1]: /messages/by-id/CAOzEurSgy-gDtwFmEbj5+R9PL0_G3qYB6nnzJtNStyuf87VSVg@mail.gmail.com

--
Sami Imseih
Amazon Web Services (AWS)

#4wenhui qiu
qiuwenhuifx@gmail.com
In reply to: Sami Imseih (#3)
Re: Experimental patch for terminating VACUUM freeze blockers

Hi Sami

My 2c. Using something like the proposed
vacuum_freeze_terminate_blockers_pid (GUC name is misleading, since
it's a bool )
seems backwards to me. It does not address the root cause, which is
the long-running
transaction, etc and attempts to deal with the symptom rather than the

problem.

This also means a poor configuration of this parameter will more
likely lead to a system
silently getting into wraparound, as a DBA may relax a bit on monitoring,

maybe.

Thanks for your comments. I agree that the real problem is the long-running
transaction or other blocker, and this patch is not meant to replace
monitoring or fixing that root cause. The idea was inspired by the
replay/query conflict mechanism. Normally we do not want to cancel
long-running queries that block replay either; alternatively, we can enable
hot_standby_feedback, but that brings other problems. So this is a
trade-off left to the DBA. I think this is similar: the right solution is
still to prevent xid wraparound by managing the blockers, but when the
system is already approaching wraparound, terminating blockers may be
preferable to letting the system continue toward a dangerous state. So this
is intended as a last-resort option, not as the primary solution.

I do think we need better visibility into what is blocking vacuum, which

was

discussed here [1], but ultimately it is up to the DBA to properly monitor
and mitigate workloads that are impacting their vacuum.

[1]

/messages/by-id/CAOzEurSgy-gDtwFmEbj5+R9PL0_G3qYB6nnzJtNStyuf87VSVg@mail.gmail.com

I'm in this thread,I know you're good at adding relevant system views.
Often, checking system views is more convenient than looking at logs. If
this request is approved, I’d appreciate your help in implementing the
corresponding system views.
.

Thanks

On Thu, May 14, 2026 at 3:55 AM Sami Imseih <samimseih@gmail.com> wrote:

Show quoted text

On Wed, May 13, 2026 at 07:56:43PM +0800, wenhui qiu wrote:

I have an experimental patch to explore handling this situation. The

patch

adds a GUC, vacuum_freeze_terminate_blockers_pid, which allows VACUUM

to

terminate regular client backends whose transaction horizon blocks

VACUUM

from advancing its freeze cutoff. The intended targets are
idle-in-transaction sessions and long-running active transactions that

are

holding an old xmin or assigned XID.

Thanks for sharing. I certainly agree that this area has room for
improvement in Postgres.

My 2c. Using something like the proposed
vacuum_freeze_terminate_blockers_pid (GUC name is misleading, since
it's a bool )
seems backwards to me. It does not address the root cause, which is
the long-running
transaction, etc and attempts to deal with the symptom rather than the
problem.
This also means a poor configuration of this parameter will more
likely lead to a system
silently getting into wraparound, as a DBA may relax a bit on monitoring,
maybe.

I do think we need better visibility into what is blocking vacuum, which
was
discussed here [1], but ultimately it is up to the DBA to properly monitor
and mitigate workloads that are impacting their vacuum.

[1]
/messages/by-id/CAOzEurSgy-gDtwFmEbj5+R9PL0_G3qYB6nnzJtNStyuf87VSVg@mail.gmail.com

--
Sami Imseih
Amazon Web Services (AWS)