A few patches to clarify snapshot management

Started by Heikki Linnakangasover 1 year ago13 messageshackers
Jump to latest
#1Heikki Linnakangas
heikki.linnakangas@enterprisedb.com

While working on the CSN snapshot patch, I got sidetracked looking
closer into the snapshot tracking in snapmgr.c. Attached are a few
patches to clarify some things.

# Patch 1: Remove unnecessary GetTransactionSnapshot() calls and FIXME
comments

In commit dc7420c2c927, Andres added FIXME comments like these in a few
places:

autovacuum.c, get_database_list(void):

/*
* Start a transaction so we can access pg_database, and get a snapshot.
* We don't have a use for the snapshot itself, but we're interested in
* the secondary effect that it sets RecentGlobalXmin. (This is critical
* for anything that reads heap pages, because HOT may decide to prune
* them even if the process doesn't attempt to modify any tuples.)
*
* FIXME: This comment is inaccurate / the code buggy. A snapshot that is
* not pushed/active does not reliably prevent HOT pruning (->xmin could
* e.g. be cleared when cache invalidations are processed).
*/
StartTransactionCommand();
(void) GetTransactionSnapshot();

Those GetTransactionSnapshot() calls are unnecessary, because we hold
onto registered copy of CatalogSnapshot throughout the catalog scans.
This patch removes those unnecessary calls, and the FIXMEs.

# Patch 2: Assert that a snapshot is active or registered before it's used

GetTransactionSnapshot() comment said:

* Note that the return value may point at static storage that will be modified
* by future calls and by CommandCounterIncrement(). Callers should call
* RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be
* used very long.

That's pretty vague. Firstly, it says the returned value _may_ point to
static storage, but ISTM it _always_ does, if you interpret "static
storage" liberally. Some callers actually rely on the fact that you can
call GetTransactionSnapshot() and throw away the result without having a
leak. So I propose rewording that to "return value points at static
storage", rather than just "may point".

In REPEATABLE READ mode, the returned CurrentSnapshot is palloc'd, not a
pointer directly to a static variable, but all calls within the same
transaction return the same palloc'd Snapshot pointer, and will be
modified by CommandCounterIncrement(). From the caller's point of view,
it's like a static.

Secondly, what exactly is "used very long"? It means until the next call
of any of the Get*Snapshot() functions, CommandCounterIncrement(), or
anything that might call SnapshotResetXmin() like PopActiveSnapshot().
Given how complicated that gets, I feel it's dangerous to do pretty much
anything else than immediately call PushActiveSnapshot() or
RegisterSnapshot() with it. To try to enforce that, this patch adds an
assertion in HeapTupleSatisfiesMVCC() that the snapshot must be
registered or pushed active. That's not a very accurate check of that
stricter rule: some callers were violating the new assertion and had
comments to explain why it was safe, and OTOH it won't catch calls to
those invalidating functions that don't involve visibility checks.

We were violating that assertion in a few places, which were not wrong
and had explaining comments, but this patch changes them to just
register the snapshot instead of explaining why it's safe to skip it.

# Patch 3: Add comment with more details on active snapshots

Now that I have this swapped in my head, I wrote a few paragraphs on how
the active snapshot stack works at high level.

# Patch 4: Add checks that no snapshots are "leaked"

This patch is not to be committed right now, just for discussion.

I'm not very happy with how GetTransactionSnapshot() and friends return
a statically allocated snapshot. The whole "return value should not be
used very long" thing is just so vague. If we changed it to return a
palloc'd snapshot, would we introduce leaks? This patch adds assertions
that every call to GetTransactionSnapshot() is paired with a
PushActiveSnapshot() or RegsiterSnapshot() call, and changes a few
places that were violating that stricter rule. Some of those changes
seem nice anyway, like registering the snapshot in verify_heapam(), even
though they're not strictly necessary today.

A perhaps better way to enforce that would be to replace
GetTransactionSnapshot() with functions that also push or register the
snapshot:

RegisterSnapshot(GetTransactionSnapshot()) -> RegisterTransactionSnapshot()

PushActiveSnapshot(GetTransactionSnapshot()) -> PushTransactionSnapshot()

That function signature would eliminate the concept of a returned
statically-allocated snapshot, and the whole question of what does "used
very long" mean in GetTransactionSnapshot(). Thoughts on that?

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Remove-unnecessary-GetTransactionSnapshot-calls-and-.patchtext/x-patch; charset=UTF-8; name=0001-Remove-unnecessary-GetTransactionSnapshot-calls-and-.patchDownload+3-33
0002-Assert-that-a-snapshot-is-active-or-registered-befor.patchtext/x-patch; charset=UTF-8; name=0002-Assert-that-a-snapshot-is-active-or-registered-befor.patchDownload+26-18
0003-Add-comment-with-more-details-on-active-snapshots.patchtext/x-patch; charset=UTF-8; name=0003-Add-comment-with-more-details-on-active-snapshots.patchDownload+55-1
0004-WIP-Add-checks-that-no-snapshots-are-leaked.patchtext/x-patch; charset=UTF-8; name=0004-WIP-Add-checks-that-no-snapshots-are-leaked.patchDownload+81-28
#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Heikki Linnakangas (#1)
Re: A few patches to clarify snapshot management

On Mon, Dec 16, 2024 at 12:06:33PM +0200, Heikki Linnakangas wrote:

While working on the CSN snapshot patch, I got sidetracked looking closer
into the snapshot tracking in snapmgr.c. Attached are a few patches to
clarify some things.

I haven't yet looked closely at what you are proposing, but big +1 from me
for the general idea. I recently found myself wishing for a lot more
commentary about this stuff [0]/messages/by-id/Z0dB1ld2iPcS6nC9@nathan.

[0]: /messages/by-id/Z0dB1ld2iPcS6nC9@nathan

--
nathan

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Nathan Bossart (#2)
Re: A few patches to clarify snapshot management

On 16/12/2024 23:56, Nathan Bossart wrote:

On Mon, Dec 16, 2024 at 12:06:33PM +0200, Heikki Linnakangas wrote:

While working on the CSN snapshot patch, I got sidetracked looking closer
into the snapshot tracking in snapmgr.c. Attached are a few patches to
clarify some things.

I haven't yet looked closely at what you are proposing, but big +1 from me
for the general idea. I recently found myself wishing for a lot more
commentary about this stuff [0].

[0] /messages/by-id/Z0dB1ld2iPcS6nC9@nathan

While playing around some more with this, I noticed that this code in
GetTransactionSnapshot() is never reached, and AFAICS has always been
dead code:

Snapshot
GetTransactionSnapshot(void)
{
/*
* Return historic snapshot if doing logical decoding. We'll never need a
* non-historic transaction snapshot in this (sub-)transaction, so there's
* no need to be careful to set one up for later calls to
* GetTransactionSnapshot().
*/
if (HistoricSnapshotActive())
{
Assert(!FirstSnapshotSet);
return HistoricSnapshot;
}

when you think about it, that's good, because it doesn't really make
sense to call GetTransactionSnapshot() during logical decoding. We jump
through hoops to make the historic catalog decoding possible with
historic snapshots, tracking subtransactions that modify catalogs and
WAL-logging command ids, but they're not suitable for general purpose
queries. So I think we should turn that into an error, per attached patch.

Another observation is that we only ever use regular MVCC snapshots as
active snapshots. I added a "Assert(snapshot->snapshot_type ==
SNAPSHOT_MVCC);" to PushActiveSnapshotWithLevel() and all regression
tests passed. That's also good, because we assumed that much in a few
places anyway: there are a couple of calls that amount to
"XidInMVCCSnapshot(..., GetActiveSnapshot()"), in
find_inheritance_children_extended() and RelationGetPartitionDesc(). We
could add comments and that assertion to make that assumption explicit.

And that thought takes me deeper down the rabbit hole:

/*
* Struct representing all kind of possible snapshots.
*
* There are several different kinds of snapshots:
* * Normal MVCC snapshots
* * MVCC snapshots taken during recovery (in Hot-Standby mode)
* * Historic MVCC snapshots used during logical decoding
* * snapshots passed to HeapTupleSatisfiesDirty()
* * snapshots passed to HeapTupleSatisfiesNonVacuumable()
* * snapshots used for SatisfiesAny, Toast, Self where no members are
* accessed.
*
* TODO: It's probably a good idea to split this struct using a NodeTag
* similar to how parser and executor nodes are handled, with one type for
* each different kind of snapshot to avoid overloading the meaning of
* individual fields.
*/
typedef struct SnapshotData

I'm thinking of implementing that TODO, splitting SnapshotData into
separate structs like MVCCSnapshotData, SnapshotDirtyData, etc. It seems
to me most places can assume that you're dealing with MVCC snapshots,
and if we had separate types for them, could be using MVCCSnapshot
instead of the generic Snapshot. Only the table and index AM functions
need to deal with non-MVCC snapshots.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Don-t-allow-GetTransactionSnapshot-in-logical-decodi.patchtext/x-patch; charset=UTF-8; name=0001-Don-t-allow-GetTransactionSnapshot-in-logical-decodi.patchDownload+4-9
#4Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#3)
Re: A few patches to clarify snapshot management

On 20/12/2024 19:31, Heikki Linnakangas wrote:

/*
 * Struct representing all kind of possible snapshots.
 *
 * There are several different kinds of snapshots:
 * * Normal MVCC snapshots
 * * MVCC snapshots taken during recovery (in Hot-Standby mode)
 * * Historic MVCC snapshots used during logical decoding
 * * snapshots passed to HeapTupleSatisfiesDirty()
 * * snapshots passed to HeapTupleSatisfiesNonVacuumable()
 * * snapshots used for SatisfiesAny, Toast, Self where no members are
 *     accessed.
 *
 * TODO: It's probably a good idea to split this struct using a NodeTag
 * similar to how parser and executor nodes are handled, with one type
for
 * each different kind of snapshot to avoid overloading the meaning of
 * individual fields.
 */
typedef struct SnapshotData

I'm thinking of implementing that TODO, splitting SnapshotData into
separate structs like MVCCSnapshotData, SnapshotDirtyData, etc. It seems
to me most places can assume that you're dealing with MVCC snapshots,
and if we had separate types for them, could be using MVCCSnapshot
instead of the generic Snapshot. Only the table and index AM functions
need to deal with non-MVCC snapshots.

Here's a draft of that. Going through this exercise clarified a few
things to me that I didn't realize before:

- The executor only deals with MVCC snapshots. Special snapshots are
only for the lower-level AM interfaces.
- Only MVCC snapshots can be pushed to the active stack
- Only MVCC or historic MVCC snapshots can be registered with a resource
owner

Thoughts?

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v1-0001-Add-comment-with-more-details-on-active-snapshots.patchtext/x-patch; charset=UTF-8; name=v1-0001-Add-comment-with-more-details-on-active-snapshots.patchDownload+55-1
v1-0002-Add-assertions.patchtext/x-patch; charset=UTF-8; name=v1-0002-Add-assertions.patchDownload+5-1
v1-0003-wip-Split-SnapshotData-into-multiple-structs.patchtext/x-patch; charset=UTF-8; name=v1-0003-wip-Split-SnapshotData-into-multiple-structs.patchDownload+578-478
#5Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#3)
Re: A few patches to clarify snapshot management

On 2024-12-20 19:31:01 +0200, Heikki Linnakangas wrote:

On 16/12/2024 23:56, Nathan Bossart wrote:

On Mon, Dec 16, 2024 at 12:06:33PM +0200, Heikki Linnakangas wrote:

While working on the CSN snapshot patch, I got sidetracked looking closer
into the snapshot tracking in snapmgr.c. Attached are a few patches to
clarify some things.

I haven't yet looked closely at what you are proposing, but big +1 from me
for the general idea. I recently found myself wishing for a lot more
commentary about this stuff [0].

[0] /messages/by-id/Z0dB1ld2iPcS6nC9@nathan

While playing around some more with this, I noticed that this code in
GetTransactionSnapshot() is never reached, and AFAICS has always been dead
code:

Snapshot
GetTransactionSnapshot(void)
{
/*
* Return historic snapshot if doing logical decoding. We'll never need a
* non-historic transaction snapshot in this (sub-)transaction, so there's
* no need to be careful to set one up for later calls to
* GetTransactionSnapshot().
*/
if (HistoricSnapshotActive())
{
Assert(!FirstSnapshotSet);
return HistoricSnapshot;
}

when you think about it, that's good, because it doesn't really make sense
to call GetTransactionSnapshot() during logical decoding. We jump through
hoops to make the historic catalog decoding possible with historic
snapshots, tracking subtransactions that modify catalogs and WAL-logging
command ids, but they're not suitable for general purpose queries. So I
think we should turn that into an error, per attached patch.

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output functions calling GetTransactionSnapshot() or such to do
some internal lookups?

Greetings,

Andres Freund

#6Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#5)
Re: A few patches to clarify snapshot management

On 07/01/2025 00:00, Andres Freund wrote:

On 2024-12-20 19:31:01 +0200, Heikki Linnakangas wrote:

While playing around some more with this, I noticed that this code in
GetTransactionSnapshot() is never reached, and AFAICS has always been dead
code:

Snapshot
GetTransactionSnapshot(void)
{
/*
* Return historic snapshot if doing logical decoding. We'll never need a
* non-historic transaction snapshot in this (sub-)transaction, so there's
* no need to be careful to set one up for later calls to
* GetTransactionSnapshot().
*/
if (HistoricSnapshotActive())
{
Assert(!FirstSnapshotSet);
return HistoricSnapshot;
}

when you think about it, that's good, because it doesn't really make sense
to call GetTransactionSnapshot() during logical decoding. We jump through
hoops to make the historic catalog decoding possible with historic
snapshots, tracking subtransactions that modify catalogs and WAL-logging
command ids, but they're not suitable for general purpose queries. So I
think we should turn that into an error, per attached patch.

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output functions calling GetTransactionSnapshot() or such to do
some internal lookups?

I haven't seen any. And I don't think that would work correctly while
doing logical decoding anyway, because historical snapshots only track
XIDs that modify catalogs. regclassout and enumout do work because they
use the catalog snapshot rather than GetTransactionSnapshot().

(I committed that change in commit 1585ff7387 already, but discussion is
still welcome of course)

--
Heikki Linnakangas
Neon (https://neon.tech)

#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#4)
Re: A few patches to clarify snapshot management

On 06/01/2025 23:30, Heikki Linnakangas wrote:

On 20/12/2024 19:31, Heikki Linnakangas wrote:

/*
 * Struct representing all kind of possible snapshots.
 *
 * There are several different kinds of snapshots:
 * * Normal MVCC snapshots
 * * MVCC snapshots taken during recovery (in Hot-Standby mode)
 * * Historic MVCC snapshots used during logical decoding
 * * snapshots passed to HeapTupleSatisfiesDirty()
 * * snapshots passed to HeapTupleSatisfiesNonVacuumable()
 * * snapshots used for SatisfiesAny, Toast, Self where no members are
 *     accessed.
 *
 * TODO: It's probably a good idea to split this struct using a NodeTag
 * similar to how parser and executor nodes are handled, with one
type for
 * each different kind of snapshot to avoid overloading the meaning of
 * individual fields.
 */
typedef struct SnapshotData

I'm thinking of implementing that TODO, splitting SnapshotData into
separate structs like MVCCSnapshotData, SnapshotDirtyData, etc. It
seems to me most places can assume that you're dealing with MVCC
snapshots, and if we had separate types for them, could be using
MVCCSnapshot instead of the generic Snapshot. Only the table and index
AM functions need to deal with non-MVCC snapshots.

Here's a draft of that. Going through this exercise clarified a few
things to me that I didn't realize before:

- The executor only deals with MVCC snapshots. Special snapshots are
only for the lower-level AM interfaces.
- Only MVCC snapshots can be pushed to the active stack
- Only MVCC or historic MVCC snapshots can be registered with a resource
owner

I committed the patches adding comments on Tuesday. Here's an updated
version of the patch to split SnapshotData into different structs.

The second, new patch simplifies the historic snapshot reference
counting during logical decoding. It's in principle independent from the
first patch, but it was hard to see how the opportunity before splitting
the structs.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v2-0001-Split-SnapshotData-into-separate-structs-for-each.patchtext/x-patch; charset=UTF-8; name=v2-0001-Split-SnapshotData-into-separate-structs-for-each.patchDownload+451-337
v2-0002-Simplify-historic-snapshot-refcounting.patchtext/x-patch; charset=UTF-8; name=v2-0002-Simplify-historic-snapshot-refcounting.patchDownload+46-103
#8Noah Misch
noah@leadboat.com
In reply to: Heikki Linnakangas (#6)
Re: A few patches to clarify snapshot management

On Tue, Jan 07, 2025 at 11:55:00AM +0200, Heikki Linnakangas wrote:

On 07/01/2025 00:00, Andres Freund wrote:

On 2024-12-20 19:31:01 +0200, Heikki Linnakangas wrote:

While playing around some more with this, I noticed that this code in
GetTransactionSnapshot() is never reached, and AFAICS has always been dead
code:

Snapshot
GetTransactionSnapshot(void)
{
/*
* Return historic snapshot if doing logical decoding. We'll never need a
* non-historic transaction snapshot in this (sub-)transaction, so there's
* no need to be careful to set one up for later calls to
* GetTransactionSnapshot().
*/
if (HistoricSnapshotActive())
{
Assert(!FirstSnapshotSet);
return HistoricSnapshot;
}

when you think about it, that's good, because it doesn't really make sense
to call GetTransactionSnapshot() during logical decoding. We jump through
hoops to make the historic catalog decoding possible with historic
snapshots, tracking subtransactions that modify catalogs and WAL-logging
command ids, but they're not suitable for general purpose queries. So I
think we should turn that into an error, per attached patch.

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output functions calling GetTransactionSnapshot() or such to do
some internal lookups?

I haven't seen any. And I don't think that would work correctly while doing
logical decoding anyway, because historical snapshots only track XIDs that
modify catalogs. regclassout and enumout do work because they use the
catalog snapshot rather than GetTransactionSnapshot().

(I committed that change in commit 1585ff7387 already, but discussion is
still welcome of course)

https://github.com/2ndQuadrant/pglogical does rely on the pre-1585ff7387 code
for its row_filter feature. row_filter calls ExecEvalExpr() from the output
plugin, e.g. to evaluate expression "id between 2 AND 4" arising from this
configuration in the pglogical test suite:

SELECT * FROM pglogical.replication_set_add_table('default', 'basic_dml', false, row_filter := $rf$id between 2 AND 4$rf$);

One of the GetTransactionSnapshot() calls is in pglogical_output_plugin.c
itself. For that, I could work around the change by forcing the old
HistoricSnapshot use:

-		PushActiveSnapshot(GetTransactionSnapshot());
+		Assert(HistoricSnapshotActive());
+		PushActiveSnapshot(GetCatalogSnapshot(InvalidOid));

That doesn't get far. Calling a plpgsql function in the expression reaches
the "cannot take query snapshot during logical decoding" error via this stack
trace:

GetTransactionSnapshot at snapmgr.c:279:3
exec_eval_simple_expr at pl_exec.c:6214:3
(inlined by) exec_eval_expr at pl_exec.c:5699:6
exec_stmt_raise at pl_exec.c:3820:8
(inlined by) exec_stmts at pl_exec.c:2096:10
exec_stmt_block at pl_exec.c:1955:6
exec_toplevel_block at pl_exec.c:1646:7
plpgsql_exec_function at pl_exec.c:636:5
plpgsql_call_handler at pl_handler.c:278:11
fmgr_security_definer at fmgr.c:755:52
ExecInterpExpr at execExprInterp.c:927:7
pglogical_change_filter at pglogical_output_plugin.c:663:7
(inlined by) pg_decode_change at pglogical_output_plugin.c:691:7
change_cb_wrapper at logical.c:1121:22
ReorderBufferApplyChange at reorderbuffer.c:2078:3
(inlined by) ReorderBufferProcessTXN at reorderbuffer.c:2383:7
DecodeCommit at decode.c:743:3
(inlined by) xact_decode at decode.c:242:5
LogicalDecodingProcessRecord at decode.c:123:1
XLogSendLogical at walsender.c:3442:33
WalSndLoop at walsender.c:2837:7
StartLogicalReplication at walsender.c:1504:2
(inlined by) exec_replication_command at walsender.c:2158:6
PostgresMain at postgres.c:4762:10
BackendMain at backend_startup.c:80:2
postmaster_child_launch at launch_backend.c:291:3
BackendStartup at postmaster.c:3587:8
(inlined by) ServerLoop at postmaster.c:1702:6
PostmasterMain at postmaster.c:1252:6
main at main.c:165:4

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output functions calling GetTransactionSnapshot() or such to do
some internal lookups?

I think pglogical_output_plugin.c w/ plpgsql is largely sane when used with a
plpgsql function that consults only catalogs and the output tuple. If the
pglogical test suite is representative, that's the usual case for a
row_filter. A plpgsql function that reads user tables will be fragile with
concurrent pruning, but a user might sanely accept that fragility. A plpgsql
function that writes tuples is not sane in a row_filter. How do you see it?

So far, I know of these options:

1. Make pglogical block the row_filter feature for any v18+ origin.
2. Revert postgresql.git commit 1585ff7387.
3. Make pglogical use HistoricSnapshot where pglogical_output_plugin.c handles
snapshots directly. That should keep simple row_filter expressions like
"col > 0" functioning. Entering plpgsql or similarly-complex logic will
fail with "cannot take query snapshot during logical decoding", and we'll
consider that to be working as intended.
4. Fail later and lazily, for just the most-unreasonable cases. For example,
fail when HistoricSnapshot applies to a write operation. (Maybe this
already fails. I didn't check.)

Which of those or other options should we consider?

For reference, https://github.com/2ndQuadrant/pglogical/pull/503 is an
otherwise-working port of pglogical to v18. Its Makefile currently disables
the tests that reach "cannot take query snapshot during logical decoding".

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noah Misch (#8)
Re: A few patches to clarify snapshot management

On 10/08/2025 01:23, Noah Misch wrote:

On Tue, Jan 07, 2025 at 11:55:00AM +0200, Heikki Linnakangas wrote:

On 07/01/2025 00:00, Andres Freund wrote:

On 2024-12-20 19:31:01 +0200, Heikki Linnakangas wrote:

While playing around some more with this, I noticed that this code in
GetTransactionSnapshot() is never reached, and AFAICS has always been dead
code:

Snapshot
GetTransactionSnapshot(void)
{
/*
* Return historic snapshot if doing logical decoding. We'll never need a
* non-historic transaction snapshot in this (sub-)transaction, so there's
* no need to be careful to set one up for later calls to
* GetTransactionSnapshot().
*/
if (HistoricSnapshotActive())
{
Assert(!FirstSnapshotSet);
return HistoricSnapshot;
}

when you think about it, that's good, because it doesn't really make sense
to call GetTransactionSnapshot() during logical decoding. We jump through
hoops to make the historic catalog decoding possible with historic
snapshots, tracking subtransactions that modify catalogs and WAL-logging
command ids, but they're not suitable for general purpose queries. So I
think we should turn that into an error, per attached patch.

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output functions calling GetTransactionSnapshot() or such to do
some internal lookups?

I haven't seen any. And I don't think that would work correctly while doing
logical decoding anyway, because historical snapshots only track XIDs that
modify catalogs. regclassout and enumout do work because they use the
catalog snapshot rather than GetTransactionSnapshot().

(I committed that change in commit 1585ff7387 already, but discussion is
still welcome of course)

https://github.com/2ndQuadrant/pglogical does rely on the pre-1585ff7387 code
for its row_filter feature. row_filter calls ExecEvalExpr() from the output
plugin, e.g. to evaluate expression "id between 2 AND 4" arising from this
configuration in the pglogical test suite:

SELECT * FROM pglogical.replication_set_add_table('default', 'basic_dml', false, row_filter := $rf$id between 2 AND 4$rf$);

One of the GetTransactionSnapshot() calls is in pglogical_output_plugin.c
itself. For that, I could work around the change by forcing the old
HistoricSnapshot use:

-		PushActiveSnapshot(GetTransactionSnapshot());
+		Assert(HistoricSnapshotActive());
+		PushActiveSnapshot(GetCatalogSnapshot(InvalidOid));

That doesn't get far. Calling a plpgsql function in the expression reaches
the "cannot take query snapshot during logical decoding" error via this stack
trace:

GetTransactionSnapshot at snapmgr.c:279:3
exec_eval_simple_expr at pl_exec.c:6214:3
(inlined by) exec_eval_expr at pl_exec.c:5699:6
exec_stmt_raise at pl_exec.c:3820:8
(inlined by) exec_stmts at pl_exec.c:2096:10
exec_stmt_block at pl_exec.c:1955:6
exec_toplevel_block at pl_exec.c:1646:7
plpgsql_exec_function at pl_exec.c:636:5
plpgsql_call_handler at pl_handler.c:278:11
fmgr_security_definer at fmgr.c:755:52
ExecInterpExpr at execExprInterp.c:927:7
pglogical_change_filter at pglogical_output_plugin.c:663:7
(inlined by) pg_decode_change at pglogical_output_plugin.c:691:7
change_cb_wrapper at logical.c:1121:22
ReorderBufferApplyChange at reorderbuffer.c:2078:3
(inlined by) ReorderBufferProcessTXN at reorderbuffer.c:2383:7
DecodeCommit at decode.c:743:3
(inlined by) xact_decode at decode.c:242:5
LogicalDecodingProcessRecord at decode.c:123:1
XLogSendLogical at walsender.c:3442:33
WalSndLoop at walsender.c:2837:7
StartLogicalReplication at walsender.c:1504:2
(inlined by) exec_replication_command at walsender.c:2158:6
PostgresMain at postgres.c:4762:10
BackendMain at backend_startup.c:80:2
postmaster_child_launch at launch_backend.c:291:3
BackendStartup at postmaster.c:3587:8
(inlined by) ServerLoop at postmaster.c:1702:6
PostmasterMain at postmaster.c:1252:6
main at main.c:165:4

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output functions calling GetTransactionSnapshot() or such to do
some internal lookups?

I think pglogical_output_plugin.c w/ plpgsql is largely sane when used with a
plpgsql function that consults only catalogs and the output tuple. If the
pglogical test suite is representative, that's the usual case for a
row_filter. A plpgsql function that reads user tables will be fragile with
concurrent pruning, but a user might sanely accept that fragility. A plpgsql
function that writes tuples is not sane in a row_filter. How do you see it?

So far, I know of these options:

1. Make pglogical block the row_filter feature for any v18+ origin.
2. Revert postgresql.git commit 1585ff7387.
3. Make pglogical use HistoricSnapshot where pglogical_output_plugin.c handles
snapshots directly. That should keep simple row_filter expressions like
"col > 0" functioning. Entering plpgsql or similarly-complex logic will
fail with "cannot take query snapshot during logical decoding", and we'll
consider that to be working as intended.
4. Fail later and lazily, for just the most-unreasonable cases. For example,
fail when HistoricSnapshot applies to a write operation. (Maybe this
already fails. I didn't check.)

Which of those or other options should we consider?

Hmm, so what snapshot should you use for these row filter expressions
anyway?

As long as you don't try to access any tables, it doesn't matter
much. Although, the effective catalog snapshot also affects how any
functions used in the expression are resolved. If you CREATE OR REPLACE
the row filter function while logical decoding is active, what version
of the function do you expect to be used? I think that's a little fuzzy,
and you might get different answer for the initial sync step and the
on-going decoding. We don't necessarily need to solve that here.
Nevertheless, what would be the least surprising answer to that?

Currently, in v17 and below, we will use the historic snapshot, which
represents the point in time that we are decoding. Is that the right
choice? I'm not sure. A historic snapshot is only supposed to be used
for catalogs, it's not clear if it works correctly for arbitrary
queries. And it's not clear it's the right choice for resolving the row
filter functions either.

How about always using a fresh snapshot instead? Instead of pushing the
historic snapshot as the active snapshot, _disable_ the historic
snapshot and use GetTransactionSnapshot() to acquire a regular snapshot?

We could implement that in GetTransactionSnapshot() ifself by just
removing the check for HistoricSnapshotActive(), and let it call
GetSnapshotData() as usual. But I still think it's a useful sanity check
that you don't call GetTransactionSnapshot() while a historic snapshot
is active, so I'd prefer for the caller to explicitly disable the
historic snapshot first.

Attached is a patch to pglogical to demonstrate that.

Another apprach is to continue down the path you attempted. There are
many places in plpgsql and elsewhere where we call
GetTransactionSnapshot(), but are they really necessary when you're
executing something like the row filter expression? I think the row
filter expression is supposed to be read-only. There are optimizations
already to avoid GetTransactionSnapshot() calls in read-only functions
(i.e. immutable), but we could expand those to any function in a
read-only transaction, and set XactReadOnly while evaluating the row
filter expression.

The second attached patch makes that change in PostgreSQL code. With
those changes, the pglogical change you attempted to do
"PushActiveSnapshot(GetCatalogSnapshot(InvalidOid))" seems to work. I'm
not sure it covers all the cases though, there might be more
GetTransactionSnapshot() calls lurking.

I think I prefer the change to pglogical to disable the historic
snapshot. It feels more robust. I'm not sure if there's a performance
penalty though, as you now need to call GetSnapshotData() for every
decoded transaction.

Finally, attached is a pglogical test case to test what happens if you
change the datatype of the table, while there's decoding active with a
complex row filter function that also accesses the table. I'm not sure
how that should behave and I think that falls into the category of
"don't do that". But FWIW, on v17 it tries to read it fails with this:

ERROR: could not read blocks 0..0 in file "base/16384/17512": read only
0 of 8192 bytes

while with the attached pglogical-disable-historic-snapshot.patch it
fails more nicely:

[2025-08-15 14:05:41.514 EEST] [3824375] [regression] ERROR: attribute
1 of type rowfilter_ddl_tbl has wrong type
[2025-08-15 14:05:41.514 EEST] [3824375] [regression] DETAIL: Table has
type text, but query expects integer.

- Heikki

Attachments:

pglogical-disable-historic-snapshot.patchtext/x-patch; charset=UTF-8; name=pglogical-disable-historic-snapshot.patchDownload+13-0
use-readonly-mode-for-decode.patchtext/x-patch; charset=UTF-8; name=use-readonly-mode-for-decode.patchDownload+9-5
row_filter_ddl.sqlapplication/sql; name=row_filter_ddl.sqlDownload
#10Noah Misch
noah@leadboat.com
In reply to: Heikki Linnakangas (#9)
Re: A few patches to clarify snapshot management

On Fri, Aug 15, 2025 at 02:12:03PM +0300, Heikki Linnakangas wrote:

On 10/08/2025 01:23, Noah Misch wrote:

On Tue, Jan 07, 2025 at 11:55:00AM +0200, Heikki Linnakangas wrote:

On 07/01/2025 00:00, Andres Freund wrote:

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output functions calling GetTransactionSnapshot() or such to do
some internal lookups?

I haven't seen any. And I don't think that would work correctly while doing
logical decoding anyway, because historical snapshots only track XIDs that
modify catalogs. regclassout and enumout do work because they use the
catalog snapshot rather than GetTransactionSnapshot().

(I committed that change in commit 1585ff7387 already, but discussion is
still welcome of course)

https://github.com/2ndQuadrant/pglogical does rely on the pre-1585ff7387 code
for its row_filter feature.

So far, I know of these options:

1. Make pglogical block the row_filter feature for any v18+ origin.
2. Revert postgresql.git commit 1585ff7387.
3. Make pglogical use HistoricSnapshot where pglogical_output_plugin.c handles
snapshots directly. That should keep simple row_filter expressions like
"col > 0" functioning. Entering plpgsql or similarly-complex logic will
fail with "cannot take query snapshot during logical decoding", and we'll
consider that to be working as intended.
4. Fail later and lazily, for just the most-unreasonable cases. For example,
fail when HistoricSnapshot applies to a write operation. (Maybe this
already fails. I didn't check.)

Which of those or other options should we consider?

Hmm, so what snapshot should you use for these row filter expressions
anyway?

As long as you don't try to access any tables, it doesn't matter
much. Although, the effective catalog snapshot also affects how any
functions used in the expression are resolved. If you CREATE OR REPLACE the
row filter function while logical decoding is active, what version of the
function do you expect to be used? I think that's a little fuzzy, and you
might get different answer for the initial sync step and the on-going
decoding. We don't necessarily need to solve that here. Nevertheless, what
would be the least surprising answer to that?

Currently, in v17 and below, we will use the historic snapshot, which
represents the point in time that we are decoding. Is that the right
choice? I'm not sure. A historic snapshot is only supposed to be used for
catalogs, it's not clear if it works correctly for arbitrary queries. And
it's not clear it's the right choice for resolving the row filter functions
either.

How about always using a fresh snapshot instead? Instead of pushing the
historic snapshot as the active snapshot, _disable_ the historic snapshot
and use GetTransactionSnapshot() to acquire a regular snapshot?

I see advantages in using the historic snapshot:

1. It's the longstanding behavior, and applications aren't complaining.

2. If someone wants "fresh snapshot", they can do that today with a C
extension that provides an execute_at_fresh_snapshot(sql text) SQL
function. If we adopt the fresh snapshot in pglogical or in core, I don't
see a comparably-clean way for the application code to get back to the
historic snapshot. (That's because the historic snapshot lives only in
stack variables at the moment in question.)

3. If an application is relying on the longstanding behavior and needs to
adapt to the proposed "fresh snapshot" behavior, that may be invasive to
implement and harmful to performance. For example, instead of reading from
a user_catalog_table inside the filter, the application may need to
duplicate that table's data into the rows being filtered.

Does the "fresh snapshot" alternative bring strengths to outweigh those?

Another apprach is to continue down the path you attempted. There are many
places in plpgsql and elsewhere where we call GetTransactionSnapshot(), but
are they really necessary when you're executing something like the row
filter expression? I think the row filter expression is supposed to be
read-only. There are optimizations already to avoid GetTransactionSnapshot()
calls in read-only functions (i.e. immutable), but we could expand those to
any function in a read-only transaction, and set XactReadOnly while
evaluating the row filter expression.

The second attached patch makes that change in PostgreSQL code. With those
changes, the pglogical change you attempted to do
"PushActiveSnapshot(GetCatalogSnapshot(InvalidOid))" seems to work. I'm not
sure it covers all the cases though, there might be more
GetTransactionSnapshot() calls lurking.

Yep. As you say, the GetTransactionSnapshot() calls probably aren't
necessary, so this could work long-term. That patch's edit in src/pl may
imply similar needs lurking in non-core PLs.

Finally, attached is a pglogical test case to test what happens if you
change the datatype of the table, while there's decoding active with a
complex row filter function that also accesses the table. I'm not sure how
that should behave and I think that falls into the category of "don't do
that". But FWIW, on v17 it tries to read it fails with this:

ERROR: could not read blocks 0..0 in file "base/16384/17512": read only 0
of 8192 bytes

Reading reliably with a historic snapshot would require adding
user_catalog_table. Ideally, the error message would lead the user to a
conclusion like "you're reading a non-catalog with a historic snapshot; this
is expected after a rewrite of that non-catalog". With user_catalog_table:

-   CREATE TABLE public.rowfilter_ddl_tbl (id int primary key);
+   CREATE TABLE public.rowfilter_ddl_tbl (id int primary key) WITH (user_catalog_table = true);

... the v17 ALTER fails with 'ERROR: cannot rewrite table "rowfilter_ddl_tbl"
used as a catalog table'. Not bad. Incidentally, while that's the result
with a production build, a v17 --enable-cassert build crashes earlier (at
today's REL_17_STABLE and today's pglogical head):

#4 0x000055ed959bd415 in ExceptionalCondition (conditionName=conditionName@entry=0x55ed95ae46f8 "ActiveSnapshot->as_snap->active_count == 1",
fileName=fileName@entry=0x55ed95a49ce1 "snapmgr.c", lineNumber=lineNumber@entry=718) at assert.c:66
#5 0x000055ed959ff409 in UpdateActiveSnapshotCommandId () at snapmgr.c:718
#6 0x000055ed956f8342 in _SPI_execute_plan (plan=plan@entry=0x55edab044410, options=options@entry=0x7ffd8b03f4b0, snapshot=snapshot@entry=0x0,
crosscheck_snapshot=crosscheck_snapshot@entry=0x0, fire_triggers=fire_triggers@entry=true) at spi.c:2668
#7 0x000055ed956f8bc2 in SPI_execute_plan_with_paramlist (plan=0x55edab044410, params=0x55edab021750, read_only=false, tcount=tcount@entry=0) at spi.c:751
#8 0x00007f18a25e78a7 in exec_run_select (estate=estate@entry=0x7ffd8b03fbd0, expr=expr@entry=0x55edab030b58, portalP=portalP@entry=0x0, maxtuples=0) at pl_exec.c:5824
#9 0x00007f18a25e7b8a in exec_eval_expr (estate=0x7ffd8b03fbd0, expr=0x55edab030b58, isNull=0x7ffd8b03f597, rettype=0x7ffd8b03f598, rettypmod=0x7ffd8b03f59c)
at pl_exec.c:5714
#10 0x00007f18a25ea7f2 in exec_assign_expr (estate=estate@entry=0x7ffd8b03fbd0, target=0x55edab020b50, expr=0x55edab030b58) at pl_exec.c:5039
#11 0x00007f18a25ebe45 in exec_stmt_assign (estate=0x7ffd8b03fbd0, stmt=0x55edab030ac8) at pl_exec.c:2156
#12 exec_stmts (estate=estate@entry=0x7ffd8b03fbd0, stmts=0x55edab030c38) at pl_exec.c:2020
#13 0x00007f18a25ebdd3 in exec_stmt_if (estate=0x55edaaf3af40, stmt=<optimized out>) at pl_exec.c:2535
#14 exec_stmts (estate=estate@entry=0x7ffd8b03fbd0, stmts=0x55edab030cd8) at pl_exec.c:2036
#15 0x00007f18a25ede6b in exec_stmt_block (estate=estate@entry=0x7ffd8b03fbd0, block=block@entry=0x55edab0310d0) at pl_exec.c:1943
#16 0x00007f18a25edf6d in exec_toplevel_block (estate=estate@entry=0x7ffd8b03fbd0, block=0x55edab0310d0) at pl_exec.c:1634
#17 0x00007f18a25ee7e1 in plpgsql_exec_function (func=func@entry=0x55edab024a18, fcinfo=fcinfo@entry=0x55edaafd8a18, simple_eval_estate=simple_eval_estate@entry=0x0,
simple_eval_resowner=simple_eval_resowner@entry=0x0, procedure_resowner=procedure_resowner@entry=0x0, atomic=<optimized out>) at pl_exec.c:623
#18 0x00007f18a25f8e43 in plpgsql_call_handler (fcinfo=0x55edaafd8a18) at pl_handler.c:277
#19 0x000055ed956b239f in ExecInterpExpr (state=0x55edaafd83a0, econtext=0x55edab012180, isnull=<optimized out>) at execExprInterp.c:740
#20 0x00007f18a3003bc0 in pglogical_change_filter (data=0x55edaafb7268, relation=0x7f18a2a93ab8, change=0x55edab006dc0, att_list=<synthetic pointer>)
at pglogical_output_plugin.c:656
#21 pg_decode_change (ctx=0x55edaafb5460, txn=<optimized out>, relation=0x7f18a2a93ab8, change=0x55edab006dc0) at pglogical_output_plugin.c:690
#22 0x000055ed957fe6f9 in change_cb_wrapper (cache=<optimized out>, txn=<optimized out>, relation=<optimized out>, change=<optimized out>) at logical.c:1137
#23 0x000055ed9580a1f8 in ReorderBufferApplyChange (rb=<optimized out>, txn=<optimized out>, relation=0x7f18a2a93ab8, change=0x55edab006dc0, streaming=false)
at reorderbuffer.c:2019
#24 ReorderBufferProcessTXN (rb=0x55edaafb9ce0, txn=0x55edaafefdb0, commit_lsn=37174848, snapshot_now=<optimized out>, command_id=command_id@entry=0,
streaming=streaming@entry=false) at reorderbuffer.c:2300
#25 0x000055ed9580a4dc in ReorderBufferReplay (txn=<optimized out>, rb=<optimized out>, commit_lsn=<optimized out>, end_lsn=<optimized out>, commit_time=<optimized out>,
origin_id=<optimized out>, origin_lsn=0, xid=<optimized out>) at reorderbuffer.c:2767
#26 0x000055ed9580b041 in ReorderBufferCommit (rb=<optimized out>, xid=<optimized out>, commit_lsn=<optimized out>, end_lsn=<optimized out>, commit_time=<optimized out>,
origin_id=<optimized out>, origin_lsn=<optimized out>) at reorderbuffer.c:2791
#27 0x000055ed957fad92 in DecodeCommit (ctx=0x55edaafb5460, buf=0x7ffd8b040500, parsed=0x7ffd8b040370, xid=794, two_phase=false) at decode.c:746
#28 xact_decode (ctx=0x55edaafb5460, buf=0x7ffd8b040500) at decode.c:242
#29 0x000055ed957fa721 in LogicalDecodingProcessRecord (ctx=0x55edaafb5460, record=0x55edaafb5838) at decode.c:116
#30 0x000055ed95825c12 in XLogSendLogical () at walsender.c:3445
#31 0x000055ed95828526 in WalSndLoop (send_data=send_data@entry=0x55ed95825bd0 <XLogSendLogical>) at walsender.c:2835
#32 0x000055ed958295bc in StartLogicalReplication (cmd=<optimized out>) at walsender.c:1525
#33 exec_replication_command (
cmd_string=cmd_string@entry=0x55edaaed62d0 "START_REPLICATION SLOT \"pgl_postgres_test_provider_test_sube55bf37\" LOGICAL 0/2341A18 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', \"binary.want_i"...) at walsender.c:2160
#34 0x000055ed958811f4 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4763
...
(gdb) p ActiveSnapshot->as_snap->active_count
$1 = 3

#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noah Misch (#10)
Re: A few patches to clarify snapshot management

On 19/08/2025 03:14, Noah Misch wrote:

On Fri, Aug 15, 2025 at 02:12:03PM +0300, Heikki Linnakangas wrote:

How about always using a fresh snapshot instead? Instead of pushing the
historic snapshot as the active snapshot, _disable_ the historic snapshot
and use GetTransactionSnapshot() to acquire a regular snapshot?

I see advantages in using the historic snapshot:

1. It's the longstanding behavior, and applications aren't complaining.

2. If someone wants "fresh snapshot", they can do that today with a C
extension that provides an execute_at_fresh_snapshot(sql text) SQL
function. If we adopt the fresh snapshot in pglogical or in core, I don't
see a comparably-clean way for the application code to get back to the
historic snapshot. (That's because the historic snapshot lives only in
stack variables at the moment in question.)

3. If an application is relying on the longstanding behavior and needs to
adapt to the proposed "fresh snapshot" behavior, that may be invasive to
implement and harmful to performance. For example, instead of reading from
a user_catalog_table inside the filter, the application may need to
duplicate that table's data into the rows being filtered.

Oh, I had not considered user_catalog_tables. I didn't remember that's a
thing.

The docs on user_catalog_table says:

Note that access to user catalog tables or regular system catalog
tables in the output plugins has to be done via the systable_* scan APIs
only. Access via the heap_* scan APIs will error out.

That doesn't quite say that you should be able to run arbitrary queries
on a user_catalog_table. In fact it suggests that you can't, because
surely you're not using the systable_* scan APIs when running arbitrary
queries.

That said, I agree it would be nice if we can keep it working.

Does the "fresh snapshot" alternative bring strengths to outweigh those?

The argument for the fresh snapshot is that using a historic snapshot
only makes sense for catalog tables, and by taking a fresh snapshot, we
avoid the mistake of using the historic snapshot for anything else. I
thought that there's practically no valid use case for using a historic
snapshot in anything that calls GetTransactionSnapshot(), and if it
happens to work, it's only because the snapshot isn't actually used for
anything or is only used to read data that hasn't changed so that you
get away with it.

I agree that reading a table marked as user_catalog_table is valid case,
however, so I take back that argument.

How about the attached, then? It reverts the GetTransactionSnapshot()
change. But to still catch at least some of the invalid uses of the
historic snapshot, it adds checks to heap_beginscan() and
index_beginscan(), to complain if they are called on a non-catalog
relation with a historic snapshot.

- Heikki

Attachments:

0001-Revert-GetTransactionSnapshot-to-return-historic-sna.patchtext/x-patch; charset=UTF-8; name=0001-Revert-GetTransactionSnapshot-to-return-historic-sna.patchDownload+35-5
#12Noah Misch
noah@leadboat.com
In reply to: Heikki Linnakangas (#11)
Re: A few patches to clarify snapshot management

On Tue, Aug 19, 2025 at 11:45:01PM +0300, Heikki Linnakangas wrote:

How about the attached, then? It reverts the GetTransactionSnapshot()
change. But to still catch at least some of the invalid uses of the historic
snapshot, it adds checks to heap_beginscan() and index_beginscan(), to
complain if they are called on a non-catalog relation with a historic
snapshot.

@@ -1143,6 +1143,15 @@ heap_beginscan(Relation relation, Snapshot snapshot,
if (!(snapshot && IsMVCCSnapshot(snapshot)))
scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;

+	/* Check that a historic snapshot is not used for non-catalog tables */
+	if (snapshot &&
+		IsHistoricMVCCSnapshot(snapshot) &&
+		!RelationIsAccessibleInLogicalDecoding(relation))
+	{
+		elog(ERROR, "cannot query non-catalog table \"%s\" during logical decoding",
+			 RelationGetRelationName(relation));
+	}
+

I feel post-beta3 is late for debut of restrictions like this. How about a
pure revert, then add those restrictions in v19? Should be s/elog/ereport/,
also.

#13Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noah Misch (#12)
Re: A few patches to clarify snapshot management

On 20/08/2025 03:37, Noah Misch wrote:

On Tue, Aug 19, 2025 at 11:45:01PM +0300, Heikki Linnakangas wrote:

How about the attached, then? It reverts the GetTransactionSnapshot()
change. But to still catch at least some of the invalid uses of the historic
snapshot, it adds checks to heap_beginscan() and index_beginscan(), to
complain if they are called on a non-catalog relation with a historic
snapshot.

@@ -1143,6 +1143,15 @@ heap_beginscan(Relation relation, Snapshot snapshot,
if (!(snapshot && IsMVCCSnapshot(snapshot)))
scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;

+	/* Check that a historic snapshot is not used for non-catalog tables */
+	if (snapshot &&
+		IsHistoricMVCCSnapshot(snapshot) &&
+		!RelationIsAccessibleInLogicalDecoding(relation))
+	{
+		elog(ERROR, "cannot query non-catalog table \"%s\" during logical decoding",
+			 RelationGetRelationName(relation));
+	}
+

I feel post-beta3 is late for debut of restrictions like this. How about a
pure revert, then add those restrictions in v19? Should be s/elog/ereport/,
also.

Ok, fair. I committed the revert to v18, and the revert + additional
checks to master.

- Heikki