Error "initial slot snapshot too large" in create replication slot
While creating an "export snapshot" I don't see any protection why the
number of xids in the snapshot can not cross the
"GetMaxSnapshotXidCount()"?.
Basically, while converting the HISTORIC snapshot to the MVCC snapshot
in "SnapBuildInitialSnapshot()", we add all the xids between
snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
commit were not recorded). The problem is that we add both topxids as
well as the subxids into the same array and expect that the "xid"
count does not cross the "GetMaxSnapshotXidCount()". So it seems like
an issue but I am not sure what is the fix for this, some options are
a) Don't limit the xid count in the exported snapshot and dynamically
resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
GetMaxSnapshotSubxidCount(). But in option b) there would still be a
problem that how do we handle the overflowed subtransaction?
I have locally, reproduced the issue,
1. Configuration
max_connections= 5
autovacuum = off
max_worker_processes = 0
2.Then from pgbench I have run the attached script (test.sql) from 5 clients.
./pgbench -i postgres
./pgbench -c4 -j4 -T 3000 -f test1.sql -P1 postgres
3. Concurrently, create replication slot,
[dilipkumar@localhost bin]$ ./psql "dbname=postgres replication=database"
postgres[7367]=#
postgres[6463]=# CREATE_REPLICATION_SLOT "slot" LOGICAL "test_decoding";
ERROR: 40001: initial slot snapshot too large
LOCATION: SnapBuildInitialSnapshot, snapbuild.c:597
postgres[6463]=# CREATE_REPLICATION_SLOT "slot" LOGICAL "test_decoding";
ERROR: XX000: clearing exported snapshot in wrong transaction state
LOCATION: SnapBuildClearExportedSnapshot, snapbuild.c:690
I could reproduce this issue, at least once in 8-10 attempts of
creating the replication slot.
Note: After that issue, I have noticed one more issue "clearing
exported snapshot in wrong transaction state", that is because the
"ExportInProgress" is not cleared on the transaction abort, for this,
a simple fix is we can clear this state on the transaction abort,
maybe I will raise this as a separate issue?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
While creating an "export snapshot" I don't see any protection why the
number of xids in the snapshot can not cross the
"GetMaxSnapshotXidCount()"?.Basically, while converting the HISTORIC snapshot to the MVCC snapshot
in "SnapBuildInitialSnapshot()", we add all the xids between
snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
commit were not recorded). The problem is that we add both topxids as
well as the subxids into the same array and expect that the "xid"
count does not cross the "GetMaxSnapshotXidCount()". So it seems like
an issue but I am not sure what is the fix for this, some options are
It seems to me that it is a compromise between the restriction of the
legitimate snapshot and snapshots created by snapbuild. If the xids
overflow, the resulting snapshot may lose a siginificant xid, i.e, a
top-level xid.
a) Don't limit the xid count in the exported snapshot and dynamically
resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
GetMaxSnapshotSubxidCount(). But in option b) there would still be a
problem that how do we handle the overflowed subtransaction?
I'm afraid that we shouldn't expand the size limits. If I understand
it correctly, we only need the top-level xids in the exported snapshot
and reorder buffer knows whether a xid is a top-level or not after
establishing a consistent snapshot.
The attached diff tries to make SnapBuildInitialSnapshot exclude
subtransaction from generated snapshots. It seems working fine for
you repro. (Of course, I'm not confident that it is the correct thing,
though..)
What do you think about this?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
exclude_subxact_from_snapbuild_snapshot.diff.txttext/plain; charset=us-asciiDownload
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..4e452cce7c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
}
+/*
+ * ReorderBufferXidIsKnownSubXact
+ * Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false,
+ NULL, InvalidXLogRecPtr, false);
+
+ /* a known subtxn? */
+ if (txn && rbtxn_is_known_subxact(txn))
+ return true;
+
+ return false;
+}
+
+
/*
* ---------------------------------------
* Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a5333349a8..12d283f4de 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -591,12 +591,18 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL)
{
- if (newxcnt >= GetMaxSnapshotXidCount())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("initial slot snapshot too large")));
+ bool issubxact =
+ ReorderBufferXidIsKnownSubXact(builder->reorder, xid);
- newxip[newxcnt++] = xid;
+ if (!issubxact)
+ {
+ if (newxcnt >= GetMaxSnapshotXidCount())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("initial slot snapshot too large")));
+
+ newxip[newxcnt++] = xid;
+ }
}
TransactionIdAdvance(xid);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff75f7..e5fa1051d7 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
TimestampTz prepare_time,
On Mon, Oct 11, 2021 at 4:29 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
While creating an "export snapshot" I don't see any protection why the
number of xids in the snapshot can not cross the
"GetMaxSnapshotXidCount()"?.Basically, while converting the HISTORIC snapshot to the MVCC snapshot
in "SnapBuildInitialSnapshot()", we add all the xids between
snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
commit were not recorded). The problem is that we add both topxids as
well as the subxids into the same array and expect that the "xid"
count does not cross the "GetMaxSnapshotXidCount()". So it seems like
an issue but I am not sure what is the fix for this, some options areIt seems to me that it is a compromise between the restriction of the
legitimate snapshot and snapshots created by snapbuild. If the xids
overflow, the resulting snapshot may lose a siginificant xid, i.e, a
top-level xid.a) Don't limit the xid count in the exported snapshot and dynamically
resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
GetMaxSnapshotSubxidCount(). But in option b) there would still be a
problem that how do we handle the overflowed subtransaction?I'm afraid that we shouldn't expand the size limits. If I understand
it correctly, we only need the top-level xids in the exported snapshot
But since we are using this as an MVCC snapshot, if we don't have the
subxid and if we also don't mark the "suboverflowed" flag then IMHO
the sub-transaction visibility might be wrong, Am I missing something?
and reorder buffer knows whether a xid is a top-level or not after
establishing a consistent snapshot.The attached diff tries to make SnapBuildInitialSnapshot exclude
subtransaction from generated snapshots. It seems working fine for
you repro. (Of course, I'm not confident that it is the correct thing,
though..)What do you think about this?
If your statement that we only need top-xids in the exported snapshot,
is true then this fix looks fine to me. If not then we might need to
add the sub-xids in the snapshot->subxip array and if it crosses the
limit of GetMaxSnapshotSubxidCount(), then we can mark "suboverflowed"
flag.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
At Mon, 11 Oct 2021 16:48:10 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
On Mon, Oct 11, 2021 at 4:29 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
While creating an "export snapshot" I don't see any protection why the
number of xids in the snapshot can not cross the
"GetMaxSnapshotXidCount()"?.Basically, while converting the HISTORIC snapshot to the MVCC snapshot
in "SnapBuildInitialSnapshot()", we add all the xids between
snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
commit were not recorded). The problem is that we add both topxids as
well as the subxids into the same array and expect that the "xid"
count does not cross the "GetMaxSnapshotXidCount()". So it seems like
an issue but I am not sure what is the fix for this, some options areIt seems to me that it is a compromise between the restriction of the
legitimate snapshot and snapshots created by snapbuild. If the xids
overflow, the resulting snapshot may lose a siginificant xid, i.e, a
top-level xid.a) Don't limit the xid count in the exported snapshot and dynamically
resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
GetMaxSnapshotSubxidCount(). But in option b) there would still be a
problem that how do we handle the overflowed subtransaction?I'm afraid that we shouldn't expand the size limits. If I understand
it correctly, we only need the top-level xids in the exported snapshotBut since we are using this as an MVCC snapshot, if we don't have the
subxid and if we also don't mark the "suboverflowed" flag then IMHO
the sub-transaction visibility might be wrong, Am I missing something?
Sorry I should have set suboverflowed in the generated snapshot.
However, we can store subxid list as well when the snapshot (or
running_xact) is not overflown. These (should) works the same way.
On physical standby, snapshot is created just filling up only subxip
with all top and sub xids (procrray.c:2400). It would be better we do
the same thing here.
and reorder buffer knows whether a xid is a top-level or not after
establishing a consistent snapshot.The attached diff tries to make SnapBuildInitialSnapshot exclude
subtransaction from generated snapshots. It seems working fine for
you repro. (Of course, I'm not confident that it is the correct thing,
though..)What do you think about this?
If your statement that we only need top-xids in the exported snapshot,
is true then this fix looks fine to me. If not then we might need to
add the sub-xids in the snapshot->subxip array and if it crosses the
limit of GetMaxSnapshotSubxidCount(), then we can mark "suboverflowed"
flag.
So I came up with the attached version.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Allow-overflowed-snapshot-when-creating-logical-repl.patchtext/x-patch; charset=us-asciiDownload
From 374a10aa6819224ca6af548100aa34e6c772a2c3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 12 Oct 2021 13:53:27 +0900
Subject: [PATCH] Allow overflowed snapshot when creating logical replication
slot
Snapshot can hold top XIDs up to the number of server processes but
SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to
there and fails for certain circumstances. Instead, create a
"takenDuringRecovery" snapshot instead, which stores all XIDs in
subxip array. Addition to that, allow to create an overflowed
snapshot by adding to reorder buffer a function to tell whether an XID
is a top-level or not.
---
.../replication/logical/reorderbuffer.c | 20 ++++++++
src/backend/replication/logical/snapbuild.c | 50 ++++++++++++++-----
src/include/replication/reorderbuffer.h | 1 +
3 files changed, 59 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..4e452cce7c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
}
+/*
+ * ReorderBufferXidIsKnownSubXact
+ * Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false,
+ NULL, InvalidXLogRecPtr, false);
+
+ /* a known subtxn? */
+ if (txn && rbtxn_is_known_subxact(txn))
+ return true;
+
+ return false;
+}
+
+
/*
* ---------------------------------------
* Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a5333349a8..d422315717 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
{
Snapshot snap;
TransactionId xid;
- TransactionId *newxip;
- int newxcnt = 0;
+ TransactionId *newsubxip;
+ int newsubxcnt;
+ bool overflowed = false;
Assert(!FirstSnapshotSet);
Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -568,9 +569,12 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
- /* allocate in transaction context */
- newxip = (TransactionId *)
- palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+ /*
+ * Allocate in transaction context. We use only subxip to store xids. See
+ * GetSnapshotData for details.
+ */
+ newsubxip = (TransactionId *)
+ palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
/*
* snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -578,6 +582,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
* classical snapshot by marking all non-committed transactions as
* in-progress. This can be expensive.
*/
+retry:
+ newsubxcnt = 0;
+
for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
{
void *test;
@@ -591,12 +598,29 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL)
{
- if (newxcnt >= GetMaxSnapshotXidCount())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("initial slot snapshot too large")));
+ bool add = true;
- newxip[newxcnt++] = xid;
+ /* exlude subxids when subxip is overflowed */
+ if (overflowed &&
+ ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
+ add = false;
+
+ if (add)
+ {
+ if (newsubxcnt >= GetMaxSnapshotSubxidCount())
+ {
+ if (overflowed)
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("initial slot snapshot too large")));
+
+ /* retry excluding subxids */
+ overflowed = true;
+ goto retry;
+ }
+
+ newsubxip[newsubxcnt++] = xid;
+ }
}
TransactionIdAdvance(xid);
@@ -604,8 +628,10 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
/* adjust remaining snapshot fields as needed */
snap->snapshot_type = SNAPSHOT_MVCC;
- snap->xcnt = newxcnt;
- snap->xip = newxip;
+ snap->xcnt = 0;
+ snap->subxcnt = newsubxcnt;
+ snap->subxip = newsubxip;
+ snap->suboverflowed = overflowed;
return snap;
}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff75f7..e5fa1051d7 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
TimestampTz prepare_time,
--
2.27.0
At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
So I came up with the attached version.
Sorry, it was losing a piece of change.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v2-0001-Allow-overflowed-snapshot-when-creating-logical-r.patchtext/x-patch; charset=us-asciiDownload
From 424f405b4c9d41471eae1edd48cdbb6a6d47217e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 12 Oct 2021 13:53:27 +0900
Subject: [PATCH v2] Allow overflowed snapshot when creating logical
replication slot
Snapshot can hold top XIDs up to the number of server processes but
SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to
there and fails for certain circumstances. Instead, create a
"takenDuringRecovery" snapshot instead, which stores all XIDs in
subxip array. Addition to that, allow to create an overflowed
snapshot by adding to reorder buffer a function to tell whether an XID
is a top-level or not.
---
.../replication/logical/reorderbuffer.c | 20 +++++++
src/backend/replication/logical/snapbuild.c | 56 +++++++++++++++----
src/include/replication/reorderbuffer.h | 1 +
3 files changed, 65 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..4e452cce7c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
}
+/*
+ * ReorderBufferXidIsKnownSubXact
+ * Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false,
+ NULL, InvalidXLogRecPtr, false);
+
+ /* a known subtxn? */
+ if (txn && rbtxn_is_known_subxact(txn))
+ return true;
+
+ return false;
+}
+
+
/*
* ---------------------------------------
* Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a5333349a8..46c691df20 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
{
Snapshot snap;
TransactionId xid;
- TransactionId *newxip;
- int newxcnt = 0;
+ TransactionId *newsubxip;
+ int newsubxcnt;
+ bool overflowed = false;
Assert(!FirstSnapshotSet);
Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -568,9 +569,12 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
- /* allocate in transaction context */
- newxip = (TransactionId *)
- palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+ /*
+ * Allocate in transaction context. We use only subxip to store xids. See
+ * GetSnapshotData for details.
+ */
+ newsubxip = (TransactionId *)
+ palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
/*
* snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -578,6 +582,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
* classical snapshot by marking all non-committed transactions as
* in-progress. This can be expensive.
*/
+retry:
+ newsubxcnt = 0;
+
for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
{
void *test;
@@ -591,12 +598,29 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL)
{
- if (newxcnt >= GetMaxSnapshotXidCount())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("initial slot snapshot too large")));
+ bool add = true;
- newxip[newxcnt++] = xid;
+ /* exlude subxids when subxip is overflowed */
+ if (overflowed &&
+ ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
+ add = false;
+
+ if (add)
+ {
+ if (newsubxcnt >= GetMaxSnapshotSubxidCount())
+ {
+ if (overflowed)
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("initial slot snapshot too large")));
+
+ /* retry excluding subxids */
+ overflowed = true;
+ goto retry;
+ }
+
+ newsubxip[newsubxcnt++] = xid;
+ }
}
TransactionIdAdvance(xid);
@@ -604,8 +628,16 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
/* adjust remaining snapshot fields as needed */
snap->snapshot_type = SNAPSHOT_MVCC;
- snap->xcnt = newxcnt;
- snap->xip = newxip;
+ snap->xcnt = 0;
+ snap->subxcnt = newsubxcnt;
+ snap->subxip = newsubxip;
+ snap->suboverflowed = overflowed;
+
+ /*
+ * Although this snapshot is taken actually not during recovery, all XIDs
+ * are stored in subxip even if it is not overflowed.
+ */
+ snap->takenDuringRecovery = true;
return snap;
}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff75f7..e5fa1051d7 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
TimestampTz prepare_time,
--
2.27.0
On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
So I came up with the attached version.
Sorry, it was losing a piece of change.
Yeah, at a high level this is on the idea I have in mind, I will do a
detailed review in a day or two. Thanks for working on this.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 12, 2021 at 11:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
So I came up with the attached version.
Sorry, it was losing a piece of change.
Yeah, at a high level this is on the idea I have in mind, I will do a
detailed review in a day or two. Thanks for working on this.
While doing the detailed review, I think there are a couple of
problems with the patch, the main problem of storing all the xid in
the snap->subxip is that once we mark the snapshot overflown then the
XidInMVCCSnapshot, will not search the subxip array, instead it will
fetch the topXid and search in the snap->xip array. Another issue is
that the total xids could be GetMaxSnapshotSubxidCount()
+GetMaxSnapshotXidCount().
I think what we should be doing is that if the xid is know subxid then
add in the snap->subxip array otherwise in snap->xip array. So
snap->xip array size will be GetMaxSnapshotXidCount() whereas the
snap->subxip array size will be GetMaxSnapshotSubxidCount(). And if
the array size is full then we can stop adding the subxids to the
array.
What is your thought on this?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 19, 2021 at 2:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Oct 12, 2021 at 11:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
So I came up with the attached version.
Sorry, it was losing a piece of change.
Yeah, at a high level this is on the idea I have in mind, I will do a
detailed review in a day or two. Thanks for working on this.While doing the detailed review, I think there are a couple of
problems with the patch, the main problem of storing all the xid in
the snap->subxip is that once we mark the snapshot overflown then the
XidInMVCCSnapshot, will not search the subxip array, instead it will
fetch the topXid and search in the snap->xip array.
I missed that you are marking the snapshot as takenDuringRecovery so
your fix looks fine.
Another issue is
that the total xids could be GetMaxSnapshotSubxidCount()
+GetMaxSnapshotXidCount().
I have fixed this, the updated patch is attached.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v3-0001-Allow-overflowed-snapshot-when-creating-logical-r.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Allow-overflowed-snapshot-when-creating-logical-r.patchDownload
From b6ca7a05b05a9244bf400d14b75dfcd1f99c76cd Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 12 Oct 2021 13:53:27 +0900
Subject: [PATCH v3] Allow overflowed snapshot when creating logical
replication slot
Snapshot can hold top XIDs up to the number of server processes but
SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to
there and fails for certain circumstances. Instead, create a
"takenDuringRecovery" snapshot instead, which stores all XIDs in
subxip array. Addition to that, allow to create an overflowed
snapshot by adding to reorder buffer a function to tell whether an XID
is a top-level or not.
---
src/backend/replication/logical/reorderbuffer.c | 20 +++++++++
src/backend/replication/logical/snapbuild.c | 57 +++++++++++++++++++------
src/include/replication/reorderbuffer.h | 1 +
3 files changed, 66 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e6660..4e452cc 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3327,6 +3327,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
/*
+ * ReorderBufferXidIsKnownSubXact
+ * Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false,
+ NULL, InvalidXLogRecPtr, false);
+
+ /* a known subtxn? */
+ if (txn && rbtxn_is_known_subxact(txn))
+ return true;
+
+ return false;
+}
+
+
+/*
* ---------------------------------------
* Disk serialization support
* ---------------------------------------
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index dbdc172..9fd28d4 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
{
Snapshot snap;
TransactionId xid;
- TransactionId *newxip;
- int newxcnt = 0;
+ TransactionId *newsubxip;
+ int maxxidcnt;
+ bool overflowed = false;
Assert(!FirstSnapshotSet);
Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -568,9 +569,13 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
- /* allocate in transaction context */
- newxip = (TransactionId *)
- palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+ /*
+ * Allocate in transaction context. We use only subxip to store xids. See
+ * GetSnapshotData for details.
+ */
+ newsubxip = (TransactionId *) palloc(sizeof(TransactionId) *
+ (GetMaxSnapshotXidCount() +
+ GetMaxSnapshotSubxidCount()));
/*
* snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -578,6 +583,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
* classical snapshot by marking all non-committed transactions as
* in-progress. This can be expensive.
*/
+retry:
+ newsubxcnt = 0;
+
for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
{
void *test;
@@ -591,12 +599,29 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL)
{
- if (newxcnt >= GetMaxSnapshotXidCount())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("initial slot snapshot too large")));
+ bool add = true;
- newxip[newxcnt++] = xid;
+ /* exlude subxids when subxip is overflowed */
+ if (overflowed &&
+ ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
+ add = false;
+
+ if (add)
+ {
+ if (newsubxcnt >= GetMaxSnapshotSubxidCount())
+ {
+ if (overflowed)
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("initial slot snapshot too large")));
+
+ /* retry excluding subxids */
+ overflowed = true;
+ goto retry;
+ }
+
+ newsubxip[newsubxcnt++] = xid;
+ }
}
TransactionIdAdvance(xid);
@@ -604,8 +629,16 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
/* adjust remaining snapshot fields as needed */
snap->snapshot_type = SNAPSHOT_MVCC;
- snap->xcnt = newxcnt;
- snap->xip = newxip;
+ snap->xcnt = 0;
+ snap->subxcnt = newsubxcnt;
+ snap->subxip = newsubxip;
+ snap->suboverflowed = overflowed;
+
+ /*
+ * Although this snapshot is taken actually not during recovery, all XIDs
+ * are stored in subxip even if it is not overflowed.
+ */
+ snap->takenDuringRecovery = true;
return snap;
}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff7..e5fa105 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
TimestampTz prepare_time,
--
1.8.3.1
Hi,
On Tue, Nov 02, 2021 at 04:40:39PM +0530, Dilip Kumar wrote:
I have fixed this, the updated patch is attached.
The cfbot reports that this patch doesn't compile:
https://cirrus-ci.com/task/5642000073490432?logs=build
[03:01:24.477] snapbuild.c: In function ‘SnapBuildInitialSnapshot’:
[03:01:24.477] snapbuild.c:587:2: error: ‘newsubxcnt’ undeclared (first use in this function); did you mean ‘newsubxip’?
[03:01:24.477] 587 | newsubxcnt = 0;
[03:01:24.477] | ^~~~~~~~~~
[03:01:24.477] | newsubxip
[03:01:24.477] snapbuild.c:587:2: note: each undeclared identifier is reported only once for each function it appears in
[03:01:24.477] snapbuild.c:535:8: warning: unused variable ‘maxxidcnt’ [-Wunused-variable]
[03:01:24.477] 535 | int maxxidcnt;
[03:01:24.477] | ^~~~~~~~~
Could you send a new version? In the meantime I will switch the patch to
Waiting on Author.
On Wed, Jan 12, 2022 at 4:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
On Tue, Nov 02, 2021 at 04:40:39PM +0530, Dilip Kumar wrote:
I have fixed this, the updated patch is attached.
The cfbot reports that this patch doesn't compile:
https://cirrus-ci.com/task/5642000073490432?logs=build[03:01:24.477] snapbuild.c: In function ‘SnapBuildInitialSnapshot’:
[03:01:24.477] snapbuild.c:587:2: error: ‘newsubxcnt’ undeclared (first
use in this function); did you mean ‘newsubxip’?
[03:01:24.477] 587 | newsubxcnt = 0;
[03:01:24.477] | ^~~~~~~~~~
[03:01:24.477] | newsubxip
[03:01:24.477] snapbuild.c:587:2: note: each undeclared identifier is
reported only once for each function it appears in
[03:01:24.477] snapbuild.c:535:8: warning: unused variable ‘maxxidcnt’
[-Wunused-variable]
[03:01:24.477] 535 | int maxxidcnt;
[03:01:24.477] | ^~~~~~~~~Could you send a new version? In the meantime I will switch the patch to
Waiting on Author.
Thanks for notifying, I will work on this and send the update patch soon.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
At Tue, 2 Nov 2021 16:40:39 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
On Tue, Oct 19, 2021 at 2:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Oct 12, 2021 at 11:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
So I came up with the attached version.
Sorry, it was losing a piece of change.
Yeah, at a high level this is on the idea I have in mind, I will do a
detailed review in a day or two. Thanks for working on this.While doing the detailed review, I think there are a couple of
problems with the patch, the main problem of storing all the xid in
the snap->subxip is that once we mark the snapshot overflown then the
XidInMVCCSnapshot, will not search the subxip array, instead it will
fetch the topXid and search in the snap->xip array.I missed that you are marking the snapshot as takenDuringRecovery so
your fix looks fine.Another issue is
that the total xids could be GetMaxSnapshotSubxidCount()
+GetMaxSnapshotXidCount().I have fixed this, the updated patch is attached.
Mmm. The size of the array cannot be larger than the numbers the
*Connt() functions return. Thus we cannot attach the oversized array
to ->subxip. (I don't recall clearly but that would lead to assertion
failure somewhere..)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
On Wed, Jan 12, 2022 at 4:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
The cfbot reports that this patch doesn't compile:
https://cirrus-ci.com/task/5642000073490432?logs=build[03:01:24.477] snapbuild.c: In function ‘SnapBuildInitialSnapshot’:
[03:01:24.477] snapbuild.c:587:2: error: ‘newsubxcnt’ undeclared (first
use in this function); did you mean ‘newsubxip’?
[03:01:24.477] 587 | newsubxcnt = 0;
[03:01:24.477] | ^~~~~~~~~~
[03:01:24.477] | newsubxip
[03:01:24.477] snapbuild.c:587:2: note: each undeclared identifier is
reported only once for each function it appears in
[03:01:24.477] snapbuild.c:535:8: warning: unused variable ‘maxxidcnt’
[-Wunused-variable]
[03:01:24.477] 535 | int maxxidcnt;
[03:01:24.477] | ^~~~~~~~~Could you send a new version? In the meantime I will switch the patch to
Waiting on Author.Thanks for notifying, I will work on this and send the update patch soon.
me> Mmm. The size of the array cannot be larger than the numbers the
me> *Connt() functions return. Thus we cannot attach the oversized array
me> to ->subxip. (I don't recall clearly but that would lead to assertion
me> failure somewhere..)
Then, I fixed the v3 error and post v4.
To recap:
SnapBUildInitialSnapshot tries to store XIDS of both top and sub
transactions into snapshot->xip array but the array is easily
overflowed and CREATE_REPLICATOIN_SLOT command ends with an error.
To fix this, this patch is doing the following things.
- Use subxip array instead of xip array to allow us have larger array
for xids. So the snapshot is marked as takenDuringRecovery, which
is a kind of abuse but largely reduces the chance of getting
"initial slot snapshot too large" error.
- Still if subxip is overflowed, retry with excluding subtransactions
then set suboverflowed. This causes XidInMVCCSnapshot (finally)
scans over subxip array for targetted top-level xid.
We could take another way: make a !takenDuringRecovery snapshot by
using xip instead of subxip. It is cleaner but it has far larger
chance of needing to retry.
(renamed the patch since it represented a part of the patch)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v4-0001-Avoid-an-error-while-creating-logical-replication.patchtext/x-patch; charset=us-asciiDownload
From 1e3ec40d70c2e7f8482632333001fe52527ef17a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 31 Jan 2022 15:03:33 +0900
Subject: [PATCH v4] Avoid an error while creating logical replication slot
Snapshot can hold top XIDs up to the number of server processes but
SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to
there and easily fails with "initial slot snapshot too large" error.
Instead, create a "takenDuringRecovery" snapshot, which stores all
XIDs in subxip array. This alone largely reduces the chance of getting
the error. But to eliminate the chance of the error to occur, allow to
create an overflowed snapshot by adding to reorder buffer a function
to tell whether an XID is a top-level or not.
Author: Dilip Kumar and Kyotaro Horiguchi
---
.../replication/logical/reorderbuffer.c | 20 +++++++
src/backend/replication/logical/snapbuild.c | 54 ++++++++++++++-----
src/include/replication/reorderbuffer.h | 1 +
3 files changed, 63 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 19b2ba2000..429e6296ab 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
}
+/*
+ * ReorderBufferXidIsKnownSubXact
+ * Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false,
+ NULL, InvalidXLogRecPtr, false);
+
+ /* a known subtxn? */
+ if (txn && rbtxn_is_known_subxact(txn))
+ return true;
+
+ return false;
+}
+
+
/*
* ---------------------------------------
* Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 83fca8a77d..21f30fa1d3 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
{
Snapshot snap;
TransactionId xid;
- TransactionId *newxip;
- int newxcnt = 0;
+ TransactionId *newsubxip;
+ int newsubxcnt;
+ bool overflowed = false;
Assert(!FirstSnapshotSet);
Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -568,9 +569,12 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
- /* allocate in transaction context */
- newxip = (TransactionId *)
- palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+ /*
+ * Allocate in transaction context. We use only subxip to store xids. See
+ * below and GetSnapshotData for details.
+ */
+ newsubxip = (TransactionId *)
+ palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
/*
* snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -578,6 +582,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
* classical snapshot by marking all non-committed transactions as
* in-progress. This can be expensive.
*/
+retry:
+ newsubxcnt = 0;
+
for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
{
void *test;
@@ -591,12 +598,28 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL)
{
- if (newxcnt >= GetMaxSnapshotXidCount())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("initial slot snapshot too large")));
+ /*
+ * If subtransactions fit the subxid array, we fill the array and
+ * call it a day. Otherwise we store only top-level transactions
+ * into the same subxip array.
+ */
+ if (!overflowed ||
+ !ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
+ {
+ if (newsubxcnt >= GetMaxSnapshotSubxidCount())
+ {
+ if (overflowed)
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("initial slot snapshot too large")));
- newxip[newxcnt++] = xid;
+ /* retry excluding subxids */
+ overflowed = true;
+ goto retry;
+ }
+
+ newsubxip[newsubxcnt++] = xid;
+ }
}
TransactionIdAdvance(xid);
@@ -604,8 +627,15 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
/* adjust remaining snapshot fields as needed */
snap->snapshot_type = SNAPSHOT_MVCC;
- snap->xcnt = newxcnt;
- snap->xip = newxip;
+ snap->xcnt = 0;
+ snap->subxcnt = newsubxcnt;
+ snap->subxip = newsubxip;
+ snap->suboverflowed = overflowed;
+ /*
+ * Although this snapshot is taken actually not during recovery, all XIDs
+ * are stored in subxip even if it is not overflowed.
+ */
+ snap->takenDuringRecovery = true;
return snap;
}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index aa0a73382f..c05cf3078c 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
TimestampTz prepare_time,
--
2.27.0
Import Notes
Reply to msg id not found: CAFiTN-vb_Mv+rRVDWtagvdkow-eWaT_xsGLSDiPfzL41UKiDwQ@mail.gmail.com20220131.143158.896330709668373625.horikyota.ntt@gmail.com
On Mon, Jan 31, 2022 at 11:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
SnapBUildInitialSnapshot tries to store XIDS of both top and sub
transactions into snapshot->xip array but the array is easily
overflowed and CREATE_REPLICATOIN_SLOT command ends with an error.To fix this, this patch is doing the following things.
- Use subxip array instead of xip array to allow us have larger array
for xids. So the snapshot is marked as takenDuringRecovery, which
is a kind of abuse but largely reduces the chance of getting
"initial slot snapshot too large" error.- Still if subxip is overflowed, retry with excluding subtransactions
then set suboverflowed. This causes XidInMVCCSnapshot (finally)
scans over subxip array for targetted top-level xid.We could take another way: make a !takenDuringRecovery snapshot by
using xip instead of subxip. It is cleaner but it has far larger
chance of needing to retry.(renamed the patch since it represented a part of the patch)
Thanks for the updated version. I will look into it this week.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 31, 2022 at 11:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
me> Mmm. The size of the array cannot be larger than the numbers the
me> *Connt() functions return. Thus we cannot attach the oversized array
me> to ->subxip. (I don't recall clearly but that would lead to assertion
me> failure somewhere..)Then, I fixed the v3 error and post v4.
Yeah you are right, SetTransactionSnapshot() has that assertion.
Anyway after looking again it appears that
GetMaxSnapshotSubxidCount is the correct size because this is
PGPROC_MAX_CACHED_SUBXIDS +1, i.e. it considers top transactions as
well so we don't need to add them separately.
SnapBUildInitialSnapshot tries to store XIDS of both top and sub
transactions into snapshot->xip array but the array is easily
overflowed and CREATE_REPLICATOIN_SLOT command ends with an error.To fix this, this patch is doing the following things.
- Use subxip array instead of xip array to allow us have larger array
for xids. So the snapshot is marked as takenDuringRecovery, which
is a kind of abuse but largely reduces the chance of getting
"initial slot snapshot too large" error.
Right. I think the patch looks fine to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
В Пн, 07/02/2022 в 13:52 +0530, Dilip Kumar пишет:
On Mon, Jan 31, 2022 at 11:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
me> Mmm. The size of the array cannot be larger than the numbers the
me> *Connt() functions return. Thus we cannot attach the oversized array
me> to ->subxip. (I don't recall clearly but that would lead to assertion
me> failure somewhere..)Then, I fixed the v3 error and post v4.
Yeah you are right, SetTransactionSnapshot() has that assertion.
Anyway after looking again it appears that
GetMaxSnapshotSubxidCount is the correct size because this is
PGPROC_MAX_CACHED_SUBXIDS +1, i.e. it considers top transactions as
well so we don't need to add them separately.SnapBUildInitialSnapshot tries to store XIDS of both top and sub
transactions into snapshot->xip array but the array is easily
overflowed and CREATE_REPLICATOIN_SLOT command ends with an error.To fix this, this patch is doing the following things.
- Use subxip array instead of xip array to allow us have larger array
for xids. So the snapshot is marked as takenDuringRecovery, which
is a kind of abuse but largely reduces the chance of getting
"initial slot snapshot too large" error.Right. I think the patch looks fine to me.
Good day.
I've looked to the patch. Personally I'd prefer dynamically resize
xip array. But I think there is issue with upgrade if replica source
is upgraded before destination, right?
Concerning patch, I think more comments should be written about new
usage case for `takenDuringRecovery`. May be this field should be renamed
at all?
And there are checks for `takenDuringRecovery` in `heapgetpage` and
`heapam_scan_sample_next_tuple`. Are this checks affected by the change?
Neither the preceding discussion nor commit message answer me.
-------
regards
Yura Sokolov
Postgres Professional
y.sokolov@postgrespro.ru
funny.falcon@gmail.com
At Sun, 13 Feb 2022 17:35:38 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in
В Пн, 07/02/2022 в 13:52 +0530, Dilip Kumar пишет:
Right. I think the patch looks fine to me.
Good day.
I've looked to the patch. Personally I'd prefer dynamically resize
xip array. But I think there is issue with upgrade if replica source
is upgraded before destination, right?
I don't think it is relevant. I think we don't convey snapshot via
streaming replication. But I think that expanding xip or subxip is
wrong, since it is tied with ProcArray structure. (Even though we
abuse the arrays in some situations, like this).
Concerning patch, I think more comments should be written about new
usage case for `takenDuringRecovery`. May be this field should be renamed
at all?
I don't feel the need to rename it so much. It just signals that "this
snapshot is in the form for recovery". The more significant reason is
that I don't come up better name:p
And the comment is slightly modified and gets a pointer to detailed
comment.
+ * Although this snapshot is not acutally taken during recovery, all XIDs
+ * are stored in subxip. See GetSnapshotData for the details of that form
+ * of snapshot.
And there are checks for `takenDuringRecovery` in `heapgetpage` and
`heapam_scan_sample_next_tuple`. Are this checks affected by the change?
Neither the preceding discussion nor commit message answer me.
The snapshot works correctly, but for the heapgetpage case, it foces
all_visible to be false. That unnecessarily prevents visibility check
from skipping.
An annoying thing in SnapBuildInitialSnapshot is that we don't know
the number of xids before looping over the xid range, and we don't
want to bother sorting top-level xids and subxids unless we have to do
so.
Is it better that we hassle in SnapBuildInitialSnapshot to create a
!takenDuringRecovery snapshot?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Fri, 01 Apr 2022 14:44:30 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Sun, 13 Feb 2022 17:35:38 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in
And there are checks for `takenDuringRecovery` in `heapgetpage` and
`heapam_scan_sample_next_tuple`. Are this checks affected by the change?
Neither the preceding discussion nor commit message answer me.The snapshot works correctly, but for the heapgetpage case, it foces
all_visible to be false. That unnecessarily prevents visibility check
from skipping.An annoying thing in SnapBuildInitialSnapshot is that we don't know
the number of xids before looping over the xid range, and we don't
want to bother sorting top-level xids and subxids unless we have to do
so.Is it better that we hassle in SnapBuildInitialSnapshot to create a
!takenDuringRecovery snapshot?
So this is that. v5 creates a regular snapshot.
By the way, is there any chance this could be committed to 15?
Otherwise I'll immediately move this to the next CF.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v5-0001-Create-correct-snapshot-during-CREATE_REPLICATION.patchtext/x-patch; charset=us-asciiDownload
From 6f3cfb58bc2c6294124831d6b410b0e51d067fb2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 31 Jan 2022 15:03:33 +0900
Subject: [PATCH v5] Create correct snapshot during CREATE_REPLICATION_SLOT
Snapshot can hold top XIDs up to the number of server processes but
SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to
there and easily fails with "initial slot snapshot too large" error.
We could instead create a "takenDuringRecovery" snapshot, which later
leads to unnecessary visibility checks. Therefore we take trouble to
create a regular snapshot by identifying whether each xids is
top-level and storing it in the appropriate xid array.
Author: Dilip Kumar and Kyotaro Horiguchi
---
.../replication/logical/reorderbuffer.c | 20 +++++++++
src/backend/replication/logical/snapbuild.c | 45 +++++++++++++++----
src/include/replication/reorderbuffer.h | 1 +
3 files changed, 58 insertions(+), 8 deletions(-)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c2d9be81fa..42adbe5f15 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3658,6 +3658,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
}
+/*
+ * ReorderBufferXidIsKnownSubXact
+ * Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false,
+ NULL, InvalidXLogRecPtr, false);
+
+ /* a known subtxn? */
+ if (txn && rbtxn_is_known_subxact(txn))
+ return true;
+
+ return false;
+}
+
+
/*
* ---------------------------------------
* Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 83fca8a77d..fa08e406e3 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -532,7 +532,10 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
Snapshot snap;
TransactionId xid;
TransactionId *newxip;
- int newxcnt = 0;
+ TransactionId *newsubxip;
+ int newxcnt;
+ int newsubxcnt;
+ bool overflowed = false;
Assert(!FirstSnapshotSet);
Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -568,9 +571,17 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
- /* allocate in transaction context */
+ /*
+ * Allocate in transaction context.
+ *
+ * We could use only subxip to store all xids (takenduringrecovery
+ * snapshot) but that causes useless visibility checks later so we hasle to
+ * create a normal snapshot.
+ */
newxip = (TransactionId *)
palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+ newsubxip = (TransactionId *)
+ palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
/*
* snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -578,6 +589,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
* classical snapshot by marking all non-committed transactions as
* in-progress. This can be expensive.
*/
+ newxcnt = 0;
+ newsubxcnt = 0;
+
for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
{
void *test;
@@ -591,12 +605,24 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL)
{
- if (newxcnt >= GetMaxSnapshotXidCount())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("initial slot snapshot too large")));
-
- newxip[newxcnt++] = xid;
+ /* Store the xid to the appropriate xid array */
+ if (ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
+ {
+ if (!overflowed)
+ {
+ if (newsubxcnt >= GetMaxSnapshotSubxidCount())
+ overflowed = true;
+ else
+ newsubxip[newsubxcnt++] = xid;
+ }
+ }
+ else
+ {
+ if (newxcnt >= GetMaxSnapshotXidCount())
+ elog(ERROR,
+ "too many transactions while creating snapshot");
+ newxip[newxcnt++] = xid;
+ }
}
TransactionIdAdvance(xid);
@@ -606,6 +632,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
snap->snapshot_type = SNAPSHOT_MVCC;
snap->xcnt = newxcnt;
snap->xip = newxip;
+ snap->subxcnt = newsubxcnt;
+ snap->subxip = newsubxip;
+ snap->suboverflowed = overflowed;
return snap;
}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 0bcc150b33..b26c34ce9d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -707,6 +707,7 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
TimestampTz prepare_time,
--
2.27.0
On Thu, Mar 31, 2022 at 11:53 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
So this is that. v5 creates a regular snapshot.
This patch will need a quick rebase over 905c020bef9, which added
`extern` to several missing locations.
Thanks,
--Jacob
At Tue, 5 Jul 2022 11:32:42 -0700, Jacob Champion <jchampion@timescale.com> wrote in
On Thu, Mar 31, 2022 at 11:53 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:So this is that. v5 creates a regular snapshot.
This patch will need a quick rebase over 905c020bef9, which added
`extern` to several missing locations.
Thanks! Just rebased.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v6-0001-Create-correct-snapshot-during-CREATE_REPLICATION.patchtext/x-patch; charset=us-asciiDownload
From abcf0a0e0b3e2de9927d8943a3e3c145ab189508 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 19 Jul 2022 11:50:29 +0900
Subject: [PATCH v6] Create correct snapshot during CREATE_REPLICATION_SLOT
Snapshot can hold top XIDs up to the number of server processes but
SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to
there and easily fails with "initial slot snapshot too large" error.
We could instead create a "takenDuringRecovery" snapshot, which later
leads to unnecessary visibility checks. Therefore we take trouble to
create a regular snapshot by identifying whether each xids is
top-level and storing it in the appropriate xid array.
Author: Dilip Kumar and Kyotaro Horiguchi
---
.../replication/logical/reorderbuffer.c | 20 +++++++++
src/backend/replication/logical/snapbuild.c | 45 +++++++++++++++----
src/include/replication/reorderbuffer.h | 1 +
3 files changed, 58 insertions(+), 8 deletions(-)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 88a37fde72..44ea3f31aa 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3332,6 +3332,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
}
+/*
+ * ReorderBufferXidIsKnownSubXact
+ * Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false,
+ NULL, InvalidXLogRecPtr, false);
+
+ /* a known subtxn? */
+ if (txn && rbtxn_is_known_subxact(txn))
+ return true;
+
+ return false;
+}
+
+
/*
* ---------------------------------------
* Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 73c0f15214..2b5282788a 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -532,7 +532,10 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
Snapshot snap;
TransactionId xid;
TransactionId *newxip;
- int newxcnt = 0;
+ TransactionId *newsubxip;
+ int newxcnt;
+ int newsubxcnt;
+ bool overflowed = false;
Assert(!FirstSnapshotSet);
Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -568,9 +571,17 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
- /* allocate in transaction context */
+ /*
+ * Allocate in transaction context.
+ *
+ * We could use only subxip to store all xids (takenduringrecovery
+ * snapshot) but that causes useless visibility checks later so we hasle to
+ * create a normal snapshot.
+ */
newxip = (TransactionId *)
palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+ newsubxip = (TransactionId *)
+ palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
/*
* snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -578,6 +589,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
* classical snapshot by marking all non-committed transactions as
* in-progress. This can be expensive.
*/
+ newxcnt = 0;
+ newsubxcnt = 0;
+
for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
{
void *test;
@@ -591,12 +605,24 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL)
{
- if (newxcnt >= GetMaxSnapshotXidCount())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("initial slot snapshot too large")));
-
- newxip[newxcnt++] = xid;
+ /* Store the xid to the appropriate xid array */
+ if (ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
+ {
+ if (!overflowed)
+ {
+ if (newsubxcnt >= GetMaxSnapshotSubxidCount())
+ overflowed = true;
+ else
+ newsubxip[newsubxcnt++] = xid;
+ }
+ }
+ else
+ {
+ if (newxcnt >= GetMaxSnapshotXidCount())
+ elog(ERROR,
+ "too many transactions while creating snapshot");
+ newxip[newxcnt++] = xid;
+ }
}
TransactionIdAdvance(xid);
@@ -606,6 +632,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
snap->snapshot_type = SNAPSHOT_MVCC;
snap->xcnt = newxcnt;
snap->xip = newxip;
+ snap->subxcnt = newsubxcnt;
+ snap->subxip = newsubxip;
+ snap->suboverflowed = overflowed;
return snap;
}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index d109d0baed..736f22a04e 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ extern void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid
extern bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
extern bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+extern bool ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
extern bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
TimestampTz prepare_time,
--
2.31.1
On Mon, Jul 18, 2022 at 10:55 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
Thanks! Just rebased.
Hi,
I mentioned this patch to Andres in conversation, and he expressed a
concern that there might be no guarantee that we retain enough CLOG to
look up XIDs. Presumably this wouldn't be an issue when the snapshot
doesn't get marked suboverflowed, but what if it does?
Adding Andres in the hopes that he may comment further.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
Thanks for working on this!
I think this should include a test that fails without this change and succeeds
with it...
On 2022-07-19 11:55:06 +0900, Kyotaro Horiguchi wrote:
From abcf0a0e0b3e2de9927d8943a3e3c145ab189508 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 19 Jul 2022 11:50:29 +0900
Subject: [PATCH v6] Create correct snapshot during CREATE_REPLICATION_SLOT
This sees a tad misleading - the previous snapshot wasn't borken, right?
+/* + * ReorderBufferXidIsKnownSubXact + * Returns true if the xid is a known subtransaction. + */ +bool +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid) +{ + ReorderBufferTXN *txn; + + txn = ReorderBufferTXNByXid(rb, xid, false, + NULL, InvalidXLogRecPtr, false); + + /* a known subtxn? */ + if (txn && rbtxn_is_known_subxact(txn)) + return true; + + return false; +}
The comments here just seem to restate the code....
It's not obvious to me that it's the right design (or even correct) to ask
reorderbuffer about an xid being a subxid. Maybe I'm missing something, but
why would reorderbuffer even be guaranteed to know about all these subxids?
@@ -568,9 +571,17 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
- /* allocate in transaction context */ + /* + * Allocate in transaction context. + * + * We could use only subxip to store all xids (takenduringrecovery + * snapshot) but that causes useless visibility checks later so we hasle to + * create a normal snapshot. + */
I can't really parse this comment at this point, and I seriously doubt I could
later on.
@@ -591,12 +605,24 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL) { - if (newxcnt >= GetMaxSnapshotXidCount()) - ereport(ERROR, - (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), - errmsg("initial slot snapshot too large"))); - - newxip[newxcnt++] = xid; + /* Store the xid to the appropriate xid array */ + if (ReorderBufferXidIsKnownSubXact(builder->reorder, xid)) + { + if (!overflowed) + { + if (newsubxcnt >= GetMaxSnapshotSubxidCount()) + overflowed = true; + else + newsubxip[newsubxcnt++] = xid; + } + } + else + { + if (newxcnt >= GetMaxSnapshotXidCount()) + elog(ERROR, + "too many transactions while creating snapshot"); + newxip[newxcnt++] = xid; + } }
Hm, this is starting to be pretty deeply nested...
I wonder if a better fix here wouldn't be to allow importing a snapshot with a
larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we
need to be in a transactional snapshot anyway, which is copied anyway?
Greetings,
Andres Freund
Hi,
On 2022-09-09 13:19:14 -0400, Robert Haas wrote:
I mentioned this patch to Andres in conversation, and he expressed a
concern that there might be no guarantee that we retain enough CLOG to
look up XIDs.
I was concerned we wouldn't keep enough subtrans, rather than clog. But I
think we're ok, because we need to have an appropriate ->xmin for exporting /
importing the snapshot.
Greetings,
Andres Freund
On Tue, Sep 13, 2022 at 3:22 AM Andres Freund <andres@anarazel.de> wrote:
It's not obvious to me that it's the right design (or even correct) to ask
reorderbuffer about an xid being a subxid. Maybe I'm missing something, but
why would reorderbuffer even be guaranteed to know about all these subxids?
Yeah, you are right, the reorderbuffer will only know about the
transaction for which changes got added to the reorder buffer. So
this seems not to be the right design idea.
I wonder if a better fix here wouldn't be to allow importing a snapshot with a
larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we
need to be in a transactional snapshot anyway, which is copied anyway?
Yeah when I first found this issue, I thought that should be the
solution. But later it went in a different direction.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Thanks for raizing this up, Robert and the comment, Andres.
At Tue, 13 Sep 2022 07:00:42 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
On Tue, Sep 13, 2022 at 3:22 AM Andres Freund <andres@anarazel.de> wrote:
It's not obvious to me that it's the right design (or even correct) to ask
reorderbuffer about an xid being a subxid. Maybe I'm missing something, but
why would reorderbuffer even be guaranteed to know about all these subxids?Yeah, you are right, the reorderbuffer will only know about the
transaction for which changes got added to the reorder buffer. So
this seems not to be the right design idea.
That function is called after the SnapBuild reaches
SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects
other than that state. That is, IIUC the top-sub relationship of all
the currently running transactions is fully known to reorder buffer.
We need a comment about that.
I wonder if a better fix here wouldn't be to allow importing a snapshot with a
larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we
need to be in a transactional snapshot anyway, which is copied anyway?Yeah when I first found this issue, I thought that should be the
solution. But later it went in a different direction.
I think that that is the best solution if rbtxn_is_known_subxzact() is
known to be unreliable at the time.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Sep 13, 2022 at 11:52 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
Thanks for raizing this up, Robert and the comment, Andres.
At Tue, 13 Sep 2022 07:00:42 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
On Tue, Sep 13, 2022 at 3:22 AM Andres Freund <andres@anarazel.de> wrote:
It's not obvious to me that it's the right design (or even correct) to ask
reorderbuffer about an xid being a subxid. Maybe I'm missing something, but
why would reorderbuffer even be guaranteed to know about all these subxids?Yeah, you are right, the reorderbuffer will only know about the
transaction for which changes got added to the reorder buffer. So
this seems not to be the right design idea.That function is called after the SnapBuild reaches
SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects
other than that state. That is, IIUC the top-sub relationship of all
the currently running transactions is fully known to reorder buffer.
We need a comment about that.
I don't think this assumption is true, any xid started after switching
to the SNAPBUILD_FULL_SNAPSHOT and before switching to the
SNAPBUILD_CONSISTENT, might still be in progress so we can not
identify whether they are subxact or not from reorder buffer.
refer to this comment:
/*
* c) transition from FULL_SNAPSHOT to CONSISTENT.
*
* In FULL_SNAPSHOT state (see d) ), and this xl_running_xacts'
* oldestRunningXid is >= than nextXid from when we switched to
* FULL_SNAPSHOT. This means all transactions that are currently in
* progress have a catalog snapshot, and all their changes have been
* collected. Switch to CONSISTENT.
*/
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
At Mon, 12 Sep 2022 14:51:56 -0700, Andres Freund <andres@anarazel.de> wrote in
Hi,
Thanks for working on this!
I think this should include a test that fails without this change and succeeds
with it...On 2022-07-19 11:55:06 +0900, Kyotaro Horiguchi wrote:
From abcf0a0e0b3e2de9927d8943a3e3c145ab189508 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 19 Jul 2022 11:50:29 +0900
Subject: [PATCH v6] Create correct snapshot during CREATE_REPLICATION_SLOTThis sees a tad misleading - the previous snapshot wasn't borken, right?
I saw it kind of broken that ->xip contains sub transactions. But I
didn't meant it's broken by "correct". Is "proper" suitable there?
+/* + * ReorderBufferXidIsKnownSubXact + * Returns true if the xid is a known subtransaction. + */ +bool +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid) +{ + ReorderBufferTXN *txn; + + txn = ReorderBufferTXNByXid(rb, xid, false, + NULL, InvalidXLogRecPtr, false); + + /* a known subtxn? */ + if (txn && rbtxn_is_known_subxact(txn)) + return true; + + return false; +}The comments here just seem to restate the code....
Yeah, it is pulled from the existing code but result looks like so..
It's not obvious to me that it's the right design (or even correct) to ask
reorderbuffer about an xid being a subxid. Maybe I'm missing something, but
why would reorderbuffer even be guaranteed to know about all these subxids?
I think you're missing that the code is visited only after the reorder
buffer's state becomes SNAPBUILD_CONSISTENT. I think
rbtxn_is_known_subxact() is reliable at that stage.
@@ -568,9 +571,17 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
- /* allocate in transaction context */ + /* + * Allocate in transaction context. + * + * We could use only subxip to store all xids (takenduringrecovery + * snapshot) but that causes useless visibility checks later so we hasle to + * create a normal snapshot. + */I can't really parse this comment at this point, and I seriously doubt I could
later on.
Mmm. The "takenduringrecovery" is relly perplexing (it has been
somehow lower-cased..), but after removing the parenthesized part, it
looks like this. And it had a misspelling but I removed that word. Is
this still unreadable?
We could use only subxip to store all xids but that causes useless
visibility checks later so we create a normal snapshot.
@@ -591,12 +605,24 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL) { - if (newxcnt >= GetMaxSnapshotXidCount()) - ereport(ERROR, - (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), - errmsg("initial slot snapshot too large"))); - - newxip[newxcnt++] = xid; + /* Store the xid to the appropriate xid array */ + if (ReorderBufferXidIsKnownSubXact(builder->reorder, xid)) + { + if (!overflowed) + { + if (newsubxcnt >= GetMaxSnapshotSubxidCount()) + overflowed = true; + else + newsubxip[newsubxcnt++] = xid; + } + } + else + { + if (newxcnt >= GetMaxSnapshotXidCount()) + elog(ERROR, + "too many transactions while creating snapshot"); + newxip[newxcnt++] = xid; + } }Hm, this is starting to be pretty deeply nested...
Yeah, at least one if() is removable.
I wonder if a better fix here wouldn't be to allow importing a snapshot with a
larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we
need to be in a transactional snapshot anyway, which is copied anyway?
The other reason for oversized xip array is it causes visibility check
when it is used. AFAICS XidInMVCCSnapshot has additional path for
takenDuringRecovery snapshots that contains a linear search (currently
it is replaced by pg_lfind32()). This saves us from doing this for
that snapshot.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Tue, 13 Sep 2022 12:08:18 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
On Tue, Sep 13, 2022 at 11:52 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:That function is called after the SnapBuild reaches
SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects
other than that state. That is, IIUC the top-sub relationship of all
the currently running transactions is fully known to reorder buffer.
We need a comment about that.I don't think this assumption is true, any xid started after switching
to the SNAPBUILD_FULL_SNAPSHOT and before switching to the
SNAPBUILD_CONSISTENT, might still be in progress so we can not
identify whether they are subxact or not from reorder buffer.
Yeah, I misunderstood that the relationship is recorded earlier
(how?). Thus it is not reliable in the first place.
I agree that the best way is oversized xip.
By the way, I feel that "is >= than" is redundant or plain wrong..
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Tue, 13 Sep 2022 15:45:07 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Mon, 12 Sep 2022 14:51:56 -0700, Andres Freund <andres@anarazel.de> wrote in
This sees a tad misleading - the previous snapshot wasn't borken, right?
I saw it kind of broken that ->xip contains sub transactions. But I
didn't meant it's broken by "correct". Is "proper" suitable there?
No. It's not broken if it is takenDuringRecovery. So this flag can be
used to notify that xip can be oversized.
I realized that rbtxn_is_known_subxact is not reliable. I'm
redirecting to oversized xip. Pleas wait for a while.
regareds.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Tue, 13 Sep 2022 16:10:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Tue, 13 Sep 2022 12:08:18 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
On Tue, Sep 13, 2022 at 11:52 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:That function is called after the SnapBuild reaches
SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects
other than that state. That is, IIUC the top-sub relationship of all
the currently running transactions is fully known to reorder buffer.
We need a comment about that.I don't think this assumption is true, any xid started after switching
to the SNAPBUILD_FULL_SNAPSHOT and before switching to the
SNAPBUILD_CONSISTENT, might still be in progress so we can not
identify whether they are subxact or not from reorder buffer.Yeah, I misunderstood that the relationship is recorded earlier
(how?). Thus it is not reliable in the first place.I agree that the best way is oversized xip.
By the way, I feel that "is >= than" is redundant or plain wrong..
By the way GetSnapshotData() does this:
snapshot->subxip = (TransactionId *)
malloc(GetMaxSnapshotSubxidCount() * sizeof(TransactionId));
...
if (!snapshot->takenDuringRecovery)
...
else
{
subcount = KnownAssignedXidsGetAndSetXmin(snapshot->subxip, &xmin,
xmax);
It is possible that the subxip is overrun. We need to expand the array
somehow. Or assign the array of the size (GetMaxSnapshotXidCount() +
GetMaxSnapshotSubxidCount()) for takenDuringRecovery snapshots.
(I feel deja vu..)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Sigh..
At Tue, 13 Sep 2022 16:30:06 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
It is possible that the subxip is overrun. We need to expand the array
somehow. Or assign the array of the size (GetMaxSnapshotXidCount() +
GetMaxSnapshotSubxidCount()) for takenDuringRecovery snapshots.
And I found that this is already done. What we should is doing the
same thing in snapbuild.
Sorry for the noise..
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Tue, 13 Sep 2022 16:15:34 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Tue, 13 Sep 2022 15:45:07 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Mon, 12 Sep 2022 14:51:56 -0700, Andres Freund <andres@anarazel.de> wrote in
This sees a tad misleading - the previous snapshot wasn't borken, right?
I saw it kind of broken that ->xip contains sub transactions. But I
didn't meant it's broken by "correct". Is "proper" suitable there?No. It's not broken if it is takenDuringRecovery. So this flag can be
used to notify that xip can be oversized.I realized that rbtxn_is_known_subxact is not reliable. I'm
redirecting to oversized xip. Pleas wait for a while.
However, the reader of saved snapshots (ImportSnapshot) has the
restriction that
if (xcnt < 0 || xcnt > GetMaxSnapshotXidCount())
ereport(ERROR,
and
if (xcnt < 0 || xcnt > GetMaxSnapshotSubxidCount())
ereport(ERROR,
(this xid is subxcnt)
And ExportSnapshot repalces oversized subxip with overflowed.
So even when GetSnapshotData() returns a snapshot with oversized
subxip, it is saved as just "overflowed" on exporting. I don't think
this is the expected behavior since such (no xip and overflowed)
snapshot no longer works.
On the other hand, it seems to me that snapbuild doesn't like
takenDuringRecovery snapshots.
So snapshot needs additional flag signals that xip is oversized and
all xid are holded there. And also need to let takenDuringRecovery
suggest subxip is oversized.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Sep 13, 2022 at 05:31:05PM +0900, Kyotaro Horiguchi wrote:
And ExportSnapshot repalces oversized subxip with overflowed.
So even when GetSnapshotData() returns a snapshot with oversized
subxip, it is saved as just "overflowed" on exporting. I don't think
this is the expected behavior since such (no xip and overflowed)
snapshot no longer works.On the other hand, it seems to me that snapbuild doesn't like
takenDuringRecovery snapshots.So snapshot needs additional flag signals that xip is oversized and
all xid are holded there. And also need to let takenDuringRecovery
suggest subxip is oversized.
The discussion seems to have stopped here. As this is classified as a
bug fix, I have moved this patch to next CF, waiting on author for the
moment.
--
Michael
Hi,
On 2022-10-12 14:10:15 +0900, Michael Paquier wrote:
The discussion seems to have stopped here. As this is classified as a
bug fix, I have moved this patch to next CF, waiting on author for the
moment.
FWIW, I view this more as lifting a limitation. I wouldn't want to
backpatch the change.
Greetings,
Andres Freund
On Wed, 12 Oct 2022 at 01:10, Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Sep 13, 2022 at 05:31:05PM +0900, Kyotaro Horiguchi wrote:
And ExportSnapshot repalces oversized subxip with overflowed.
So even when GetSnapshotData() returns a snapshot with oversized
subxip, it is saved as just "overflowed" on exporting. I don't think
this is the expected behavior since such (no xip and overflowed)
snapshot no longer works.On the other hand, it seems to me that snapbuild doesn't like
takenDuringRecovery snapshots.So snapshot needs additional flag signals that xip is oversized and
all xid are holded there. And also need to let takenDuringRecovery
suggest subxip is oversized.The discussion seems to have stopped here. As this is classified as a
bug fix, I have moved this patch to next CF, waiting on author for the
moment.
Kyotoro Horiguchi, any chance you'll be able to work on this for this
commitfest? If so shout (or anyone else is planning to push it over
the line.... Andres?) otherwise I'll move it on to the next release.
--
Gregory Stark
As Commitfest Manager
At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" <stark.cfm@gmail.com> wrote in
On Wed, 12 Oct 2022 at 01:10, Michael Paquier <michael@paquier.xyz> wrote:
The discussion seems to have stopped here. As this is classified as a
bug fix, I have moved this patch to next CF, waiting on author for the
moment.Kyotoro Horiguchi, any chance you'll be able to work on this for this
commitfest? If so shout (or anyone else is planning to push it over
the line.... Andres?) otherwise I'll move it on to the next release.
Ugg. sorry for being lazy. I have lost track of the conversation. I'm
currently working on this and will come back soon with a new version.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Wed, 22 Mar 2023 14:27:40 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" <stark.cfm@gmail.com> wrote in
Kyotoro Horiguchi, any chance you'll be able to work on this for this
commitfest? If so shout (or anyone else is planning to push it over
the line.... Andres?) otherwise I'll move it on to the next release.Ugg. sorry for being lazy. I have lost track of the conversation. I'm
currently working on this and will come back soon with a new version.
I relized that attempting to make SnapshotData.xip expansible was
making procarray.c and snapmgr.c too complicated. The main reason is
that SnapShotData is allocated in various ways, like on the stack,
using palloc including xip/subxip arrays, with palloc then allocating
xip/subxip arrays separately, or statically allocated and then having
xip/subxip arrays malloc'ed later. This variety was making the
expansion logic a mess.
So I went back to square one and decided to use subxip as an extension
for the xip array instead.
Like the comment added in the function SnapBuildInitialSnapshot
mentions, I don't think we can reliably identify top-level XIDs. So,
this patch just increases the allowed number of XIDs by using the
subxip array.
(The title of the patch was changed accordingly.)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v7-0001-Lift-initial-snapshot-limit-for-logical-replicati.patchtext/x-patch; charset=us-asciiDownload
From 4a41002eea7eaa18bd51521798c19d191d9c6d8c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 19 Jul 2022 11:50:29 +0900
Subject: [PATCH v7] Lift initial snapshot limit for logical replication
The replication command CREATE_REPLICATION_SLOT tends to fail with
"snapshot too large" when many subtransactions are active. This
problem stems from SnapBuildInitialSnapshot attempting to generate a
snapshot in which all active XIDs, including subxids, are stored
within the xip array. This patch mitigates the constraint on the
acceptable number of transaction IDs by utilizing the subxid array.
Author: Dilip Kumar and Kyotaro Horiguchi
---
src/backend/replication/logical/snapbuild.c | 50 +++++++++++++++++----
src/backend/storage/ipc/procarray.c | 18 ++------
src/include/access/transam.h | 33 ++++++++++++++
3 files changed, 77 insertions(+), 24 deletions(-)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 62542827e4..516e9ddfc8 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -565,10 +565,14 @@ Snapshot
SnapBuildInitialSnapshot(SnapBuild *builder)
{
Snapshot snap;
+ int ntxn;
+ int xiplen;
TransactionId xid;
TransactionId safeXid;
TransactionId *newxip;
+ TransactionId *newsubxip;
int newxcnt = 0;
+ int newsubxcnt = 0;
Assert(XactIsoLevel == XACT_REPEATABLE_READ);
Assert(builder->building_full_snapshot);
@@ -610,9 +614,35 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
- /* allocate in transaction context */
- newxip = (TransactionId *)
- palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+ /*
+ * Aallocate in transaction context
+ *
+ * Since we know all the active XIDs, this snapshot won't be
+ * suboverflowed. However, if the number of XIDs surpasses the XID arrays'
+ * capacity, we cannot establish this snapshot because we don't know which
+ * XIDs are top-level. Since the original snapshot is historical, we can't
+ * reliably idetify the top-level XIDs. Thus, we have no choice but fail
+ * when there are too many active XIDs.
+ */
+ ntxn = TransactionIdDistance(snap->xmin, snap->xmax) - snap->xcnt;
+ xiplen = GetMaxSnapshotXidCount();
+ newxip = (TransactionId *) palloc(sizeof(TransactionId) * xiplen);
+
+ if (ntxn > GetMaxSnapshotXidCount())
+ {
+ /*
+ * Since we can't identify the top-level XIDs, we can't carete a
+ * suboverflowed snapshot.
+ */
+ if (ntxn > xiplen + GetMaxSnapshotSubxidCount())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("initial slot snapshot too large")));
+
+ newsubxip = (TransactionId *)
+ palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
+ }
+
/*
* snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -633,21 +663,23 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL)
{
- if (newxcnt >= GetMaxSnapshotXidCount())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("initial slot snapshot too large")));
-
- newxip[newxcnt++] = xid;
+ if (newxcnt < xiplen)
+ newxip[newxcnt++] = xid;
+ else
+ newsubxip[newsubxcnt++] = xid;
}
TransactionIdAdvance(xid);
}
+ Assert(newxcnt + newsubxcnt == ntxn);
+
/* adjust remaining snapshot fields as needed */
snap->snapshot_type = SNAPSHOT_MVCC;
snap->xcnt = newxcnt;
snap->xip = newxip;
+ snap->subxcnt = newsubxcnt;
+ snap->subxip = newsubxip;
return snap;
}
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ea91ce355f..a0f5ba5a36 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4823,22 +4823,10 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
Assert(TransactionIdPrecedesOrEquals(from_xid, to_xid));
/*
- * Calculate how many array slots we'll need. Normally this is cheap; in
- * the unusual case where the XIDs cross the wrap point, we do it the hard
- * way.
+ * Calculate how many array slots we'll need. Both ends of the region are
+ * inclusive.
*/
- if (to_xid >= from_xid)
- nxids = to_xid - from_xid + 1;
- else
- {
- nxids = 1;
- next_xid = from_xid;
- while (TransactionIdPrecedes(next_xid, to_xid))
- {
- nxids++;
- TransactionIdAdvance(next_xid);
- }
- }
+ nxids = TransactionIdDistance(from_xid, to_xid) + 1;
/*
* Since only the startup process modifies the head/tail pointers, we
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index f5af6d3055..b6a3c8abee 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -370,6 +370,39 @@ FullTransactionIdNewer(FullTransactionId a, FullTransactionId b)
return b;
}
+
+/* return the distance between the two IDs */
+static inline int
+TransactionIdDistance(TransactionId from_xid, TransactionId to_xid)
+{
+ int nxids;
+ TransactionId next_xid;
+
+ StaticAssertDecl(sizeof(int) >= sizeof(TransactionId),
+ "TransactionId exceeds the width of int");
+ Assert(TransactionIdPrecedesOrEquals(from_xid, to_xid));
+
+ /*
+ * Calculate the distance between the two XIDs. Normally this is cheap; in
+ * the unusual case where the XIDs cross the wrap point, we do it the hard
+ * way.
+ */
+ if (to_xid >= from_xid)
+ nxids = to_xid - from_xid;
+ else
+ {
+ nxids = 0;
+ next_xid = from_xid;
+ while (TransactionIdPrecedes(next_xid, to_xid))
+ {
+ nxids++;
+ TransactionIdAdvance(next_xid);
+ }
+ }
+
+ return nxids;
+}
+
#endif /* FRONTEND */
#endif /* TRANSAM_H */
--
2.31.1
On Thu, Mar 23, 2023 at 10:53 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Wed, 22 Mar 2023 14:27:40 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" <stark.cfm@gmail.com> wrote in
Kyotoro Horiguchi, any chance you'll be able to work on this for this
commitfest? If so shout (or anyone else is planning to push it over
the line.... Andres?) otherwise I'll move it on to the next release.Ugg. sorry for being lazy. I have lost track of the conversation. I'm
currently working on this and will come back soon with a new version.I relized that attempting to make SnapshotData.xip expansible was
making procarray.c and snapmgr.c too complicated. The main reason is
that SnapShotData is allocated in various ways, like on the stack,
using palloc including xip/subxip arrays, with palloc then allocating
xip/subxip arrays separately, or statically allocated and then having
xip/subxip arrays malloc'ed later. This variety was making the
expansion logic a mess.So I went back to square one and decided to use subxip as an extension
for the xip array instead.Like the comment added in the function SnapBuildInitialSnapshot
mentions, I don't think we can reliably identify top-level XIDs. So,
this patch just increases the allowed number of XIDs by using the
subxip array.
Thanks for working on this, your idea looks fine but my only worry is
that in the future if someone tries to change the logic of
XidInMVCCSnapshot() then they must be aware that the snap->xip array
and snap->subxip array no long distinguishes the top xids and subxids.
I agree with the current logic if we are not marking sub-overflow then
there is no issue, so can we document this in the SnapshotData
structure?
Also, there are some typos in the patch
/idetify/identify
/carete/create
/Aallocate/Allocate
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Thanks for looking this!
At Thu, 23 Mar 2023 14:15:12 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
On Thu, Mar 23, 2023 at 10:53 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
Thanks for working on this, your idea looks fine but my only worry is
that in the future if someone tries to change the logic of
XidInMVCCSnapshot() then they must be aware that the snap->xip array
and snap->subxip array no long distinguishes the top xids and subxids.
Yeah, I had the same thought when I was working on the posted version.
I agree with the current logic if we are not marking sub-overflow then
there is no issue, so can we document this in the SnapshotData
structure?
(I found that it was alrady mentioned...)
In a unpublished version (what I referred to as "a mess"), I added a
flag called "topsub_mixed" to SnapshotData, indicating that XIDs of
top and sub transactions are stored in xip and subxip arrays in a
mixed manner. However, I eventually removed it since it could only be
used for sanity checks related to suboverflowed.
I inserted the following sentense in the middle of the comments for
xip and subxip.
In the case of !suboverflowed, there's a situation where this
contains both top and sub-transaction IDs in a mixed manner.
And added similar a similar sentense to a comment of
XidInMVCCSnapshot.
While doning this, I realized that we could simplify and optimize XID
search code by combining the two XID arrays. If !suboverflowed, the
array stored all active XIDs of both top and
sub-transactions. Otherwise it only stores active top XIDs and maybe
XIDs of some sub-transactions. If many subXIDs are stored when
overflowed, there might lead to some degradation but I think the win
we gain from searching just one XID array in most cases outweighs
that. (I didn't do this (of course) in this version.)
Also, there are some typos in the patch
/idetify/identify
/carete/create
/Aallocate/Allocate
Oops! Thanks for pointing out them.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v8-0001-Lift-initial-snapshot-limit-for-logical-replicati.patchtext/x-patch; charset=us-asciiDownload
From 9536d174b05f0d112dde71a4a4ca61ae08f4ae9a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 19 Jul 2022 11:50:29 +0900
Subject: [PATCH v8] Lift initial snapshot limit for logical replication
The replication command CREATE_REPLICATION_SLOT tends to fail with
"snapshot too large" when many subtransactions are active. This
problem stems from SnapBuildInitialSnapshot attempting to generate a
snapshot in which all active XIDs, including subxids, are stored
within the xip array. This patch mitigates the constraint on the
acceptable number of transaction IDs by utilizing the subxid array.
Author: Dilip Kumar and Kyotaro Horiguchi
---
src/backend/replication/logical/snapbuild.c | 50 +++++++++++++++++----
src/backend/storage/ipc/procarray.c | 18 ++------
src/backend/utils/time/snapmgr.c | 8 ++--
src/include/access/transam.h | 33 ++++++++++++++
src/include/utils/snapshot.h | 12 +++--
5 files changed, 90 insertions(+), 31 deletions(-)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 62542827e4..3a15c3cea7 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -565,10 +565,14 @@ Snapshot
SnapBuildInitialSnapshot(SnapBuild *builder)
{
Snapshot snap;
+ int ntxn;
+ int xiplen;
TransactionId xid;
TransactionId safeXid;
TransactionId *newxip;
+ TransactionId *newsubxip;
int newxcnt = 0;
+ int newsubxcnt = 0;
Assert(XactIsoLevel == XACT_REPEATABLE_READ);
Assert(builder->building_full_snapshot);
@@ -610,9 +614,35 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
- /* allocate in transaction context */
- newxip = (TransactionId *)
- palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+ /*
+ * Allocate in transaction context
+ *
+ * Since we know all the active XIDs, this snapshot won't be
+ * suboverflowed. However, if the number of XIDs surpasses the XID arrays'
+ * capacity, we cannot establish this snapshot because we don't know which
+ * XIDs are top-level. Since the original snapshot is historical, we can't
+ * reliably identify the top-level XIDs. Thus, we have no choice but fail
+ * when there are too many active XIDs.
+ */
+ ntxn = TransactionIdDistance(snap->xmin, snap->xmax) - snap->xcnt;
+ xiplen = GetMaxSnapshotXidCount();
+ newxip = (TransactionId *) palloc(sizeof(TransactionId) * xiplen);
+
+ if (ntxn > GetMaxSnapshotXidCount())
+ {
+ /*
+ * Since we can't identify the top-level XIDs, we can't create a
+ * suboverflowed snapshot.
+ */
+ if (ntxn > xiplen + GetMaxSnapshotSubxidCount())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("initial slot snapshot too large")));
+
+ newsubxip = (TransactionId *)
+ palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
+ }
+
/*
* snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -633,21 +663,23 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL)
{
- if (newxcnt >= GetMaxSnapshotXidCount())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("initial slot snapshot too large")));
-
- newxip[newxcnt++] = xid;
+ if (newxcnt < xiplen)
+ newxip[newxcnt++] = xid;
+ else
+ newsubxip[newsubxcnt++] = xid;
}
TransactionIdAdvance(xid);
}
+ Assert(newxcnt + newsubxcnt == ntxn);
+
/* adjust remaining snapshot fields as needed */
snap->snapshot_type = SNAPSHOT_MVCC;
snap->xcnt = newxcnt;
snap->xip = newxip;
+ snap->subxcnt = newsubxcnt;
+ snap->subxip = newsubxip;
return snap;
}
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ea91ce355f..a0f5ba5a36 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4823,22 +4823,10 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
Assert(TransactionIdPrecedesOrEquals(from_xid, to_xid));
/*
- * Calculate how many array slots we'll need. Normally this is cheap; in
- * the unusual case where the XIDs cross the wrap point, we do it the hard
- * way.
+ * Calculate how many array slots we'll need. Both ends of the region are
+ * inclusive.
*/
- if (to_xid >= from_xid)
- nxids = to_xid - from_xid + 1;
- else
- {
- nxids = 1;
- next_xid = from_xid;
- while (TransactionIdPrecedes(next_xid, to_xid))
- {
- nxids++;
- TransactionIdAdvance(next_xid);
- }
- }
+ nxids = TransactionIdDistance(from_xid, to_xid) + 1;
/*
* Since only the startup process modifies the head/tail pointers, we
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 7d11ae3478..f7951d8787 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -2310,9 +2310,11 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
/*
* If the snapshot contains full subxact data, the fastest way to
* check things is just to compare the given XID against both subxact
- * XIDs and top-level XIDs. If the snapshot overflowed, we have to
- * use pg_subtrans to convert a subxact XID to its parent XID, but
- * then we need only look at top-level XIDs not subxacts.
+ * XIDs and top-level XIDs. Note that there's a case where both arrays
+ * contain top and sub-transaction's XIDs in a mixed manner. If the
+ * snapshot overflowed, we have to use pg_subtrans to convert a subxact
+ * XID to its parent XID, but then we need only look at top-level XIDs
+ * not subxacts.
*/
if (!snapshot->suboverflowed)
{
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index f5af6d3055..b6a3c8abee 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -370,6 +370,39 @@ FullTransactionIdNewer(FullTransactionId a, FullTransactionId b)
return b;
}
+
+/* return the distance between the two IDs */
+static inline int
+TransactionIdDistance(TransactionId from_xid, TransactionId to_xid)
+{
+ int nxids;
+ TransactionId next_xid;
+
+ StaticAssertDecl(sizeof(int) >= sizeof(TransactionId),
+ "TransactionId exceeds the width of int");
+ Assert(TransactionIdPrecedesOrEquals(from_xid, to_xid));
+
+ /*
+ * Calculate the distance between the two XIDs. Normally this is cheap; in
+ * the unusual case where the XIDs cross the wrap point, we do it the hard
+ * way.
+ */
+ if (to_xid >= from_xid)
+ nxids = to_xid - from_xid;
+ else
+ {
+ nxids = 0;
+ next_xid = from_xid;
+ while (TransactionIdPrecedes(next_xid, to_xid))
+ {
+ nxids++;
+ TransactionIdAdvance(next_xid);
+ }
+ }
+
+ return nxids;
+}
+
#endif /* FRONTEND */
#endif /* TRANSAM_H */
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 583a667a40..0de6aed6e0 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -160,8 +160,10 @@ typedef struct SnapshotData
/*
* For normal MVCC snapshot this contains the all xact IDs that are in
* progress, unless the snapshot was taken during recovery in which case
- * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.
- * it contains *committed* transactions between xmin and xmax.
+ * it's empty. In the case of !suboverflowed, there's a situation where
+ * this contains both top and sub-transaction IDs in a mixed manner. For
+ * historic MVCC snapshots, the meaning is inverted, i.e. it contains
+ * *committed* transactions between xmin and xmax.
*
* note: all ids in xip[] satisfy xmin <= xip[i] < xmax
*/
@@ -171,8 +173,10 @@ typedef struct SnapshotData
/*
* For non-historic MVCC snapshots, this contains subxact IDs that are in
* progress (and other transactions that are in progress if taken during
- * recovery). For historic snapshot it contains *all* xids assigned to the
- * replayed transaction, including the toplevel xid.
+ * recovery). In the case of !suboverflowed, there's a situation where this
+ * contains both top and sub-transaction IDs in a mixed manner. For
+ * historic snapshot it contains *all* xids assigned to the replayed
+ * transaction, including the toplevel xid.
*
* note: all ids in subxip[] are >= xmin, but we don't bother filtering
* out any that are >= xmax
--
2.31.1
On Thu, Mar 23, 2023 at 11:02 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
[ new patch ]
Well, I guess nobody is too excited about fixing this, because it's
been another 10 months with no discussion. Andres doesn't even seem to
think this is as much a bug as it is a limitation, for all that it's
filed in the CF application under bug fixes. I kind of wonder if we
should just close this entry in the CF, but I'll hold off on that for
now.
/*
* For normal MVCC snapshot this contains the all xact IDs that are in
* progress, unless the snapshot was taken during recovery in which case
- * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.
- * it contains *committed* transactions between xmin and xmax.
+ * it's empty. In the case of !suboverflowed, there's a situation where
+ * this contains both top and sub-transaction IDs in a mixed manner. For
+ * historic MVCC snapshots, the meaning is inverted, i.e. it contains
+ * *committed* transactions between xmin and xmax.
*
* note: all ids in xip[] satisfy xmin <= xip[i] < xmax
*/
I have to say that I don't like this at all. It's bad enough that we
already use the xip/subxip arrays in two different ways depending on
the situation. Increasing that to three different ways seems painful.
How is anyone supposed to keep track of how the array is being used at
which point in the code?
If we are going to do that, I suspect it needs comment updates in more
places than what the patch does currently. For instance:
+ if (newxcnt < xiplen)
+ newxip[newxcnt++] = xid;
+ else
+ newsubxip[newsubxcnt++] = xid;
Just imagine coming across this code in 5 or 10 years and finding that
it had no comment explaining anything. Yikes!
Aside from the details of the patch, and perhaps more seriously, I'm
not really clear that we have consensus on an approach. A few
different proposals seem to have been floated, and it doesn't seem
like anybody hates anybody else's idea completely, but it doesn't
really seem like everyone agrees on what to do, either.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, 6 Jan 2024 at 01:47, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 23, 2023 at 11:02 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:[ new patch ]
Well, I guess nobody is too excited about fixing this, because it's
been another 10 months with no discussion. Andres doesn't even seem to
think this is as much a bug as it is a limitation, for all that it's
filed in the CF application under bug fixes. I kind of wonder if we
should just close this entry in the CF, but I'll hold off on that for
now.
I have changed the status of the patch to "Waiting on Author" as we
don't have a concrete patch with an accepted design which is in a
reviewable shape. We can think if we want to pursue this patch further
or probably close this in the current commitfest and start it again
when someone wants to work on this more actively.
Regards,
Vignesh
Thank you for the comments. This will help move the discussion
forward.
At Fri, 5 Jan 2024 15:17:11 -0500, Robert Haas <robertmhaas@gmail.com> wrote in
On Thu, Mar 23, 2023 at 11:02 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:[ new patch ]
Well, I guess nobody is too excited about fixing this, because it's
been another 10 months with no discussion. Andres doesn't even seem to
think this is as much a bug as it is a limitation, for all that it's
filed in the CF application under bug fixes. I kind of wonder if we
should just close this entry in the CF, but I'll hold off on that for
now.
Perhaps you are correct. Ultimately, this issue is largely
theoretical, and I don't believe anyone would be inconvenienced by
imposing this contraint.
* note: all ids in xip[] satisfy xmin <= xip[i] < xmax
*/I have to say that I don't like this at all. It's bad enough that we
already use the xip/subxip arrays in two different ways depending on
the situation. Increasing that to three different ways seems painful.
How is anyone supposed to keep track of how the array is being used at
which point in the code?
I understand. So, summirizing the current state briefly, I believe it
as follows:
a. snapbuild lacks a means to differentiate between top and sub xids
during snapshot building.
b. Abusing takenDuringRecovery could lead to potential issues, so it
has been rejected.
c. Dynamic sizing of xip is likely to have a significant impact on
performance (as mentioned in the comments of GetSnapshotData).
d. (new!) Adding a third storing method is not favored.
As a method to satisfy these prerequisites, I think one direction
could be to split takenDuringRecovery into flags indicating the
storage method and creation timing. I present this as a last-ditch
effort. It's a rough proposal, and further validation should be
necessary. If this direction is also not acceptable, I'll proceed with
removing the CF entry.
If we are going to do that, I suspect it needs comment updates in more
places than what the patch does currently. For instance:+ if (newxcnt < xiplen) + newxip[newxcnt++] = xid; + else + newsubxip[newsubxcnt++] = xid;Just imagine coming across this code in 5 or 10 years and finding that
it had no comment explaining anything. Yikes!
^^;
Aside from the details of the patch, and perhaps more seriously, I'm
not really clear that we have consensus on an approach. A few
different proposals seem to have been floated, and it doesn't seem
like anybody hates anybody else's idea completely, but it doesn't
really seem like everyone agrees on what to do, either.
I don't fully agree with that.It's not so much that I dislike other
proposals, but rather that we haven't been able to find a definitive
solution that stands out.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v9-0001-Lift-initial-snapshot-limit-for-logical-replicati.patchtext/x-patch; charset=us-asciiDownload
From cd310711c1565b686416e8bf5a3a5105002e35c3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 12 Jan 2024 11:36:41 +0900
Subject: [PATCH v9] Lift initial snapshot limit for logical replication
The replication command CREATE_REPLICATION_SLOT often fails with
"snapshot too large" error when numerous subtransactions are
active. This issue stems from SnapBuildInitialSnapshot attempting to
generate a snapshot that includes all active XIDs, including subxids,
stored within the xip array. This patch mitigates the constraint by
utilizing the same storing method as takenDuringRecovery.
---
src/backend/replication/logical/snapbuild.c | 22 +++++++++++++--------
src/backend/storage/ipc/procarray.c | 1 +
src/backend/utils/time/snapmgr.c | 15 +++++++++-----
src/include/utils/snapshot.h | 3 ++-
4 files changed, 27 insertions(+), 14 deletions(-)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a0b7947d2f..528bdbb662 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -388,6 +388,7 @@ SnapBuildFreeSnapshot(Snapshot snap)
Assert(snap->curcid == FirstCommandId);
Assert(!snap->suboverflowed);
Assert(!snap->takenDuringRecovery);
+ Assert(!snap->mixed);
Assert(snap->regd_count == 0);
/* slightly more likely, so it's checked even without c-asserts */
@@ -464,6 +465,7 @@ SnapBuildSnapDecRefcount(Snapshot snap)
Assert(snap->curcid == FirstCommandId);
Assert(!snap->suboverflowed);
Assert(!snap->takenDuringRecovery);
+ Assert(!snap->mixed);
Assert(snap->regd_count == 0);
@@ -550,6 +552,7 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
snapshot->suboverflowed = false;
snapshot->takenDuringRecovery = false;
+ snapshot->mixed = false;
snapshot->copied = false;
snapshot->curcid = FirstCommandId;
snapshot->active_count = 0;
@@ -572,8 +575,8 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
Snapshot snap;
TransactionId xid;
TransactionId safeXid;
- TransactionId *newxip;
- int newxcnt = 0;
+ TransactionId *newsubxip;
+ int newsubxcnt = 0;
Assert(XactIsoLevel == XACT_REPEATABLE_READ);
Assert(builder->building_full_snapshot);
@@ -616,8 +619,8 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
/* allocate in transaction context */
- newxip = (TransactionId *)
- palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+ newsubxip = (TransactionId *)
+ palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
/*
* snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -638,12 +641,12 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
if (test == NULL)
{
- if (newxcnt >= GetMaxSnapshotXidCount())
+ if (newsubxcnt >= GetMaxSnapshotSubxidCount())
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("initial slot snapshot too large")));
- newxip[newxcnt++] = xid;
+ newsubxip[newsubxcnt++] = xid;
}
TransactionIdAdvance(xid);
@@ -651,8 +654,11 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
/* adjust remaining snapshot fields as needed */
snap->snapshot_type = SNAPSHOT_MVCC;
- snap->xcnt = newxcnt;
- snap->xip = newxip;
+ snap->xcnt = 0;
+ snap->xip = NULL;
+ snap->subxcnt = newsubxcnt;
+ snap->subxip = newsubxip;
+ snap->mixed = true;
return snap;
}
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index e2f71da4a0..744ba27441 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2377,6 +2377,7 @@ GetSnapshotData(Snapshot snapshot)
* those newly added transaction ids would be filtered away, so we
* need not be concerned about them.
*/
+ snapshot->mixed = true;
subcount = KnownAssignedXidsGetAndSetXmin(snapshot->subxip, &xmin,
xmax);
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 675e81d82d..6ba9fda4ef 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -205,6 +205,7 @@ typedef struct SerializedSnapshotData
int32 subxcnt;
bool suboverflowed;
bool takenDuringRecovery;
+ bool mixed;
CommandId curcid;
TimestampTz whenTaken;
XLogRecPtr lsn;
@@ -519,6 +520,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
sourcesnap->subxcnt * sizeof(TransactionId));
CurrentSnapshot->suboverflowed = sourcesnap->suboverflowed;
CurrentSnapshot->takenDuringRecovery = sourcesnap->takenDuringRecovery;
+ CurrentSnapshot->mixed = sourcesnap->mixed;
/* NB: curcid should NOT be copied, it's a local matter */
CurrentSnapshot->snapXactCompletionCount = 0;
@@ -616,8 +618,7 @@ CopySnapshot(Snapshot snapshot)
* snapshot taken during recovery; all the top-level XIDs are in subxip as
* well in that case, so we mustn't lose them.
*/
- if (snapshot->subxcnt > 0 &&
- (!snapshot->suboverflowed || snapshot->takenDuringRecovery))
+ if (snapshot->subxcnt > 0 && (!snapshot->suboverflowed || snapshot->mixed))
{
newsnap->subxip = (TransactionId *) ((char *) newsnap + subxipoff);
memcpy(newsnap->subxip, snapshot->subxip,
@@ -1226,6 +1227,7 @@ ExportSnapshot(Snapshot snapshot)
appendStringInfo(&buf, "sxp:%u\n", children[i]);
}
appendStringInfo(&buf, "rec:%u\n", snapshot->takenDuringRecovery);
+ appendStringInfo(&buf, "mixed:%u\n", snapshot->mixed);
/*
* Now write the text representation into a file. We first write to a
@@ -1502,6 +1504,7 @@ ImportSnapshot(const char *idstr)
}
snapshot.takenDuringRecovery = parseIntFromText("rec:", &filebuf, path);
+ snapshot.mixed = parseIntFromText("mixed:", &filebuf, path);
/*
* Do some additional sanity checking, just to protect ourselves. We
@@ -1706,7 +1709,7 @@ EstimateSnapshotSpace(Snapshot snapshot)
size = add_size(sizeof(SerializedSnapshotData),
mul_size(snapshot->xcnt, sizeof(TransactionId)));
if (snapshot->subxcnt > 0 &&
- (!snapshot->suboverflowed || snapshot->takenDuringRecovery))
+ (!snapshot->suboverflowed || snapshot->mixed))
size = add_size(size,
mul_size(snapshot->subxcnt, sizeof(TransactionId)));
@@ -1732,6 +1735,7 @@ SerializeSnapshot(Snapshot snapshot, char *start_address)
serialized_snapshot.subxcnt = snapshot->subxcnt;
serialized_snapshot.suboverflowed = snapshot->suboverflowed;
serialized_snapshot.takenDuringRecovery = snapshot->takenDuringRecovery;
+ serialized_snapshot.mixed = snapshot->mixed;
serialized_snapshot.curcid = snapshot->curcid;
serialized_snapshot.whenTaken = snapshot->whenTaken;
serialized_snapshot.lsn = snapshot->lsn;
@@ -1741,7 +1745,7 @@ SerializeSnapshot(Snapshot snapshot, char *start_address)
* taken during recovery - in that case, top-level XIDs are in subxip as
* well, and we mustn't lose them.
*/
- if (serialized_snapshot.suboverflowed && !snapshot->takenDuringRecovery)
+ if (serialized_snapshot.suboverflowed && !snapshot->mixed)
serialized_snapshot.subxcnt = 0;
/* Copy struct to possibly-unaligned buffer */
@@ -1806,6 +1810,7 @@ RestoreSnapshot(char *start_address)
snapshot->subxcnt = serialized_snapshot.subxcnt;
snapshot->suboverflowed = serialized_snapshot.suboverflowed;
snapshot->takenDuringRecovery = serialized_snapshot.takenDuringRecovery;
+ snapshot->mixed = serialized_snapshot.mixed;
snapshot->curcid = serialized_snapshot.curcid;
snapshot->whenTaken = serialized_snapshot.whenTaken;
snapshot->lsn = serialized_snapshot.lsn;
@@ -1880,7 +1885,7 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
* Snapshot information is stored slightly differently in snapshots taken
* during recovery.
*/
- if (!snapshot->takenDuringRecovery)
+ if (!snapshot->mixed)
{
/*
* If the snapshot contains full subxact data, the fastest way to
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 8d1e31e888..85b7648932 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -181,7 +181,8 @@ typedef struct SnapshotData
int32 subxcnt; /* # of xact ids in subxip[] */
bool suboverflowed; /* has the subxip array overflowed? */
- bool takenDuringRecovery; /* recovery-shaped snapshot? */
+ bool takenDuringRecovery; /* taken on a standby? */
+ bool mixed; /* both top and sub xids are mixed in subxip */
bool copied; /* false if it's a static snapshot */
CommandId curcid; /* in my xact, CID < curcid are visible */
--
2.39.3
On Thu, Jan 11, 2024 at 9:47 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
I understand. So, summirizing the current state briefly, I believe it
as follows:a. snapbuild lacks a means to differentiate between top and sub xids
during snapshot building.b. Abusing takenDuringRecovery could lead to potential issues, so it
has been rejected.c. Dynamic sizing of xip is likely to have a significant impact on
performance (as mentioned in the comments of GetSnapshotData).d. (new!) Adding a third storing method is not favored.
As a method to satisfy these prerequisites, I think one direction
could be to split takenDuringRecovery into flags indicating the
storage method and creation timing. I present this as a last-ditch
effort. It's a rough proposal, and further validation should be
necessary. If this direction is also not acceptable, I'll proceed with
removing the CF entry.
I think that the idea of evolving takenDuringRecovery could
potentially work for this problem and maybe for some other things as
well. I remember studying that flag before and coming to the
conclusion that it had some pretty specific and surprising semantics
that weren't obvious from the name -- I don't remember the details --
and so I think improving it could be useful. Also, I think that
multiple storage methods could be more palatable if there were a clear
way to distinguish which one was in use and good comments explaining
the distinction in relevant places.
However, I wonder whether this whole area is in need of a bigger
rethink. There seem to be a number of situations in which the split
into xip and subxip arrays is not very convenient, and also some
situations where it's quite important. Sometimes we want to record
what's committed, and sometimes what isn't. It's all a bit messy and
inconsistent. The necessity of limiting snapshot size is annoying,
too. I have no real idea what can be done about all of this, but what
strikes me is that the current system has grown up incrementally: we
started with a data structure designed for the original use case, and
now by gradually adding new use cases things have gotten complicated.
If we were designing things over from scratch, maybe we'd do it
differently and end up with something less messy. And maybe someone
can imagine a redesign that is likewise less messy.
But on the other hand, maybe not. Perhaps we can't really do any
better than what we have. Then the question becomes whether this case
is important enough to justify additional code complexity. I don't
think I've personally seen users run into this problem so I have no
special reason to think that it's important, but if it's causing
issues for other people then maybe it is.
--
Robert Haas
EDB: http://www.enterprisedb.com
At Fri, 12 Jan 2024 11:28:09 -0500, Robert Haas <robertmhaas@gmail.com> wrote in
However, I wonder whether this whole area is in need of a bigger
rethink. There seem to be a number of situations in which the split
into xip and subxip arrays is not very convenient, and also some
situations where it's quite important. Sometimes we want to record
what's committed, and sometimes what isn't. It's all a bit messy and
inconsistent. The necessity of limiting snapshot size is annoying,
too. I have no real idea what can be done about all of this, but what
strikes me is that the current system has grown up incrementally: we
started with a data structure designed for the original use case, and
now by gradually adding new use cases things have gotten complicated.
If we were designing things over from scratch, maybe we'd do it
differently and end up with something less messy. And maybe someone
can imagine a redesign that is likewise less messy.But on the other hand, maybe not. Perhaps we can't really do any
better than what we have. Then the question becomes whether this case
is important enough to justify additional code complexity. I don't
think I've personally seen users run into this problem so I have no
special reason to think that it's important, but if it's causing
issues for other people then maybe it is.
Thank you for the deep insights. I have understood your points. As I
can't think of any further simple modifications on this line, I will
withdraw this CF entry. At the moment, I also lack a fundamental,
comprehensive solution, but should if I or anyone else come up with
such a solution in the future, I believe it would worth a separate
discussion.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Mon, Jan 15, 2024 at 9:18 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
Thank you for the deep insights. I have understood your points. As I
can't think of any further simple modifications on this line, I will
withdraw this CF entry. At the moment, I also lack a fundamental,
comprehensive solution, but should if I or anyone else come up with
such a solution in the future, I believe it would worth a separate
discussion.
I completely agree.
--
Robert Haas
EDB: http://www.enterprisedb.com