Wrong usage of RelationNeedsWAL
Hello.
Commit c6b92041d3 changed the definition of RelationNeedsWAL().
-#define RelationNeedsWAL(relation) \
- ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
+#define RelationNeedsWAL(relation) \
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT && \
+ (XLogIsNeeded() || \
+ (relation->rd_createSubid == InvalidSubTransactionId && \
+ relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
On the other hand I found this usage.
plancat.c:128 get_relation_info()
/* Temporary and unlogged relations are inaccessible during recovery. */
if (!RelationNeedsWAL(relation) && RecoveryInProgress())
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot access temporary or unlogged relations during recovery")));
It works as expected accidentally, but the meaning is off.
WAL-skipping optmization is irrelevant to the condition for the error.
I found five misues in the tree. Please find the attached.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Fix-misuses-of-RelationNeedsWAL.patchtext/x-patch; charset=us-asciiDownload+14-8
Hi,
On 2021-01-13 16:07:05 +0900, Kyotaro Horiguchi wrote:
Commit c6b92041d3 changed the definition of RelationNeedsWAL().
-#define RelationNeedsWAL(relation) \ - ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT) +#define RelationNeedsWAL(relation) \ + ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT && \ + (XLogIsNeeded() || \ + (relation->rd_createSubid == InvalidSubTransactionId && \ + relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))On the other hand I found this usage.
plancat.c:128 get_relation_info()
/* Temporary and unlogged relations are inaccessible during recovery. */
if (!RelationNeedsWAL(relation) && RecoveryInProgress())
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot access temporary or unlogged relations during recovery")));It works as expected accidentally, but the meaning is off.
WAL-skipping optmization is irrelevant to the condition for the error.I found five misues in the tree. Please find the attached.
Noah?
Greetings,
Andres Freund
On Wed, Jan 13, 2021 at 04:07:05PM +0900, Kyotaro Horiguchi wrote:
The definition of the macro RelationNeedsWAL has been changed by
c6b92041d3 to include conditions related to the WAL-skip optimization
but some uses of the macro are not relevant to the optimization. That
misuses are harmless for now as they are only executed while wal_level= replica or WAL-skipping is inactive. However, this should be
corrected to prevent future hazard.
I see user-visible consequences:
--- a/src/backend/catalog/pg_publication.c +++ b/src/backend/catalog/pg_publication.c @@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel) errdetail("System tables cannot be added to publications.")));/* UNLOGGED and TEMP relations cannot be part of publication. */ - if (!RelationNeedsWAL(targetrel)) + if (!RelationIsWalLogged(targetrel))
Today, the following fails needlessly under wal_level=minimal:
BEGIN;
SET client_min_messages = 'ERROR';
CREATE TABLE skip_wal ();
CREATE PUBLICATION p FOR TABLE skip_wal;
ROLLBACK;
Could you add that test to one of the regress scripts?
ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("table \"%s\" cannot be replicated", diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index da322b453e..0500efcdb9 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -126,7 +126,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, relation = table_open(relationObjectId, NoLock);/* Temporary and unlogged relations are inaccessible during recovery. */ - if (!RelationNeedsWAL(relation) && RecoveryInProgress()) + if (!RelationIsWalLogged(relation) && RecoveryInProgress())
This has no user-visible consequences, but I agree you've clarified it.
ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot access temporary or unlogged relations during recovery"))); diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index 10b63982c0..810806a542 100644 --- a/src/include/utils/rel.h +++ b/src/include/utils/rel.h @@ -552,16 +552,23 @@ typedef struct ViewOptions (relation)->rd_smgr->smgr_targblock = (targblock); \ } while (0)+/* + * RelationIsPermanent + * True if relation is WAL-logged. + */ +#define RelationIsWalLogged(relation) \ + ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
To me, the names RelationIsWalLogged() and RelationNeedsWAL() don't convey
their behavior difference. How about one of the following spellings?
- Name the new macro RelationEverNeedsWAL().
- Just test "relpersistence == RELPERSISTENCE_PERMANENT", without a macro.
+ /* * RelationNeedsWAL - * True if relation needs WAL. + * True if relation needs WAL at the time. * * Returns false if wal_level = minimal and this relation is created or * truncated in the current transaction. See "Skipping WAL for New * RelFileNode" in src/backend/access/transam/README. */ #define RelationNeedsWAL(relation) \ - ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT && \ + (RelationIsWalLogged(relation) && \ (XLogIsNeeded() || \ (relation->rd_createSubid == InvalidSubTransactionId && \ relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId))) @@ -619,7 +626,7 @@ typedef struct ViewOptions */ #define RelationIsAccessibleInLogicalDecoding(relation) \ (XLogLogicalInfoActive() && \ - RelationNeedsWAL(relation) && \ + RelationIsWalLogged(relation) && \
Today's condition expands to:
wal_level >= WAL_LEVEL_LOGICAL &&
relation->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&
(wal_level >= WAL_LEVEL_REPLICA || ...)
The proposed change doesn't affect semantics, and it likely doesn't change
optimized code. I slightly prefer to leave this line unchanged.
(IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))
/*
@@ -635,7 +642,7 @@ typedef struct ViewOptions
*/
#define RelationIsLogicallyLogged(relation) \
(XLogLogicalInfoActive() && \
- RelationNeedsWAL(relation) && \
+ RelationIsWalLogged(relation) && \
Likewise.
!IsCatalogRelation(relation))
/* routines in utils/cache/relcache.c */ diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h index 579be352c5..7be922a9f1 100644 --- a/src/include/utils/snapmgr.h +++ b/src/include/utils/snapmgr.h @@ -37,7 +37,7 @@ */ #define RelationAllowsEarlyPruning(rel) \ ( \ - RelationNeedsWAL(rel) \ + RelationIsWalLogged(rel) \
I suspect this is user-visible for a scenario like:
CREATE TABLE t AS SELECT ...; DELETE FROM t;
-- ... time passes, rows of t are now eligible for early pruning ...
BEGIN;
ALTER TABLE t SET TABLESPACE something; -- start skipping WAL
SELECT count(*) FROM t;
After this patch, the SELECT would do early pruning, as it does in the absence
of the ALTER TABLE. When pruning doesn't update the page LSN,
TestForOldSnapshot() will be unable to detect that early pruning changed the
query results. Hence, RelationAllowsEarlyPruning() must not change this way.
Does that sound right?
Thank you for the comments, Noah and Andres.
At Fri, 15 Jan 2021 20:38:16 -0800, Noah Misch <noah@leadboat.com> wrote in
On Wed, Jan 13, 2021 at 04:07:05PM +0900, Kyotaro Horiguchi wrote:
The definition of the macro RelationNeedsWAL has been changed by
c6b92041d3 to include conditions related to the WAL-skip optimization
but some uses of the macro are not relevant to the optimization. That
misuses are harmless for now as they are only executed while wal_level= replica or WAL-skipping is inactive. However, this should be
corrected to prevent future hazard.
I see user-visible consequences:
--- a/src/backend/catalog/pg_publication.c +++ b/src/backend/catalog/pg_publication.c @@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel) errdetail("System tables cannot be added to publications.")));/* UNLOGGED and TEMP relations cannot be part of publication. */ - if (!RelationNeedsWAL(targetrel)) + if (!RelationIsWalLogged(targetrel))Today, the following fails needlessly under wal_level=minimal:
BEGIN;
SET client_min_messages = 'ERROR';
CREATE TABLE skip_wal ();
CREATE PUBLICATION p FOR TABLE skip_wal;
ROLLBACK;Could you add that test to one of the regress scripts?
Mmm. I thought that it cannot be used while wal_level=minimal. The
WARNING for insiffucient wal_level shows that it is intended to
work. A test is added in the attached.
ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("table \"%s\" cannot be replicated", diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index da322b453e..0500efcdb9 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -126,7 +126,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, relation = table_open(relationObjectId, NoLock);/* Temporary and unlogged relations are inaccessible during recovery. */ - if (!RelationNeedsWAL(relation) && RecoveryInProgress()) + if (!RelationIsWalLogged(relation) && RecoveryInProgress())This has no user-visible consequences, but I agree you've clarified it.
ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot access temporary or unlogged relations during recovery"))); diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index 10b63982c0..810806a542 100644 --- a/src/include/utils/rel.h +++ b/src/include/utils/rel.h @@ -552,16 +552,23 @@ typedef struct ViewOptions (relation)->rd_smgr->smgr_targblock = (targblock); \ } while (0)+/* + * RelationIsPermanent + * True if relation is WAL-logged. + */ +#define RelationIsWalLogged(relation) \ + ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)To me, the names RelationIsWalLogged() and RelationNeedsWAL() don't convey
their behavior difference. How about one of the following spellings?
Yeah! I was concerned about that.
- Name the new macro RelationEverNeedsWAL().
- Just test "relpersistence == RELPERSISTENCE_PERMANENT", without a macro.
Yes. I wasn't confident on using the macro since as you pointed the
macro name is very confusing. The reason for using a macro was
RelationUsesLocalBuffers().
I'm not sure the exact meaning the "ever" (doesn't seem to be
"always"). On the other hand there are many places where the second
line above takes place. So I chose to do that without a macro.
+ /* * RelationNeedsWAL - * True if relation needs WAL. + * True if relation needs WAL at the time. * * Returns false if wal_level = minimal and this relation is created or * truncated in the current transaction. See "Skipping WAL for New * RelFileNode" in src/backend/access/transam/README. */ #define RelationNeedsWAL(relation) \ - ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT && \ + (RelationIsWalLogged(relation) && \ (XLogIsNeeded() || \ (relation->rd_createSubid == InvalidSubTransactionId && \ relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId))) @@ -619,7 +626,7 @@ typedef struct ViewOptions */ #define RelationIsAccessibleInLogicalDecoding(relation) \ (XLogLogicalInfoActive() && \ - RelationNeedsWAL(relation) && \ + RelationIsWalLogged(relation) && \Today's condition expands to:
wal_level >= WAL_LEVEL_LOGICAL &&
relation->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&
(wal_level >= WAL_LEVEL_REPLICA || ...)The proposed change doesn't affect semantics, and it likely doesn't change
optimized code. I slightly prefer to leave this line unchanged.
Yes. The macro is correct whether we do the change or not. And I din't
make my mind that it is the right fix consdering back-patch
burden. The reason for the change was simply that the macro is
irrelevant to the rest of the old RelationNeedsWAL() macro other than
relpersistence.
So.. (?) I reverted it in this version.
(IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))
/*
@@ -635,7 +642,7 @@ typedef struct ViewOptions
*/
#define RelationIsLogicallyLogged(relation) \
(XLogLogicalInfoActive() && \
- RelationNeedsWAL(relation) && \
+ RelationIsWalLogged(relation) && \Likewise.
!IsCatalogRelation(relation))
/* routines in utils/cache/relcache.c */ diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h index 579be352c5..7be922a9f1 100644 --- a/src/include/utils/snapmgr.h +++ b/src/include/utils/snapmgr.h @@ -37,7 +37,7 @@ */ #define RelationAllowsEarlyPruning(rel) \ ( \ - RelationNeedsWAL(rel) \ + RelationIsWalLogged(rel) \I suspect this is user-visible for a scenario like:
CREATE TABLE t AS SELECT ...; DELETE FROM t;
-- ... time passes, rows of t are now eligible for early pruning ...
BEGIN;
ALTER TABLE t SET TABLESPACE something; -- start skipping WAL
SELECT count(*) FROM t;After this patch, the SELECT would do early pruning, as it does in the absence
of the ALTER TABLE. When pruning doesn't update the page LSN,
TestForOldSnapshot() will be unable to detect that early pruning changed the
query results. Hence, RelationAllowsEarlyPruning() must not change this way.
Does that sound right?
Mmm, maybe no. The pruning works on XID (or timestamp), not on LSN, so
it seems to work well even if pruning happened at the SELECT.
Conversely that should work after old_snapshot_threshold elapsed.
Am I missing something?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v2-0001-Do-not-use-RelationNeedsWAL-to-identify-relation-.patchtext/x-patch; charset=us-asciiDownload+31-6
On Mon, Jan 18, 2021 at 03:08:38PM +0900, Kyotaro Horiguchi wrote:
At Fri, 15 Jan 2021 20:38:16 -0800, Noah Misch <noah@leadboat.com> wrote in
On Wed, Jan 13, 2021 at 04:07:05PM +0900, Kyotaro Horiguchi wrote:
--- a/src/include/utils/snapmgr.h +++ b/src/include/utils/snapmgr.h @@ -37,7 +37,7 @@ */ #define RelationAllowsEarlyPruning(rel) \ ( \ - RelationNeedsWAL(rel) \ + RelationIsWalLogged(rel) \I suspect this is user-visible for a scenario like:
CREATE TABLE t AS SELECT ...; DELETE FROM t;
-- ... time passes, rows of t are now eligible for early pruning ...
BEGIN;
ALTER TABLE t SET TABLESPACE something; -- start skipping WAL
SELECT count(*) FROM t;After this patch, the SELECT would do early pruning, as it does in the absence
of the ALTER TABLE. When pruning doesn't update the page LSN,
TestForOldSnapshot() will be unable to detect that early pruning changed the
query results. Hence, RelationAllowsEarlyPruning() must not change this way.
Does that sound right?Mmm, maybe no. The pruning works on XID (or timestamp), not on LSN, so
it seems to work well even if pruning happened at the SELECT.
Conversely that should work after old_snapshot_threshold elapsed.Am I missing something?
I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" check in
TestForOldSnapshot(). If the LSN isn't important, what else explains
RelationAllowsEarlyPruning() checking RelationNeedsWAL()?
On Sun, Jan 17, 2021 at 10:36:31PM -0800, Noah Misch wrote:
On Mon, Jan 18, 2021 at 03:08:38PM +0900, Kyotaro Horiguchi wrote:
At Fri, 15 Jan 2021 20:38:16 -0800, Noah Misch <noah@leadboat.com> wrote in
On Wed, Jan 13, 2021 at 04:07:05PM +0900, Kyotaro Horiguchi wrote:
--- a/src/include/utils/snapmgr.h +++ b/src/include/utils/snapmgr.h @@ -37,7 +37,7 @@ */ #define RelationAllowsEarlyPruning(rel) \ ( \ - RelationNeedsWAL(rel) \ + RelationIsWalLogged(rel) \I suspect this is user-visible for a scenario like:
CREATE TABLE t AS SELECT ...; DELETE FROM t;
-- ... time passes, rows of t are now eligible for early pruning ...
BEGIN;
ALTER TABLE t SET TABLESPACE something; -- start skipping WAL
SELECT count(*) FROM t;After this patch, the SELECT would do early pruning, as it does in the absence
of the ALTER TABLE. When pruning doesn't update the page LSN,
TestForOldSnapshot() will be unable to detect that early pruning changed the
query results. Hence, RelationAllowsEarlyPruning() must not change this way.
Does that sound right?Mmm, maybe no. The pruning works on XID (or timestamp), not on LSN, so
it seems to work well even if pruning happened at the SELECT.
Conversely that should work after old_snapshot_threshold elapsed.Am I missing something?
I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" check in
TestForOldSnapshot(). If the LSN isn't important, what else explains
RelationAllowsEarlyPruning() checking RelationNeedsWAL()?
Thinking about it more, some RelationAllowsEarlyPruning() callers need
different treatment. Above, I was writing about the case of deciding whether
to do early pruning. The other RelationAllowsEarlyPruning() call sites are
deciding whether the relation might be lacking old data. For that case, we
should check RELPERSISTENCE_PERMANENT, not RelationNeedsWAL(). Otherwise, we
could fail to report an old-snapshot error in a case like this:
setup: CREATE TABLE t AS SELECT ...;
xact1: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1; -- take snapshot
xact2: DELETE FROM t;
(plenty of time passes)
xact3: SELECT count(*) FROM t; -- early pruning
xact1: SAVEPOINT q; SELECT count(*) FROM t; ROLLBACK TO q; -- "snapshot too old"
xact1: ALTER TABLE t SET TABLESPACE something; -- start skipping WAL
xact1: SELECT count(*) FROM t; -- no error, wanted "snapshot too old"
Is that plausible?
At Sun, 17 Jan 2021 23:02:18 -0800, Noah Misch <noah@leadboat.com> wrote in
On Sun, Jan 17, 2021 at 10:36:31PM -0800, Noah Misch wrote:
I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" check in
TestForOldSnapshot(). If the LSN isn't important, what else explains
RelationAllowsEarlyPruning() checking RelationNeedsWAL()?Thinking about it more, some RelationAllowsEarlyPruning() callers need
different treatment. Above, I was writing about the case of deciding whether
to do early pruning. The other RelationAllowsEarlyPruning() call sites are
deciding whether the relation might be lacking old data. For that case, we
should check RELPERSISTENCE_PERMANENT, not RelationNeedsWAL(). Otherwise, we
could fail to report an old-snapshot error in a case like this:setup: CREATE TABLE t AS SELECT ...;
xact1: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1; -- take snapshot
xact2: DELETE FROM t;
(plenty of time passes)
xact3: SELECT count(*) FROM t; -- early pruning
xact1: SAVEPOINT q; SELECT count(*) FROM t; ROLLBACK TO q; -- "snapshot too old"
xact1: ALTER TABLE t SET TABLESPACE something; -- start skipping WAL
xact1: SELECT count(*) FROM t; -- no error, wanted "snapshot too old"Is that plausible?
Thank you for the consideration and yes. But I get "snapshot too old"
from the last query with the patched version. maybe I'm missing
something. I'm going to investigate the case.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Mon, 18 Jan 2021 17:30:22 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Sun, 17 Jan 2021 23:02:18 -0800, Noah Misch <noah@leadboat.com> wrote in
On Sun, Jan 17, 2021 at 10:36:31PM -0800, Noah Misch wrote:
I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" check in
TestForOldSnapshot(). If the LSN isn't important, what else explains
RelationAllowsEarlyPruning() checking RelationNeedsWAL()?Thinking about it more, some RelationAllowsEarlyPruning() callers need
different treatment. Above, I was writing about the case of deciding whether
to do early pruning. The other RelationAllowsEarlyPruning() call sites are
deciding whether the relation might be lacking old data. For that case, we
should check RELPERSISTENCE_PERMANENT, not RelationNeedsWAL(). Otherwise, we
could fail to report an old-snapshot error in a case like this:
A> > setup: CREATE TABLE t AS SELECT ...;
B> > xact1: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1; -- take snapshot
C> > xact2: DELETE FROM t;
D> > (plenty of time passes)
E> > xact3: SELECT count(*) FROM t; -- early pruning
F> > xact1: SAVEPOINT q; SELECT count(*) FROM t; ROLLBACK TO q; -- "snapshot too old"
G> > xact1: ALTER TABLE t SET TABLESPACE something; -- start skipping WAL
H> > xact1: SELECT count(*) FROM t; -- no error, wanted "snapshot too old"
Is that plausible?
Thank you for the consideration and yes. But I get "snapshot too old"
from the last query with the patched version. maybe I'm missing
something. I'm going to investigate the case.
Ah. I took that in reverse way. (caught by the discussion on page
LSN.) Ok, the "current" code works that way. So we need to perform the
check the new way in RelationAllowsEarlyPruning. So in a short answer
for the last question in regard to the reference side is yes.
I understand that you are suggesting that at least
TransactionIdLimitedForOldSnapshots should follow not only relation
persistence but RelationNeedsWAL, based on the discussion on the
check on LSN of TestForOldSnapshot().
I still don't think that LSN in the WAL-skipping-created relfilenode
harm. ALTER TABLE SET TABLESPACE always makes a dead copy of every
block (except checksum) including page LSN regardless of wal_level. In
the scenario above, the last select at H correctly reads page LSN set
by E then copied by G, which is larger than the snapshot LSN at B. So
doesn't go the fast-return path before actual check performed by
RelationAllowsEarlyPruning.
As the result still RelationAllowsEarlyPruning is changed to check
only for the persistence of the table in this version. (In other
words, all the callers of the function works based on the same
criteria.)
- Removed RelationIsWalLoggeed which was forgotten to remove in the
last version.
- Enclosed parameter of RelationAllowsEarlyPruning.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v3-0001-Do-not-use-RelationNeedsWAL-to-identify-relation-.patchtext/x-patch; charset=us-asciiDownload+24-6
On Tue, Jan 19, 2021 at 01:48:31PM +0900, Kyotaro Horiguchi wrote:
At Mon, 18 Jan 2021 17:30:22 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Sun, 17 Jan 2021 23:02:18 -0800, Noah Misch <noah@leadboat.com> wrote in
On Sun, Jan 17, 2021 at 10:36:31PM -0800, Noah Misch wrote:
I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" check in
TestForOldSnapshot(). If the LSN isn't important, what else explains
RelationAllowsEarlyPruning() checking RelationNeedsWAL()?Thinking about it more, some RelationAllowsEarlyPruning() callers need
different treatment. Above, I was writing about the case of deciding whether
to do early pruning. The other RelationAllowsEarlyPruning() call sites are
deciding whether the relation might be lacking old data. For that case, we
should check RELPERSISTENCE_PERMANENT, not RelationNeedsWAL(). Otherwise, we
could fail to report an old-snapshot error in a case like this:A> > setup: CREATE TABLE t AS SELECT ...;
B> > xact1: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1; -- take snapshot
C> > xact2: DELETE FROM t;
D> > (plenty of time passes)
E> > xact3: SELECT count(*) FROM t; -- early pruning
F> > xact1: SAVEPOINT q; SELECT count(*) FROM t; ROLLBACK TO q; -- "snapshot too old"
G> > xact1: ALTER TABLE t SET TABLESPACE something; -- start skipping WAL
H> > xact1: SELECT count(*) FROM t; -- no error, wanted "snapshot too old"Is that plausible?
Thank you for the consideration and yes. But I get "snapshot too old"
from the last query with the patched version. maybe I'm missing
something. I'm going to investigate the case.Ah. I took that in reverse way. (caught by the discussion on page
LSN.) Ok, the "current" code works that way. So we need to perform the
check the new way in RelationAllowsEarlyPruning. So in a short answer
for the last question in regard to the reference side is yes.
Right. I expect the above procedure shows a bug in git master, but your patch
versions suffice to fix that bug.
I understand that you are suggesting that at least
TransactionIdLimitedForOldSnapshots should follow not only relation
persistence but RelationNeedsWAL, based on the discussion on the
check on LSN of TestForOldSnapshot().I still don't think that LSN in the WAL-skipping-created relfilenode
harm. ALTER TABLE SET TABLESPACE always makes a dead copy of every
block (except checksum) including page LSN regardless of wal_level. In
the scenario above, the last select at H correctly reads page LSN set
by E then copied by G, which is larger than the snapshot LSN at B. So
doesn't go the fast-return path before actual check performed by
RelationAllowsEarlyPruning.
I agree the above procedure doesn't have a problem under your patch versions.
See /messages/by-id/20210116043816.GA1644261@rfd.leadboat.com for a
different test case. In more detail:
setup: CREATE TABLE t AS SELECT ...;
xact1: BEGIN; DELETE FROM t;
xact2: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1; -- take snapshot
xact1: COMMIT;
(plenty of time passes, rows of t are now eligible for early pruning)
xact3: BEGIN;
xact3: ALTER TABLE t SET TABLESPACE something; -- start skipping WAL
xact3: SELECT count(*) FROM t; -- early pruning w/o WAL, w/o LSN changes
xact3: COMMIT;
xact2: SELECT count(*) FROM t; -- no error, wanted "snapshot too old"
I expect that shows no bug for git master, but I expect it does show a bug
with your patch versions. Could you try implementing both test procedures in
src/test/modules/snapshot_too_old? There's no need to make the test use
wal_level=minimal explicitly; it's sufficient to catch these bugs when
wal_level=minimal is in the TEMP_CONFIG file.
At Tue, 19 Jan 2021 01:31:52 -0800, Noah Misch <noah@leadboat.com> wrote in
On Tue, Jan 19, 2021 at 01:48:31PM +0900, Kyotaro Horiguchi wrote:
I understand that you are suggesting that at least
TransactionIdLimitedForOldSnapshots should follow not only relation
persistence but RelationNeedsWAL, based on the discussion on the
check on LSN of TestForOldSnapshot().I still don't think that LSN in the WAL-skipping-created relfilenode
harm. ALTER TABLE SET TABLESPACE always makes a dead copy of every
block (except checksum) including page LSN regardless of wal_level. In
the scenario above, the last select at H correctly reads page LSN set
by E then copied by G, which is larger than the snapshot LSN at B. So
doesn't go the fast-return path before actual check performed by
RelationAllowsEarlyPruning.I agree the above procedure doesn't have a problem under your patch versions.
See /messages/by-id/20210116043816.GA1644261@rfd.leadboat.com for a
different test case. In more detail:
A> setup: CREATE TABLE t AS SELECT ...;
B> xact1: BEGIN; DELETE FROM t;
C> xact2: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1; -- take snapshot
D> xact1: COMMIT;
(plenty of time passes, rows of t are now eligible for early pruning)
E> xact3: BEGIN;
F> xact3: ALTER TABLE t SET TABLESPACE something; -- start skipping WAL
G> xact3: SELECT count(*) FROM t; -- early pruning w/o WAL, w/o LSN changes
H> xact3: COMMIT;
J> xact2: SELECT count(*) FROM t; -- no error, wanted "snapshot too old"
I expect that shows no bug for git master, but I expect it does show a bug
I didn't see "snapshot too old" at J with the patch, but at the same
time the SELECT at G didn't prune tuples almost all of the trial. (the
last SELECT returns the correct answer.) I saw it get pruned only once
among many trials but I'm not sure how I could get to the situation
and haven't succeed to reproduce that.
The reason that the SELECT at G doesn't prune is that limited_xmin in
heap_page_prune_opt at G is limited up to the oldest xid among active
sessions. In this case the xmin of the session 2 is the same with the
xid of the session 1. So xmin of the session 3 at G is the same with
the xmin of the session 2, which is the same with the xid of the
session 1. So pruning doesn't happen. However that is very fragile as
the base for asserting that pruning won't happen.
Anyway, it seems actually dangerous that cause pruning on wal-skipped
relation.
with your patch versions. Could you try implementing both test procedures in
src/test/modules/snapshot_too_old? There's no need to make the test use
wal_level=minimal explicitly; it's sufficient to catch these bugs when
wal_level=minimal is in the TEMP_CONFIG file.
In the attached, TestForOldSnapshot() considers XLogIsNeeded(),
instead of moving the condition on LSN to TestForOldSnapshot_impl for
performance.
I'll add the test part in the next version.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v4-0001-Do-not-use-RelationNeedsWAL-to-identify-relation-.patchtext/x-patch; charset=us-asciiDownload+12-6
At Wed, 20 Jan 2021 17:34:44 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
Anyway, it seems actually dangerous that cause pruning on wal-skipped
relation.with your patch versions. Could you try implementing both test procedures in
src/test/modules/snapshot_too_old? There's no need to make the test use
wal_level=minimal explicitly; it's sufficient to catch these bugs when
wal_level=minimal is in the TEMP_CONFIG file.In the attached, TestForOldSnapshot() considers XLogIsNeeded(),
instead of moving the condition on LSN to TestForOldSnapshot_impl for
performance.I'll add the test part in the next version.
This is it.
However, with the previous patch, two existing tests sto_using_cursor
and sto_using_select behaves differently from the master. That change
is coming from the omission of actual LSN check in TestForOldSnapshot
while wal_level=minimal. So we have no choice other than actually
updating page LSN.
In the scenario under discussion there are two places we need to do
that. one is heap_page_prune, and the other is RelationCopyStorge. As
a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the
attached third file.
If it is the right direction, I will move XLOG_GIST_ASSIGN_LSN to
XLOG_ASSIGN_LSN and move gistXLogAssignLSN() to XLogAdvanceLSN() or
XLogNop() or such.
With the third patch, the test succedds both wal_level = replica and
minimal.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v5-0001-Test-for-snapshot-too-old-and-wal_level-minimal.patchtext/x-patch; charset=us-asciiDownload+82-2
v5-0002-Do-not-use-RelationNeedsWAL-to-identify-relation-.patchtext/x-patch; charset=us-asciiDownload+12-6
v5-0003-Poc-Keep-page-LSN-updated-while-WAL-skipping.patchtext/x-patch; charset=us-asciiDownload+26-4
On Thu, Jan 21, 2021 at 12:28:44AM +0900, Kyotaro Horiguchi wrote:
At Wed, 20 Jan 2021 17:34:44 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
Anyway, it seems actually dangerous that cause pruning on wal-skipped
relation.with your patch versions. Could you try implementing both test procedures in
src/test/modules/snapshot_too_old? There's no need to make the test use
wal_level=minimal explicitly; it's sufficient to catch these bugs when
wal_level=minimal is in the TEMP_CONFIG file.In the attached, TestForOldSnapshot() considers XLogIsNeeded(),
instead of moving the condition on LSN to TestForOldSnapshot_impl for
performance.I'll add the test part in the next version.
That test helped me. I now see "there's not a single tuple removed due to
old_snapshot_threshold in src/test/modules/snapshot_too_old"[1]/messages/by-id/20200403001235.e6jfdll3gh2ygbuc@alap3.anarazel.de, which limits
our ability to test using this infrastructure.
However, with the previous patch, two existing tests sto_using_cursor
and sto_using_select behaves differently from the master. That change
is coming from the omission of actual LSN check in TestForOldSnapshot
while wal_level=minimal. So we have no choice other than actually
updating page LSN.In the scenario under discussion there are two places we need to do
that. one is heap_page_prune, and the other is RelationCopyStorge. As
a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the
attached third file.
Fake LSNs make the system harder to understand, so I prefer not to spread fake
LSNs to more access methods. What I had in mind is to simply suppress early
pruning when the relation is skipping WAL. Attached. Is this reasonable? It
passes the older tests. While it changes the sto_wal_optimized.spec output, I
think it preserves the old_snapshot_threshold behavior contract.
[1]: /messages/by-id/20200403001235.e6jfdll3gh2ygbuc@alap3.anarazel.de
Attachments:
RelationNeedsWAL-to-relpersistence-v6nm.patchtext/plain; charset=us-asciiDownload+28-5
At Thu, 21 Jan 2021 00:19:58 -0800, Noah Misch <noah@leadboat.com> wrote in
On Thu, Jan 21, 2021 at 12:28:44AM +0900, Kyotaro Horiguchi wrote:
At Wed, 20 Jan 2021 17:34:44 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
Anyway, it seems actually dangerous that cause pruning on wal-skipped
relation.with your patch versions. Could you try implementing both test procedures in
src/test/modules/snapshot_too_old? There's no need to make the test use
wal_level=minimal explicitly; it's sufficient to catch these bugs when
wal_level=minimal is in the TEMP_CONFIG file.In the attached, TestForOldSnapshot() considers XLogIsNeeded(),
instead of moving the condition on LSN to TestForOldSnapshot_impl for
performance.I'll add the test part in the next version.
That test helped me. I now see "there's not a single tuple removed due to
old_snapshot_threshold in src/test/modules/snapshot_too_old"[1], which limits
our ability to test using this infrastructure.
Yes.
However, with the previous patch, two existing tests sto_using_cursor
and sto_using_select behaves differently from the master. That change
is coming from the omission of actual LSN check in TestForOldSnapshot
while wal_level=minimal. So we have no choice other than actually
updating page LSN.In the scenario under discussion there are two places we need to do
that. one is heap_page_prune, and the other is RelationCopyStorge. As
a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the
attached third file.Fake LSNs make the system harder to understand, so I prefer not to spread fake
LSNs to more access methods. What I had in mind is to simply suppress early
pruning when the relation is skipping WAL. Attached. Is this reasonable? It
passes the older tests. While it changes the sto_wal_optimized.spec output, I
think it preserves the old_snapshot_threshold behavior contract.
Perhaps I'm missing something, but the patch doesn't pass the v5-0001
test with wal_level=minimal?
[1] /messages/by-id/20200403001235.e6jfdll3gh2ygbuc@alap3.anarazel.de
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote:
At Thu, 21 Jan 2021 00:19:58 -0800, Noah Misch <noah@leadboat.com> wrote in
On Thu, Jan 21, 2021 at 12:28:44AM +0900, Kyotaro Horiguchi wrote:
However, with the previous patch, two existing tests sto_using_cursor
and sto_using_select behaves differently from the master. That change
is coming from the omission of actual LSN check in TestForOldSnapshot
while wal_level=minimal. So we have no choice other than actually
updating page LSN.In the scenario under discussion there are two places we need to do
that. one is heap_page_prune, and the other is RelationCopyStorge. As
a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the
attached third file.Fake LSNs make the system harder to understand, so I prefer not to spread fake
LSNs to more access methods. What I had in mind is to simply suppress early
pruning when the relation is skipping WAL. Attached. Is this reasonable? It
passes the older tests. While it changes the sto_wal_optimized.spec output, I
think it preserves the old_snapshot_threshold behavior contract.Perhaps I'm missing something, but the patch doesn't pass the v5-0001
test with wal_level=minimal?
Correct. The case we must avoid is letting an old snapshot read an
early-pruned page without error. v5-0001 expects "ERROR: snapshot too old".
The patch suspends early pruning, so that error is not applicable.
On Thu, Jan 21, 2021 at 01:23:36AM -0800, Noah Misch wrote:
On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote:
At Thu, 21 Jan 2021 00:19:58 -0800, Noah Misch <noah@leadboat.com> wrote in
On Thu, Jan 21, 2021 at 12:28:44AM +0900, Kyotaro Horiguchi wrote:
However, with the previous patch, two existing tests sto_using_cursor
and sto_using_select behaves differently from the master. That change
is coming from the omission of actual LSN check in TestForOldSnapshot
while wal_level=minimal. So we have no choice other than actually
updating page LSN.In the scenario under discussion there are two places we need to do
that. one is heap_page_prune, and the other is RelationCopyStorge. As
a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the
attached third file.Fake LSNs make the system harder to understand, so I prefer not to spread fake
LSNs to more access methods. What I had in mind is to simply suppress early
pruning when the relation is skipping WAL. Attached. Is this reasonable? It
passes the older tests. While it changes the sto_wal_optimized.spec output, I
think it preserves the old_snapshot_threshold behavior contract.Perhaps I'm missing something, but the patch doesn't pass the v5-0001
test with wal_level=minimal?Correct. The case we must avoid is letting an old snapshot read an
early-pruned page without error. v5-0001 expects "ERROR: snapshot too old".
The patch suspends early pruning, so that error is not applicable.
I think the attached version is ready. The changes since v6nm are cosmetic:
- Wrote log messages
- Split into two patches, since the user-visible bugs are materially different
- Fixed typos
- Ran perltidy
Is it okay if I push these on Saturday, or would you like more time to
investigate?
At Wed, 27 Jan 2021 02:48:48 -0800, Noah Misch <noah@leadboat.com> wrote in
On Thu, Jan 21, 2021 at 01:23:36AM -0800, Noah Misch wrote:
On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote:
Perhaps I'm missing something, but the patch doesn't pass the v5-0001
test with wal_level=minimal?Correct. The case we must avoid is letting an old snapshot read an
early-pruned page without error. v5-0001 expects "ERROR: snapshot too old".
The patch suspends early pruning, so that error is not applicable.I think the attached version is ready. The changes since v6nm are cosmetic:
- Wrote log messages
- Split into two patches, since the user-visible bugs are materially different
- Fixed typos
- Ran perltidyIs it okay if I push these on Saturday, or would you like more time to
investigate?
Thank you for the new version.
I studied the sto feature further and concluded that the checker side
is fine that it always follow the chages of page-LSN.
So what we can do for the issue is setting seemingly correct page LSN
at pruning or refrain from early-pruning while we are skipping
WAL. The reason I took the former is I thought that the latter might
be a problem since early-pruning would be postponed by a long-running
wal-skipping transaction.
So the patch looks fine to me. The commit message mekes sense.
However, is it ok that the existing tests (modules/snapshot_too_old)
fails when wal_level=minimal?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Thu, Jan 28, 2021 at 12:06:27PM +0900, Kyotaro Horiguchi wrote:
At Wed, 27 Jan 2021 02:48:48 -0800, Noah Misch <noah@leadboat.com> wrote in
On Thu, Jan 21, 2021 at 01:23:36AM -0800, Noah Misch wrote:
On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote:
Perhaps I'm missing something, but the patch doesn't pass the v5-0001
test with wal_level=minimal?Correct. The case we must avoid is letting an old snapshot read an
early-pruned page without error. v5-0001 expects "ERROR: snapshot too old".
The patch suspends early pruning, so that error is not applicable.
I studied the sto feature further and concluded that the checker side
is fine that it always follow the chages of page-LSN.So what we can do for the issue is setting seemingly correct page LSN
at pruning or refrain from early-pruning while we are skipping
WAL. The reason I took the former is I thought that the latter might
be a problem since early-pruning would be postponed by a long-running
wal-skipping transaction.
Yes, that's an accurate summary.
So the patch looks fine to me. The commit message mekes sense.
However, is it ok that the existing tests (modules/snapshot_too_old)
fails when wal_level=minimal?
That would not be okay, but I'm not seeing that. How did your setup differ
from the following?
[nm@rfd 6:1 2021-01-27T23:06:33 postgresql 0]$ cat /nmscratch/minimal.conf
log_statement = all
wal_level = minimal
max_wal_senders = 0
log_line_prefix = '%m [%p %l %x] %q%a '
[nm@rfd 6:1 2021-01-27T23:06:38 postgresql 0]$ make -C src/test/modules/snapshot_too_old check TEMP_CONFIG=/nmscratch/minimal.conf
============== creating temporary instance ==============
============== initializing database system ==============
============== starting postmaster ==============
running on port 58080 with PID 2603099
============== creating database "isolation_regression" ==============
CREATE DATABASE
ALTER DATABASE
============== running regression test queries ==============
test sto_using_cursor ... ok 30168 ms
test sto_using_select ... ok 24197 ms
test sto_using_hash_index ... ok 6089 ms
============== shutting down postmaster ==============
============== removing temporary instance ==============
=====================
All 3 tests passed.
=====================
At Wed, 27 Jan 2021 23:10:53 -0800, Noah Misch <noah@leadboat.com> wrote in
On Thu, Jan 28, 2021 at 12:06:27PM +0900, Kyotaro Horiguchi wrote:
At Wed, 27 Jan 2021 02:48:48 -0800, Noah Misch <noah@leadboat.com> wrote in
On Thu, Jan 21, 2021 at 01:23:36AM -0800, Noah Misch wrote:
On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote:
Perhaps I'm missing something, but the patch doesn't pass the v5-0001
test with wal_level=minimal?Correct. The case we must avoid is letting an old snapshot read an
early-pruned page without error. v5-0001 expects "ERROR: snapshot too old".
The patch suspends early pruning, so that error is not applicable.I studied the sto feature further and concluded that the checker side
is fine that it always follow the chages of page-LSN.So what we can do for the issue is setting seemingly correct page LSN
at pruning or refrain from early-pruning while we are skipping
WAL. The reason I took the former is I thought that the latter might
be a problem since early-pruning would be postponed by a long-running
wal-skipping transaction.Yes, that's an accurate summary.
So the patch looks fine to me. The commit message mekes sense.
However, is it ok that the existing tests (modules/snapshot_too_old)
fails when wal_level=minimal?That would not be okay, but I'm not seeing that. How did your setup differ
from the following?
I did that with the following temp-conf. However, the tests succeeds
when I ran them again with the configuration. Sorry for the noise.
autovacuum = off
old_snapshot_threshold = 0
wal_level=minimal
max_wal_senders=0
wal_skip_threshold=0
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center