Spurious "apparent wraparound" via SimpleLruTruncate() rounding
While testing an xidStopLimit corner case, I got this:
3656710 2019-01-05 00:05:13.910 GMT LOG: automatic aggressive vacuum to prevent wraparound of table "test.pg_toast.pg_toast_826": index scans: 0
3656710 2019-01-05 00:05:16.912 GMT LOG: could not truncate directory "pg_xact": apparent wraparound
3656710 2019-01-05 00:05:16.912 GMT DEBUG: transaction ID wrap limit is 4294486400, limited by database with OID 1
3656710 2019-01-05 00:05:16.912 GMT WARNING: database "template1" must be vacuumed within 481499 transactions
3656710 2019-01-05 00:05:16.912 GMT HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
I think the WARNING was correct about having 481499 XIDs left before
xidWrapLimit, and the spurious "apparent wraparound" arose from this
rounding-down in SimpleLruTruncate():
cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
...
/*
* While we are holding the lock, make an important safety check: the
* planned cutoff point must be <= the current endpoint page. Otherwise we
* have already wrapped around, and proceeding with the truncation would
* risk removing the current segment.
*/
if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
{
LWLockRelease(shared->ControlLock);
ereport(LOG,
(errmsg("could not truncate directory \"%s\": apparent wraparound",
ctl->Dir)));
We round "cutoffPage" to make ctl->PagePrecedes(segpage, cutoffPage) return
false for the segment containing the cutoff page. CLOGPagePrecedes() (and
most SLRU PagePrecedes methods) implements a circular address space. Hence,
the rounding also causes ctl->PagePrecedes(segpage, cutoffPage) to return true
for the segment furthest in the future relative to the unrounded cutoffPage
(if it exists). That's bad. Such a segment rarely exists, because
xidStopLimit protects 1000000 XIDs, and the rounding moves truncation by no
more than (BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT - 1) =
1048575 XIDs. Thus, I expect to see this problem for about 4.9% of
xidStopLimit values. I expect it's easier to see with multiStopLimit, which
protects only 100 mxids.
The main consequence is the false alarm. A prudent DBA will want to react to
true wraparound, but no such wraparound has occurred. Also, we temporarily
waste disk space in pg_xact. This feels like a recipe for future bugs. The
fix I have in mind, attached, is to change instances of
ctl->PagePrecedes(FIRST_PAGE_OF_SEGMENT, ROUNDED_cutoffPage) to
ctl->PagePrecedes(LAST_PAGE_OF_SEGMENT, cutoffPage). I'm inclined not to
back-patch this; does anyone favor back-patching?
Thanks,
nm
Attachments:
slru-truncate-modulo-v1.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 3623352..843486a 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1171,11 +1171,6 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
int slotno;
/*
- * The cutoff point is the start of the segment containing cutoffPage.
- */
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- /*
* Scan shared memory and remove any pages preceding the cutoff page, to
* ensure we won't rewrite them later. (Since this is normally called in
* or just after a checkpoint, any dirty pages should have been flushed
@@ -1320,11 +1315,10 @@ restart:
bool
SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
{
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
int cutoffPage = *(int *) data;
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (ctl->PagePrecedes(seg_last_page, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1337,9 +1331,10 @@ SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data
static bool
SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
int cutoffPage = *(int *) data;
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (ctl->PagePrecedes(seg_last_page, cutoffPage))
SlruInternalDeleteSegment(ctl, filename);
return false; /* keep going */
On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
The main consequence is the false alarm. A prudent DBA will want to react to
true wraparound, but no such wraparound has occurred. Also, we temporarily
waste disk space in pg_xact. This feels like a recipe for future bugs. The
fix I have in mind, attached, is to change instances of
ctl->PagePrecedes(FIRST_PAGE_OF_SEGMENT, ROUNDED_cutoffPage) to
ctl->PagePrecedes(LAST_PAGE_OF_SEGMENT, cutoffPage). I'm inclined not to
back-patch this; does anyone favor back-patching?
To avoid wasting more of anyone's time: that patch is bad; I'll update this
thread when I have something better.
On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
The main consequence is the false alarm.
That conclusion was incorrect. On further study, I was able to reproduce data
loss via either of two weaknesses in the "apparent wraparound" test:
1. The result of the test is valid only until we release the SLRU ControlLock,
which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
segments for deletion. Once we release that lock, latest_page_number can
advance. This creates a TOCTOU race condition, allowing excess deletion:
[local] test=# table trunc_clog_concurrency ;
ERROR: could not access status of transaction 2149484247
DETAIL: Could not open file "pg_xact/0801": No such file or directory.
2. By the time the "apparent wraparound" test fires, we've already WAL-logged
the truncation. clog_redo() suppresses the "apparent wraparound" test,
then deletes too much. Startup then fails:
881997 2019-02-10 02:53:32.105 GMT FATAL: could not access status of transaction 708112327
881997 2019-02-10 02:53:32.105 GMT DETAIL: Could not open file "pg_xact/02A3": No such file or directory.
881855 2019-02-10 02:53:32.107 GMT LOG: startup process (PID 881997) exited with exit code 1
Fixes are available:
a. Fix the rounding in SimpleLruTruncate(). (The patch I posted upthread is
wrong; I will correct it in a separate message.)
b. Arrange so only one backend runs vac_truncate_clog() at a time. Other than
AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
checkpoint, or in the startup process. Hence, also arrange for only one
backend to call SimpleLruTruncate(AsyncCtl) at a time.
c. Test "apparent wraparound" before writing WAL, and don't WAL-log
truncations that "apparent wraparound" forces us to skip.
d. Hold the ControlLock for the entirety of SimpleLruTruncate(). This removes
the TOCTOU race condition, but TransactionIdDidCommit() and other key
operations would be waiting behind filesystem I/O.
e. Have the SLRU track a "low cutoff" for an ongoing truncation. Initially,
the low cutoff is the page furthest in the past relative to cutoffPage (the
"high cutoff"). If SimpleLruZeroPage() wishes to use a page in the
truncation range, it would acquire an LWLock and increment the low cutoff.
Before unlinking any segment, SlruScanDirCbDeleteCutoff() would take the
same LWLock and recheck the segment against the latest low cutoff.
With both (a) and (b), the only way I'd know to reach the "apparent
wraparound" message is to restart in single-user mode and burn XIDs to the
point of bona fide wraparound. Hence, I propose to back-patch (a) and (b),
and I propose (c) for HEAD only. I don't want (d), which threatens
performance too much. I would rather not have (e), because I expect it's more
complex than (b) and fixes strictly less than (b) fixes.
Can you see a way to improve on that plan? Can you see other bugs of this
nature that this plan does not fix?
Thanks,
nm
On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
On further study, I was able to reproduce data loss
Fixes are available:
a. Fix the rounding in SimpleLruTruncate(). (The patch I posted upthread is
wrong; I will correct it in a separate message.)
Here's a corrected version. I now delete a segment only if both its first
page and its last page are considered to precede the cutoff; see the new
comment at SlruMayDeleteSegment().
Attachments:
slru-truncate-modulo-v2.patch (text/plain; charset=us-ascii)
commit 09393a1 (HEAD)
Author: Noah Misch <noah@leadboat.com>
AuthorDate: Sat Feb 16 20:02:51 2019 -0800
Commit: Noah Misch <noah@leadboat.com>
CommitDate: Sat Feb 16 20:02:51 2019 -0800
Don't round SimpleLruTruncate() cutoffPage values.
Every core SLRU wraps around. The rounding did not account for that; in
rare cases, it permitted deletion of the most recently-populated page of
SLRU data. This closes a rare opportunity for data loss, which
manifested as "could not access status of transaction" errors. If a
user's physical replication primary logged ": apparent wraparound"
messages, the user should rebuild that primary's standbys regardless of
symptoms. At less risk is a cluster having emitted "not accepting
commands" errors or "must be vacuumed" warnings at some point. One can
test a cluster for this data loss by running VACUUM FREEZE in every
database. Back-patch to 9.4 (all supported versions).
Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 3623352..71e29b9 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1171,11 +1171,6 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
int slotno;
/*
- * The cutoff point is the start of the segment containing cutoffPage.
- */
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- /*
* Scan shared memory and remove any pages preceding the cutoff page, to
* ensure we won't rewrite them later. (Since this is normally called in
* or just after a checkpoint, any dirty pages should have been flushed
@@ -1221,8 +1216,11 @@ restart:;
* Hmm, we have (or may have) I/O operations acting on the page, so
* we've got to wait for them to finish and then start again. This is
* the same logic as in SlruSelectLRUPage. (XXX if page is dirty,
- * wouldn't it be OK to just discard it without writing it? For now,
- * keep the logic the same as it was.)
+ * wouldn't it be OK to just discard it without writing it?
+ * SlruMayDeleteSegment() uses a stricter qualification, so we might
+ * not delete this page in the end; even if we don't delete it, we
+ * won't have cause to read its data again. For now, keep the logic
+ * the same as it was.)
*/
if (shared->page_status[slotno] == SLRU_PAGE_VALID)
SlruInternalWritePage(ctl, slotno, NULL);
@@ -1313,18 +1311,40 @@ restart:
}
/*
+ * Determine whether a segment is okay to delete.
+ *
+ * segpage is the first page of the segment, and cutoffPage is the oldest (in
+ * PagePrecedes order) page in the SLRU containing still-useful data. Since
+ * every core PagePrecedes callback implements "wrap around", check the
+ * segment's first and last pages:
+ *
+ * first<cutoff && last<cutoff: yes
+ * first<cutoff && last>=cutoff: no; cutoff falls inside this segment
+ * first>=cutoff && last<cutoff: no; wrap point falls inside this segment
+ * first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ */
+static bool
+SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+{
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+
+ Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
+
+ return (ctl->PagePrecedes(segpage, cutoffPage) &&
+ ctl->PagePrecedes(seg_last_page, cutoffPage));
+}
+
+/*
* SlruScanDirectory callback
- * This callback reports true if there's any segment prior to the one
- * containing the page passed as "data".
+ * This callback reports true if there's any segment wholly prior to the
+ * one containing the page passed as "data".
*/
bool
SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1339,7 +1359,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
SlruInternalDeleteSegment(ctl, filename);
return false; /* keep going */
On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
The main consequence is the false alarm.
That conclusion was incorrect. On further study, I was able to reproduce data
loss via either of two weaknesses in the "apparent wraparound" test:
1. The result of the test is valid only until we release the SLRU ControlLock,
which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
segments for deletion. Once we release that lock, latest_page_number can
advance. This creates a TOCTOU race condition, allowing excess deletion:
[local] test=# table trunc_clog_concurrency ;
ERROR: could not access status of transaction 2149484247
DETAIL: Could not open file "pg_xact/0801": No such file or directory.
2. By the time the "apparent wraparound" test fires, we've already WAL-logged
the truncation. clog_redo() suppresses the "apparent wraparound" test,
then deletes too much. Startup then fails:
881997 2019-02-10 02:53:32.105 GMT FATAL: could not access status of transaction 708112327
881997 2019-02-10 02:53:32.105 GMT DETAIL: Could not open file "pg_xact/02A3": No such file or directory.
881855 2019-02-10 02:53:32.107 GMT LOG: startup process (PID 881997) exited with exit code 1
Fixes are available:
a. Fix the rounding in SimpleLruTruncate(). (The patch I posted upthread is
wrong; I will correct it in a separate message.)
b. Arrange so only one backend runs vac_truncate_clog() at a time. Other than
AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
checkpoint, or in the startup process. Hence, also arrange for only one
backend to call SimpleLruTruncate(AsyncCtl) at a time.
c. Test "apparent wraparound" before writing WAL, and don't WAL-log
truncations that "apparent wraparound" forces us to skip.
d. Hold the ControlLock for the entirety of SimpleLruTruncate(). This removes
the TOCTOU race condition, but TransactionIdDidCommit() and other key
operations would be waiting behind filesystem I/O.
e. Have the SLRU track a "low cutoff" for an ongoing truncation. Initially,
the low cutoff is the page furthest in the past relative to cutoffPage (the
"high cutoff"). If SimpleLruZeroPage() wishes to use a page in the
truncation range, it would acquire an LWLock and increment the low cutoff.
Before unlinking any segment, SlruScanDirCbDeleteCutoff() would take the
same LWLock and recheck the segment against the latest low cutoff.
With both (a) and (b), the only way I'd know to reach the "apparent
wraparound" message is to restart in single-user mode and burn XIDs to the
point of bona fide wraparound. Hence, I propose to back-patch (a) and (b),
and I propose (c) for HEAD only. I don't want (d), which threatens
performance too much. I would rather not have (e), because I expect it's more
complex than (b) and fixes strictly less than (b) fixes.
Can you see a way to improve on that plan? Can you see other bugs of this
nature that this plan does not fix?
Seems reasonable, although I wonder how much more expensive would just
doing (d) be. It seems by far the least complex solution, and it moves
"just" the SlruScanDirectory() call before the lock. It's true it adds
I/O requests, OTOH it's just unlink() without fsync() and I'd expect the
number of files to be relatively low. Plus we already do SimpleLruWaitIO
and SlruInternalWritePage in the loop.
BTW, isn't it an issue that SlruInternalDeleteSegment() does not issue any
fsync calls after unlinking the segments? If the system crashes/reboots
before the unlinks become persistent (i.e. some of the segments reappear),
won't that cause a problem?
It's a bit unfortunate that a patch for a data corruption / loss issue
(even if a low-probability one) fell through multiple commitfests.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Jan 05, 2020 at 01:33:55AM +0100, Tomas Vondra wrote:
On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
a. Fix the rounding in SimpleLruTruncate(). (The patch I posted upthread is
wrong; I will correct it in a separate message.)
b. Arrange so only one backend runs vac_truncate_clog() at a time. Other than
AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
checkpoint, or in the startup process. Hence, also arrange for only one
backend to call SimpleLruTruncate(AsyncCtl) at a time.
c. Test "apparent wraparound" before writing WAL, and don't WAL-log
truncations that "apparent wraparound" forces us to skip.
d. Hold the ControlLock for the entirety of SimpleLruTruncate(). This removes
the TOCTOU race condition, but TransactionIdDidCommit() and other key
operations would be waiting behind filesystem I/O.
With both (a) and (b), the only way I'd know to reach the "apparent
wraparound" message is to restart in single-user mode and burn XIDs to the
point of bona fide wraparound. Hence, I propose to back-patch (a) and (b),
and I propose (c) for HEAD only. I don't want (d), which threatens
performance too much. I would rather not have (e), because I expect it's more
complex than (b) and fixes strictly less than (b) fixes.
Seems reasonable, although I wonder how much more expensive would just
doing (d) be. It seems by far the least complex solution, and it moves
"just" the SlruScanDirectory() call before the lock. It's true it adds
I/O requests, OTOH it's just unlink() without fsync() and I'd expect the
number of files to be relatively low. Plus we already do SimpleLruWaitIO
and SlruInternalWritePage in the loop.
Trivial read-only transactions often need CLogControlLock to check tuple
visibility. If an unlink() takes 1s, stalling read-only transactions for that
1s is a big problem. SimpleLruWaitIO() and SlruInternalWritePage() release
the control lock during I/O, then re-acquire it. (Moreover,
SimpleLruTruncate() rarely calls them. Calling them implies a page old enough
to truncate was modified after the most recent checkpoint.)
BTW, isn't it an issue that SlruInternalDeleteSegment() does not issue any
fsync calls after unlinking the segments? If the system crashes/reboots
before the unlinks become persistent (i.e. some of the segments reappear),
won't that cause a problem?
I think not; we could turn SlruInternalDeleteSegment() into a no-op, and the
only SQL-visible consequence would be extra disk usage. CheckPoint fields
tell the server what region of slru data is meaningful, and segments outside
that range merely waste disk space. (If that's not true and can't be made
true, we'd also need to stop ignoring the unlink() return value.)
It's a bit unfortunate that a patch for a data corruption / loss issue
(even if a low-probability one) fell through multiple commitfests.
Thanks for investing in steps to fix that.
Noah Misch <noah@leadboat.com> writes:
On Sun, Jan 05, 2020 at 01:33:55AM +0100, Tomas Vondra wrote:
It's a bit unfortunate that a patch for a data corruption / loss issue
(even if a low-probability one) fell through multiple commitfests.
Thanks for investing in steps to fix that.
Yeah, this patch has been waiting in the queue for way too long :-(.
I spent some time studying this, and I have a few comments/questions:
1. It seems like this discussion is conflating two different issues.
The original complaint was about a seemingly-bogus log message "could
not truncate directory "pg_xact": apparent wraparound". Your theory
about that, IIUC, is that SimpleLruTruncate's initial round-back of
cutoffPage to a segment boundary moved us from a state where
shared->latest_page_number doesn't logically precede the cutoffPage to
one where it does, triggering the message. So removing the initial
round-back, and then doing whatever's needful to compensate for that in
the later processing, is a reasonable fix to prevent the bogus warning.
However, you're also discussing whether or not an SLRU segment file that
is close to the wraparound boundary should get removed or not. As far
as I can see that's 100% independent of issuance of the log message, no?
This might not affect the code substance of the patch at all; but it
seems like we need to be clear about it in our discussion, and maybe the
comments need to change too.
2. I wonder whether we have an issue even with rounding back to the
SLRU page boundary, as is done by each caller before we ever get to
SimpleLruTruncate. I'm pretty sure that the actual anti-wraparound
protections are all precise to the XID level, so that there's some
daylight between what SimpleLruTruncate is told and the range of
data that the higher-level code thinks is of interest. Maybe we
should restructure things so that we keep the original cutoff XID
(or equivalent) all the way through, and compare the start/end
positions of a segment file directly to that.
3. It feels like the proposed test of cutoff position against both
ends of a segment is a roundabout way of fixing the problem. I
wonder whether we ought not pass *both* the cutoff and the current
endpoint (latest_page_number) down to the truncation logic, and
have it compare against both of those values.
To try to clarify this in my head, I thought about an image of
the modular XID space as an octagon, where each side would correspond to
a segment file if we chose numbers such that there were only 8 possible
segment files. Let's say that nextXID is somewhere in the bottommost
octagon segment. The oldest possible value for the truncation cutoff
XID is a bit less than "halfway around" from nextXID; so it could be
in the topmost octagon segment, if the minimum permitted daylight-
till-wraparound is less than the SLRU segment size (which it is).
Then, if we round the cutoff XID "back" to a segment boundary, most
of the current (bottommost) segment is now less than halfway around
from that cutoff point, and in particular the current segment's
starting page is exactly halfway around. Because of the way that
TransactionIdPrecedes works, the current segment will be considered to
precede that cutoff point (the int32 difference comes out as exactly
2^31 which is negative). Boom, data loss, because we'll decide the
current segment is removable.
I think that your proposed patch does fix this, but I'm not quite
sold that the corner cases (where the cutoff XID is itself exactly
at a page boundary) are right. In any case, I think it'd be more
robust to be comparing explicitly against a notion of the latest
in-use page number, instead of backing into it from an assumption
that the cutoff XID itself is less than halfway around.
I wonder if we ought to dodge the problem by having a higher minimum
value of the required daylight-before-wraparound, so that the cutoff
point couldn't be in the diametrically-opposite-current segment but
would have to be at least one segment before that. In the end,
I believe that all of this logic was written under an assumption
that we should never get into a situation where we are so close
to the wraparound threshold that considerations like these would
manifest. Maybe we can get it right, but I don't have huge
faith in it.
It also bothers me that some of the callers of SimpleLruTruncate
have explicit one-count backoffs of the cutoff point and others
do not. There's no obvious reason for the difference, so I wonder
if that isn't something we should have across-the-board, or else
adjust SimpleLruTruncate to do the equivalent thing internally.
I haven't thought much yet about your second point about race
conditions arising from nextXID possibly moving before we
finish the deletion scan. Maybe we could integrate a fix for
that issue, along the lines of (1) see an SLRU segment file,
(2) determine that it appears to precede the cutoff XID, if so
(3) acquire the control lock and fetch latest_page_number,
compare against that to verify that the segment file is old
and not new, then (4) unlink if that still holds.
regards, tom lane
On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote:
Yeah, this patch has been waiting in the queue for way too long :-(.
Thanks for reviewing.
I spent some time studying this, and I have a few comments/questions:
1. It seems like this discussion is conflating two different issues.
The original complaint was about a seemingly-bogus log message "could
not truncate directory "pg_xact": apparent wraparound". Your theory
about that, IIUC, is that SimpleLruTruncate's initial round-back of
cutoffPage to a segment boundary moved us from a state where
shared->latest_page_number doesn't logically precede the cutoffPage to
one where it does, triggering the message. So removing the initial
round-back, and then doing whatever's needful to compensate for that in
the later processing, is a reasonable fix to prevent the bogus warning.
However, you're also discussing whether or not an SLRU segment file that
is close to the wraparound boundary should get removed or not. As far
When the newest XID and the oldest XID fall in "opposite" segments in the XID
space, we must not unlink the segment containing the newest XID. That is the
chief goal at present.
as I can see that's 100% independent of issuance of the log message, no?
Perhaps confusing is that the first message of the thread and the subject line
contain wrong claims, which I corrected in the 2019-02-13 message [1]. Due to
point (2) in [1], it's essential to make the "apparent wraparound" message a
can't-happen event. Hence, I'm not looking to improve the message or its
firing conditions. I want to fix the excess segment deletion, at which point
the message will become unreachable except under single-user mode.
2. I wonder whether we have an issue even with rounding back to the
SLRU page boundary, as is done by each caller before we ever get to
SimpleLruTruncate. I'm pretty sure that the actual anti-wraparound
protections are all precise to the XID level, so that there's some
daylight between what SimpleLruTruncate is told and the range of
data that the higher-level code thinks is of interest. Maybe we
should restructure things so that we keep the original cutoff XID
(or equivalent) all the way through, and compare the start/end
positions of a segment file directly to that.
Currently, slru.c knows nothing about the division of pages into records.
Hmm. To keep oldestXact all the way through, I suppose the PagePrecedes
callback could become one or more record-oriented (e.g. XID-oriented)
callbacks. The current scheme just relies on TransactionIdToPage() in
TruncateCLOG(). If TransactionIdToPage() had a bug, all sorts of CLOG lookups
would do wrong things. Hence, I think today's scheme is tougher to get wrong.
Do you see it differently?
3. It feels like the proposed test of cutoff position against both
ends of a segment is a roundabout way of fixing the problem. I
wonder whether we ought not pass *both* the cutoff and the current
endpoint (latest_page_number) down to the truncation logic, and
have it compare against both of those values.
Since latest_page_number can keep changing throughout SlruScanDirectory()
execution, that would give a false impression of control. Better to
demonstrate that the xidWrapLimit machinery keeps latest_page_number within
acceptable constraints than to ascribe significance to a comparison with a
stale latest_page_number.
To try to clarify this in my head, I thought about an image of
the modular XID space as an octagon, where each side would correspond to
a segment file if we chose numbers such that there were only 8 possible
segment files. Let's say that nextXID is somewhere in the bottommost
octagon segment. The oldest possible value for the truncation cutoff
XID is a bit less than "halfway around" from nextXID; so it could be
in the topmost octagon segment, if the minimum permitted daylight-
till-wraparound is less than the SLRU segment size (which it is).
Then, if we round the cutoff XID "back" to a segment boundary, most
of the current (bottommost) segment is now less than halfway around
from that cutoff point, and in particular the current segment's
starting page is exactly halfway around. Because of the way that
TransactionIdPrecedes works, the current segment will be considered to
precede that cutoff point (the int32 difference comes out as exactly
2^31 which is negative). Boom, data loss, because we'll decide the
current segment is removable.
Exactly.
https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I
uses your octagon to show the behaviors before and after this patch.
I think that your proposed patch does fix this, but I'm not quite
sold that the corner cases (where the cutoff XID is itself exactly
at a page boundary) are right.
That's a good thing to worry about. More specifically, I think the edge case
to check is when oldestXact is the last XID of a _segment_. That case
maximizes the XIDs we can delete. At that time, xidWrapLimit should likewise
fall near the end of some opposing segment that we refuse to unlink.
It also bothers me that some of the callers of SimpleLruTruncate
have explicit one-count backoffs of the cutoff point and others
do not. There's no obvious reason for the difference, so I wonder
if that isn't something we should have across-the-board, or else
adjust SimpleLruTruncate to do the equivalent thing internally.
Consider the case of PerformOffsetsTruncation(). If newOldestMulti is the
first of a page, then SimpleLruTruncate() gets the previous page. If that
page and the newOldestMulti page fall in different segments, could we unlink
the segment that contained newOldestMulti, or does some other off-by-one
compensate? I'm not sure. I do know that to lose mxact data this way, one
must reach multiStopLimit, restart in single user mode, and consume all the
way to the edge of multiWrapLimit. Considering the rarity of those
preconditions, I am not inclined to bundle a fix with $SUBJECT. (Even OID
reuse race conditions may present more risk than this does.) If someone does
pursue a fix here, I recommend looking at other fixes before making the other
callers subtract one. The subtraction in TruncateSUBTRANS() and
PerformOffsetsTruncation() is a hack.
[1]: /messages/by-id/20190214072623.GA1139206@rfd.leadboat.com
Noah Misch <noah@leadboat.com> writes:
On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote:
1. It seems like this discussion is conflating two different issues.
When the newest XID and the oldest XID fall in "opposite" segments in the XID
space, we must not unlink the segment containing the newest XID. That is the
chief goal at present.
Got it. Thanks for clarifying the scope of the patch.
3. It feels like the proposed test of cutoff position against both
ends of a segment is a roundabout way of fixing the problem. I
wonder whether we ought not pass *both* the cutoff and the current
endpoint (latest_page_number) down to the truncation logic, and
have it compare against both of those values.
Since latest_page_number can keep changing throughout SlruScanDirectory()
execution, that would give a false impression of control. Better to
demonstrate that the xidWrapLimit machinery keeps latest_page_number within
acceptable constraints than to ascribe significance to a comparison with a
stale latest_page_number.
Perhaps. I'm prepared to accept that line of argument so far as the clog
SLRU goes, but I'm not convinced that the other SLRUs have equally robust
defenses against advancing too far. So on the whole I'd rather that the
SLRU logic handled this issue strictly on the basis of what it knows,
without assumptions about what calling code may be doing. Still, maybe
we only really care about the risk for the clog SLRU?
To try to clarify this in my head, I thought about an image of
the modular XID space as an octagon, where each side would correspond to
a segment file if we chose numbers such that there were only 8 possible
segment files.
Exactly.
https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I
uses your octagon to show the behaviors before and after this patch.
Cool, thanks for drafting that up. (My original sketch was not of
publishable quality ;-).) To clarify, the upper annotations probably
ought to read "nextXid <= xidWrapLimit"? And "cutoffPage" ought
to be affixed to the orange dot at lower right of the center image?
I agree that this diagram depicts why we have a problem right now,
and the right-hand image shows what we want to have happen.
What's a little less clear is whether the proposed patch achieves
that effect.
In particular, after studying this awhile, it seems like removal
of the initial "cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT"
adjustment isn't really affecting anything. It's already the case
that just by allowing oldestXact to get rounded back to an SLRU page
boundary, we've created some daylight between oldestXact and the
cutoff point. Rounding back further within the same SLRU segment
changes nothing. (For example, suppose that oldestXact is already
within the oldest page of its SLRU segment. Then either rounding
rule has the same effect. But there's still a little bit of room for
xidWrapLimit to be in the opposite SLRU segment, allowing trouble.)
So I think what we're actually trying to accomplish here is to
ensure that instead of deleting up to half of the SLRU space
before the cutoff, we delete up to half-less-one-segment.
Maybe it should be half-less-two-segments, just to provide some
cushion against edge cases. Reading the first comment in
SetTransactionIdLimit makes one not want to trust too much in
arguments based on the exact value of xidWrapLimit, while for
the other SLRUs it was already unclear whether the edge cases
were exactly right.
In any case, it feels like the specific solution you have here,
of testing both ends of the segment, is a roundabout way of
providing that one-segment slop; and it doesn't help if we decide
we need two-segment slop. Can we write the test in a way that
explicitly provides N segments of slop?
regards, tom lane
On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote:
3. It feels like the proposed test of cutoff position against both
ends of a segment is a roundabout way of fixing the problem. I
wonder whether we ought not pass *both* the cutoff and the current
endpoint (latest_page_number) down to the truncation logic, and
have it compare against both of those values.
Since latest_page_number can keep changing throughout SlruScanDirectory()
execution, that would give a false impression of control. Better to
demonstrate that the xidWrapLimit machinery keeps latest_page_number within
acceptable constraints than to ascribe significance to a comparison with a
stale latest_page_number.
Perhaps. I'm prepared to accept that line of argument so far as the clog
SLRU goes, but I'm not convinced that the other SLRUs have equally robust
defenses against advancing too far. So on the whole I'd rather that the
SLRU logic handled this issue strictly on the basis of what it knows,
without assumptions about what calling code may be doing. Still, maybe
we only really care about the risk for the clog SLRU?
PerformOffsetsTruncation() is the most at-risk, since a single VACUUM could
burn millions of multixacts via FreezeMultiXactId() calls. (To make that
happen in single-user mode, I suspect one could use prepared transactions as
active lockers and/or in-progress updaters.) I'm not concerned about other
SLRUs. TruncateCommitTs() moves in lockstep with TruncateCLOG(). The other
SimpleLruTruncate() callers handle data that becomes obsolete at every
postmaster restart.
Exactly.
https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I
uses your octagon to show the behaviors before and after this patch.
Cool, thanks for drafting that up. (My original sketch was not of
publishable quality ;-).) To clarify, the upper annotations probably
ought to read "nextXid <= xidWrapLimit"?
It diagrams the scenario of nextXid reaching xidWrapLimit, so the green dot
represents both values.
And "cutoffPage" ought
to be affixed to the orange dot at lower right of the center image?
No; oldestXact and cutoffPage have the same position in that diagram, because
the patch causes the cutoffPage variable to denote the page that contains
oldestXact. I've now added an orange dot to show that.
I agree that this diagram depicts why we have a problem right now,
and the right-hand image shows what we want to have happen.
What's a little less clear is whether the proposed patch achieves
that effect.
In particular, after studying this awhile, it seems like removal
of the initial "cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT"
adjustment isn't really affecting anything.
True. The set of unlink() calls needs to be the same for oldestXact in the
first page of a segment, in the last page, or in some interior page. Removing
the rounding neither helps nor hurts correctness.
So I think what we're actually trying to accomplish here is to
ensure that instead of deleting up to half of the SLRU space
before the cutoff, we delete up to half-less-one-segment.
Maybe it should be half-less-two-segments, just to provide some
cushion against edge cases. Reading the first comment in
SetTransactionIdLimit makes one not want to trust too much in
arguments based on the exact value of xidWrapLimit, while for
the other SLRUs it was already unclear whether the edge cases
were exactly right.
That could be interesting insurance. While it would be sad for us to miss an
edge case and print "must be vacuumed within 2 transactions" when wrap has
already happened, reaching that message implies the DBA burned ~1M XIDs, all
in single-user mode. More plausible is FreezeMultiXactId() overrunning the
limit by tens of segments. Hence, if we do buy this insurance, let's skip far
more segments. For example, instead of unlinking segments representing up to
2^31 past XIDs, we could divide that into an upper half that we unlink and a
lower half. The lower half will stay in place; eventually, XID consumption
will overwrite it. Truncation behavior won't change until the region of CLOG
for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?
Noah Misch <noah@leadboat.com> writes:
On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
So I think what we're actually trying to accomplish here is to
ensure that instead of deleting up to half of the SLRU space
before the cutoff, we delete up to half-less-one-segment.
Maybe it should be half-less-two-segments, just to provide some
cushion against edge cases. Reading the first comment in
SetTransactionIdLimit makes one not want to trust too much in
arguments based on the exact value of xidWrapLimit, while for
the other SLRUs it was already unclear whether the edge cases
were exactly right.
That could be interesting insurance. While it would be sad for us to miss an
edge case and print "must be vacuumed within 2 transactions" when wrap has
already happened, reaching that message implies the DBA burned ~1M XIDs, all
in single-user mode. More plausible is FreezeMultiXactId() overrunning the
limit by tens of segments. Hence, if we do buy this insurance, let's skip far
more segments. For example, instead of unlinking segments representing up to
2^31 past XIDs, we could divide that into an upper half that we unlink and a
lower half. The lower half will stay in place; eventually, XID consumption
will overwrite it. Truncation behavior won't change until the region of CLOG
for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?
Hmm. I'm not particularly concerned about the disk-space-consumption
angle, but I do wonder about whether we'd be sacrificing the ability to
recover cleanly from a situation that the code does let you get into.
However, as long as we're sure that the system will ultimately reuse/
recycle a not-deleted old segment file without complaint, it's hard to
find much fault with your proposal. Temporarily wasting some disk
space is a lot more palatable than corrupting data, and these code
paths are necessarily not terribly well tested. So +1 for more
insurance.
regards, tom lane
On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
So I think what we're actually trying to accomplish here is to
ensure that instead of deleting up to half of the SLRU space
before the cutoff, we delete up to half-less-one-segment.
Maybe it should be half-less-two-segments, just to provide some
cushion against edge cases. Reading the first comment in
SetTransactionIdLimit makes one not want to trust too much in
arguments based on the exact value of xidWrapLimit, while for
the other SLRUs it was already unclear whether the edge cases
were exactly right.
That could be interesting insurance. While it would be sad for us to miss an
edge case and print "must be vacuumed within 2 transactions" when wrap has
already happened, reaching that message implies the DBA burned ~1M XIDs, all
in single-user mode. More plausible is FreezeMultiXactId() overrunning the
limit by tens of segments. Hence, if we do buy this insurance, let's skip far
more segments. For example, instead of unlinking segments representing up to
2^31 past XIDs, we could divide that into an upper half that we unlink and a
lower half. The lower half will stay in place; eventually, XID consumption
will overwrite it. Truncation behavior won't change until the region of CLOG
for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?
Hmm. I'm not particularly concerned about the disk-space-consumption
angle, but I do wonder about whether we'd be sacrificing the ability to
recover cleanly from a situation that the code does let you get into.
However, as long as we're sure that the system will ultimately reuse/
recycle a not-deleted old segment file without complaint, it's hard to
find much fault with your proposal.
That is the trade-off. By distancing ourselves from the wraparound edge
cases, we'll get more segment recycling (heretofore an edge case).
Fortunately, recycling doesn't change behavior as you approach some limit; it
works or it doesn't.
Temporarily wasting some disk
space is a lot more palatable than corrupting data, and these code
paths are necessarily not terribly well tested. So +1 for more
insurance.
Okay, I'll give that a try. I expect this will replace the PagePrecedes
callback with a PageDiff callback such that PageDiff(a, b) < 0 iff
PagePrecedes(a, b). PageDiff callbacks shall distribute return values
uniformly in [INT_MIN,INT_MAX]. SimpleLruTruncate() will unlink segments
where INT_MIN/2 < PageDiff(candidate, cutoff) < 0.
On Mon, Apr 06, 2020 at 09:18:47PM -0700, Noah Misch wrote:
On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
So I think what we're actually trying to accomplish here is to
ensure that instead of deleting up to half of the SLRU space
before the cutoff, we delete up to half-less-one-segment.
Maybe it should be half-less-two-segments, just to provide some
cushion against edge cases. Reading the first comment in
SetTransactionIdLimit makes one not want to trust too much in
arguments based on the exact value of xidWrapLimit, while for
the other SLRUs it was already unclear whether the edge cases
were exactly right.
That could be interesting insurance. While it would be sad for us to miss an
edge case and print "must be vacuumed within 2 transactions" when wrap has
already happened, reaching that message implies the DBA burned ~1M XIDs, all
in single-user mode. More plausible is FreezeMultiXactId() overrunning the
limit by tens of segments. Hence, if we do buy this insurance, let's skip far
more segments. For example, instead of unlinking segments representing up to
2^31 past XIDs, we could divide that into an upper half that we unlink and a
lower half. The lower half will stay in place; eventually, XID consumption
will overwrite it. Truncation behavior won't change until the region of CLOG
for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?
Temporarily wasting some disk
space is a lot more palatable than corrupting data, and these code
paths are necessarily not terribly well tested. So +1 for more
insurance.
Okay, I'll give that a try. I expect this will replace the PagePrecedes
callback with a PageDiff callback such that PageDiff(a, b) < 0 iff
PagePrecedes(a, b). PageDiff callbacks shall distribute return values
uniformly in [INT_MIN,INT_MAX]. SimpleLruTruncate() will unlink segments
where INT_MIN/2 < PageDiff(candidate, cutoff) < 0.
While doing so, I found that slru-truncate-modulo-v2.patch did get edge cases
wrong, as you feared. In particular, if the newest XID reached xidStopLimit
and was in the first page of a segment, TruncateCLOG() would delete its
segment. Attached slru-truncate-modulo-v3.patch fixes that; as restitution, I
added unit tests covering that and other scenarios. Reaching the bug via XIDs
was hard, requiring one to burn 1000k-CLOG_XACTS_PER_PAGE=967k XIDs in
single-user mode. I expect the bug was easier to reach via pg_multixact.
The insurance patch stacks on top of the bug fix patch. It does have a
negative effect on TruncateMultiXact(), which uses SlruScanDirCbFindEarliest
to skip truncation in corrupted clusters. SlruScanDirCbFindEarliest() gives
nonsense answers if "future" segments exist. That can happen today, but the
patch creates new ways to make it happen. The symptom is wasting yet more
space in pg_multixact. I am okay with this, since it arises only after one
fills pg_multixact 50% full. There are alternatives. We could weaken the
corruption defense in TruncateMultiXact() or look for another implementation
of equivalent defense. We could unlink, say, 75% or 95% of the "past" instead
of 50% (this patch) or >99.99% (today's behavior).
Thanks,
nm
Attachments:
slru-truncate-modulo-v3.patch (text/plain; charset=us-ascii)
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Prevent excess SimpleLruTruncate() deletion.
Every core SLRU wraps around. With the exception of pg_notify, the wrap
point can fall in the middle of a page. Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback. Update each callback implementation to fit the new
specification. This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.)
This closes a rare opportunity for data loss, which manifested as
"apparent wraparound" or "could not access status of transaction"
errors. This is more likely to affect pg_multixact, due to the thin
space between multiStopLimit and multiWrapLimit. If a user's physical
replication primary logged ": apparent wraparound" messages, the user
should rebuild standbys of that primary regardless of symptoms. At
less risk is a cluster having emitted "not accepting commands" errors or
"must be vacuumed" warnings at some point. One can test a cluster for
this data loss by running VACUUM FREEZE in every database. Back-patch
to 9.5 (all supported versions).
Reviewed by Tom Lane.
Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f3da40a..d606042 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -693,6 +693,7 @@ CLOGShmemInit(void)
XactCtl->PagePrecedes = CLOGPagePrecedes;
SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER);
+ SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
}
/*
@@ -933,13 +934,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
/*
- * Decide which of two CLOG page numbers is "older" for truncation purposes.
+ * Decide whether a CLOG page number is "older" for truncation purposes.
*
* We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * thing with wraparound XID arithmetic. However, TransactionIdPrecedes()
+ * would get weird about permanent xact IDs. So, offset both such that xid1,
+ * xid2, and xid + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset is
+ * relevant to page 0 and to the page preceding page 0.
+ *
+ * The page containing oldestXact-2^31 is the important edge case. The
+ * portion of that page equaling or following oldestXact-2^31 is expendable,
+ * but the portion preceding oldestXact-2^31 is not. When oldestXact-2^31 is
+ * the first XID of a page and segment, the entire page and segment is
+ * expendable, and we could truncate the segment. Recognizing that case would
+ * require making oldestXact, not just the page containing oldestXact,
+ * available to this callback. The benefit would be rare and small, so we
+ * don't optimize that edge case.
*/
static bool
CLOGPagePrecedes(int page1, int page2)
@@ -948,11 +958,12 @@ CLOGPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 9cdb136..eaeb8c2 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -495,6 +495,7 @@ CommitTsShmemInit(void)
SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
CommitTsSLRULock, "pg_commit_ts",
LWTRANCHE_COMMITTS_BUFFER);
+ SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
commitTsShared = ShmemInitStruct("CommitTs shared",
sizeof(CommitTimestampShared),
@@ -883,14 +884,27 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
/*
- * Decide which of two commitTS page numbers is "older" for truncation
- * purposes.
+ * Decide whether a commitTS page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.
+ * This introduces differences compared to CLOG and the other SLRUs having (1
+ * << 31) % per_page == 0. This function never tests exactly
+ * TransactionIdPrecedes(x-2^31, x). When the system reaches xidStopLimit,
+ * there are two possible counts of page boundaries between oldestXact and the
+ * latest XID assigned, depending on whether oldestXact is within the first
+ * 128 entries of its page. Since this function doesn't know the location of
+ * oldestXact within page2, it returns false for one page that actually is
+ * expendable. This is a wider (yet still negligible) version of the
+ * truncation opportunity that CLOGPagePrecedes() cannot recognize.
+ *
+ * For the sake of a worked example, number entries with decimal values such
+ * that page1==1 entries range from 1.0 to 1.999. Let N+0.15 be the number of
+ * pages that 2^31 entries will span (N is an integer). If oldestXact=N+2.1,
+ * then the final safe XID assignment leaves newestXact=1.95. We keep page 2,
+ * because entry=2.85 is the border that toggles whether entries precede the
+ * last entry of the oldestXact page. While page 2 is expendable at
+ * oldestXact=N+2.1, it would be precious at oldestXact=N+2.9.
*/
static bool
CommitTsPagePrecedes(int page1, int page2)
@@ -899,11 +913,12 @@ CommitTsPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ce84dac..ff96083 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1832,10 +1832,12 @@ MultiXactShmemInit(void)
"MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
MultiXactOffsetSLRULock, "pg_multixact/offsets",
LWTRANCHE_MULTIXACTOFFSET_BUFFER);
+ SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
MultiXactMemberSLRULock, "pg_multixact/members",
LWTRANCHE_MULTIXACTMEMBER_BUFFER);
+ /* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
/* Initialize our shared state struct */
MultiXactState = ShmemInitStruct("Shared MultiXact State",
@@ -2978,6 +2980,14 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
* truncate the members SLRU. So we first scan the directory to determine
* the earliest offsets page number that we can read without error.
*
+ * When nextMXact is less than one segment away from multiWrapLimit,
+ * SlruScanDirCbFindEarliest can find some early segment other than the
+ * actual earliest. (MultiXactOffsetPagePrecedes(EARLIEST, LATEST)
+ * returns false, because not all pairs of entries have the same answer.)
+ * That can also arise when an earlier truncation attempt failed unlink()
+ * or returned early from this function. The only consequence is
+ * returning early, which wastes space that we could have liberated.
+ *
* NB: It's also possible that the page that oldestMulti is on has already
* been truncated away, and we crashed before updating oldestMulti.
*/
@@ -3092,15 +3102,11 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
}
/*
- * Decide which of two MultiXactOffset page numbers is "older" for truncation
- * purposes.
- *
- * We need to use comparison of MultiXactId here in order to do the right
- * thing with wraparound. However, if we are asked about page number zero, we
- * don't want to hand InvalidMultiXactId to MultiXactIdPrecedes: it'll get
- * weird. So, offset both multis by FirstMultiXactId to avoid that.
- * (Actually, the current implementation doesn't do anything weird with
- * InvalidMultiXactId, but there's no harm in leaving this code like this.)
+ * Decide whether a MultiXactOffset page number is "older" for truncation
+ * purposes. Analogous to CLOGPagePrecedes().
+ *
+ * Offsetting the values is optional, because MultiXactIdPrecedes() has
+ * translational symmetry.
*/
static bool
MultiXactOffsetPagePrecedes(int page1, int page2)
@@ -3109,15 +3115,17 @@ MultiXactOffsetPagePrecedes(int page1, int page2)
MultiXactId multi2;
multi1 = ((MultiXactId) page1) * MULTIXACT_OFFSETS_PER_PAGE;
- multi1 += FirstMultiXactId;
+ multi1 += FirstMultiXactId + 1;
multi2 = ((MultiXactId) page2) * MULTIXACT_OFFSETS_PER_PAGE;
- multi2 += FirstMultiXactId;
+ multi2 += FirstMultiXactId + 1;
- return MultiXactIdPrecedes(multi1, multi2);
+ return (MultiXactIdPrecedes(multi1, multi2) &&
+ MultiXactIdPrecedes(multi1,
+ multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1));
}
/*
- * Decide which of two MultiXactMember page numbers is "older" for truncation
+ * Decide whether a MultiXactMember page number is "older" for truncation
* purposes. There is no "invalid offset number" so use the numbers verbatim.
*/
static bool
@@ -3129,7 +3137,9 @@ MultiXactMemberPagePrecedes(int page1, int page2)
offset1 = ((MultiXactOffset) page1) * MULTIXACT_MEMBERS_PER_PAGE;
offset2 = ((MultiXactOffset) page2) * MULTIXACT_MEMBERS_PER_PAGE;
- return MultiXactOffsetPrecedes(offset1, offset2);
+ return (MultiXactOffsetPrecedes(offset1, offset2) &&
+ MultiXactOffsetPrecedes(offset1,
+ offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1));
}
/*
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 61249f4..33e1e93 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1219,11 +1219,6 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
pgstat_count_slru_truncate(shared->slru_stats_idx);
/*
- * The cutoff point is the start of the segment containing cutoffPage.
- */
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- /*
* Scan shared memory and remove any pages preceding the cutoff page, to
* ensure we won't rewrite them later. (Since this is normally called in
* or just after a checkpoint, any dirty pages should have been flushed
@@ -1235,9 +1230,7 @@ restart:;
/*
* While we are holding the lock, make an important safety check: the
- * planned cutoff point must be <= the current endpoint page. Otherwise we
- * have already wrapped around, and proceeding with the truncation would
- * risk removing the current segment.
+ * current endpoint page must not be eligible for removal.
*/
if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
{
@@ -1269,8 +1262,11 @@ restart:;
* Hmm, we have (or may have) I/O operations acting on the page, so
* we've got to wait for them to finish and then start again. This is
* the same logic as in SlruSelectLRUPage. (XXX if page is dirty,
- * wouldn't it be OK to just discard it without writing it? For now,
- * keep the logic the same as it was.)
+ * wouldn't it be OK to just discard it without writing it?
+ * SlruMayDeleteSegment() uses a stricter qualification, so we might
+ * not delete this page in the end; even if we don't delete it, we
+ * won't have cause to read its data again. For now, keep the logic
+ * the same as it was.)
*/
if (shared->page_status[slotno] == SLRU_PAGE_VALID)
SlruInternalWritePage(ctl, slotno, NULL);
@@ -1361,18 +1357,133 @@ restart:
}
/*
+ * Determine whether a segment is okay to delete.
+ *
+ * segpage is the first page of the segment, and cutoffPage is the oldest (in
+ * PagePrecedes order) page in the SLRU containing still-useful data. Since
+ * every core PagePrecedes callback implements "wrap around", check the
+ * segment's first and last pages:
+ *
+ * first<cutoff && last<cutoff: yes
+ * first<cutoff && last>=cutoff: no; cutoff falls inside this segment
+ * first>=cutoff && last<cutoff: no; wrap point falls inside this segment
+ * first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ */
+static bool
+SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+{
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+
+ Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
+
+ return (ctl->PagePrecedes(segpage, cutoffPage) &&
+ ctl->PagePrecedes(seg_last_page, cutoffPage));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static void
+SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
+{
+ TransactionId lhs,
+ rhs;
+ int newestPage,
+ oldestPage;
+ TransactionId newestXact,
+ oldestXact;
+
+ /*
+ * Compare an XID pair having undefined order (see RFC 1982), a pair at
+ * "opposite ends" of the XID space. TransactionIdPrecedes() treats each
+ * as preceding the other. If RHS is oldestXact, LHS is the first XID we
+ * must not assign.
+ */
+ lhs = per_page + offset; /* skip first page to avoid non-normal XIDs */
+ rhs = lhs + (1U << 31);
+ Assert(TransactionIdPrecedes(lhs, rhs));
+ Assert(TransactionIdPrecedes(rhs, lhs));
+ Assert(!TransactionIdPrecedes(lhs - 1, rhs));
+ Assert(TransactionIdPrecedes(rhs, lhs - 1));
+ Assert(TransactionIdPrecedes(lhs + 1, rhs));
+ Assert(!TransactionIdPrecedes(rhs, lhs + 1));
+ Assert(!TransactionIdFollowsOrEquals(lhs, rhs));
+ Assert(!TransactionIdFollowsOrEquals(rhs, lhs));
+ Assert(!ctl->PagePrecedes(lhs / per_page, lhs / per_page));
+ Assert(!ctl->PagePrecedes(lhs / per_page, rhs / per_page));
+ Assert(!ctl->PagePrecedes(rhs / per_page, lhs / per_page));
+ Assert(!ctl->PagePrecedes((lhs - per_page) / per_page, rhs / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 3 * per_page) / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 2 * per_page) / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 1 * per_page) / per_page)
+ || (1U << 31) % per_page != 0); /* See CommitTsPagePrecedes() */
+ Assert(ctl->PagePrecedes((lhs + 1 * per_page) / per_page, rhs / per_page)
+ || (1U << 31) % per_page != 0);
+ Assert(ctl->PagePrecedes((lhs + 2 * per_page) / per_page, rhs / per_page));
+ Assert(ctl->PagePrecedes((lhs + 3 * per_page) / per_page, rhs / per_page));
+ Assert(!ctl->PagePrecedes(rhs / per_page, (lhs + per_page) / per_page));
+
+ /*
+ * GetNewTransactionId() has assigned the last XID it can safely use, and
+ * that XID is in the *LAST* page of the second segment. We must not
+ * delete that segment.
+ */
+ newestPage = 2 * SLRU_PAGES_PER_SEGMENT - 1;
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
+ Assert(!SlruMayDeleteSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
+
+ /*
+ * GetNewTransactionId() has assigned the last XID it can safely use, and
+ * that XID is in the *FIRST* page of the second segment. We must not
+ * delete that segment.
+ */
+ newestPage = SLRU_PAGES_PER_SEGMENT;
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
+ Assert(!SlruMayDeleteSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
+}
+
+/*
+ * Unit-test a PagePrecedes function.
+ *
+ * This assumes every uint32 >= FirstNormalTransactionId is a valid key. It
+ * assumes each value occupies a contiguous, fixed-size region of SLRU bytes.
+ * (MultiXactMemberCtl separates flags from XIDs. AsyncCtl has
+ * variable-length entries, no keys, and no random access. These unit tests
+ * do not apply to them.)
+ */
+void
+SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page)
+{
+ /* Test first, middle and last entries of a page. */
+ SlruPagePrecedesTestOffset(ctl, per_page, 0);
+ SlruPagePrecedesTestOffset(ctl, per_page, per_page / 2);
+ SlruPagePrecedesTestOffset(ctl, per_page, per_page - 1);
+}
+#endif
+
+/*
* SlruScanDirectory callback
- * This callback reports true if there's any segment prior to the one
- * containing the page passed as "data".
+ * This callback reports true if there's any segment wholly prior to the
+ * one containing the page passed as "data".
*/
bool
SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1387,7 +1498,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
SlruInternalDeleteSegment(ctl, filename);
return false; /* keep going */
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index f33ae40..09cfa38 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -196,6 +196,7 @@ SUBTRANSShmemInit(void)
LWTRANCHE_SUBTRANS_BUFFER);
/* Override default assumption that writes should be fsync'd */
SubTransCtl->do_fsync = false;
+ SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
}
/*
@@ -373,13 +374,8 @@ TruncateSUBTRANS(TransactionId oldestXact)
/*
- * Decide which of two SUBTRANS page numbers is "older" for truncation purposes.
- *
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * Decide whether a SUBTRANS page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*/
static bool
SubTransPagePrecedes(int page1, int page2)
@@ -388,9 +384,10 @@ SubTransPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * SUBTRANS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SUBTRANS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index a3ba88d..7e5cd66 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -487,7 +487,12 @@ asyncQueuePageDiff(int p, int q)
return diff;
}
-/* Is p < q, accounting for wraparound? */
+/*
+ * Is p < q, accounting for wraparound?
+ *
+ * Since asyncQueueIsFull() blocks creation of a page that could precede any
+ * extant page, we need not assess entries within a page.
+ */
static bool
asyncQueuePagePrecedes(int p, int q)
{
@@ -1349,8 +1354,8 @@ asyncQueueIsFull(void)
* logically precedes the current global tail pointer, ie, the head
* pointer would wrap around compared to the tail. We cannot create such
* a head page for fear of confusing slru.c. For safety we round the tail
- * pointer back to a segment boundary (compare the truncation logic in
- * asyncQueueAdvanceTail).
+ * pointer back to a segment boundary (truncation logic in
+ * asyncQueueAdvanceTail does not do this, so doing it here is optional).
*
* Note that this test is *not* dependent on how much space there is on
* the current head page. This is necessary because asyncQueueAddEntries
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index ba93fb1..fde1b5c 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -438,7 +438,7 @@ static void SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact, SERIALIZABLEXACT
static void ReleaseRWConflict(RWConflict conflict);
static void FlagSxactUnsafe(SERIALIZABLEXACT *sxact);
-static bool SerialPagePrecedesLogically(int p, int q);
+static bool SerialPagePrecedesLogically(int page1, int page2);
static void SerialInit(void);
static void SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo);
static SerCommitSeqNo SerialGetMinConflictCommitSeqNo(TransactionId xid);
@@ -784,27 +784,77 @@ FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
/*------------------------------------------------------------------------*/
/*
- * We will work on the page range of 0..SERIAL_MAX_PAGE.
- * Compares using wraparound logic, as is required by slru.c.
+ * Decide whether a Serial page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*/
static bool
-SerialPagePrecedesLogically(int p, int q)
+SerialPagePrecedesLogically(int page1, int page2)
{
- int diff;
+ TransactionId xid1;
+ TransactionId xid2;
+
+ xid1 = ((TransactionId) page1) * SERIAL_ENTRIESPERPAGE;
+ xid1 += FirstNormalTransactionId + 1;
+ xid2 = ((TransactionId) page2) * SERIAL_ENTRIESPERPAGE;
+ xid2 += FirstNormalTransactionId + 1;
+
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + SERIAL_ENTRIESPERPAGE - 1));
+}
+
+static void
+SerialPagePrecedesLogicallyUnitTests(void)
+{
+ int per_page = SERIAL_ENTRIESPERPAGE,
+ offset = per_page / 2;
+ int newestPage,
+ oldestPage,
+ headPage,
+ targetPage;
+ TransactionId newestXact,
+ oldestXact;
+
+ /* GetNewTransactionId() has assigned the last XID it can safely use. */
+ newestPage = 2 * SLRU_PAGES_PER_SEGMENT - 1; /* nothing special */
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
/*
- * We have to compare modulo (SERIAL_MAX_PAGE+1)/2. Both inputs should be
- * in the range 0..SERIAL_MAX_PAGE.
+ * In this scenario, the SLRU headPage pertains to the last ~1000 XIDs
+ * assigned. oldestXact finishes, ~2B XIDs having elapsed since it
+ * started. Further transactions cause us to summarize oldestXact to
+ * tailPage. Function must return false so SerialAdd() doesn't zero
+ * tailPage (which may contain entries for other old, recently-finished
+ * XIDs) and half the SLRU. Reaching this requires burning ~2B XIDs in
+ * single-user mode, a negligible possibility.
*/
- Assert(p >= 0 && p <= SERIAL_MAX_PAGE);
- Assert(q >= 0 && q <= SERIAL_MAX_PAGE);
-
- diff = p - q;
- if (diff >= ((SERIAL_MAX_PAGE + 1) / 2))
- diff -= SERIAL_MAX_PAGE + 1;
- else if (diff < -((int) (SERIAL_MAX_PAGE + 1) / 2))
- diff += SERIAL_MAX_PAGE + 1;
- return diff < 0;
+ headPage = newestPage;
+ targetPage = oldestPage;
+ Assert(!SerialPagePrecedesLogically(headPage, targetPage));
+
+ /*
+ * In this scenario, the SLRU headPage pertains to oldestXact. We're
+ * summarizing an XID near newestXact. (Assume few other XIDs used
+ * SERIALIZABLE, hence the minimal headPage advancement. Assume
+ * oldestXact was long-running and only recently reached the SLRU.)
+ * Function must return true to make SerialAdd() create targetPage.
+ *
+ * Today's implementation mishandles this case, but it doesn't matter
+ * enough to fix. Verify that the defect affects just one page by
+ * asserting correct treatment of its prior page. Reaching this case
+ * requires burning ~2B XIDs in single-user mode, a negligible
+ * possibility. Moreover, if it does happen, the consequence would be
+ * mild, namely a new transaction failing in SimpleLruReadPage().
+ */
+ headPage = oldestPage;
+ targetPage = newestPage;
+ Assert(SerialPagePrecedesLogically(headPage, targetPage - 1));
+#if 0
+ Assert(SerialPagePrecedesLogically(headPage, targetPage));
+#endif
}
/*
@@ -824,6 +874,8 @@ SerialInit(void)
LWTRANCHE_SERIAL_BUFFER);
/* Override default assumption that writes should be fsync'd */
SerialSlruCtl->do_fsync = false;
+ SerialPagePrecedesLogicallyUnitTests();
+ SlruPagePrecedesUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
/*
* Create or attach to the SerialControl structure.
@@ -1032,7 +1084,7 @@ CheckPointPredicate(void)
}
else
{
- /*
+ /*----------
* The SLRU is no longer needed. Truncate to head before we set head
* invalid.
*
@@ -1041,6 +1093,25 @@ CheckPointPredicate(void)
* that we leave behind will appear to be new again. In that case it
* won't be removed until XID horizon advances enough to make it
* current again.
+ *
+ * XXX: This should happen in vac_truncate_clog(), not in checkpoints.
+ * Consider this scenario, starting from a system with no in-progress
+ * transactions and VACUUM FREEZE having maximized oldestXact:
+ * - Start a SERIALIZABLE transaction.
+ * - Start, finish, and summarize a SERIALIZABLE transaction, creating
+ * one SLRU page.
+ * - Consume XIDs to reach xidStopLimit.
+ * - Finish all transactions. Due to the long-running SERIALIZABLE
+ * transaction, earlier checkpoints did not touch headPage. The
+ * next checkpoint will change it, but that checkpoint happens after
+ * the end of the scenario.
+ * - VACUUM to advance XID limits.
+ * - Consume ~2M XIDs, crossing the former xidWrapLimit.
+ * - Start, finish, and summarize a SERIALIZABLE transaction.
+ * SerialAdd() declines to create the targetPage, because headPage
+ * is not regarded as in the past relative to that targetPage. The
+ * transaction instigating the summarize fails in
+ * SimpleLruReadPage().
*/
tailPage = serialControl->headPage;
serialControl->headPage = -1;
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 61fbc80..19982f6 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -117,9 +117,14 @@ typedef struct SlruCtlData
bool do_fsync;
/*
- * Decide which of two page numbers is "older" for truncation purposes. We
- * need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic.
+ * Decide whether a page is "older" for truncation and as a hint for
+ * evicting pages in LRU order. Return true if every entry of the first
+ * argument is older than every entry of the second argument. Note that
+ * !PagePrecedes(a,b) && !PagePrecedes(b,a) need not imply a==b; it also
+ * arises when some entries are older and some are not. For SLRUs using
+ * SimpleLruTruncate(), this must use modular arithmetic. (For others,
+ * the behavior of this callback has no functional implications.) Use
+ * SlruPagePrecedesUnitTests() in SLRUs meeting its criteria.
*/
bool (*PagePrecedes) (int, int);
@@ -143,6 +148,11 @@ extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
TransactionId xid);
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruFlush(SlruCtl ctl, bool allow_redirtied);
+#ifdef USE_ASSERT_CHECKING
+extern void SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page);
+#else
+#define SlruPagePrecedesUnitTests(ctl, per_page) do {} while (0)
+#endif
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);
Attachment: slru-truncate-insurance-v1.patch
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Unlink less in SimpleLruTruncate(), as insurance against bugs.
SimpleLruTruncate() has been unlinking every expendable file. In edge
cases, it also deleted important files. The most recent commit fixed
that. Given the history of this class of bugs evading detection, let's
not trust that patch exclusively. Instead of unlinking segments
representing up to 2^31 past XIDs, delete no more than half that much.
The balance will stay in place; eventually, XID consumption will
overwrite it. This could mitigate unknown SimpleLruTruncate() bugs and
simplify manual remediation after one has overtaken wrap limits in
single-user mode.
Truncation behavior won't change at all until an SLRU is half full.
Once it does change, one drawback is a conflict with an existing defense:
TruncateMultiXact() skips truncation when unexpected files exist on
disk, and this change deliberately makes such files more common. Hence,
pg_multixact becomes more likely to persist in consuming its maximum
storage. Also, this change may uncover bugs in SLRU page recycling by
making that more common. For SLRUs outside of pg_multixact, maximum
storage rises by 50%; for example, the CLOG maximum rises from 512 MiB
to 768 MiB. Usage in pg_multixact may double. Back-patch to 9.5 (all
supported versions).
Reviewed by FIXME.
Discussion: https://postgr.es/m/20200330052809.GB2324620@rfd.leadboat.com
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 54bc1ab..85e009b 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -405,6 +405,11 @@
in every database at least once every two billion transactions.
</para>
+ <!-- This oversimplifies; there are (2^31)-1 XIDs in the past, the same
+ number in the future, and one incomparable. (For each pair of incomparable
+ XIDs, TransactionIdPrecedes(a, b) and TransactionIdPrecedes(b, a) both
+ return true.) None of that is important to the DBA, since xidStopLimit
+ intervenes long before. -->
<para>
The reason that periodic vacuuming solves the problem is that
<command>VACUUM</command> will mark rows as <emphasis>frozen</emphasis>, indicating that
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index d606042..4d0db8f 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -52,7 +52,7 @@
* and CLOG segment numbering at
* 0xFFFFFFFF/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCLOG (see CLOGPagePrecedes).
+ * and page numbers in TruncateCLOG (see CLOGPageDiff).
*/
/* We need two bits per xact, so four xacts fit in a byte */
@@ -89,7 +89,7 @@ static SlruCtlData XactCtlData;
static int ZeroCLOGPage(int pageno, bool writeXlog);
-static bool CLOGPagePrecedes(int page1, int page2);
+static int32 CLOGPageDiff(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno, TransactionId oldestXact,
Oid oldestXactDb);
@@ -690,10 +690,10 @@ CLOGShmemSize(void)
void
CLOGShmemInit(void)
{
- XactCtl->PagePrecedes = CLOGPagePrecedes;
+ XactCtl->PageDiff = CLOGPageDiff;
SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER);
- SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
}
/*
@@ -908,7 +908,7 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
cutoffPage = TransactionIdToPage(oldestXact);
/* Check to see if there's any files that could be removed */
- if (!SlruScanDirectory(XactCtl, SlruScanDirCbReportPresence, &cutoffPage))
+ if (!SlruScanDirectory(XactCtl, SlruScanDirCbWouldTruncate, &cutoffPage))
return; /* nothing to remove */
/*
@@ -934,13 +934,14 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
/*
- * Decide whether a CLOG page number is "older" for truncation purposes.
+ * Diff CLOG page numbers for truncation purposes.
*
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, TransactionIdPrecedes()
- * would get weird about permanent xact IDs. So, offset both such that xid1,
- * xid2, and xid + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset is
- * relevant to page 0 and to the page preceding page 0.
+ * To do the right thing with wraparound XID arithmetic, this mirrors
+ * TransactionIdPrecedes(). The Max() operation ensures we return a positive
+ * value when the wrap point may fall inside these pages. (When it does, some
+ * pairs of entries have a positive diff, and other pairs have a negative
+ * diff.) Only the predicate.c SLRU needs the Max() operation; to avoid
+ * having even more corner cases to understand, all XID-indexed SLRUs do it.
*
* The page containing oldestXact-2^31 is the important edge case. The
* portion of that page equaling or following oldestXact-2^31 is expendable,
@@ -948,22 +949,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
* the first XID of a page and segment, the entire page and segment is
* expendable, and we could truncate the segment. Recognizing that case would
* require making oldestXact, not just the page containing oldestXact,
- * available to this callback. The benefit would be rare and small, so we
- * don't optimize that edge case.
+ * available to this callback. slru.c wouldn't delete the page, anyway.
*/
-static bool
-CLOGPagePrecedes(int page1, int page2)
+static int32
+CLOGPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + CLOG_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index eaeb8c2..41fa9c9 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -46,7 +46,7 @@
* 0xFFFFFFFF/COMMIT_TS_XACTS_PER_PAGE, and CommitTs segment numbering at
* 0xFFFFFFFF/COMMIT_TS_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCommitTs (see CommitTsPagePrecedes).
+ * and page numbers in TruncateCommitTs (see CommitTsPageDiff).
*/
/*
@@ -109,7 +109,7 @@ static void TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
RepOriginId nodeid, int slotno);
static void error_commit_ts_disabled(void);
static int ZeroCommitTsPage(int pageno, bool writeXlog);
-static bool CommitTsPagePrecedes(int page1, int page2);
+static int32 CommitTsPageDiff(int page1, int page2);
static void ActivateCommitTs(void);
static void DeactivateCommitTs(void);
static void WriteZeroPageXlogRec(int pageno);
@@ -491,11 +491,11 @@ CommitTsShmemInit(void)
{
bool found;
- CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
+ CommitTsCtl->PageDiff = CommitTsPageDiff;
SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
CommitTsSLRULock, "pg_commit_ts",
LWTRANCHE_COMMITTS_BUFFER);
- SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
commitTsShared = ShmemInitStruct("CommitTs shared",
sizeof(CommitTimestampShared),
@@ -831,7 +831,7 @@ TruncateCommitTs(TransactionId oldestXact)
cutoffPage = TransactionIdToCTsPage(oldestXact);
/* Check to see if there's any files that could be removed */
- if (!SlruScanDirectory(CommitTsCtl, SlruScanDirCbReportPresence,
+ if (!SlruScanDirectory(CommitTsCtl, SlruScanDirCbWouldTruncate,
&cutoffPage))
return; /* nothing to remove */
@@ -884,8 +884,8 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
/*
- * Decide whether a commitTS page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff commitTS page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*
* At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.
* This introduces differences compared to CLOG and the other SLRUs having (1
@@ -896,7 +896,7 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
* 128 entries of its page. Since this function doesn't know the location of
* oldestXact within page2, it returns false for one page that actually is
* expendable. This is a wider (yet still negligible) version of the
- * truncation opportunity that CLOGPagePrecedes() cannot recognize.
+ * truncation opportunity that CLOGPageDiff() cannot recognize.
*
* For the sake of a worked example, number entries with decimal values such
* that page1==1 entries range from 1.0 to 1.999. Let N+0.15 be the number of
@@ -906,19 +906,20 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
* last entry of the oldestXact page. While page 2 is expendable at
* oldestXact=N+2.1, it would be precious at oldestXact=N+2.9.
*/
-static bool
-CommitTsPagePrecedes(int page1, int page2)
+static int32
+CommitTsPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + COMMIT_TS_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ff96083..1a2e6f9 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -102,7 +102,7 @@
* 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
* take no explicit notice of that fact in this module, except when comparing
* segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
+ * MultiXactOffsetPageDiff).
*/
/* We need four bytes per offset */
@@ -355,8 +355,8 @@ static char *mxstatus_to_string(MultiXactStatus status);
/* management of SLRU infrastructure */
static int ZeroMultiXactOffsetPage(int pageno, bool writeXlog);
static int ZeroMultiXactMemberPage(int pageno, bool writeXlog);
-static bool MultiXactOffsetPagePrecedes(int page1, int page2);
-static bool MultiXactMemberPagePrecedes(int page1, int page2);
+static int32 MultiXactOffsetPageDiff(int page1, int page2);
+static int32 MultiXactMemberPageDiff(int page1, int page2);
static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
@@ -1825,14 +1825,14 @@ MultiXactShmemInit(void)
debug_elog2(DEBUG2, "Shared Memory Init for MultiXact");
- MultiXactOffsetCtl->PagePrecedes = MultiXactOffsetPagePrecedes;
- MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
+ MultiXactOffsetCtl->PageDiff = MultiXactOffsetPageDiff;
+ MultiXactMemberCtl->PageDiff = MultiXactMemberPageDiff;
SimpleLruInit(MultiXactOffsetCtl,
"MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
MultiXactOffsetSLRULock, "pg_multixact/offsets",
LWTRANCHE_MULTIXACTOFFSET_BUFFER);
- SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
+ SlruPageDiffUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
MultiXactMemberSLRULock, "pg_multixact/members",
@@ -2863,7 +2863,7 @@ SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
mxtruncinfo *trunc = (mxtruncinfo *) data;
if (trunc->earliestExistingPage == -1 ||
- ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
+ ctl->PageDiff(segpage, trunc->earliestExistingPage) < 0)
{
trunc->earliestExistingPage = segpage;
}
@@ -2982,11 +2982,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
*
* When nextMXact is less than one segment away from multiWrapLimit,
* SlruScanDirCbFindEarliest can find some early segment other than the
- * actual earliest. (MultiXactOffsetPagePrecedes(EARLIEST, LATEST)
- * returns false, because not all pairs of entries have the same answer.)
- * That can also arise when an earlier truncation attempt failed unlink()
- * or returned early from this function. The only consequence is
- * returning early, which wastes space that we could have liberated.
+ * actual earliest. (MultiXactOffsetPageDiff(EARLIEST, LATEST) >= 0,
+ * because not all pairs of entries have the same answer.) That can also
+ * arise when an earlier truncation attempt failed unlink(), returned
+ * early from this function, or saw SlruWouldTruncateSegment() decline to
+ * delete the older half of the SLRU. The only consequence is returning
+ * early, which wastes space that we could have liberated.
*
* NB: It's also possible that the page that oldestMulti is on has already
* been truncated away, and we crashed before updating oldestMulti.
@@ -3102,44 +3103,42 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
}
/*
- * Decide whether a MultiXactOffset page number is "older" for truncation
- * purposes. Analogous to CLOGPagePrecedes().
- *
- * Offsetting the values is optional, because MultiXactIdPrecedes() has
- * translational symmetry.
+ * Diff MultiXactOffset page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*/
-static bool
-MultiXactOffsetPagePrecedes(int page1, int page2)
+static int32
+MultiXactOffsetPageDiff(int page1, int page2)
{
MultiXactId multi1;
MultiXactId multi2;
+ int32 diff_head;
+ int32 diff_tail;
multi1 = ((MultiXactId) page1) * MULTIXACT_OFFSETS_PER_PAGE;
- multi1 += FirstMultiXactId + 1;
multi2 = ((MultiXactId) page2) * MULTIXACT_OFFSETS_PER_PAGE;
- multi2 += FirstMultiXactId + 1;
- return (MultiXactIdPrecedes(multi1, multi2) &&
- MultiXactIdPrecedes(multi1,
- multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1));
+ diff_head = multi1 - multi2;
+ diff_tail = multi1 - (multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
/*
- * Decide whether a MultiXactMember page number is "older" for truncation
- * purposes. There is no "invalid offset number" so use the numbers verbatim.
+ * Diff MultiXactMember page numbers for truncation purposes.
*/
-static bool
-MultiXactMemberPagePrecedes(int page1, int page2)
+static int32
+MultiXactMemberPageDiff(int page1, int page2)
{
MultiXactOffset offset1;
MultiXactOffset offset2;
+ int32 diff_head;
+ int32 diff_tail;
offset1 = ((MultiXactOffset) page1) * MULTIXACT_MEMBERS_PER_PAGE;
offset2 = ((MultiXactOffset) page2) * MULTIXACT_MEMBERS_PER_PAGE;
- return (MultiXactOffsetPrecedes(offset1, offset2) &&
- MultiXactOffsetPrecedes(offset1,
- offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1));
+ diff_head = offset1 - offset2;
+ diff_tail = offset1 - (offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
/*
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 33e1e93..b665fe6 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -248,7 +248,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
/*
* Initialize the unshared control struct, including directory path. We
- * assume caller set PagePrecedes.
+ * assume caller set PageDiff.
*/
ctl->shared = shared;
ctl->do_fsync = true; /* default behavior */
@@ -1084,8 +1084,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
{
if (this_delta > best_valid_delta ||
(this_delta == best_valid_delta &&
- ctl->PagePrecedes(this_page_number,
- best_valid_page_number)))
+ ctl->PageDiff(this_page_number,
+ best_valid_page_number) < 0))
{
bestvalidslot = slotno;
best_valid_delta = this_delta;
@@ -1096,8 +1096,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
{
if (this_delta > best_invalid_delta ||
(this_delta == best_invalid_delta &&
- ctl->PagePrecedes(this_page_number,
- best_invalid_page_number)))
+ ctl->PageDiff(this_page_number,
+ best_invalid_page_number) < 0))
{
bestinvalidslot = slotno;
best_invalid_delta = this_delta;
@@ -1207,7 +1207,8 @@ SimpleLruFlush(SlruCtl ctl, bool allow_redirtied)
}
/*
- * Remove all segments before the one holding the passed page number
+ * Remove some obsolete segments. As defense in depth, this deletes less than
+ * PageDiff() authorizes; see SlruWouldTruncateSegment().
*/
void
SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
@@ -1232,7 +1233,7 @@ restart:;
* While we are holding the lock, make an important safety check: the
* current endpoint page must not be eligible for removal.
*/
- if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+ if (ctl->PageDiff(shared->latest_page_number, cutoffPage) < 0)
{
LWLockRelease(shared->ControlLock);
ereport(LOG,
@@ -1245,7 +1246,7 @@ restart:;
{
if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
continue;
- if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
+ if (ctl->PageDiff(shared->page_number[slotno], cutoffPage) >= 0)
continue;
/*
@@ -1357,33 +1358,46 @@ restart:
}
/*
- * Determine whether a segment is okay to delete.
+ * Determine whether to delete a segment.
*
* segpage is the first page of the segment, and cutoffPage is the oldest (in
- * PagePrecedes order) page in the SLRU containing still-useful data. Since
- * every core PagePrecedes callback implements "wrap around", check the
+ * PageDiff order) page in the SLRU containing still-useful data. Check the
* segment's first and last pages:
*
* first<cutoff && last<cutoff: yes
* first<cutoff && last>=cutoff: no; cutoff falls inside this segment
* first>=cutoff && last<cutoff: no; wrap point falls inside this segment
* first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ *
+ * The PageDiff specification requires us not to remove pages where the
+ * callback reports negative values close to INT_MIN. Our interpretation is
+ * to decline to delete segments containing a page P such that PageDiff(P,
+ * cutoffPage) is in [INT_MIN, INT_MIN/2].
*/
static bool
-SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+SlruWouldTruncateSegment(SlruCtl ctl, int segpage, int cutoffPage)
{
- int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+ int first_page_diff;
Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
- return (ctl->PagePrecedes(segpage, cutoffPage) &&
- ctl->PagePrecedes(seg_last_page, cutoffPage));
+ first_page_diff = ctl->PageDiff(segpage, cutoffPage);
+ if (first_page_diff < 0 && first_page_diff > INT_MIN / 2)
+ {
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+ int last_page_diff = ctl->PageDiff(seg_last_page, cutoffPage);
+
+ return last_page_diff < 0 && last_page_diff > INT_MIN / 2;
+ }
+ return false;
}
#ifdef USE_ASSERT_CHECKING
static void
-SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
+SlruPageDiffTestOffset(SlruCtl ctl, int per_page, uint32 offset)
{
+ int32 large_negative = INT_MIN / 1000 * 999,
+ large_positive = INT_MAX / 1000 * 999;
TransactionId lhs,
rhs;
int newestPage,
@@ -1407,19 +1421,27 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
Assert(!TransactionIdPrecedes(rhs, lhs + 1));
Assert(!TransactionIdFollowsOrEquals(lhs, rhs));
Assert(!TransactionIdFollowsOrEquals(rhs, lhs));
- Assert(!ctl->PagePrecedes(lhs / per_page, lhs / per_page));
- Assert(!ctl->PagePrecedes(lhs / per_page, rhs / per_page));
- Assert(!ctl->PagePrecedes(rhs / per_page, lhs / per_page));
- Assert(!ctl->PagePrecedes((lhs - per_page) / per_page, rhs / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 3 * per_page) / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 2 * per_page) / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 1 * per_page) / per_page)
- || (1U << 31) % per_page != 0); /* See CommitTsPagePrecedes() */
- Assert(ctl->PagePrecedes((lhs + 1 * per_page) / per_page, rhs / per_page)
+ Assert(ctl->PageDiff(lhs / per_page, lhs / per_page) == 0);
+ Assert(ctl->PageDiff(lhs / per_page, rhs / per_page) > large_positive);
+ Assert(ctl->PageDiff(rhs / per_page, lhs / per_page) > large_positive);
+ Assert(ctl->PageDiff((lhs - per_page) / per_page, rhs / per_page) >
+ large_positive);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 3 * per_page) / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 2 * per_page) / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 1 * per_page) / per_page) <
+ large_negative
+ || (1U << 31) % per_page != 0); /* See CommitTsPageDiff() */
+ Assert(ctl->PageDiff((lhs + 1 * per_page) / per_page, rhs / per_page) <
+ large_negative
|| (1U << 31) % per_page != 0);
- Assert(ctl->PagePrecedes((lhs + 2 * per_page) / per_page, rhs / per_page));
- Assert(ctl->PagePrecedes((lhs + 3 * per_page) / per_page, rhs / per_page));
- Assert(!ctl->PagePrecedes(rhs / per_page, (lhs + per_page) / per_page));
+ Assert(ctl->PageDiff((lhs + 2 * per_page) / per_page, rhs / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff((lhs + 3 * per_page) / per_page, rhs / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs + per_page) / per_page) >
+ large_positive);
/*
* GetNewTransactionId() has assigned the last XID it can safely use, and
@@ -1432,10 +1454,10 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
oldestXact = newestXact + 1;
oldestXact -= 1U << 31;
oldestPage = oldestXact / per_page;
- Assert(!SlruMayDeleteSegment(ctl,
- (newestPage -
- newestPage % SLRU_PAGES_PER_SEGMENT),
- oldestPage));
+ Assert(!SlruWouldTruncateSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
/*
* GetNewTransactionId() has assigned the last XID it can safely use, and
@@ -1448,42 +1470,44 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
oldestXact = newestXact + 1;
oldestXact -= 1U << 31;
oldestPage = oldestXact / per_page;
- Assert(!SlruMayDeleteSegment(ctl,
- (newestPage -
- newestPage % SLRU_PAGES_PER_SEGMENT),
- oldestPage));
+ Assert(!SlruWouldTruncateSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
}
/*
- * Unit-test a PagePrecedes function.
+ * Unit-test a PageDiff function.
*
* This assumes every uint32 >= FirstNormalTransactionId is a valid key. It
* assumes each value occupies a contiguous, fixed-size region of SLRU bytes.
* (MultiXactMemberCtl separates flags from XIDs. AsyncCtl has
* variable-length entries, no keys, and no random access. These unit tests
* do not apply to them.)
+ *
+ * This is stricter than the PageDiff API requires.
*/
void
-SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page)
+SlruPageDiffUnitTests(SlruCtl ctl, int per_page)
{
/* Test first, middle and last entries of a page. */
- SlruPagePrecedesTestOffset(ctl, per_page, 0);
- SlruPagePrecedesTestOffset(ctl, per_page, per_page / 2);
- SlruPagePrecedesTestOffset(ctl, per_page, per_page - 1);
+ SlruPageDiffTestOffset(ctl, per_page, 0);
+ SlruPageDiffTestOffset(ctl, per_page, per_page / 2);
+ SlruPageDiffTestOffset(ctl, per_page, per_page - 1);
}
#endif
/*
* SlruScanDirectory callback
- * This callback reports true if there's any segment wholly prior to the
- * one containing the page passed as "data".
+ * This callback reports true if SimpleLruTruncate(ctl, *data) would
+ * unlink any segment.
*/
bool
-SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
+SlruScanDirCbWouldTruncate(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
+ if (SlruWouldTruncateSegment(ctl, segpage, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1498,7 +1522,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
+ if (SlruWouldTruncateSegment(ctl, segpage, cutoffPage))
SlruInternalDeleteSegment(ctl, filename);
return false; /* keep going */
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 09cfa38..76ce2d4 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -44,7 +44,7 @@
* 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE, and segment numbering at
* 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateSUBTRANS (see SubTransPagePrecedes) and zeroing
+ * and page numbers in TruncateSUBTRANS (see SubTransPageDiff) and zeroing
* them in StartupSUBTRANS.
*/
@@ -64,7 +64,7 @@ static SlruCtlData SubTransCtlData;
static int ZeroSUBTRANSPage(int pageno);
-static bool SubTransPagePrecedes(int page1, int page2);
+static int32 SubTransPageDiff(int page1, int page2);
/*
@@ -190,13 +190,13 @@ SUBTRANSShmemSize(void)
void
SUBTRANSShmemInit(void)
{
- SubTransCtl->PagePrecedes = SubTransPagePrecedes;
+ SubTransCtl->PageDiff = SubTransPageDiff;
SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
SubtransSLRULock, "pg_subtrans",
LWTRANCHE_SUBTRANS_BUFFER);
/* Override default assumption that writes should be fsync'd */
SubTransCtl->do_fsync = false;
- SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
}
/*
@@ -374,20 +374,21 @@ TruncateSUBTRANS(TransactionId oldestXact)
/*
- * Decide whether a SUBTRANS page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff SUBTRANS page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*/
-static bool
-SubTransPagePrecedes(int page1, int page2)
+static int32
+SubTransPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * SUBTRANS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SUBTRANS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + SUBTRANS_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 7e5cd66..f3a676f 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -207,13 +207,13 @@ typedef struct QueuePosition
/* choose logically smaller QueuePosition */
#define QUEUE_POS_MIN(x,y) \
- (asyncQueuePagePrecedes((x).page, (y).page) ? (x) : \
+ (asyncQueuePageDiff((x).page, (y).page) < 0 ? (x) : \
(x).page != (y).page ? (y) : \
(x).offset < (y).offset ? (x) : (y))
/* choose logically larger QueuePosition */
#define QUEUE_POS_MAX(x,y) \
- (asyncQueuePagePrecedes((x).page, (y).page) ? (y) : \
+ (asyncQueuePageDiff((x).page, (y).page) < 0 ? (y) : \
(x).page != (y).page ? (x) : \
(x).offset > (y).offset ? (x) : (y))
@@ -433,8 +433,7 @@ static bool backendTryAdvanceTail = false;
bool Trace_notify = false;
/* local function prototypes */
-static int asyncQueuePageDiff(int p, int q);
-static bool asyncQueuePagePrecedes(int p, int q);
+static int32 asyncQueuePageDiff(int p, int q);
static void queue_listen(ListenActionKind action, const char *channel);
static void Async_UnlistenOnExit(int code, Datum arg);
static void Exec_ListenPreCommit(void);
@@ -465,12 +464,16 @@ static void ClearPendingActionsAndNotifies(void);
/*
* Compute the difference between two queue page numbers (i.e., p - q),
- * accounting for wraparound.
+ * accounting for wraparound. Since asyncQueueIsFull() blocks creation of a
+ * page that could precede any extant page, we need not assess entries within
+ * a page.
*/
-static int
+static int32
asyncQueuePageDiff(int p, int q)
{
- int diff;
+ int diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,
+ diff;
+ int32 scale = INT_MAX / diff_max;
/*
* We have to compare modulo (QUEUE_MAX_PAGE+1)/2. Both inputs should be
@@ -484,19 +487,24 @@ asyncQueuePageDiff(int p, int q)
diff -= QUEUE_MAX_PAGE + 1;
else if (diff < -((QUEUE_MAX_PAGE + 1) / 2))
diff += QUEUE_MAX_PAGE + 1;
- return diff;
+ return diff * scale;
}
-/*
- * Is p < q, accounting for wraparound?
- *
- * Since asyncQueueIsFull() blocks creation of a page that could precede any
- * extant page, we need not assess entries within a page.
- */
-static bool
-asyncQueuePagePrecedes(int p, int q)
+static void
+asyncQueuePageDiffUnitTests(void)
{
- return asyncQueuePageDiff(p, q) < 0;
+ int32 large_negative = INT_MIN / 1000 * 999,
+ large_positive = INT_MAX / 1000 * 999;
+ int diff_min = -((QUEUE_MAX_PAGE + 1) / 2),
+ diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1;
+
+ Assert(asyncQueuePageDiff(diff_max, diff_max) == 0);
+ Assert(asyncQueuePageDiff(diff_max, 0) > large_positive);
+ Assert(asyncQueuePageDiff(diff_max + 1, 0) < large_negative);
+ Assert(asyncQueuePageDiff(0, QUEUE_MAX_PAGE + diff_min + 1) <
+ large_negative);
+ Assert(asyncQueuePageDiff(0, QUEUE_MAX_PAGE + diff_min + 2) >
+ large_positive);
}
/*
@@ -557,11 +565,12 @@ AsyncShmemInit(void)
/*
* Set up SLRU management of the pg_notify data.
*/
- NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
+ NotifyCtl->PageDiff = asyncQueuePageDiff;
SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER);
/* Override default assumption that writes should be fsync'd */
NotifyCtl->do_fsync = false;
+ asyncQueuePageDiffUnitTests();
if (!found)
{
@@ -1366,7 +1375,7 @@ asyncQueueIsFull(void)
nexthead = 0; /* wrap around */
boundary = QUEUE_POS_PAGE(QUEUE_TAIL);
boundary -= boundary % SLRU_PAGES_PER_SEGMENT;
- return asyncQueuePagePrecedes(nexthead, boundary);
+ return asyncQueuePageDiff(nexthead, boundary) < 0;
}
/*
@@ -2202,7 +2211,7 @@ asyncQueueAdvanceTail(void)
*/
newtailpage = QUEUE_POS_PAGE(min);
boundary = newtailpage - (newtailpage % SLRU_PAGES_PER_SEGMENT);
- if (asyncQueuePagePrecedes(oldtailpage, boundary))
+ if (asyncQueuePageDiff(oldtailpage, boundary) < 0)
{
/*
* SimpleLruTruncate() will ask for NotifySLRULock but will also
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index fde1b5c..6dbd87e 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -438,7 +438,7 @@ static void SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact, SERIALIZABLEXACT
static void ReleaseRWConflict(RWConflict conflict);
static void FlagSxactUnsafe(SERIALIZABLEXACT *sxact);
-static bool SerialPagePrecedesLogically(int page1, int page2);
+static int32 SerialPageDiffLogically(int page1, int page2);
static void SerialInit(void);
static void SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo);
static SerCommitSeqNo SerialGetMinConflictCommitSeqNo(TransactionId xid);
@@ -784,26 +784,30 @@ FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
/*------------------------------------------------------------------------*/
/*
- * Decide whether a Serial page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff Serial page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
+ *
+ * This must follow stricter rules than PageDiff demands, for the benefit of
+ * the call local to this file.
*/
-static bool
-SerialPagePrecedesLogically(int page1, int page2)
+static int32
+SerialPageDiffLogically(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * SERIAL_ENTRIESPERPAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SERIAL_ENTRIESPERPAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + SERIAL_ENTRIESPERPAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + SERIAL_ENTRIESPERPAGE - 1);
+ return Max(diff_head, diff_tail);
}
static void
-SerialPagePrecedesLogicallyUnitTests(void)
+SerialPageDiffLogicallyUnitTests(void)
{
int per_page = SERIAL_ENTRIESPERPAGE,
offset = per_page / 2;
@@ -826,21 +830,21 @@ SerialPagePrecedesLogicallyUnitTests(void)
* In this scenario, the SLRU headPage pertains to the last ~1000 XIDs
* assigned. oldestXact finishes, ~2B XIDs having elapsed since it
* started. Further transactions cause us to summarize oldestXact to
- * tailPage. Function must return false so SerialAdd() doesn't zero
- * tailPage (which may contain entries for other old, recently-finished
- * XIDs) and half the SLRU. Reaching this requires burning ~2B XIDs in
- * single-user mode, a negligible possibility.
+ * tailPage. Function must return non-negative so SerialAdd() doesn't
+ * zero tailPage (which may contain entries for other old,
+ * recently-finished XIDs) and half the SLRU. Reaching this requires
+ * burning ~2B XIDs in single-user mode, a negligible possibility.
*/
headPage = newestPage;
targetPage = oldestPage;
- Assert(!SerialPagePrecedesLogically(headPage, targetPage));
+ Assert(SerialPageDiffLogically(headPage, targetPage) >= 0);
/*
* In this scenario, the SLRU headPage pertains to oldestXact. We're
* summarizing an XID near newestXact. (Assume few other XIDs used
* SERIALIZABLE, hence the minimal headPage advancement. Assume
* oldestXact was long-running and only recently reached the SLRU.)
- * Function must return true to make SerialAdd() create targetPage.
+ * Function must return negative to make SerialAdd() create targetPage.
*
* Today's implementation mishandles this case, but it doesn't matter
* enough to fix. Verify that the defect affects just one page by
@@ -851,9 +855,9 @@ SerialPagePrecedesLogicallyUnitTests(void)
*/
headPage = oldestPage;
targetPage = newestPage;
- Assert(SerialPagePrecedesLogically(headPage, targetPage - 1));
+ Assert(SerialPageDiffLogically(headPage, targetPage - 1) < 0);
#if 0
- Assert(SerialPagePrecedesLogically(headPage, targetPage));
+ Assert(SerialPageDiffLogically(headPage, targetPage) < 0);
#endif
}
@@ -868,14 +872,14 @@ SerialInit(void)
/*
* Set up SLRU management of the pg_serial data.
*/
- SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
+ SerialSlruCtl->PageDiff = SerialPageDiffLogically;
SimpleLruInit(SerialSlruCtl, "Serial",
NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
LWTRANCHE_SERIAL_BUFFER);
/* Override default assumption that writes should be fsync'd */
SerialSlruCtl->do_fsync = false;
- SerialPagePrecedesLogicallyUnitTests();
- SlruPagePrecedesUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
+ SerialPageDiffLogicallyUnitTests();
+ SlruPageDiffUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
/*
* Create or attach to the SerialControl structure.
@@ -937,8 +941,8 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
else
{
firstZeroPage = SerialNextPage(serialControl->headPage);
- isNewPage = SerialPagePrecedesLogically(serialControl->headPage,
- targetPage);
+ isNewPage = SerialPageDiffLogically(serialControl->headPage,
+ targetPage) < 0;
}
if (!TransactionIdIsValid(serialControl->headXid)
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 19982f6..a8144a5 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -28,7 +28,7 @@
* xxxx is CLOG or SUBTRANS, respectively), and segment numbering at
* 0xFFFFFFFF/xxxx_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
* take no explicit notice of that fact in slru.c, except when comparing
- * segment and page numbers in SimpleLruTruncate (see PagePrecedes()).
+ * segment and page numbers in SimpleLruTruncate (see PageDiff()).
*/
#define SLRU_PAGES_PER_SEGMENT 32
@@ -117,16 +117,18 @@ typedef struct SlruCtlData
bool do_fsync;
/*
- * Decide whether a page is "older" for truncation and as a hint for
- * evicting pages in LRU order. Return true if every entry of the first
- * argument is older than every entry of the second argument. Note that
- * !PagePrecedes(a,b) && !PagePrecedes(b,a) need not imply a==b; it also
- * arises when some entries are older and some are not. For SLRUs using
- * SimpleLruTruncate(), this must use modular arithmetic. (For others,
- * the behavior of this callback has no functional implications.) Use
- * SlruPagePrecedesUnitTests() in SLRUs meeting its criteria.
+ * Compute distance between two page numbers, for truncation and as a hint
+ * for evicting pages in LRU order. Callbacks shall distribute return
+ * values uniformly in [INT_MIN,INT_MAX]. If PageDiff(P, oldest_needed)
+ * is negative but not close to INT_MIN, that implies data in page P is
+ * obsolete. The exception for values close to INT_MIN permits
+ * implementations to return such values for edge cases where the answer
+ * changes mid-page from INT_MIN to INT_MAX. Use SlruPageDiffUnitTests()
+ * in SLRUs meeting its criteria. For SLRUs using SimpleLruTruncate(),
+ * this must use modular arithmetic. (For others, the behavior of this
+ * callback has no functional implications.)
*/
- bool (*PagePrecedes) (int, int);
+ int32 (*PageDiff) (int, int);
/*
* Dir is set during SimpleLruInit and does not change thereafter. Since
@@ -149,9 +151,9 @@ extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruFlush(SlruCtl ctl, bool allow_redirtied);
#ifdef USE_ASSERT_CHECKING
-extern void SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page);
+extern void SlruPageDiffUnitTests(SlruCtl ctl, int per_page);
#else
-#define SlruPagePrecedesUnitTests(ctl, per_page) do {} while (0)
+#define SlruPageDiffUnitTests(ctl, per_page) do {} while (0)
#endif
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);
@@ -162,8 +164,8 @@ extern bool SlruScanDirectory(SlruCtl ctl, SlruScanCallback callback, void *data
extern void SlruDeleteSegment(SlruCtl ctl, int segno);
/* SlruScanDirectory public callbacks */
-extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
- int segpage, void *data);
+extern bool SlruScanDirCbWouldTruncate(SlruCtl ctl, char *filename,
+ int segpage, void *data);
extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
void *data);
On Mon, May 25, 2020 at 12:00:33AM -0700, Noah Misch wrote:
On Mon, Apr 06, 2020 at 09:18:47PM -0700, Noah Misch wrote:
On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
So I think what we're actually trying to accomplish here is to
ensure that instead of deleting up to half of the SLRU space
before the cutoff, we delete up to half-less-one-segment.
Maybe it should be half-less-two-segments, just to provide some
cushion against edge cases. Reading the first comment in
SetTransactionIdLimit makes one not want to trust too much in
arguments based on the exact value of xidWrapLimit, while for
the other SLRUs it was already unclear whether the edge cases
were exactly right.

That could be interesting insurance. While it would be sad for us to miss an
edge case and print "must be vacuumed within 2 transactions" when wrap has
already happened, reaching that message implies the DBA burned ~1M XIDs, all
in single-user mode. More plausible is FreezeMultiXactId() overrunning the
limit by tens of segments. Hence, if we do buy this insurance, let's skip far
more segments. For example, instead of unlinking segments representing up to
2^31 past XIDs, we could divide that into an upper half that we unlink and a
lower half. The lower half will stay in place; eventually, XID consumption
will overwrite it. Truncation behavior won't change until the region of CLOG
for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?

Temporarily wasting some disk
space is a lot more palatable than corrupting data, and these code
paths are necessarily not terribly well tested. So +1 for more
insurance.

Okay, I'll give that a try. I expect this will replace the PagePrecedes
callback with a PageDiff callback such that PageDiff(a, b) < 0 iff
PagePrecedes(a, b). PageDiff callbacks shall distribute return values
uniformly in [INT_MIN,INT_MAX]. SimpleLruTruncate() will unlink segments
where INT_MIN/2 < PageDiff(candidate, cutoff) < 0.

While doing so, I found that slru-truncate-modulo-v2.patch did get edge cases
wrong, as you feared. In particular, if the newest XID reached xidStopLimit
and was in the first page of a segment, TruncateCLOG() would delete its
segment. Attached slru-truncate-modulo-v3.patch fixes that; as restitution, I
added unit tests covering that and other scenarios. Reaching the bug via XIDs
was hard, requiring one to burn 1000k-CLOG_XACTS_PER_PAGE=967k XIDs in
single-user mode. I expect the bug was easier to reach via pg_multixact.

The insurance patch stacks on top of the bug fix patch. It does have a
negative effect on TruncateMultiXact(), which uses SlruScanDirCbFindEarliest
to skip truncation in corrupted clusters. SlruScanDirCbFindEarliest() gives
nonsense answers if "future" segments exist. That can happen today, but the
patch creates new ways to make it happen. The symptom is wasting yet more
space in pg_multixact. I am okay with this, since it arises only after one
fills pg_multixact 50% full. There are alternatives. We could weaken the
corruption defense in TruncateMultiXact() or look for another implementation
of equivalent defense. We could unlink, say, 75% or 95% of the "past" instead
of 50% (this patch) or >99.99% (today's behavior).
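To make the PageDiff contract discussed above concrete, here is a toy sketch (not PostgreSQL code; all names and constants are illustrative) of a circular page space with a diff spread uniformly over the int32 range, and a truncation predicate that unlinks a segment only when INT_MIN/2 < diff < 0 for both its first and last pages:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy circular space: 16 pages, 4 pages per segment (illustrative sizes). */
#define TOY_NUM_PAGES 16          /* must be a power of two */
#define TOY_PAGES_PER_SEGMENT 4

/* Modular difference p - q, scaled to spread over [INT32_MIN, INT32_MAX]. */
static int32_t
toy_page_diff(int p, int q)
{
	int32_t		diff = (int32_t) (((unsigned) (p - q)) & (TOY_NUM_PAGES - 1));

	if (diff >= TOY_NUM_PAGES / 2)
		diff -= TOY_NUM_PAGES;	/* now in [-8, 7] */
	return diff * (INT32_MAX / (TOY_NUM_PAGES / 2));
}

/*
 * Unlink a segment only if both its first and last pages are in the past of
 * the cutoff and outside the zone near INT_MIN.  The INT_MIN/2 guard leaves
 * the far half of the past in place as insurance, as described above.
 */
static bool
toy_would_truncate_segment(int segpage, int cutoff)
{
	int32_t		first = toy_page_diff(segpage, cutoff);
	int32_t		last = toy_page_diff(segpage + TOY_PAGES_PER_SEGMENT - 1,
									 cutoff);

	return first < 0 && first > INT32_MIN / 2 &&
		last < 0 && last > INT32_MIN / 2;
}
```

With cutoff on page 8, the segment starting at page 4 (the near half of the past) is truncated, while the segment starting at page 0 (the far half) is retained.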
Rebased the second patch. The first patch did not need a rebase.
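The edge-case fix in the first patch boils down to comparing a page against both endpoints of the other page, so that a page half a cycle away, whose entries straddle the wrap point, is never reported as wholly older. A toy illustration (made-up constants, not PostgreSQL's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy 32-bit circular ID space, 100 entries per page (illustrative only). */
#define TOY_IDS_PER_PAGE 100

/* Circular "precedes", in the style of TransactionIdPrecedes(). */
static bool
toy_id_precedes(uint32_t a, uint32_t b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * page1 is wholly older than page2 only if page1's entries precede both the
 * first AND the last entry of page2.  Checking only the first entries (the
 * pre-fix behavior) wrongly classifies the page diametrically opposite
 * page2, even though half of that page lies in the future.
 */
static bool
toy_page_precedes(int page1, int page2)
{
	uint32_t	id1 = (uint32_t) page1 * TOY_IDS_PER_PAGE + 1;
	uint32_t	id2 = (uint32_t) page2 * TOY_IDS_PER_PAGE + 1;

	return toy_id_precedes(id1, id2) &&
		toy_id_precedes(id1, id2 + TOY_IDS_PER_PAGE - 1);
}
```

A page just below the wrap point is correctly seen as older than page 0, while a page roughly half a cycle away is kept.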
Attachments:
slru-truncate-modulo-v3.patch
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Prevent excess SimpleLruTruncate() deletion.
Every core SLRU wraps around. With the exception of pg_notify, the wrap
point can fall in the middle of a page. Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback. Update each callback implementation to fit the new
specification. This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.)
This closes a rare opportunity for data loss, which manifested as
"apparent wraparound" or "could not access status of transaction"
errors. This is more likely to affect pg_multixact, due to the thin
space between multiStopLimit and multiWrapLimit. If a user's physical
replication primary logged ": apparent wraparound" messages, the user
should rebuild standbys of that primary regardless of symptoms. At
less risk is a cluster having emitted "not accepting commands" errors or
"must be vacuumed" warnings at some point. One can test a cluster for
this data loss by running VACUUM FREEZE in every database. Back-patch
to 9.5 (all supported versions).
Reviewed by Tom Lane.
Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f3da40a..d606042 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -693,6 +693,7 @@ CLOGShmemInit(void)
XactCtl->PagePrecedes = CLOGPagePrecedes;
SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER);
+ SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
}
/*
@@ -933,13 +934,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
/*
- * Decide which of two CLOG page numbers is "older" for truncation purposes.
+ * Decide whether a CLOG page number is "older" for truncation purposes.
*
* We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * thing with wraparound XID arithmetic. However, TransactionIdPrecedes()
+ * would get weird about permanent xact IDs. So, offset both such that xid1,
+ * xid2, and xid + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset is
+ * relevant to page 0 and to the page preceding page 0.
+ *
+ * The page containing oldestXact-2^31 is the important edge case. The
+ * portion of that page equaling or following oldestXact-2^31 is expendable,
+ * but the portion preceding oldestXact-2^31 is not. When oldestXact-2^31 is
+ * the first XID of a page and segment, the entire page and segment is
+ * expendable, and we could truncate the segment. Recognizing that case would
+ * require making oldestXact, not just the page containing oldestXact,
+ * available to this callback. The benefit would be rare and small, so we
+ * don't optimize that edge case.
*/
static bool
CLOGPagePrecedes(int page1, int page2)
@@ -948,11 +958,12 @@ CLOGPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 9cdb136..eaeb8c2 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -495,6 +495,7 @@ CommitTsShmemInit(void)
SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
CommitTsSLRULock, "pg_commit_ts",
LWTRANCHE_COMMITTS_BUFFER);
+ SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
commitTsShared = ShmemInitStruct("CommitTs shared",
sizeof(CommitTimestampShared),
@@ -883,14 +884,27 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
/*
- * Decide which of two commitTS page numbers is "older" for truncation
- * purposes.
+ * Decide whether a commitTS page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.
+ * This introduces differences compared to CLOG and the other SLRUs having (1
+ * << 31) % per_page == 0. This function never tests exactly
+ * TransactionIdPrecedes(x-2^31, x). When the system reaches xidStopLimit,
+ * there are two possible counts of page boundaries between oldestXact and the
+ * latest XID assigned, depending on whether oldestXact is within the first
+ * 128 entries of its page. Since this function doesn't know the location of
+ * oldestXact within page2, it returns false for one page that actually is
+ * expendable. This is a wider (yet still negligible) version of the
+ * truncation opportunity that CLOGPagePrecedes() cannot recognize.
+ *
+ * For the sake of a worked example, number entries with decimal values such
+ * that page1==1 entries range from 1.0 to 1.999. Let N+0.15 be the number of
+ * pages that 2^31 entries will span (N is an integer). If oldestXact=N+2.1,
+ * then the final safe XID assignment leaves newestXact=1.95. We keep page 2,
+ * because entry=2.85 is the border that toggles whether entries precede the
+ * last entry of the oldestXact page. While page 2 is expendable at
+ * oldestXact=N+2.1, it would be precious at oldestXact=N+2.9.
*/
static bool
CommitTsPagePrecedes(int page1, int page2)
@@ -899,11 +913,12 @@ CommitTsPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ce84dac..ff96083 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1832,10 +1832,12 @@ MultiXactShmemInit(void)
"MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
MultiXactOffsetSLRULock, "pg_multixact/offsets",
LWTRANCHE_MULTIXACTOFFSET_BUFFER);
+ SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
MultiXactMemberSLRULock, "pg_multixact/members",
LWTRANCHE_MULTIXACTMEMBER_BUFFER);
+ /* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
/* Initialize our shared state struct */
MultiXactState = ShmemInitStruct("Shared MultiXact State",
@@ -2978,6 +2980,14 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
* truncate the members SLRU. So we first scan the directory to determine
* the earliest offsets page number that we can read without error.
*
+ * When nextMXact is less than one segment away from multiWrapLimit,
+ * SlruScanDirCbFindEarliest can find some early segment other than the
+ * actual earliest. (MultiXactOffsetPagePrecedes(EARLIEST, LATEST)
+ * returns false, because not all pairs of entries have the same answer.)
+ * That can also arise when an earlier truncation attempt failed unlink()
+ * or returned early from this function. The only consequence is
+ * returning early, which wastes space that we could have liberated.
+ *
* NB: It's also possible that the page that oldestMulti is on has already
* been truncated away, and we crashed before updating oldestMulti.
*/
@@ -3092,15 +3102,11 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
}
/*
- * Decide which of two MultiXactOffset page numbers is "older" for truncation
- * purposes.
- *
- * We need to use comparison of MultiXactId here in order to do the right
- * thing with wraparound. However, if we are asked about page number zero, we
- * don't want to hand InvalidMultiXactId to MultiXactIdPrecedes: it'll get
- * weird. So, offset both multis by FirstMultiXactId to avoid that.
- * (Actually, the current implementation doesn't do anything weird with
- * InvalidMultiXactId, but there's no harm in leaving this code like this.)
+ * Decide whether a MultiXactOffset page number is "older" for truncation
+ * purposes. Analogous to CLOGPagePrecedes().
+ *
+ * Offsetting the values is optional, because MultiXactIdPrecedes() has
+ * translational symmetry.
*/
static bool
MultiXactOffsetPagePrecedes(int page1, int page2)
@@ -3109,15 +3115,17 @@ MultiXactOffsetPagePrecedes(int page1, int page2)
MultiXactId multi2;
multi1 = ((MultiXactId) page1) * MULTIXACT_OFFSETS_PER_PAGE;
- multi1 += FirstMultiXactId;
+ multi1 += FirstMultiXactId + 1;
multi2 = ((MultiXactId) page2) * MULTIXACT_OFFSETS_PER_PAGE;
- multi2 += FirstMultiXactId;
+ multi2 += FirstMultiXactId + 1;
- return MultiXactIdPrecedes(multi1, multi2);
+ return (MultiXactIdPrecedes(multi1, multi2) &&
+ MultiXactIdPrecedes(multi1,
+ multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1));
}
/*
- * Decide which of two MultiXactMember page numbers is "older" for truncation
+ * Decide whether a MultiXactMember page number is "older" for truncation
* purposes. There is no "invalid offset number" so use the numbers verbatim.
*/
static bool
@@ -3129,7 +3137,9 @@ MultiXactMemberPagePrecedes(int page1, int page2)
offset1 = ((MultiXactOffset) page1) * MULTIXACT_MEMBERS_PER_PAGE;
offset2 = ((MultiXactOffset) page2) * MULTIXACT_MEMBERS_PER_PAGE;
- return MultiXactOffsetPrecedes(offset1, offset2);
+ return (MultiXactOffsetPrecedes(offset1, offset2) &&
+ MultiXactOffsetPrecedes(offset1,
+ offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1));
}
/*
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 61249f4..33e1e93 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1219,11 +1219,6 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
pgstat_count_slru_truncate(shared->slru_stats_idx);
/*
- * The cutoff point is the start of the segment containing cutoffPage.
- */
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- /*
* Scan shared memory and remove any pages preceding the cutoff page, to
* ensure we won't rewrite them later. (Since this is normally called in
* or just after a checkpoint, any dirty pages should have been flushed
@@ -1235,9 +1230,7 @@ restart:;
/*
* While we are holding the lock, make an important safety check: the
- * planned cutoff point must be <= the current endpoint page. Otherwise we
- * have already wrapped around, and proceeding with the truncation would
- * risk removing the current segment.
+ * current endpoint page must not be eligible for removal.
*/
if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
{
@@ -1269,8 +1262,11 @@ restart:;
* Hmm, we have (or may have) I/O operations acting on the page, so
* we've got to wait for them to finish and then start again. This is
* the same logic as in SlruSelectLRUPage. (XXX if page is dirty,
- * wouldn't it be OK to just discard it without writing it? For now,
- * keep the logic the same as it was.)
+ * wouldn't it be OK to just discard it without writing it?
+ * SlruMayDeleteSegment() uses a stricter qualification, so we might
+ * not delete this page in the end; even if we don't delete it, we
+ * won't have cause to read its data again. For now, keep the logic
+ * the same as it was.)
*/
if (shared->page_status[slotno] == SLRU_PAGE_VALID)
SlruInternalWritePage(ctl, slotno, NULL);
@@ -1361,18 +1357,133 @@ restart:
}
/*
+ * Determine whether a segment is okay to delete.
+ *
+ * segpage is the first page of the segment, and cutoffPage is the oldest (in
+ * PagePrecedes order) page in the SLRU containing still-useful data. Since
+ * every core PagePrecedes callback implements "wrap around", check the
+ * segment's first and last pages:
+ *
+ * first<cutoff && last<cutoff: yes
+ * first<cutoff && last>=cutoff: no; cutoff falls inside this segment
+ * first>=cutoff && last<cutoff: no; wrap point falls inside this segment
+ * first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ */
+static bool
+SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+{
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+
+ Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
+
+ return (ctl->PagePrecedes(segpage, cutoffPage) &&
+ ctl->PagePrecedes(seg_last_page, cutoffPage));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static void
+SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
+{
+ TransactionId lhs,
+ rhs;
+ int newestPage,
+ oldestPage;
+ TransactionId newestXact,
+ oldestXact;
+
+ /*
+ * Compare an XID pair having undefined order (see RFC 1982), a pair at
+ * "opposite ends" of the XID space. TransactionIdPrecedes() treats each
+ * as preceding the other. If RHS is oldestXact, LHS is the first XID we
+ * must not assign.
+ */
+ lhs = per_page + offset; /* skip first page to avoid non-normal XIDs */
+ rhs = lhs + (1U << 31);
+ Assert(TransactionIdPrecedes(lhs, rhs));
+ Assert(TransactionIdPrecedes(rhs, lhs));
+ Assert(!TransactionIdPrecedes(lhs - 1, rhs));
+ Assert(TransactionIdPrecedes(rhs, lhs - 1));
+ Assert(TransactionIdPrecedes(lhs + 1, rhs));
+ Assert(!TransactionIdPrecedes(rhs, lhs + 1));
+ Assert(!TransactionIdFollowsOrEquals(lhs, rhs));
+ Assert(!TransactionIdFollowsOrEquals(rhs, lhs));
+ Assert(!ctl->PagePrecedes(lhs / per_page, lhs / per_page));
+ Assert(!ctl->PagePrecedes(lhs / per_page, rhs / per_page));
+ Assert(!ctl->PagePrecedes(rhs / per_page, lhs / per_page));
+ Assert(!ctl->PagePrecedes((lhs - per_page) / per_page, rhs / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 3 * per_page) / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 2 * per_page) / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 1 * per_page) / per_page)
+ || (1U << 31) % per_page != 0); /* See CommitTsPagePrecedes() */
+ Assert(ctl->PagePrecedes((lhs + 1 * per_page) / per_page, rhs / per_page)
+ || (1U << 31) % per_page != 0);
+ Assert(ctl->PagePrecedes((lhs + 2 * per_page) / per_page, rhs / per_page));
+ Assert(ctl->PagePrecedes((lhs + 3 * per_page) / per_page, rhs / per_page));
+ Assert(!ctl->PagePrecedes(rhs / per_page, (lhs + per_page) / per_page));
+
+ /*
+ * GetNewTransactionId() has assigned the last XID it can safely use, and
+ * that XID is in the *LAST* page of the second segment. We must not
+ * delete that segment.
+ */
+ newestPage = 2 * SLRU_PAGES_PER_SEGMENT - 1;
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
+ Assert(!SlruMayDeleteSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
+
+ /*
+ * GetNewTransactionId() has assigned the last XID it can safely use, and
+ * that XID is in the *FIRST* page of the second segment. We must not
+ * delete that segment.
+ */
+ newestPage = SLRU_PAGES_PER_SEGMENT;
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
+ Assert(!SlruMayDeleteSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
+}
+
+/*
+ * Unit-test a PagePrecedes function.
+ *
+ * This assumes every uint32 >= FirstNormalTransactionId is a valid key. It
+ * assumes each value occupies a contiguous, fixed-size region of SLRU bytes.
+ * (MultiXactMemberCtl separates flags from XIDs. AsyncCtl has
+ * variable-length entries, no keys, and no random access. These unit tests
+ * do not apply to them.)
+ */
+void
+SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page)
+{
+ /* Test first, middle and last entries of a page. */
+ SlruPagePrecedesTestOffset(ctl, per_page, 0);
+ SlruPagePrecedesTestOffset(ctl, per_page, per_page / 2);
+ SlruPagePrecedesTestOffset(ctl, per_page, per_page - 1);
+}
+#endif
+
+/*
* SlruScanDirectory callback
- * This callback reports true if there's any segment prior to the one
- * containing the page passed as "data".
+ * This callback reports true if there's any segment wholly prior to the
+ * one containing the page passed as "data".
*/
bool
SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1387,7 +1498,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
SlruInternalDeleteSegment(ctl, filename);
return false; /* keep going */
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index f33ae40..09cfa38 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -196,6 +196,7 @@ SUBTRANSShmemInit(void)
LWTRANCHE_SUBTRANS_BUFFER);
/* Override default assumption that writes should be fsync'd */
SubTransCtl->do_fsync = false;
+ SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
}
/*
@@ -373,13 +374,8 @@ TruncateSUBTRANS(TransactionId oldestXact)
/*
- * Decide which of two SUBTRANS page numbers is "older" for truncation purposes.
- *
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * Decide whether a SUBTRANS page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*/
static bool
SubTransPagePrecedes(int page1, int page2)
@@ -388,9 +384,10 @@ SubTransPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * SUBTRANS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SUBTRANS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index a3ba88d..7e5cd66 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -487,7 +487,12 @@ asyncQueuePageDiff(int p, int q)
return diff;
}
-/* Is p < q, accounting for wraparound? */
+/*
+ * Is p < q, accounting for wraparound?
+ *
+ * Since asyncQueueIsFull() blocks creation of a page that could precede any
+ * extant page, we need not assess entries within a page.
+ */
static bool
asyncQueuePagePrecedes(int p, int q)
{
@@ -1349,8 +1354,8 @@ asyncQueueIsFull(void)
* logically precedes the current global tail pointer, ie, the head
* pointer would wrap around compared to the tail. We cannot create such
* a head page for fear of confusing slru.c. For safety we round the tail
- * pointer back to a segment boundary (compare the truncation logic in
- * asyncQueueAdvanceTail).
+ * pointer back to a segment boundary (truncation logic in
+ * asyncQueueAdvanceTail does not do this, so doing it here is optional).
*
* Note that this test is *not* dependent on how much space there is on
* the current head page. This is necessary because asyncQueueAddEntries
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index ba93fb1..fde1b5c 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -438,7 +438,7 @@ static void SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact, SERIALIZABLEXACT
static void ReleaseRWConflict(RWConflict conflict);
static void FlagSxactUnsafe(SERIALIZABLEXACT *sxact);
-static bool SerialPagePrecedesLogically(int p, int q);
+static bool SerialPagePrecedesLogically(int page1, int page2);
static void SerialInit(void);
static void SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo);
static SerCommitSeqNo SerialGetMinConflictCommitSeqNo(TransactionId xid);
@@ -784,27 +784,77 @@ FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
/*------------------------------------------------------------------------*/
/*
- * We will work on the page range of 0..SERIAL_MAX_PAGE.
- * Compares using wraparound logic, as is required by slru.c.
+ * Decide whether a Serial page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*/
static bool
-SerialPagePrecedesLogically(int p, int q)
+SerialPagePrecedesLogically(int page1, int page2)
{
- int diff;
+ TransactionId xid1;
+ TransactionId xid2;
+
+ xid1 = ((TransactionId) page1) * SERIAL_ENTRIESPERPAGE;
+ xid1 += FirstNormalTransactionId + 1;
+ xid2 = ((TransactionId) page2) * SERIAL_ENTRIESPERPAGE;
+ xid2 += FirstNormalTransactionId + 1;
+
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + SERIAL_ENTRIESPERPAGE - 1));
+}
+
+static void
+SerialPagePrecedesLogicallyUnitTests(void)
+{
+ int per_page = SERIAL_ENTRIESPERPAGE,
+ offset = per_page / 2;
+ int newestPage,
+ oldestPage,
+ headPage,
+ targetPage;
+ TransactionId newestXact,
+ oldestXact;
+
+ /* GetNewTransactionId() has assigned the last XID it can safely use. */
+ newestPage = 2 * SLRU_PAGES_PER_SEGMENT - 1; /* nothing special */
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
/*
- * We have to compare modulo (SERIAL_MAX_PAGE+1)/2. Both inputs should be
- * in the range 0..SERIAL_MAX_PAGE.
+ * In this scenario, the SLRU headPage pertains to the last ~1000 XIDs
+ * assigned. oldestXact finishes, ~2B XIDs having elapsed since it
+ * started. Further transactions cause us to summarize oldestXact to
+ * tailPage. Function must return false so SerialAdd() doesn't zero
+ * tailPage (which may contain entries for other old, recently-finished
+ * XIDs) and half the SLRU. Reaching this requires burning ~2B XIDs in
+ * single-user mode, a negligible possibility.
*/
- Assert(p >= 0 && p <= SERIAL_MAX_PAGE);
- Assert(q >= 0 && q <= SERIAL_MAX_PAGE);
-
- diff = p - q;
- if (diff >= ((SERIAL_MAX_PAGE + 1) / 2))
- diff -= SERIAL_MAX_PAGE + 1;
- else if (diff < -((int) (SERIAL_MAX_PAGE + 1) / 2))
- diff += SERIAL_MAX_PAGE + 1;
- return diff < 0;
+ headPage = newestPage;
+ targetPage = oldestPage;
+ Assert(!SerialPagePrecedesLogically(headPage, targetPage));
+
+ /*
+ * In this scenario, the SLRU headPage pertains to oldestXact. We're
+ * summarizing an XID near newestXact. (Assume few other XIDs used
+ * SERIALIZABLE, hence the minimal headPage advancement. Assume
+ * oldestXact was long-running and only recently reached the SLRU.)
+ * Function must return true to make SerialAdd() create targetPage.
+ *
+ * Today's implementation mishandles this case, but it doesn't matter
+ * enough to fix. Verify that the defect affects just one page by
+ * asserting correct treatment of its prior page. Reaching this case
+ * requires burning ~2B XIDs in single-user mode, a negligible
+ * possibility. Moreover, if it does happen, the consequence would be
+ * mild, namely a new transaction failing in SimpleLruReadPage().
+ */
+ headPage = oldestPage;
+ targetPage = newestPage;
+ Assert(SerialPagePrecedesLogically(headPage, targetPage - 1));
+#if 0
+ Assert(SerialPagePrecedesLogically(headPage, targetPage));
+#endif
}
/*
@@ -824,6 +874,8 @@ SerialInit(void)
LWTRANCHE_SERIAL_BUFFER);
/* Override default assumption that writes should be fsync'd */
SerialSlruCtl->do_fsync = false;
+ SerialPagePrecedesLogicallyUnitTests();
+ SlruPagePrecedesUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
/*
* Create or attach to the SerialControl structure.
@@ -1032,7 +1084,7 @@ CheckPointPredicate(void)
}
else
{
- /*
+ /*----------
* The SLRU is no longer needed. Truncate to head before we set head
* invalid.
*
@@ -1041,6 +1093,25 @@ CheckPointPredicate(void)
* that we leave behind will appear to be new again. In that case it
* won't be removed until XID horizon advances enough to make it
* current again.
+ *
+ * XXX: This should happen in vac_truncate_clog(), not in checkpoints.
+ * Consider this scenario, starting from a system with no in-progress
+ * transactions and VACUUM FREEZE having maximized oldestXact:
+ * - Start a SERIALIZABLE transaction.
+ * - Start, finish, and summarize a SERIALIZABLE transaction, creating
+ * one SLRU page.
+ * - Consume XIDs to reach xidStopLimit.
+ * - Finish all transactions. Due to the long-running SERIALIZABLE
+ * transaction, earlier checkpoints did not touch headPage. The
+ * next checkpoint will change it, but that checkpoint happens after
+ * the end of the scenario.
+ * - VACUUM to advance XID limits.
+ * - Consume ~2M XIDs, crossing the former xidWrapLimit.
+ * - Start, finish, and summarize a SERIALIZABLE transaction.
+ * SerialAdd() declines to create the targetPage, because headPage
+ * is not regarded as in the past relative to that targetPage. The
+ * transaction instigating the summarize fails in
+ * SimpleLruReadPage().
*/
tailPage = serialControl->headPage;
serialControl->headPage = -1;
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 61fbc80..19982f6 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -117,9 +117,14 @@ typedef struct SlruCtlData
bool do_fsync;
/*
- * Decide which of two page numbers is "older" for truncation purposes. We
- * need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic.
+ * Decide whether a page is "older" for truncation and as a hint for
+ * evicting pages in LRU order. Return true if every entry of the first
+ * argument is older than every entry of the second argument. Note that
+ * !PagePrecedes(a,b) && !PagePrecedes(b,a) need not imply a==b; it also
+ * arises when some entries are older and some are not. For SLRUs using
+ * SimpleLruTruncate(), this must use modular arithmetic. (For others,
+ * the behavior of this callback has no functional implications.) Use
+ * SlruPagePrecedesUnitTests() in SLRUs meeting its criteria.
*/
bool (*PagePrecedes) (int, int);
@@ -143,6 +148,11 @@ extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
TransactionId xid);
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruFlush(SlruCtl ctl, bool allow_redirtied);
+#ifdef USE_ASSERT_CHECKING
+extern void SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page);
+#else
+#define SlruPagePrecedesUnitTests(ctl, per_page) do {} while (0)
+#endif
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);
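For anyone who wants to see the rounding hazard in isolation rather than in the diff above, here is a minimal standalone model (not part of the patch). It uses a toy 32-page segment size; `page_precedes()` mirrors the signed-difference test of TransactionIdPrecedes(), and `old_may_delete()`/`new_may_delete()` are illustrative names for the pre-fix rounding logic and the SlruMayDeleteSegment()-style first-and-last-page check:

```c
#include <assert.h>
#include <stdint.h>

#define PAGES_PER_SEG 32

/* Circular page comparison, modeling the core PagePrecedes callbacks:
 * p1 is "older" than p2 iff the signed 32-bit difference is negative. */
static int
page_precedes(uint32_t p1, uint32_t p2)
{
	return (int32_t) (p1 - p2) < 0;
}

/* Pre-fix logic: round the cutoff down to a segment boundary, then ask
 * whether the segment's first page precedes it.  Rounding the cutoff
 * back moves the wrap point, so the furthest-future segment (if one
 * exists) can spuriously test as "older". */
static int
old_may_delete(uint32_t segpage, uint32_t cutoff)
{
	cutoff -= cutoff % PAGES_PER_SEG;
	return page_precedes(segpage, cutoff);
}

/* Post-fix logic, in the style of SlruMayDeleteSegment(): both the
 * first and last pages of the segment must precede the unrounded
 * cutoff, so a segment straddling the wrap point is never deleted. */
static int
new_may_delete(uint32_t segpage, uint32_t cutoff)
{
	return page_precedes(segpage, cutoff) &&
		page_precedes(segpage + PAGES_PER_SEG - 1, cutoff);
}
```

With cutoff page 100, the segment starting at page 100 + 2^31 - 4 = 2147483744 holds the furthest-future pages, yet `old_may_delete()` reports it deletable (the rounded cutoff 96 puts it just past the wrap point), while `new_may_delete()` correctly refuses; both agree that a genuinely old segment such as the one starting at page 32 is deletable.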
Attachment: slru-truncate-insurance-v2.patch (text/plain)
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Unlink less in SimpleLruTruncate(), as insurance against bugs.
SimpleLruTruncate() has been unlinking every expendable file. In edge
cases, it also deleted important files. The most recent commit fixed
that. Given the history of this class of bugs evading detection, let's
not trust that patch exclusively. Instead of unlinking segments
representing up to 2^31 past XIDs, delete no more than half that much.
The balance will stay in place; eventually, XID consumption will
overwrite it. This could mitigate unknown SimpleLruTruncate() bugs and
simplify manual remediation after one has overtaken wrap limits in
single-user mode.
Truncation behavior won't change at all until an SLRU is half full.
Once it does change, a drawback is conflict with the following defense.
TruncateMultiXact() skips truncation when unexpected files exist on
disk, which this change deliberately makes more common. Hence,
pg_multixact becomes more likely to persist in consuming its maximum
storage. Also, this change may uncover bugs in SLRU page recycling by
making that more common. For SLRUs outside of pg_multixact, maximum
storage rises by 50%; for example, the CLOG maximum rises from 512 MiB
to 768 MiB. Usage in pg_multixact may double. Back-patch to 9.5 (all
supported versions).
Reviewed by FIXME.
Discussion: https://postgr.es/m/20200330052809.GB2324620@rfd.leadboat.com
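Before the diff, a sketch of the intent (not the patch text itself): `page_diff()` models the new CLOGPageDiff()-style signed distance, and `would_truncate()` shows the insurance rule of deleting only segments within roughly half of the 2^31-XID past range. The exact threshold expression here is an assumption for illustration; the patch below defines the real one:

```c
#include <assert.h>
#include <stdint.h>

#define PAGES_PER_SEG 32

/* Signed circular distance from p1 to p2, in the style of CLOGPageDiff():
 * strictly negative iff every entry of p1 is older than every entry of p2.
 * The Max() of head and tail diffs goes positive when the wrap point may
 * fall inside the pages. */
static int32_t
page_diff(uint32_t p1, uint32_t p2)
{
	int32_t		diff_head = (int32_t) (p1 - p2);
	int32_t		diff_tail = (int32_t) (p1 - (p2 + PAGES_PER_SEG - 1));

	return diff_head > diff_tail ? diff_head : diff_tail;
}

/* Insurance rule from the commit message, sketched: truncate a segment
 * only if it is older than the cutoff AND within the most recent half of
 * the expendable range, leaving the rest for XID consumption to
 * overwrite.  (Illustrative bound; the patch uses its own threshold.) */
static int
would_truncate(uint32_t segpage, uint32_t cutoff)
{
	int32_t		diff = page_diff(segpage, cutoff);

	return diff < 0 && diff > -(INT32_MAX / 2);
}
```

For a cutoff page of 1000, a recently-passed segment (page 960) is truncated, a future segment (page 1056) is not, and a segment far enough in the past (about 2^30 pages behind) is deliberately left on disk even though it is expendable.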
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index de0794a..f1abdbd 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -405,6 +405,11 @@
in every database at least once every two billion transactions.
</para>
+ <!-- This oversimplifies; there are (2^31)-1 XIDs in the past, the same
+ number in the future, and one incomparable. (For each pair of incomparable
+ XIDs, TransactionIdPrecedes(a, b) and TransactionIdPrecedes(b, a) both
+ return true.) None of that is important to the DBA, since xidStopLimit
+ intervenes long before. -->
<para>
The reason that periodic vacuuming solves the problem is that
<command>VACUUM</command> will mark rows as <emphasis>frozen</emphasis>, indicating that
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 4bd0da0..f57d5ff 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -52,7 +52,7 @@
* and CLOG segment numbering at
* 0xFFFFFFFF/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCLOG (see CLOGPagePrecedes).
+ * and page numbers in TruncateCLOG (see CLOGPageDiff).
*/
/* We need two bits per xact, so four xacts fit in a byte */
@@ -89,7 +89,7 @@ static SlruCtlData XactCtlData;
static int ZeroCLOGPage(int pageno, bool writeXlog);
-static bool CLOGPagePrecedes(int page1, int page2);
+static int32 CLOGPageDiff(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno, TransactionId oldestXact,
Oid oldestXactDb);
@@ -689,10 +689,10 @@ CLOGShmemSize(void)
void
CLOGShmemInit(void)
{
- XactCtl->PagePrecedes = CLOGPagePrecedes;
+ XactCtl->PageDiff = CLOGPageDiff;
SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER);
- SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
}
/*
@@ -907,7 +907,7 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
cutoffPage = TransactionIdToPage(oldestXact);
/* Check to see if there's any files that could be removed */
- if (!SlruScanDirectory(XactCtl, SlruScanDirCbReportPresence, &cutoffPage))
+ if (!SlruScanDirectory(XactCtl, SlruScanDirCbWouldTruncate, &cutoffPage))
return; /* nothing to remove */
/*
@@ -933,13 +933,14 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
/*
- * Decide whether a CLOG page number is "older" for truncation purposes.
+ * Diff CLOG page numbers for truncation purposes.
*
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, TransactionIdPrecedes()
- * would get weird about permanent xact IDs. So, offset both such that xid1,
- * xid2, and xid + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset is
- * relevant to page 0 and to the page preceding page 0.
+ * To do the right thing with wraparound XID arithmetic, this mirrors
+ * TransactionIdPrecedes(). The Max() operation ensures we return a positive
+ * value when the wrap point may fall inside these pages. (When it does, some
+ * pairs of entries have a positive diff, and other pairs have a negative
+ * diff.) Only the predicate.c SLRU needs the Max() operation; to avoid
+ * having even more corner cases to understand, all XID-indexed SLRUs do it.
*
* The page containing oldestXact-2^31 is the important edge case. The
* portion of that page equaling or following oldestXact-2^31 is expendable,
@@ -947,22 +948,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
* the first XID of a page and segment, the entire page and segment is
* expendable, and we could truncate the segment. Recognizing that case would
* require making oldestXact, not just the page containing oldestXact,
- * available to this callback. The benefit would be rare and small, so we
- * don't optimize that edge case.
+ * available to this callback. slru.c wouldn't delete the page, anyway.
*/
-static bool
-CLOGPagePrecedes(int page1, int page2)
+static int32
+CLOGPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + CLOG_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index a1efee4..3f49571 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -46,7 +46,7 @@
* 0xFFFFFFFF/COMMIT_TS_XACTS_PER_PAGE, and CommitTs segment numbering at
* 0xFFFFFFFF/COMMIT_TS_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCommitTs (see CommitTsPagePrecedes).
+ * and page numbers in TruncateCommitTs (see CommitTsPageDiff).
*/
/*
@@ -109,7 +109,7 @@ static void TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
RepOriginId nodeid, int slotno);
static void error_commit_ts_disabled(void);
static int ZeroCommitTsPage(int pageno, bool writeXlog);
-static bool CommitTsPagePrecedes(int page1, int page2);
+static int32 CommitTsPageDiff(int page1, int page2);
static void ActivateCommitTs(void);
static void DeactivateCommitTs(void);
static void WriteZeroPageXlogRec(int pageno);
@@ -552,11 +552,11 @@ CommitTsShmemInit(void)
{
bool found;
- CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
+ CommitTsCtl->PageDiff = CommitTsPageDiff;
SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
CommitTsSLRULock, "pg_commit_ts",
LWTRANCHE_COMMITTS_BUFFER);
- SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
commitTsShared = ShmemInitStruct("CommitTs shared",
sizeof(CommitTimestampShared),
@@ -892,7 +892,7 @@ TruncateCommitTs(TransactionId oldestXact)
cutoffPage = TransactionIdToCTsPage(oldestXact);
/* Check to see if there's any files that could be removed */
- if (!SlruScanDirectory(CommitTsCtl, SlruScanDirCbReportPresence,
+ if (!SlruScanDirectory(CommitTsCtl, SlruScanDirCbWouldTruncate,
&cutoffPage))
return; /* nothing to remove */
@@ -945,8 +945,8 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
/*
- * Decide whether a commitTS page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff commitTS page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*
* At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.
* This introduces differences compared to CLOG and the other SLRUs having (1
@@ -957,7 +957,7 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
* 128 entries of its page. Since this function doesn't know the location of
* oldestXact within page2, it returns false for one page that actually is
* expendable. This is a wider (yet still negligible) version of the
- * truncation opportunity that CLOGPagePrecedes() cannot recognize.
+ * truncation opportunity that CLOGPageDiff() cannot recognize.
*
* For the sake of a worked example, number entries with decimal values such
* that page1==1 entries range from 1.0 to 1.999. Let N+0.15 be the number of
@@ -967,19 +967,20 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
* last entry of the oldestXact page. While page 2 is expendable at
* oldestXact=N+2.1, it would be precious at oldestXact=N+2.9.
*/
-static bool
-CommitTsPagePrecedes(int page1, int page2)
+static int32
+CommitTsPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + COMMIT_TS_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index d3b541b..574f6f7 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -102,7 +102,7 @@
* 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
* take no explicit notice of that fact in this module, except when comparing
* segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
+ * MultiXactOffsetPageDiff).
*/
/* We need four bytes per offset */
@@ -355,8 +355,8 @@ static char *mxstatus_to_string(MultiXactStatus status);
/* management of SLRU infrastructure */
static int ZeroMultiXactOffsetPage(int pageno, bool writeXlog);
static int ZeroMultiXactMemberPage(int pageno, bool writeXlog);
-static bool MultiXactOffsetPagePrecedes(int page1, int page2);
-static bool MultiXactMemberPagePrecedes(int page1, int page2);
+static int32 MultiXactOffsetPageDiff(int page1, int page2);
+static int32 MultiXactMemberPageDiff(int page1, int page2);
static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
@@ -1825,14 +1825,14 @@ MultiXactShmemInit(void)
debug_elog2(DEBUG2, "Shared Memory Init for MultiXact");
- MultiXactOffsetCtl->PagePrecedes = MultiXactOffsetPagePrecedes;
- MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
+ MultiXactOffsetCtl->PageDiff = MultiXactOffsetPageDiff;
+ MultiXactMemberCtl->PageDiff = MultiXactMemberPageDiff;
SimpleLruInit(MultiXactOffsetCtl,
"MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
MultiXactOffsetSLRULock, "pg_multixact/offsets",
LWTRANCHE_MULTIXACTOFFSET_BUFFER);
- SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
+ SlruPageDiffUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
MultiXactMemberSLRULock, "pg_multixact/members",
@@ -2859,7 +2859,7 @@ SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
mxtruncinfo *trunc = (mxtruncinfo *) data;
if (trunc->earliestExistingPage == -1 ||
- ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
+ ctl->PageDiff(segpage, trunc->earliestExistingPage))
{
trunc->earliestExistingPage = segpage;
}
@@ -2978,11 +2978,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
*
* When nextMXact is less than one segment away from multiWrapLimit,
* SlruScanDirCbFindEarliest can find some early segment other than the
- * actual earliest. (MultiXactOffsetPagePrecedes(EARLIEST, LATEST)
- * returns false, because not all pairs of entries have the same answer.)
- * That can also arise when an earlier truncation attempt failed unlink()
- * or returned early from this function. The only consequence is
- * returning early, which wastes space that we could have liberated.
+ * actual earliest. (MultiXactOffsetPageDiff(EARLIEST, LATEST) >= 0,
+ * because not all pairs of entries have the same answer.) That can also
+ * arise when an earlier truncation attempt failed unlink(), returned
+ * early from this function, or saw SlruWouldTruncateSegment() decline to
+ * delete the older half of the SLRU. The only consequence is returning
+ * early, which wastes space that we could have liberated.
*
* NB: It's also possible that the page that oldestMulti is on has already
* been truncated away, and we crashed before updating oldestMulti.
@@ -3098,44 +3099,42 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
}
/*
- * Decide whether a MultiXactOffset page number is "older" for truncation
- * purposes. Analogous to CLOGPagePrecedes().
- *
- * Offsetting the values is optional, because MultiXactIdPrecedes() has
- * translational symmetry.
+ * Diff MultiXactOffset page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*/
-static bool
-MultiXactOffsetPagePrecedes(int page1, int page2)
+static int32
+MultiXactOffsetPageDiff(int page1, int page2)
{
MultiXactId multi1;
MultiXactId multi2;
+ int32 diff_head;
+ int32 diff_tail;
multi1 = ((MultiXactId) page1) * MULTIXACT_OFFSETS_PER_PAGE;
- multi1 += FirstMultiXactId + 1;
multi2 = ((MultiXactId) page2) * MULTIXACT_OFFSETS_PER_PAGE;
- multi2 += FirstMultiXactId + 1;
- return (MultiXactIdPrecedes(multi1, multi2) &&
- MultiXactIdPrecedes(multi1,
- multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1));
+ diff_head = multi1 - multi2;
+ diff_tail = multi1 - (multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
/*
- * Decide whether a MultiXactMember page number is "older" for truncation
- * purposes. There is no "invalid offset number" so use the numbers verbatim.
+ * Diff MultiXactMember page numbers for truncation purposes.
*/
-static bool
-MultiXactMemberPagePrecedes(int page1, int page2)
+static int32
+MultiXactMemberPageDiff(int page1, int page2)
{
MultiXactOffset offset1;
MultiXactOffset offset2;
+ int32 diff_head;
+ int32 diff_tail;
offset1 = ((MultiXactOffset) page1) * MULTIXACT_MEMBERS_PER_PAGE;
offset2 = ((MultiXactOffset) page2) * MULTIXACT_MEMBERS_PER_PAGE;
- return (MultiXactOffsetPrecedes(offset1, offset2) &&
- MultiXactOffsetPrecedes(offset1,
- offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1));
+ diff_head = offset1 - offset2;
+ diff_tail = offset1 - (offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
/*
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index dbef7c4..c987595 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -248,7 +248,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
/*
* Initialize the unshared control struct, including directory path. We
- * assume caller set PagePrecedes.
+ * assume caller set PageDiff.
*/
ctl->shared = shared;
ctl->do_fsync = true; /* default behavior */
@@ -1067,8 +1067,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
{
if (this_delta > best_valid_delta ||
(this_delta == best_valid_delta &&
- ctl->PagePrecedes(this_page_number,
- best_valid_page_number)))
+ ctl->PageDiff(this_page_number,
+ best_valid_page_number) < 0))
{
bestvalidslot = slotno;
best_valid_delta = this_delta;
@@ -1079,8 +1079,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
{
if (this_delta > best_invalid_delta ||
(this_delta == best_invalid_delta &&
- ctl->PagePrecedes(this_page_number,
- best_invalid_page_number)))
+ ctl->PageDiff(this_page_number,
+ best_invalid_page_number) < 0))
{
bestinvalidslot = slotno;
best_invalid_delta = this_delta;
@@ -1190,7 +1190,8 @@ SimpleLruFlush(SlruCtl ctl, bool allow_redirtied)
}
/*
- * Remove all segments before the one holding the passed page number
+ * Remove some obsolete segments. As defense in depth, this deletes less than
+ * PageDiff() authorizes; see SlruWouldTruncateSegment().
*
* All SLRUs prevent concurrent calls to this function, either with an LWLock
* or by calling it only as part of a checkpoint. Mutual exclusion must begin
@@ -1223,7 +1224,7 @@ restart:;
* While we are holding the lock, make an important safety check: the
* current endpoint page must not be eligible for removal.
*/
- if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+ if (ctl->PageDiff(shared->latest_page_number, cutoffPage) < 0)
{
LWLockRelease(shared->ControlLock);
ereport(LOG,
@@ -1236,7 +1237,7 @@ restart:;
{
if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
continue;
- if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
+ if (ctl->PageDiff(shared->page_number[slotno], cutoffPage) >= 0)
continue;
/*
@@ -1348,33 +1349,46 @@ restart:
}
/*
- * Determine whether a segment is okay to delete.
+ * Determine whether to delete a segment.
*
* segpage is the first page of the segment, and cutoffPage is the oldest (in
- * PagePrecedes order) page in the SLRU containing still-useful data. Since
- * every core PagePrecedes callback implements "wrap around", check the
+ * PageDiff order) page in the SLRU containing still-useful data. Check the
* segment's first and last pages:
*
* first<cutoff && last<cutoff: yes
* first<cutoff && last>=cutoff: no; cutoff falls inside this segment
* first>=cutoff && last<cutoff: no; wrap point falls inside this segment
* first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ *
+ * The PageDiff specification requires us not to remove pages where the
+ * callback reports negative values close to INT_MIN. Our interpretation is
+ * to decline to delete segments containing a page P such that PageDiff(P,
+ * cutoffPage) is in [INT_MIN, INT_MIN/2].
*/
static bool
-SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+SlruWouldTruncateSegment(SlruCtl ctl, int segpage, int cutoffPage)
{
- int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+ int first_page_diff;
Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
- return (ctl->PagePrecedes(segpage, cutoffPage) &&
- ctl->PagePrecedes(seg_last_page, cutoffPage));
+ first_page_diff = ctl->PageDiff(segpage, cutoffPage);
+ if (first_page_diff < 0 && first_page_diff > INT_MIN / 2)
+ {
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+ int last_page_diff = ctl->PageDiff(seg_last_page, cutoffPage);
+
+ return last_page_diff < 0 && last_page_diff > INT_MIN / 2;
+ }
+ return false;
}
#ifdef USE_ASSERT_CHECKING
static void
-SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
+SlruPageDiffTestOffset(SlruCtl ctl, int per_page, uint32 offset)
{
+ int32 large_negative = INT_MIN / 1000 * 999,
+ large_positive = INT_MAX / 1000 * 999;
TransactionId lhs,
rhs;
int newestPage,
@@ -1398,19 +1412,27 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
Assert(!TransactionIdPrecedes(rhs, lhs + 1));
Assert(!TransactionIdFollowsOrEquals(lhs, rhs));
Assert(!TransactionIdFollowsOrEquals(rhs, lhs));
- Assert(!ctl->PagePrecedes(lhs / per_page, lhs / per_page));
- Assert(!ctl->PagePrecedes(lhs / per_page, rhs / per_page));
- Assert(!ctl->PagePrecedes(rhs / per_page, lhs / per_page));
- Assert(!ctl->PagePrecedes((lhs - per_page) / per_page, rhs / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 3 * per_page) / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 2 * per_page) / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 1 * per_page) / per_page)
- || (1U << 31) % per_page != 0); /* See CommitTsPagePrecedes() */
- Assert(ctl->PagePrecedes((lhs + 1 * per_page) / per_page, rhs / per_page)
+ Assert(ctl->PageDiff(lhs / per_page, lhs / per_page) == 0);
+ Assert(ctl->PageDiff(lhs / per_page, rhs / per_page) > large_positive);
+ Assert(ctl->PageDiff(rhs / per_page, lhs / per_page) > large_positive);
+ Assert(ctl->PageDiff((lhs - per_page) / per_page, rhs / per_page) >
+ large_positive);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 3 * per_page) / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 2 * per_page) / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 1 * per_page) / per_page) <
+ large_negative
+ || (1U << 31) % per_page != 0); /* See CommitTsPageDiff() */
+ Assert(ctl->PageDiff((lhs + 1 * per_page) / per_page, rhs / per_page) <
+ large_negative
|| (1U << 31) % per_page != 0);
- Assert(ctl->PagePrecedes((lhs + 2 * per_page) / per_page, rhs / per_page));
- Assert(ctl->PagePrecedes((lhs + 3 * per_page) / per_page, rhs / per_page));
- Assert(!ctl->PagePrecedes(rhs / per_page, (lhs + per_page) / per_page));
+ Assert(ctl->PageDiff((lhs + 2 * per_page) / per_page, rhs / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff((lhs + 3 * per_page) / per_page, rhs / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs + per_page) / per_page) >
+ large_positive);
/*
* GetNewTransactionId() has assigned the last XID it can safely use, and
@@ -1423,10 +1445,10 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
oldestXact = newestXact + 1;
oldestXact -= 1U << 31;
oldestPage = oldestXact / per_page;
- Assert(!SlruMayDeleteSegment(ctl,
- (newestPage -
- newestPage % SLRU_PAGES_PER_SEGMENT),
- oldestPage));
+ Assert(!SlruWouldTruncateSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
/*
* GetNewTransactionId() has assigned the last XID it can safely use, and
@@ -1439,42 +1461,44 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
oldestXact = newestXact + 1;
oldestXact -= 1U << 31;
oldestPage = oldestXact / per_page;
- Assert(!SlruMayDeleteSegment(ctl,
- (newestPage -
- newestPage % SLRU_PAGES_PER_SEGMENT),
- oldestPage));
+ Assert(!SlruWouldTruncateSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
}
/*
- * Unit-test a PagePrecedes function.
+ * Unit-test a PageDiff function.
*
* This assumes every uint32 >= FirstNormalTransactionId is a valid key. It
* assumes each value occupies a contiguous, fixed-size region of SLRU bytes.
* (MultiXactMemberCtl separates flags from XIDs. AsyncCtl has
* variable-length entries, no keys, and no random access. These unit tests
* do not apply to them.)
+ *
+ * This is stricter than the PageDiff API requires.
*/
void
-SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page)
+SlruPageDiffUnitTests(SlruCtl ctl, int per_page)
{
/* Test first, middle and last entries of a page. */
- SlruPagePrecedesTestOffset(ctl, per_page, 0);
- SlruPagePrecedesTestOffset(ctl, per_page, per_page / 2);
- SlruPagePrecedesTestOffset(ctl, per_page, per_page - 1);
+ SlruPageDiffTestOffset(ctl, per_page, 0);
+ SlruPageDiffTestOffset(ctl, per_page, per_page / 2);
+ SlruPageDiffTestOffset(ctl, per_page, per_page - 1);
}
#endif
/*
* SlruScanDirectory callback
- * This callback reports true if there's any segment wholly prior to the
- * one containing the page passed as "data".
+ * This callback reports true if SimpleLruTruncate(ctl, *data) would
+ * unlink any segment.
*/
bool
-SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
+SlruScanDirCbWouldTruncate(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
+ if (SlruWouldTruncateSegment(ctl, segpage, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1489,7 +1513,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
+ if (SlruWouldTruncateSegment(ctl, segpage, cutoffPage))
SlruInternalDeleteSegment(ctl, filename);
return false; /* keep going */
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0a7e33d..bac0bbb 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -44,7 +44,7 @@
* 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE, and segment numbering at
* 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateSUBTRANS (see SubTransPagePrecedes) and zeroing
+ * and page numbers in TruncateSUBTRANS (see SubTransPageDiff) and zeroing
* them in StartupSUBTRANS.
*/
@@ -64,7 +64,7 @@ static SlruCtlData SubTransCtlData;
static int ZeroSUBTRANSPage(int pageno);
-static bool SubTransPagePrecedes(int page1, int page2);
+static int32 SubTransPageDiff(int page1, int page2);
/*
@@ -190,13 +190,13 @@ SUBTRANSShmemSize(void)
void
SUBTRANSShmemInit(void)
{
- SubTransCtl->PagePrecedes = SubTransPagePrecedes;
+ SubTransCtl->PageDiff = SubTransPageDiff;
SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
SubtransSLRULock, "pg_subtrans",
LWTRANCHE_SUBTRANS_BUFFER);
/* Override default assumption that writes should be fsync'd */
SubTransCtl->do_fsync = false;
- SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
}
/*
@@ -374,20 +374,21 @@ TruncateSUBTRANS(TransactionId oldestXact)
/*
- * Decide whether a SUBTRANS page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff SUBTRANS page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*/
-static bool
-SubTransPagePrecedes(int page1, int page2)
+static int32
+SubTransPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * SUBTRANS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SUBTRANS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + SUBTRANS_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 9dcb3a8..1eb7174 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -207,13 +207,13 @@ typedef struct QueuePosition
/* choose logically smaller QueuePosition */
#define QUEUE_POS_MIN(x,y) \
- (asyncQueuePagePrecedes((x).page, (y).page) ? (x) : \
+ (asyncQueuePageDiff((x).page, (y).page) < 0 ? (x) : \
(x).page != (y).page ? (y) : \
(x).offset < (y).offset ? (x) : (y))
/* choose logically larger QueuePosition */
#define QUEUE_POS_MAX(x,y) \
- (asyncQueuePagePrecedes((x).page, (y).page) ? (y) : \
+ (asyncQueuePageDiff((x).page, (y).page) < 0 ? (y) : \
(x).page != (y).page ? (x) : \
(x).offset > (y).offset ? (x) : (y))
@@ -433,8 +433,7 @@ static bool backendTryAdvanceTail = false;
bool Trace_notify = false;
/* local function prototypes */
-static int asyncQueuePageDiff(int p, int q);
-static bool asyncQueuePagePrecedes(int p, int q);
+static int32 asyncQueuePageDiff(int p, int q);
static void queue_listen(ListenActionKind action, const char *channel);
static void Async_UnlistenOnExit(int code, Datum arg);
static void Exec_ListenPreCommit(void);
@@ -465,12 +464,16 @@ static void ClearPendingActionsAndNotifies(void);
/*
* Compute the difference between two queue page numbers (i.e., p - q),
- * accounting for wraparound.
+ * accounting for wraparound. Since asyncQueueIsFull() blocks creation of a
+ * page that could precede any extant page, we need not assess entries within
+ * a page.
*/
-static int
+static int32
asyncQueuePageDiff(int p, int q)
{
- int diff;
+ int diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,
+ diff;
+ int32 scale = INT_MAX / diff_max;
/*
* We have to compare modulo (QUEUE_MAX_PAGE+1)/2. Both inputs should be
@@ -484,19 +487,24 @@ asyncQueuePageDiff(int p, int q)
diff -= QUEUE_MAX_PAGE + 1;
else if (diff < -((QUEUE_MAX_PAGE + 1) / 2))
diff += QUEUE_MAX_PAGE + 1;
- return diff;
+ return diff * scale;
}
-/*
- * Is p < q, accounting for wraparound?
- *
- * Since asyncQueueIsFull() blocks creation of a page that could precede any
- * extant page, we need not assess entries within a page.
- */
-static bool
-asyncQueuePagePrecedes(int p, int q)
+static void
+asyncQueuePageDiffUnitTests(void)
{
- return asyncQueuePageDiff(p, q) < 0;
+ int32 large_negative = INT_MIN / 1000 * 999,
+ large_positive = INT_MAX / 1000 * 999;
+ int diff_min = -((QUEUE_MAX_PAGE + 1) / 2),
+ diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1;
+
+ Assert(asyncQueuePageDiff(diff_max, diff_max) == 0);
+ Assert(asyncQueuePageDiff(diff_max, 0) > large_positive);
+ Assert(asyncQueuePageDiff(diff_max + 1, 0) < large_negative);
+ Assert(asyncQueuePageDiff(0, QUEUE_MAX_PAGE + diff_min + 1) <
+ large_negative);
+ Assert(asyncQueuePageDiff(0, QUEUE_MAX_PAGE + diff_min + 2) >
+ large_positive);
}
/*
@@ -557,11 +565,12 @@ AsyncShmemInit(void)
/*
* Set up SLRU management of the pg_notify data.
*/
- NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
+ NotifyCtl->PageDiff = asyncQueuePageDiff;
SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER);
/* Override default assumption that writes should be fsync'd */
NotifyCtl->do_fsync = false;
+ asyncQueuePageDiffUnitTests();
if (!found)
{
@@ -1366,7 +1375,7 @@ asyncQueueIsFull(void)
nexthead = 0; /* wrap around */
boundary = QUEUE_POS_PAGE(QUEUE_TAIL);
boundary -= boundary % SLRU_PAGES_PER_SEGMENT;
- return asyncQueuePagePrecedes(nexthead, boundary);
+ return asyncQueuePageDiff(nexthead, boundary) < 0;
}
/*
@@ -2205,7 +2214,7 @@ asyncQueueAdvanceTail(void)
*/
newtailpage = QUEUE_POS_PAGE(min);
boundary = newtailpage - (newtailpage % SLRU_PAGES_PER_SEGMENT);
- if (asyncQueuePagePrecedes(oldtailpage, boundary))
+ if (asyncQueuePageDiff(oldtailpage, boundary) < 0)
{
/*
* SimpleLruTruncate() will ask for NotifySLRULock but will also
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 5afcb18..78549fa 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -438,7 +438,7 @@ static void SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact, SERIALIZABLEXACT
static void ReleaseRWConflict(RWConflict conflict);
static void FlagSxactUnsafe(SERIALIZABLEXACT *sxact);
-static bool SerialPagePrecedesLogically(int page1, int page2);
+static int32 SerialPageDiffLogically(int page1, int page2);
static void SerialInit(void);
static void SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo);
static SerCommitSeqNo SerialGetMinConflictCommitSeqNo(TransactionId xid);
@@ -784,26 +784,30 @@ FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
/*------------------------------------------------------------------------*/
/*
- * Decide whether a Serial page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff Serial page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
+ *
+ * This must follow stricter rules than PageDiff demands, for the benefit of
+ * the call local to this file.
*/
-static bool
-SerialPagePrecedesLogically(int page1, int page2)
+static int32
+SerialPageDiffLogically(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * SERIAL_ENTRIESPERPAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SERIAL_ENTRIESPERPAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + SERIAL_ENTRIESPERPAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + SERIAL_ENTRIESPERPAGE - 1);
+ return Max(diff_head, diff_tail);
}
static void
-SerialPagePrecedesLogicallyUnitTests(void)
+SerialPageDiffLogicallyUnitTests(void)
{
int per_page = SERIAL_ENTRIESPERPAGE,
offset = per_page / 2;
@@ -826,21 +830,21 @@ SerialPagePrecedesLogicallyUnitTests(void)
* In this scenario, the SLRU headPage pertains to the last ~1000 XIDs
* assigned. oldestXact finishes, ~2B XIDs having elapsed since it
* started. Further transactions cause us to summarize oldestXact to
- * tailPage. Function must return false so SerialAdd() doesn't zero
- * tailPage (which may contain entries for other old, recently-finished
- * XIDs) and half the SLRU. Reaching this requires burning ~2B XIDs in
- * single-user mode, a negligible possibility.
+ * tailPage. Function must return non-negative so SerialAdd() doesn't
+ * zero tailPage (which may contain entries for other old,
+ * recently-finished XIDs) and half the SLRU. Reaching this requires
+ * burning ~2B XIDs in single-user mode, a negligible possibility.
*/
headPage = newestPage;
targetPage = oldestPage;
- Assert(!SerialPagePrecedesLogically(headPage, targetPage));
+ Assert(SerialPageDiffLogically(headPage, targetPage) >= 0);
/*
* In this scenario, the SLRU headPage pertains to oldestXact. We're
* summarizing an XID near newestXact. (Assume few other XIDs used
* SERIALIZABLE, hence the minimal headPage advancement. Assume
* oldestXact was long-running and only recently reached the SLRU.)
- * Function must return true to make SerialAdd() create targetPage.
+ * Function must return negative to make SerialAdd() create targetPage.
*
* Today's implementation mishandles this case, but it doesn't matter
* enough to fix. Verify that the defect affects just one page by
@@ -851,9 +855,9 @@ SerialPagePrecedesLogicallyUnitTests(void)
*/
headPage = oldestPage;
targetPage = newestPage;
- Assert(SerialPagePrecedesLogically(headPage, targetPage - 1));
+ Assert(SerialPageDiffLogically(headPage, targetPage - 1) < 0);
#if 0
- Assert(SerialPagePrecedesLogically(headPage, targetPage));
+ Assert(SerialPageDiffLogically(headPage, targetPage) < 0);
#endif
}
@@ -868,14 +872,14 @@ SerialInit(void)
/*
* Set up SLRU management of the pg_serial data.
*/
- SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
+ SerialSlruCtl->PageDiff = SerialPageDiffLogically;
SimpleLruInit(SerialSlruCtl, "Serial",
NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
LWTRANCHE_SERIAL_BUFFER);
/* Override default assumption that writes should be fsync'd */
SerialSlruCtl->do_fsync = false;
- SerialPagePrecedesLogicallyUnitTests();
- SlruPagePrecedesUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
+ SerialPageDiffLogicallyUnitTests();
+ SlruPageDiffUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
/*
* Create or attach to the SerialControl structure.
@@ -937,8 +941,8 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
else
{
firstZeroPage = SerialNextPage(serialControl->headPage);
- isNewPage = SerialPagePrecedesLogically(serialControl->headPage,
- targetPage);
+ isNewPage = SerialPageDiffLogically(serialControl->headPage,
+ targetPage) < 0;
}
if (!TransactionIdIsValid(serialControl->headXid)
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 19982f6..a8144a5 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -28,7 +28,7 @@
* xxxx is CLOG or SUBTRANS, respectively), and segment numbering at
* 0xFFFFFFFF/xxxx_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
* take no explicit notice of that fact in slru.c, except when comparing
- * segment and page numbers in SimpleLruTruncate (see PagePrecedes()).
+ * segment and page numbers in SimpleLruTruncate (see PageDiff()).
*/
#define SLRU_PAGES_PER_SEGMENT 32
@@ -117,16 +117,18 @@ typedef struct SlruCtlData
bool do_fsync;
/*
- * Decide whether a page is "older" for truncation and as a hint for
- * evicting pages in LRU order. Return true if every entry of the first
- * argument is older than every entry of the second argument. Note that
- * !PagePrecedes(a,b) && !PagePrecedes(b,a) need not imply a==b; it also
- * arises when some entries are older and some are not. For SLRUs using
- * SimpleLruTruncate(), this must use modular arithmetic. (For others,
- * the behavior of this callback has no functional implications.) Use
- * SlruPagePrecedesUnitTests() in SLRUs meeting its criteria.
+ * Compute distance between two page numbers, for truncation and as a hint
+ * for evicting pages in LRU order. Callbacks shall distribute return
+ * values uniformly in [INT_MIN,INT_MAX]. If PageDiff(P, oldest_needed)
+ * is negative but not close to INT_MIN, that implies data in page P is
+ * obsolete. The exception for values close to INT_MIN permits
+ * implementations to return such values for edge cases where the answer
+ * changes mid-page from INT_MIN to INT_MAX. Use SlruPageDiffUnitTests()
+ * in SLRUs meeting its criteria. For SLRUs using SimpleLruTruncate(),
+ * this must use modular arithmetic. (For others, the behavior of this
+ * callback has no functional implications.)
*/
- bool (*PagePrecedes) (int, int);
+ int32 (*PageDiff) (int, int);
/*
* Dir is set during SimpleLruInit and does not change thereafter. Since
@@ -149,9 +151,9 @@ extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruFlush(SlruCtl ctl, bool allow_redirtied);
#ifdef USE_ASSERT_CHECKING
-extern void SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page);
+extern void SlruPageDiffUnitTests(SlruCtl ctl, int per_page);
#else
-#define SlruPagePrecedesUnitTests(ctl, per_page) do {} while (0)
+#define SlruPageDiffUnitTests(ctl, per_page) do {} while (0)
#endif
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);
@@ -162,8 +164,8 @@ extern bool SlruScanDirectory(SlruCtl ctl, SlruScanCallback callback, void *data
extern void SlruDeleteSegment(SlruCtl ctl, int segno);
/* SlruScanDirectory public callbacks */
-extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
- int segpage, void *data);
+extern bool SlruScanDirCbWouldTruncate(SlruCtl ctl, char *filename,
+ int segpage, void *data);
extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
void *data);
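To make the new contract concrete, here is a standalone sketch (hypothetical names and a stand-in page size, not code from the patch) of a CLOG-style modular PageDiff and the deletion window that SlruWouldTruncateSegment() enforces: a segment is unlinked only if both its first and last pages diff negative against the cutoff, and not so negative as to land in [INT_MIN, INT_MIN/2], where a wrapped "future" page could masquerade as "past".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define XACTS_PER_PAGE 1024u	/* stand-in; real CLOG uses CLOG_XACTS_PER_PAGE */
#define PAGES_PER_SEGMENT 32	/* mirrors SLRU_PAGES_PER_SEGMENT */

/*
 * CLOG-style modular diff of two page numbers.  Taking the larger of the
 * head and tail distances makes a page that straddles the wrap point
 * report non-negative, so it is never treated as wholly expendable.
 */
static int32_t
page_diff(int page1, int page2)
{
	uint32_t	xid1 = (uint32_t) page1 * XACTS_PER_PAGE;
	uint32_t	xid2 = (uint32_t) page2 * XACTS_PER_PAGE;
	int32_t		diff_head = (int32_t) (xid1 - xid2);
	int32_t		diff_tail = (int32_t) (xid1 - (xid2 + XACTS_PER_PAGE - 1));

	return diff_head > diff_tail ? diff_head : diff_tail;
}

/*
 * Delete a segment only if every page in it is in the "recent past" of
 * cutoffPage: diff negative, but not in [INT32_MIN, INT32_MIN/2].
 */
static bool
would_truncate_segment(int segpage, int cutoffPage)
{
	int32_t		first = page_diff(segpage, cutoffPage);

	if (first < 0 && first > INT32_MIN / 2)
	{
		int32_t		last = page_diff(segpage + PAGES_PER_SEGMENT - 1, cutoffPage);

		return last < 0 && last > INT32_MIN / 2;
	}
	return false;
}
```

Only the INT32_MIN/2 guard distinguishes this from the pre-patch "both pages precede the cutoff" test; it is the insurance that declines to delete the older half of the circular space.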
On Sat, Aug 29, 2020 at 10:34:33PM -0700, Noah Misch wrote:
Rebased the second patch. The first patch did not need a rebase.
It looks like a new rebase is needed, the CF bot is complaining here.
--
Michael
On Mon, Sep 07, 2020 at 11:06:12AM +0900, Michael Paquier wrote:
On Sat, Aug 29, 2020 at 10:34:33PM -0700, Noah Misch wrote:
Rebased the second patch. The first patch did not need a rebase.
It looks like a new rebase is needed, the CF bot is complaining here.
http://cfbot.cputube.org/patch_29_2026.log applies the two patches in the
wrong order. For this CF entry, ignore it.
On Sun, Sep 06, 2020 at 08:14:21PM -0700, Noah Misch wrote:
http://cfbot.cputube.org/patch_29_2026.log applies the two patches in the
wrong order. For this CF entry, ignore it.
OK, thanks. This is a bug fix, so I have moved that to the next CF
for now. Noah, would you prefer more reviews or are you confident
enough to move on with this issue?
--
Michael
On Wed, Sep 30, 2020 at 03:06:40PM +0900, Michael Paquier wrote:
On Sun, Sep 06, 2020 at 08:14:21PM -0700, Noah Misch wrote:
http://cfbot.cputube.org/patch_29_2026.log applies the two patches in the
wrong order.  For this CF entry, ignore it.
OK, thanks.  This is a bug fix, so I have moved that to the next CF
for now. Noah, would you prefer more reviews or are you confident
enough to move on with this issue?
The former. I plan to wait until a review puts this in Ready for Committer.
I'd be content if someone reviews the slru-truncate-modulo patch and disclaims
knowledge of the slru-truncate-insurance patch; I would then abandon the
latter patch. I'm not fond of how the latter turned out, particularly the
unintended consequence in TruncateMultiXact(). (See the commit message and/or
the edit to the comment in TruncateMultiXact().) The subtle interaction with
SerialAdd() is not great, either.
On Sat, Aug 29, 2020 at 10:34:33PM -0700, Noah Misch wrote:
On Mon, May 25, 2020 at 12:00:33AM -0700, Noah Misch wrote:
On Mon, Apr 06, 2020 at 09:18:47PM -0700, Noah Misch wrote:
On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
So I think what we're actually trying to accomplish here is to
ensure that instead of deleting up to half of the SLRU space
before the cutoff, we delete up to half-less-one-segment.
Maybe it should be half-less-two-segments, just to provide some
cushion against edge cases. Reading the first comment in
SetTransactionIdLimit makes one not want to trust too much in
arguments based on the exact value of xidWrapLimit, while for
the other SLRUs it was already unclear whether the edge cases
were exactly right.
That could be interesting insurance.  While it would be sad for us to miss an
edge case and print "must be vacuumed within 2 transactions" when wrap has
already happened, reaching that message implies the DBA burned ~1M XIDs, all
in single-user mode. More plausible is FreezeMultiXactId() overrunning the
limit by tens of segments. Hence, if we do buy this insurance, let's skip far
more segments. For example, instead of unlinking segments representing up to
2^31 past XIDs, we could divide that into an upper half that we unlink and a
lower half. The lower half will stay in place; eventually, XID consumption
will overwrite it. Truncation behavior won't change until the region of CLOG
for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
maximum would rise from 512 MiB to 768 MiB.  Would that be worthwhile?
Temporarily wasting some disk
space is a lot more palatable than corrupting data, and these code
paths are necessarily not terribly well tested. So +1 for more
insurance.
Okay, I'll give that a try.  I expect this will replace the PagePrecedes
callback with a PageDiff callback such that PageDiff(a, b) < 0 iff
PagePrecedes(a, b). PageDiff callbacks shall distribute return values
uniformly in [INT_MIN,INT_MAX]. SimpleLruTruncate() will unlink segments
where INT_MIN/2 < PageDiff(candidate, cutoff) < 0.
While doing so, I found that slru-truncate-modulo-v2.patch did get edge cases
wrong, as you feared. In particular, if the newest XID reached xidStopLimit
and was in the first page of a segment, TruncateCLOG() would delete its
segment. Attached slru-truncate-modulo-v3.patch fixes that; as restitution, I
added unit tests covering that and other scenarios. Reaching the bug via XIDs
was hard, requiring one to burn 1000k-CLOG_XACTS_PER_PAGE=967k XIDs in
single-user mode.  I expect the bug was easier to reach via pg_multixact.
The insurance patch stacks on top of the bug fix patch.  It does have a
negative effect on TruncateMultiXact(), which uses SlruScanDirCbFindEarliest
to skip truncation in corrupted clusters. SlruScanDirCbFindEarliest() gives
nonsense answers if "future" segments exist. That can happen today, but the
patch creates new ways to make it happen. The symptom is wasting yet more
space in pg_multixact. I am okay with this, since it arises only after one
fills pg_multixact 50% full. There are alternatives. We could weaken the
corruption defense in TruncateMultiXact() or look for another implementation
of equivalent defense. We could unlink, say, 75% or 95% of the "past" instead
of 50% (this patch) or >99.99% (today's behavior).
Rebased the second patch.  The first patch did not need a rebase.
Rebased both patches, necessitated by commit dee663f changing many of the same
spots. I've updated one of the log messages for 592a589 having landed.
I've also changed a patch name stem from slru-truncate-insurance to
slru-truncate-t-insurance, so it sorts after the other patch. Perhaps that
will trick http://cfbot.cputube.org/noah-misch.html into applying the patches
in the right order. If not, continue to ignore cfbot.
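As an aside for anyone reviewing the pg_notify side of the attached patches: the scaling idea in the patched asyncQueuePageDiff() — wrap a small modular diff, then multiply by a scale factor so results spread across the int32 range as the PageDiff contract asks — can be sketched standalone.  QUEUE_MAX_PAGE here is a stand-in value, and the scale divisor is deliberately chosen so the product cannot overflow:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define QUEUE_MAX_PAGE 8191		/* stand-in; the real value is build-dependent */

/*
 * Modular diff (p - q) over a circular space of QUEUE_MAX_PAGE + 1 pages,
 * scaled so return values spread across the int32 range.  Dividing
 * INT32_MAX by half the circle keeps diff * scale representable for
 * every wrapped diff.
 */
static int32_t
queue_page_diff(int p, int q)
{
	int			half = (QUEUE_MAX_PAGE + 1) / 2;
	int32_t		scale = INT32_MAX / half;
	int			diff = p - q;

	/* wrap into [-half, half - 1] */
	if (diff >= half)
		diff -= QUEUE_MAX_PAGE + 1;
	else if (diff < -half)
		diff += QUEUE_MAX_PAGE + 1;
	return (int32_t) ((int64_t) diff * scale);
}
```

Comparisons like queue_page_diff(a, b) < 0 then reproduce the old asyncQueuePagePrecedes() answers, while distant pages report values near the int32 extremes.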
Attachments:
slru-truncate-modulo-v4.patch (text/plain; charset=us-ascii)
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Prevent excess SimpleLruTruncate() deletion.
Every core SLRU wraps around. With the exception of pg_notify, the wrap
point can fall in the middle of a page. Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback. Update each callback implementation to fit the new
specification. This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.) The bug fixed here has the same symptoms and user
followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb. Back-patch
to 9.5 (all supported versions).
Reviewed by Tom Lane.
Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 034349a..55bdac4 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -694,6 +694,7 @@ CLOGShmemInit(void)
SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
SYNC_HANDLER_CLOG);
+ SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
}
/*
@@ -912,13 +913,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
/*
- * Decide which of two CLOG page numbers is "older" for truncation purposes.
+ * Decide whether a CLOG page number is "older" for truncation purposes.
*
* We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * thing with wraparound XID arithmetic. However, TransactionIdPrecedes()
+ * would get weird about permanent xact IDs. So, offset both such that xid1,
+ * xid2, and xid + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset is
+ * relevant to page 0 and to the page preceding page 0.
+ *
+ * The page containing oldestXact-2^31 is the important edge case. The
+ * portion of that page equaling or following oldestXact-2^31 is expendable,
+ * but the portion preceding oldestXact-2^31 is not. When oldestXact-2^31 is
+ * the first XID of a page and segment, the entire page and segment is
+ * expendable, and we could truncate the segment. Recognizing that case would
+ * require making oldestXact, not just the page containing oldestXact,
+ * available to this callback. The benefit would be rare and small, so we
+ * don't optimize that edge case.
*/
static bool
CLOGPagePrecedes(int page1, int page2)
@@ -927,11 +937,12 @@ CLOGPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index cb8a968..8ffd48e 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -557,6 +557,7 @@ CommitTsShmemInit(void)
CommitTsSLRULock, "pg_commit_ts",
LWTRANCHE_COMMITTS_BUFFER,
SYNC_HANDLER_COMMIT_TS);
+ SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
commitTsShared = ShmemInitStruct("CommitTs shared",
sizeof(CommitTimestampShared),
@@ -927,14 +928,27 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
/*
- * Decide which of two commitTS page numbers is "older" for truncation
- * purposes.
+ * Decide whether a commitTS page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.
+ * This introduces differences compared to CLOG and the other SLRUs having (1
+ * << 31) % per_page == 0. This function never tests exactly
+ * TransactionIdPrecedes(x-2^31, x). When the system reaches xidStopLimit,
+ * there are two possible counts of page boundaries between oldestXact and the
+ * latest XID assigned, depending on whether oldestXact is within the first
+ * 128 entries of its page. Since this function doesn't know the location of
+ * oldestXact within page2, it returns false for one page that actually is
+ * expendable. This is a wider (yet still negligible) version of the
+ * truncation opportunity that CLOGPagePrecedes() cannot recognize.
+ *
+ * For the sake of a worked example, number entries with decimal values such
+ * that page1==1 entries range from 1.0 to 1.999. Let N+0.15 be the number of
+ * pages that 2^31 entries will span (N is an integer). If oldestXact=N+2.1,
+ * then the final safe XID assignment leaves newestXact=1.95. We keep page 2,
+ * because entry=2.85 is the border that toggles whether entries precede the
+ * last entry of the oldestXact page. While page 2 is expendable at
+ * oldestXact=N+2.1, it would be precious at oldestXact=N+2.9.
*/
static bool
CommitTsPagePrecedes(int page1, int page2)
@@ -943,11 +957,12 @@ CommitTsPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 43653fe..7423110 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1852,11 +1852,13 @@ MultiXactShmemInit(void)
MultiXactOffsetSLRULock, "pg_multixact/offsets",
LWTRANCHE_MULTIXACTOFFSET_BUFFER,
SYNC_HANDLER_MULTIXACT_OFFSET);
+ SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
MultiXactMemberSLRULock, "pg_multixact/members",
LWTRANCHE_MULTIXACTMEMBER_BUFFER,
SYNC_HANDLER_MULTIXACT_MEMBER);
+ /* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
/* Initialize our shared state struct */
MultiXactState = ShmemInitStruct("Shared MultiXact State",
@@ -2982,6 +2984,14 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
* truncate the members SLRU. So we first scan the directory to determine
* the earliest offsets page number that we can read without error.
*
+ * When nextMXact is less than one segment away from multiWrapLimit,
+ * SlruScanDirCbFindEarliest can find some early segment other than the
+ * actual earliest. (MultiXactOffsetPagePrecedes(EARLIEST, LATEST)
+ * returns false, because not all pairs of entries have the same answer.)
+ * That can also arise when an earlier truncation attempt failed unlink()
+ * or returned early from this function. The only consequence is
+ * returning early, which wastes space that we could have liberated.
+ *
* NB: It's also possible that the page that oldestMulti is on has already
* been truncated away, and we crashed before updating oldestMulti.
*/
@@ -3096,15 +3106,11 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
}
/*
- * Decide which of two MultiXactOffset page numbers is "older" for truncation
- * purposes.
+ * Decide whether a MultiXactOffset page number is "older" for truncation
+ * purposes. Analogous to CLOGPagePrecedes().
*
- * We need to use comparison of MultiXactId here in order to do the right
- * thing with wraparound. However, if we are asked about page number zero, we
- * don't want to hand InvalidMultiXactId to MultiXactIdPrecedes: it'll get
- * weird. So, offset both multis by FirstMultiXactId to avoid that.
- * (Actually, the current implementation doesn't do anything weird with
- * InvalidMultiXactId, but there's no harm in leaving this code like this.)
+ * Offsetting the values is optional, because MultiXactIdPrecedes() has
+ * translational symmetry.
*/
static bool
MultiXactOffsetPagePrecedes(int page1, int page2)
@@ -3113,15 +3119,17 @@ MultiXactOffsetPagePrecedes(int page1, int page2)
MultiXactId multi2;
multi1 = ((MultiXactId) page1) * MULTIXACT_OFFSETS_PER_PAGE;
- multi1 += FirstMultiXactId;
+ multi1 += FirstMultiXactId + 1;
multi2 = ((MultiXactId) page2) * MULTIXACT_OFFSETS_PER_PAGE;
- multi2 += FirstMultiXactId;
+ multi2 += FirstMultiXactId + 1;
- return MultiXactIdPrecedes(multi1, multi2);
+ return (MultiXactIdPrecedes(multi1, multi2) &&
+ MultiXactIdPrecedes(multi1,
+ multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1));
}
/*
- * Decide which of two MultiXactMember page numbers is "older" for truncation
+ * Decide whether a MultiXactMember page number is "older" for truncation
* purposes. There is no "invalid offset number" so use the numbers verbatim.
*/
static bool
@@ -3133,7 +3141,9 @@ MultiXactMemberPagePrecedes(int page1, int page2)
offset1 = ((MultiXactOffset) page1) * MULTIXACT_MEMBERS_PER_PAGE;
offset2 = ((MultiXactOffset) page2) * MULTIXACT_MEMBERS_PER_PAGE;
- return MultiXactOffsetPrecedes(offset1, offset2);
+ return (MultiXactOffsetPrecedes(offset1, offset2) &&
+ MultiXactOffsetPrecedes(offset1,
+ offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1));
}
/*
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 16a7898..014072f 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1231,11 +1231,6 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
pgstat_count_slru_truncate(shared->slru_stats_idx);
/*
- * The cutoff point is the start of the segment containing cutoffPage.
- */
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- /*
* Scan shared memory and remove any pages preceding the cutoff page, to
* ensure we won't rewrite them later. (Since this is normally called in
* or just after a checkpoint, any dirty pages should have been flushed
@@ -1247,9 +1242,7 @@ restart:;
/*
* While we are holding the lock, make an important safety check: the
- * planned cutoff point must be <= the current endpoint page. Otherwise we
- * have already wrapped around, and proceeding with the truncation would
- * risk removing the current segment.
+ * current endpoint page must not be eligible for removal.
*/
if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
{
@@ -1281,8 +1274,11 @@ restart:;
* Hmm, we have (or may have) I/O operations acting on the page, so
* we've got to wait for them to finish and then start again. This is
* the same logic as in SlruSelectLRUPage. (XXX if page is dirty,
- * wouldn't it be OK to just discard it without writing it? For now,
- * keep the logic the same as it was.)
+ * wouldn't it be OK to just discard it without writing it?
+ * SlruMayDeleteSegment() uses a stricter qualification, so we might
+ * not delete this page in the end; even if we don't delete it, we
+ * won't have cause to read its data again. For now, keep the logic
+ * the same as it was.)
*/
if (shared->page_status[slotno] == SLRU_PAGE_VALID)
SlruInternalWritePage(ctl, slotno, NULL);
@@ -1386,18 +1382,133 @@ restart:
}
/*
+ * Determine whether a segment is okay to delete.
+ *
+ * segpage is the first page of the segment, and cutoffPage is the oldest (in
+ * PagePrecedes order) page in the SLRU containing still-useful data. Since
+ * every core PagePrecedes callback implements "wrap around", check the
+ * segment's first and last pages:
+ *
+ * first<cutoff && last<cutoff: yes
+ * first<cutoff && last>=cutoff: no; cutoff falls inside this segment
+ * first>=cutoff && last<cutoff: no; wrap point falls inside this segment
+ * first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ */
+static bool
+SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+{
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+
+ Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
+
+ return (ctl->PagePrecedes(segpage, cutoffPage) &&
+ ctl->PagePrecedes(seg_last_page, cutoffPage));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static void
+SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
+{
+ TransactionId lhs,
+ rhs;
+ int newestPage,
+ oldestPage;
+ TransactionId newestXact,
+ oldestXact;
+
+ /*
+ * Compare an XID pair having undefined order (see RFC 1982), a pair at
+ * "opposite ends" of the XID space. TransactionIdPrecedes() treats each
+ * as preceding the other. If RHS is oldestXact, LHS is the first XID we
+ * must not assign.
+ */
+ lhs = per_page + offset; /* skip first page to avoid non-normal XIDs */
+ rhs = lhs + (1U << 31);
+ Assert(TransactionIdPrecedes(lhs, rhs));
+ Assert(TransactionIdPrecedes(rhs, lhs));
+ Assert(!TransactionIdPrecedes(lhs - 1, rhs));
+ Assert(TransactionIdPrecedes(rhs, lhs - 1));
+ Assert(TransactionIdPrecedes(lhs + 1, rhs));
+ Assert(!TransactionIdPrecedes(rhs, lhs + 1));
+ Assert(!TransactionIdFollowsOrEquals(lhs, rhs));
+ Assert(!TransactionIdFollowsOrEquals(rhs, lhs));
+ Assert(!ctl->PagePrecedes(lhs / per_page, lhs / per_page));
+ Assert(!ctl->PagePrecedes(lhs / per_page, rhs / per_page));
+ Assert(!ctl->PagePrecedes(rhs / per_page, lhs / per_page));
+ Assert(!ctl->PagePrecedes((lhs - per_page) / per_page, rhs / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 3 * per_page) / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 2 * per_page) / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 1 * per_page) / per_page)
+ || (1U << 31) % per_page != 0); /* See CommitTsPagePrecedes() */
+ Assert(ctl->PagePrecedes((lhs + 1 * per_page) / per_page, rhs / per_page)
+ || (1U << 31) % per_page != 0);
+ Assert(ctl->PagePrecedes((lhs + 2 * per_page) / per_page, rhs / per_page));
+ Assert(ctl->PagePrecedes((lhs + 3 * per_page) / per_page, rhs / per_page));
+ Assert(!ctl->PagePrecedes(rhs / per_page, (lhs + per_page) / per_page));
+
+ /*
+ * GetNewTransactionId() has assigned the last XID it can safely use, and
+ * that XID is in the *LAST* page of the second segment. We must not
+ * delete that segment.
+ */
+ newestPage = 2 * SLRU_PAGES_PER_SEGMENT - 1;
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
+ Assert(!SlruMayDeleteSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
+
+ /*
+ * GetNewTransactionId() has assigned the last XID it can safely use, and
+ * that XID is in the *FIRST* page of the second segment. We must not
+ * delete that segment.
+ */
+ newestPage = SLRU_PAGES_PER_SEGMENT;
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
+ Assert(!SlruMayDeleteSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
+}
+
+/*
+ * Unit-test a PagePrecedes function.
+ *
+ * This assumes every uint32 >= FirstNormalTransactionId is a valid key. It
+ * assumes each value occupies a contiguous, fixed-size region of SLRU bytes.
+ * (MultiXactMemberCtl separates flags from XIDs. AsyncCtl has
+ * variable-length entries, no keys, and no random access. These unit tests
+ * do not apply to them.)
+ */
+void
+SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page)
+{
+ /* Test first, middle and last entries of a page. */
+ SlruPagePrecedesTestOffset(ctl, per_page, 0);
+ SlruPagePrecedesTestOffset(ctl, per_page, per_page / 2);
+ SlruPagePrecedesTestOffset(ctl, per_page, per_page - 1);
+}
+#endif
+
+/*
* SlruScanDirectory callback
- * This callback reports true if there's any segment prior to the one
- * containing the page passed as "data".
+ * This callback reports true if there's any segment wholly prior to the
+ * one containing the page passed as "data".
*/
bool
SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1412,7 +1523,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
SlruInternalDeleteSegment(ctl, filename);
return false; /* keep going */
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0111e86..c50490d 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -194,6 +194,7 @@ SUBTRANSShmemInit(void)
SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
SubtransSLRULock, "pg_subtrans",
LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
+ SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
}
/*
@@ -354,13 +355,8 @@ TruncateSUBTRANS(TransactionId oldestXact)
/*
- * Decide which of two SUBTRANS page numbers is "older" for truncation purposes.
- *
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * Decide whether a SUBTRANS page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*/
static bool
SubTransPagePrecedes(int page1, int page2)
@@ -369,9 +365,10 @@ SubTransPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * SUBTRANS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SUBTRANS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8dbcace..9872129 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -487,7 +487,12 @@ asyncQueuePageDiff(int p, int q)
return diff;
}
-/* Is p < q, accounting for wraparound? */
+/*
+ * Is p < q, accounting for wraparound?
+ *
+ * Since asyncQueueIsFull() blocks creation of a page that could precede any
+ * extant page, we need not assess entries within a page.
+ */
static bool
asyncQueuePagePrecedes(int p, int q)
{
@@ -1348,8 +1353,8 @@ asyncQueueIsFull(void)
* logically precedes the current global tail pointer, ie, the head
* pointer would wrap around compared to the tail. We cannot create such
* a head page for fear of confusing slru.c. For safety we round the tail
- * pointer back to a segment boundary (compare the truncation logic in
- * asyncQueueAdvanceTail).
+ * pointer back to a segment boundary (truncation logic in
+ * asyncQueueAdvanceTail does not do this, so doing it here is optional).
*
* Note that this test is *not* dependent on how much space there is on
* the current head page. This is necessary because asyncQueueAddEntries
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 8a365b4..1b646a0 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -438,7 +438,7 @@ static void SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact, SERIALIZABLEXACT
static void ReleaseRWConflict(RWConflict conflict);
static void FlagSxactUnsafe(SERIALIZABLEXACT *sxact);
-static bool SerialPagePrecedesLogically(int p, int q);
+static bool SerialPagePrecedesLogically(int page1, int page2);
static void SerialInit(void);
static void SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo);
static SerCommitSeqNo SerialGetMinConflictCommitSeqNo(TransactionId xid);
@@ -784,27 +784,77 @@ FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
/*------------------------------------------------------------------------*/
/*
- * We will work on the page range of 0..SERIAL_MAX_PAGE.
- * Compares using wraparound logic, as is required by slru.c.
+ * Decide whether a Serial page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*/
static bool
-SerialPagePrecedesLogically(int p, int q)
+SerialPagePrecedesLogically(int page1, int page2)
{
- int diff;
+ TransactionId xid1;
+ TransactionId xid2;
+
+ xid1 = ((TransactionId) page1) * SERIAL_ENTRIESPERPAGE;
+ xid1 += FirstNormalTransactionId + 1;
+ xid2 = ((TransactionId) page2) * SERIAL_ENTRIESPERPAGE;
+ xid2 += FirstNormalTransactionId + 1;
+
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + SERIAL_ENTRIESPERPAGE - 1));
+}
+
+static void
+SerialPagePrecedesLogicallyUnitTests(void)
+{
+ int per_page = SERIAL_ENTRIESPERPAGE,
+ offset = per_page / 2;
+ int newestPage,
+ oldestPage,
+ headPage,
+ targetPage;
+ TransactionId newestXact,
+ oldestXact;
+
+ /* GetNewTransactionId() has assigned the last XID it can safely use. */
+ newestPage = 2 * SLRU_PAGES_PER_SEGMENT - 1; /* nothing special */
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
/*
- * We have to compare modulo (SERIAL_MAX_PAGE+1)/2. Both inputs should be
- * in the range 0..SERIAL_MAX_PAGE.
+ * In this scenario, the SLRU headPage pertains to the last ~1000 XIDs
+ * assigned. oldestXact finishes, ~2B XIDs having elapsed since it
+ * started. Further transactions cause us to summarize oldestXact to
+ * tailPage. Function must return false so SerialAdd() doesn't zero
+ * tailPage (which may contain entries for other old, recently-finished
+ * XIDs) and half the SLRU. Reaching this requires burning ~2B XIDs in
+ * single-user mode, a negligible possibility.
*/
- Assert(p >= 0 && p <= SERIAL_MAX_PAGE);
- Assert(q >= 0 && q <= SERIAL_MAX_PAGE);
-
- diff = p - q;
- if (diff >= ((SERIAL_MAX_PAGE + 1) / 2))
- diff -= SERIAL_MAX_PAGE + 1;
- else if (diff < -((int) (SERIAL_MAX_PAGE + 1) / 2))
- diff += SERIAL_MAX_PAGE + 1;
- return diff < 0;
+ headPage = newestPage;
+ targetPage = oldestPage;
+ Assert(!SerialPagePrecedesLogically(headPage, targetPage));
+
+ /*
+ * In this scenario, the SLRU headPage pertains to oldestXact. We're
+ * summarizing an XID near newestXact. (Assume few other XIDs used
+ * SERIALIZABLE, hence the minimal headPage advancement. Assume
+ * oldestXact was long-running and only recently reached the SLRU.)
+ * Function must return true to make SerialAdd() create targetPage.
+ *
+ * Today's implementation mishandles this case, but it doesn't matter
+ * enough to fix. Verify that the defect affects just one page by
+ * asserting correct treatment of its prior page. Reaching this case
+ * requires burning ~2B XIDs in single-user mode, a negligible
+ * possibility. Moreover, if it does happen, the consequence would be
+ * mild, namely a new transaction failing in SimpleLruReadPage().
+ */
+ headPage = oldestPage;
+ targetPage = newestPage;
+ Assert(SerialPagePrecedesLogically(headPage, targetPage - 1));
+#if 0
+ Assert(SerialPagePrecedesLogically(headPage, targetPage));
+#endif
}
/*
@@ -822,6 +872,8 @@ SerialInit(void)
SimpleLruInit(SerialSlruCtl, "Serial",
NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
+ SerialPagePrecedesLogicallyUnitTests();
+ SlruPagePrecedesUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
/*
* Create or attach to the SerialControl structure.
@@ -1030,7 +1082,7 @@ CheckPointPredicate(void)
}
else
{
- /*
+ /*----------
* The SLRU is no longer needed. Truncate to head before we set head
* invalid.
*
@@ -1039,6 +1091,25 @@ CheckPointPredicate(void)
* that we leave behind will appear to be new again. In that case it
* won't be removed until XID horizon advances enough to make it
* current again.
+ *
+ * XXX: This should happen in vac_truncate_clog(), not in checkpoints.
+ * Consider this scenario, starting from a system with no in-progress
+ * transactions and VACUUM FREEZE having maximized oldestXact:
+ * - Start a SERIALIZABLE transaction.
+ * - Start, finish, and summarize a SERIALIZABLE transaction, creating
+ * one SLRU page.
+ * - Consume XIDs to reach xidStopLimit.
+ * - Finish all transactions. Due to the long-running SERIALIZABLE
+ * transaction, earlier checkpoints did not touch headPage. The
+ * next checkpoint will change it, but that checkpoint happens after
+ * the end of the scenario.
+ * - VACUUM to advance XID limits.
+ * - Consume ~2M XIDs, crossing the former xidWrapLimit.
+ * - Start, finish, and summarize a SERIALIZABLE transaction.
+ * SerialAdd() declines to create the targetPage, because headPage
+ * is not regarded as in the past relative to that targetPage. The
+ * transaction instigating the summarize fails in
+ * SimpleLruReadPage().
*/
tailPage = serialControl->headPage;
serialControl->headPage = -1;
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b39b435..805dd2b 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -118,9 +118,14 @@ typedef struct SlruCtlData
SyncRequestHandler sync_handler;
/*
- * Decide which of two page numbers is "older" for truncation purposes. We
- * need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic.
+ * Decide whether a page is "older" for truncation and as a hint for
+ * evicting pages in LRU order. Return true if every entry of the first
+ * argument is older than every entry of the second argument. Note that
+ * !PagePrecedes(a,b) && !PagePrecedes(b,a) need not imply a==b; it also
+ * arises when some entries are older and some are not. For SLRUs using
+ * SimpleLruTruncate(), this must use modular arithmetic. (For others,
+ * the behavior of this callback has no functional implications.) Use
+ * SlruPagePrecedesUnitTests() in SLRUs meeting its criteria.
*/
bool (*PagePrecedes) (int, int);
@@ -145,6 +150,11 @@ extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
TransactionId xid);
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied);
+#ifdef USE_ASSERT_CHECKING
+extern void SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page);
+#else
+#define SlruPagePrecedesUnitTests(ctl, per_page) do {} while (0)
+#endif
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);
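To make the invariant in the patch above concrete, here is a minimal standalone C sketch of the two-endpoint test that SlruMayDeleteSegment() performs. This is not the patch's actual code: the constants mirror a CLOG-like geometry at BLCKSZ=8192, the function names are illustrative, and the FirstNormalTransactionId offsetting is omitted since this sketch has no notion of permanent XIDs. The point is that a segment is deletable only when both its first and last pages wholly precede the cutoff, so the segment straddling the wrap point (cutoff minus 2^31) always survives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants: BLCKSZ=8192, 4 xacts per byte, 32 pages/segment. */
#define XACTS_PER_PAGE    32768
#define PAGES_PER_SEGMENT 32

/*
 * Circular comparison in the style of TransactionIdPrecedes(): a precedes b
 * iff the signed 32-bit difference (a - b) is negative.
 */
static bool
xid_precedes(uint32_t a, uint32_t b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Page-level precedence per the new callback contract: true only if every
 * entry of page1 is older than every entry of page2, i.e. the first entry
 * of page1 precedes both the first and the last entries of page2.
 */
static bool
page_precedes(int page1, int page2)
{
	uint32_t	xid1 = (uint32_t) page1 * XACTS_PER_PAGE;
	uint32_t	xid2 = (uint32_t) page2 * XACTS_PER_PAGE;

	return xid_precedes(xid1, xid2) &&
		xid_precedes(xid1, xid2 + XACTS_PER_PAGE - 1);
}

/*
 * A segment is deletable only if both its first and last pages are wholly
 * before the cutoff.  A segment with the wrap point inside it fails one of
 * the two tests, so it is never unlinked.
 */
static bool
may_delete_segment(int segpage, int cutoff_page)
{
	int			seg_last = segpage + PAGES_PER_SEGMENT - 1;

	assert(segpage % PAGES_PER_SEGMENT == 0);
	return page_precedes(segpage, cutoff_page) &&
		page_precedes(seg_last, cutoff_page);
}
```

With cutoff page 0, the segment starting at page 65536 holds XID 2^31 (exactly half the XID space away), so its first page "precedes" the cutoff under circular arithmetic while its last entry does not; the two-endpoint test therefore refuses to delete it, which is precisely the case the pre-patch rounding mishandled.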
Attachment: slru-truncate-t-insurance-v3.patch (text/plain; charset=us-ascii)
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Unlink less in SimpleLruTruncate(), as insurance against bugs.
SimpleLruTruncate() has been unlinking every expendable file. In edge
cases, it also deleted important files. The most recent commit fixed
that. Given the history of this class of bugs evading detection, let's
not trust that patch exclusively. Instead of unlinking segments
representing up to 2^31 past XIDs, delete no more than half that much.
The balance will stay in place; eventually, XID consumption will
overwrite it. This could mitigate unknown SimpleLruTruncate() bugs and
simplify manual remediation after one has overtaken wrap limits in
single-user mode.
Truncation behavior won't change at all until an SLRU is half full.
Once it does change, a drawback is conflict with the following defense.
TruncateMultiXact() skips truncation when unexpected files exist on
disk, which this change deliberately makes more common. Hence,
pg_multixact becomes more likely to persist in consuming its maximum
storage. Also, this change may uncover bugs in SLRU page recycling by
making that more common. For SLRUs outside of pg_multixact, maximum
storage rises by 50%; for example, the CLOG maximum rises from 512 MiB
to 768 MiB. Usage in pg_multixact may double. Back-patch to 9.5 (all
supported versions).
Reviewed by FIXME.
Discussion: https://postgr.es/m/20200330052809.GB2324620@rfd.leadboat.com
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 4d8ad75..c2a6961 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -405,6 +405,11 @@
in every database at least once every two billion transactions.
</para>
+ <!-- This oversimplifies; there are (2^31)-1 XIDs in the past, the same
+ number in the future, and one incomparable. (For each pair of incomparable
+ XIDs, TransactionIdPrecedes(a, b) and TransactionIdPrecedes(b, a) both
+ return true.) None of that is important to the DBA, since xidStopLimit
+ intervenes long before. -->
<para>
The reason that periodic vacuuming solves the problem is that
<command>VACUUM</command> will mark rows as <emphasis>frozen</emphasis>, indicating that
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 55bdac4..38aef74 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -53,7 +53,7 @@
* and CLOG segment numbering at
* 0xFFFFFFFF/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCLOG (see CLOGPagePrecedes).
+ * and page numbers in TruncateCLOG (see CLOGPageDiff).
*/
/* We need two bits per xact, so four xacts fit in a byte */
@@ -90,7 +90,7 @@ static SlruCtlData XactCtlData;
static int ZeroCLOGPage(int pageno, bool writeXlog);
-static bool CLOGPagePrecedes(int page1, int page2);
+static int32 CLOGPageDiff(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno, TransactionId oldestXact,
Oid oldestXactDb);
@@ -690,11 +690,11 @@ CLOGShmemSize(void)
void
CLOGShmemInit(void)
{
- XactCtl->PagePrecedes = CLOGPagePrecedes;
+ XactCtl->PageDiff = CLOGPageDiff;
SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
SYNC_HANDLER_CLOG);
- SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
}
/*
@@ -887,7 +887,7 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
cutoffPage = TransactionIdToPage(oldestXact);
/* Check to see if there's any files that could be removed */
- if (!SlruScanDirectory(XactCtl, SlruScanDirCbReportPresence, &cutoffPage))
+ if (!SlruScanDirectory(XactCtl, SlruScanDirCbWouldTruncate, &cutoffPage))
return; /* nothing to remove */
/*
@@ -913,13 +913,14 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
/*
- * Decide whether a CLOG page number is "older" for truncation purposes.
+ * Diff CLOG page numbers for truncation purposes.
*
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, TransactionIdPrecedes()
- * would get weird about permanent xact IDs. So, offset both such that xid1,
- * xid2, and xid + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset is
- * relevant to page 0 and to the page preceding page 0.
+ * To do the right thing with wraparound XID arithmetic, this mirrors
+ * TransactionIdPrecedes(). The Max() operation ensures we return a positive
+ * value when the wrap point may fall inside these pages. (When it does, some
+ * pairs of entries have a positive diff, and other pairs have a negative
+ * diff.) Only the predicate.c SLRU needs the Max() operation; to avoid
+ * having even more corner cases to understand, all XID-indexed SLRUs do it.
*
* The page containing oldestXact-2^31 is the important edge case. The
* portion of that page equaling or following oldestXact-2^31 is expendable,
@@ -927,22 +928,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
* the first XID of a page and segment, the entire page and segment is
* expendable, and we could truncate the segment. Recognizing that case would
* require making oldestXact, not just the page containing oldestXact,
- * available to this callback. The benefit would be rare and small, so we
- * don't optimize that edge case.
+ * available to this callback. slru.c wouldn't delete the page, anyway.
*/
-static bool
-CLOGPagePrecedes(int page1, int page2)
+static int32
+CLOGPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + CLOG_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 8ffd48e..65afc8c 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -46,7 +46,7 @@
* 0xFFFFFFFF/COMMIT_TS_XACTS_PER_PAGE, and CommitTs segment numbering at
* 0xFFFFFFFF/COMMIT_TS_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCommitTs (see CommitTsPagePrecedes).
+ * and page numbers in TruncateCommitTs (see CommitTsPageDiff).
*/
/*
@@ -109,7 +109,7 @@ static void TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
RepOriginId nodeid, int slotno);
static void error_commit_ts_disabled(void);
static int ZeroCommitTsPage(int pageno, bool writeXlog);
-static bool CommitTsPagePrecedes(int page1, int page2);
+static int32 CommitTsPageDiff(int page1, int page2);
static void ActivateCommitTs(void);
static void DeactivateCommitTs(void);
static void WriteZeroPageXlogRec(int pageno);
@@ -552,12 +552,12 @@ CommitTsShmemInit(void)
{
bool found;
- CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
+ CommitTsCtl->PageDiff = CommitTsPageDiff;
SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
CommitTsSLRULock, "pg_commit_ts",
LWTRANCHE_COMMITTS_BUFFER,
SYNC_HANDLER_COMMIT_TS);
- SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
commitTsShared = ShmemInitStruct("CommitTs shared",
sizeof(CommitTimestampShared),
@@ -875,7 +875,7 @@ TruncateCommitTs(TransactionId oldestXact)
cutoffPage = TransactionIdToCTsPage(oldestXact);
/* Check to see if there's any files that could be removed */
- if (!SlruScanDirectory(CommitTsCtl, SlruScanDirCbReportPresence,
+ if (!SlruScanDirectory(CommitTsCtl, SlruScanDirCbWouldTruncate,
&cutoffPage))
return; /* nothing to remove */
@@ -928,8 +928,8 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
/*
- * Decide whether a commitTS page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff commitTS page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*
* At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.
* This introduces differences compared to CLOG and the other SLRUs having (1
@@ -940,7 +940,7 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
* 128 entries of its page. Since this function doesn't know the location of
* oldestXact within page2, it returns false for one page that actually is
* expendable. This is a wider (yet still negligible) version of the
- * truncation opportunity that CLOGPagePrecedes() cannot recognize.
+ * truncation opportunity that CLOGPageDiff() cannot recognize.
*
* For the sake of a worked example, number entries with decimal values such
* that page1==1 entries range from 1.0 to 1.999. Let N+0.15 be the number of
@@ -950,19 +950,20 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
* last entry of the oldestXact page. While page 2 is expendable at
* oldestXact=N+2.1, it would be precious at oldestXact=N+2.9.
*/
-static bool
-CommitTsPagePrecedes(int page1, int page2)
+static int32
+CommitTsPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + COMMIT_TS_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 7423110..4772aa0 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -102,7 +102,7 @@
* 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
* take no explicit notice of that fact in this module, except when comparing
* segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
+ * MultiXactOffsetPageDiff).
*/
/* We need four bytes per offset */
@@ -355,8 +355,8 @@ static char *mxstatus_to_string(MultiXactStatus status);
/* management of SLRU infrastructure */
static int ZeroMultiXactOffsetPage(int pageno, bool writeXlog);
static int ZeroMultiXactMemberPage(int pageno, bool writeXlog);
-static bool MultiXactOffsetPagePrecedes(int page1, int page2);
-static bool MultiXactMemberPagePrecedes(int page1, int page2);
+static int32 MultiXactOffsetPageDiff(int page1, int page2);
+static int32 MultiXactMemberPageDiff(int page1, int page2);
static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
@@ -1844,15 +1844,15 @@ MultiXactShmemInit(void)
debug_elog2(DEBUG2, "Shared Memory Init for MultiXact");
- MultiXactOffsetCtl->PagePrecedes = MultiXactOffsetPagePrecedes;
- MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
+ MultiXactOffsetCtl->PageDiff = MultiXactOffsetPageDiff;
+ MultiXactMemberCtl->PageDiff = MultiXactMemberPageDiff;
SimpleLruInit(MultiXactOffsetCtl,
"MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
MultiXactOffsetSLRULock, "pg_multixact/offsets",
LWTRANCHE_MULTIXACTOFFSET_BUFFER,
SYNC_HANDLER_MULTIXACT_OFFSET);
- SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
+ SlruPageDiffUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
MultiXactMemberSLRULock, "pg_multixact/members",
@@ -2867,7 +2867,7 @@ SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
mxtruncinfo *trunc = (mxtruncinfo *) data;
if (trunc->earliestExistingPage == -1 ||
- ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
+ ctl->PageDiff(segpage, trunc->earliestExistingPage) < 0)
{
trunc->earliestExistingPage = segpage;
}
@@ -2986,11 +2986,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
*
* When nextMXact is less than one segment away from multiWrapLimit,
* SlruScanDirCbFindEarliest can find some early segment other than the
- * actual earliest. (MultiXactOffsetPagePrecedes(EARLIEST, LATEST)
- * returns false, because not all pairs of entries have the same answer.)
- * That can also arise when an earlier truncation attempt failed unlink()
- * or returned early from this function. The only consequence is
- * returning early, which wastes space that we could have liberated.
+ * actual earliest. (MultiXactOffsetPageDiff(EARLIEST, LATEST) >= 0,
+ * because not all pairs of entries have the same answer.) That can also
+ * arise when an earlier truncation attempt failed unlink(), returned
+ * early from this function, or saw SlruWouldTruncateSegment() decline to
+ * delete the older half of the SLRU. The only consequence is returning
+ * early, which wastes space that we could have liberated.
*
* NB: It's also possible that the page that oldestMulti is on has already
* been truncated away, and we crashed before updating oldestMulti.
@@ -3106,44 +3107,42 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
}
/*
- * Decide whether a MultiXactOffset page number is "older" for truncation
- * purposes. Analogous to CLOGPagePrecedes().
- *
- * Offsetting the values is optional, because MultiXactIdPrecedes() has
- * translational symmetry.
+ * Diff MultiXactOffset page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*/
-static bool
-MultiXactOffsetPagePrecedes(int page1, int page2)
+static int32
+MultiXactOffsetPageDiff(int page1, int page2)
{
MultiXactId multi1;
MultiXactId multi2;
+ int32 diff_head;
+ int32 diff_tail;
multi1 = ((MultiXactId) page1) * MULTIXACT_OFFSETS_PER_PAGE;
- multi1 += FirstMultiXactId + 1;
multi2 = ((MultiXactId) page2) * MULTIXACT_OFFSETS_PER_PAGE;
- multi2 += FirstMultiXactId + 1;
- return (MultiXactIdPrecedes(multi1, multi2) &&
- MultiXactIdPrecedes(multi1,
- multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1));
+ diff_head = multi1 - multi2;
+ diff_tail = multi1 - (multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
/*
- * Decide whether a MultiXactMember page number is "older" for truncation
- * purposes. There is no "invalid offset number" so use the numbers verbatim.
+ * Diff MultiXactMember page numbers for truncation purposes.
*/
-static bool
-MultiXactMemberPagePrecedes(int page1, int page2)
+static int32
+MultiXactMemberPageDiff(int page1, int page2)
{
MultiXactOffset offset1;
MultiXactOffset offset2;
+ int32 diff_head;
+ int32 diff_tail;
offset1 = ((MultiXactOffset) page1) * MULTIXACT_MEMBERS_PER_PAGE;
offset2 = ((MultiXactOffset) page2) * MULTIXACT_MEMBERS_PER_PAGE;
- return (MultiXactOffsetPrecedes(offset1, offset2) &&
- MultiXactOffsetPrecedes(offset1,
- offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1));
+ diff_head = offset1 - offset2;
+ diff_tail = offset1 - (offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
/*
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 014072f..d6097ab 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -260,7 +260,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
/*
* Initialize the unshared control struct, including directory path. We
- * assume caller set PagePrecedes.
+ * assume caller set PageDiff.
*/
ctl->shared = shared;
ctl->sync_handler = sync_handler;
@@ -1091,8 +1091,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
{
if (this_delta > best_valid_delta ||
(this_delta == best_valid_delta &&
- ctl->PagePrecedes(this_page_number,
- best_valid_page_number)))
+ ctl->PageDiff(this_page_number,
+ best_valid_page_number) < 0))
{
bestvalidslot = slotno;
best_valid_delta = this_delta;
@@ -1103,8 +1103,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
{
if (this_delta > best_invalid_delta ||
(this_delta == best_invalid_delta &&
- ctl->PagePrecedes(this_page_number,
- best_invalid_page_number)))
+ ctl->PageDiff(this_page_number,
+ best_invalid_page_number) < 0))
{
bestinvalidslot = slotno;
best_invalid_delta = this_delta;
@@ -1211,7 +1211,8 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
}
/*
- * Remove all segments before the one holding the passed page number
+ * Remove some obsolete segments. As defense in depth, this deletes less than
+ * PageDiff() authorizes; see SlruWouldTruncateSegment().
*
* All SLRUs prevent concurrent calls to this function, either with an LWLock
* or by calling it only as part of a checkpoint. Mutual exclusion must begin
@@ -1244,7 +1245,7 @@ restart:;
* While we are holding the lock, make an important safety check: the
* current endpoint page must not be eligible for removal.
*/
- if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+ if (ctl->PageDiff(shared->latest_page_number, cutoffPage) < 0)
{
LWLockRelease(shared->ControlLock);
ereport(LOG,
@@ -1257,7 +1258,7 @@ restart:;
{
if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
continue;
- if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
+ if (ctl->PageDiff(shared->page_number[slotno], cutoffPage) >= 0)
continue;
/*
@@ -1382,33 +1383,46 @@ restart:
}
/*
- * Determine whether a segment is okay to delete.
+ * Determine whether to delete a segment.
*
* segpage is the first page of the segment, and cutoffPage is the oldest (in
- * PagePrecedes order) page in the SLRU containing still-useful data. Since
- * every core PagePrecedes callback implements "wrap around", check the
+ * PageDiff order) page in the SLRU containing still-useful data. Check the
* segment's first and last pages:
*
* first<cutoff && last<cutoff: yes
* first<cutoff && last>=cutoff: no; cutoff falls inside this segment
* first>=cutoff && last<cutoff: no; wrap point falls inside this segment
* first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ *
+ * The PageDiff specification requires us not to remove pages where the
+ * callback reports negative values close to INT_MIN. Our interpretation is
+ * to decline to delete segments containing a page P such that PageDiff(P,
+ * cutoffPage) is in [INT_MIN, INT_MIN/2].
*/
static bool
-SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+SlruWouldTruncateSegment(SlruCtl ctl, int segpage, int cutoffPage)
{
- int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+ int first_page_diff;
Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
- return (ctl->PagePrecedes(segpage, cutoffPage) &&
- ctl->PagePrecedes(seg_last_page, cutoffPage));
+ first_page_diff = ctl->PageDiff(segpage, cutoffPage);
+ if (first_page_diff < 0 && first_page_diff > INT_MIN / 2)
+ {
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+ int last_page_diff = ctl->PageDiff(seg_last_page, cutoffPage);
+
+ return last_page_diff < 0 && last_page_diff > INT_MIN / 2;
+ }
+ return false;
}
#ifdef USE_ASSERT_CHECKING
static void
-SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
+SlruPageDiffTestOffset(SlruCtl ctl, int per_page, uint32 offset)
{
+ int32 large_negative = INT_MIN / 1000 * 999,
+ large_positive = INT_MAX / 1000 * 999;
TransactionId lhs,
rhs;
int newestPage,
@@ -1432,19 +1446,27 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
Assert(!TransactionIdPrecedes(rhs, lhs + 1));
Assert(!TransactionIdFollowsOrEquals(lhs, rhs));
Assert(!TransactionIdFollowsOrEquals(rhs, lhs));
- Assert(!ctl->PagePrecedes(lhs / per_page, lhs / per_page));
- Assert(!ctl->PagePrecedes(lhs / per_page, rhs / per_page));
- Assert(!ctl->PagePrecedes(rhs / per_page, lhs / per_page));
- Assert(!ctl->PagePrecedes((lhs - per_page) / per_page, rhs / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 3 * per_page) / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 2 * per_page) / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 1 * per_page) / per_page)
- || (1U << 31) % per_page != 0); /* See CommitTsPagePrecedes() */
- Assert(ctl->PagePrecedes((lhs + 1 * per_page) / per_page, rhs / per_page)
+ Assert(ctl->PageDiff(lhs / per_page, lhs / per_page) == 0);
+ Assert(ctl->PageDiff(lhs / per_page, rhs / per_page) > large_positive);
+ Assert(ctl->PageDiff(rhs / per_page, lhs / per_page) > large_positive);
+ Assert(ctl->PageDiff((lhs - per_page) / per_page, rhs / per_page) >
+ large_positive);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 3 * per_page) / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 2 * per_page) / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 1 * per_page) / per_page) <
+ large_negative
+ || (1U << 31) % per_page != 0); /* See CommitTsPageDiff() */
+ Assert(ctl->PageDiff((lhs + 1 * per_page) / per_page, rhs / per_page) <
+ large_negative
|| (1U << 31) % per_page != 0);
- Assert(ctl->PagePrecedes((lhs + 2 * per_page) / per_page, rhs / per_page));
- Assert(ctl->PagePrecedes((lhs + 3 * per_page) / per_page, rhs / per_page));
- Assert(!ctl->PagePrecedes(rhs / per_page, (lhs + per_page) / per_page));
+ Assert(ctl->PageDiff((lhs + 2 * per_page) / per_page, rhs / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff((lhs + 3 * per_page) / per_page, rhs / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs + per_page) / per_page) >
+ large_positive);
/*
* GetNewTransactionId() has assigned the last XID it can safely use, and
@@ -1457,10 +1479,10 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
oldestXact = newestXact + 1;
oldestXact -= 1U << 31;
oldestPage = oldestXact / per_page;
- Assert(!SlruMayDeleteSegment(ctl,
- (newestPage -
- newestPage % SLRU_PAGES_PER_SEGMENT),
- oldestPage));
+ Assert(!SlruWouldTruncateSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
/*
* GetNewTransactionId() has assigned the last XID it can safely use, and
@@ -1473,42 +1495,44 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
oldestXact = newestXact + 1;
oldestXact -= 1U << 31;
oldestPage = oldestXact / per_page;
- Assert(!SlruMayDeleteSegment(ctl,
- (newestPage -
- newestPage % SLRU_PAGES_PER_SEGMENT),
- oldestPage));
+ Assert(!SlruWouldTruncateSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
}
/*
- * Unit-test a PagePrecedes function.
+ * Unit-test a PageDiff function.
*
* This assumes every uint32 >= FirstNormalTransactionId is a valid key. It
* assumes each value occupies a contiguous, fixed-size region of SLRU bytes.
* (MultiXactMemberCtl separates flags from XIDs. AsyncCtl has
* variable-length entries, no keys, and no random access. These unit tests
* do not apply to them.)
+ *
+ * This is stricter than the PageDiff API requires.
*/
void
-SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page)
+SlruPageDiffUnitTests(SlruCtl ctl, int per_page)
{
/* Test first, middle and last entries of a page. */
- SlruPagePrecedesTestOffset(ctl, per_page, 0);
- SlruPagePrecedesTestOffset(ctl, per_page, per_page / 2);
- SlruPagePrecedesTestOffset(ctl, per_page, per_page - 1);
+ SlruPageDiffTestOffset(ctl, per_page, 0);
+ SlruPageDiffTestOffset(ctl, per_page, per_page / 2);
+ SlruPageDiffTestOffset(ctl, per_page, per_page - 1);
}
#endif
/*
* SlruScanDirectory callback
- * This callback reports true if there's any segment wholly prior to the
- * one containing the page passed as "data".
+ * This callback reports true if SimpleLruTruncate(ctl, *data) would
+ * unlink any segment.
*/
bool
-SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
+SlruScanDirCbWouldTruncate(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
+ if (SlruWouldTruncateSegment(ctl, segpage, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1523,7 +1547,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
+ if (SlruWouldTruncateSegment(ctl, segpage, cutoffPage))
SlruInternalDeleteSegment(ctl, filename);
return false; /* keep going */
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index c50490d..537862c 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -44,7 +44,7 @@
* 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE, and segment numbering at
* 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateSUBTRANS (see SubTransPagePrecedes) and zeroing
+ * and page numbers in TruncateSUBTRANS (see SubTransPageDiff) and zeroing
* them in StartupSUBTRANS.
*/
@@ -64,7 +64,7 @@ static SlruCtlData SubTransCtlData;
static int ZeroSUBTRANSPage(int pageno);
-static bool SubTransPagePrecedes(int page1, int page2);
+static int32 SubTransPageDiff(int page1, int page2);
/*
@@ -190,11 +190,11 @@ SUBTRANSShmemSize(void)
void
SUBTRANSShmemInit(void)
{
- SubTransCtl->PagePrecedes = SubTransPagePrecedes;
+ SubTransCtl->PageDiff = SubTransPageDiff;
SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
SubtransSLRULock, "pg_subtrans",
LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
- SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
}
/*
@@ -355,20 +355,21 @@ TruncateSUBTRANS(TransactionId oldestXact)
/*
- * Decide whether a SUBTRANS page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff SUBTRANS page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*/
-static bool
-SubTransPagePrecedes(int page1, int page2)
+static int32
+SubTransPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * SUBTRANS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SUBTRANS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + SUBTRANS_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 9872129..9c6b82c 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -207,13 +207,13 @@ typedef struct QueuePosition
/* choose logically smaller QueuePosition */
#define QUEUE_POS_MIN(x,y) \
- (asyncQueuePagePrecedes((x).page, (y).page) ? (x) : \
+ (asyncQueuePageDiff((x).page, (y).page) < 0 ? (x) : \
(x).page != (y).page ? (y) : \
(x).offset < (y).offset ? (x) : (y))
/* choose logically larger QueuePosition */
#define QUEUE_POS_MAX(x,y) \
- (asyncQueuePagePrecedes((x).page, (y).page) ? (y) : \
+ (asyncQueuePageDiff((x).page, (y).page) < 0 ? (y) : \
(x).page != (y).page ? (x) : \
(x).offset > (y).offset ? (x) : (y))
@@ -433,8 +433,7 @@ static bool backendTryAdvanceTail = false;
bool Trace_notify = false;
/* local function prototypes */
-static int asyncQueuePageDiff(int p, int q);
-static bool asyncQueuePagePrecedes(int p, int q);
+static int32 asyncQueuePageDiff(int p, int q);
static void queue_listen(ListenActionKind action, const char *channel);
static void Async_UnlistenOnExit(int code, Datum arg);
static void Exec_ListenPreCommit(void);
@@ -465,12 +464,16 @@ static void ClearPendingActionsAndNotifies(void);
/*
* Compute the difference between two queue page numbers (i.e., p - q),
- * accounting for wraparound.
+ * accounting for wraparound. Since asyncQueueIsFull() blocks creation of a
+ * page that could precede any extant page, we need not assess entries within
+ * a page.
*/
-static int
+static int32
asyncQueuePageDiff(int p, int q)
{
- int diff;
+ int diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,
+ diff;
+ int32 scale = INT_MAX / diff_max;
/*
* We have to compare modulo (QUEUE_MAX_PAGE+1)/2. Both inputs should be
@@ -484,19 +487,24 @@ asyncQueuePageDiff(int p, int q)
diff -= QUEUE_MAX_PAGE + 1;
else if (diff < -((QUEUE_MAX_PAGE + 1) / 2))
diff += QUEUE_MAX_PAGE + 1;
- return diff;
+ return diff * scale;
}
-/*
- * Is p < q, accounting for wraparound?
- *
- * Since asyncQueueIsFull() blocks creation of a page that could precede any
- * extant page, we need not assess entries within a page.
- */
-static bool
-asyncQueuePagePrecedes(int p, int q)
+static void
+asyncQueuePageDiffUnitTests(void)
{
- return asyncQueuePageDiff(p, q) < 0;
+ int32 large_negative = INT_MIN / 1000 * 999,
+ large_positive = INT_MAX / 1000 * 999;
+ int diff_min = -((QUEUE_MAX_PAGE + 1) / 2),
+ diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1;
+
+ Assert(asyncQueuePageDiff(diff_max, diff_max) == 0);
+ Assert(asyncQueuePageDiff(diff_max, 0) > large_positive);
+ Assert(asyncQueuePageDiff(diff_max + 1, 0) < large_negative);
+ Assert(asyncQueuePageDiff(0, QUEUE_MAX_PAGE + diff_min + 1) <
+ large_negative);
+ Assert(asyncQueuePageDiff(0, QUEUE_MAX_PAGE + diff_min + 2) >
+ large_positive);
}
/*
@@ -557,10 +565,11 @@ AsyncShmemInit(void)
/*
* Set up SLRU management of the pg_notify data.
*/
- NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
+ NotifyCtl->PageDiff = asyncQueuePageDiff;
SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
SYNC_HANDLER_NONE);
+ asyncQueuePageDiffUnitTests();
if (!found)
{
@@ -1365,7 +1374,7 @@ asyncQueueIsFull(void)
nexthead = 0; /* wrap around */
boundary = QUEUE_POS_PAGE(QUEUE_TAIL);
boundary -= boundary % SLRU_PAGES_PER_SEGMENT;
- return asyncQueuePagePrecedes(nexthead, boundary);
+ return asyncQueuePageDiff(nexthead, boundary) < 0;
}
/*
@@ -2203,7 +2212,7 @@ asyncQueueAdvanceTail(void)
*/
newtailpage = QUEUE_POS_PAGE(min);
boundary = newtailpage - (newtailpage % SLRU_PAGES_PER_SEGMENT);
- if (asyncQueuePagePrecedes(oldtailpage, boundary))
+ if (asyncQueuePageDiff(oldtailpage, boundary) < 0)
{
/*
* SimpleLruTruncate() will ask for NotifySLRULock but will also
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 1b646a0..733ad93 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -438,7 +438,7 @@ static void SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact, SERIALIZABLEXACT
static void ReleaseRWConflict(RWConflict conflict);
static void FlagSxactUnsafe(SERIALIZABLEXACT *sxact);
-static bool SerialPagePrecedesLogically(int page1, int page2);
+static int32 SerialPageDiffLogically(int page1, int page2);
static void SerialInit(void);
static void SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo);
static SerCommitSeqNo SerialGetMinConflictCommitSeqNo(TransactionId xid);
@@ -784,26 +784,30 @@ FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
/*------------------------------------------------------------------------*/
/*
- * Decide whether a Serial page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff Serial page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
+ *
+ * This must follow stricter rules than PageDiff demands, for the benefit of
+ * the call local to this file.
*/
-static bool
-SerialPagePrecedesLogically(int page1, int page2)
+static int32
+SerialPageDiffLogically(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * SERIAL_ENTRIESPERPAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SERIAL_ENTRIESPERPAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + SERIAL_ENTRIESPERPAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + SERIAL_ENTRIESPERPAGE - 1);
+ return Max(diff_head, diff_tail);
}
static void
-SerialPagePrecedesLogicallyUnitTests(void)
+SerialPageDiffLogicallyUnitTests(void)
{
int per_page = SERIAL_ENTRIESPERPAGE,
offset = per_page / 2;
@@ -826,21 +830,21 @@ SerialPagePrecedesLogicallyUnitTests(void)
* In this scenario, the SLRU headPage pertains to the last ~1000 XIDs
* assigned. oldestXact finishes, ~2B XIDs having elapsed since it
* started. Further transactions cause us to summarize oldestXact to
- * tailPage. Function must return false so SerialAdd() doesn't zero
- * tailPage (which may contain entries for other old, recently-finished
- * XIDs) and half the SLRU. Reaching this requires burning ~2B XIDs in
- * single-user mode, a negligible possibility.
+ * tailPage. Function must return non-negative so SerialAdd() doesn't
+ * zero tailPage (which may contain entries for other old,
+ * recently-finished XIDs) and half the SLRU. Reaching this requires
+ * burning ~2B XIDs in single-user mode, a negligible possibility.
*/
headPage = newestPage;
targetPage = oldestPage;
- Assert(!SerialPagePrecedesLogically(headPage, targetPage));
+ Assert(SerialPageDiffLogically(headPage, targetPage) >= 0);
/*
* In this scenario, the SLRU headPage pertains to oldestXact. We're
* summarizing an XID near newestXact. (Assume few other XIDs used
* SERIALIZABLE, hence the minimal headPage advancement. Assume
* oldestXact was long-running and only recently reached the SLRU.)
- * Function must return true to make SerialAdd() create targetPage.
+ * Function must return negative to make SerialAdd() create targetPage.
*
* Today's implementation mishandles this case, but it doesn't matter
* enough to fix. Verify that the defect affects just one page by
@@ -851,9 +855,9 @@ SerialPagePrecedesLogicallyUnitTests(void)
*/
headPage = oldestPage;
targetPage = newestPage;
- Assert(SerialPagePrecedesLogically(headPage, targetPage - 1));
+ Assert(SerialPageDiffLogically(headPage, targetPage - 1) < 0);
#if 0
- Assert(SerialPagePrecedesLogically(headPage, targetPage));
+ Assert(SerialPageDiffLogically(headPage, targetPage) < 0);
#endif
}
@@ -868,12 +872,12 @@ SerialInit(void)
/*
* Set up SLRU management of the pg_serial data.
*/
- SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
+ SerialSlruCtl->PageDiff = SerialPageDiffLogically;
SimpleLruInit(SerialSlruCtl, "Serial",
NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
- SerialPagePrecedesLogicallyUnitTests();
- SlruPagePrecedesUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
+ SerialPageDiffLogicallyUnitTests();
+ SlruPageDiffUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
/*
* Create or attach to the SerialControl structure.
@@ -935,8 +939,8 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
else
{
firstZeroPage = SerialNextPage(serialControl->headPage);
- isNewPage = SerialPagePrecedesLogically(serialControl->headPage,
- targetPage);
+ isNewPage = SerialPageDiffLogically(serialControl->headPage,
+ targetPage) < 0;
}
if (!TransactionIdIsValid(serialControl->headXid)
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 805dd2b..967ccd1 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -29,7 +29,7 @@
* xxxx is CLOG or SUBTRANS, respectively), and segment numbering at
* 0xFFFFFFFF/xxxx_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
* take no explicit notice of that fact in slru.c, except when comparing
- * segment and page numbers in SimpleLruTruncate (see PagePrecedes()).
+ * segment and page numbers in SimpleLruTruncate (see PageDiff()).
*/
#define SLRU_PAGES_PER_SEGMENT 32
@@ -118,16 +118,18 @@ typedef struct SlruCtlData
SyncRequestHandler sync_handler;
/*
- * Decide whether a page is "older" for truncation and as a hint for
- * evicting pages in LRU order. Return true if every entry of the first
- * argument is older than every entry of the second argument. Note that
- * !PagePrecedes(a,b) && !PagePrecedes(b,a) need not imply a==b; it also
- * arises when some entries are older and some are not. For SLRUs using
- * SimpleLruTruncate(), this must use modular arithmetic. (For others,
- * the behavior of this callback has no functional implications.) Use
- * SlruPagePrecedesUnitTests() in SLRUs meeting its criteria.
+ * Compute distance between two page numbers, for truncation and as a hint
+ * for evicting pages in LRU order. Callbacks shall distribute return
+ * values uniformly in [INT_MIN,INT_MAX]. If PageDiff(P, oldest_needed)
+ * is negative but not close to INT_MIN, that implies data in page P is
+ * obsolete. The exception for values close to INT_MIN permits
+ * implementations to return such values for edge cases where the answer
+ * changes mid-page from INT_MIN to INT_MAX. Use SlruPageDiffUnitTests()
+ * in SLRUs meeting its criteria. For SLRUs using SimpleLruTruncate(),
+ * this must use modular arithmetic. (For others, the behavior of this
+ * callback has no functional implications.)
*/
- bool (*PagePrecedes) (int, int);
+ int32 (*PageDiff) (int, int);
/*
* Dir is set during SimpleLruInit and does not change thereafter. Since
@@ -151,9 +153,9 @@ extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied);
#ifdef USE_ASSERT_CHECKING
-extern void SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page);
+extern void SlruPageDiffUnitTests(SlruCtl ctl, int per_page);
#else
-#define SlruPagePrecedesUnitTests(ctl, per_page) do {} while (0)
+#define SlruPageDiffUnitTests(ctl, per_page) do {} while (0)
#endif
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);
@@ -166,8 +168,8 @@ extern void SlruDeleteSegment(SlruCtl ctl, int segno);
extern int SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path);
/* SlruScanDirectory public callbacks */
-extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
- int segpage, void *data);
+extern bool SlruScanDirCbWouldTruncate(SlruCtl ctl, char *filename,
+ int segpage, void *data);
extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
void *data);
On Wed, Oct 28, 2020 at 09:01:59PM -0700, Noah Misch wrote:
On Sat, Aug 29, 2020 at 10:34:33PM -0700, Noah Misch wrote:
On Mon, May 25, 2020 at 12:00:33AM -0700, Noah Misch wrote:
[last non-rebase change]
Rebased the second patch. The first patch did not need a rebase.
Rebased both patches, necessitated by commit dee663f changing many of the same
spots.
Rebased both patches, necessitated by commit c732c3f (a repair of commit
dee663f). As I mentioned on another branch of the thread, I'd be content if
someone reviews the slru-truncate-modulo patch and disclaims knowledge of the
slru-truncate-insurance patch; I would then abandon the latter patch.
Attachments:
slru-truncate-modulo-v5.patch
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Prevent excess SimpleLruTruncate() deletion.
Every core SLRU wraps around. With the exception of pg_notify, the wrap
point can fall in the middle of a page. Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback. Update each callback implementation to fit the new
specification. This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.) The bug fixed here has the same symptoms and user
followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb. Back-patch
to 9.5 (all supported versions).
Reviewed by Tom Lane.
Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 034349a..55bdac4 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -694,6 +694,7 @@ CLOGShmemInit(void)
SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
SYNC_HANDLER_CLOG);
+ SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
}
/*
@@ -912,13 +913,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
/*
- * Decide which of two CLOG page numbers is "older" for truncation purposes.
+ * Decide whether a CLOG page number is "older" for truncation purposes.
*
* We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * thing with wraparound XID arithmetic. However, TransactionIdPrecedes()
+ * would get weird about permanent xact IDs. So, offset both such that xid1,
+ * xid2, and xid2 + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset is
+ * relevant to page 0 and to the page preceding page 0.
+ *
+ * The page containing oldestXact-2^31 is the important edge case. The
+ * portion of that page equaling or following oldestXact-2^31 is expendable,
+ * but the portion preceding oldestXact-2^31 is not. When oldestXact-2^31 is
+ * the first XID of a page and segment, the entire page and segment is
+ * expendable, and we could truncate the segment. Recognizing that case would
+ * require making oldestXact, not just the page containing oldestXact,
+ * available to this callback. The benefit would be rare and small, so we
+ * don't optimize that edge case.
*/
static bool
CLOGPagePrecedes(int page1, int page2)
@@ -927,11 +937,12 @@ CLOGPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 2fe551f..ae45777 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -557,6 +557,7 @@ CommitTsShmemInit(void)
CommitTsSLRULock, "pg_commit_ts",
LWTRANCHE_COMMITTS_BUFFER,
SYNC_HANDLER_COMMIT_TS);
+ SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
commitTsShared = ShmemInitStruct("CommitTs shared",
sizeof(CommitTimestampShared),
@@ -927,14 +928,27 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
/*
- * Decide which of two commitTS page numbers is "older" for truncation
- * purposes.
+ * Decide whether a commitTS page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.
+ * This introduces differences compared to CLOG and the other SLRUs having (1
+ * << 31) % per_page == 0. This function never tests exactly
+ * TransactionIdPrecedes(x-2^31, x). When the system reaches xidStopLimit,
+ * there are two possible counts of page boundaries between oldestXact and the
+ * latest XID assigned, depending on whether oldestXact is within the first
+ * 128 entries of its page. Since this function doesn't know the location of
+ * oldestXact within page2, it returns false for one page that actually is
+ * expendable. This is a wider (yet still negligible) version of the
+ * truncation opportunity that CLOGPagePrecedes() cannot recognize.
+ *
+ * For the sake of a worked example, number entries with decimal values such
+ * that page1==1 entries range from 1.0 to 1.999. Let N+0.15 be the number of
+ * pages that 2^31 entries will span (N is an integer). If oldestXact=N+2.1,
+ * then the final safe XID assignment leaves newestXact=1.95. We keep page 2,
+ * because entry=2.85 is the border that toggles whether entries precede the
+ * last entry of the oldestXact page. While page 2 is expendable at
+ * oldestXact=N+2.1, it would be precious at oldestXact=N+2.9.
*/
static bool
CommitTsPagePrecedes(int page1, int page2)
@@ -943,11 +957,12 @@ CommitTsPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index eb8de7c..ab34fa4 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1852,11 +1852,13 @@ MultiXactShmemInit(void)
MultiXactOffsetSLRULock, "pg_multixact/offsets",
LWTRANCHE_MULTIXACTOFFSET_BUFFER,
SYNC_HANDLER_MULTIXACT_OFFSET);
+ SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
MultiXactMemberSLRULock, "pg_multixact/members",
LWTRANCHE_MULTIXACTMEMBER_BUFFER,
SYNC_HANDLER_MULTIXACT_MEMBER);
+ /* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
/* Initialize our shared state struct */
MultiXactState = ShmemInitStruct("Shared MultiXact State",
@@ -2982,6 +2984,14 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
* truncate the members SLRU. So we first scan the directory to determine
* the earliest offsets page number that we can read without error.
*
+ * When nextMXact is less than one segment away from multiWrapLimit,
+ * SlruScanDirCbFindEarliest can find some early segment other than the
+ * actual earliest. (MultiXactOffsetPagePrecedes(EARLIEST, LATEST)
+ * returns false, because not all pairs of entries have the same answer.)
+ * That can also arise when an earlier truncation attempt failed unlink()
+ * or returned early from this function. The only consequence is
+ * returning early, which wastes space that we could have liberated.
+ *
* NB: It's also possible that the page that oldestMulti is on has already
* been truncated away, and we crashed before updating oldestMulti.
*/
@@ -3096,15 +3106,11 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
}
/*
- * Decide which of two MultiXactOffset page numbers is "older" for truncation
- * purposes.
+ * Decide whether a MultiXactOffset page number is "older" for truncation
+ * purposes. Analogous to CLOGPagePrecedes().
*
- * We need to use comparison of MultiXactId here in order to do the right
- * thing with wraparound. However, if we are asked about page number zero, we
- * don't want to hand InvalidMultiXactId to MultiXactIdPrecedes: it'll get
- * weird. So, offset both multis by FirstMultiXactId to avoid that.
- * (Actually, the current implementation doesn't do anything weird with
- * InvalidMultiXactId, but there's no harm in leaving this code like this.)
+ * Offsetting the values is optional, because MultiXactIdPrecedes() has
+ * translational symmetry.
*/
static bool
MultiXactOffsetPagePrecedes(int page1, int page2)
@@ -3113,15 +3119,17 @@ MultiXactOffsetPagePrecedes(int page1, int page2)
MultiXactId multi2;
multi1 = ((MultiXactId) page1) * MULTIXACT_OFFSETS_PER_PAGE;
- multi1 += FirstMultiXactId;
+ multi1 += FirstMultiXactId + 1;
multi2 = ((MultiXactId) page2) * MULTIXACT_OFFSETS_PER_PAGE;
- multi2 += FirstMultiXactId;
+ multi2 += FirstMultiXactId + 1;
- return MultiXactIdPrecedes(multi1, multi2);
+ return (MultiXactIdPrecedes(multi1, multi2) &&
+ MultiXactIdPrecedes(multi1,
+ multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1));
}
/*
- * Decide which of two MultiXactMember page numbers is "older" for truncation
+ * Decide whether a MultiXactMember page number is "older" for truncation
* purposes. There is no "invalid offset number" so use the numbers verbatim.
*/
static bool
@@ -3133,7 +3141,9 @@ MultiXactMemberPagePrecedes(int page1, int page2)
offset1 = ((MultiXactOffset) page1) * MULTIXACT_MEMBERS_PER_PAGE;
offset2 = ((MultiXactOffset) page2) * MULTIXACT_MEMBERS_PER_PAGE;
- return MultiXactOffsetPrecedes(offset1, offset2);
+ return (MultiXactOffsetPrecedes(offset1, offset2) &&
+ MultiXactOffsetPrecedes(offset1,
+ offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1));
}
/*
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index cec17cb..74d4281 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1231,11 +1231,6 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
pgstat_count_slru_truncate(shared->slru_stats_idx);
/*
- * The cutoff point is the start of the segment containing cutoffPage.
- */
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- /*
* Scan shared memory and remove any pages preceding the cutoff page, to
* ensure we won't rewrite them later. (Since this is normally called in
* or just after a checkpoint, any dirty pages should have been flushed
@@ -1247,9 +1242,7 @@ restart:;
/*
* While we are holding the lock, make an important safety check: the
- * planned cutoff point must be <= the current endpoint page. Otherwise we
- * have already wrapped around, and proceeding with the truncation would
- * risk removing the current segment.
+ * current endpoint page must not be eligible for removal.
*/
if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
{
@@ -1281,8 +1274,11 @@ restart:;
* Hmm, we have (or may have) I/O operations acting on the page, so
* we've got to wait for them to finish and then start again. This is
* the same logic as in SlruSelectLRUPage. (XXX if page is dirty,
- * wouldn't it be OK to just discard it without writing it? For now,
- * keep the logic the same as it was.)
+ * wouldn't it be OK to just discard it without writing it?
+ * SlruMayDeleteSegment() uses a stricter qualification, so we might
+ * not delete this page in the end; even if we don't delete it, we
+ * won't have cause to read its data again. For now, keep the logic
+ * the same as it was.)
*/
if (shared->page_status[slotno] == SLRU_PAGE_VALID)
SlruInternalWritePage(ctl, slotno, NULL);
@@ -1378,18 +1374,133 @@ restart:
}
/*
+ * Determine whether a segment is okay to delete.
+ *
+ * segpage is the first page of the segment, and cutoffPage is the oldest (in
+ * PagePrecedes order) page in the SLRU containing still-useful data. Since
+ * every core PagePrecedes callback implements "wrap around", check the
+ * segment's first and last pages:
+ *
+ * first<cutoff && last<cutoff: yes
+ * first<cutoff && last>=cutoff: no; cutoff falls inside this segment
+ * first>=cutoff && last<cutoff: no; wrap point falls inside this segment
+ * first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ */
+static bool
+SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+{
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+
+ Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
+
+ return (ctl->PagePrecedes(segpage, cutoffPage) &&
+ ctl->PagePrecedes(seg_last_page, cutoffPage));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static void
+SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
+{
+ TransactionId lhs,
+ rhs;
+ int newestPage,
+ oldestPage;
+ TransactionId newestXact,
+ oldestXact;
+
+ /*
+ * Compare an XID pair having undefined order (see RFC 1982), a pair at
+ * "opposite ends" of the XID space. TransactionIdPrecedes() treats each
+ * as preceding the other. If RHS is oldestXact, LHS is the first XID we
+ * must not assign.
+ */
+ lhs = per_page + offset; /* skip first page to avoid non-normal XIDs */
+ rhs = lhs + (1U << 31);
+ Assert(TransactionIdPrecedes(lhs, rhs));
+ Assert(TransactionIdPrecedes(rhs, lhs));
+ Assert(!TransactionIdPrecedes(lhs - 1, rhs));
+ Assert(TransactionIdPrecedes(rhs, lhs - 1));
+ Assert(TransactionIdPrecedes(lhs + 1, rhs));
+ Assert(!TransactionIdPrecedes(rhs, lhs + 1));
+ Assert(!TransactionIdFollowsOrEquals(lhs, rhs));
+ Assert(!TransactionIdFollowsOrEquals(rhs, lhs));
+ Assert(!ctl->PagePrecedes(lhs / per_page, lhs / per_page));
+ Assert(!ctl->PagePrecedes(lhs / per_page, rhs / per_page));
+ Assert(!ctl->PagePrecedes(rhs / per_page, lhs / per_page));
+ Assert(!ctl->PagePrecedes((lhs - per_page) / per_page, rhs / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 3 * per_page) / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 2 * per_page) / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 1 * per_page) / per_page)
+ || (1U << 31) % per_page != 0); /* See CommitTsPagePrecedes() */
+ Assert(ctl->PagePrecedes((lhs + 1 * per_page) / per_page, rhs / per_page)
+ || (1U << 31) % per_page != 0);
+ Assert(ctl->PagePrecedes((lhs + 2 * per_page) / per_page, rhs / per_page));
+ Assert(ctl->PagePrecedes((lhs + 3 * per_page) / per_page, rhs / per_page));
+ Assert(!ctl->PagePrecedes(rhs / per_page, (lhs + per_page) / per_page));
+
+ /*
+ * GetNewTransactionId() has assigned the last XID it can safely use, and
+ * that XID is in the *LAST* page of the second segment. We must not
+ * delete that segment.
+ */
+ newestPage = 2 * SLRU_PAGES_PER_SEGMENT - 1;
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
+ Assert(!SlruMayDeleteSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
+
+ /*
+ * GetNewTransactionId() has assigned the last XID it can safely use, and
+ * that XID is in the *FIRST* page of the second segment. We must not
+ * delete that segment.
+ */
+ newestPage = SLRU_PAGES_PER_SEGMENT;
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
+ Assert(!SlruMayDeleteSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
+}
+
+/*
+ * Unit-test a PagePrecedes function.
+ *
+ * This assumes every uint32 >= FirstNormalTransactionId is a valid key. It
+ * assumes each value occupies a contiguous, fixed-size region of SLRU bytes.
+ * (MultiXactMemberCtl separates flags from XIDs. AsyncCtl has
+ * variable-length entries, no keys, and no random access. These unit tests
+ * do not apply to them.)
+ */
+void
+SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page)
+{
+ /* Test first, middle and last entries of a page. */
+ SlruPagePrecedesTestOffset(ctl, per_page, 0);
+ SlruPagePrecedesTestOffset(ctl, per_page, per_page / 2);
+ SlruPagePrecedesTestOffset(ctl, per_page, per_page - 1);
+}
+#endif
+
+/*
* SlruScanDirectory callback
- * This callback reports true if there's any segment prior to the one
- * containing the page passed as "data".
+ * This callback reports true if there's any segment wholly prior to the
+ * one containing the page passed as "data".
*/
bool
SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1404,7 +1515,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
SlruInternalDeleteSegment(ctl, segpage / SLRU_PAGES_PER_SEGMENT);
return false; /* keep going */
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0111e86..c50490d 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -194,6 +194,7 @@ SUBTRANSShmemInit(void)
SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
SubtransSLRULock, "pg_subtrans",
LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
+ SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
}
/*
@@ -354,13 +355,8 @@ TruncateSUBTRANS(TransactionId oldestXact)
/*
- * Decide which of two SUBTRANS page numbers is "older" for truncation purposes.
- *
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * Decide whether a SUBTRANS page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*/
static bool
SubTransPagePrecedes(int page1, int page2)
@@ -369,9 +365,10 @@ SubTransPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * SUBTRANS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SUBTRANS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8dbcace..9872129 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -487,7 +487,12 @@ asyncQueuePageDiff(int p, int q)
return diff;
}
-/* Is p < q, accounting for wraparound? */
+/*
+ * Is p < q, accounting for wraparound?
+ *
+ * Since asyncQueueIsFull() blocks creation of a page that could precede any
+ * extant page, we need not assess entries within a page.
+ */
static bool
asyncQueuePagePrecedes(int p, int q)
{
@@ -1348,8 +1353,8 @@ asyncQueueIsFull(void)
* logically precedes the current global tail pointer, ie, the head
* pointer would wrap around compared to the tail. We cannot create such
* a head page for fear of confusing slru.c. For safety we round the tail
- * pointer back to a segment boundary (compare the truncation logic in
- * asyncQueueAdvanceTail).
+ * pointer back to a segment boundary (truncation logic in
+ * asyncQueueAdvanceTail does not do this, so doing it here is optional).
*
* Note that this test is *not* dependent on how much space there is on
* the current head page. This is necessary because asyncQueueAddEntries
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 8a365b4..1b646a0 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -438,7 +438,7 @@ static void SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact, SERIALIZABLEXACT
static void ReleaseRWConflict(RWConflict conflict);
static void FlagSxactUnsafe(SERIALIZABLEXACT *sxact);
-static bool SerialPagePrecedesLogically(int p, int q);
+static bool SerialPagePrecedesLogically(int page1, int page2);
static void SerialInit(void);
static void SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo);
static SerCommitSeqNo SerialGetMinConflictCommitSeqNo(TransactionId xid);
@@ -784,27 +784,77 @@ FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
/*------------------------------------------------------------------------*/
/*
- * We will work on the page range of 0..SERIAL_MAX_PAGE.
- * Compares using wraparound logic, as is required by slru.c.
+ * Decide whether a Serial page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*/
static bool
-SerialPagePrecedesLogically(int p, int q)
+SerialPagePrecedesLogically(int page1, int page2)
{
- int diff;
+ TransactionId xid1;
+ TransactionId xid2;
+
+ xid1 = ((TransactionId) page1) * SERIAL_ENTRIESPERPAGE;
+ xid1 += FirstNormalTransactionId + 1;
+ xid2 = ((TransactionId) page2) * SERIAL_ENTRIESPERPAGE;
+ xid2 += FirstNormalTransactionId + 1;
+
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + SERIAL_ENTRIESPERPAGE - 1));
+}
+
+static void
+SerialPagePrecedesLogicallyUnitTests(void)
+{
+ int per_page = SERIAL_ENTRIESPERPAGE,
+ offset = per_page / 2;
+ int newestPage,
+ oldestPage,
+ headPage,
+ targetPage;
+ TransactionId newestXact,
+ oldestXact;
+
+ /* GetNewTransactionId() has assigned the last XID it can safely use. */
+ newestPage = 2 * SLRU_PAGES_PER_SEGMENT - 1; /* nothing special */
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
/*
- * We have to compare modulo (SERIAL_MAX_PAGE+1)/2. Both inputs should be
- * in the range 0..SERIAL_MAX_PAGE.
+ * In this scenario, the SLRU headPage pertains to the last ~1000 XIDs
+ * assigned. oldestXact finishes, ~2B XIDs having elapsed since it
+ * started. Further transactions cause us to summarize oldestXact to
+ * tailPage. Function must return false so SerialAdd() doesn't zero
+ * tailPage (which may contain entries for other old, recently-finished
+ * XIDs) and half the SLRU. Reaching this requires burning ~2B XIDs in
+ * single-user mode, a negligible possibility.
*/
- Assert(p >= 0 && p <= SERIAL_MAX_PAGE);
- Assert(q >= 0 && q <= SERIAL_MAX_PAGE);
-
- diff = p - q;
- if (diff >= ((SERIAL_MAX_PAGE + 1) / 2))
- diff -= SERIAL_MAX_PAGE + 1;
- else if (diff < -((int) (SERIAL_MAX_PAGE + 1) / 2))
- diff += SERIAL_MAX_PAGE + 1;
- return diff < 0;
+ headPage = newestPage;
+ targetPage = oldestPage;
+ Assert(!SerialPagePrecedesLogically(headPage, targetPage));
+
+ /*
+ * In this scenario, the SLRU headPage pertains to oldestXact. We're
+ * summarizing an XID near newestXact. (Assume few other XIDs used
+ * SERIALIZABLE, hence the minimal headPage advancement. Assume
+ * oldestXact was long-running and only recently reached the SLRU.)
+ * Function must return true to make SerialAdd() create targetPage.
+ *
+ * Today's implementation mishandles this case, but it doesn't matter
+ * enough to fix. Verify that the defect affects just one page by
+ * asserting correct treatment of its prior page. Reaching this case
+ * requires burning ~2B XIDs in single-user mode, a negligible
+ * possibility. Moreover, if it does happen, the consequence would be
+ * mild, namely a new transaction failing in SimpleLruReadPage().
+ */
+ headPage = oldestPage;
+ targetPage = newestPage;
+ Assert(SerialPagePrecedesLogically(headPage, targetPage - 1));
+#if 0
+ Assert(SerialPagePrecedesLogically(headPage, targetPage));
+#endif
}
/*
@@ -822,6 +872,8 @@ SerialInit(void)
SimpleLruInit(SerialSlruCtl, "Serial",
NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
+ SerialPagePrecedesLogicallyUnitTests();
+ SlruPagePrecedesUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
/*
* Create or attach to the SerialControl structure.
@@ -1030,7 +1082,7 @@ CheckPointPredicate(void)
}
else
{
- /*
+ /*----------
* The SLRU is no longer needed. Truncate to head before we set head
* invalid.
*
@@ -1039,6 +1091,25 @@ CheckPointPredicate(void)
* that we leave behind will appear to be new again. In that case it
* won't be removed until XID horizon advances enough to make it
* current again.
+ *
+ * XXX: This should happen in vac_truncate_clog(), not in checkpoints.
+ * Consider this scenario, starting from a system with no in-progress
+ * transactions and VACUUM FREEZE having maximized oldestXact:
+ * - Start a SERIALIZABLE transaction.
+ * - Start, finish, and summarize a SERIALIZABLE transaction, creating
+ * one SLRU page.
+ * - Consume XIDs to reach xidStopLimit.
+ * - Finish all transactions. Due to the long-running SERIALIZABLE
+ * transaction, earlier checkpoints did not touch headPage. The
+ * next checkpoint will change it, but that checkpoint happens after
+ * the end of the scenario.
+ * - VACUUM to advance XID limits.
+ * - Consume ~2M XIDs, crossing the former xidWrapLimit.
+ * - Start, finish, and summarize a SERIALIZABLE transaction.
+ * SerialAdd() declines to create the targetPage, because headPage
+ * is not regarded as in the past relative to that targetPage. The
+ * transaction instigating the summarize fails in
+ * SimpleLruReadPage().
*/
tailPage = serialControl->headPage;
serialControl->headPage = -1;
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b39b435..805dd2b 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -118,9 +118,14 @@ typedef struct SlruCtlData
SyncRequestHandler sync_handler;
/*
- * Decide which of two page numbers is "older" for truncation purposes. We
- * need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic.
+ * Decide whether a page is "older" for truncation and as a hint for
+ * evicting pages in LRU order. Return true if every entry of the first
+ * argument is older than every entry of the second argument. Note that
+ * !PagePrecedes(a,b) && !PagePrecedes(b,a) need not imply a==b; it also
+ * arises when some entries are older and some are not. For SLRUs using
+ * SimpleLruTruncate(), this must use modular arithmetic. (For others,
+ * the behavior of this callback has no functional implications.) Use
+ * SlruPagePrecedesUnitTests() in SLRUs meeting its criteria.
*/
bool (*PagePrecedes) (int, int);
@@ -145,6 +150,11 @@ extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
TransactionId xid);
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied);
+#ifdef USE_ASSERT_CHECKING
+extern void SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page);
+#else
+#define SlruPagePrecedesUnitTests(ctl, per_page) do {} while (0)
+#endif
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);
slru-truncate-t-insurance-v4.patch
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Unlink less in SimpleLruTruncate(), as insurance against bugs.
SimpleLruTruncate() has been unlinking every expendable file. In edge
cases, it also deleted important files. The most recent commit fixed
that. Given the history of this class of bugs evading detection, let's
not trust that patch exclusively. Instead of unlinking segments
representing up to 2^31 past XIDs, delete no more than half that much.
The balance will stay in place; eventually, XID consumption will
overwrite it. This could mitigate unknown SimpleLruTruncate() bugs and
simplify manual remediation after one has overtaken wrap limits in
single-user mode.
Truncation behavior won't change at all until an SLRU is half full.
Once it does change, a drawback is conflict with the following defense.
TruncateMultiXact() skips truncation when unexpected files exist on
disk, which this change deliberately makes more common. Hence,
pg_multixact becomes more likely to persist in consuming its maximum
storage. Also, this change may uncover bugs in SLRU page recycling by
making that more common. For SLRUs outside of pg_multixact, maximum
storage rises by 50%; for example, the CLOG maximum rises from 512 MiB
to 768 MiB. Usage in pg_multixact may double. Back-patch to 9.5 (all
supported versions).
Reviewed by FIXME.
Discussion: https://postgr.es/m/20200330052809.GB2324620@rfd.leadboat.com
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 4d8ad75..c2a6961 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -405,6 +405,11 @@
in every database at least once every two billion transactions.
</para>
+ <!-- This oversimplifies; there are (2^31)-1 XIDs in the past, the same
+ number in the future, and one incomparable. (For each pair of incomparable
+ XIDs, TransactionIdPrecedes(a, b) and TransactionIdPrecedes(b, a) both
+ return true.) None of that is important to the DBA, since xidStopLimit
+ intervenes long before. -->
<para>
The reason that periodic vacuuming solves the problem is that
<command>VACUUM</command> will mark rows as <emphasis>frozen</emphasis>, indicating that
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 55bdac4..38aef74 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -53,7 +53,7 @@
* and CLOG segment numbering at
* 0xFFFFFFFF/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCLOG (see CLOGPagePrecedes).
+ * and page numbers in TruncateCLOG (see CLOGPageDiff).
*/
/* We need two bits per xact, so four xacts fit in a byte */
@@ -90,7 +90,7 @@ static SlruCtlData XactCtlData;
static int ZeroCLOGPage(int pageno, bool writeXlog);
-static bool CLOGPagePrecedes(int page1, int page2);
+static int32 CLOGPageDiff(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno, TransactionId oldestXact,
Oid oldestXactDb);
@@ -690,11 +690,11 @@ CLOGShmemSize(void)
void
CLOGShmemInit(void)
{
- XactCtl->PagePrecedes = CLOGPagePrecedes;
+ XactCtl->PageDiff = CLOGPageDiff;
SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
SYNC_HANDLER_CLOG);
- SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
}
/*
@@ -887,7 +887,7 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
cutoffPage = TransactionIdToPage(oldestXact);
/* Check to see if there's any files that could be removed */
- if (!SlruScanDirectory(XactCtl, SlruScanDirCbReportPresence, &cutoffPage))
+ if (!SlruScanDirectory(XactCtl, SlruScanDirCbWouldTruncate, &cutoffPage))
return; /* nothing to remove */
/*
@@ -913,13 +913,14 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
/*
- * Decide whether a CLOG page number is "older" for truncation purposes.
+ * Diff CLOG page numbers for truncation purposes.
*
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, TransactionIdPrecedes()
- * would get weird about permanent xact IDs. So, offset both such that xid1,
- * xid2, and xid + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset is
- * relevant to page 0 and to the page preceding page 0.
+ * To do the right thing with wraparound XID arithmetic, this mirrors
+ * TransactionIdPrecedes(). The Max() operation ensures we return a positive
+ * value when the wrap point may fall inside these pages. (When it does, some
+ * pairs of entries have a positive diff, and other pairs have a negative
+ * diff.) Only the predicate.c SLRU needs the Max() operation; to avoid
+ * having even more corner cases to understand, all XID-indexed SLRUs do it.
*
* The page containing oldestXact-2^31 is the important edge case. The
* portion of that page equaling or following oldestXact-2^31 is expendable,
@@ -927,22 +928,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
* the first XID of a page and segment, the entire page and segment is
* expendable, and we could truncate the segment. Recognizing that case would
* require making oldestXact, not just the page containing oldestXact,
- * available to this callback. The benefit would be rare and small, so we
- * don't optimize that edge case.
+ * available to this callback. slru.c wouldn't delete the page, anyway.
*/
-static bool
-CLOGPagePrecedes(int page1, int page2)
+static int32
+CLOGPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + CLOG_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index ae45777..0da2cf6 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -46,7 +46,7 @@
* 0xFFFFFFFF/COMMIT_TS_XACTS_PER_PAGE, and CommitTs segment numbering at
* 0xFFFFFFFF/COMMIT_TS_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCommitTs (see CommitTsPagePrecedes).
+ * and page numbers in TruncateCommitTs (see CommitTsPageDiff).
*/
/*
@@ -109,7 +109,7 @@ static void TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
RepOriginId nodeid, int slotno);
static void error_commit_ts_disabled(void);
static int ZeroCommitTsPage(int pageno, bool writeXlog);
-static bool CommitTsPagePrecedes(int page1, int page2);
+static int32 CommitTsPageDiff(int page1, int page2);
static void ActivateCommitTs(void);
static void DeactivateCommitTs(void);
static void WriteZeroPageXlogRec(int pageno);
@@ -552,12 +552,12 @@ CommitTsShmemInit(void)
{
bool found;
- CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
+ CommitTsCtl->PageDiff = CommitTsPageDiff;
SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
CommitTsSLRULock, "pg_commit_ts",
LWTRANCHE_COMMITTS_BUFFER,
SYNC_HANDLER_COMMIT_TS);
- SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
commitTsShared = ShmemInitStruct("CommitTs shared",
sizeof(CommitTimestampShared),
@@ -875,7 +875,7 @@ TruncateCommitTs(TransactionId oldestXact)
cutoffPage = TransactionIdToCTsPage(oldestXact);
/* Check to see if there's any files that could be removed */
- if (!SlruScanDirectory(CommitTsCtl, SlruScanDirCbReportPresence,
+ if (!SlruScanDirectory(CommitTsCtl, SlruScanDirCbWouldTruncate,
&cutoffPage))
return; /* nothing to remove */
@@ -928,8 +928,8 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
/*
- * Decide whether a commitTS page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff commitTS page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*
* At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.
* This introduces differences compared to CLOG and the other SLRUs having (1
@@ -940,7 +940,7 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
* 128 entries of its page. Since this function doesn't know the location of
* oldestXact within page2, it returns a nonnegative diff for one page that actually is
* expendable. This is a wider (yet still negligible) version of the
- * truncation opportunity that CLOGPagePrecedes() cannot recognize.
+ * truncation opportunity that CLOGPageDiff() cannot recognize.
*
* For the sake of a worked example, number entries with decimal values such
* that page1==1 entries range from 1.0 to 1.999. Let N+0.15 be the number of
@@ -950,19 +950,20 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
* last entry of the oldestXact page. While page 2 is expendable at
* oldestXact=N+2.1, it would be precious at oldestXact=N+2.9.
*/
-static bool
-CommitTsPagePrecedes(int page1, int page2)
+static int32
+CommitTsPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + COMMIT_TS_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ab34fa4..373bbeb 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -102,7 +102,7 @@
* 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
* take no explicit notice of that fact in this module, except when comparing
* segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
+ * MultiXactOffsetPageDiff).
*/
/* We need four bytes per offset */
@@ -355,8 +355,8 @@ static char *mxstatus_to_string(MultiXactStatus status);
/* management of SLRU infrastructure */
static int ZeroMultiXactOffsetPage(int pageno, bool writeXlog);
static int ZeroMultiXactMemberPage(int pageno, bool writeXlog);
-static bool MultiXactOffsetPagePrecedes(int page1, int page2);
-static bool MultiXactMemberPagePrecedes(int page1, int page2);
+static int32 MultiXactOffsetPageDiff(int page1, int page2);
+static int32 MultiXactMemberPageDiff(int page1, int page2);
static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
@@ -1844,15 +1844,15 @@ MultiXactShmemInit(void)
debug_elog2(DEBUG2, "Shared Memory Init for MultiXact");
- MultiXactOffsetCtl->PagePrecedes = MultiXactOffsetPagePrecedes;
- MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
+ MultiXactOffsetCtl->PageDiff = MultiXactOffsetPageDiff;
+ MultiXactMemberCtl->PageDiff = MultiXactMemberPageDiff;
SimpleLruInit(MultiXactOffsetCtl,
"MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
MultiXactOffsetSLRULock, "pg_multixact/offsets",
LWTRANCHE_MULTIXACTOFFSET_BUFFER,
SYNC_HANDLER_MULTIXACT_OFFSET);
- SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
+ SlruPageDiffUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
MultiXactMemberSLRULock, "pg_multixact/members",
@@ -2867,7 +2867,7 @@ SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
mxtruncinfo *trunc = (mxtruncinfo *) data;
if (trunc->earliestExistingPage == -1 ||
- ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
+ ctl->PageDiff(segpage, trunc->earliestExistingPage) < 0)
{
trunc->earliestExistingPage = segpage;
}
@@ -2986,11 +2986,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
*
* When nextMXact is less than one segment away from multiWrapLimit,
* SlruScanDirCbFindEarliest can find some early segment other than the
- * actual earliest. (MultiXactOffsetPagePrecedes(EARLIEST, LATEST)
- * returns false, because not all pairs of entries have the same answer.)
- * That can also arise when an earlier truncation attempt failed unlink()
- * or returned early from this function. The only consequence is
- * returning early, which wastes space that we could have liberated.
+ * actual earliest. (MultiXactOffsetPageDiff(EARLIEST, LATEST) >= 0,
+ * because not all pairs of entries have the same answer.) That can also
+ * arise when an earlier truncation attempt failed unlink(), returned
+ * early from this function, or saw SlruWouldTruncateSegment() decline to
+ * delete the older half of the SLRU. The only consequence is returning
+ * early, which wastes space that we could have liberated.
*
* NB: It's also possible that the page that oldestMulti is on has already
* been truncated away, and we crashed before updating oldestMulti.
@@ -3106,44 +3107,42 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
}
/*
- * Decide whether a MultiXactOffset page number is "older" for truncation
- * purposes. Analogous to CLOGPagePrecedes().
- *
- * Offsetting the values is optional, because MultiXactIdPrecedes() has
- * translational symmetry.
+ * Diff MultiXactOffset page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*/
-static bool
-MultiXactOffsetPagePrecedes(int page1, int page2)
+static int32
+MultiXactOffsetPageDiff(int page1, int page2)
{
MultiXactId multi1;
MultiXactId multi2;
+ int32 diff_head;
+ int32 diff_tail;
multi1 = ((MultiXactId) page1) * MULTIXACT_OFFSETS_PER_PAGE;
- multi1 += FirstMultiXactId + 1;
multi2 = ((MultiXactId) page2) * MULTIXACT_OFFSETS_PER_PAGE;
- multi2 += FirstMultiXactId + 1;
- return (MultiXactIdPrecedes(multi1, multi2) &&
- MultiXactIdPrecedes(multi1,
- multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1));
+ diff_head = multi1 - multi2;
+ diff_tail = multi1 - (multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
/*
- * Decide whether a MultiXactMember page number is "older" for truncation
- * purposes. There is no "invalid offset number" so use the numbers verbatim.
+ * Diff MultiXactMember page numbers for truncation purposes.
*/
-static bool
-MultiXactMemberPagePrecedes(int page1, int page2)
+static int32
+MultiXactMemberPageDiff(int page1, int page2)
{
MultiXactOffset offset1;
MultiXactOffset offset2;
+ int32 diff_head;
+ int32 diff_tail;
offset1 = ((MultiXactOffset) page1) * MULTIXACT_MEMBERS_PER_PAGE;
offset2 = ((MultiXactOffset) page2) * MULTIXACT_MEMBERS_PER_PAGE;
- return (MultiXactOffsetPrecedes(offset1, offset2) &&
- MultiXactOffsetPrecedes(offset1,
- offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1));
+ diff_head = offset1 - offset2;
+ diff_tail = offset1 - (offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
/*
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 74d4281..0158b8e 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -260,7 +260,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
/*
* Initialize the unshared control struct, including directory path. We
- * assume caller set PagePrecedes.
+ * assume caller set PageDiff.
*/
ctl->shared = shared;
ctl->sync_handler = sync_handler;
@@ -1091,8 +1091,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
{
if (this_delta > best_valid_delta ||
(this_delta == best_valid_delta &&
- ctl->PagePrecedes(this_page_number,
- best_valid_page_number)))
+ ctl->PageDiff(this_page_number,
+ best_valid_page_number) < 0))
{
bestvalidslot = slotno;
best_valid_delta = this_delta;
@@ -1103,8 +1103,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
{
if (this_delta > best_invalid_delta ||
(this_delta == best_invalid_delta &&
- ctl->PagePrecedes(this_page_number,
- best_invalid_page_number)))
+ ctl->PageDiff(this_page_number,
+ best_invalid_page_number) < 0))
{
bestinvalidslot = slotno;
best_invalid_delta = this_delta;
@@ -1211,7 +1211,8 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
}
/*
- * Remove all segments before the one holding the passed page number
+ * Remove some obsolete segments. As defense in depth, this deletes less than
+ * PageDiff() authorizes; see SlruWouldTruncateSegment().
*
* All SLRUs prevent concurrent calls to this function, either with an LWLock
* or by calling it only as part of a checkpoint. Mutual exclusion must begin
@@ -1244,7 +1245,7 @@ restart:;
* While we are holding the lock, make an important safety check: the
* current endpoint page must not be eligible for removal.
*/
- if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+ if (ctl->PageDiff(shared->latest_page_number, cutoffPage) < 0)
{
LWLockRelease(shared->ControlLock);
ereport(LOG,
@@ -1257,7 +1258,7 @@ restart:;
{
if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
continue;
- if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
+ if (ctl->PageDiff(shared->page_number[slotno], cutoffPage) >= 0)
continue;
/*
@@ -1374,33 +1375,46 @@ restart:
}
/*
- * Determine whether a segment is okay to delete.
+ * Determine whether to delete a segment.
*
* segpage is the first page of the segment, and cutoffPage is the oldest (in
- * PagePrecedes order) page in the SLRU containing still-useful data. Since
- * every core PagePrecedes callback implements "wrap around", check the
+ * PageDiff order) page in the SLRU containing still-useful data. Check the
* segment's first and last pages:
*
* first<cutoff && last<cutoff: yes
* first<cutoff && last>=cutoff: no; cutoff falls inside this segment
* first>=cutoff && last<cutoff: no; wrap point falls inside this segment
* first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ *
+ * The PageDiff specification requires us not to remove pages where the
+ * callback reports negative values close to INT_MIN. Our interpretation is
+ * to decline to delete segments containing a page P such that PageDiff(P,
+ * cutoffPage) is in [INT_MIN, INT_MIN/2].
*/
static bool
-SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+SlruWouldTruncateSegment(SlruCtl ctl, int segpage, int cutoffPage)
{
- int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+ int first_page_diff;
Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
- return (ctl->PagePrecedes(segpage, cutoffPage) &&
- ctl->PagePrecedes(seg_last_page, cutoffPage));
+ first_page_diff = ctl->PageDiff(segpage, cutoffPage);
+ if (first_page_diff < 0 && first_page_diff > INT_MIN / 2)
+ {
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+ int last_page_diff = ctl->PageDiff(seg_last_page, cutoffPage);
+
+ return last_page_diff < 0 && last_page_diff > INT_MIN / 2;
+ }
+ return false;
}
#ifdef USE_ASSERT_CHECKING
static void
-SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
+SlruPageDiffTestOffset(SlruCtl ctl, int per_page, uint32 offset)
{
+ int32 large_negative = INT_MIN / 1000 * 999,
+ large_positive = INT_MAX / 1000 * 999;
TransactionId lhs,
rhs;
int newestPage,
@@ -1424,19 +1438,27 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
Assert(!TransactionIdPrecedes(rhs, lhs + 1));
Assert(!TransactionIdFollowsOrEquals(lhs, rhs));
Assert(!TransactionIdFollowsOrEquals(rhs, lhs));
- Assert(!ctl->PagePrecedes(lhs / per_page, lhs / per_page));
- Assert(!ctl->PagePrecedes(lhs / per_page, rhs / per_page));
- Assert(!ctl->PagePrecedes(rhs / per_page, lhs / per_page));
- Assert(!ctl->PagePrecedes((lhs - per_page) / per_page, rhs / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 3 * per_page) / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 2 * per_page) / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 1 * per_page) / per_page)
- || (1U << 31) % per_page != 0); /* See CommitTsPagePrecedes() */
- Assert(ctl->PagePrecedes((lhs + 1 * per_page) / per_page, rhs / per_page)
+ Assert(ctl->PageDiff(lhs / per_page, lhs / per_page) == 0);
+ Assert(ctl->PageDiff(lhs / per_page, rhs / per_page) > large_positive);
+ Assert(ctl->PageDiff(rhs / per_page, lhs / per_page) > large_positive);
+ Assert(ctl->PageDiff((lhs - per_page) / per_page, rhs / per_page) >
+ large_positive);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 3 * per_page) / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 2 * per_page) / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 1 * per_page) / per_page) <
+ large_negative
+ || (1U << 31) % per_page != 0); /* See CommitTsPageDiff() */
+ Assert(ctl->PageDiff((lhs + 1 * per_page) / per_page, rhs / per_page) <
+ large_negative
|| (1U << 31) % per_page != 0);
- Assert(ctl->PagePrecedes((lhs + 2 * per_page) / per_page, rhs / per_page));
- Assert(ctl->PagePrecedes((lhs + 3 * per_page) / per_page, rhs / per_page));
- Assert(!ctl->PagePrecedes(rhs / per_page, (lhs + per_page) / per_page));
+ Assert(ctl->PageDiff((lhs + 2 * per_page) / per_page, rhs / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff((lhs + 3 * per_page) / per_page, rhs / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs + per_page) / per_page) >
+ large_positive);
/*
* GetNewTransactionId() has assigned the last XID it can safely use, and
@@ -1449,10 +1471,10 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
oldestXact = newestXact + 1;
oldestXact -= 1U << 31;
oldestPage = oldestXact / per_page;
- Assert(!SlruMayDeleteSegment(ctl,
- (newestPage -
- newestPage % SLRU_PAGES_PER_SEGMENT),
- oldestPage));
+ Assert(!SlruWouldTruncateSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
/*
* GetNewTransactionId() has assigned the last XID it can safely use, and
@@ -1465,42 +1487,44 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
oldestXact = newestXact + 1;
oldestXact -= 1U << 31;
oldestPage = oldestXact / per_page;
- Assert(!SlruMayDeleteSegment(ctl,
- (newestPage -
- newestPage % SLRU_PAGES_PER_SEGMENT),
- oldestPage));
+ Assert(!SlruWouldTruncateSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
}
/*
- * Unit-test a PagePrecedes function.
+ * Unit-test a PageDiff function.
*
* This assumes every uint32 >= FirstNormalTransactionId is a valid key. It
* assumes each value occupies a contiguous, fixed-size region of SLRU bytes.
* (MultiXactMemberCtl separates flags from XIDs. AsyncCtl has
* variable-length entries, no keys, and no random access. These unit tests
* do not apply to them.)
+ *
+ * This is stricter than the PageDiff API requires.
*/
void
-SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page)
+SlruPageDiffUnitTests(SlruCtl ctl, int per_page)
{
/* Test first, middle and last entries of a page. */
- SlruPagePrecedesTestOffset(ctl, per_page, 0);
- SlruPagePrecedesTestOffset(ctl, per_page, per_page / 2);
- SlruPagePrecedesTestOffset(ctl, per_page, per_page - 1);
+ SlruPageDiffTestOffset(ctl, per_page, 0);
+ SlruPageDiffTestOffset(ctl, per_page, per_page / 2);
+ SlruPageDiffTestOffset(ctl, per_page, per_page - 1);
}
#endif
/*
* SlruScanDirectory callback
- * This callback reports true if there's any segment wholly prior to the
- * one containing the page passed as "data".
+ * This callback reports true if SimpleLruTruncate(ctl, *data) would
+ * unlink any segment.
*/
bool
-SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
+SlruScanDirCbWouldTruncate(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
+ if (SlruWouldTruncateSegment(ctl, segpage, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1515,7 +1539,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
+ if (SlruWouldTruncateSegment(ctl, segpage, cutoffPage))
SlruInternalDeleteSegment(ctl, segpage / SLRU_PAGES_PER_SEGMENT);
return false; /* keep going */
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index c50490d..537862c 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -44,7 +44,7 @@
* 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE, and segment numbering at
* 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateSUBTRANS (see SubTransPagePrecedes) and zeroing
+ * and page numbers in TruncateSUBTRANS (see SubTransPageDiff) and zeroing
* them in StartupSUBTRANS.
*/
@@ -64,7 +64,7 @@ static SlruCtlData SubTransCtlData;
static int ZeroSUBTRANSPage(int pageno);
-static bool SubTransPagePrecedes(int page1, int page2);
+static int32 SubTransPageDiff(int page1, int page2);
/*
@@ -190,11 +190,11 @@ SUBTRANSShmemSize(void)
void
SUBTRANSShmemInit(void)
{
- SubTransCtl->PagePrecedes = SubTransPagePrecedes;
+ SubTransCtl->PageDiff = SubTransPageDiff;
SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
SubtransSLRULock, "pg_subtrans",
LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
- SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
}
/*
@@ -355,20 +355,21 @@ TruncateSUBTRANS(TransactionId oldestXact)
/*
- * Decide whether a SUBTRANS page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff SUBTRANS page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*/
-static bool
-SubTransPagePrecedes(int page1, int page2)
+static int32
+SubTransPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * SUBTRANS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SUBTRANS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + SUBTRANS_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 9872129..9c6b82c 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -207,13 +207,13 @@ typedef struct QueuePosition
/* choose logically smaller QueuePosition */
#define QUEUE_POS_MIN(x,y) \
- (asyncQueuePagePrecedes((x).page, (y).page) ? (x) : \
+ (asyncQueuePageDiff((x).page, (y).page) < 0 ? (x) : \
(x).page != (y).page ? (y) : \
(x).offset < (y).offset ? (x) : (y))
/* choose logically larger QueuePosition */
#define QUEUE_POS_MAX(x,y) \
- (asyncQueuePagePrecedes((x).page, (y).page) ? (y) : \
+ (asyncQueuePageDiff((x).page, (y).page) < 0 ? (y) : \
(x).page != (y).page ? (x) : \
(x).offset > (y).offset ? (x) : (y))
@@ -433,8 +433,7 @@ static bool backendTryAdvanceTail = false;
bool Trace_notify = false;
/* local function prototypes */
-static int asyncQueuePageDiff(int p, int q);
-static bool asyncQueuePagePrecedes(int p, int q);
+static int32 asyncQueuePageDiff(int p, int q);
static void queue_listen(ListenActionKind action, const char *channel);
static void Async_UnlistenOnExit(int code, Datum arg);
static void Exec_ListenPreCommit(void);
@@ -465,12 +464,16 @@ static void ClearPendingActionsAndNotifies(void);
/*
* Compute the difference between two queue page numbers (i.e., p - q),
- * accounting for wraparound.
+ * accounting for wraparound. Since asyncQueueIsFull() blocks creation of a
+ * page that could precede any extant page, we need not assess entries within
+ * a page.
*/
-static int
+static int32
asyncQueuePageDiff(int p, int q)
{
- int diff;
+ int diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,
+ diff;
+ int32 scale = INT_MAX / diff_max;
/*
* We have to compare modulo (QUEUE_MAX_PAGE+1)/2. Both inputs should be
@@ -484,19 +487,24 @@ asyncQueuePageDiff(int p, int q)
diff -= QUEUE_MAX_PAGE + 1;
else if (diff < -((QUEUE_MAX_PAGE + 1) / 2))
diff += QUEUE_MAX_PAGE + 1;
- return diff;
+ return diff * scale;
}
-/*
- * Is p < q, accounting for wraparound?
- *
- * Since asyncQueueIsFull() blocks creation of a page that could precede any
- * extant page, we need not assess entries within a page.
- */
-static bool
-asyncQueuePagePrecedes(int p, int q)
+static void
+asyncQueuePageDiffUnitTests(void)
{
- return asyncQueuePageDiff(p, q) < 0;
+ int32 large_negative = INT_MIN / 1000 * 999,
+ large_positive = INT_MAX / 1000 * 999;
+ int diff_min = -((QUEUE_MAX_PAGE + 1) / 2),
+ diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1;
+
+ Assert(asyncQueuePageDiff(diff_max, diff_max) == 0);
+ Assert(asyncQueuePageDiff(diff_max, 0) > large_positive);
+ Assert(asyncQueuePageDiff(diff_max + 1, 0) < large_negative);
+ Assert(asyncQueuePageDiff(0, QUEUE_MAX_PAGE + diff_min + 1) <
+ large_negative);
+ Assert(asyncQueuePageDiff(0, QUEUE_MAX_PAGE + diff_min + 2) >
+ large_positive);
}
/*
@@ -557,10 +565,11 @@ AsyncShmemInit(void)
/*
* Set up SLRU management of the pg_notify data.
*/
- NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
+ NotifyCtl->PageDiff = asyncQueuePageDiff;
SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
SYNC_HANDLER_NONE);
+ asyncQueuePageDiffUnitTests();
if (!found)
{
@@ -1365,7 +1374,7 @@ asyncQueueIsFull(void)
nexthead = 0; /* wrap around */
boundary = QUEUE_POS_PAGE(QUEUE_TAIL);
boundary -= boundary % SLRU_PAGES_PER_SEGMENT;
- return asyncQueuePagePrecedes(nexthead, boundary);
+ return asyncQueuePageDiff(nexthead, boundary) < 0;
}
/*
@@ -2203,7 +2212,7 @@ asyncQueueAdvanceTail(void)
*/
newtailpage = QUEUE_POS_PAGE(min);
boundary = newtailpage - (newtailpage % SLRU_PAGES_PER_SEGMENT);
- if (asyncQueuePagePrecedes(oldtailpage, boundary))
+ if (asyncQueuePageDiff(oldtailpage, boundary) < 0)
{
/*
* SimpleLruTruncate() will ask for NotifySLRULock but will also
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 1b646a0..733ad93 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -438,7 +438,7 @@ static void SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact, SERIALIZABLEXACT
static void ReleaseRWConflict(RWConflict conflict);
static void FlagSxactUnsafe(SERIALIZABLEXACT *sxact);
-static bool SerialPagePrecedesLogically(int page1, int page2);
+static int32 SerialPageDiffLogically(int page1, int page2);
static void SerialInit(void);
static void SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo);
static SerCommitSeqNo SerialGetMinConflictCommitSeqNo(TransactionId xid);
@@ -784,26 +784,30 @@ FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
/*------------------------------------------------------------------------*/
/*
- * Decide whether a Serial page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff Serial page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
+ *
+ * This must follow stricter rules than PageDiff demands, for the benefit of
+ * the call local to this file.
*/
-static bool
-SerialPagePrecedesLogically(int page1, int page2)
+static int32
+SerialPageDiffLogically(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * SERIAL_ENTRIESPERPAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SERIAL_ENTRIESPERPAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + SERIAL_ENTRIESPERPAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + SERIAL_ENTRIESPERPAGE - 1);
+ return Max(diff_head, diff_tail);
}
static void
-SerialPagePrecedesLogicallyUnitTests(void)
+SerialPageDiffLogicallyUnitTests(void)
{
int per_page = SERIAL_ENTRIESPERPAGE,
offset = per_page / 2;
@@ -826,21 +830,21 @@ SerialPagePrecedesLogicallyUnitTests(void)
* In this scenario, the SLRU headPage pertains to the last ~1000 XIDs
* assigned. oldestXact finishes, ~2B XIDs having elapsed since it
* started. Further transactions cause us to summarize oldestXact to
- * tailPage. Function must return false so SerialAdd() doesn't zero
- * tailPage (which may contain entries for other old, recently-finished
- * XIDs) and half the SLRU. Reaching this requires burning ~2B XIDs in
- * single-user mode, a negligible possibility.
+ * tailPage. Function must return non-negative so SerialAdd() doesn't
+ * zero tailPage (which may contain entries for other old,
+ * recently-finished XIDs) and half the SLRU. Reaching this requires
+ * burning ~2B XIDs in single-user mode, a negligible possibility.
*/
headPage = newestPage;
targetPage = oldestPage;
- Assert(!SerialPagePrecedesLogically(headPage, targetPage));
+ Assert(SerialPageDiffLogically(headPage, targetPage) >= 0);
/*
* In this scenario, the SLRU headPage pertains to oldestXact. We're
* summarizing an XID near newestXact. (Assume few other XIDs used
* SERIALIZABLE, hence the minimal headPage advancement. Assume
* oldestXact was long-running and only recently reached the SLRU.)
- * Function must return true to make SerialAdd() create targetPage.
+ * Function must return negative to make SerialAdd() create targetPage.
*
* Today's implementation mishandles this case, but it doesn't matter
* enough to fix. Verify that the defect affects just one page by
@@ -851,9 +855,9 @@ SerialPagePrecedesLogicallyUnitTests(void)
*/
headPage = oldestPage;
targetPage = newestPage;
- Assert(SerialPagePrecedesLogically(headPage, targetPage - 1));
+ Assert(SerialPageDiffLogically(headPage, targetPage - 1) < 0);
#if 0
- Assert(SerialPagePrecedesLogically(headPage, targetPage));
+ Assert(SerialPageDiffLogically(headPage, targetPage) < 0);
#endif
}
@@ -868,12 +872,12 @@ SerialInit(void)
/*
* Set up SLRU management of the pg_serial data.
*/
- SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
+ SerialSlruCtl->PageDiff = SerialPageDiffLogically;
SimpleLruInit(SerialSlruCtl, "Serial",
NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
- SerialPagePrecedesLogicallyUnitTests();
- SlruPagePrecedesUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
+ SerialPageDiffLogicallyUnitTests();
+ SlruPageDiffUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
/*
* Create or attach to the SerialControl structure.
@@ -935,8 +939,8 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
else
{
firstZeroPage = SerialNextPage(serialControl->headPage);
- isNewPage = SerialPagePrecedesLogically(serialControl->headPage,
- targetPage);
+ isNewPage = SerialPageDiffLogically(serialControl->headPage,
+ targetPage) < 0;
}
if (!TransactionIdIsValid(serialControl->headXid)
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 805dd2b..967ccd1 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -29,7 +29,7 @@
* xxxx is CLOG or SUBTRANS, respectively), and segment numbering at
* 0xFFFFFFFF/xxxx_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
* take no explicit notice of that fact in slru.c, except when comparing
- * segment and page numbers in SimpleLruTruncate (see PagePrecedes()).
+ * segment and page numbers in SimpleLruTruncate (see PageDiff()).
*/
#define SLRU_PAGES_PER_SEGMENT 32
@@ -118,16 +118,18 @@ typedef struct SlruCtlData
SyncRequestHandler sync_handler;
/*
- * Decide whether a page is "older" for truncation and as a hint for
- * evicting pages in LRU order. Return true if every entry of the first
- * argument is older than every entry of the second argument. Note that
- * !PagePrecedes(a,b) && !PagePrecedes(b,a) need not imply a==b; it also
- * arises when some entries are older and some are not. For SLRUs using
- * SimpleLruTruncate(), this must use modular arithmetic. (For others,
- * the behavior of this callback has no functional implications.) Use
- * SlruPagePrecedesUnitTests() in SLRUs meeting its criteria.
+ * Compute distance between two page numbers, for truncation and as a hint
+ * for evicting pages in LRU order. Callbacks shall distribute return
+ * values uniformly in [INT_MIN,INT_MAX]. If PageDiff(P, oldest_needed)
+ * is negative but not close to INT_MIN, that implies data in page P is
+ * obsolete. The exception for values close to INT_MIN permits
+ * implementations to return such values for edge cases where the answer
+ * changes mid-page from INT_MIN to INT_MAX. Use SlruPageDiffUnitTests()
+ * in SLRUs meeting its criteria. For SLRUs using SimpleLruTruncate(),
+ * this must use modular arithmetic. (For others, the behavior of this
+ * callback has no functional implications.)
*/
- bool (*PagePrecedes) (int, int);
+ int32 (*PageDiff) (int, int);
/*
* Dir is set during SimpleLruInit and does not change thereafter. Since
@@ -151,9 +153,9 @@ extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied);
#ifdef USE_ASSERT_CHECKING
-extern void SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page);
+extern void SlruPageDiffUnitTests(SlruCtl ctl, int per_page);
#else
-#define SlruPagePrecedesUnitTests(ctl, per_page) do {} while (0)
+#define SlruPageDiffUnitTests(ctl, per_page) do {} while (0)
#endif
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);
@@ -166,8 +168,8 @@ extern void SlruDeleteSegment(SlruCtl ctl, int segno);
extern int SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path);
/* SlruScanDirectory public callbacks */
-extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
- int segpage, void *data);
+extern bool SlruScanDirCbWouldTruncate(SlruCtl ctl, char *filename,
+ int segpage, void *data);
extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
void *data);
Hi Noah!
I've found this thread in the CF while looking for something to review.
On 9 Nov 2020, at 09:53, Noah Misch <noah@leadboat.com> wrote:
Rebased both patches, necessitated by commit c732c3f (a repair of commit
dee663f). As I mentioned on another branch of the thread, I'd be content if
someone reviews the slru-truncate-modulo patch and disclaims knowledge of the
slru-truncate-insurance patch; I would then abandon the latter patch.
<slru-truncate-modulo-v5.patch><slru-truncate-t-insurance-v4.patch>
Commit c732c3f adds some SYNC_FORGET_REQUESTs.
slru-truncate-modulo-v5.patch fixes an off-by-one error in functions like *PagePrecedes(int page1, int page2).
slru-truncate-t-insurance-v4.patch ensures that off-by-one errors do not inflict data loss.
While I agree that fixing the error is better than hiding it, I could not figure out how c732c3f is connected to these patches.
Can you please give me a few pointers on how to understand this connection?
Best regards, Andrey Borodin.
On Fri, Jan 01, 2021 at 11:05:29PM +0500, Andrey Borodin wrote:
I've found this thread in the CF while looking for something to review.
Thanks for taking a look.
On 9 Nov 2020, at 09:53, Noah Misch <noah@leadboat.com> wrote:
Rebased both patches, necessitated by commit c732c3f (a repair of commit
dee663f). As I mentioned on another branch of the thread, I'd be content if
someone reviews the slru-truncate-modulo patch and disclaims knowledge of the
slru-truncate-insurance patch; I would then abandon the latter patch.
<slru-truncate-modulo-v5.patch><slru-truncate-t-insurance-v4.patch>
Commit c732c3f adds some SYNC_FORGET_REQUESTs.
slru-truncate-modulo-v5.patch fixes an off-by-one error in functions like *PagePrecedes(int page1, int page2).
slru-truncate-t-insurance-v4.patch ensures that off-by-one errors do not inflict data loss.
While I agree that fixing the error is better than hiding it, I could not figure out how c732c3f is connected to these patches.
Can you please give me a few pointers on how to understand this connection?
Commit c732c3f is the last commit that caused a merge conflict. There's no
other connection to this thread, and one can review patches on this thread
without studying commit c732c3f. Specifically, this thread's
slru-truncate-modulo patch and commit c732c3f modify adjacent lines in
SlruScanDirCbDeleteCutoff(); here's the diff after merge conflict resolution:
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@@ -1525,4 -1406,4 +1517,4 @@@ SlruScanDirCbDeleteCutoff(SlruCtl ctl,
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
- SlruInternalDeleteSegment(ctl, filename);
+ SlruInternalDeleteSegment(ctl, segpage / SLRU_PAGES_PER_SEGMENT);
On 2 Jan 2021, at 01:35, Noah Misch <noah@leadboat.com> wrote:
There's no
other connection to this thread, and one can review patches on this thread
without studying commit c732c3f.
OK, thanks!
Do I understand correctly that this is a bugfix that needs to be back-patched?
Thus we should not refactor the 4 identical *PagePrecedes(int page1, int page2) functions into 1 generic function?
Since the functions are no longer symmetric, maybe we should have better names for the arguments than "page1" and "page2"? At least in the dev branch.
Is it common practice to embed tests into assert checking like in SlruPagePrecedesUnitTests()?
SLRU seems nowhere near simple, BTW. The only simple part is the naive caching algorithm. I remember there was a thread about making regular relations out of SLRUs.
Best regards, Andrey Borodin.
On Sat, Jan 02, 2021 at 12:31:45PM +0500, Andrey Borodin wrote:
Do I understand correctly that this is a bugfix that needs to be back-patched?
The slru-truncate-modulo patch fixes a bug. The slru-truncate-t-insurance
patch does not. Neither _needs_ to be back-patched, though I'm proposing to
back-patch both. I welcome opinions about that.
Thus we should not refactor the 4 identical *PagePrecedes(int page1, int page2) functions into 1 generic function?
I agree with not refactoring that way, in this case.
Since the functions are no longer symmetric, maybe we should have better names for the arguments than "page1" and "page2"? At least in the dev branch.
That works for me. What names would you suggest?
Is it common practice to embed tests into assert checking like in SlruPagePrecedesUnitTests()?
No; it's neither common practice nor a policy breach.
On 1 Jan 2021, at 23:05, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
I've found this thread in the CF while looking for something to review.
We discussed patches with Noah offlist. I'm resending summary to list.
There are two patches:
1. slru-truncate-modulo-v5.patch
2. slru-truncate-t-insurance-v4.patch
It would be a bit easier to review if the patches were versioned together (both v5), because the 2nd patch applies on top of the 1st. Also, the 2nd patch has a problem when applied with git (in async.c).
The first patch fixes a bug where SLRU truncation around the wrapping point can happen too early.
The basic idea of the patch is: if we want to delete a range, we must be eligible to delete both its beginning and its ending.
So to test whether a page is deletable, it tests that the first and last XIDs (or other SLRU units) are of no interest. To test whether a segment is deletable, it tests whether its first and last pages can be deleted.
The patch adds tests in an unusual manner: they are implemented as assert functions. The tests are fast, checking only basic and edge cases. But the tests will not run if Postgres is built without asserts.
I'm a little suspicious of the implementation of the *PagePrecedes(int page1, int page2) functions. Consider the following example from the patch:
static bool
CommitTsPagePrecedes(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
xid2 += FirstNormalTransactionId + 1;
return (TransactionIdPrecedes(xid1, xid2) &&
TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
}
We are adding FirstNormalTransactionId to the XIDs to keep them from being special XIDs.
We add COMMIT_TS_XACTS_PER_PAGE - 1 to xid2 to shift it to the end of the page. But due to += FirstNormalTransactionId we shift slightly past the page boundary and risk that xid2 + COMMIT_TS_XACTS_PER_PAGE - 1 becomes FrozenTransactionId (FirstNormalTransactionId - 1). Thus we add +1 to all values in scope. While the logic is correct, the coding is difficult to follow. Maybe we could just use
page1_first_normal_xid = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE + FirstNormalTransactionId;
page2_first_normal_xid = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE + FirstNormalTransactionId;
page2_last_xid = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE + COMMIT_TS_XACTS_PER_PAGE - 1;
But I'm not insisting on this.
The following comment is not correct for 1kB and 4kB pages:
+ * At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.
All the notes above are not blocking the fix; I just wanted to let the committer know about them. I think it's very important to have this bug fixed.
The second patch converts the boolean *PagePrecedes() functions to *PageDiff()s and adds logic to avoid deleting pages in suspicious cases. This logic depends on the scale of the value the diff returns: it expects overflow to happen between INT_MIN and INT_MAX. Thus it prevents page deletion if page_diff <= INT_MIN / 2 (too far from the current cleaning point; and in normal cases, of course).
It must be a comparison here, not an equality test.
- ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
+ ctl->PageDiff(segpage, trunc->earliestExistingPage))
This
int diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,
seems to be the functional equivalent of
int diff_max = ((QUEUE_MAX_PAGE - 1) / 2),
What I like about the patch is that it removes all the trickery described above around "+ FirstNormalTransactionId + 1".
AFAICS the overall purpose of the 2nd patch is to help clusters corrupted by other bugs avoid deleting SLRU segments.
I'm a little bit afraid that this kind of patch can hide bugs (while potentially saving some users' data). That said, the patch seems like a useful precaution. Maybe we could emit scary warnings if SLRU segments do not form a contiguous range?
Thanks!
Best regards, Andrey Borodin.
On Wed, Jan 06, 2021 at 11:28:36AM +0500, Andrey Borodin wrote:
The first patch fixes a bug where SLRU truncation around the wrapping point can happen too early.
The basic idea of the patch is: if we want to delete a range, we must be eligible to delete both its beginning and its ending.
So to test whether a page is deletable, it tests that the first and last XIDs (or other SLRU units) are of no interest. To test whether a segment is deletable, it tests whether its first and last pages can be deleted.
Yes.
I'm a little suspicious of the implementation of the *PagePrecedes(int page1, int page2) functions. Consider the following example from the patch:
static bool
CommitTsPagePrecedes(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
xid2 += FirstNormalTransactionId + 1;
return (TransactionIdPrecedes(xid1, xid2) &&
TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
}
We are adding FirstNormalTransactionId to the XIDs to keep them from being special XIDs.
We add COMMIT_TS_XACTS_PER_PAGE - 1 to xid2 to shift it to the end of the page. But due to += FirstNormalTransactionId we shift slightly past the page boundary and risk that xid2 + COMMIT_TS_XACTS_PER_PAGE - 1 becomes FrozenTransactionId (FirstNormalTransactionId - 1). Thus we add +1 to all values in scope. While the logic is correct, the coding is difficult to follow. Maybe we could just use
Right. The overall objective is to compare the first XID of page1 to the
first and last XIDs of page2. The FirstNormalTransactionId+1 addend operates
at a lower level. It just makes TransactionIdPrecedes() behave like
NormalTransactionIdPrecedes() without the latter's assertion.
page1_first_normal_xid = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE + FirstNormalTransactionId;
page2_first_normal_xid = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE + FirstNormalTransactionId;
page2_last_xid = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE + COMMIT_TS_XACTS_PER_PAGE - 1;
But I'm not insisting on this.
I see your point, but I doubt using different addends on different operands
makes this easier to understand. If anything, I'd lean toward adding more
explicit abstraction between "the XID we intend to test" and "the XID we're
using to fool some general-purpose API".
The following comment is not correct for 1kB and 4kB pages:
+ * At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.
Fixed, thanks.
All above notes are not blocking the fix, I just wanted to let committer know about this. I think that it's very important to have this bug fixed.
The second patch converts the boolean *PagePrecedes() functions to *PageDiff()s and adds logic to avoid deleting pages in suspicious cases. This logic depends on the scale of the value the diff returns: it expects overflow to happen between INT_MIN and INT_MAX. Thus it prevents page deletion if page_diff <= INT_MIN / 2 (too far from the current cleaning point; and in normal cases, of course).
It must be a comparison here, not an equality test.
- ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
+ ctl->PageDiff(segpage, trunc->earliestExistingPage))
That's bad. Fixed, thanks.
This
int diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,
seems to be functional equivalent of
int diff_max = ((QUEUE_MAX_PAGE - 1) / 2),
Do you think one conveys the concept better than the other?
AFAICS the overall purpose of the 2nd patch is to help clusters corrupted by other bugs avoid deleting SLRU segments.
Yes.
I'm a little bit afraid that this kind of patch can hide bugs (while potentially saving some users' data). That said, the patch seems like a useful precaution. Maybe we could emit scary warnings if SLRU segments do not form a contiguous range?
Scary warnings are good for an observation that implies a bug, but the
slru-truncate-t-insurance patch causes such an outcome in non-bug cases where
it doesn't happen today. In other words, discontinuous ranges of SLRU
segments would be even more common after that patch. For example, it would
happen anytime oldestXID advances by more than ~1B at a time.
Thanks,
nm
On 9 Jan 2021, at 15:17, Noah Misch <noah@leadboat.com> wrote:
This
int diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,
seems to be the functional equivalent of
int diff_max = ((QUEUE_MAX_PAGE - 1) / 2),
Do you think one conveys the concept better than the other?
I see now that the next comments mention "(QUEUE_MAX_PAGE+1)/2", so I think there is no need to change anything in the patch here.
I'm a little bit afraid that this kind of patch can hide bugs (while potentially saving some users' data). That said, the patch seems like a useful precaution. Maybe we could emit scary warnings if SLRU segments do not form a contiguous range?
Scary warnings are good for an observation that implies a bug, but the
slru-truncate-t-insurance patch causes such an outcome in non-bug cases where
it doesn't happen today. In other words, discontinuous ranges of SLRU
segments would be even more common after that patch. For example, it would
happen anytime oldestXID advances by more than ~1B at a time.
Uhm, I thought that if there are going to be more than ~1B XIDs, we are going to keep all segments forever and the range will still be contiguous. Or am I missing something?
Thanks!
Best regards, Andrey Borodin.
On Sat, Jan 09, 2021 at 08:25:39PM +0500, Andrey Borodin wrote:
On 9 Jan 2021, at 15:17, Noah Misch <noah@leadboat.com> wrote:
I'm a little bit afraid that this kind of patch can hide bugs (while potentially saving some users' data). That said, the patch seems like a useful precaution. Maybe we could emit scary warnings if SLRU segments do not form a contiguous range?
Scary warnings are good for an observation that implies a bug, but the
slru-truncate-t-insurance patch causes such an outcome in non-bug cases where
it doesn't happen today. In other words, discontinuous ranges of SLRU
segments would be even more common after that patch. For example, it would
happen anytime oldestXID advances by more than ~1B at a time.
Uhm, I thought that if there are going to be more than ~1B XIDs, we are going to keep all segments forever and the range will still be contiguous. Or am I missing something?
No; it deletes the most recent ~1B and leaves the older segments. An
exception is multixact, as described in the commit message and the patch's
change to a comment in TruncateMultiXact().
On 10 Jan 2021, at 03:15, Noah Misch <noah@leadboat.com> wrote:
No; it deletes the most recent ~1B and leaves the older segments. An
exception is multixact, as described in the commit message and the patch's
change to a comment in TruncateMultiXact().
Thanks for the clarification.
One more thing: the retention point at 3/4 of the overall space (half of the wraparound distance) seems more or less arbitrary to me. Why not 5/8 or 9/16?
Can you please send revised patches with fixes?
Thanks!
Best regards, Andrey Borodin.
On Sun, Jan 10, 2021 at 11:44:14AM +0500, Andrey Borodin wrote:
On 10 Jan 2021, at 03:15, Noah Misch <noah@leadboat.com> wrote:
No; it deletes the most recent ~1B and leaves the older segments. An
exception is multixact, as described in the commit message and the patch's
change to a comment in TruncateMultiXact().
Thanks for the clarification.
One more thing: the retention point at 3/4 of the overall space (half of the wraparound distance) seems more or less arbitrary to me. Why not 5/8 or 9/16?
No reason for that exact value. The purpose of that patch is to mitigate bugs
that cause the server to write data into a region of the SLRU that we permit
truncation to unlink. If the patch instead tested "diff > INT_MIN * .99", the
new behavior would get little testing, because xidWarnLimit would start first.
Also, the new behavior wouldn't mitigate bugs that trespass >~20M XIDs into
unlink-eligible space. If the patch tested "diff > INT_MIN * .01", more sites
would see disk consumption grow. I think reasonable multipliers range from
0.5 (in the patch today) to 0.9, but it's a judgment call.
Can you please send revised patches with fixes?
Attached.
Attachments:
slru-truncate-modulo-v6.patch (text/plain; charset=us-ascii)
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Prevent excess SimpleLruTruncate() deletion.
Every core SLRU wraps around. With the exception of pg_notify, the wrap
point can fall in the middle of a page. Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback. Update each callback implementation to fit the new
specification. This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.) The bug fixed here has the same symptoms and user
followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb. Back-patch
to 9.5 (all supported versions).
Reviewed by Andrey Borodin and Tom Lane.
Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 69a81f3..410d02a 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -694,6 +694,7 @@ CLOGShmemInit(void)
SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
SYNC_HANDLER_CLOG);
+ SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
}
/*
@@ -912,13 +913,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
/*
- * Decide which of two CLOG page numbers is "older" for truncation purposes.
+ * Decide whether a CLOG page number is "older" for truncation purposes.
*
* We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * thing with wraparound XID arithmetic. However, TransactionIdPrecedes()
+ * would get weird about permanent xact IDs. So, offset both such that xid1,
+ * xid2, and xid2 + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset
+ * is relevant to page 0 and to the page preceding page 0.
+ *
+ * The page containing oldestXact-2^31 is the important edge case. The
+ * portion of that page equaling or following oldestXact-2^31 is expendable,
+ * but the portion preceding oldestXact-2^31 is not. When oldestXact-2^31 is
+ * the first XID of a page and segment, the entire page and segment is
+ * expendable, and we could truncate the segment. Recognizing that case would
+ * require making oldestXact, not just the page containing oldestXact,
+ * available to this callback. The benefit would be rare and small, so we
+ * don't optimize that edge case.
*/
static bool
CLOGPagePrecedes(int page1, int page2)
@@ -927,11 +937,12 @@ CLOGPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index b786eef..9f42461 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -557,6 +557,7 @@ CommitTsShmemInit(void)
CommitTsSLRULock, "pg_commit_ts",
LWTRANCHE_COMMITTS_BUFFER,
SYNC_HANDLER_COMMIT_TS);
+ SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
commitTsShared = ShmemInitStruct("CommitTs shared",
sizeof(CommitTimestampShared),
@@ -927,14 +928,27 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
/*
- * Decide which of two commitTS page numbers is "older" for truncation
- * purposes.
+ * Decide whether a commitTS page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * At default BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128. This
+ * introduces differences compared to CLOG and the other SLRUs having (1 <<
+ * 31) % per_page == 0. This function never tests exactly
+ * TransactionIdPrecedes(x-2^31, x). When the system reaches xidStopLimit,
+ * there are two possible counts of page boundaries between oldestXact and the
+ * latest XID assigned, depending on whether oldestXact is within the first
+ * 128 entries of its page. Since this function doesn't know the location of
+ * oldestXact within page2, it returns false for one page that actually is
+ * expendable. This is a wider (yet still negligible) version of the
+ * truncation opportunity that CLOGPagePrecedes() cannot recognize.
+ *
+ * For the sake of a worked example, number entries with decimal values such
+ * that page1==1 entries range from 1.0 to 1.999. Let N+0.15 be the number of
+ * pages that 2^31 entries will span (N is an integer). If oldestXact=N+2.1,
+ * then the final safe XID assignment leaves newestXact=1.95. We keep page 2,
+ * because entry=2.85 is the border that toggles whether entries precede the
+ * last entry of the oldestXact page. While page 2 is expendable at
+ * oldestXact=N+2.1, it would be precious at oldestXact=N+2.9.
*/
static bool
CommitTsPagePrecedes(int page1, int page2)
@@ -943,11 +957,12 @@ CommitTsPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1233448..7dcfa02 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1852,11 +1852,13 @@ MultiXactShmemInit(void)
MultiXactOffsetSLRULock, "pg_multixact/offsets",
LWTRANCHE_MULTIXACTOFFSET_BUFFER,
SYNC_HANDLER_MULTIXACT_OFFSET);
+ SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
MultiXactMemberSLRULock, "pg_multixact/members",
LWTRANCHE_MULTIXACTMEMBER_BUFFER,
SYNC_HANDLER_MULTIXACT_MEMBER);
+ /* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
/* Initialize our shared state struct */
MultiXactState = ShmemInitStruct("Shared MultiXact State",
@@ -2982,6 +2984,14 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
* truncate the members SLRU. So we first scan the directory to determine
* the earliest offsets page number that we can read without error.
*
+ * When nextMXact is less than one segment away from multiWrapLimit,
+ * SlruScanDirCbFindEarliest can find some early segment other than the
+ * actual earliest. (MultiXactOffsetPagePrecedes(EARLIEST, LATEST)
+ * returns false, because not all pairs of entries have the same answer.)
+ * That can also arise when an earlier truncation attempt failed unlink()
+ * or returned early from this function. The only consequence is
+ * returning early, which wastes space that we could have liberated.
+ *
* NB: It's also possible that the page that oldestMulti is on has already
* been truncated away, and we crashed before updating oldestMulti.
*/
@@ -3096,15 +3106,11 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
}
/*
- * Decide which of two MultiXactOffset page numbers is "older" for truncation
- * purposes.
+ * Decide whether a MultiXactOffset page number is "older" for truncation
+ * purposes. Analogous to CLOGPagePrecedes().
*
- * We need to use comparison of MultiXactId here in order to do the right
- * thing with wraparound. However, if we are asked about page number zero, we
- * don't want to hand InvalidMultiXactId to MultiXactIdPrecedes: it'll get
- * weird. So, offset both multis by FirstMultiXactId to avoid that.
- * (Actually, the current implementation doesn't do anything weird with
- * InvalidMultiXactId, but there's no harm in leaving this code like this.)
+ * Offsetting the values is optional, because MultiXactIdPrecedes() has
+ * translational symmetry.
*/
static bool
MultiXactOffsetPagePrecedes(int page1, int page2)
@@ -3113,15 +3119,17 @@ MultiXactOffsetPagePrecedes(int page1, int page2)
MultiXactId multi2;
multi1 = ((MultiXactId) page1) * MULTIXACT_OFFSETS_PER_PAGE;
- multi1 += FirstMultiXactId;
+ multi1 += FirstMultiXactId + 1;
multi2 = ((MultiXactId) page2) * MULTIXACT_OFFSETS_PER_PAGE;
- multi2 += FirstMultiXactId;
+ multi2 += FirstMultiXactId + 1;
- return MultiXactIdPrecedes(multi1, multi2);
+ return (MultiXactIdPrecedes(multi1, multi2) &&
+ MultiXactIdPrecedes(multi1,
+ multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1));
}
/*
- * Decide which of two MultiXactMember page numbers is "older" for truncation
+ * Decide whether a MultiXactMember page number is "older" for truncation
* purposes. There is no "invalid offset number" so use the numbers verbatim.
*/
static bool
@@ -3133,7 +3141,9 @@ MultiXactMemberPagePrecedes(int page1, int page2)
offset1 = ((MultiXactOffset) page1) * MULTIXACT_MEMBERS_PER_PAGE;
offset2 = ((MultiXactOffset) page2) * MULTIXACT_MEMBERS_PER_PAGE;
- return MultiXactOffsetPrecedes(offset1, offset2);
+ return (MultiXactOffsetPrecedes(offset1, offset2) &&
+ MultiXactOffsetPrecedes(offset1,
+ offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1));
}
/*
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 244e518..e49e06e 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1231,11 +1231,6 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
pgstat_count_slru_truncate(shared->slru_stats_idx);
/*
- * The cutoff point is the start of the segment containing cutoffPage.
- */
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- /*
* Scan shared memory and remove any pages preceding the cutoff page, to
* ensure we won't rewrite them later. (Since this is normally called in
* or just after a checkpoint, any dirty pages should have been flushed
@@ -1247,9 +1242,7 @@ restart:;
/*
* While we are holding the lock, make an important safety check: the
- * planned cutoff point must be <= the current endpoint page. Otherwise we
- * have already wrapped around, and proceeding with the truncation would
- * risk removing the current segment.
+ * current endpoint page must not be eligible for removal.
*/
if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
{
@@ -1281,8 +1274,11 @@ restart:;
* Hmm, we have (or may have) I/O operations acting on the page, so
* we've got to wait for them to finish and then start again. This is
* the same logic as in SlruSelectLRUPage. (XXX if page is dirty,
- * wouldn't it be OK to just discard it without writing it? For now,
- * keep the logic the same as it was.)
+ * wouldn't it be OK to just discard it without writing it?
+ * SlruMayDeleteSegment() uses a stricter qualification, so we might
+ * not delete this page in the end; even if we don't delete it, we
+ * won't have cause to read its data again. For now, keep the logic
+ * the same as it was.)
*/
if (shared->page_status[slotno] == SLRU_PAGE_VALID)
SlruInternalWritePage(ctl, slotno, NULL);
@@ -1378,18 +1374,133 @@ restart:
}
/*
+ * Determine whether a segment is okay to delete.
+ *
+ * segpage is the first page of the segment, and cutoffPage is the oldest (in
+ * PagePrecedes order) page in the SLRU containing still-useful data. Since
+ * every core PagePrecedes callback implements "wrap around", check the
+ * segment's first and last pages:
+ *
+ * first<cutoff && last<cutoff: yes
+ * first<cutoff && last>=cutoff: no; cutoff falls inside this segment
+ * first>=cutoff && last<cutoff: no; wrap point falls inside this segment
+ * first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ */
+static bool
+SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+{
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+
+ Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
+
+ return (ctl->PagePrecedes(segpage, cutoffPage) &&
+ ctl->PagePrecedes(seg_last_page, cutoffPage));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static void
+SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
+{
+ TransactionId lhs,
+ rhs;
+ int newestPage,
+ oldestPage;
+ TransactionId newestXact,
+ oldestXact;
+
+ /*
+ * Compare an XID pair having undefined order (see RFC 1982), a pair at
+ * "opposite ends" of the XID space. TransactionIdPrecedes() treats each
+ * as preceding the other. If RHS is oldestXact, LHS is the first XID we
+ * must not assign.
+ */
+ lhs = per_page + offset; /* skip first page to avoid non-normal XIDs */
+ rhs = lhs + (1U << 31);
+ Assert(TransactionIdPrecedes(lhs, rhs));
+ Assert(TransactionIdPrecedes(rhs, lhs));
+ Assert(!TransactionIdPrecedes(lhs - 1, rhs));
+ Assert(TransactionIdPrecedes(rhs, lhs - 1));
+ Assert(TransactionIdPrecedes(lhs + 1, rhs));
+ Assert(!TransactionIdPrecedes(rhs, lhs + 1));
+ Assert(!TransactionIdFollowsOrEquals(lhs, rhs));
+ Assert(!TransactionIdFollowsOrEquals(rhs, lhs));
+ Assert(!ctl->PagePrecedes(lhs / per_page, lhs / per_page));
+ Assert(!ctl->PagePrecedes(lhs / per_page, rhs / per_page));
+ Assert(!ctl->PagePrecedes(rhs / per_page, lhs / per_page));
+ Assert(!ctl->PagePrecedes((lhs - per_page) / per_page, rhs / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 3 * per_page) / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 2 * per_page) / per_page));
+ Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 1 * per_page) / per_page)
+ || (1U << 31) % per_page != 0); /* See CommitTsPagePrecedes() */
+ Assert(ctl->PagePrecedes((lhs + 1 * per_page) / per_page, rhs / per_page)
+ || (1U << 31) % per_page != 0);
+ Assert(ctl->PagePrecedes((lhs + 2 * per_page) / per_page, rhs / per_page));
+ Assert(ctl->PagePrecedes((lhs + 3 * per_page) / per_page, rhs / per_page));
+ Assert(!ctl->PagePrecedes(rhs / per_page, (lhs + per_page) / per_page));
+
+ /*
+ * GetNewTransactionId() has assigned the last XID it can safely use, and
+ * that XID is in the *LAST* page of the second segment. We must not
+ * delete that segment.
+ */
+ newestPage = 2 * SLRU_PAGES_PER_SEGMENT - 1;
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
+ Assert(!SlruMayDeleteSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
+
+ /*
+ * GetNewTransactionId() has assigned the last XID it can safely use, and
+ * that XID is in the *FIRST* page of the second segment. We must not
+ * delete that segment.
+ */
+ newestPage = SLRU_PAGES_PER_SEGMENT;
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
+ Assert(!SlruMayDeleteSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
+}
+
+/*
+ * Unit-test a PagePrecedes function.
+ *
+ * This assumes every uint32 >= FirstNormalTransactionId is a valid key. It
+ * assumes each value occupies a contiguous, fixed-size region of SLRU bytes.
+ * (MultiXactMemberCtl separates flags from XIDs. AsyncCtl has
+ * variable-length entries, no keys, and no random access. These unit tests
+ * do not apply to them.)
+ */
+void
+SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page)
+{
+ /* Test first, middle and last entries of a page. */
+ SlruPagePrecedesTestOffset(ctl, per_page, 0);
+ SlruPagePrecedesTestOffset(ctl, per_page, per_page / 2);
+ SlruPagePrecedesTestOffset(ctl, per_page, per_page - 1);
+}
+#endif
+
+/*
* SlruScanDirectory callback
- * This callback reports true if there's any segment prior to the one
- * containing the page passed as "data".
+ * This callback reports true if there's any segment wholly prior to the
+ * one containing the page passed as "data".
*/
bool
SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1404,7 +1515,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (ctl->PagePrecedes(segpage, cutoffPage))
+ if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
SlruInternalDeleteSegment(ctl, segpage / SLRU_PAGES_PER_SEGMENT);
return false; /* keep going */
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 8cdc9e0..6a8e521 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -194,6 +194,7 @@ SUBTRANSShmemInit(void)
SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
SubtransSLRULock, "pg_subtrans",
LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
+ SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
}
/*
@@ -354,13 +355,8 @@ TruncateSUBTRANS(TransactionId oldestXact)
/*
- * Decide which of two SUBTRANS page numbers is "older" for truncation purposes.
- *
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * Decide whether a SUBTRANS page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*/
static bool
SubTransPagePrecedes(int page1, int page2)
@@ -369,9 +365,10 @@ SubTransPagePrecedes(int page1, int page2)
TransactionId xid2;
xid1 = ((TransactionId) page1) * SUBTRANS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId;
+ xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SUBTRANS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId;
+ xid2 += FirstNormalTransactionId + 1;
- return TransactionIdPrecedes(xid1, xid2);
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 7c133ec..42b232d 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -490,7 +490,12 @@ asyncQueuePageDiff(int p, int q)
return diff;
}
-/* Is p < q, accounting for wraparound? */
+/*
+ * Is p < q, accounting for wraparound?
+ *
+ * Since asyncQueueIsFull() blocks creation of a page that could precede any
+ * extant page, we need not assess entries within a page.
+ */
static bool
asyncQueuePagePrecedes(int p, int q)
{
@@ -1352,8 +1357,8 @@ asyncQueueIsFull(void)
* logically precedes the current global tail pointer, ie, the head
* pointer would wrap around compared to the tail. We cannot create such
* a head page for fear of confusing slru.c. For safety we round the tail
- * pointer back to a segment boundary (compare the truncation logic in
- * asyncQueueAdvanceTail).
+ * pointer back to a segment boundary (truncation logic in
+ * asyncQueueAdvanceTail does not do this, so doing it here is optional).
*
* Note that this test is *not* dependent on how much space there is on
* the current head page. This is necessary because asyncQueueAddEntries
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 1b262d6..822c22e 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -438,7 +438,7 @@ static void SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact, SERIALIZABLEXACT
static void ReleaseRWConflict(RWConflict conflict);
static void FlagSxactUnsafe(SERIALIZABLEXACT *sxact);
-static bool SerialPagePrecedesLogically(int p, int q);
+static bool SerialPagePrecedesLogically(int page1, int page2);
static void SerialInit(void);
static void SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo);
static SerCommitSeqNo SerialGetMinConflictCommitSeqNo(TransactionId xid);
@@ -784,27 +784,77 @@ FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
/*------------------------------------------------------------------------*/
/*
- * We will work on the page range of 0..SERIAL_MAX_PAGE.
- * Compares using wraparound logic, as is required by slru.c.
+ * Decide whether a Serial page number is "older" for truncation purposes.
+ * Analogous to CLOGPagePrecedes().
*/
static bool
-SerialPagePrecedesLogically(int p, int q)
+SerialPagePrecedesLogically(int page1, int page2)
{
- int diff;
+ TransactionId xid1;
+ TransactionId xid2;
+
+ xid1 = ((TransactionId) page1) * SERIAL_ENTRIESPERPAGE;
+ xid1 += FirstNormalTransactionId + 1;
+ xid2 = ((TransactionId) page2) * SERIAL_ENTRIESPERPAGE;
+ xid2 += FirstNormalTransactionId + 1;
+
+ return (TransactionIdPrecedes(xid1, xid2) &&
+ TransactionIdPrecedes(xid1, xid2 + SERIAL_ENTRIESPERPAGE - 1));
+}
+
+static void
+SerialPagePrecedesLogicallyUnitTests(void)
+{
+ int per_page = SERIAL_ENTRIESPERPAGE,
+ offset = per_page / 2;
+ int newestPage,
+ oldestPage,
+ headPage,
+ targetPage;
+ TransactionId newestXact,
+ oldestXact;
+
+ /* GetNewTransactionId() has assigned the last XID it can safely use. */
+ newestPage = 2 * SLRU_PAGES_PER_SEGMENT - 1; /* nothing special */
+ newestXact = newestPage * per_page + offset;
+ Assert(newestXact / per_page == newestPage);
+ oldestXact = newestXact + 1;
+ oldestXact -= 1U << 31;
+ oldestPage = oldestXact / per_page;
/*
- * We have to compare modulo (SERIAL_MAX_PAGE+1)/2. Both inputs should be
- * in the range 0..SERIAL_MAX_PAGE.
+ * In this scenario, the SLRU headPage pertains to the last ~1000 XIDs
+ * assigned. oldestXact finishes, ~2B XIDs having elapsed since it
+ * started. Further transactions cause us to summarize oldestXact to
+ * tailPage. Function must return false so SerialAdd() doesn't zero
+ * tailPage (which may contain entries for other old, recently-finished
+ * XIDs) and half the SLRU. Reaching this requires burning ~2B XIDs in
+ * single-user mode, a negligible possibility.
*/
- Assert(p >= 0 && p <= SERIAL_MAX_PAGE);
- Assert(q >= 0 && q <= SERIAL_MAX_PAGE);
-
- diff = p - q;
- if (diff >= ((SERIAL_MAX_PAGE + 1) / 2))
- diff -= SERIAL_MAX_PAGE + 1;
- else if (diff < -((int) (SERIAL_MAX_PAGE + 1) / 2))
- diff += SERIAL_MAX_PAGE + 1;
- return diff < 0;
+ headPage = newestPage;
+ targetPage = oldestPage;
+ Assert(!SerialPagePrecedesLogically(headPage, targetPage));
+
+ /*
+ * In this scenario, the SLRU headPage pertains to oldestXact. We're
+ * summarizing an XID near newestXact. (Assume few other XIDs used
+ * SERIALIZABLE, hence the minimal headPage advancement. Assume
+ * oldestXact was long-running and only recently reached the SLRU.)
+ * Function must return true to make SerialAdd() create targetPage.
+ *
+ * Today's implementation mishandles this case, but it doesn't matter
+ * enough to fix. Verify that the defect affects just one page by
+ * asserting correct treatment of its prior page. Reaching this case
+ * requires burning ~2B XIDs in single-user mode, a negligible
+ * possibility. Moreover, if it does happen, the consequence would be
+ * mild, namely a new transaction failing in SimpleLruReadPage().
+ */
+ headPage = oldestPage;
+ targetPage = newestPage;
+ Assert(SerialPagePrecedesLogically(headPage, targetPage - 1));
+#if 0
+ Assert(SerialPagePrecedesLogically(headPage, targetPage));
+#endif
}
/*
@@ -822,6 +872,8 @@ SerialInit(void)
SimpleLruInit(SerialSlruCtl, "Serial",
NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
+ SerialPagePrecedesLogicallyUnitTests();
+ SlruPagePrecedesUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
/*
* Create or attach to the SerialControl structure.
@@ -1030,7 +1082,7 @@ CheckPointPredicate(void)
}
else
{
- /*
+ /*----------
* The SLRU is no longer needed. Truncate to head before we set head
* invalid.
*
@@ -1039,6 +1091,25 @@ CheckPointPredicate(void)
* that we leave behind will appear to be new again. In that case it
* won't be removed until XID horizon advances enough to make it
* current again.
+ *
+ * XXX: This should happen in vac_truncate_clog(), not in checkpoints.
+ * Consider this scenario, starting from a system with no in-progress
+ * transactions and VACUUM FREEZE having maximized oldestXact:
+ * - Start a SERIALIZABLE transaction.
+ * - Start, finish, and summarize a SERIALIZABLE transaction, creating
+ * one SLRU page.
+ * - Consume XIDs to reach xidStopLimit.
+ * - Finish all transactions. Due to the long-running SERIALIZABLE
+ * transaction, earlier checkpoints did not touch headPage. The
+ * next checkpoint will change it, but that checkpoint happens after
+ * the end of the scenario.
+ * - VACUUM to advance XID limits.
+ * - Consume ~2M XIDs, crossing the former xidWrapLimit.
+ * - Start, finish, and summarize a SERIALIZABLE transaction.
+ * SerialAdd() declines to create the targetPage, because headPage
+ * is not regarded as in the past relative to that targetPage. The
+ * transaction instigating the summarize fails in
+ * SimpleLruReadPage().
*/
tailPage = serialControl->headPage;
serialControl->headPage = -1;
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 3496467..dd52e8c 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -118,9 +118,14 @@ typedef struct SlruCtlData
SyncRequestHandler sync_handler;
/*
- * Decide which of two page numbers is "older" for truncation purposes. We
- * need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic.
+ * Decide whether a page is "older" for truncation and as a hint for
+ * evicting pages in LRU order. Return true if every entry of the first
+ * argument is older than every entry of the second argument. Note that
+ * !PagePrecedes(a,b) && !PagePrecedes(b,a) need not imply a==b; it also
+ * arises when some entries are older and some are not. For SLRUs using
+ * SimpleLruTruncate(), this must use modular arithmetic. (For others,
+ * the behavior of this callback has no functional implications.) Use
+ * SlruPagePrecedesUnitTests() in SLRUs meeting its criteria.
*/
bool (*PagePrecedes) (int, int);
@@ -145,6 +150,11 @@ extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
TransactionId xid);
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied);
+#ifdef USE_ASSERT_CHECKING
+extern void SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page);
+#else
+#define SlruPagePrecedesUnitTests(ctl, per_page) do {} while (0)
+#endif
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);
Attachment: slru-truncate-t-insurance-v5.patch (text/plain; charset=us-ascii)
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Unlink less in SimpleLruTruncate(), as insurance against bugs.
SimpleLruTruncate() has been unlinking every expendable file. In edge
cases, it also deleted important files. The most recent commit fixed
that. Given the history of this class of bugs evading detection, let's
not trust that patch exclusively. Instead of unlinking segments
representing up to 2^31 past XIDs, delete no more than half that much.
The balance will stay in place; eventually, XID consumption will
overwrite it. This could mitigate unknown SimpleLruTruncate() bugs and
simplify manual remediation after one has overtaken wrap limits in
single-user mode.
Truncation behavior won't change at all until an SLRU is half full.
Once it does change, a drawback is that it conflicts with the following defense.
TruncateMultiXact() skips truncation when unexpected files exist on
disk, which this change deliberately makes more common. Hence,
pg_multixact becomes more likely to persist in consuming its maximum
storage. Also, this change may uncover bugs in SLRU page recycling by
making that more common. For SLRUs outside of pg_multixact, maximum
storage rises by 50%; for example, the CLOG maximum rises from 512 MiB
to 768 MiB. Usage in pg_multixact may double. Back-patch to 9.5 (all
supported versions).
Reviewed by Andrey Borodin.
Discussion: https://postgr.es/m/20200330052809.GB2324620@rfd.leadboat.com
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 4d8ad75..c2a6961 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -405,6 +405,11 @@
in every database at least once every two billion transactions.
</para>
+ <!-- This oversimplifies; there are (2^31)-1 XIDs in the past, the same
+ number in the future, and one incomparable. (For each pair of incomparable
+ XIDs, TransactionIdPrecedes(a, b) and TransactionIdPrecedes(b, a) both
+ return true.) None of that is important to the DBA, since xidStopLimit
+ intervenes long before. -->
<para>
The reason that periodic vacuuming solves the problem is that
<command>VACUUM</command> will mark rows as <emphasis>frozen</emphasis>, indicating that
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 410d02a..1b277e8 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -53,7 +53,7 @@
* and CLOG segment numbering at
* 0xFFFFFFFF/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCLOG (see CLOGPagePrecedes).
+ * and page numbers in TruncateCLOG (see CLOGPageDiff).
*/
/* We need two bits per xact, so four xacts fit in a byte */
@@ -90,7 +90,7 @@ static SlruCtlData XactCtlData;
static int ZeroCLOGPage(int pageno, bool writeXlog);
-static bool CLOGPagePrecedes(int page1, int page2);
+static int32 CLOGPageDiff(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno, TransactionId oldestXact,
Oid oldestXactDb);
@@ -690,11 +690,11 @@ CLOGShmemSize(void)
void
CLOGShmemInit(void)
{
- XactCtl->PagePrecedes = CLOGPagePrecedes;
+ XactCtl->PageDiff = CLOGPageDiff;
SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
SYNC_HANDLER_CLOG);
- SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
}
/*
@@ -887,7 +887,7 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
cutoffPage = TransactionIdToPage(oldestXact);
/* Check to see if there's any files that could be removed */
- if (!SlruScanDirectory(XactCtl, SlruScanDirCbReportPresence, &cutoffPage))
+ if (!SlruScanDirectory(XactCtl, SlruScanDirCbWouldTruncate, &cutoffPage))
return; /* nothing to remove */
/*
@@ -913,13 +913,14 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
/*
- * Decide whether a CLOG page number is "older" for truncation purposes.
+ * Diff CLOG page numbers for truncation purposes.
*
- * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic. However, TransactionIdPrecedes()
- * would get weird about permanent xact IDs. So, offset both such that xid1,
- * xid2, and xid2 + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset
- * is relevant to page 0 and to the page preceding page 0.
+ * To do the right thing with wraparound XID arithmetic, this mirrors
+ * TransactionIdPrecedes(). The Max() operation ensures we return a positive
+ * value when the wrap point may fall inside these pages. (When it does, some
+ * pairs of entries have a positive diff, and other pairs have a negative
+ * diff.) Only the predicate.c SLRU needs the Max() operation; to avoid
+ * having even more corner cases to understand, all XID-indexed SLRUs do it.
*
* The page containing oldestXact-2^31 is the important edge case. The
* portion of that page equaling or following oldestXact-2^31 is expendable,
@@ -927,22 +928,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
* the first XID of a page and segment, the entire page and segment is
* expendable, and we could truncate the segment. Recognizing that case would
* require making oldestXact, not just the page containing oldestXact,
- * available to this callback. The benefit would be rare and small, so we
- * don't optimize that edge case.
+ * available to this callback. slru.c wouldn't delete the page, anyway.
*/
-static bool
-CLOGPagePrecedes(int page1, int page2)
+static int32
+CLOGPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + CLOG_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 9f42461..48c4cf0 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -46,7 +46,7 @@
* 0xFFFFFFFF/COMMIT_TS_XACTS_PER_PAGE, and CommitTs segment numbering at
* 0xFFFFFFFF/COMMIT_TS_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCommitTs (see CommitTsPagePrecedes).
+ * and page numbers in TruncateCommitTs (see CommitTsPageDiff).
*/
/*
@@ -109,7 +109,7 @@ static void TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
RepOriginId nodeid, int slotno);
static void error_commit_ts_disabled(void);
static int ZeroCommitTsPage(int pageno, bool writeXlog);
-static bool CommitTsPagePrecedes(int page1, int page2);
+static int32 CommitTsPageDiff(int page1, int page2);
static void ActivateCommitTs(void);
static void DeactivateCommitTs(void);
static void WriteZeroPageXlogRec(int pageno);
@@ -552,12 +552,12 @@ CommitTsShmemInit(void)
{
bool found;
- CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
+ CommitTsCtl->PageDiff = CommitTsPageDiff;
SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
CommitTsSLRULock, "pg_commit_ts",
LWTRANCHE_COMMITTS_BUFFER,
SYNC_HANDLER_COMMIT_TS);
- SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
commitTsShared = ShmemInitStruct("CommitTs shared",
sizeof(CommitTimestampShared),
@@ -875,7 +875,7 @@ TruncateCommitTs(TransactionId oldestXact)
cutoffPage = TransactionIdToCTsPage(oldestXact);
/* Check to see if there's any files that could be removed */
- if (!SlruScanDirectory(CommitTsCtl, SlruScanDirCbReportPresence,
+ if (!SlruScanDirectory(CommitTsCtl, SlruScanDirCbWouldTruncate,
&cutoffPage))
return; /* nothing to remove */
@@ -928,8 +928,8 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
/*
- * Decide whether a commitTS page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff commitTS page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*
* At default BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128. This
* introduces differences compared to CLOG and the other SLRUs having (1 <<
@@ -940,7 +940,7 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
* 128 entries of its page. Since this function doesn't know the location of
* oldestXact within page2, it returns false for one page that actually is
* expendable. This is a wider (yet still negligible) version of the
- * truncation opportunity that CLOGPagePrecedes() cannot recognize.
+ * truncation opportunity that CLOGPageDiff() cannot recognize.
*
* For the sake of a worked example, number entries with decimal values such
* that page1==1 entries range from 1.0 to 1.999. Let N+0.15 be the number of
@@ -950,19 +950,20 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
* last entry of the oldestXact page. While page 2 is expendable at
* oldestXact=N+2.1, it would be precious at oldestXact=N+2.9.
*/
-static bool
-CommitTsPagePrecedes(int page1, int page2)
+static int32
+CommitTsPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + COMMIT_TS_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 7dcfa02..4a53ad6 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -102,7 +102,7 @@
* 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
* take no explicit notice of that fact in this module, except when comparing
* segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
+ * MultiXactOffsetPageDiff).
*/
/* We need four bytes per offset */
@@ -355,8 +355,8 @@ static char *mxstatus_to_string(MultiXactStatus status);
/* management of SLRU infrastructure */
static int ZeroMultiXactOffsetPage(int pageno, bool writeXlog);
static int ZeroMultiXactMemberPage(int pageno, bool writeXlog);
-static bool MultiXactOffsetPagePrecedes(int page1, int page2);
-static bool MultiXactMemberPagePrecedes(int page1, int page2);
+static int32 MultiXactOffsetPageDiff(int page1, int page2);
+static int32 MultiXactMemberPageDiff(int page1, int page2);
static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
@@ -1844,15 +1844,15 @@ MultiXactShmemInit(void)
debug_elog2(DEBUG2, "Shared Memory Init for MultiXact");
- MultiXactOffsetCtl->PagePrecedes = MultiXactOffsetPagePrecedes;
- MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
+ MultiXactOffsetCtl->PageDiff = MultiXactOffsetPageDiff;
+ MultiXactMemberCtl->PageDiff = MultiXactMemberPageDiff;
SimpleLruInit(MultiXactOffsetCtl,
"MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
MultiXactOffsetSLRULock, "pg_multixact/offsets",
LWTRANCHE_MULTIXACTOFFSET_BUFFER,
SYNC_HANDLER_MULTIXACT_OFFSET);
- SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
+ SlruPageDiffUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
MultiXactMemberSLRULock, "pg_multixact/members",
@@ -2867,7 +2867,7 @@ SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
mxtruncinfo *trunc = (mxtruncinfo *) data;
if (trunc->earliestExistingPage == -1 ||
- ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
+ ctl->PageDiff(segpage, trunc->earliestExistingPage) < 0)
{
trunc->earliestExistingPage = segpage;
}
@@ -2986,11 +2986,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
*
* When nextMXact is less than one segment away from multiWrapLimit,
* SlruScanDirCbFindEarliest can find some early segment other than the
- * actual earliest. (MultiXactOffsetPagePrecedes(EARLIEST, LATEST)
- * returns false, because not all pairs of entries have the same answer.)
- * That can also arise when an earlier truncation attempt failed unlink()
- * or returned early from this function. The only consequence is
- * returning early, which wastes space that we could have liberated.
+ * actual earliest. (MultiXactOffsetPageDiff(EARLIEST, LATEST) >= 0,
+ * because not all pairs of entries have the same answer.) That can also
+ * arise when an earlier truncation attempt failed unlink(), returned
+ * early from this function, or saw SlruWouldTruncateSegment() decline to
+ * delete the older half of the SLRU. The only consequence is returning
+ * early, which wastes space that we could have liberated.
*
* NB: It's also possible that the page that oldestMulti is on has already
* been truncated away, and we crashed before updating oldestMulti.
@@ -3106,44 +3107,42 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
}
/*
- * Decide whether a MultiXactOffset page number is "older" for truncation
- * purposes. Analogous to CLOGPagePrecedes().
- *
- * Offsetting the values is optional, because MultiXactIdPrecedes() has
- * translational symmetry.
+ * Diff MultiXactOffset page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*/
-static bool
-MultiXactOffsetPagePrecedes(int page1, int page2)
+static int32
+MultiXactOffsetPageDiff(int page1, int page2)
{
MultiXactId multi1;
MultiXactId multi2;
+ int32 diff_head;
+ int32 diff_tail;
multi1 = ((MultiXactId) page1) * MULTIXACT_OFFSETS_PER_PAGE;
- multi1 += FirstMultiXactId + 1;
multi2 = ((MultiXactId) page2) * MULTIXACT_OFFSETS_PER_PAGE;
- multi2 += FirstMultiXactId + 1;
- return (MultiXactIdPrecedes(multi1, multi2) &&
- MultiXactIdPrecedes(multi1,
- multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1));
+ diff_head = multi1 - multi2;
+ diff_tail = multi1 - (multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
/*
- * Decide whether a MultiXactMember page number is "older" for truncation
- * purposes. There is no "invalid offset number" so use the numbers verbatim.
+ * Diff MultiXactMember page numbers for truncation purposes.
*/
-static bool
-MultiXactMemberPagePrecedes(int page1, int page2)
+static int32
+MultiXactMemberPageDiff(int page1, int page2)
{
MultiXactOffset offset1;
MultiXactOffset offset2;
+ int32 diff_head;
+ int32 diff_tail;
offset1 = ((MultiXactOffset) page1) * MULTIXACT_MEMBERS_PER_PAGE;
offset2 = ((MultiXactOffset) page2) * MULTIXACT_MEMBERS_PER_PAGE;
- return (MultiXactOffsetPrecedes(offset1, offset2) &&
- MultiXactOffsetPrecedes(offset1,
- offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1));
+ diff_head = offset1 - offset2;
+ diff_tail = offset1 - (offset2 + MULTIXACT_MEMBERS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
/*
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index e49e06e..e0f6d20 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -260,7 +260,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
/*
* Initialize the unshared control struct, including directory path. We
- * assume caller set PagePrecedes.
+ * assume caller set PageDiff.
*/
ctl->shared = shared;
ctl->sync_handler = sync_handler;
@@ -1091,8 +1091,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
{
if (this_delta > best_valid_delta ||
(this_delta == best_valid_delta &&
- ctl->PagePrecedes(this_page_number,
- best_valid_page_number)))
+ ctl->PageDiff(this_page_number,
+ best_valid_page_number) < 0))
{
bestvalidslot = slotno;
best_valid_delta = this_delta;
@@ -1103,8 +1103,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
{
if (this_delta > best_invalid_delta ||
(this_delta == best_invalid_delta &&
- ctl->PagePrecedes(this_page_number,
- best_invalid_page_number)))
+ ctl->PageDiff(this_page_number,
+ best_invalid_page_number) < 0))
{
bestinvalidslot = slotno;
best_invalid_delta = this_delta;
@@ -1211,7 +1211,8 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
}
/*
- * Remove all segments before the one holding the passed page number
+ * Remove some obsolete segments. As defense in depth, this deletes less than
+ * PageDiff() authorizes; see SlruWouldTruncateSegment().
*
* All SLRUs prevent concurrent calls to this function, either with an LWLock
* or by calling it only as part of a checkpoint. Mutual exclusion must begin
@@ -1244,7 +1245,7 @@ restart:;
* While we are holding the lock, make an important safety check: the
* current endpoint page must not be eligible for removal.
*/
- if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+ if (ctl->PageDiff(shared->latest_page_number, cutoffPage) < 0)
{
LWLockRelease(shared->ControlLock);
ereport(LOG,
@@ -1257,7 +1258,7 @@ restart:;
{
if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
continue;
- if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
+ if (ctl->PageDiff(shared->page_number[slotno], cutoffPage) >= 0)
continue;
/*
@@ -1374,33 +1375,46 @@ restart:
}
/*
- * Determine whether a segment is okay to delete.
+ * Determine whether to delete a segment.
*
* segpage is the first page of the segment, and cutoffPage is the oldest (in
- * PagePrecedes order) page in the SLRU containing still-useful data. Since
- * every core PagePrecedes callback implements "wrap around", check the
+ * PageDiff order) page in the SLRU containing still-useful data. Check the
* segment's first and last pages:
*
* first<cutoff && last<cutoff: yes
* first<cutoff && last>=cutoff: no; cutoff falls inside this segment
* first>=cutoff && last<cutoff: no; wrap point falls inside this segment
* first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ *
+ * The PageDiff specification requires us not to remove pages where the
+ * callback reports negative values close to INT_MIN. Our interpretation is
+ * to decline to delete segments containing a page P such that PageDiff(P,
+ * cutoffPage) is in [INT_MIN, INT_MIN/2].
*/
static bool
-SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+SlruWouldTruncateSegment(SlruCtl ctl, int segpage, int cutoffPage)
{
- int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+ int first_page_diff;
Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
- return (ctl->PagePrecedes(segpage, cutoffPage) &&
- ctl->PagePrecedes(seg_last_page, cutoffPage));
+ first_page_diff = ctl->PageDiff(segpage, cutoffPage);
+ if (first_page_diff < 0 && first_page_diff > INT_MIN / 2)
+ {
+ int seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
+ int last_page_diff = ctl->PageDiff(seg_last_page, cutoffPage);
+
+ return last_page_diff < 0 && last_page_diff > INT_MIN / 2;
+ }
+ return false;
}
#ifdef USE_ASSERT_CHECKING
static void
-SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
+SlruPageDiffTestOffset(SlruCtl ctl, int per_page, uint32 offset)
{
+ int32 large_negative = INT_MIN / 1000 * 999,
+ large_positive = INT_MAX / 1000 * 999;
TransactionId lhs,
rhs;
int newestPage,
@@ -1424,19 +1438,27 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
Assert(!TransactionIdPrecedes(rhs, lhs + 1));
Assert(!TransactionIdFollowsOrEquals(lhs, rhs));
Assert(!TransactionIdFollowsOrEquals(rhs, lhs));
- Assert(!ctl->PagePrecedes(lhs / per_page, lhs / per_page));
- Assert(!ctl->PagePrecedes(lhs / per_page, rhs / per_page));
- Assert(!ctl->PagePrecedes(rhs / per_page, lhs / per_page));
- Assert(!ctl->PagePrecedes((lhs - per_page) / per_page, rhs / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 3 * per_page) / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 2 * per_page) / per_page));
- Assert(ctl->PagePrecedes(rhs / per_page, (lhs - 1 * per_page) / per_page)
- || (1U << 31) % per_page != 0); /* See CommitTsPagePrecedes() */
- Assert(ctl->PagePrecedes((lhs + 1 * per_page) / per_page, rhs / per_page)
+ Assert(ctl->PageDiff(lhs / per_page, lhs / per_page) == 0);
+ Assert(ctl->PageDiff(lhs / per_page, rhs / per_page) > large_positive);
+ Assert(ctl->PageDiff(rhs / per_page, lhs / per_page) > large_positive);
+ Assert(ctl->PageDiff((lhs - per_page) / per_page, rhs / per_page) >
+ large_positive);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 3 * per_page) / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 2 * per_page) / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs - 1 * per_page) / per_page) <
+ large_negative
+ || (1U << 31) % per_page != 0); /* See CommitTsPageDiff() */
+ Assert(ctl->PageDiff((lhs + 1 * per_page) / per_page, rhs / per_page) <
+ large_negative
|| (1U << 31) % per_page != 0);
- Assert(ctl->PagePrecedes((lhs + 2 * per_page) / per_page, rhs / per_page));
- Assert(ctl->PagePrecedes((lhs + 3 * per_page) / per_page, rhs / per_page));
- Assert(!ctl->PagePrecedes(rhs / per_page, (lhs + per_page) / per_page));
+ Assert(ctl->PageDiff((lhs + 2 * per_page) / per_page, rhs / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff((lhs + 3 * per_page) / per_page, rhs / per_page) <
+ large_negative);
+ Assert(ctl->PageDiff(rhs / per_page, (lhs + per_page) / per_page) >
+ large_positive);
/*
* GetNewTransactionId() has assigned the last XID it can safely use, and
@@ -1449,10 +1471,10 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
oldestXact = newestXact + 1;
oldestXact -= 1U << 31;
oldestPage = oldestXact / per_page;
- Assert(!SlruMayDeleteSegment(ctl,
- (newestPage -
- newestPage % SLRU_PAGES_PER_SEGMENT),
- oldestPage));
+ Assert(!SlruWouldTruncateSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
/*
* GetNewTransactionId() has assigned the last XID it can safely use, and
@@ -1465,42 +1487,44 @@ SlruPagePrecedesTestOffset(SlruCtl ctl, int per_page, uint32 offset)
oldestXact = newestXact + 1;
oldestXact -= 1U << 31;
oldestPage = oldestXact / per_page;
- Assert(!SlruMayDeleteSegment(ctl,
- (newestPage -
- newestPage % SLRU_PAGES_PER_SEGMENT),
- oldestPage));
+ Assert(!SlruWouldTruncateSegment(ctl,
+ (newestPage -
+ newestPage % SLRU_PAGES_PER_SEGMENT),
+ oldestPage));
}
/*
- * Unit-test a PagePrecedes function.
+ * Unit-test a PageDiff function.
*
* This assumes every uint32 >= FirstNormalTransactionId is a valid key. It
* assumes each value occupies a contiguous, fixed-size region of SLRU bytes.
* (MultiXactMemberCtl separates flags from XIDs. AsyncCtl has
* variable-length entries, no keys, and no random access. These unit tests
* do not apply to them.)
+ *
+ * This is stricter than the PageDiff API requires.
*/
void
-SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page)
+SlruPageDiffUnitTests(SlruCtl ctl, int per_page)
{
/* Test first, middle and last entries of a page. */
- SlruPagePrecedesTestOffset(ctl, per_page, 0);
- SlruPagePrecedesTestOffset(ctl, per_page, per_page / 2);
- SlruPagePrecedesTestOffset(ctl, per_page, per_page - 1);
+ SlruPageDiffTestOffset(ctl, per_page, 0);
+ SlruPageDiffTestOffset(ctl, per_page, per_page / 2);
+ SlruPageDiffTestOffset(ctl, per_page, per_page - 1);
}
#endif
/*
* SlruScanDirectory callback
- * This callback reports true if there's any segment wholly prior to the
- * one containing the page passed as "data".
+ * This callback reports true if SimpleLruTruncate(ctl, *data) would
+ * unlink any segment.
*/
bool
-SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
+SlruScanDirCbWouldTruncate(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
+ if (SlruWouldTruncateSegment(ctl, segpage, cutoffPage))
return true; /* found one; don't iterate any more */
return false; /* keep going */
@@ -1515,7 +1539,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
{
int cutoffPage = *(int *) data;
- if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
+ if (SlruWouldTruncateSegment(ctl, segpage, cutoffPage))
SlruInternalDeleteSegment(ctl, segpage / SLRU_PAGES_PER_SEGMENT);
return false; /* keep going */
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6a8e521..0c0b7bc 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -44,7 +44,7 @@
* 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE, and segment numbering at
* 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
* explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateSUBTRANS (see SubTransPagePrecedes) and zeroing
+ * and page numbers in TruncateSUBTRANS (see SubTransPageDiff) and zeroing
* them in StartupSUBTRANS.
*/
@@ -64,7 +64,7 @@ static SlruCtlData SubTransCtlData;
static int ZeroSUBTRANSPage(int pageno);
-static bool SubTransPagePrecedes(int page1, int page2);
+static int32 SubTransPageDiff(int page1, int page2);
/*
@@ -190,11 +190,11 @@ SUBTRANSShmemSize(void)
void
SUBTRANSShmemInit(void)
{
- SubTransCtl->PagePrecedes = SubTransPagePrecedes;
+ SubTransCtl->PageDiff = SubTransPageDiff;
SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
SubtransSLRULock, "pg_subtrans",
LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
- SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
+ SlruPageDiffUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
}
/*
@@ -355,20 +355,21 @@ TruncateSUBTRANS(TransactionId oldestXact)
/*
- * Decide whether a SUBTRANS page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff SUBTRANS page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
*/
-static bool
-SubTransPagePrecedes(int page1, int page2)
+static int32
+SubTransPageDiff(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * SUBTRANS_XACTS_PER_PAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SUBTRANS_XACTS_PER_PAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + SUBTRANS_XACTS_PER_PAGE - 1);
+ return Max(diff_head, diff_tail);
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 42b232d..475b4ab 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -207,13 +207,13 @@ typedef struct QueuePosition
/* choose logically smaller QueuePosition */
#define QUEUE_POS_MIN(x,y) \
- (asyncQueuePagePrecedes((x).page, (y).page) ? (x) : \
+ (asyncQueuePageDiff((x).page, (y).page) < 0 ? (x) : \
(x).page != (y).page ? (y) : \
(x).offset < (y).offset ? (x) : (y))
/* choose logically larger QueuePosition */
#define QUEUE_POS_MAX(x,y) \
- (asyncQueuePagePrecedes((x).page, (y).page) ? (y) : \
+ (asyncQueuePageDiff((x).page, (y).page) < 0 ? (y) : \
(x).page != (y).page ? (x) : \
(x).offset > (y).offset ? (x) : (y))
@@ -436,8 +436,7 @@ static bool backendTryAdvanceTail = false;
bool Trace_notify = false;
/* local function prototypes */
-static int asyncQueuePageDiff(int p, int q);
-static bool asyncQueuePagePrecedes(int p, int q);
+static int32 asyncQueuePageDiff(int p, int q);
static void queue_listen(ListenActionKind action, const char *channel);
static void Async_UnlistenOnExit(int code, Datum arg);
static void Exec_ListenPreCommit(void);
@@ -468,12 +467,16 @@ static void ClearPendingActionsAndNotifies(void);
/*
* Compute the difference between two queue page numbers (i.e., p - q),
- * accounting for wraparound.
+ * accounting for wraparound. Since asyncQueueIsFull() blocks creation of a
+ * page that could precede any extant page, we need not assess entries within
+ * a page.
*/
-static int
+static int32
asyncQueuePageDiff(int p, int q)
{
- int diff;
+ int diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,
+ diff;
+ int32 scale = INT_MAX / diff_max;
/*
* We have to compare modulo (QUEUE_MAX_PAGE+1)/2. Both inputs should be
@@ -487,19 +490,24 @@ asyncQueuePageDiff(int p, int q)
diff -= QUEUE_MAX_PAGE + 1;
else if (diff < -((QUEUE_MAX_PAGE + 1) / 2))
diff += QUEUE_MAX_PAGE + 1;
- return diff;
+ return diff * scale;
}
-/*
- * Is p < q, accounting for wraparound?
- *
- * Since asyncQueueIsFull() blocks creation of a page that could precede any
- * extant page, we need not assess entries within a page.
- */
-static bool
-asyncQueuePagePrecedes(int p, int q)
+static void
+asyncQueuePageDiffUnitTests(void)
{
- return asyncQueuePageDiff(p, q) < 0;
+ int32 large_negative = INT_MIN / 1000 * 999,
+ large_positive = INT_MAX / 1000 * 999;
+ int diff_min = -((QUEUE_MAX_PAGE + 1) / 2),
+ diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1;
+
+ Assert(asyncQueuePageDiff(diff_max, diff_max) == 0);
+ Assert(asyncQueuePageDiff(diff_max, 0) > large_positive);
+ Assert(asyncQueuePageDiff(diff_max + 1, 0) < large_negative);
+ Assert(asyncQueuePageDiff(0, QUEUE_MAX_PAGE + diff_min + 1) <
+ large_negative);
+ Assert(asyncQueuePageDiff(0, QUEUE_MAX_PAGE + diff_min + 2) >
+ large_positive);
}
/*
@@ -561,10 +569,11 @@ AsyncShmemInit(void)
/*
* Set up SLRU management of the pg_notify data.
*/
- NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
+ NotifyCtl->PageDiff = asyncQueuePageDiff;
SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
SYNC_HANDLER_NONE);
+ asyncQueuePageDiffUnitTests();
if (!found)
{
@@ -1369,7 +1378,7 @@ asyncQueueIsFull(void)
nexthead = 0; /* wrap around */
boundary = QUEUE_STOP_PAGE;
boundary -= boundary % SLRU_PAGES_PER_SEGMENT;
- return asyncQueuePagePrecedes(nexthead, boundary);
+ return asyncQueuePageDiff(nexthead, boundary) < 0;
}
/*
@@ -2229,7 +2238,7 @@ asyncQueueAdvanceTail(void)
*/
newtailpage = QUEUE_POS_PAGE(min);
boundary = newtailpage - (newtailpage % SLRU_PAGES_PER_SEGMENT);
- if (asyncQueuePagePrecedes(oldtailpage, boundary))
+ if (asyncQueuePageDiff(oldtailpage, boundary) < 0)
{
/*
* SimpleLruTruncate() will ask for NotifySLRULock but will also
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 822c22e..cef2188 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -438,7 +438,7 @@ static void SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact, SERIALIZABLEXACT
static void ReleaseRWConflict(RWConflict conflict);
static void FlagSxactUnsafe(SERIALIZABLEXACT *sxact);
-static bool SerialPagePrecedesLogically(int page1, int page2);
+static int32 SerialPageDiffLogically(int page1, int page2);
static void SerialInit(void);
static void SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo);
static SerCommitSeqNo SerialGetMinConflictCommitSeqNo(TransactionId xid);
@@ -784,26 +784,30 @@ FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
/*------------------------------------------------------------------------*/
/*
- * Decide whether a Serial page number is "older" for truncation purposes.
- * Analogous to CLOGPagePrecedes().
+ * Diff Serial page numbers for truncation purposes. Analogous to
+ * CLOGPageDiff().
+ *
+ * This must follow stricter rules than PageDiff demands, for the benefit of
+ * the call local to this file.
*/
-static bool
-SerialPagePrecedesLogically(int page1, int page2)
+static int32
+SerialPageDiffLogically(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;
+ int32 diff_head;
+ int32 diff_tail;
xid1 = ((TransactionId) page1) * SERIAL_ENTRIESPERPAGE;
- xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * SERIAL_ENTRIESPERPAGE;
- xid2 += FirstNormalTransactionId + 1;
- return (TransactionIdPrecedes(xid1, xid2) &&
- TransactionIdPrecedes(xid1, xid2 + SERIAL_ENTRIESPERPAGE - 1));
+ diff_head = xid1 - xid2;
+ diff_tail = xid1 - (xid2 + SERIAL_ENTRIESPERPAGE - 1);
+ return Max(diff_head, diff_tail);
}
static void
-SerialPagePrecedesLogicallyUnitTests(void)
+SerialPageDiffLogicallyUnitTests(void)
{
int per_page = SERIAL_ENTRIESPERPAGE,
offset = per_page / 2;
@@ -826,21 +830,21 @@ SerialPagePrecedesLogicallyUnitTests(void)
* In this scenario, the SLRU headPage pertains to the last ~1000 XIDs
* assigned. oldestXact finishes, ~2B XIDs having elapsed since it
* started. Further transactions cause us to summarize oldestXact to
- * tailPage. Function must return false so SerialAdd() doesn't zero
- * tailPage (which may contain entries for other old, recently-finished
- * XIDs) and half the SLRU. Reaching this requires burning ~2B XIDs in
- * single-user mode, a negligible possibility.
+ * tailPage. Function must return non-negative so SerialAdd() doesn't
+ * zero tailPage (which may contain entries for other old,
+ * recently-finished XIDs) and half the SLRU. Reaching this requires
+ * burning ~2B XIDs in single-user mode, a negligible possibility.
*/
headPage = newestPage;
targetPage = oldestPage;
- Assert(!SerialPagePrecedesLogically(headPage, targetPage));
+ Assert(SerialPageDiffLogically(headPage, targetPage) >= 0);
/*
* In this scenario, the SLRU headPage pertains to oldestXact. We're
* summarizing an XID near newestXact. (Assume few other XIDs used
* SERIALIZABLE, hence the minimal headPage advancement. Assume
* oldestXact was long-running and only recently reached the SLRU.)
- * Function must return true to make SerialAdd() create targetPage.
+ * Function must return negative to make SerialAdd() create targetPage.
*
* Today's implementation mishandles this case, but it doesn't matter
* enough to fix. Verify that the defect affects just one page by
@@ -851,9 +855,9 @@ SerialPagePrecedesLogicallyUnitTests(void)
*/
headPage = oldestPage;
targetPage = newestPage;
- Assert(SerialPagePrecedesLogically(headPage, targetPage - 1));
+ Assert(SerialPageDiffLogically(headPage, targetPage - 1) < 0);
#if 0
- Assert(SerialPagePrecedesLogically(headPage, targetPage));
+ Assert(SerialPageDiffLogically(headPage, targetPage) < 0);
#endif
}
@@ -868,12 +872,12 @@ SerialInit(void)
/*
* Set up SLRU management of the pg_serial data.
*/
- SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
+ SerialSlruCtl->PageDiff = SerialPageDiffLogically;
SimpleLruInit(SerialSlruCtl, "Serial",
NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
- SerialPagePrecedesLogicallyUnitTests();
- SlruPagePrecedesUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
+ SerialPageDiffLogicallyUnitTests();
+ SlruPageDiffUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
/*
* Create or attach to the SerialControl structure.
@@ -935,8 +939,8 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
else
{
firstZeroPage = SerialNextPage(serialControl->headPage);
- isNewPage = SerialPagePrecedesLogically(serialControl->headPage,
- targetPage);
+ isNewPage = SerialPageDiffLogically(serialControl->headPage,
+ targetPage) < 0;
}
if (!TransactionIdIsValid(serialControl->headXid)
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index dd52e8c..492cce7 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -29,7 +29,7 @@
* xxxx is CLOG or SUBTRANS, respectively), and segment numbering at
* 0xFFFFFFFF/xxxx_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
* take no explicit notice of that fact in slru.c, except when comparing
- * segment and page numbers in SimpleLruTruncate (see PagePrecedes()).
+ * segment and page numbers in SimpleLruTruncate (see PageDiff()).
*/
#define SLRU_PAGES_PER_SEGMENT 32
@@ -118,16 +118,18 @@ typedef struct SlruCtlData
SyncRequestHandler sync_handler;
/*
- * Decide whether a page is "older" for truncation and as a hint for
- * evicting pages in LRU order. Return true if every entry of the first
- * argument is older than every entry of the second argument. Note that
- * !PagePrecedes(a,b) && !PagePrecedes(b,a) need not imply a==b; it also
- * arises when some entries are older and some are not. For SLRUs using
- * SimpleLruTruncate(), this must use modular arithmetic. (For others,
- * the behavior of this callback has no functional implications.) Use
- * SlruPagePrecedesUnitTests() in SLRUs meeting its criteria.
+ * Compute distance between two page numbers, for truncation and as a hint
+ * for evicting pages in LRU order. Callbacks shall distribute return
+ * values uniformly in [INT_MIN,INT_MAX]. If PageDiff(P, oldest_needed)
+ * is negative but not close to INT_MIN, that implies data in page P is
+ * obsolete. The exception for values close to INT_MIN permits
+ * implementations to return such values for edge cases where the answer
+ * changes mid-page from INT_MIN to INT_MAX. Use SlruPageDiffUnitTests()
+ * in SLRUs meeting its criteria. For SLRUs using SimpleLruTruncate(),
+ * this must use modular arithmetic. (For others, the behavior of this
+ * callback has no functional implications.)
*/
- bool (*PagePrecedes) (int, int);
+ int32 (*PageDiff) (int, int);
/*
* Dir is set during SimpleLruInit and does not change thereafter. Since
@@ -151,9 +153,9 @@ extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied);
#ifdef USE_ASSERT_CHECKING
-extern void SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page);
+extern void SlruPageDiffUnitTests(SlruCtl ctl, int per_page);
#else
-#define SlruPagePrecedesUnitTests(ctl, per_page) do {} while (0)
+#define SlruPageDiffUnitTests(ctl, per_page) do {} while (0)
#endif
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);
@@ -166,8 +168,8 @@ extern void SlruDeleteSegment(SlruCtl ctl, int segno);
extern int SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path);
/* SlruScanDirectory public callbacks */
-extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
- int segpage, void *data);
+extern bool SlruScanDirCbWouldTruncate(SlruCtl ctl, char *filename,
+ int segpage, void *data);
extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
void *data);
On 10 Jan 2021, at 14:43, Noah Misch <noah@leadboat.com> wrote:
Can you please send revised patches with fixes?
Attached.
<slru-truncate-modulo-v6.patch>
<slru-truncate-t-insurance-v5.patch>
I'm marking the patch as ready for committer.
I can't tell whether we should backpatch the insurance patch or not: it potentially fixes unknown bugs, and potentially contains unknown bugs, so it's hard to reason under such uncertainty. I've looked for any potential problem, and so far I see none. Chances are <slru-truncate-t-insurance-v5.patch> makes the code less error-prone.
The fix <slru-truncate-modulo-v6.patch> is certainly worth backpatching.
Thanks!
Best regards, Andrey Borodin.
On Mon, Jan 11, 2021 at 11:22:05AM +0500, Andrey Borodin wrote:
I'm marking the patch as ready for committer.
Thanks.
I can't tell whether we should backpatch the insurance patch or not: it potentially fixes unknown bugs, and potentially contains unknown bugs, so it's hard to reason under such uncertainty. I've looked for any potential problem, and so far I see none. Chances are <slru-truncate-t-insurance-v5.patch> makes the code less error-prone.
What do you think of abandoning slru-truncate-t-insurance entirely? As of
/messages/by-id/20200330052809.GB2324620@rfd.leadboat.com I liked the idea
behind it, despite its complicating the system for hackers and DBAs. The
TruncateMultiXact() interaction rendered it less appealing. In v14+, commit
cd5e822 mitigates the kind of bugs that slru-truncate-t-insurance mitigates,
further reducing the latter's value. slru-truncate-t-insurance does mitigate
larger trespasses into unlink-eligible space, though.
The fix <slru-truncate-modulo-v6.patch> is certainly worth backpatching.
I'll push it on Saturday, probably.
On 12 Jan 2021, at 13:49, Noah Misch <noah@leadboat.com> wrote:
What do you think of abandoning slru-truncate-t-insurance entirely? As of
/messages/by-id/20200330052809.GB2324620@rfd.leadboat.com I liked the idea
behind it, despite its complicating the system for hackers and DBAs. The
TruncateMultiXact() interaction rendered it less appealing. In v14+, commit
cd5e822 mitigates the kind of bugs that slru-truncate-t-insurance mitigates,
further reducing the latter's value. slru-truncate-t-insurance does mitigate
larger trespasses into unlink-eligible space, though.
It seems to me that not committing the insurance patch is not a mistake. Let's abandon slru-truncate-t-insurance for now.
The fix <slru-truncate-modulo-v6.patch> is certainly worth backpatching.
I'll push it on Saturday, probably.
Thanks!
Best regards, Andrey Borodin.