min_safe_lsn column in pg_replication_slots view
Hi,
Per the docs, pg_replication_slots.min_safe_lsn indicates "the minimum
LSN currently available for walsenders". When I executed pg_walfile_name()
with min_safe_lsn, the function returned the name of the last removed
WAL file instead of the name of the minimum available WAL file. This
happens because min_safe_lsn actually indicates the ending position
(the boundary byte) of the last removed WAL file.
I guess that some users would want to calculate the minimum available
WAL file name from min_safe_lsn by using pg_walfile_name(), but the result
would be incorrect. Isn't this confusing? Shouldn't min_safe_lsn indicate
the boundary byte + 1, instead?
BTW, I just wonder why each row in pg_replication_slots needs to have
a min_safe_lsn column? Basically min_safe_lsn should be the same for
every replication slot.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Mon, Jun 15, 2020 at 12:40:03PM +0900, Fujii Masao wrote:
BTW, I just wonder why each row in pg_replication_slots needs to have
min_safe_lsn column? Basically min_safe_lsn should be the same between
every replication slots.
Indeed, that's confusing in its current shape. I would buy putting
this value into pg_replication_slots if there were some consistency of
the data to worry about because of locking issues, but here this data
is controlled within info_lck, which is outside of the repslot LW
lock. So I think that it is incorrect to put this data in this view
and that we should remove it, and that instead we had better push for
a system view that maps with the contents of XLogCtl.
--
Michael
At Mon, 15 Jun 2020 13:44:31 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Mon, Jun 15, 2020 at 12:40:03PM +0900, Fujii Masao wrote:
BTW, I just wonder why each row in pg_replication_slots needs to have
min_safe_lsn column? Basically min_safe_lsn should be the same between
every replication slots.

Indeed, that's confusing in its current shape. I would buy putting
this value into pg_replication_slots if there were some consistency of
the data to worry about because of locking issues, but here this data
is controlled within info_lck, which is out of the the repslot LW
lock. So I think that it is incorrect to put this data in this view
and that we should remove it, and that instead we had better push for
a system view that maps with the contents of XLogCtl.
It was once the difference from the safe_lsn to restart_lsn, which is
distinct among slots. Then it was changed to the safe_lsn. I agree with
the discussion above, but it is needed somewhere, since without it no
one can know the margin until the slot goes to the "lost" state. (Note
that currently even wal_status and min_safe_lsn can be inconsistent
within a line.)

Just for the need for table consistency and in-line consistency, we
could just remember the result of XLogGetLastRemovedSegno() around
taking info_lck in the function. That doesn't make a practical
difference but makes the view look consistent.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
consistent_min_safe_lsn.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d6fe205eb4..f57d67a900 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9498,7 +9498,7 @@ CreateRestartPoint(int flags)
* * WALAVAIL_INVALID_LSN means the slot hasn't been set to reserve WAL.
*/
WALAvailability
-GetWALAvailability(XLogRecPtr targetLSN)
+GetWALAvailability(XLogRecPtr targetLSN, XLogSegNo last_removed_seg)
{
XLogRecPtr currpos; /* current write LSN */
XLogSegNo currSeg; /* segid of currpos */
@@ -9524,7 +9524,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
* the first WAL segment file since startup, which causes the status being
* wrong under certain abnormal conditions but that doesn't actually harm.
*/
- oldestSeg = XLogGetLastRemovedSegno() + 1;
+ oldestSeg = last_removed_seg + 1;
/* calculate oldest segment by max_wal_size and wal_keep_segments */
XLByteToSeg(currpos, currSeg, wal_segment_size);
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 1b929a603e..bd2e3e84ed 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -243,6 +243,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
MemoryContext per_query_ctx;
MemoryContext oldcontext;
int slotno;
+ XLogSegNo last_removed_seg;
/* check to see if caller supports us returning a tuplestore */
if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -272,6 +273,14 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
rsinfo->setResult = tupstore;
rsinfo->setDesc = tupdesc;
+ /*
+ * Remember the last removed segment at this point, for consistency
+ * within this result set. Since there's no interlock between slot data
+ * and the checkpointer, the segment can be removed in-between, but
+ * that doesn't make any practical difference.
+ */
+ last_removed_seg = XLogGetLastRemovedSegno();
+
MemoryContextSwitchTo(oldcontext);
LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
@@ -282,7 +291,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
Datum values[PG_GET_REPLICATION_SLOTS_COLS];
bool nulls[PG_GET_REPLICATION_SLOTS_COLS];
WALAvailability walstate;
- XLogSegNo last_removed_seg;
int i;
if (!slot->in_use)
@@ -342,7 +350,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
else
nulls[i++] = true;
- walstate = GetWALAvailability(slot_contents.data.restart_lsn);
+ walstate = GetWALAvailability(slot_contents.data.restart_lsn,
+ last_removed_seg);
switch (walstate)
{
@@ -368,7 +377,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
if (max_slot_wal_keep_size_mb >= 0 &&
(walstate == WALAVAIL_NORMAL || walstate == WALAVAIL_RESERVED) &&
- ((last_removed_seg = XLogGetLastRemovedSegno()) != 0))
+ (last_removed_seg != 0))
{
XLogRecPtr min_safe_lsn;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index e917dfe92d..c05a18148d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -326,7 +326,8 @@ extern void ShutdownXLOG(int code, Datum arg);
extern void InitXLOGAccess(void);
extern void CreateCheckPoint(int flags);
extern bool CreateRestartPoint(int flags);
-extern WALAvailability GetWALAvailability(XLogRecPtr restart_lsn);
+extern WALAvailability GetWALAvailability(XLogRecPtr targetLSN,
+ XLogSegNo last_removed_seg);
extern XLogRecPtr CalculateMaxmumSafeLSN(void);
extern void XLogPutNextOid(Oid nextOid);
extern XLogRecPtr XLogRestorePoint(const char *rpName);
On 2020/06/15 16:35, Kyotaro Horiguchi wrote:
At Mon, 15 Jun 2020 13:44:31 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Mon, Jun 15, 2020 at 12:40:03PM +0900, Fujii Masao wrote:
BTW, I just wonder why each row in pg_replication_slots needs to have
min_safe_lsn column? Basically min_safe_lsn should be the same between
every replication slots.

Indeed, that's confusing in its current shape. I would buy putting
this value into pg_replication_slots if there were some consistency of
the data to worry about because of locking issues, but here this data
is controlled within info_lck, which is out of the the repslot LW
lock. So I think that it is incorrect to put this data in this view
and that we should remove it, and that instead we had better push for
a system view that maps with the contents of XLogCtl.
Agreed. But as you know it's too late to do that for v13...
So firstly I'd like to fix the issues in pg_replication_slots view,
and then we can improve the things later for v14 if necessary.
It was once the difference from the safe_lsn to restart_lsn which is
distinct among slots. Then it was changed to the safe_lsn. I agree to
the discussion above, but it is needed anywhere since no one can know
the margin until the slot goes to the "lost" state without it. (Note
that currently even wal_status and min_safe_lsn can be inconsistent in
a line.)

Just for the need for table-consistency and in-line consistency, we
could just remember the result of XLogGetLastRemovedSegno() around
taking info_lock in the function. That doesn't make a practical
difference but makes the view look consistent.
Agreed. Thanks for the patch. Here are the review comments:
Not only the last removed segment but also the current write position
should be obtained at the beginning of pg_get_replication_slots()
and given to GetWALAvailability(), for consistency?

Even after applying your patch, min_safe_lsn is calculated for
each slot even though the calculated result is always the same,
which is a bit of a waste of cycles. We should calculate min_safe_lsn
once at the beginning and use it for each slot?
XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
Isn't it better to use 1 as the second argument of the above,
in order to address the issue that I reported upthread?
Otherwise, the WAL file name that pg_walfile_name(min_safe_lsn) returns
would be confusing.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Thanks for the comments.
At Wed, 17 Jun 2020 21:37:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
On 2020/06/15 16:35, Kyotaro Horiguchi wrote:
At Mon, 15 Jun 2020 13:44:31 +0900, Michael Paquier
<michael@paquier.xyz> wrote in

On Mon, Jun 15, 2020 at 12:40:03PM +0900, Fujii Masao wrote:
BTW, I just wonder why each row in pg_replication_slots needs to have
min_safe_lsn column? Basically min_safe_lsn should be the same between
every replication slots.

Indeed, that's confusing in its current shape. I would buy putting
this value into pg_replication_slots if there were some consistency of
the data to worry about because of locking issues, but here this data
is controlled within info_lck, which is out of the the repslot LW
lock. So I think that it is incorrect to put this data in this view
and that we should remove it, and that instead we had better push for
a system view that maps with the contents of XLogCtl.

Agreed. But as you know it's too late to do that for v13...
So firstly I'd like to fix the issues in pg_replication_slots view,
and then we can improve the things later for v14 if necessary.

It was once the difference from the safe_lsn to restart_lsn which is
distinct among slots. Then it was changed to the safe_lsn. I agree to
the discussion above, but it is needed anywhere since no one can know
the margin until the slot goes to the "lost" state without it. (Note
that currently even wal_status and min_safe_lsn can be inconsistent in
a line.)
Just for the need for table-consistency and in-line consistency, we
could just remember the result of XLogGetLastRemovedSegno() around
taking info_lock in the function. That doesn't make a practical
difference but makes the view look consistent.

Agreed. Thanks for the patch. Here are the review comments:
Not only the last removed segment but also current write position
should be obtain at the beginning of pg_get_replication_slots()
and should be given to GetWALAvailability(), for the consistency?
You are right. Though I vaguely thought that it wasn't needed since
WriteRecPtr doesn't move in steps as wide as removed_segment's, it
actually does move.
Even after applying your patch, min_safe_lsn is calculated for
each slot even though the calculated result is always the same.
Which is a bit waste of cycle. We should calculate min_safe_lsn
at the beginning and use it for each slot?
Agreed. That may result in a wasteful calculation, but it's better than
repeated wasteful calculations.
XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
Isn't it better to use 1 as the second argument of the above,
in order to address the issue that I reported upthread?
Otherwise, the WAL file name that pg_walfile_name(min_safe_lsn)
returns
would be confusing.
Mmm. pg_walfile_name seems too specialized to
pg_stop_backup(). (pg_walfile_name_offset() returns a wrong result for
segment boundaries.) I'm not willing to do that only to follow such a
suspicious(?) specification, but surely it would practically be better
to do so. Please find the attached first patch. I found that
there's no reason to hide min_safe_lsn when wal_status has certain
values, so I changed it to be shown always.
By the way, I noticed that when a replication slot reserves all
existing WAL segments, checkpoint cannot remove a file and
lastRemovedSegNo is left at 0. The second attached patch forces
RemoveOldXlogFiles() to initialize the variable even when no WAL
segments are removed. It puts no additional load on the file system
since the directory is scanned anyway. My old proposal to
unconditionally initialize it separately from checkpoint was rejected,
but I think this is acceptable.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v2-0001-Make-pg_stat_replication-view-consistent.patch (text/x-patch; charset=us-ascii)
From 6742e3b31ac53ef54458d64e34feeac7d4c710d2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Thu, 18 Jun 2020 13:36:23 +0900
Subject: [PATCH v2 1/2] Make pg_stat_replication view consistent
min_safe_lsn is not consistent within the result of a query. Make it
consistent. While fixing that, min_safe_lsn is changed to show the
second byte of a segment instead of the first, so that users can
get the intended segment file name from pg_walfile_name.
---
src/backend/access/transam/xlog.c | 15 +++++++------
src/backend/replication/slotfuncs.c | 35 ++++++++++++++++++++---------
src/include/access/xlog.h | 4 +++-
3 files changed, 35 insertions(+), 19 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 55cac186dc..940f5fcb18 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9484,6 +9484,9 @@ CreateRestartPoint(int flags)
* Report availability of WAL for the given target LSN
* (typically a slot's restart_lsn)
*
+ * currWriteLSN and lastRemovedSeg are the results of GetXLogWriteRecPtr()
+ * and XLogGetLastRemovedSegno() respectively at the referring time.
+ *
* Returns one of the following enum values:
* * WALAVAIL_NORMAL means targetLSN is available because it is in the range
* of max_wal_size.
@@ -9498,9 +9501,9 @@ CreateRestartPoint(int flags)
* * WALAVAIL_INVALID_LSN means the slot hasn't been set to reserve WAL.
*/
WALAvailability
-GetWALAvailability(XLogRecPtr targetLSN)
+GetWALAvailability(XLogRecPtr targetLSN, XLogRecPtr currWriteLSN,
+ XLogSegNo lastRemovedSeg)
{
- XLogRecPtr currpos; /* current write LSN */
XLogSegNo currSeg; /* segid of currpos */
XLogSegNo targetSeg; /* segid of targetLSN */
XLogSegNo oldestSeg; /* actual oldest segid */
@@ -9513,21 +9516,19 @@ GetWALAvailability(XLogRecPtr targetLSN)
if (XLogRecPtrIsInvalid(targetLSN))
return WALAVAIL_INVALID_LSN;
- currpos = GetXLogWriteRecPtr();
-
/* calculate oldest segment currently needed by slots */
XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
- KeepLogSeg(currpos, &oldestSlotSeg);
+ KeepLogSeg(currWriteLSN, &oldestSlotSeg);
/*
* Find the oldest extant segment file. We get 1 until checkpoint removes
* the first WAL segment file since startup, which causes the status being
* wrong under certain abnormal conditions but that doesn't actually harm.
*/
- oldestSeg = XLogGetLastRemovedSegno() + 1;
+ oldestSeg = lastRemovedSeg + 1;
/* calculate oldest segment by max_wal_size and wal_keep_segments */
- XLByteToSeg(currpos, currSeg, wal_segment_size);
+ XLByteToSeg(currWriteLSN, currSeg, wal_segment_size);
keepSegs = ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
wal_segment_size) + 1;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 1b929a603e..063c6dd435 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -243,6 +243,9 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
MemoryContext per_query_ctx;
MemoryContext oldcontext;
int slotno;
+ XLogSegNo last_removed_seg;
+ XLogRecPtr curr_write_lsn;
+ XLogRecPtr min_safe_lsn = 0;
/* check to see if caller supports us returning a tuplestore */
if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -272,6 +275,24 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
rsinfo->setResult = tupstore;
rsinfo->setDesc = tupdesc;
+ /*
+ * Remember the last removed segment and current WAL write LSN at this
+ * point, for consistency within this result set. Since there's no
+ * interlock between slot data and the checkpointer, the segment can be
+ * removed in-between, but that doesn't make any practical difference.
+ * Also we calculate min_safe_lsn here, since the same value is shown
+ * for all slots.
+ */
+ last_removed_seg = XLogGetLastRemovedSegno();
+ curr_write_lsn = GetXLogWriteRecPtr();
+
+ /*
+ * Show the next byte after segment boundary as min_safe_lsn so that
+ * pg_walfile_name() returns the correct file name for the value.
+ */
+ if (max_slot_wal_keep_size_mb >= 0 && last_removed_seg != 0)
+ XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 1,
+ wal_segment_size, min_safe_lsn);
+
MemoryContextSwitchTo(oldcontext);
LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
@@ -282,7 +303,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
Datum values[PG_GET_REPLICATION_SLOTS_COLS];
bool nulls[PG_GET_REPLICATION_SLOTS_COLS];
WALAvailability walstate;
- XLogSegNo last_removed_seg;
int i;
if (!slot->in_use)
@@ -342,7 +362,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
else
nulls[i++] = true;
- walstate = GetWALAvailability(slot_contents.data.restart_lsn);
+ walstate = GetWALAvailability(slot_contents.data.restart_lsn,
+ curr_write_lsn, last_removed_seg);
switch (walstate)
{
@@ -366,16 +387,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
elog(ERROR, "invalid walstate: %d", (int) walstate);
}
- if (max_slot_wal_keep_size_mb >= 0 &&
- (walstate == WALAVAIL_NORMAL || walstate == WALAVAIL_RESERVED) &&
- ((last_removed_seg = XLogGetLastRemovedSegno()) != 0))
- {
- XLogRecPtr min_safe_lsn;
-
- XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
- wal_segment_size, min_safe_lsn);
+ if (min_safe_lsn != 0)
values[i++] = Int64GetDatum(min_safe_lsn);
- }
else
nulls[i++] = true;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index e917dfe92d..bbe00e7195 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -326,7 +326,9 @@ extern void ShutdownXLOG(int code, Datum arg);
extern void InitXLOGAccess(void);
extern void CreateCheckPoint(int flags);
extern bool CreateRestartPoint(int flags);
-extern WALAvailability GetWALAvailability(XLogRecPtr restart_lsn);
+extern WALAvailability GetWALAvailability(XLogRecPtr targetLSN,
+ XLogRecPtr currWriteLSN,
+ XLogSegNo last_removed_seg);
extern XLogRecPtr CalculateMaxmumSafeLSN(void);
extern void XLogPutNextOid(Oid nextOid);
extern XLogRecPtr XLogRestorePoint(const char *rpName);
--
2.18.4
v2-0002-Forcibly-initialize-lastRemovedSegNo-at-the-first.patch (text/x-patch; charset=us-ascii)
From 10425b3c062ed3972c0ad067adb7c14c5dca43c8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Thu, 18 Jun 2020 14:49:48 +0900
Subject: [PATCH v2 2/2] Forcibly initialize lastRemovedSegNo at the first
checkpoint
If we have a replication slot retaining all existing WAL segments, a
checkpoint doesn't initialize lastRemovedSegNo. That means
pg_replication_slots cannot show min_safe_lsn until some slot loses a
segment. Forcibly initialize it even if no WAL segments are removed at
the first checkpoint.
---
src/backend/access/transam/xlog.c | 33 +++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 940f5fcb18..282bac32ab 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4025,6 +4025,16 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr)
DIR *xldir;
struct dirent *xlde;
char lastoff[MAXFNAMELEN];
+ char oldest[MAXFNAMELEN];
+ bool remember_oldest = false;
+
+ /*
+ * If we haven't set the initial last removed segno yet, force it to be
+ * set even if we don't remove any segments this time.
+ */
+ oldest[0] = 0;
+ if (XLogGetLastRemovedSegno() == 0)
+ remember_oldest = true;
/*
* Construct a filename of the last segment to be kept. The timeline ID
@@ -4062,10 +4072,33 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr)
{
/* Update the last removed location in shared memory first */
UpdateLastRemovedPtr(xlde->d_name);
+ remember_oldest = false;
RemoveXlogFile(xlde->d_name, lastredoptr, endptr);
}
}
+ /* Remember the oldest file left in the directory */
+ else if (remember_oldest &&
+ (oldest[0] == 0 || strcmp(xlde->d_name + 8, oldest + 8) < 0))
+ strncpy(oldest, xlde->d_name, MAXFNAMELEN);
+ }
+
+ /*
+ * If we haven't initialized the pointer yet, initialize it here.
+ * The last removed pointer is the oldest existing segment minus 1.
+ */
+ if (remember_oldest && oldest[0] != 0)
+ {
+ uint32 tli;
+ XLogSegNo segno;
+
+ XLogFromFileName(oldest, &tli, &segno, wal_segment_size);
+
+ if (segno > 1)
+ {
+ XLogFileName(oldest, tli, segno - 1, wal_segment_size);
+ UpdateLastRemovedPtr(oldest);
+ }
}
FreeDir(xldir);
--
2.18.4
On Thu, Jun 18, 2020 at 11:52 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Wed, 17 Jun 2020 21:37:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
On 2020/06/15 16:35, Kyotaro Horiguchi wrote:
Isn't it better to use 1 as the second argument of the above,
in order to address the issue that I reported upthread?
Otherwise, the WAL file name that pg_walfile_name(min_safe_lsn)
returns
would be confusing.

Mmm. pg_walfile_name seems too specialize to
pg_stop_backup(). (pg_walfile_name_offset() returns wrong result for
segment boundaries.) I'm not willing to do that only to follow such
suspicious(?) specification, but surely it would practically be better
doing that. Please find the attached first patch.
It is a little unclear to me how this or any proposed patch will solve
the original problem reported by Fujii-San? Basically, the problem
arises because we don't have an interlock between when the checkpoint
removes the WAL segment and the view tries to acquire the same. Am I
missing something?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
At Thu, 18 Jun 2020 18:18:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
On Thu, Jun 18, 2020 at 11:52 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Wed, 17 Jun 2020 21:37:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
On 2020/06/15 16:35, Kyotaro Horiguchi wrote:
Isn't it better to use 1 as the second argument of the above,
in order to address the issue that I reported upthread?
Otherwise, the WAL file name that pg_walfile_name(min_safe_lsn)
returns
would be confusing.

Mmm. pg_walfile_name seems too specialize to
pg_stop_backup(). (pg_walfile_name_offset() returns wrong result for
segment boundaries.) I'm not willing to do that only to follow such
suspicious(?) specification, but surely it would practically be better
doing that. Please find the attached first patch.

It is a little unclear to me how this or any proposed patch will solve
the original problem reported by Fujii-San? Basically, the problem
arises because we don't have an interlock between when the checkpoint
removes the WAL segment and the view tries to acquire the same. Am, I
missing something?
I'm not sure, but I don't get the point of blocking WAL segment
removal until the view is completed. The said columns of the view are
just for monitoring, which needs an information snapshot seemingly
taken at a certain time. And InvalidateObsoleteReplicationSlots kills
walsenders using lastRemovedSegNo of a different time. The two are
independent each other.
Also the patch changes min_safe_lsn to show an LSN at segment boundary
+ 1.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Jun 19, 2020 at 10:02:54AM +0900, Kyotaro Horiguchi wrote:
At Thu, 18 Jun 2020 18:18:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
It is a little unclear to me how this or any proposed patch will solve
the original problem reported by Fujii-San? Basically, the problem
arises because we don't have an interlock between when the checkpoint
removes the WAL segment and the view tries to acquire the same. Am, I
missing something?
The proposed patch fetches the computation of the minimum LSN across
all slots before taking ReplicationSlotControlLock so its value gets
more lossy, and potentially older than what the slots actually
include. So it is an attempt to take the safest spot possible.
Honestly, I find the design of computing and using the same minimum
LSN value for all the tuples returned by pg_get_replication_slots a
bit silly, and you can actually get a pretty good estimate of that by
emulating ReplicationSlotsComputeRequiredLSN() directly with what
pg_replication_slots provides, as we have a min() aggregate for pg_lsn.
For these reasons, I think that we should remove this information from
the view for now, and reconsider this part more carefully for
14~ with a clear definition of how much lossiness we are ready to
accept for the information provided here, if necessary. We could for
example just have a separate SQL function that just grabs this value
(or a more global SQL view for XLogCtl data that includes this data).
I'm not sure, but I don't get the point of blocking WAL segment
removal until the view is completed.
We should really not do that anyway for a monitoring view.
--
Michael
At Fri, 19 Jun 2020 10:39:58 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Fri, Jun 19, 2020 at 10:02:54AM +0900, Kyotaro Horiguchi wrote:
At Thu, 18 Jun 2020 18:18:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
It is a little unclear to me how this or any proposed patch will solve
the original problem reported by Fujii-San? Basically, the problem
arises because we don't have an interlock between when the checkpoint
removes the WAL segment and the view tries to acquire the same. Am, I
missing something?

The proposed patch fetches the computation of the minimum LSN across
all slots before taking ReplicationSlotControlLock so its value gets
more lossy, and potentially older than what the slots actually
include. So it is an attempt to take the safest spot possible.
Minimum LSN (lastRemovedSegNo) is not protected by the lock. That
makes no difference.
Honestly, I find a bit silly the design to compute and use the same
minimum LSN value for all the tuples returned by
pg_get_replication_slots, and you can actually get a pretty good
I see it as silly. I think I said upthread that it was the distance
to the point where the slot loses a segment, and it was rejected but
just removing it makes us unable to estimate the distance so it is
there.
estimate of that by emulating ReplicationSlotsComputeRequiredLSN()
directly with what pg_replication_slot provides as we have a min()
aggregate for pg_lsn.
min(lastRemovedSegNo) is the earliest value. It is enough to read it
once at the beginning, then use it for all slots.
For these reasons, I think that we should remove for now this
information from the view, and reconsider this part more carefully for
14~ with a clear definition of how much lossiness we are ready to
accept for the information provided here, if necessary. We could for
example just have a separate SQL function that just grabs this value
(or a more global SQL view for XLogCtl data that includes this data).
I think we need at least one of the "distance" above or min_safe_lsn
somewhere reachable by users.
I'm not sure, but I don't get the point of blocking WAL segment
removal until the view is completed.

We should really not do that anyway for a monitoring view.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Jun 19, 2020 at 6:32 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Thu, 18 Jun 2020 18:18:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
On Thu, Jun 18, 2020 at 11:52 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Wed, 17 Jun 2020 21:37:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
On 2020/06/15 16:35, Kyotaro Horiguchi wrote:
Isn't it better to use 1 as the second argument of the above,
in order to address the issue that I reported upthread?
Otherwise, the WAL file name that pg_walfile_name(min_safe_lsn)
returns
would be confusing.

Mmm. pg_walfile_name seems too specialize to
pg_stop_backup(). (pg_walfile_name_offset() returns wrong result for
segment boundaries.) I'm not willing to do that only to follow such
suspicious(?) specification, but surely it would practically be better
doing that. Please find the attached first patch.

It is a little unclear to me how this or any proposed patch will solve
the original problem reported by Fujii-San? Basically, the problem
arises because we don't have an interlock between when the checkpoint
removes the WAL segment and the view tries to acquire the same. Am, I
missing something?

I'm not sure, but I don't get the point of blocking WAL segment
removal until the view is completed.
I am not suggesting to do that.
The said columns of the view are
just for monitoring, which needs an information snapshot seemingly
taken at a certain time. And InvalidateObsoleteReplicationSlots kills
walsenders using lastRemovedSegNo of a different time. The two are
independent each other.

Also the patch changes min_safe_lsn to show an LSN at segment boundary
+ 1.
But aren't we doing last_removed_seg+1 even without the patch? See code below
- {
- XLogRecPtr min_safe_lsn;
-
- XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
- wal_segment_size, min_safe_lsn);
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Jun 19, 2020 at 8:44 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Fri, 19 Jun 2020 10:39:58 +0900, Michael Paquier <michael@paquier.xyz> wrote in
Honestly, I find a bit silly the design to compute and use the same
minimum LSN value for all the tuples returned by
pg_get_replication_slots, and you can actually get a pretty good

I see it as silly. I think I said upthread that it was the distance
to the point where the slot loses a segment, and it was rejected but
just removing it makes us unable to estimate the distance so it is
there.
IIUC, the value of min_safe_lsn will be less than restart_lsn, so one
can compute the difference of the two to see how far ahead the
replication slot's restart_lsn is of min_safe_lsn, but it is still
not clear how a user will make any use of it. Can you please explain
how the distance you are talking about is useful to users or anyone?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
At Fri, 19 Jun 2020 09:09:03 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
On Fri, Jun 19, 2020 at 8:44 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:At Fri, 19 Jun 2020 10:39:58 +0900, Michael Paquier <michael@paquier.xyz> wrote in
Honestly, I find a bit silly the design to compute and use the same
minimum LSN value for all the tuples returned by
pg_get_replication_slots, and you can actually get a pretty good

I see it as silly. I think I said upthread that it was the distance
to the point where the slot loses a segment, and it was rejected but
just removing it makes us unable to estimate the distance so it is
there.

IIUC, the value of min_safe_lsn will lesser than restart_lsn, so one
can compute the difference of those to see how much ahead the
replication slot's restart_lsn is from min_safe_lsn but still it is
not clear how user will make any use of it. Can you please explain
how the distance you are talking about is useful to users or anyone?
When max_slot_wal_keep_size is set, the slot may retain up to that
amount of old WAL segments and then suddenly lose the oldest
segments. *I* thought that I would use it in an HA cluster tool to
inform users about the remaining time (not literally, of course) a
disconnected standby is allowed to stay disconnected. Of course even
if some segments have been lost, they could be copied from the
primary's archive, so that's not critical in theory.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Fri, 19 Jun 2020 08:59:48 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
Mmm. pg_walfile_name seems too specialized for
pg_stop_backup(). (pg_walfile_name_offset() returns a wrong result for
segment boundaries.) I'm not willing to do that only to follow such a
suspicious(?) specification, but surely it would practically be better
doing that. Please find the attached first patch.

It is a little unclear to me how this or any proposed patch will solve
the original problem reported by Fujii-San? Basically, the problem
arises because we don't have an interlock between when the checkpoint
removes the WAL segment and the view tries to acquire the same. Am I
missing something?

I'm not sure, but I don't get the point of blocking WAL segment
removal until the view is completed.

I am not suggesting to do that.
The said columns of the view are
just for monitoring, which needs an information snapshot seemingly
taken at a certain time. And InvalidateObsoleteReplicationSlots kills
walsenders using lastRemovedSegNo of a different time. The two are
independent of each other.

Also the patch changes min_safe_lsn to show an LSN at segment boundary
+ 1.

But aren't we doing last_removed_seg+1 even without the patch? See the code below:
- {
- XLogRecPtr min_safe_lsn;
-
- XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
- wal_segment_size, min_safe_lsn);
It is at the beginning byte of the *next* segment. Fujii-san told that
it should be the next byte of it, namely
"XLogSegNoOffsetToRecPtr(last_removed_seg + 1, *1*,", and the patch
calculates it as that. It adds the following instead.
+ if (max_slot_wal_keep_size_mb >= 0 && last_removed_seg != 0)
+ XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 1,
+ wal_segment_size, min_safe_lsn);
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/06/19 10:39, Michael Paquier wrote:
On Fri, Jun 19, 2020 at 10:02:54AM +0900, Kyotaro Horiguchi wrote:
At Thu, 18 Jun 2020 18:18:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
It is a little unclear to me how this or any proposed patch will solve
the original problem reported by Fujii-San? Basically, the problem
arises because we don't have an interlock between when the checkpoint
removes the WAL segment and the view tries to acquire the same. Am I
missing something?

The proposed patch fetches the computation of the minimum LSN across
all slots before taking ReplicationSlotControlLock so its value gets
more lossy, and potentially older than what the slots actually
include. So it is an attempt to take the safest spot possible.

Honestly, I find a bit silly the design to compute and use the same
minimum LSN value for all the tuples returned by
pg_get_replication_slots, and you can actually get a pretty good
estimate of that by emulating ReplicationSlotsComputeRequiredLSN()
directly with what pg_replication_slot provides as we have a min()
aggregate for pg_lsn.

For these reasons, I think that we should remove for now this
information from the view, and reconsider this part more carefully for
14~ with a clear definition of how much lossiness we are ready to
accept for the information provided here, if necessary.
Agreed. But isn't it too late to remove the columns (i.e., change
the catalog) for v13? Because v13 beta1 was already released.
IIUC the catalog should not be changed since beta1 release so that
users can upgrade PostgreSQL without initdb.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Fri, Jun 19, 2020 at 04:13:27PM +0900, Fujii Masao wrote:
Agreed. But isn't it too late to remove the columns (i.e., change
the catalog) for v13? Because v13 beta1 was already released.
IIUC the catalog should not be changed since beta1 release so that
users can upgrade PostgreSQL without initdb.
Catalog bumps have happened in the past between beta versions:
git log -p REL_12_BETA1..REL_12_BETA2 src/include/catalog/catversion.h
git log -p REL_11_BETA1..REL_11_BETA2 src/include/catalog/catversion.h
git log -p REL_10_BETA1..REL_10_BETA2 src/include/catalog/catversion.h
So we usually avoid doing that between betas, but my take here is that
a catalog bump is better than regretting a change we may have to live
with after the release is sealed.
--
Michael
At Fri, 19 Jun 2020 16:36:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Fri, Jun 19, 2020 at 04:13:27PM +0900, Fujii Masao wrote:
Agreed. But isn't it too late to remove the columns (i.e., change
the catalog) for v13? Because v13 beta1 was already released.
IIUC the catalog should not be changed since beta1 release so that
users can upgrade PostgreSQL without initdb.

Catalog bumps have happened in the past between beta versions:
git log -p REL_12_BETA1..REL_12_BETA2 src/include/catalog/catversion.h
git log -p REL_11_BETA1..REL_11_BETA2 src/include/catalog/catversion.h
git log -p REL_10_BETA1..REL_10_BETA2 src/include/catalog/catversion.h

So we usually avoid doing that between betas, but my take here is that
a catalog bump is better than regretting a change we may have to live
with after the release is sealed.
FWIW if we decide that it is really useless, I agree to remove it now.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/06/19 16:43, Kyotaro Horiguchi wrote:
At Fri, 19 Jun 2020 16:36:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Fri, Jun 19, 2020 at 04:13:27PM +0900, Fujii Masao wrote:
Agreed. But isn't it too late to remove the columns (i.e., change
the catalog) for v13? Because v13 beta1 was already released.
IIUC the catalog should not be changed since beta1 release so that
users can upgrade PostgreSQL without initdb.

Catalog bumps have happened in the past between beta versions:
git log -p REL_12_BETA1..REL_12_BETA2 src/include/catalog/catversion.h
git log -p REL_11_BETA1..REL_11_BETA2 src/include/catalog/catversion.h
git log -p REL_10_BETA1..REL_10_BETA2 src/include/catalog/catversion.h

So we usually avoid doing that between betas, but my take here is that
a catalog bump is better than regretting a change we may have to live
with after the release is sealed.
Sounds reasonable.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Fri, Jun 19, 2020 at 05:34:01PM +0900, Fujii Masao wrote:
On 2020/06/19 16:43, Kyotaro Horiguchi wrote:
At Fri, 19 Jun 2020 16:36:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in
So we usually avoid doing that between betas, but my take here is that
a catalog bump is better than regretting a change we may have to live
with after the release is sealed.

Sounds reasonable.
If we want to make this happen, I am afraid that the time is short as
beta2 is planned for next week. As the version will be likely tagged
by Monday US time, it would be good to get this addressed within 48
hours to give some room to the buildfarm to react. Attached is a
straight-forward proposal of patch. Any thoughts?
(The change in catversion.h is a self-reminder.)
--
Michael
Attachments:
repslot-min-safe-cleanup.patch (text/x-diff; charset=us-ascii)
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 7644147cf5..a3753fc9f9 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202006151
+#define CATALOG_VERSION_NO 202006191
#endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 61f2c2f5b4..c1bbb5dd63 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10063,9 +10063,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,pg_lsn}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,min_safe_lsn}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..507b602a08 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -878,8 +878,7 @@ CREATE VIEW pg_replication_slots AS
L.catalog_xmin,
L.restart_lsn,
L.confirmed_flush_lsn,
- L.wal_status,
- L.min_safe_lsn
+ L.wal_status
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 06e4955de7..a0713bcdf2 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -236,7 +236,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 13
+#define PG_GET_REPLICATION_SLOTS_COLS 12
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
TupleDesc tupdesc;
Tuplestorestate *tupstore;
@@ -366,19 +366,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
elog(ERROR, "invalid walstate: %d", (int) walstate);
}
- if (max_slot_wal_keep_size_mb >= 0 &&
- (walstate == WALAVAIL_NORMAL || walstate == WALAVAIL_RESERVED) &&
- ((last_removed_seg = XLogGetLastRemovedSegno()) != 0))
- {
- XLogRecPtr min_safe_lsn;
-
- XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
- wal_segment_size, min_safe_lsn);
- values[i++] = Int64GetDatum(min_safe_lsn);
- }
- else
- nulls[i++] = true;
-
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index cba7df920c..721e94a1a3 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -28,9 +28,9 @@ $node_master->safe_psql('postgres',
# The slot state and remain should be null before the first connection
my $result = $node_master->safe_psql('postgres',
- "SELECT restart_lsn IS NULL, wal_status is NULL, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT restart_lsn IS NULL, wal_status is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "t|t|t", 'check the state of non-reserved slot is "unknown"');
+is($result, "t|t", 'check the state of non-reserved slot is "unknown"');
# Take backup
@@ -54,9 +54,9 @@ $node_standby->stop;
# Preparation done, the slot is the state "normal" now
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "normal|t", 'check the catching-up state');
+is($result, "normal", 'check the catching-up state');
# Advance WAL by five segments (= 5MB) on master
advance_wal($node_master, 1);
@@ -64,18 +64,18 @@ $node_master->safe_psql('postgres', "CHECKPOINT;");
# The slot is always "safe" when fitting max_wal_size
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "normal|t", 'check that it is safe if WAL fits in max_wal_size');
+is($result, "normal", 'check that it is safe if WAL fits in max_wal_size');
advance_wal($node_master, 4);
$node_master->safe_psql('postgres', "CHECKPOINT;");
# The slot is always "safe" when max_slot_wal_keep_size is not set
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "normal|t", 'check that slot is working');
+is($result, "normal", 'check that slot is working');
# The standby can reconnect to master
$node_standby->start;
@@ -93,23 +93,19 @@ max_slot_wal_keep_size = ${max_slot_wal_keep_size_mb}MB
));
$node_master->reload;
-# The slot is in safe state. The distance from the min_safe_lsn should
-# be as almost (max_slot_wal_keep_size - 1) times large as the segment
-# size
-
+# The slot should be in a normal state.
$result = $node_master->safe_psql('postgres',
"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
is($result, "normal", 'check that max_slot_wal_keep_size is working');
# Advance WAL again then checkpoint, reducing remain by 2 MB.
+# The slot should still be in a normal state after checkpoint.
advance_wal($node_master, 2);
$node_master->safe_psql('postgres', "CHECKPOINT;");
-
-# The slot is still working
$result = $node_master->safe_psql('postgres',
"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
is($result, "normal",
- 'check that min_safe_lsn gets close to the current LSN');
+ 'check that the slot state changes to "normal"');
# The standby can reconnect to master
$node_standby->start;
@@ -153,9 +149,9 @@ advance_wal($node_master, 1);
# Slot gets into 'lost' state
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "lost|t", 'check that the slot state changes to "lost"');
+is($result, "lost", 'check that the slot state changes to "lost"');
# The standby still can connect to master before a checkpoint
$node_standby->start;
@@ -184,9 +180,9 @@ ok( find_in_log(
# This slot should be broken
$result = $node_master->safe_psql('postgres',
- "SELECT slot_name, active, restart_lsn IS NULL, wal_status, min_safe_lsn FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT slot_name, active, restart_lsn IS NULL, wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "rep1|f|t||", 'check that the slot became inactive');
+is($result, "rep1|f|t|", 'check that the slot became inactive');
# The standby no longer can connect to the master
$logstart = get_log_size($node_standby);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b813e32215..48245774e1 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1463,9 +1463,8 @@ pg_replication_slots| SELECT l.slot_name,
l.catalog_xmin,
l.restart_lsn,
l.confirmed_flush_lsn,
- l.wal_status,
- l.min_safe_lsn
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, min_safe_lsn)
+ l.wal_status
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 700271fd40..c2ea595d3d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11260,15 +11260,6 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
field is null.
</para></entry>
</row>
-
- <row>
- <entry role="catalog_table_entry"><para role="column_definition">
- <structfield>min_safe_lsn</structfield> <type>pg_lsn</type>
- </para>
- <para>
- The minimum LSN currently available for walsenders.
- </para></entry>
- </row>
</tbody>
</tgroup>
</table>
At Fri, 19 Jun 2020 21:15:52 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Fri, Jun 19, 2020 at 05:34:01PM +0900, Fujii Masao wrote:
On 2020/06/19 16:43, Kyotaro Horiguchi wrote:
At Fri, 19 Jun 2020 16:36:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in
So we usually avoid doing that between betas, but my take here is that
a catalog bump is better than regretting a change we may have to live
with after the release is sealed.

Sounds reasonable.
If we want to make this happen, I am afraid that the time is short as
beta2 is planned for next week. As the version will be likely tagged
by Monday US time, it would be good to get this addressed within 48
hours to give some room to the buildfarm to react. Attached is a
straight-forward proposal of patch. Any thoughts?

(The change in catversion.h is a self-reminder.)
Thanks for the patch.
As a whole it contains everything needed for ripping out min_safe_lsn.
Some items in the TAP test get coarser, but none of them lose
significance. It compiles almost cleanly and passes all tests,
including the TAP test.
The variable last_removed_seg in slotfuncs.c:285 is left in place but
no longer used after applying this patch. It should be removed as well.
Other than that, the patch looks good to me.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/06/19 21:15, Michael Paquier wrote:
On Fri, Jun 19, 2020 at 05:34:01PM +0900, Fujii Masao wrote:
On 2020/06/19 16:43, Kyotaro Horiguchi wrote:
At Fri, 19 Jun 2020 16:36:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in
So we usually avoid doing that between betas, but my take here is that
a catalog bump is better than regretting a change we may have to live
with after the release is sealed.

Sounds reasonable.
If we want to make this happen, I am afraid that the time is short as
beta2 is planned for next week. As the version will be likely tagged
by Monday US time, it would be good to get this addressed within 48
hours to give some room to the buildfarm to react. Attached is a
straight-forward proposal of patch. Any thoughts?
It's better if we can do that. But I think that we should hear Alvaro's opinion
about this before rushing to push the patch. Even if we miss beta2 as the result
of that, I'm ok. We would be able to push something better into beta3.
So, CC Alvaro.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020-Jun-20, Fujii Masao wrote:
It's better if we can do that. But I think that we should hear Alvaro's opinion
about this before rushing to push the patch. Even if we miss beta2 as the result
of that, I'm ok. We would be able to push something better into beta3.
So, CC Alvaro.
Uh, I was not aware of this thread. I'll go over it now and let you
know.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Jun-19, Michael Paquier wrote:
If we want to make this happen, I am afraid that the time is short as
beta2 is planned for next week. As the version will be likely tagged
by Monday US time, it would be good to get this addressed within 48
hours to give some room to the buildfarm to react. Attached is a
straight-forward proposal of patch. Any thoughts?
I don't disagree with removing the LSN column, but at the same time we
need to provide *some* way for users to monitor this, so let's add a
function to extract the value they need for that. It seems simple
enough.
I cannot implement it myself now, though. I've reached the end of my
week and I'm not sure I'll be able to work on it during the weekend.
I agree with Kyotaro's opinion that the pg_walfile_name() function seems
too single-minded ...
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jun 20, 2020 at 7:12 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Jun-19, Michael Paquier wrote:
If we want to make this happen, I am afraid that the time is short as
beta2 is planned for next week. As the version will be likely tagged
by Monday US time, it would be good to get this addressed within 48
hours to give some room to the buildfarm to react. Attached is a
straight-forward proposal of patch. Any thoughts?

I don't disagree with removing the LSN column, but at the same time we
need to provide *some* way for users to monitor this, so let's add a
function to extract the value they need for that. It seems simple
enough.
Isn't this information specific to checkpoints, so maybe better to
display in view pg_stat_bgwriter?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote:
On Sat, Jun 20, 2020 at 7:12 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I don't disagree with removing the LSN column, but at the same time we
need to provide *some* way for users to monitor this, so let's add a
function to extract the value they need for that. It seems simple
enough.

Isn't this information specific to checkpoints, so maybe better to
display in view pg_stat_bgwriter?
Not sure that's a good match. If we decide to expose that, a separate
function returning a LSN based on the segment number from
XLogGetLastRemovedSegno() sounds fine to me, like
pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in
mind?
--
Michael
On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote:
On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote:
Isn't this information specific to checkpoints, so maybe better to
display in view pg_stat_bgwriter?

Not sure that's a good match. If we decide to expose that, a separate
function returning a LSN based on the segment number from
XLogGetLastRemovedSegno() sounds fine to me, like
pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in
mind?
I was thinking about this one for the last couple of days, and came up
with the name pg_wal_oldest_lsn(), as per the attached, tracking the
oldest WAL location still available. That's unfortunately too late
for beta2, but let's continue the discussion.
--
Michael
Attachments:
repslot-min-safe-cleanup-v2.patch (text/x-diff; charset=us-ascii)
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 7644147cf5..7de4338910 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202006151
+#define CATALOG_VERSION_NO 202006221
#endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 61f2c2f5b4..1a07877086 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6090,6 +6090,10 @@
proname => 'pg_current_wal_flush_lsn', provolatile => 'v',
prorettype => 'pg_lsn', proargtypes => '',
prosrc => 'pg_current_wal_flush_lsn' },
+{ oid => '9054', descr => 'oldest wal location still available',
+ proname => 'pg_wal_oldest_lsn', provolatile => 'v',
+ prorettype => 'pg_lsn', proargtypes => '',
+ prosrc => 'pg_wal_oldest_lsn' },
{ oid => '2850',
descr => 'wal filename and byte offset, given a wal location',
proname => 'pg_walfile_name_offset', prorettype => 'record',
@@ -10063,9 +10067,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,pg_lsn}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,min_safe_lsn}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..ccb3b5d5db 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -387,6 +387,31 @@ pg_current_wal_flush_lsn(PG_FUNCTION_ARGS)
PG_RETURN_LSN(current_recptr);
}
+/*
+ * Report the oldest WAL location still available after WAL segment removal
+ *
+ * This is useful to monitor how much WAL retention is happening with
+ * replication slots and concurrent checkpoints. NULL means that no WAL
+ * segments have been removed since startup yet.
+ */
+Datum
+pg_wal_oldest_lsn(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr oldestptr;
+ XLogSegNo last_removed_seg;
+
+ last_removed_seg = XLogGetLastRemovedSegno();
+
+ /* Leave if no segments have been removed since startup */
+ if (last_removed_seg == 0)
+ PG_RETURN_NULL();
+
+ XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
+ wal_segment_size, oldestptr);
+
+ PG_RETURN_LSN(oldestptr);
+}
+
/*
* Report the last WAL receive location (same format as pg_start_backup etc)
*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..507b602a08 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -878,8 +878,7 @@ CREATE VIEW pg_replication_slots AS
L.catalog_xmin,
L.restart_lsn,
L.confirmed_flush_lsn,
- L.wal_status,
- L.min_safe_lsn
+ L.wal_status
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 06e4955de7..590f7054d6 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -236,7 +236,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 13
+#define PG_GET_REPLICATION_SLOTS_COLS 12
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
TupleDesc tupdesc;
Tuplestorestate *tupstore;
@@ -282,7 +282,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
Datum values[PG_GET_REPLICATION_SLOTS_COLS];
bool nulls[PG_GET_REPLICATION_SLOTS_COLS];
WALAvailability walstate;
- XLogSegNo last_removed_seg;
int i;
if (!slot->in_use)
@@ -366,19 +365,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
elog(ERROR, "invalid walstate: %d", (int) walstate);
}
- if (max_slot_wal_keep_size_mb >= 0 &&
- (walstate == WALAVAIL_NORMAL || walstate == WALAVAIL_RESERVED) &&
- ((last_removed_seg = XLogGetLastRemovedSegno()) != 0))
- {
- XLogRecPtr min_safe_lsn;
-
- XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
- wal_segment_size, min_safe_lsn);
- values[i++] = Int64GetDatum(min_safe_lsn);
- }
- else
- nulls[i++] = true;
-
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index cba7df920c..721e94a1a3 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -28,9 +28,9 @@ $node_master->safe_psql('postgres',
# The slot state and remain should be null before the first connection
my $result = $node_master->safe_psql('postgres',
- "SELECT restart_lsn IS NULL, wal_status is NULL, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT restart_lsn IS NULL, wal_status is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "t|t|t", 'check the state of non-reserved slot is "unknown"');
+is($result, "t|t", 'check the state of non-reserved slot is "unknown"');
# Take backup
@@ -54,9 +54,9 @@ $node_standby->stop;
# Preparation done, the slot is the state "normal" now
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "normal|t", 'check the catching-up state');
+is($result, "normal", 'check the catching-up state');
# Advance WAL by five segments (= 5MB) on master
advance_wal($node_master, 1);
@@ -64,18 +64,18 @@ $node_master->safe_psql('postgres', "CHECKPOINT;");
# The slot is always "safe" when fitting max_wal_size
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "normal|t", 'check that it is safe if WAL fits in max_wal_size');
+is($result, "normal", 'check that it is safe if WAL fits in max_wal_size');
advance_wal($node_master, 4);
$node_master->safe_psql('postgres', "CHECKPOINT;");
# The slot is always "safe" when max_slot_wal_keep_size is not set
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "normal|t", 'check that slot is working');
+is($result, "normal", 'check that slot is working');
# The standby can reconnect to master
$node_standby->start;
@@ -93,23 +93,19 @@ max_slot_wal_keep_size = ${max_slot_wal_keep_size_mb}MB
));
$node_master->reload;
-# The slot is in safe state. The distance from the min_safe_lsn should
-# be as almost (max_slot_wal_keep_size - 1) times large as the segment
-# size
-
+# The slot should be in a normal state.
$result = $node_master->safe_psql('postgres',
"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
is($result, "normal", 'check that max_slot_wal_keep_size is working');
# Advance WAL again then checkpoint, reducing remain by 2 MB.
+# The slot should still be in a normal state after checkpoint.
advance_wal($node_master, 2);
$node_master->safe_psql('postgres', "CHECKPOINT;");
-
-# The slot is still working
$result = $node_master->safe_psql('postgres',
"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
is($result, "normal",
- 'check that min_safe_lsn gets close to the current LSN');
+ 'check that the slot state changes to "normal"');
# The standby can reconnect to master
$node_standby->start;
@@ -153,9 +149,9 @@ advance_wal($node_master, 1);
# Slot gets into 'lost' state
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "lost|t", 'check that the slot state changes to "lost"');
+is($result, "lost", 'check that the slot state changes to "lost"');
# The standby still can connect to master before a checkpoint
$node_standby->start;
@@ -184,9 +180,9 @@ ok( find_in_log(
# This slot should be broken
$result = $node_master->safe_psql('postgres',
- "SELECT slot_name, active, restart_lsn IS NULL, wal_status, min_safe_lsn FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT slot_name, active, restart_lsn IS NULL, wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
-is($result, "rep1|f|t||", 'check that the slot became inactive');
+is($result, "rep1|f|t|", 'check that the slot became inactive');
# The standby no longer can connect to the master
$logstart = get_log_size($node_standby);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b813e32215..48245774e1 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1463,9 +1463,8 @@ pg_replication_slots| SELECT l.slot_name,
l.catalog_xmin,
l.restart_lsn,
l.confirmed_flush_lsn,
- l.wal_status,
- l.min_safe_lsn
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, min_safe_lsn)
+ l.wal_status
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 5a66115df1..5122bfcd8b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11260,15 +11260,6 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
field is null.
</para></entry>
</row>
-
- <row>
- <entry role="catalog_table_entry"><para role="column_definition">
- <structfield>min_safe_lsn</structfield> <type>pg_lsn</type>
- </para>
- <para>
- The minimum LSN currently available for walsenders.
- </para></entry>
- </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b7c450ea29..afc1c1410c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24067,6 +24067,22 @@ SELECT collation for ('foo' COLLATE "de_DE");
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_oldest_wal_lsn</primary>
+ </indexterm>
+ <function>pg_oldest_wal_lsn</function> ()
+ <returnvalue>pg_lsn</returnvalue>
+ </para>
+ <para>
+ Returns the oldest WAL location still available on the system,
+ calculated based on the latest WAL segment recycled, or
+ <literal>NULL</literal> if no WAL segments have been removed since
+ startup.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote:
On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote:
Isn't this information specific to checkpoints, so maybe better to
display in view pg_stat_bgwriter?
Not sure that's a good match. If we decide to expose that, a separate
function returning a LSN based on the segment number from
XLogGetLastRemovedSegno() sounds fine to me, like
pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in
mind?
I was thinking on this one for the last couple of days, and came up
with the name pg_wal_oldest_lsn(), as per the attached, tracking the
oldest WAL location still available.
I feel such a function is good to have but I am not sure if there is a
need to tie it with the removal of min_safe_lsn column.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020/06/22 21:01, Amit Kapila wrote:
On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote:
On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote:
Isn't this information specific to checkpoints, so maybe better to
display in view pg_stat_bgwriter?
Not sure that's a good match. If we decide to expose that, a separate
function returning a LSN based on the segment number from
XLogGetLastRemovedSegno() sounds fine to me, like
pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in
mind?
I was thinking on this one for the last couple of days, and came up
with the name pg_wal_oldest_lsn(), as per the attached, tracking the
oldest WAL location still available.
Thanks for the patch!
+ <literal>NULL</literal> if no WAL segments have been removed since
+ startup.
Isn't this confusing? I think that we should store the last removed
WAL segment to somewhere (e.g., pg_control) and restore it at
the startup, so that we can see the actual value even after the startup.
Or we should scan pg_wal directory and find the "minimal" WAL segment
and return its LSN.
I feel such a function is good to have but I am not sure if there is a
need to tie it with the removal of min_safe_lsn column.
We should expose the LSN calculated from
"the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"?
This indicates the minimum LSN of WAL files that are guaranteed to be
currently retained by wal_keep_segments and max_slot_wal_keep_size.
That is, if checkpoint occurs when restart_lsn of replication slot is
smaller than that minimum LSN, some required WAL files may be removed.
So DBAs can periodically monitor and compare restart_lsn and that minimum
LSN. If they see frequently that difference of those LSN is very small,
they can decide to increase wal_keep_segments or max_slot_wal_keep_size,
to prevent required WAL files from being removed. Thought?
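The arithmetic proposed above can be sketched outside the server. A minimal Python illustration (editorial, not PostgreSQL source; LSNs are treated as plain byte offsets and both settings are assumed to be already converted to bytes or segments):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default wal_segment_size (16MB)

def min_guaranteed_lsn(current_lsn, wal_keep_segments, max_slot_wal_keep_size):
    """Lowest LSN whose WAL is guaranteed to be retained by either setting.

    A slot whose restart_lsn falls below this value may lose required
    WAL at the next checkpoint.
    """
    keep_bytes = max(wal_keep_segments * WAL_SEG_SIZE, max_slot_wal_keep_size)
    return max(current_lsn - keep_bytes, 0)

# With 5 kept segments (80MB) and a 128MB slot limit, the slot limit wins:
boundary = min_guaranteed_lsn(100 * WAL_SEG_SIZE, 5, 128 * 1024 * 1024)
assert boundary == 92 * WAL_SEG_SIZE
```

DBAs would compare each slot's restart_lsn against this single cluster-wide value.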
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
At Mon, 22 Jun 2020 22:02:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
On 2020/06/22 21:01, Amit Kapila wrote:
On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote:
On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote:
Isn't this information specific to checkpoints, so maybe better to
display in view pg_stat_bgwriter?
Not sure that's a good match. If we decide to expose that, a separate
function returning a LSN based on the segment number from
XLogGetLastRemovedSegno() sounds fine to me, like
pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in
mind?
I was thinking on this one for the last couple of days, and came up
with the name pg_wal_oldest_lsn(), as per the attached, tracking the
oldest WAL location still available.
Thanks for the patch!
+ <literal>NULL</literal> if no WAL segments have been removed since
+ startup.
Isn't this confusing? I think that we should store the last removed
WAL segment to somewhere (e.g., pg_control) and restore it at
the startup, so that we can see the actual value even after the
startup.
Or we should scan pg_wal directory and find the "minimal" WAL segment
and return its LSN.
Running a separate scan on pg_wal at startup or first time the oldest
WAL segno is referenced is something that was rejected before. But
with the current behavior we don't find the last removed segment until
any slot loses a segment if all WAL files are retained by a slot. FWIW
I recently proposed a patch to find the oldest WAL file while trying to
remove old WAL files.
I feel such a function is good to have but I am not sure if there is a
need to tie it with the removal of min_safe_lsn column.
We should expose the LSN calculated from
"the current WAL LSN - max(wal_keep_segments * 16MB,
max_slot_wal_keep_size)"?
This indicates the minimum LSN of WAL files that are guaranteed to be
currently retained by wal_keep_segments and max_slot_wal_keep_size.
That is, if checkpoint occurs when restart_lsn of replication slot is
smaller than that minimum LSN, some required WAL files may be removed.
So DBAs can periodically monitor and compare restart_lsn and that
minimum
LSN. If they see frequently that difference of those LSN is very
small,
they can decide to increase wal_keep_segments or
max_slot_wal_keep_size,
to prevent required WAL files from being removed. Thought?
I'm not sure about the consensus here about showing that number in the
view. It is similar to "remain" in the earlier versions of this patch
but a bit simpler. It would be usable in a similar way. I can live
with either numbers.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Jun 23, 2020 at 10:10:37AM +0900, Kyotaro Horiguchi wrote:
At Mon, 22 Jun 2020 22:02:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
Isn't this confusing? I think that we should store the last removed
WAL segment to somewhere (e.g., pg_control) and restore it at
the startup, so that we can see the actual value even after the
startup.
Or we should scan pg_wal directory and find the "minimal" WAL segment
and return its LSN.
Running a separate scan on pg_wal at startup or first time the oldest
WAL segno is referenced is something that was rejected before. But
with the current behavior we don't find the last removed segment until
any slot loses a segment if all WAL files are retained by a slot. FWIW
I recently proposed a patch to find the oldest WAL file while trying to
remove old WAL files.
Hmm. I agree that the approach I previously sent may be kind of
confusing without a clear initialization point, which would actually
be (checkPointCopy.redo + checkPointCopy.ThisTimeLineID) from the
control file with an extra computation depending on any replication
slot data present on disk? So one could do the maths cleanly after
StartupReplicationSlots() is called in the startup process. My point
is: it does not seem really obvious to me that we need to change the
control file to track that.
I'm not sure about the consensus here about showing that number in the
view. It is similar to "remain" in the earlier versions of this patch
but a bit simpler. It would be usable in a similar way. I can live
with either numbers.
Anyway, here is my take. We are discussing a design issue here, we
are moving the discussion into having a different design, and
discussing new designs is never a good sign post-beta (some open items
tend to move towards this direction every year). So I'd like to think
that the best thing we can do here is just to drop min_safe_lsn from
pg_replication_slots, and just reconsider this part for 14~ with
something we think is better.
By the way, I have added a separate open item for this thread.
--
Michael
On 2020/06/23 10:10, Kyotaro Horiguchi wrote:
At Mon, 22 Jun 2020 22:02:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
On 2020/06/22 21:01, Amit Kapila wrote:
On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote:
On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote:
Isn't this information specific to checkpoints, so maybe better to
display in view pg_stat_bgwriter?
Not sure that's a good match. If we decide to expose that, a separate
function returning a LSN based on the segment number from
XLogGetLastRemovedSegno() sounds fine to me, like
pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in
mind?
I was thinking on this one for the last couple of days, and came up
with the name pg_wal_oldest_lsn(), as per the attached, tracking the
oldest WAL location still available.
Thanks for the patch!
+ <literal>NULL</literal> if no WAL segments have been removed since
+ startup.
Isn't this confusing? I think that we should store the last removed
WAL segment to somewhere (e.g., pg_control) and restore it at
the startup, so that we can see the actual value even after the
startup.
Or we should scan pg_wal directory and find the "minimal" WAL segment
and return its LSN.
Running a separate scan on pg_wal at startup or first time the oldest
WAL segno is referenced is something that was rejected before. But
with the current behavior we don't find the last removed segment until
any slot loses a segment if all WAL files are retained by a slot.
Because scanning pg_wal can be a heavy operation, especially when
max_wal_size is high and there are lots of WAL files? If so, it might
be better to save the value in pg_control as I told upthread.
However, I'm not sure about the use case of this function yet...
FWIW
I recently proposed a patch to find the oldest WAL file while trying to
remove old WAL files.
I feel such a function is good to have but I am not sure if there is a
need to tie it with the removal of min_safe_lsn column.
We should expose the LSN calculated from
"the current WAL LSN - max(wal_keep_segments * 16MB,
max_slot_wal_keep_size)"?
This indicates the minimum LSN of WAL files that are guaranteed to be
currently retained by wal_keep_segments and max_slot_wal_keep_size.
That is, if checkpoint occurs when restart_lsn of replication slot is
smaller than that minimum LSN, some required WAL files may be removed.
So DBAs can periodically monitor and compare restart_lsn and that
minimum
LSN. If they see frequently that difference of those LSN is very
small,
they can decide to increase wal_keep_segments or
max_slot_wal_keep_size,
to prevent required WAL files from being removed. Thought?
I'm not sure about the consensus here about showing that number in the
view. It is similar to "remain" in the earlier versions of this patch
but a bit simpler. It would be usable in a similar way. I can live
with either numbers.
It's useless to display this value in each replication slot in the view.
I'm thinking to expose it as a function.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Mon, Jun 22, 2020 at 6:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/06/22 21:01, Amit Kapila wrote:
On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote:
On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote:
Isn't this information specific to checkpoints, so maybe better to
display in view pg_stat_bgwriter?
Not sure that's a good match. If we decide to expose that, a separate
function returning a LSN based on the segment number from
XLogGetLastRemovedSegno() sounds fine to me, like
pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in
mind?
I was thinking on this one for the last couple of days, and came up
with the name pg_wal_oldest_lsn(), as per the attached, tracking the
oldest WAL location still available.
Thanks for the patch!
+ <literal>NULL</literal> if no WAL segments have been removed since
+ startup.
Isn't this confusing? I think that we should store the last removed
WAL segment to somewhere (e.g., pg_control) and restore it at
the startup, so that we can see the actual value even after the startup.
Or we should scan pg_wal directory and find the "minimal" WAL segment
and return its LSN.
I feel such a function is good to have but I am not sure if there is a
need to tie it with the removal of min_safe_lsn column.
We should expose the LSN calculated from
"the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"?
This indicates the minimum LSN of WAL files that are guaranteed to be
currently retained by wal_keep_segments and max_slot_wal_keep_size.
That is, if checkpoint occurs when restart_lsn of replication slot is
smaller than that minimum LSN, some required WAL files may be removed.
So DBAs can periodically monitor and compare restart_lsn and that minimum
LSN. If they see frequently that difference of those LSN is very small,
they can decide to increase wal_keep_segments or max_slot_wal_keep_size,
to prevent required WAL files from being removed. Thought?
+1. This sounds like a good and useful stat for users.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 23, 2020 at 7:47 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/06/23 10:10, Kyotaro Horiguchi wrote:
At Mon, 22 Jun 2020 22:02:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
I feel such a function is good to have but I am not sure if there is a
need to tie it with the removal of min_safe_lsn column.
We should expose the LSN calculated from
"the current WAL LSN - max(wal_keep_segments * 16MB,
max_slot_wal_keep_size)"?
This indicates the minimum LSN of WAL files that are guaranteed to be
currently retained by wal_keep_segments and max_slot_wal_keep_size.
That is, if checkpoint occurs when restart_lsn of replication slot is
smaller than that minimum LSN, some required WAL files may be removed.
So DBAs can periodically monitor and compare restart_lsn and that
minimum
LSN. If they see frequently that difference of those LSN is very
small,
they can decide to increase wal_keep_segments or
max_slot_wal_keep_size,
to prevent required WAL files from being removed. Thought?
I'm not sure about the consensus here about showing that number in the
view. It is similar to "remain" in the earlier versions of this patch
but a bit simpler. It would be usable in a similar way. I can live
with either numbers.
It's useless to display this value in each replication slot in the view.
I'm thinking to expose it as a function.
Having a separate function for this seems like a good idea but can we
consider displaying it in a view like pg_stat_replication_slots as we
are discussing in a nearby thread to have such a view for other things.
I think ultimately this information is required to check whether some
slot can be invalidated or not, so having it displayed along with
other slot information might not be a bad idea.
[1]: /messages/by-id/CAA4eK1Jyh4qgdnxzV4fYuk9GiXLb=Uz-6o19E2RfiN8MPmUu3A@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
At Tue, 23 Jun 2020 11:50:34 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
On Mon, Jun 22, 2020 at 6:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/06/22 21:01, Amit Kapila wrote:
On Mon, Jun 22, 2020 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Jun 20, 2020 at 03:53:54PM +0900, Michael Paquier wrote:
On Sat, Jun 20, 2020 at 09:45:52AM +0530, Amit Kapila wrote:
Isn't this information specific to checkpoints, so maybe better to
display in view pg_stat_bgwriter?
Not sure that's a good match. If we decide to expose that, a separate
function returning a LSN based on the segment number from
XLogGetLastRemovedSegno() sounds fine to me, like
pg_wal_last_recycled_lsn(). Perhaps somebody has a better name in
mind?
I was thinking on this one for the last couple of days, and came up
with the name pg_wal_oldest_lsn(), as per the attached, tracking the
oldest WAL location still available.
Thanks for the patch!
+ <literal>NULL</literal> if no WAL segments have been removed since
+ startup.
Isn't this confusing? I think that we should store the last removed
WAL segment to somewhere (e.g., pg_control) and restore it at
the startup, so that we can see the actual value even after the startup.
Or we should scan pg_wal directory and find the "minimal" WAL segment
and return its LSN.
I feel such a function is good to have but I am not sure if there is a
need to tie it with the removal of min_safe_lsn column.
We should expose the LSN calculated from
"the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"?
This indicates the minimum LSN of WAL files that are guaranteed to be
currently retained by wal_keep_segments and max_slot_wal_keep_size.
That is, if checkpoint occurs when restart_lsn of replication slot is
smaller than that minimum LSN, some required WAL files may be removed.
So DBAs can periodically monitor and compare restart_lsn and that minimum
LSN. If they see frequently that difference of those LSN is very small,
they can decide to increase wal_keep_segments or max_slot_wal_keep_size,
to prevent required WAL files from being removed. Thought?
+1. This sounds like a good and useful stat for users.
+1 for showing a number that does not involve lastRemovedSegNo. It is
like returning to the initial version of this patch. It showed a
number like ((the value suggested above) minus restart_lsn). The number is
different for each slot so it fits in the view.
The number is usable for the same purpose so I'm ok with it.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020-Jun-23, Kyotaro Horiguchi wrote:
At Tue, 23 Jun 2020 11:50:34 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
On Mon, Jun 22, 2020 at 6:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
We should expose the LSN calculated from
"the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"?
This indicates the minimum LSN of WAL files that are guaranteed to be
currently retained by wal_keep_segments and max_slot_wal_keep_size.
That is, if checkpoint occurs when restart_lsn of replication slot is
smaller than that minimum LSN, some required WAL files may be removed.
So DBAs can periodically monitor and compare restart_lsn and that minimum
LSN. If they see frequently that difference of those LSN is very small,
they can decide to increase wal_keep_segments or max_slot_wal_keep_size,
to prevent required WAL files from being removed. Thought?
+1. This sounds like a good and useful stat for users.
+1 for showing a number that does not involve lastRemovedSegNo. It is
like returning to the initial version of this patch. It showed a
number like ((the value suggested above) minus restart_lsn). The number is
different for each slot so it fits in the view.
The number is usable for the same purpose so I'm ok with it.
I think we should publish the value from wal_keep_segments separately
from max_slot_wal_keep_size. ISTM that the user might decide to change
or remove wal_keep_segments and be suddenly at risk of losing slots
because of overlooking that it was wal_keep_segments, not
max_slot_wal_keep_size, that was protecting them.
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/06/23 15:27, Amit Kapila wrote:
On Tue, Jun 23, 2020 at 7:47 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/06/23 10:10, Kyotaro Horiguchi wrote:
At Mon, 22 Jun 2020 22:02:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
I feel such a function is good to have but I am not sure if there is a
need to tie it with the removal of min_safe_lsn column.
We should expose the LSN calculated from
"the current WAL LSN - max(wal_keep_segments * 16MB,
max_slot_wal_keep_size)"?
This indicates the minimum LSN of WAL files that are guaranteed to be
currently retained by wal_keep_segments and max_slot_wal_keep_size.
That is, if checkpoint occurs when restart_lsn of replication slot is
smaller than that minimum LSN, some required WAL files may be removed.
So DBAs can periodically monitor and compare restart_lsn and that
minimum
LSN. If they see frequently that difference of those LSN is very
small,
they can decide to increase wal_keep_segments or
max_slot_wal_keep_size,
to prevent required WAL files from being removed. Thought?
I'm not sure about the consensus here about showing that number in the
view. It is similar to "remain" in the earlier versions of this patch
but a bit simpler. It would be usable in a similar way. I can live
with either numbers.
It's useless to display this value in each replication slot in the view.
I'm thinking to expose it as a function.
Having a separate function for this seems like a good idea but can we
consider displaying it in a view like pg_stat_replication_slots as we
are discussing in a nearby thread to have such a view for other things.
I think ultimately this information is required to check whether some
slot can be invalidated or not, so having it displayed along with
other slot information might not be a bad idea.
"the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"
is the same value for all the replication slots. But you think it's better
to display that same value for every slot in the view?
Or you're thinking to display the difference of that LSN value and
restart_lsn as Horiguchi-san suggested? That diff varies per replication slot,
so it seems ok to display it for every row.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/06/24 8:39, Alvaro Herrera wrote:
On 2020-Jun-23, Kyotaro Horiguchi wrote:
At Tue, 23 Jun 2020 11:50:34 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
On Mon, Jun 22, 2020 at 6:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
We should expose the LSN calculated from
"the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"?
This indicates the minimum LSN of WAL files that are guaranteed to be
currently retained by wal_keep_segments and max_slot_wal_keep_size.
That is, if checkpoint occurs when restart_lsn of replication slot is
smaller than that minimum LSN, some required WAL files may be removed.
So DBAs can periodically monitor and compare restart_lsn and that minimum
LSN. If they see frequently that difference of those LSN is very small,
they can decide to increase wal_keep_segments or max_slot_wal_keep_size,
to prevent required WAL files from being removed. Thought?
+1. This sounds like a good and useful stat for users.
+1 for showing a number that does not involve lastRemovedSegNo. It is
like returning to the initial version of this patch. It showed a
number like ((the value suggested above) minus restart_lsn). The number is
different for each slot so it fits in the view.
The number is usable for the same purpose so I'm ok with it.
I think we should publish the value from wal_keep_segments separately
from max_slot_wal_keep_size. ISTM that the user might decide to change
or remove wal_keep_segments and be suddenly at risk of losing slots
because of overlooking that it was wal_keep_segments, not
max_slot_wal_keep_size, that was protecting them.
You mean to have two functions that return
1. "current WAL LSN - wal_keep_segments * 16MB"
2. "current WAL LSN - max_slot_wal_keep_size"
Right?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Wed, Jun 24, 2020 at 2:37 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/06/23 15:27, Amit Kapila wrote:
Having a separate function for this seems like a good idea but can we
consider displaying it in a view like pg_stat_replication_slots as we
are discussing a nearby thread to have such a view for other things.
I think ultimately this information is required to check whether some
slot can be invalidated or not, so having it displayed along with
other slot information might not be a bad idea.
"the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"
is the same value for all the replication slots. But you think it's better
to display that same value for every slot in the view?
Or you're thinking to display the difference of that LSN value and
restart_lsn as Horiguchi-san suggested?
I see value in Horiguchi-San's proposal. IIUC, it will help
DBAs/users to know if any particular slot will get invalidated soon.
That diff varies each replication slot,
so it seems ok to display it for every rows.
Yes.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020-Jun-24, Fujii Masao wrote:
On 2020/06/24 8:39, Alvaro Herrera wrote:
I think we should publish the value from wal_keep_segments separately
from max_slot_wal_keep_size. ISTM that the user might decide to change
or remove wal_keep_segments and be suddenly at risk of losing slots
because of overlooking that it was wal_keep_segments, not
max_slot_wal_keep_size, that was protecting them.
You mean to have two functions that return
1. "current WAL LSN - wal_keep_segments * 16MB"
2. "current WAL LSN - max_slot_wal_keep_size"
Hmm, but all the values there are easily findable. What would be the
point in repeating it?
Maybe we should disregard this line of thinking and go back to
Horiguchi-san's original proposal, to wit use the "distance to
breakage", as also supported now by Amit Kapila[1] (unless I
misunderstand him).
[1]: /messages/by-id/CAA4eK1L2oJ7T1cESdc5w4J9L3Q_hhvWqTigdAXKfnsJy4=v13w@mail.gmail.com
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jun 24, 2020 at 8:45 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Jun-24, Fujii Masao wrote:
On 2020/06/24 8:39, Alvaro Herrera wrote:
I think we should publish the value from wal_keep_segments separately
from max_slot_wal_keep_size. ISTM that the user might decide to change
or remove wal_keep_segments and be suddenly at risk of losing slots
because of overlooking that it was wal_keep_segments, not
max_slot_wal_keep_size, that was protecting them.
You mean to have two functions that return
1. "current WAL LSN - wal_keep_segments * 16MB"
2. "current WAL LSN - max_slot_wal_keep_size"
Hmm, but all the values there are easily findable. What would be the
point in repeating it?
Maybe we should disregard this line of thinking and go back to
Horiguchi-san's original proposal, to wit use the "distance to
breakage", as also supported now by Amit Kapila[1] (unless I
misunderstand him).
+1. I also think let's drop the idea of exposing a function for this
value and revert the min_safe_lsn part of the work as proposed by
Michael above [1] excluding the function pg_wal_oldest_lsn() in that
patch. Then, we can expose this as a new stat for PG14. I feel it
would be better to display this stat in a new view (something like
pg_stat_replication_slots) as discussed in another thread [2]. Does
that make sense?
[1]: /messages/by-id/20200622054950.GC50978@paquier.xyz
[2]: /messages/by-id/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020-Jun-25, Amit Kapila wrote:
+1. I also think let's drop the idea of exposing a function for this
value and revert the min_safe_lsn part of the work as proposed by
Michael above [1] excluding the function pg_wal_oldest_lsn() in that
patch. Then, we can expose this as a new stat for PG14. I feel it
would be better to display this stat in a new view (something like
pg_stat_replication_slots) as discussed in another thread [2]. Does
that make sense?
I don't understand the proposal. Michael posted a patch that adds
pg_wal_oldest_lsn(), and you say we should apply the patch except the
part that adds that function -- so what part would be applying?
If the proposal is to apply just the hunk in pg_get_replication_slots
that removes min_safe_lsn, and do nothing else in pg13, then I don't like
it. The feature exposes a way to monitor slots w.r.t. the maximum slot
size; I'm okay if you prefer to express that in a different way, but I
don't like the idea of shipping pg13 without any way to monitor it.
As reported by Masao-san, the current min_safe_lsn has a definitional
problem when used with pg_walfile_name(), but we've established that
that's because pg_walfile_name() has a special-case definition, not
because min_safe_lsn itself is bogus. If we're looking for a minimal
change that can fix this problem, let's increment one byte, which should
fix that issue, no?
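The off-by-one Masao-san reported comes from pg_walfile_name() treating its argument as the end position of a record: internally it maps the LSN with (lsn - 1) / segment size (XLByteToPrevSeg) rather than lsn / segment size. A simplified sketch of why an LSN sitting exactly on a segment boundary names the previous, already-removed segment, and why "boundary byte + 1" fixes it (segment numbers are illustrative; real WAL file names also encode the timeline):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024

def seg_containing(lsn):
    # XLByteToSeg: the segment that contains the byte at this LSN
    return lsn // WAL_SEG_SIZE

def walfile_name_segno(lsn):
    # pg_walfile_name() treats the LSN as the end of a record
    # (XLByteToPrevSeg), so a boundary LSN maps to the segment before it.
    return (lsn - 1) // WAL_SEG_SIZE

boundary = 3 * WAL_SEG_SIZE            # first byte of segment 3
assert seg_containing(boundary) == 3
assert walfile_name_segno(boundary) == 2       # names the removed segment
assert walfile_name_segno(boundary + 1) == 3   # "boundary byte + 1" fix
```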
I also see that some people complain that all slots return the same
value and therefore this column is redundant. To that argument I say
that it's not unreasonable that we'll add a slot-specific size limit;
and if we do, we'll be happy we had slot-specific min safe LSN; see e.g.
/messages/by-id/20170301160610.wc7ez3vihmialntd@alap3.anarazel.de
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote:
I don't understand the proposal. Michael posted a patch that adds
pg_wal_oldest_lsn(), and you say we should apply the patch except the
part that adds that function -- so what part would be applying?
I have sent last week a patch about only the removal of min_safe_lsn:
/messages/by-id/20200619121552.GH453547@paquier.xyz
So this applies to this part.
If the proposal is to apply just the hunk in pg_get_replication_slots
that removes min_safe_lsn, and do nothing else in pg13, then I don't like
it. The feature exposes a way to monitor slots w.r.t. the maximum slot
size; I'm okay if you prefer to express that in a different way, but I
don't like the idea of shipping pg13 without any way to monitor it.
From what I can see, it seems to me that we have a lot of views on how
to tackle the matter. That suggests that we are not really happy
with the current state of things, which is usually a sign that we may want
to redesign it, going back to this issue for v14.
My 2c.
--
Michael
On 2020-Jun-26, Michael Paquier wrote:
On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote:
I don't understand the proposal. Michael posted a patch that adds
pg_wal_oldest_lsn(), and you say we should apply the patch except the
part that adds that function -- so what part would be applying?
I have sent last week a patch about only the removal of min_safe_lsn:
/messages/by-id/20200619121552.GH453547@paquier.xyz
So this applies to this part.
Well, I oppose that because it leaves us with no way to monitor slot
limits. In his opening email, Masao-san proposed to simply change the
value by adding 1. How you go from adding 1 to a column to removing
the column completely with no recourse, is beyond me.
Let me summarize the situation and possible ways forward as I see them.
If I'm mistaken, please correct me.
Problems:
i) pg_replication_slots.min_safe_lsn has a weird definition in that all
replication slots show the same value
ii) min_safe_lsn cannot be used with pg_walfile_name, because it returns
the name of the previous segment.
Proposed solutions:
a) Do nothing -- keep the min_safe_lsn column as is. Warn users that
pg_walfile_name should not be used with this column.
b) Redefine min_safe_lsn to be lsn+1, so that pg_walfile_name can be used
and return a useful value.
c) Remove min_safe_lsn; add functions that expose the same value
d) Remove min_safe_lsn; add a new view that exposes the same value and
possibly others
e) Replace min_safe_lsn with a "distance" column, which reports
restart_lsn - oldest valid LSN
(Note that you no longer have an LSN in this scenario, so you can't
call pg_walfile_name.)
The original patch implemented (e); it was changed to its current
definition because of this [1] comment. My proposal is to put it back.
[1]: /messages/by-id/20171106132050.6apzynxrqrzghb4r@alap3.anarazel.de
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 26, 2020 at 4:54 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Jun-26, Michael Paquier wrote:
On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote:
I don't understand the proposal. Michael posted a patch that adds
pg_wal_oldest_lsn(), and you say we should apply the patch except the
part that adds that function -- so what part would be applying?
I have sent last week a patch about only the removal of min_safe_lsn:
/messages/by-id/20200619121552.GH453547@paquier.xyz
So this applies to this part.
Well, I oppose that because it leaves us with no way to monitor slot
limits. In his opening email, Masao-san proposed to simply change the
value by adding 1. How you go from adding 1 to a column to removing
the column completely with no recourse, is beyond me.
Let me summarize the situation and possible ways forward as I see them.
If I'm mistaken, please correct me.
Problems:
i) pg_replication_slots.min_safe_lsn has a weird definition in that all
replication slots show the same value
It is also not clear how the user can make use of that value?
ii) min_safe_lsn cannot be used with pg_walfile_name, because it returns
the name of the previous segment.
Proposed solutions:
a) Do nothing -- keep the min_safe_lsn column as is. Warn users that
pg_walfile_name should not be used with this column.
b) Redefine min_safe_lsn to be lsn+1, so that pg_walfile_name can be used
and return a useful value.
c) Remove min_safe_lsn; add functions that expose the same value
d) Remove min_safe_lsn; add a new view that exposes the same value and
possibly others
e) Replace min_safe_lsn with a "distance" column, which reports
restart_lsn - oldest valid LSN
(Note that you no longer have an LSN in this scenario, so you can't
call pg_walfile_name.)
Can we consider an option to "Remove min_safe_lsn; document how a user
can monitor the distance"? We have a function to get current WAL
insert location and other things required are available either via
view or as guc variable values. The reason I am thinking of this
option is that it might be better to get some more feedback on what is
the most appropriate value to display. However, I am okay if we can
reach a consensus on one of the above options.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020/06/26 13:45, Amit Kapila wrote:
On Fri, Jun 26, 2020 at 4:54 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Jun-26, Michael Paquier wrote:
On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote:
I don't understand the proposal. Michael posted a patch that adds
pg_wal_oldest_lsn(), and you say we should apply the patch except the
part that adds that function -- so what part would be applying?
I have sent last week a patch about only the removal of min_safe_lsn:
/messages/by-id/20200619121552.GH453547@paquier.xyz
So this applies to this part.
Well, I oppose that because it leaves us with no way to monitor slot
limits. In his opening email, Masao-san proposed to simply change the
value by adding 1. How you go from adding 1 to a column to removing
the column completely with no recourse, is beyond me.
Let me summarize the situation and possible ways forward as I see them.
If I'm mistaken, please correct me.
Problems:
i) pg_replication_slots.min_safe_lsn has a weird definition in that all
replication slots show the same value
It is also not clear how the user can make use of that value?
ii) min_safe_lsn cannot be used with pg_walfile_name, because it returns
the name of the previous segment.
Proposed solutions:
a) Do nothing -- keep the min_safe_lsn column as is. Warn users that
pg_walfile_name should not be used with this column.
b) Redefine min_safe_lsn to be lsn+1, so that pg_walfile_name can be used
and return a useful value.
c) Remove min_safe_lsn; add functions that expose the same value
d) Remove min_safe_lsn; add a new view that exposes the same value and
possibly others
e) Replace min_safe_lsn with a "distance" column, which reports
restart_lsn - oldest valid LSN
(Note that you no longer have an LSN in this scenario, so you can't
call pg_walfile_name.)
I like (e).
Can we consider an option to "Remove min_safe_lsn; document how a user
can monitor the distance"? We have a function to get current WAL
insert location and other things required are available either via
view or as guc variable values. The reason I am thinking of this
option is that it might be better to get some more feedback on what is
the most appropriate value to display. However, I am okay if we can
reach a consensus on one of the above options.
Yes, that's an idea. But it might not be easy to calculate that distance
manually by subtracting max_slot_wal_keep_size from the current LSN.
Because we've not supported -(pg_lsn, numeric) operator yet. I'm
proposing that operator, but it's for v14.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/06/30 17:07, Fujii Masao wrote:
On 2020/06/26 13:45, Amit Kapila wrote:
On Fri, Jun 26, 2020 at 4:54 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Jun-26, Michael Paquier wrote:
On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote:
I don't understand the proposal. Michael posted a patch that adds
pg_wal_oldest_lsn(), and you say we should apply the patch except the
part that adds that function -- so what part would be applying?
I have sent last week a patch about only the removal of min_safe_lsn:
/messages/by-id/20200619121552.GH453547@paquier.xyz
So this applies to this part.
Well, I oppose that because it leaves us with no way to monitor slot
limits. In his opening email, Masao-san proposed to simply change the
value by adding 1. How you go from adding 1 to a column to removing
the column completely with no recourse, is beyond me.
Let me summarize the situation and possible ways forward as I see them.
If I'm mistaken, please correct me.
Problems:
i) pg_replication_slots.min_safe_lsn has a weird definition in that all
replication slots show the same value
It is also not clear how the user can make use of that value?
ii) min_safe_lsn cannot be used with pg_walfile_name, because it returns
the name of the previous segment.
Proposed solutions:
a) Do nothing -- keep the min_safe_lsn column as is. Warn users that
pg_walfile_name should not be used with this column.
b) Redefine min_safe_lsn to be lsn+1, so that pg_walfile_name can be used
and return a useful value.
c) Remove min_safe_lsn; add functions that expose the same value
d) Remove min_safe_lsn; add a new view that exposes the same value and
possibly others
e) Replace min_safe_lsn with a "distance" column, which reports
restart_lsn - oldest valid LSN
(Note that you no longer have an LSN in this scenario, so you can't
call pg_walfile_name.)
I like (e).
Can we consider an option to "Remove min_safe_lsn; document how a user
can monitor the distance"? We have a function to get current WAL
insert location and other things required are available either via
view or as guc variable values. The reason I am thinking of this
option is that it might be better to get some more feedback on what is
the most appropriate value to display. However, I am okay if we can
reach a consensus on one of the above options.
Yes, that's an idea. But it might not be easy to calculate that distance
manually by subtracting max_slot_wal_keep_size from the current LSN.
Because we've not supported -(pg_lsn, numeric) operator yet. I'm
proposing that operator, but it's for v14.
Sorry this is not true. That distance can be calculated without those operators.
For example,
SELECT restart_lsn - pg_current_wal_lsn() + (SELECT setting::numeric * 1024 * 1024 FROM pg_settings WHERE name = 'max_slot_wal_keep_size') distance FROM pg_replication_slots;
If the calculated distance is a small or negative value, we may lose
some required WAL files. In that case it's worth considering an
increase of max_slot_wal_keep_size.
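The same arithmetic can be sketched outside SQL. A rough model in Python (illustrative only, not server code; the function name is made up) of the headroom that query reports:

```python
MB = 1024 * 1024

def approx_headroom(restart_lsn, current_lsn, max_slot_wal_keep_size_mb):
    """Approximate bytes the server can still write before the slot's
    restart_lsn falls behind the max_slot_wal_keep_size horizon.
    Mirrors: restart_lsn - pg_current_wal_lsn() + max_slot_wal_keep_size."""
    return restart_lsn - current_lsn + max_slot_wal_keep_size_mb * MB

# A slot lagging 100 MB behind with a 1 GB limit still has 924 MB of headroom;
# a small or negative result means required WAL may already be gone.
print(approx_headroom(900 * MB, 1000 * MB, 1024))
```

Note this is only accurate to within a WAL segment, as pointed out later in the thread, and it ignores wal_keep_segments.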
I still think it's better and more helpful to display something like
that distance in pg_replication_slots rather than making each user
calculate it...
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020-Jun-30, Fujii Masao wrote:
Sorry this is not true. That distance can be calculated without those operators.
For example,
SELECT restart_lsn - pg_current_wal_lsn() + (SELECT setting::numeric * 1024 * 1024 FROM pg_settings WHERE name = 'max_slot_wal_keep_size') distance FROM pg_replication_slots;
If the calculated distance is a small or negative value, we may lose
some required WAL files. In that case it's worth considering an
increase of max_slot_wal_keep_size.
... OK, but you're forgetting wal_keep_segments.
I still think it's better and more helpful to display something like
that distance in pg_replication_slots rather than making each user
calculate it...
Agreed.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Tue, 30 Jun 2020 23:23:30 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
Can we consider an option to "Remove min_safe_lsn; document how a user
can monitor the distance"? We have a function to get current WAL
insert location and other things required are available either via
view or as guc variable values. The reason I am thinking of this
option is that it might be better to get some more feedback on what is
the most appropriate value to display. However, I am okay if we can
reach a consensus on one of the above options.
Yes, that's an idea. But it might not be easy to calculate that
distance
manually by subtracting max_slot_wal_keep_size from the current LSN.
Because we've not supported -(pg_lsn, numeric) operator yet. I'm
proposing that operator, but it's for v14.
Sorry this is not true. That distance can be calculated without those
operators.
For example,
SELECT restart_lsn - pg_current_wal_lsn() + (SELECT setting::numeric *
1024 * 1024 FROM pg_settings WHERE name = 'max_slot_wal_keep_size')
distance FROM pg_replication_slots;
It's an approximation with segment-size accuracy. The calculation
is not that simple because of the units involved. The formula for the
exact calculation (ignoring wal_keep_segments) is:
distance = (seg_floor(restart_lsn) +
seg_floor(max_slot_wal_keep_size) + 1) * wal_segment_size -
current_lsn
where seg_floor is floor() by wal_segment_size.
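As a quick sanity check of that formula, a small sketch (illustrative only, not server code; the default 16 MB segment size is assumed):

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default wal_segment_size (16 MB)

def seg_floor(pos, seg_size=WAL_SEGMENT_SIZE):
    """floor() by wal_segment_size: the segment number containing pos."""
    return pos // seg_size

def exact_distance(restart_lsn, max_slot_wal_keep_size, current_lsn,
                   seg_size=WAL_SEGMENT_SIZE):
    """distance = (seg_floor(restart_lsn) + seg_floor(max_slot_wal_keep_size)
    + 1) * wal_segment_size - current_lsn  (wal_keep_segments ignored)."""
    return (seg_floor(restart_lsn, seg_size)
            + seg_floor(max_slot_wal_keep_size, seg_size) + 1) * seg_size \
        - current_lsn

# With restart_lsn at the start of segment 0 and a one-segment keep size,
# the slot survives until the write position crosses the end of segment 1.
print(exact_distance(0, WAL_SEGMENT_SIZE, 0))
```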
regards.
If the calculated distance is a small or negative value, we may lose
some required WAL files. In that case it's worth considering an
increase of max_slot_wal_keep_size.
I still think it's better and more helpful to display something like
that distance in pg_replication_slots rather than making each user
calculate it...
Agreed. The attached replaces min_safe_lsn with "distance".
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v1-0001-Replace-min_safe_lsn-with-distance-in-pg_replicat.patch (text/x-patch; charset=us-ascii)
From 7e078c9ff8e0594ce8f4e95c7db84ea0a31e9775 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Tue, 30 Jun 2020 21:09:19 +0900
Subject: [PATCH v1] Replace min_safe_lsn with "distance" in
pg_replication_slots
pg_replication_slot.min_safe_lsn, which shows the oldest LSN kept in
pg_wal, is doubtful in usability for monitoring. Change it to
distance, which shows how many bytes the server can advance before the
slot loses required segments.
---
src/backend/access/transam/xlog.c | 39 +++++++++++++++++++++++
src/backend/catalog/system_views.sql | 2 +-
src/backend/replication/slotfuncs.c | 19 +++++------
src/include/access/xlog.h | 1 +
src/include/catalog/pg_proc.dat | 4 +--
src/test/recovery/t/019_replslot_limit.pl | 20 ++++++------
src/test/regress/expected/rules.out | 4 +--
7 files changed, 62 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fd93bcfaeb..1f27639912 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9570,6 +9570,45 @@ GetWALAvailability(XLogRecPtr targetLSN)
return WALAVAIL_REMOVED;
}
+/*
+ * Calculate how many bytes we can advance from currptr until the targetLSN is
+ * removed.
+ *
+ * Returns 0 if the distance is invalid.
+ */
+uint64
+DistanceToWALHorizon(XLogRecPtr targetLSN, XLogRecPtr currptr)
+{
+ XLogSegNo targetSeg;
+ XLogSegNo keepSegs;
+ XLogSegNo failSeg;
+ XLogRecPtr horizon;
+
+ XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+ keepSegs = 0;
+
+ /* no limit if max_slot_wal_keep_size is invalid */
+ if (max_slot_wal_keep_size_mb < 0)
+ return 0;
+
+ /* How many segments slots can keep? */
+ keepSegs = ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
+
+ /* override by wal_keep_segments if needed */
+ if (wal_keep_segments > keepSegs)
+ keepSegs = wal_keep_segments;
+
+ /* calculate the LSN where targetLSN is lost when currpos reaches */
+ failSeg = targetSeg + keepSegs + 1;
+ XLogSegNoOffsetToRecPtr(failSeg, 0, wal_segment_size, horizon);
+
+ /* If currptr already beyond the horizon, return zero. */
+ if (currptr > horizon)
+ return 0;
+
+ /* return the distance from currptr to the horizon */
+ return horizon - currptr;
+}
/*
* Retreat *logSegNo to the last segment that we need to retain because of
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..b9847a9f92 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -879,7 +879,7 @@ CREATE VIEW pg_replication_slots AS
L.restart_lsn,
L.confirmed_flush_lsn,
L.wal_status,
- L.min_safe_lsn
+ L.distance
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 88033a79b2..532b3c5826 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -242,6 +242,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
Tuplestorestate *tupstore;
MemoryContext per_query_ctx;
MemoryContext oldcontext;
+ XLogRecPtr currlsn;
int slotno;
/* check to see if caller supports us returning a tuplestore */
@@ -274,6 +275,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
MemoryContextSwitchTo(oldcontext);
+ currlsn = GetXLogWriteRecPtr();
+
LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
for (slotno = 0; slotno < max_replication_slots; slotno++)
{
@@ -282,7 +285,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
Datum values[PG_GET_REPLICATION_SLOTS_COLS];
bool nulls[PG_GET_REPLICATION_SLOTS_COLS];
WALAvailability walstate;
- XLogSegNo last_removed_seg;
+ uint32 distance;
int i;
if (!slot->in_use)
@@ -398,16 +401,10 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
break;
}
- if (max_slot_wal_keep_size_mb >= 0 &&
- (walstate == WALAVAIL_RESERVED || walstate == WALAVAIL_EXTENDED) &&
- ((last_removed_seg = XLogGetLastRemovedSegno()) != 0))
- {
- XLogRecPtr min_safe_lsn;
-
- XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
- wal_segment_size, min_safe_lsn);
- values[i++] = Int64GetDatum(min_safe_lsn);
- }
+ distance =
+ DistanceToWALHorizon(slot_contents.data.restart_lsn, currlsn);
+ if (distance > 0)
+ values[i++] = Int64GetDatum(distance);
else
nulls[i++] = true;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 77ac4e785f..1ec448c5d5 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -329,6 +329,7 @@ extern void InitXLOGAccess(void);
extern void CreateCheckPoint(int flags);
extern bool CreateRestartPoint(int flags);
extern WALAvailability GetWALAvailability(XLogRecPtr targetLSN);
+extern uint64 DistanceToWALHorizon(XLogRecPtr targetLSN, XLogRecPtr currptr);
extern XLogRecPtr CalculateMaxmumSafeLSN(void);
extern void XLogPutNextOid(Oid nextOid);
extern XLogRecPtr XLogRestorePoint(const char *rpName);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 61f2c2f5b4..199fd994bd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10063,9 +10063,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,pg_lsn}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,min_safe_lsn}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,distance}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7d22ae5720..1c76d2d9e9 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -28,7 +28,7 @@ $node_master->safe_psql('postgres',
# The slot state and remain should be null before the first connection
my $result = $node_master->safe_psql('postgres',
- "SELECT restart_lsn IS NULL, wal_status is NULL, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT restart_lsn IS NULL, wal_status is NULL, distance is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "t|t|t", 'check the state of non-reserved slot is "unknown"');
@@ -52,9 +52,9 @@ $node_master->wait_for_catchup($node_standby, 'replay', $start_lsn);
# Stop standby
$node_standby->stop;
-# Preparation done, the slot is the state "normal" now
+# Preparation done, the slot is the state "reserved" now
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status, distance is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "reserved|t", 'check the catching-up state');
@@ -64,7 +64,7 @@ $node_master->safe_psql('postgres', "CHECKPOINT;");
# The slot is always "safe" when fitting max_wal_size
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status, distance is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "reserved|t",
'check that it is safe if WAL fits in max_wal_size');
@@ -74,7 +74,7 @@ $node_master->safe_psql('postgres', "CHECKPOINT;");
# The slot is always "safe" when max_slot_wal_keep_size is not set
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status, distance is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "reserved|t", 'check that slot is working');
@@ -94,9 +94,7 @@ max_slot_wal_keep_size = ${max_slot_wal_keep_size_mb}MB
));
$node_master->reload;
-# The slot is in safe state. The distance from the min_safe_lsn should
-# be as almost (max_slot_wal_keep_size - 1) times large as the segment
-# size
+# The slot is in safe state.
$result = $node_master->safe_psql('postgres',
"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
@@ -110,7 +108,7 @@ $node_master->safe_psql('postgres', "CHECKPOINT;");
$result = $node_master->safe_psql('postgres',
"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
is($result, "reserved",
- 'check that min_safe_lsn gets close to the current LSN');
+ 'check that distance gets close to the current LSN');
# The standby can reconnect to master
$node_standby->start;
@@ -154,7 +152,7 @@ advance_wal($node_master, 1);
# Slot gets into 'unreserved' state
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status, distance is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "unreserved|t",
'check that the slot state changes to "unreserved"');
@@ -186,7 +184,7 @@ ok( find_in_log(
# This slot should be broken
$result = $node_master->safe_psql('postgres',
- "SELECT slot_name, active, restart_lsn IS NULL, wal_status, min_safe_lsn FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT slot_name, active, restart_lsn IS NULL, wal_status, distance FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "rep1|f|t|lost|",
'check that the slot became inactive and the state "lost" persists');
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b813e32215..392eab12dd 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1464,8 +1464,8 @@ pg_replication_slots| SELECT l.slot_name,
l.restart_lsn,
l.confirmed_flush_lsn,
l.wal_status,
- l.min_safe_lsn
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, min_safe_lsn)
+ l.distance
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, distance)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
--
2.18.4
On Tue, Jun 30, 2020 at 7:53 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/06/30 17:07, Fujii Masao wrote:
On 2020/06/26 13:45, Amit Kapila wrote:
Can we consider an option to "Remove min_safe_lsn; document how a user
can monitor the distance"? We have a function to get current WAL
insert location and other things required are available either via
view or as guc variable values. The reason I am thinking of this
option is that it might be better to get some more feedback on what is
the most appropriate value to display. However, I am okay if we can
reach a consensus on one of the above options.
Yes, that's an idea. But it might not be easy to calculate that distance
manually by subtracting max_slot_wal_keep_size from the current LSN.
Because we've not supported -(pg_lsn, numeric) operator yet. I'm
proposing that operator, but it's for v14.
Sorry this is not true. That distance can be calculated without those operators.
For example,
SELECT restart_lsn - pg_current_wal_lsn() + (SELECT setting::numeric * 1024 * 1024 FROM pg_settings WHERE name = 'max_slot_wal_keep_size') distance FROM pg_replication_slots;
If the calculated distance is a small or negative value, we may lose
some required WAL files. In that case it's worth considering an
increase of max_slot_wal_keep_size.
I still think it's better and more helpful to display something like
that distance in pg_replication_slots rather than making each user
calculate it...
Okay, but do we think it is better to display this in
pg_replication_slots or some new view like pg_stat_*_slots as being
discussed in [1]? It should not happen that we later decide to move
this or similar stats to that view.
[1]: /messages/by-id/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020-Jul-01, Amit Kapila wrote:
Okay, but do we think it is better to display this in
pg_replication_slots or some new view like pg_stat_*_slots as being
discussed in [1]? It should not happen that we later decide to move
this or similar stats to that view.
It seems that the main motivation for having some counters in another
view is the ability to reset them; and resetting this distance value
makes no sense, so I think it's better to have it in
pg_replication_slots.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jul 01, 2020 at 11:14:21AM -0400, Alvaro Herrera wrote:
On 2020-Jul-01, Amit Kapila wrote:
Okay, but do we think it is better to display this in
pg_replication_slots or some new view like pg_stat_*_slots as being
discussed in [1]? It should not happen that we later decide to move
this or similar stats to that view.
It seems that the main motivation for having some counters in another
view is the ability to reset them; and resetting this distance value
makes no sense, so I think it's better to have it in
pg_replication_slots.
pg_replication_slots would make sense to me than a stat view for a
distance column. Now, I have to admit that I am worried when seeing
design discussions on this thread for 13 after beta2 has been shipped,
so my vote would still be to remove for now the column in 13, document
an equivalent query to do this work (I actually just do that in a
bgworker monitoring repslot bloat now in some stuff I maintain
internally), and resend a patch in v14 to give the occasion for this
feature to go through one extra round of review. My 2c.
--
Michael
On 2020-Jul-02, michael@paquier.xyz wrote:
pg_replication_slots would make sense to me than a stat view for a
distance column. Now, I have to admit that I am worried when seeing
design discussions on this thread for 13 after beta2 has been shipped,
We already had this discussion and one of the things we said before
beta2 was "we're still in beta2, there's time". I see no need to panic.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jul 1, 2020 at 8:44 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Jul-01, Amit Kapila wrote:
Okay, but do we think it is better to display this in
pg_replication_slots or some new view like pg_stat_*_slots as being
discussed in [1]? It should not happen that we later decide to move
this or similar stats to that view.
It seems that the main motivation for having some counters in another
view is the ability to reset them; and resetting this distance value
makes no sense, so I think it's better to have it in
pg_replication_slots.
Fair enough. It would be good if we can come up with something better
than 'distance' for this column. Some ideas: safe_wal_limit,
safe_wal_size?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020-Jul-04, Amit Kapila wrote:
Fair enough. It would be good if we can come up with something better
than 'distance' for this column. Some ideas safe_wal_limit,
safe_wal_size?
Hmm, I like safe_wal_size.
I've been looking at this intermittently since late last week and I
intend to get it done in the next couple of days.
Thanks!
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Jul-06, Alvaro Herrera wrote:
Hmm, I like safe_wal_size.
I've been looking at this intermittently since late last week and I
intend to get it done in the next couple of days.
I propose the attached. This is pretty much what was proposed by
Kyotaro, but I made a couple of changes. Most notably, I moved the
calculation to the view code itself rather than creating a function in
xlog.c, mostly because it seemed to me that the new function was
creating an abstraction leakage without adding any value; also, if we
add per-slot size limits later, it would get worse.
The other change was to report negative values when the slot becomes
unreserved, rather than zero. It shows how much beyond safety your
slots are getting, so it seems useful. Clamping at zero seems to serve
no purpose.
I also made it report null immediately when slots are in state lost.
But beware of slots that appear lost but fall in the unreserved category
because they advanced before checkpointer signalled them. (This case
requires a debugger to hit ...)
One thing that got my attention while going over this is that the error
message we throw when making a slot invalid is not very helpful; it
doesn't say what the current insertion LSN was at that point. Maybe we
should add that? (As a separate patch, of course.)
Any more thoughts? If not, I'll get this pushed tomorrow finally.
Thanks
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-Morph-pg_replication_slots.min_safe_lsn-to-safe_wal_.patch (text/x-diff; charset=iso-8859-1)
From bd65ed10b25429fea8776bf9bc38c82a3544a3fa Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Fri, 3 Jul 2020 17:25:00 -0400
Subject: [PATCH] Morph pg_replication_slots.min_safe_lsn to safe_wal_size
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The previous definition of the column was almost universally disliked, so
provide this updated definition which is more useful for monitoring
purposes: a large positive value is good, while zero or a negative value
means danger. This is very easy to monitor.
FIXME -- catversion bump required
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: XXX
Discussion: https://postgr.es/m/9ddfbf8c-2f67-904d-44ed-cf8bc5916228@oss.nttdata.com
---
doc/src/sgml/catalogs.sgml | 7 ++--
src/backend/access/transam/xlog.c | 3 +-
src/backend/catalog/system_views.sql | 2 +-
src/backend/replication/slotfuncs.c | 42 ++++++++++++++++-------
src/include/catalog/pg_proc.dat | 4 +--
src/test/recovery/t/019_replslot_limit.pl | 22 ++++++------
src/test/regress/expected/rules.out | 4 +--
7 files changed, 51 insertions(+), 33 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 003d278370..361793b337 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11275,10 +11275,13 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
<row>
<entry role="catalog_table_entry"><para role="column_definition">
- <structfield>min_safe_lsn</structfield> <type>pg_lsn</type>
+ <structfield>safe_wal_size</structfield> <type>int8</type>
</para>
<para>
- The minimum LSN currently available for walsenders.
+ The number of bytes that can be written to WAL such that this slot
+ is not in danger of getting in state "lost". It is NULL for lost
+ slots, as well as if <varname>max_slot_wal_keep_size</varname>
+ is <literal>-1</literal>.
</para></entry>
</row>
</tbody>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fd93bcfaeb..2f6d67592a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9513,8 +9513,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
XLogSegNo targetSeg; /* segid of targetLSN */
XLogSegNo oldestSeg; /* actual oldest segid */
XLogSegNo oldestSegMaxWalSize; /* oldest segid kept by max_wal_size */
- XLogSegNo oldestSlotSeg = InvalidXLogRecPtr; /* oldest segid kept by
- * slot */
+ XLogSegNo oldestSlotSeg; /* oldest segid kept by slot */
uint64 keepSegs;
/*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..73676d04cf 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -879,7 +879,7 @@ CREATE VIEW pg_replication_slots AS
L.restart_lsn,
L.confirmed_flush_lsn,
L.wal_status,
- L.min_safe_lsn
+ L.safe_wal_size
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 88033a79b2..0ddf20e858 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -242,6 +242,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
Tuplestorestate *tupstore;
MemoryContext per_query_ctx;
MemoryContext oldcontext;
+ XLogRecPtr currlsn;
int slotno;
/* check to see if caller supports us returning a tuplestore */
@@ -274,6 +275,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
MemoryContextSwitchTo(oldcontext);
+ currlsn = GetXLogWriteRecPtr();
+
LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
for (slotno = 0; slotno < max_replication_slots; slotno++)
{
@@ -282,7 +285,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
Datum values[PG_GET_REPLICATION_SLOTS_COLS];
bool nulls[PG_GET_REPLICATION_SLOTS_COLS];
WALAvailability walstate;
- XLogSegNo last_removed_seg;
int i;
if (!slot->in_use)
@@ -387,10 +389,12 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
SpinLockAcquire(&slot->mutex);
pid = slot->active_pid;
+ slot_contents.data.restart_lsn = slot->data.restart_lsn;
SpinLockRelease(&slot->mutex);
if (pid != 0)
{
values[i++] = CStringGetTextDatum("unreserved");
+ walstate = WALAVAIL_UNRESERVED;
break;
}
}
@@ -398,18 +402,32 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
break;
}
- if (max_slot_wal_keep_size_mb >= 0 &&
- (walstate == WALAVAIL_RESERVED || walstate == WALAVAIL_EXTENDED) &&
- ((last_removed_seg = XLogGetLastRemovedSegno()) != 0))
- {
- XLogRecPtr min_safe_lsn;
-
- XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
- wal_segment_size, min_safe_lsn);
- values[i++] = Int64GetDatum(min_safe_lsn);
- }
- else
+ /*
+ * safe_wal_size is only computed for slots that have not been lost,
+ * and only if there's a configured maximum size.
+ */
+ if (walstate == WALAVAIL_REMOVED || max_slot_wal_keep_size_mb < 0)
nulls[i++] = true;
+ else
+ {
+ XLogSegNo targetSeg;
+ XLogSegNo keepSegs;
+ XLogSegNo failSeg;
+ XLogRecPtr failLSN;
+
+ XLByteToSeg(slot_contents.data.restart_lsn, targetSeg, wal_segment_size);
+
+ /* determine how many segments can be kept by slots ... */
+ keepSegs = max_slot_wal_keep_size_mb / (wal_segment_size / (1024 * 1024));
+ /* ... and override by wal_keep_segments as needed */
+ keepSegs = Max(keepSegs, wal_keep_segments);
+
+ /* if currpos reaches failLSN, we lose our segment */
+ failSeg = targetSeg + keepSegs + 1;
+ XLogSegNoOffsetToRecPtr(failSeg, 0, wal_segment_size, failLSN);
+
+ values[i++] = Int64GetDatum(failLSN - currlsn);
+ }
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 38295aca48..3c89c53aa2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10075,9 +10075,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,pg_lsn}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,min_safe_lsn}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7d22ae5720..af656c6902 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -28,7 +28,7 @@ $node_master->safe_psql('postgres',
# The slot state and remain should be null before the first connection
my $result = $node_master->safe_psql('postgres',
- "SELECT restart_lsn IS NULL, wal_status is NULL, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT restart_lsn IS NULL, wal_status is NULL, safe_wal_size is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "t|t|t", 'check the state of non-reserved slot is "unknown"');
@@ -52,9 +52,9 @@ $node_master->wait_for_catchup($node_standby, 'replay', $start_lsn);
# Stop standby
$node_standby->stop;
-# Preparation done, the slot is the state "normal" now
+# Preparation done, the slot is the state "reserved" now
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status, safe_wal_size IS NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "reserved|t", 'check the catching-up state');
@@ -64,7 +64,7 @@ $node_master->safe_psql('postgres', "CHECKPOINT;");
# The slot is always "safe" when fitting max_wal_size
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status, safe_wal_size IS NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "reserved|t",
'check that it is safe if WAL fits in max_wal_size');
@@ -74,7 +74,7 @@ $node_master->safe_psql('postgres', "CHECKPOINT;");
# The slot is always "safe" when max_slot_wal_keep_size is not set
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status, safe_wal_size IS NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "reserved|t", 'check that slot is working');
@@ -94,9 +94,7 @@ max_slot_wal_keep_size = ${max_slot_wal_keep_size_mb}MB
));
$node_master->reload;
-# The slot is in safe state. The distance from the min_safe_lsn should
-# be as almost (max_slot_wal_keep_size - 1) times large as the segment
-# size
+# The slot is in safe state.
$result = $node_master->safe_psql('postgres',
"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
@@ -110,7 +108,7 @@ $node_master->safe_psql('postgres', "CHECKPOINT;");
$result = $node_master->safe_psql('postgres',
"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
is($result, "reserved",
- 'check that min_safe_lsn gets close to the current LSN');
+ 'check that safe_wal_size gets close to the current LSN');
# The standby can reconnect to master
$node_standby->start;
@@ -152,9 +150,9 @@ $node_master->safe_psql('postgres', "CHECKPOINT;");
# Advance WAL again without checkpoint; remain goes to 0.
advance_wal($node_master, 1);
-# Slot gets into 'unreserved' state
+# Slot gets into 'unreserved' state and safe_wal_size is negative
$result = $node_master->safe_psql('postgres',
- "SELECT wal_status, min_safe_lsn is NULL FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT wal_status, safe_wal_size <= 0 FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "unreserved|t",
'check that the slot state changes to "unreserved"');
@@ -186,7 +184,7 @@ ok( find_in_log(
# This slot should be broken
$result = $node_master->safe_psql('postgres',
- "SELECT slot_name, active, restart_lsn IS NULL, wal_status, min_safe_lsn FROM pg_replication_slots WHERE slot_name = 'rep1'"
+ "SELECT slot_name, active, restart_lsn IS NULL, wal_status, safe_wal_size FROM pg_replication_slots WHERE slot_name = 'rep1'"
);
is($result, "rep1|f|t|lost|",
'check that the slot became inactive and the state "lost" persists');
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b813e32215..93bb2159ca 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1464,8 +1464,8 @@ pg_replication_slots| SELECT l.slot_name,
l.restart_lsn,
l.confirmed_flush_lsn,
l.wal_status,
- l.min_safe_lsn
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, min_safe_lsn)
+ l.safe_wal_size
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
--
2.20.1
Thanks!
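For readers following the arithmetic, the safe_wal_size computation in the slotfuncs.c hunk above can be modeled in isolation. This is an illustrative sketch with the patch's variable names, not the actual PostgreSQL code; all positions and sizes are in bytes.

```c
#include <stdint.h>

/*
 * Standalone model of the safe_wal_size calculation from the patch.
 * Sketch only: names mirror the patch, but this is not slotfuncs.c.
 */
static int64_t
safe_wal_size(uint64_t restart_lsn, uint64_t currlsn,
              uint64_t wal_segment_size,
              uint64_t max_slot_wal_keep_size_mb, uint64_t wal_keep_segments)
{
    /* segment that contains the slot's restart_lsn */
    uint64_t targetSeg = restart_lsn / wal_segment_size;

    /* how many segments can be kept by slots ... */
    uint64_t keepSegs = max_slot_wal_keep_size_mb / (wal_segment_size / (1024 * 1024));
    /* ... overridden by wal_keep_segments as needed */
    if (keepSegs < wal_keep_segments)
        keepSegs = wal_keep_segments;

    /* once the write position reaches failLSN, the slot's segment is lost */
    uint64_t failSeg = targetSeg + keepSegs + 1;
    uint64_t failLSN = failSeg * wal_segment_size;

    /* goes negative once the slot is past safety, per the discussion */
    return (int64_t) (failLSN - currlsn);
}
```

With 16 MB segments, a restart_lsn in segment 1, max_slot_wal_keep_size = 64 MB and the write position in segment 2, this reports 4 segments' worth of headroom; advance the write position past failLSN and it goes negative instead of being clamped to zero.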
At Mon, 6 Jul 2020 20:54:36 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
On 2020-Jul-06, Alvaro Herrera wrote:
Hmm, I like safe_wal_size.
I agree to the name, too.
I've been looking at this intermittently since late last week and I
intend to get it done in the next couple of days.
I propose the attached. This is pretty much what was proposed by
Kyotaro, but I made a couple of changes. Most notably, I moved the
calculation to the view code itself rather than creating a function in
xlog.c, mostly because it seemed to me that the new function was
creating an abstraction leakage without adding any value; also, if we
add per-slot size limits later, it would get worse.
I'm not sure that detailed WAL segment calculation fits slotfuncs.c
but I don't object to the change. However if we do that:
+ /* determine how many segments can be kept by slots ... */
+ keepSegs = max_slot_wal_keep_size_mb / (wal_segment_size / (1024 * 1024));
Couldn't we move ConvertToXSegs from xlog.c to xlog_internals.h and
use it instead of the bare expression?
The other change was to report negative values when the slot becomes
unreserved, rather than zero. It shows how much beyond safety your
slots are getting, so it seems useful. Clamping at zero seems to serve
no purpose.
The reason for the clamping was the signedness of the values, or rather
integral promotion. However, I believe the calculation cannot go beyond
the range of a signed long, so the signedness conversion in the patch
looks fine.
I also made it report null immediately when slots are in state lost.
But beware of slots that appear lost but fall in the unreserved category
because they advanced before checkpointer signalled them. (This case
requires a debugger to hit ...)
Oh! Okay, that change seems right to me.
One thing that got my attention while going over this is that the error
message we throw when making a slot invalid is not very helpful; it
doesn't say what the current insertion LSN was at that point. Maybe we
should add that? (As a separate patch, of course.)
It sounds helpful to me. (I remember that I sometimes want to see
checkpoint LSNs in the server log.)
Any more thoughts? If not, I'll get this pushed tomorrow finally.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020-Jul-07, Kyotaro Horiguchi wrote:
At Mon, 6 Jul 2020 20:54:36 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
I propose the attached. This is pretty much what was proposed by
Kyotaro, but I made a couple of changes. Most notably, I moved the
calculation to the view code itself rather than creating a function in
xlog.c, mostly because it seemed to me that the new function was
creating an abstraction leakage without adding any value; also, if we
add per-slot size limits later, it would get worse.
I'm not sure that detailed WAL segment calculation fits slotfuncs.c
but I don't object to the change. However if we do that:
+ /* determine how many segments can be kept by slots ... */
+ keepSegs = max_slot_wal_keep_size_mb / (wal_segment_size / (1024 * 1024));
Couldn't we move ConvertToXSegs from xlog.c to xlog_internals.h and
use it instead of the bare expression?
I was of two minds about that, and the only reason I didn't do it is
that we'll need to give it a better name if we do it ... I'm open to
suggestions.
The other change was to report negative values when the slot becomes
unreserved, rather than zero. It shows how much beyond safety your
slots are getting, so it seems useful. Clamping at zero seems to serve
no purpose.
The reason for the clamping was the signedness of the values, or rather
integral promotion. However, I believe the calculation cannot go beyond
the range of a signed long, so the signedness conversion in the patch
looks fine.
Yeah, I think the negative values are useful to see. I think if you
ever get close to 2^62, you're in much more serious trouble anyway :-)
But I don't deny that the math there could be subject of overflow
issues. If you want to verify, please be my guest ...
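Taking up that invitation, the subtraction is easy to check standalone. Both LSNs are unsigned 64-bit quantities; the unsigned difference wraps when the slot has fallen behind, but reinterpreting it as int64 recovers the correct negative byte count, as long as neither value exceeds 2^63 (a sketch, not the server code):

```c
#include <stdint.h>

/*
 * Model of the safe_wal_size subtraction: failLSN and currlsn are both
 * unsigned 64-bit LSNs, and the view column is int8 (int64).  Sketch only.
 */
static int64_t
lsn_diff(uint64_t failLSN, uint64_t currlsn)
{
    /* wraps as uint64, then the cast yields the signed distance */
    return (int64_t) (failLSN - currlsn);
}
```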
One thing that got my attention while going over this is that the error
message we throw when making a slot invalid is not very helpful; it
doesn't say what the current insertion LSN was at that point. Maybe we
should add that? (As a separate patch, of couse.)It sounds helpful to me. (I remember that I sometime want to see
checkpoint LSNs in server log..)
Hmm, ... let's do that for pg14!
Thanks for looking,
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Jul-06, Alvaro Herrera wrote:
On 2020-Jul-07, Kyotaro Horiguchi wrote:
Couldn't we move ConvertToXSegs from xlog.c to xlog_internals.h and
use it instead of the bare expression?
I was of two minds about that, and the only reason I didn't do it is
that we'll need to give it a better name if we do it ... I'm open to
suggestions.
In absence of other suggestions I gave this the name XLogMBVarToSegs,
and redefined ConvertToXSegs to use that. Didn't touch callers in
xlog.c to avoid pointless churn. Pushed to both master and 13.
I hope this satisfies everyone ... Masao-san, thanks for reporting the
problem, and thanks Horiguchi-san for providing the fix. (Also thanks
to Amit and Michael for discussion.)
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
In absence of other suggestions I gave this the name XLogMBVarToSegs,
and redefined ConvertToXSegs to use that. Didn't touch callers in
xlog.c to avoid pointless churn. Pushed to both master and 13.
The buildfarm's sparc64 members seem unhappy with this.
regards, tom lane
On 2020-Jul-08, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
In absence of other suggestions I gave this the name XLogMBVarToSegs,
and redefined ConvertToXSegs to use that. Didn't touch callers in
xlog.c to avoid pointless churn. Pushed to both master and 13.
The buildfarm's sparc64 members seem unhappy with this.
Hmm. Some of them are, yeah, but it's not universal. For example
mussurana and ibisbill are not showing failures.
Anyway the error is pretty strange: only GetWALAvailability is showing a
problem, but the size calculation in the view function def is returning
a negative number, as expected.
So looking at the code in GetWALAvailability, what happens is that
targetSeg >= oldestSlotSeg, but we expect the opposite. I'd bet for
targetSeg to be correct, since its input is just the slot LSN -- pretty
easy. But for oldestSlotSeg, we have KeepLogSeg involved.
No immediate ideas ...
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2020-Jul-08, Tom Lane wrote:
The buildfarm's sparc64 members seem unhappy with this.
Hmm. Some of them are, yeah, but it's not universal. For example
mussurana and ibisbill are not showing failures.
Ah, right, I was thinking they hadn't run since this commit, but they
have.
Anyway the error is pretty strange: only GetWALAvailability is showing a
problem, but the size calculation in the view function def is returning
a negative number, as expected.
We've previously noted what seem to be compiler optimization bugs on
both sparc32 and sparc64; the latest thread about that is
/messages/by-id/f28f842d-e82b-4e30-a81a-2a1f9fa4a8e1@www.fastmail.com
This is looking uncomfortably like the same thing. Tom, could you
experiment with different -O levels on those animals?
regards, tom lane
On 2020-Jul-08, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Anyway the error is pretty strange: only GetWALAvailability is showing a
problem, but the size calculation in the view function def is returning
a negative number, as expected.
We've previously noted what seem to be compiler optimization bugs on
both sparc32 and sparc64; the latest thread about that is
/messages/by-id/f28f842d-e82b-4e30-a81a-2a1f9fa4a8e1@www.fastmail.com
This is looking uncomfortably like the same thing.
Ouch. So 12 builds with -O0 but 13 does not? Did we do something to
sequence.c to work around this problem? I cannot find anything.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2020-Jul-08, Tom Lane wrote:
We've previously noted what seem to be compiler optimization bugs on
both sparc32 and sparc64; the latest thread about that is
/messages/by-id/f28f842d-e82b-4e30-a81a-2a1f9fa4a8e1@www.fastmail.com
This is looking uncomfortably like the same thing.
Ouch. So 12 builds with -O0 but 13 does not?
Unless Tom's changed the animal's config since that thread, yes.
Did we do something to
sequence.c to work around this problem? I cannot find anything.
We did not. If it's a compiler bug, and one as phase-of-the-moon-
dependent as this seems to be, I'd have zero confidence that any
specific source code change would fix it (barring someone confidently
explaining exactly what the compiler bug is, anyway). The best we
can do for now is hope that backing off the -O level avoids the bug.
regards, tom lane
On 2020-Jul-08, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Ouch. So 12 builds with -O0 but 13 does not?
Unless Tom's changed the animal's config since that thread, yes.
I verified the configs in branches 12 and 13 in one of the failing
animals, and yes that's still the case.
Did we do something to
sequence.c to work around this problem? I cannot find anything.
We did not. If it's a compiler bug, and one as phase-of-the-moon-
dependent as this seems to be, I'd have zero confidence that any
specific source code change would fix it (barring someone confidently
explaining exactly what the compiler bug is, anyway). The best we
can do for now is hope that backing off the -O level avoids the bug.
An easy workaround might be to add -O0 for that platform in that
directory's Makefile.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2020-Jul-08, Tom Lane wrote:
We did not. If it's a compiler bug, and one as phase-of-the-moon-
dependent as this seems to be, I'd have zero confidence that any
specific source code change would fix it (barring someone confidently
explaining exactly what the compiler bug is, anyway). The best we
can do for now is hope that backing off the -O level avoids the bug.
An easy workaround might be to add -O0 for that platform in that
directory's Makefile.
"Back off the -O level in one directory" seems about as misguided as
"back off the -O level in one branch", if you ask me. There's no
reason to suppose that the problem won't bite us somewhere else next
week.
The previous sparc32 bug that we'd made some effort to run to ground
is described here:
/messages/by-id/15142.1498165769@sss.pgh.pa.us
We really don't know what aspects of the source code trigger that.
I'm slightly suspicious that we might be seeing the same bug in the
sparc64 builds, but it's just a guess.
regards, tom lane
... or on the other hand, maybe these animals are just showing more
sensitivity than others to an actual code bug. skink is showing
valgrind failures in this very area, on both HEAD and v13:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2020-07-08%2021%3A13%3A02
==3166208== VALGRINDERROR-BEGIN
==3166208== Conditional jump or move depends on uninitialised value(s)
==3166208== at 0x28618D: KeepLogSeg (xlog.c:9627)
==3166208== by 0x296AC5: GetWALAvailability (xlog.c:9533)
==3166208== by 0x4FFECB: pg_get_replication_slots (slotfuncs.c:356)
==3166208== by 0x3C762F: ExecMakeTableFunctionResult (execSRF.c:234)
==3166208== by 0x3D9A9A: FunctionNext (nodeFunctionscan.c:95)
==3166208== by 0x3C81D6: ExecScanFetch (execScan.c:133)
==3166208== by 0x3C81D6: ExecScan (execScan.c:199)
==3166208== by 0x3D99A9: ExecFunctionScan (nodeFunctionscan.c:270)
==3166208== by 0x3C5072: ExecProcNodeFirst (execProcnode.c:450)
==3166208== by 0x3BD35E: ExecProcNode (executor.h:245)
==3166208== by 0x3BD35E: ExecutePlan (execMain.c:1646)
==3166208== by 0x3BE039: standard_ExecutorRun (execMain.c:364)
==3166208== by 0x3BE102: ExecutorRun (execMain.c:308)
==3166208== by 0x559669: PortalRunSelect (pquery.c:912)
==3166208== Uninitialised value was created by a stack allocation
==3166208== at 0x296A84: GetWALAvailability (xlog.c:9523)
==3166208==
==3166208== VALGRINDERROR-END
and even the most cursory look at the code confirms that there's a
real bug here. KeepLogSeg expects *logSegNo to be defined on entry,
but GetWALAvailability hasn't bothered to initialize oldestSlotSeg.
It is not clear to me which one is in the wrong; the comment for
KeepLogSeg isn't particularly clear on this.
regards, tom lane
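The contract at issue can be sketched in isolation: KeepLogSeg only ever lowers the segment number it is handed ("retreats" it), so a caller must seed *logSegNo with a valid upper bound before the call — passing uninitialized stack memory gives exactly the valgrind error above. This is a simplified model with illustrative names, not the actual xlog.c code:

```c
#include <stdint.h>

typedef uint64_t XLogSegNo;

/*
 * Simplified model of the KeepLogSeg contract: the function only moves
 * *logSegNo backward, so the caller is responsible for initializing it
 * to an upper bound (e.g. the segment currently being written).
 */
static void
keep_log_seg(XLogSegNo slot_seg, XLogSegNo *logSegNo)
{
    /* don't delete WAL segments newer than the calculated segment */
    if (slot_seg < *logSegNo)
        *logSegNo = slot_seg;
}
```

Seeded with the current write segment, the value only ever decreases toward the oldest segment a slot still needs — which is the shape of the fix that was eventually committed.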
On 2020-Jul-09, Tom Lane wrote:
and even the most cursory look at the code confirms that there's a
real bug here. KeepLogSeg expects *logSegNo to be defined on entry,
but GetWALAvailability hasn't bothered to initialize oldestSlotSeg.
It is not clear to me which one is in the wrong; the comment for
KeepLogSeg isn't particularly clear on this.
Oh, so I introduced the bug when I removed the initialization in this
fix. That one was using the wrong datatype, but evidently it achieved
the right effect. And KeepLogSeg is using the Invalid macro for the
wrong datatype, too.
I think we should define InvalidXLogSegNo to be ~((uint64)0) and add a
macro to test for that.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Jul-09, Alvaro Herrera wrote:
I think we should define InvalidXLogSegNo to be ~((uint64)0) and add a
macro to test for that.
That's overkill really. I just used zero. Running
contrib/test_decoding under valgrind, this now passes.
I think I'd rather do away with the compare to zero, and initialize to
something else in GetWALAvailability, though. What we're doing seems
unclean and unclear.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment: 0001-fix-uninitialized-value.patch (text/x-diff)
From 07546b71190b27433e673ff7d248e0f98215abb1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Fri, 10 Jul 2020 11:27:55 -0400
Subject: [PATCH] fix uninitialized value
---
src/backend/access/transam/xlog.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 28daf72a50..3d828cc56f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9530,6 +9530,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
* considering wal_keep_segments and max_slot_wal_keep_size
*/
XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+ oldestSlotSeg = 0;
KeepLogSeg(currpos, &oldestSlotSeg);
/*
@@ -9624,7 +9625,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
}
/* don't delete WAL segments newer than the calculated segment */
- if (XLogRecPtrIsInvalid(*logSegNo) || segno < *logSegNo)
+ if (*logSegNo == 0 || segno < *logSegNo)
*logSegNo = segno;
}
--
2.20.1
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2020-Jul-09, Alvaro Herrera wrote:
I think we should define InvalidXLogSegNo to be ~((uint64)0) and add a
macro to test for that.
That's overkill really. I just used zero. Running
contrib/test_decoding under valgrind, this now passes.
I think I'd rather do away with the compare to zero, and initialize to
something else in GetWALAvailability, though. What we're doing seems
unclean and unclear.
Is zero really not a valid segment number?
regards, tom lane
On 2020-Jul-11, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2020-Jul-09, Alvaro Herrera wrote:
I think we should define InvalidXLogSegNo to be ~((uint64)0) and add a
macro to test for that.
That's overkill really. I just used zero. Running
contrib/test_decoding under valgrind, this now passes.
I think I'd rather do away with the compare to zero, and initialize to
something else in GetWALAvailability, though. What we're doing seems
unclean and unclear.
Is zero really not a valid segment number?
No, but you cannot retreat from that ...
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
A much more sensible answer is to initialize the segno to the segment
currently being written, as in the attached.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment: v3-0001-Fix-uninitialized-value-in-segno-calculation.patch (text/x-diff)
From 699e3a25e0673353fcb10fa92577f7534e594227 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Mon, 13 Jul 2020 13:22:40 -0400
Subject: [PATCH v3] Fix uninitialized value in segno calculation
Remove previous hack in KeepLogSeg that added a case to deal with a
(badly represented) invalid segment number; to replace it, initialize
the segment number to be retreated to the segment currently being
written.
Per valgrind-running buildfarm member skink, and some sparc64 animals.
Discussion: https://postgr.es/m/1724648.1594230917@sss.pgh.pa.us
---
src/backend/access/transam/xlog.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 28daf72a50..0a97b1d37f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9523,13 +9523,13 @@ GetWALAvailability(XLogRecPtr targetLSN)
if (XLogRecPtrIsInvalid(targetLSN))
return WALAVAIL_INVALID_LSN;
- currpos = GetXLogWriteRecPtr();
-
/*
- * calculate the oldest segment currently reserved by all slots,
- * considering wal_keep_segments and max_slot_wal_keep_size
+ * Calculate the oldest segment currently reserved by all slots,
+ * considering wal_keep_segments and max_slot_wal_keep_size. Initialize
+ * oldestSlotSeg to the current segment.
*/
- XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+ currpos = GetXLogWriteRecPtr();
+ XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
KeepLogSeg(currpos, &oldestSlotSeg);
/*
@@ -9548,6 +9548,9 @@ GetWALAvailability(XLogRecPtr targetLSN)
else
oldestSegMaxWalSize = 1;
+ /* the segment we care about */
+ XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+
/*
* No point in returning reserved or extended status values if the
* targetSeg is known to be lost.
@@ -9624,7 +9627,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
}
/* don't delete WAL segments newer than the calculated segment */
- if (XLogRecPtrIsInvalid(*logSegNo) || segno < *logSegNo)
+ if (segno < *logSegNo)
*logSegNo = segno;
}
--
2.20.1
On 2020-Jul-13, Alvaro Herrera wrote:
A much more sensible answer is to initialize the segno to the segment
currently being written, as in the attached.
Ran the valgrind test locally and it passes. Pushed it now.
Thanks,
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Mon, 13 Jul 2020 13:52:12 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
On 2020-Jul-13, Alvaro Herrera wrote:
A much more sensible answer is to initialize the segno to the segment
currently being written, as in the attached.
Ran the valgrind test locally and it passes. Pushed it now.
- if (XLogRecPtrIsInvalid(*logSegNo) || segno < *logSegNo)
+ if (segno < *logSegNo)
Oops! Thank you for fixing it!
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center