pgsql: Track last_inactive_time in pg_replication_slots.

Started by Amit Kapilaalmost 2 years ago33 messages

amit.kapila16@gmail.com

almost 2 years ago

Track last_inactive_time in pg_replication_slots.

This commit adds a new property called last_inactive_time for slots. It is
set to 0 whenever a slot is made active/acquired and set to the current
timestamp whenever the slot is inactive/released or restored from the disk.
Note that we don't set the last_inactive_time for the slots currently being
synced from the primary to the standby because such slots are typically
inactive as decoding is not allowed on those.

The 'last_inactive_time' will be useful on production servers to debug and
analyze inactive replication slots. It will also help to know the lifetime
of a replication slot - one can know how long a streaming standby, logical
subscriber, or replication slot consumer is down.

The 'last_inactive_time' will also be useful to implement inactive
timeout-based replication slot invalidation in a future commit.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik
Discussion: /messages/by-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/a11f330b5584f2430371d68871e00f5c63735299

Modified Files
--------------
doc/src/sgml/system-views.sgml | 10 ++
src/backend/catalog/system_views.sql | 1 +
src/backend/replication/slot.c | 35 +++++++
src/backend/replication/slotfuncs.c | 7 +-
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_proc.dat | 6 +-
src/include/replication/slot.h | 3 +
src/test/recovery/t/019_replslot_limit.pl | 152 ++++++++++++++++++++++++++++++
src/test/regress/expected/rules.out | 3 +-
9 files changed, 213 insertions(+), 6 deletions(-)

Robert Haas

robertmhaas@gmail.com

almost 2 years ago

In reply to: Amit Kapila (#1)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Mon, Mar 25, 2024 at 7:11 AM Amit Kapila <akapila@postgresql.org> wrote:

Track last_inactive_time in pg_replication_slots.

This commit adds a new property called last_inactive_time for slots. It is
set to 0 whenever a slot is made active/acquired and set to the current
timestamp whenever the slot is inactive/released or restored from the disk.
Note that we don't set the last_inactive_time for the slots currently being
synced from the primary to the standby because such slots are typically
inactive as decoding is not allowed on those.

So the field name is last_inactive_time, but if I'm reading this
description right, it's actually the last time the slot was active,
except for the weird exception for slots being synced. I'm wondering
if this field needs to be renamed.

And I'm suspicious that having an exception for slots being synced is
a bad idea. That makes too much of a judgement about how the user will
use this field. It's usually better to just expose the data, and if
the user needs helps to make sense of that data, then give them that
help separately. In this case, that would mean removing the exception,
but making it easy to tell the difference between slots are inactive
because they're being synced and slots that are inactive for some
other reason.

--
Robert Haas
EDB: http://www.enterprisedb.com

Amit Kapila

amit.kapila16@gmail.com

almost 2 years ago

In reply to: Robert Haas (#2)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Mon, Mar 25, 2024 at 6:57 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 25, 2024 at 7:11 AM Amit Kapila <akapila@postgresql.org> wrote:

Track last_inactive_time in pg_replication_slots.

This commit adds a new property called last_inactive_time for slots. It is
set to 0 whenever a slot is made active/acquired and set to the current
timestamp whenever the slot is inactive/released or restored from the disk.
Note that we don't set the last_inactive_time for the slots currently being
synced from the primary to the standby because such slots are typically
inactive as decoding is not allowed on those.

So the field name is last_inactive_time, but if I'm reading this
description right, it's actually the last time the slot was active,
except for the weird exception for slots being synced. I'm wondering
if this field needs to be renamed.

We considered the other two names as last_inactive_at and
last_active_time. For the first (last_inactive_at), there was an
argument that most other fields that display time ends with _time. For
the second (last_active_time), there was an argument that it could be
misleading as one could think that it should be updated each time WAL
record decoding is happening [1]/messages/by-id/Zf1yx9QMbhgJ/Lfy@ip-10-97-1-34.eu-west-3.compute.internal. The other possibility is to name it
last_used_time but I think it won't be much different from
last_active_time.

And I'm suspicious that having an exception for slots being synced is
a bad idea. That makes too much of a judgement about how the user will
use this field. It's usually better to just expose the data, and if
the user needs helps to make sense of that data, then give them that
help separately.

The reason we didn't set this for sync slots is that they won't be
usable (one can't use them to decode WAL) unless standby is promoted
[2]: /messages/by-id/CAJpy0uBGv85dFiWMnNLm6NuEs3eTVicsJCyRvMGbR8H+fOVBnA@mail.gmail.com
involved in this discussion to see what they think.

In this case, that would mean removing the exception,
but making it easy to tell the difference between slots are inactive
because they're being synced and slots that are inactive for some
other reason.

I think this can be differentiated with the help of 'synced' column in
pg_replication_slots.

[1]: /messages/by-id/Zf1yx9QMbhgJ/Lfy@ip-10-97-1-34.eu-west-3.compute.internal
[2]: /messages/by-id/CAJpy0uBGv85dFiWMnNLm6NuEs3eTVicsJCyRvMGbR8H+fOVBnA@mail.gmail.com

--
With Regards,
Amit Kapila.

Robert Haas

robertmhaas@gmail.com

almost 2 years ago

In reply to: Amit Kapila (#3)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Mon, Mar 25, 2024 at 10:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

We considered the other two names as last_inactive_at and
last_active_time. For the first (last_inactive_at), there was an
argument that most other fields that display time ends with _time. For
the second (last_active_time), there was an argument that it could be
misleading as one could think that it should be updated each time WAL
record decoding is happening [1]. The other possibility is to name it
last_used_time but I think it won't be much different from
last_active_time.

I don't understand the bit about updating it each time WAL record
decoding is happening. If it's the last active time, and the slot is
currently active, then the answer is either "right now" or "currently
undefined." I'd expect to see NULL in the system view in such a case.
And if that's so, then there's nothing to update each time a record is
decoded, because it's just still going to show NULL.

Why does this field get set to the current time when the slot is
restored from disk?

--
Robert Haas
EDB: http://www.enterprisedb.com

Amit Kapila

amit.kapila16@gmail.com

almost 2 years ago

In reply to: Robert Haas (#4)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Mon, Mar 25, 2024 at 7:51 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 25, 2024 at 10:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

We considered the other two names as last_inactive_at and
last_active_time. For the first (last_inactive_at), there was an
argument that most other fields that display time ends with _time. For
the second (last_active_time), there was an argument that it could be
misleading as one could think that it should be updated each time WAL
record decoding is happening [1]. The other possibility is to name it
last_used_time but I think it won't be much different from
last_active_time.

I don't understand the bit about updating it each time WAL record
decoding is happening. If it's the last active time, and the slot is
currently active, then the answer is either "right now" or "currently
undefined." I'd expect to see NULL in the system view in such a case.
And if that's so, then there's nothing to update each time a record is
decoded, because it's just still going to show NULL.

IIUC, Bertrand's point was that users can interpret last_active_time
as a value that gets updated each time they decode a change which is
not what we are doing. So, this can confuse users. Your expectation of
answer (NULL) when the slot is active is correct and that is what will
happen.

Why does this field get set to the current time when the slot is
restored from disk?

It is because we don't want to include the time the server is down in
the last_inactive_time. Say, if we are shutting down the server at
time X and the server remains down for another two hours, we don't
want to include those two hours as the slot inactive time. The related
theory is that this field will be used to invalidate inactive slots
based on a threshold (say inactive_timeout). Say, before the shutdown,
we release the slot and set the current_time for last_inactive_time
for each slot and persist that information as well. Now, if the server
is down for a long time, we may invalidate the slots as soon as the
server comes up. So, instead, we just set this field at the time we
read slots for disk and then reset it to 0/NULL as soon as the slot
became active.

--
With Regards,
Amit Kapila.

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

almost 2 years ago

In reply to: Amit Kapila (#5)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

Hi,

On Mon, Mar 25, 2024 at 08:38:16PM +0530, Amit Kapila wrote:

On Mon, Mar 25, 2024 at 7:51 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 25, 2024 at 10:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

We considered the other two names as last_inactive_at and
last_active_time. For the first (last_inactive_at), there was an
argument that most other fields that display time ends with _time. For
the second (last_active_time), there was an argument that it could be
misleading as one could think that it should be updated each time WAL
record decoding is happening [1]. The other possibility is to name it
last_used_time but I think it won't be much different from
last_active_time.

I don't understand the bit about updating it each time WAL record
decoding is happening. If it's the last active time, and the slot is
currently active, then the answer is either "right now" or "currently
undefined." I'd expect to see NULL in the system view in such a case.
And if that's so, then there's nothing to update each time a record is
decoded, because it's just still going to show NULL.

IIUC, Bertrand's point was that users can interpret last_active_time
as a value that gets updated each time they decode a change which is
not what we are doing. So, this can confuse users. Your expectation of
answer (NULL) when the slot is active is correct and that is what will
happen.

Yeah, and so would be the confusion: why is last_active_time NULL while one is
using the slot?

Why does this field get set to the current time when the slot is
restored from disk?

It is because we don't want to include the time the server is down in
the last_inactive_time. Say, if we are shutting down the server at
time X and the server remains down for another two hours, we don't
want to include those two hours as the slot inactive time. The related
theory is that this field will be used to invalidate inactive slots
based on a threshold (say inactive_timeout). Say, before the shutdown,
we release the slot and set the current_time for last_inactive_time
for each slot and persist that information as well. Now, if the server
is down for a long time, we may invalidate the slots as soon as the
server comes up. So, instead, we just set this field at the time we
read slots for disk and then reset it to 0/NULL as soon as the slot
became active.

Right, and we also want to invalidate the slot if not used duration > timeout,
so that setting the field to zero when the slot is restored from disk is also not
an option.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Amit Kapila

amit.kapila16@gmail.com

almost 2 years ago

In reply to: Bertrand Drouvot (#6)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Mon, Mar 25, 2024 at 8:46 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

On Mon, Mar 25, 2024 at 08:38:16PM +0530, Amit Kapila wrote:

On Mon, Mar 25, 2024 at 7:51 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 25, 2024 at 10:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

We considered the other two names as last_inactive_at and
last_active_time. For the first (last_inactive_at), there was an
argument that most other fields that display time ends with _time. For
the second (last_active_time), there was an argument that it could be
misleading as one could think that it should be updated each time WAL
record decoding is happening [1]. The other possibility is to name it
last_used_time but I think it won't be much different from
last_active_time.

I don't understand the bit about updating it each time WAL record
decoding is happening. If it's the last active time, and the slot is
currently active, then the answer is either "right now" or "currently
undefined." I'd expect to see NULL in the system view in such a case.
And if that's so, then there's nothing to update each time a record is
decoded, because it's just still going to show NULL.

IIUC, Bertrand's point was that users can interpret last_active_time
as a value that gets updated each time they decode a change which is
not what we are doing. So, this can confuse users. Your expectation of
answer (NULL) when the slot is active is correct and that is what will
happen.

Yeah, and so would be the confusion: why is last_active_time NULL while one is
using the slot?

It is because we set it to zero when we acquire the slot and that
value will remain the same till the slot is active. I am not sure if I
understood your question so what I am saying might not make sense.

--
With Regards,
Amit Kapila.

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

almost 2 years ago

In reply to: Amit Kapila (#7)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

Hi,

On Mon, Mar 25, 2024 at 08:59:55PM +0530, Amit Kapila wrote:

On Mon, Mar 25, 2024 at 8:46 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

On Mon, Mar 25, 2024 at 08:38:16PM +0530, Amit Kapila wrote:

On Mon, Mar 25, 2024 at 7:51 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 25, 2024 at 10:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

We considered the other two names as last_inactive_at and
last_active_time. For the first (last_inactive_at), there was an
argument that most other fields that display time ends with _time. For
the second (last_active_time), there was an argument that it could be
misleading as one could think that it should be updated each time WAL
record decoding is happening [1]. The other possibility is to name it
last_used_time but I think it won't be much different from
last_active_time.

I don't understand the bit about updating it each time WAL record
decoding is happening. If it's the last active time, and the slot is
currently active, then the answer is either "right now" or "currently
undefined." I'd expect to see NULL in the system view in such a case.
And if that's so, then there's nothing to update each time a record is
decoded, because it's just still going to show NULL.

IIUC, Bertrand's point was that users can interpret last_active_time
as a value that gets updated each time they decode a change which is
not what we are doing. So, this can confuse users. Your expectation of
answer (NULL) when the slot is active is correct and that is what will
happen.

Yeah, and so would be the confusion: why is last_active_time NULL while one is
using the slot?

It is because we set it to zero when we acquire the slot and that
value will remain the same till the slot is active. I am not sure if I
understood your question so what I am saying might not make sense.

There is no "real" question, I was just highlighting the confusion in case we
name the field "last_active_time".

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Robert Haas

robertmhaas@gmail.com

almost 2 years ago

In reply to: Bertrand Drouvot (#6)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Mon, Mar 25, 2024 at 11:16 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

IIUC, Bertrand's point was that users can interpret last_active_time
as a value that gets updated each time they decode a change which is
not what we are doing. So, this can confuse users. Your expectation of
answer (NULL) when the slot is active is correct and that is what will
happen.

Yeah, and so would be the confusion: why is last_active_time NULL while one is
using the slot?

I agree that users could get confused here, but the solution to that
shouldn't be to give the field a name that is the opposite of what it
actually does. I expect a field called last_inactive_time to tell me
the last time that the slot was inactive. Here, it tells me the last
time that a currently-inactive slot previously *WAS* active. How can
you justify calling that the last *INACTIVE* time?

AFAICS, the user who has the confusion that you mention here is simply
wrong. If they are looking at a field called "last active time" and
the slot is active, then the correct answer is "right now" or
"undefined" and that is what they will see. Sure, they might not
understand that. But flipping the name of the field on its head cannot
be the right way to help them.

With the current naming, I expect to have the exact opposite confusion
as your hypothetical confused user. I'm going to be looking at a slot
that's currently inactive, and it's going to tell me that the
last_inactive_time was at some time in the past. And I'm going to say
"what the heck is going on here, the slot is inactive *right now*!"

Half of me wonders whether we should avoid this whole problem by
renaming it to something like last_state_change or
last_state_change_time, or maybe just state_change like we do in
pg_stat_activity, and making it mean the last time the slot flipped
between active and inactive in either direction. I'm not sure if this
is better, but unless I'm misunderstanding something, the current
situation is terrible.

--
Robert Haas
EDB: http://www.enterprisedb.com

#10

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

almost 2 years ago

In reply to: Robert Haas (#9)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

Hi,

On Mon, Mar 25, 2024 at 11:49:00AM -0400, Robert Haas wrote:

On Mon, Mar 25, 2024 at 11:16 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

IIUC, Bertrand's point was that users can interpret last_active_time
as a value that gets updated each time they decode a change which is
not what we are doing. So, this can confuse users. Your expectation of
answer (NULL) when the slot is active is correct and that is what will
happen.

Yeah, and so would be the confusion: why is last_active_time NULL while one is
using the slot?

I agree that users could get confused here, but the solution to that
shouldn't be to give the field a name that is the opposite of what it
actually does. I expect a field called last_inactive_time to tell me
the last time that the slot was inactive. Here, it tells me the last
time that a currently-inactive slot previously *WAS* active. How can
you justify calling that the last *INACTIVE* time?

AFAICS, the user who has the confusion that you mention here is simply
wrong. If they are looking at a field called "last active time" and
the slot is active, then the correct answer is "right now" or
"undefined" and that is what they will see. Sure, they might not
understand that. But flipping the name of the field on its head cannot
be the right way to help them.

With the current naming, I expect to have the exact opposite confusion
as your hypothetical confused user. I'm going to be looking at a slot
that's currently inactive, and it's going to tell me that the
last_inactive_time was at some time in the past. And I'm going to say
"what the heck is going on here, the slot is inactive *right now*!"

Half of me wonders whether we should avoid this whole problem by
renaming it to something like last_state_change or
last_state_change_time, or maybe just state_change like we do in
pg_stat_activity, and making it mean the last time the slot flipped
between active and inactive in either direction. I'm not sure if this
is better, but unless I'm misunderstanding something, the current
situation is terrible.

Now that I read your arguments I think that last_<active|inactive>_time could be
both missleading because at the end they rely on users "expectation".

Would "released_time" sounds better? (at the end this is exactly what it does
represent unless for the case where it is restored from disk for which the meaning
would still makes sense to me though). It seems to me that released_time does not
lead to any expectation then removing any confusion.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#11

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

almost 2 years ago

In reply to: Amit Kapila (#3)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

Hi,

On Mon, Mar 25, 2024 at 07:32:11PM +0530, Amit Kapila wrote:

On Mon, Mar 25, 2024 at 6:57 PM Robert Haas <robertmhaas@gmail.com> wrote:

And I'm suspicious that having an exception for slots being synced is
a bad idea. That makes too much of a judgement about how the user will
use this field. It's usually better to just expose the data, and if
the user needs helps to make sense of that data, then give them that
help separately.

The reason we didn't set this for sync slots is that they won't be
usable (one can't use them to decode WAL) unless standby is promoted
[2]. But I see your point as well. So, I have copied the others
involved in this discussion to see what they think.

Yeah I also see Robert's point. If we also sync the "last inactive time" field then
we would need to take care of the corner case mentioned by Shveta in [1]/messages/by-id/CAJpy0uCLu+mqAwAMum=pXE9YYsy0BE7hOSw_Wno5vjwpFY=63g@mail.gmail.com during
promotion.

[1]: /messages/by-id/CAJpy0uCLu+mqAwAMum=pXE9YYsy0BE7hOSw_Wno5vjwpFY=63g@mail.gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#12

Robert Haas

robertmhaas@gmail.com

almost 2 years ago

In reply to: Bertrand Drouvot (#10)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Mon, Mar 25, 2024 at 12:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Now that I read your arguments I think that last_<active|inactive>_time could be
both missleading because at the end they rely on users "expectation".

Well, the user is always going to expect *something* -- that's just
how language works.

Would "released_time" sounds better? (at the end this is exactly what it does
represent unless for the case where it is restored from disk for which the meaning
would still makes sense to me though). It seems to me that released_time does not
lead to any expectation then removing any confusion.

Yeah, that's not bad. I mean, I don't agree that released_time doesn't
lead to any expectation, but what it leads me to expect is that you're
going to tell me the time at which the slot was released. So if it's
currently active, then I see NULL, because it's not released; but if
it's inactive, then I see the time at which it became so.

In the same vein, I think deactivated_at or inactive_since might be
good names to consider. I think they get at the same thing as
released_time, but they avoid introducing a completely new word
(release, as opposed to active/inactive).

--
Robert Haas
EDB: http://www.enterprisedb.com

#13

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

almost 2 years ago

In reply to: Robert Haas (#12)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

Hi,

On Mon, Mar 25, 2024 at 12:25:37PM -0400, Robert Haas wrote:

On Mon, Mar 25, 2024 at 12:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Would "released_time" sounds better? (at the end this is exactly what it does
represent unless for the case where it is restored from disk for which the meaning
would still makes sense to me though). It seems to me that released_time does not
lead to any expectation then removing any confusion.

Yeah, that's not bad. I mean, I don't agree that released_time doesn't
lead to any expectation,
but what it leads me to expect is that you're
going to tell me the time at which the slot was released. So if it's
currently active, then I see NULL, because it's not released; but if
it's inactive, then I see the time at which it became so.

In the same vein, I think deactivated_at or inactive_since might be
good names to consider. I think they get at the same thing as
released_time, but they avoid introducing a completely new word
(release, as opposed to active/inactive).

Yeah, I'd vote for inactive_since then.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#14

Nathan Bossart

nathandbossart@gmail.com

almost 2 years ago

In reply to: Bertrand Drouvot (#13)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Mon, Mar 25, 2024 at 04:49:12PM +0000, Bertrand Drouvot wrote:

On Mon, Mar 25, 2024 at 12:25:37PM -0400, Robert Haas wrote:

In the same vein, I think deactivated_at or inactive_since might be
good names to consider. I think they get at the same thing as
released_time, but they avoid introducing a completely new word
(release, as opposed to active/inactive).

Yeah, I'd vote for inactive_since then.

Having only skimmed some of the related discussions, I'm inclined to agree
that inactive_since provides the clearest description for the column.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#15

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

almost 2 years ago

In reply to: Nathan Bossart (#14)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Tue, Mar 26, 2024 at 1:30 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Mon, Mar 25, 2024 at 04:49:12PM +0000, Bertrand Drouvot wrote:

On Mon, Mar 25, 2024 at 12:25:37PM -0400, Robert Haas wrote:

In the same vein, I think deactivated_at or inactive_since might be
good names to consider. I think they get at the same thing as
released_time, but they avoid introducing a completely new word
(release, as opposed to active/inactive).

Yeah, I'd vote for inactive_since then.

Having only skimmed some of the related discussions, I'm inclined to agree
that inactive_since provides the clearest description for the column.

I think we all have some agreement on inactive_since. So, I'm
attaching the patch for that change.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#16

Amit Kapila

amit.kapila16@gmail.com

almost 2 years ago

In reply to: Robert Haas (#12)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Mon, Mar 25, 2024 at 9:55 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 25, 2024 at 12:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Would "released_time" sounds better? (at the end this is exactly what it does
represent unless for the case where it is restored from disk for which the meaning
would still makes sense to me though). It seems to me that released_time does not
lead to any expectation then removing any confusion.

Yeah, that's not bad. I mean, I don't agree that released_time doesn't
lead to any expectation, but what it leads me to expect is that you're
going to tell me the time at which the slot was released. So if it's
currently active, then I see NULL, because it's not released; but if
it's inactive, then I see the time at which it became so.

In the same vein, I think deactivated_at or inactive_since might be
good names to consider. I think they get at the same thing as
released_time, but they avoid introducing a completely new word
(release, as opposed to active/inactive).

We have a consensus on inactive_since, so I'll make that change. I
would also like to solicit your opinion on the other slot-level
parameter we are planning to introduce. This new slot-level parameter
will be named as inactive_timeout. This will indicate that once the
slot is inactive for the inactive_timeout period, we will invalidate
the slot. We are also discussing to have this parameter
(inactive_timeout) as GUC [1]/messages/by-id/20240325195443.GA2923888@nathanxps13. We can have this new parameter both at
the slot level and as well as a GUC, or just one of those.

[1]: /messages/by-id/20240325195443.GA2923888@nathanxps13

--
With Regards,
Amit Kapila.

#17

shveta malik

shveta.malik@gmail.com

almost 2 years ago

In reply to: Bertrand Drouvot (#11)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Mon, Mar 25, 2024 at 9:54 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Mon, Mar 25, 2024 at 07:32:11PM +0530, Amit Kapila wrote:

On Mon, Mar 25, 2024 at 6:57 PM Robert Haas <robertmhaas@gmail.com> wrote:

And I'm suspicious that having an exception for slots being synced is
a bad idea. That makes too much of a judgement about how the user will
use this field. It's usually better to just expose the data, and if
the user needs helps to make sense of that data, then give them that
help separately.

The reason we didn't set this for sync slots is that they won't be
usable (one can't use them to decode WAL) unless standby is promoted
[2]. But I see your point as well. So, I have copied the others
involved in this discussion to see what they think.

Yeah I also see Robert's point. If we also sync the "last inactive time" field then
we would need to take care of the corner case mentioned by Shveta in [1] during
promotion.

I have suggested one potential solution for that in [1]/messages/by-id/CAJpy0uB-yE+Riw7JQ4hW0+igJxvPc+rq+9c7WyTa1Jz7+2gAiA@mail.gmail.com. Please have a look.

[1]: /messages/by-id/CAJpy0uB-yE+Riw7JQ4hW0+igJxvPc+rq+9c7WyTa1Jz7+2gAiA@mail.gmail.com

thanks
Shveta

#18

shveta malik

shveta.malik@gmail.com

almost 2 years ago

In reply to: Bharath Rupireddy (#15)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Tue, Mar 26, 2024 at 1:50 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Tue, Mar 26, 2024 at 1:30 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Mon, Mar 25, 2024 at 04:49:12PM +0000, Bertrand Drouvot wrote:

On Mon, Mar 25, 2024 at 12:25:37PM -0400, Robert Haas wrote:

In the same vein, I think deactivated_at or inactive_since might be
good names to consider. I think they get at the same thing as
released_time, but they avoid introducing a completely new word
(release, as opposed to active/inactive).

Yeah, I'd vote for inactive_since then.

Having only skimmed some of the related discussions, I'm inclined to agree
that inactive_since provides the clearest description for the column.

I think we all have some agreement on inactive_since. So, I'm
attaching the patch for that change.

pg_proc.dat needs to be changed to refer to 'inactive_since' instead
of 'last_inactive_time' in the attached patch.

thanks
Shveta

#19

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

almost 2 years ago

In reply to: shveta malik (#17)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On Tue, Mar 26, 2024 at 9:38 AM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Mar 25, 2024 at 9:54 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Mon, Mar 25, 2024 at 07:32:11PM +0530, Amit Kapila wrote:

On Mon, Mar 25, 2024 at 6:57 PM Robert Haas <robertmhaas@gmail.com> wrote:

And I'm suspicious that having an exception for slots being synced is
a bad idea. That makes too much of a judgement about how the user will
use this field. It's usually better to just expose the data, and if
the user needs helps to make sense of that data, then give them that
help separately.

The reason we didn't set this for sync slots is that they won't be
usable (one can't use them to decode WAL) unless standby is promoted
[2]. But I see your point as well. So, I have copied the others
involved in this discussion to see what they think.

Yeah I also see Robert's point. If we also sync the "last inactive time" field then
we would need to take care of the corner case mentioned by Shveta in [1] during
promotion.

I have suggested one potential solution for that in [1]. Please have a look.

[1]: /messages/by-id/CAJpy0uB-yE+Riw7JQ4hW0+igJxvPc+rq+9c7WyTa1Jz7+2gAiA@mail.gmail.com

I posted the v21 patch implementing the above idea in the other thread
- /messages/by-id/CALj2ACXRFx9g7A9RFJZF7eBe=zxk7=apMRFuCgJJKYB7O=vgwg@mail.gmail.com.
For ease, I'm also attaching the patch in here.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#20

Alvaro Herrera

alvherre@2ndquadrant.com

almost 2 years ago

In reply to: Amit Kapila (#16)

Re: pgsql: Track last_inactive_time in pg_replication_slots.

On 2024-Mar-26, Amit Kapila wrote:

We have a consensus on inactive_since, so I'll make that change.

Sounds reasonable. So this is a timestamptz if the slot is inactive,
NULL if active, right? What value is it going to have for sync slots?

I would also like to solicit your opinion on the other slot-level
parameter we are planning to introduce. This new slot-level parameter
will be named as inactive_timeout.

Maybe inactivity_timeout?

This will indicate that once the slot is inactive for the
inactive_timeout period, we will invalidate the slot. We are also
discussing to have this parameter (inactive_timeout) as GUC [1]. We
can have this new parameter both at the slot level and as well as a
GUC, or just one of those.

replication_slot_inactivity_timeout?

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Tom: There seems to be something broken here.
Teodor: I'm in sackcloth and ashes... Fixed.
/messages/by-id/482D1632.8010507@sigaev.ru

#21

Amit Kapila

amit.kapila16@gmail.com

almost 2 years ago

In reply to: Alvaro Herrera (#20)

#22

Alvaro Herrera

alvherre@2ndquadrant.com

almost 2 years ago

In reply to: Amit Kapila (#21)

#23

Bertrand Drouvot