Unexpected behavior when setting "idle_replication_slot_timeout"

Started by Gunnar Morling, 9 months ago, 38 messages, bugs
#1Gunnar Morling
gunnar.morling@googlemail.com

Hi all,

I am exploring the new setting "idle_replication_slot_timeout" in Postgres
18; for testing purposes, I set the value to "30s", which, unexpectedly to
me, didn't cause an idle slot to be invalidated when I triggered a
checkpoint after the timeout had been reached.

The docs of the option state that the value is rounded up or down to the
nearest full minute, so I reckon "30s" gets rounded down to 0, thus
effectively disabling the feature. It might be less surprising to users if
values between "1s" and "59s" get actually always rounded up to one minute?
Arguably, that'd seem the more intuitive behavior to me. Alternatively,
logging a warning might be considered for values between "1s" and "30s"?
Curious what folks here think.

Thanks and all the best,

--Gunnar

#2Nisha Moond
nisha.moond412@gmail.com
In reply to: Gunnar Morling (#1)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Fri, Jul 4, 2025 at 4:35 PM Gunnar Morling
<gunnar.morling@googlemail.com> wrote:

Hi all,

I am exploring the new setting "idle_replication_slot_timeout" in Postgres 18; for testing purposes, I set the value to "30s", which, unexpectedly to me, didn't cause an idle slot to be invalidated when I triggered a checkpoint after the timeout had been reached.

The docs of the option state that the value is rounded up or down to the nearest full minute, so I reckon "30s" gets rounded down to 0, thus effectively disabling the feature. It might be less surprising to users if values between "1s" and "59s" get actually always rounded up to one minute? Arguably, that'd seem the more intuitive behavior to me. Alternatively, logging a warning might be considered for values between "1s" and "30s"? Curious what folks here think.

Thanks for bringing this up!

Yes, this is expected behavior: idle_replication_slot_timeout accepts
values in minutes, so a setting like "30s" is rounded down to 0,
effectively disabling the timeout, while values from "31s" to "59s" are
rounded up to 1.

This behavior isn’t specific to this GUC as Postgres generally rounds
values below a parameter’s minimum unit without a warning. For
example, wal_summary_keep_time and log_rotation_age behave the same
way.
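
The rounding described above can be reproduced with a minimal Python sketch. It assumes the conversion is "divide by 60, then round to the nearest integer, with exact halves going to the even neighbour", which is what Python's built-in round() does. The helper name seconds_to_minutes is made up for illustration and is not a PostgreSQL function.

```python
def seconds_to_minutes(seconds: int) -> int:
    """Convert seconds to whole minutes, rounding exact halves to the even integer."""
    # Python's round() sends exact halves to the nearest even integer,
    # so 0.5 becomes 0 rather than 1.
    return round(seconds / 60)


for value in (1, 30, 31, 59, 60, 90):
    minutes = seconds_to_minutes(value)
    note = " (0 disables the timeout)" if minutes == 0 else ""
    print(f"{value}s -> {minutes}min{note}")
```

Under these assumptions, "30s" lands on 0 and "31s" lands on 1, matching the behavior reported in this thread.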

--
Thanks,
Nisha

#3Fujii Masao
masao.fujii@gmail.com
In reply to: Nisha Moond (#2)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On 2025/07/04 21:14, Nisha Moond wrote:

On Fri, Jul 4, 2025 at 4:35 PM Gunnar Morling
<gunnar.morling@googlemail.com> wrote:

Hi all,

I am exploring the new setting "idle_replication_slot_timeout" in Postgres 18; for testing purposes, I set the value to "30s", which, unexpectedly to me, didn't cause an idle slot to be invalidated when I triggered a checkpoint after the timeout had been reached.

The docs of the option state that the value is rounded up or down to the nearest full minute, so I reckon "30s" gets rounded down to 0, thus effectively disabling the feature.

When I first tried using idle_replication_slot_timeout, I also encountered this issue.

It might be less surprising to users if values between "1s" and "59s" get actually always rounded up to one minute? Arguably, that'd seem the more intuitive behavior to me. Alternatively, logging a warning might be considered for values between "1s" and "30s"? Curious what folks here think.

Thanks for bringing this up!

Yes, this is expected behavior: idle_replication_slot_timeout accepts
values in minutes, so a setting like "30s" is rounded down to 0,
effectively disabling the timeout, while values from "31s" to "59s" are
rounded up to 1.

This behavior isn’t specific to this GUC as Postgres generally rounds
values below a parameter’s minimum unit without a warning. For
example, wal_summary_keep_time and log_rotation_age behave the same
way.

Right.

But I wonder why the current unit of this GUC is minutes (GUC_UNIT_MIN).
Since at least two users (including myself) tried to set it to a value
less than 1 minute, it might be worth considering changing the unit to seconds
(GUC_UNIT_S). That would also reduce the chance of the reported trouble.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

#4Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#3)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On 2025/07/04 23:12, Fujii Masao wrote:

But I wonder why the current unit of this GUC is minutes (GUC_UNIT_MIN).
Since at least two users (including myself) tried to set it to a value
less than 1 minute, it might be worth considering changing the unit to seconds
(GUC_UNIT_S). That would also reduce the chance of the reported trouble.

Attached patch changes unit of idle_replication_slot_timeout to seconds.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

Attachments:

v1-0001-Change-unit-of-idle_replication_slot_timeout-to-s.patch (text/plain; charset=UTF-8; +12 -16)

#5Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#4)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Sat, 2025-07-05 at 00:22 +0900, Fujii Masao wrote:

On 2025/07/04 23:12, Fujii Masao wrote:

But I wonder why the current unit of this GUC is minutes (GUC_UNIT_MIN).
Since at least two users (including myself) tried to set it to a value
less than 1 minute, it might be worth considering changing the unit to seconds
(GUC_UNIT_S). That would also reduce the chance of the reported trouble.

Attached patch changes unit of idle_replication_slot_timeout to seconds.

-1

I think that the reason that several users tried to set it to less than a minute
is that they were trying to test the feature and didn't want to wait long.
I cannot imagine that anybody will want to abandon a standby server just
because it is idle for more than 30 seconds.

For me, there would be two appealing alternatives:

1. Always round up to the next minute.

2. Use the value -1 to deactivate the feature.
Optionally, we could forbid the value 0 in this case.

Stepping back a bit, I am not happy with the documentation. I looked at it,
trying to figure out what the parameter does, and I am none the wiser.
What exactly does it mean for a replication slot to idle?

- Does it mean that the standby is not connected?
- Does it mean that the standby is connected, but gives no feedback?
Probably not, but I only guess that because I know that there is a different
parameter for that.
- Does it mean that the standby is giving feedback, but that feedback doesn't
indicate progress?

I think we could do better here.

Yours,
Laurenz Albe

#6Fujii Masao
masao.fujii@gmail.com
In reply to: Laurenz Albe (#5)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On 2025/07/05 1:07, Laurenz Albe wrote:

On Sat, 2025-07-05 at 00:22 +0900, Fujii Masao wrote:

On 2025/07/04 23:12, Fujii Masao wrote:

But I wonder why the current unit of this GUC is minutes (GUC_UNIT_MIN).
Since at least two users (including myself) tried to set it to a value
less than 1 minute, it might be worth considering changing the unit to seconds
(GUC_UNIT_S). That would also reduce the chance of the reported trouble.

Attached patch changes unit of idle_replication_slot_timeout to seconds.

-1

I think that the reason that several users tried to set it to less than a minute
is that they were trying to test the feature and didn't want to wait long.
I cannot imagine that anybody will want to abandon a standby server just
because it is idle for more than 30 seconds.

Maybe. But changing the unit to seconds doesn't make things worse, does it?
It still allows users to set values greater than 1 minute, and also less than
1 minute for debugging or testing purposes, if needed.

Or are you suggesting we should disallow values below 1 minute?

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

#7Gunnar Morling
gunnar.morling@googlemail.com
In reply to: Fujii Masao (#6)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

Hey all,

Thanks for all the replies!

Indeed I also wouldn't expect anyone to set this timeout to 30s for
purposes other than testing or exploration. But then again, if the option
can be set with second precision, is there any reason to not honor this? To
me, it boils down to the principle of least surprise for users, in
particular when it comes to a setting like "30s" being interpreted as
disabling the feature.

What exactly does it mean for a replication slot to idle?

I'd like to echo this; I also was/am confused by the term "idle" here; it
isn't fully clear to me what it means for a slot to be idle, and in
particular whether it is different from a slot being inactive.
pg_replication_slots uses the "active"/"inactive" terminology, and it seems
that this is what idleness is about, as per the docs of the setting [1]:

[Users] can force a checkpoint to promptly invalidate inactive slots. The
duration of slot inactivity is calculated using the slot's
pg_replication_slots.inactive_since value.

If so, "inactive_replication_slot_timeout" might be a more consistent name
for that option?

Best,

--Gunnar

[1]: https://www.postgresql.org/docs/18/runtime-config-replication.html#GUC-IDLE-REPLICATION-SLOT-TIMEOUT

#8Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#6)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Sat, 2025-07-05 at 01:24 +0900, Fujii Masao wrote:

On 2025/07/05 1:07, Laurenz Albe wrote:

On Sat, 2025-07-05 at 00:22 +0900, Fujii Masao wrote:

On 2025/07/04 23:12, Fujii Masao wrote:

But I wonder why the current unit of this GUC is minutes (GUC_UNIT_MIN).
Since at least two users (including myself) tried to set it to a value
less than 1 minute, it might be worth considering changing the unit to seconds
(GUC_UNIT_S). That would also reduce the chance of the reported trouble.

Attached patch changes unit of idle_replication_slot_timeout to seconds.

-1

I think that the reason that several users tried to set it to less than a minute
is that they were trying to test the feature and didn't want to wait long.
I cannot imagine that anybody will want to abandon a standby server just
because it is idle for more than 30 seconds.

Maybe. But changing the unit to seconds doesn't make things worse, does it?
It still allows users to set values greater than 1 minute, and also less than
1 minute for debugging or testing purposes, if needed.

Or are you suggesting we should disallow values below 1 minute?

I guess you are right. There is no problem with second precision, even if
the use case in this case was artificial.
I withdraw my objection.

Gunnar Morling wrote:

I also was/am confused by the term "idle" here; it isn't fully clear to me
what it means for a slot to be idle, and in particular whether it is different
from a slot being inactive.  [...]

If so, "inactive_replication_slot_timeout" might be a more consistent name
for that option?

Perhaps. I must say that I don't care so much about the name, as long as the
documentation doesn't leave any doubts.

Yours,
Laurenz Albe

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Fujii Masao (#6)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Fri, Jul 4, 2025 at 9:54 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2025/07/05 1:07, Laurenz Albe wrote:

On Sat, 2025-07-05 at 00:22 +0900, Fujii Masao wrote:

On 2025/07/04 23:12, Fujii Masao wrote:

But I wonder why the current unit of this GUC is minutes (GUC_UNIT_MIN).
Since at least two users (including myself) tried to set it to a value
less than 1 minute, it might be worth considering changing the unit to seconds
(GUC_UNIT_S). That would also reduce the chance of the reported trouble.

Attached patch changes unit of idle_replication_slot_timeout to seconds.

-1

I think that the reason that several users tried to set it to less than a minute
is that they were trying to test the feature and didn't want to wait long.
I cannot imagine that anybody will want to abandon a standby server just
because it is idle for more than 30 seconds.

Maybe. But changing the unit to seconds doesn't make things worse, does it?
It still allows users to set values greater than 1 minute, and also less than
1 minute for debugging or testing purposes, if needed.

We expect the value of this variable to be in hours or, in some cases,
days. Specifying in seconds would be inconvenient for users. Now, I
agree there is a value in testing/debugging to allow it to be seconds,
but the same could be said for other variables whose units are in
minutes, like log_rotation_age and wal_summary_keep_time.

Or are you suggesting we should disallow values below 1 minute?

We should be consistent with other similar GUC variables whose unit is
in GUC_UNIT_MIN.

--
With Regards,
Amit Kapila.

#10Amit Kapila
amit.kapila16@gmail.com
In reply to: Laurenz Albe (#5)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Fri, Jul 4, 2025 at 9:37 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Stepping back a bit, I am not happy with the documentation. I looked at it,
trying to figure out what the parameter does, and I am none the wiser.
What exactly does it mean for a replication slot to idle?

- Does it mean that the standby is not connected?

It means the above. The slot is used for purposes other than the
standby as well, so we can't mention something only specific to the
standby. This parameter mainly cares when the slot becomes inactive.
You can get more information by looking at
pg_replication_slots.inactive_since (as indicated by sentence: "The
duration of slot inactivity is calculated using the slot's
pg_replication_slots.inactive_since value.").
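
As a rough illustration of the check described above, the following Python sketch compares the time since inactive_since against the timeout. It is a simplification under stated assumptions, not the server's actual code: the function name slot_timed_out is hypothetical, the comparison is assumed to be inclusive, and any additional conditions the server applies before invalidating a slot are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def slot_timed_out(inactive_since: Optional[datetime],
                   now: datetime,
                   timeout_minutes: int) -> bool:
    """Return True if a slot has been inactive for at least the timeout."""
    if timeout_minutes <= 0:
        # A setting of 0 disables the timeout entirely.
        return False
    if inactive_since is None:
        # Assumption: a slot that is currently in use has no inactive_since value.
        return False
    return now - inactive_since >= timedelta(minutes=timeout_minutes)


now = datetime(2025, 7, 5, 12, 0, tzinfo=timezone.utc)
went_inactive = now - timedelta(minutes=90)
print(slot_timed_out(went_inactive, now, timeout_minutes=60))
print(slot_timed_out(went_inactive, now, timeout_minutes=0))
print(slot_timed_out(None, now, timeout_minutes=60))
```

The second call shows why a value that rounds to 0 is surprising: the slot has been inactive for 90 minutes, yet nothing happens because 0 means "disabled".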

I think we could do better here.

Sure, feel free to propose what you think makes it better.

--
With Regards,
Amit Kapila.

#11Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Amit Kapila (#10)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Sat, 2025-07-05 at 09:52 +0530, Amit Kapila wrote:

On Fri, Jul 4, 2025 at 9:37 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

What exactly does it mean for a replication slot to idle?

- Does it mean that the standby is not connected?

It means the above. The slot is used for purposes other than the
standby as well, so we can't mention something only specific to the
standby.

I think we could do better here.

Sure, feel free to propose what you think makes it better.

Done in the attached patch.

We expect the value of this variable to be in hours or, in some cases,
days. Specifying in seconds would be inconvenient for users.

I don't buy that argument. Specifying shared_buffers in 8kB blocks
would be quite inconvenient to most users, but I don't remember any
complaints about it. One of the nice things about the Grand Unified
Configuration is that you can use units different from the default.

On the other hand, if the behavior is clearly documented, as I have
tried to do with my patch, it should be fine. So I'll rest my case if
you apply my patch.

Yours,
Laurenz Albe

Attachments:

v1-0001-Improve-doc-for-idle_replication_slot_timeout.patch (text/x-patch; charset=UTF-8; +8 -7)

#12David G. Johnston
david.g.johnston@gmail.com
In reply to: Laurenz Albe (#11)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Friday, July 4, 2025, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

One of the nice things about the Grand Unified
Configuration is that you can use units different from the default.

Supplying the value as '2 h', SHOW will display 2h while
pg_settings.setting shows 120 (min).
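
The difference between the stored value and the displayed value can be sketched in Python. This assumes the display picks the largest time unit that divides the stored number of minutes evenly, which is consistent with the '2 h' example above. The helper show_minutes and its unit table are illustrative only and are not taken from the PostgreSQL source.

```python
# Units at or above the base unit (minutes), largest first.
UNITS_IN_MINUTES = [("d", 24 * 60), ("h", 60), ("min", 1)]


def show_minutes(setting: int) -> str:
    """Format a value stored in minutes using the largest unit that divides it evenly."""
    if setting == 0:
        return "0"
    for suffix, factor in UNITS_IN_MINUTES:
        if setting % factor == 0:
            return f"{setting // factor}{suffix}"
    raise AssertionError("unreachable: 'min' divides every integer")


for stored in (120, 90, 1440):
    print(stored, "->", show_minutes(stored))
```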

David J.

#13David G. Johnston
david.g.johnston@gmail.com
In reply to: Laurenz Albe (#11)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Fri, Jul 4, 2025 at 10:35 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

On the other hand, if the behavior is clearly documented, as I have
tried to do with my patch, it should be fine. So I'll rest my case if
you apply my patch.

We should clearly document how rounding works in section 19.1.1 (which we
mostly do; "If the parameter is of integer type, a final rounding to
integer occurs after any unit conversion.") and not in every time-related
setting that chooses to use something larger than microseconds. So, 30s is
'unit converted' up to 0.5 minutes (not explicitly explained...) then
rounded to zero (which is odd, half normally rounds up...). I'm against
cluttering up the individual settings docs with this detail.

If the change from idle to inactive is needed in the description we
should just admit we named it wrong in the first place. As-is, the
description matches the name and the callout to the field in the second
paragraph precisely clears up what this setting at least cares about. The
reader should be directed to how that field is computed should they need
clarification.

Thus, I'd accept but not find required the idle/inactive wording change to
any of various degrees; and would ask that any clarification regarding
generic setting value interpretation be relegated to 19.1.1 where all such
settings can benefit.

David J.

#14Laurenz Albe
laurenz.albe@cybertec.at
In reply to: David G. Johnston (#13)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Fri, 2025-07-04 at 23:16 -0700, David G. Johnston wrote:

We should clearly document how rounding works in section 19.1.1
(which we mostly do; "If the parameter is of integer type, a final rounding
to integer occurs after any unit conversion.") and not in every
time-related setting that chooses to use something larger than microseconds.
So, 30s is 'unit converted' up to 0.5 minutes (not explicitly explained...)
then rounded to zero (which is odd, half normally rounds up...).
I'm against cluttering up the individual settings docs with this detail.

That's fine with me; do you have a patch?

If the change from idle to inactive is needed in the description we should
just admit we named it wrong in the first place.

I had half a mind to propose renaming the parameter, but I shied from
a lengthy bikeshedding discussion. Reading up on the archives, I see
that Peter Smith proposed the term "idle" in [1]/messages/by-id/CAHut+PtHbYNxPvtMfs7jARbsVcFXL1=C9SO3Q93NgVDgbKN7LQ@mail.gmail.com, and nobody had any
problem with it.

For the record: I would be much more happy if the parameter were called
"inactive_replication_slot_timeout", since we use the term "active" in
"pg_replication_slots". Also, we call connections "idle" when they are
established, but doing nothing, and this parameter is about disconnected
replication connections.

  As-is, the description
matches the name and the callout to the field in the second paragraph
precisely clears up what this setting at least cares about.  The reader
should be directed to how that field is computed should they need clarification.

Thus, I'd accept but not find required the idle/inactive wording change to
any of various degrees; and would ask that any clarification regarding
generic setting value interpretation be relegated to 19.1.1 where all
such settings can benefit.

I am sure that there is some information in these sentences, but I cannot
extract it, even after reading them twice.

Yours,
Laurenz Albe

[1]: /messages/by-id/CAHut+PtHbYNxPvtMfs7jARbsVcFXL1=C9SO3Q93NgVDgbKN7LQ@mail.gmail.com

#15David G. Johnston
david.g.johnston@gmail.com
In reply to: Laurenz Albe (#14)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Saturday, July 5, 2025, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

I am sure that there is some information in these sentences, but I cannot
extract it, even after reading them twice.

Maybe: “During checkpoint if the interval since
pg_replication_slots.inactive_since and now is larger than this value
pg_replication_slots.conflicting is set to true and
pg_replication_slots.inactive_reason is set to ‘timeout’. See section
wherever for more information on handling conflicted slots.”

Heck, writing this, “idle” is probably better, a slot can recover from
being idle on its own but usually inactive would imply having to do
something to make it active again.

IMO our documentation for replication has serious flaws but this particular
area is clear enough. Like any good timeout the slot is killed if it goes
unused “idle” for some length of time. We can describe that in many ways
but the name, to me, is fully descriptive and consistent with other
timeouts like “idle_in_transaction_session_timeout”.

David J.

#16Fujii Masao
masao.fujii@gmail.com
In reply to: Amit Kapila (#9)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On 2025/07/05 13:06, Amit Kapila wrote:

On Fri, Jul 4, 2025 at 9:54 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2025/07/05 1:07, Laurenz Albe wrote:

On Sat, 2025-07-05 at 00:22 +0900, Fujii Masao wrote:

On 2025/07/04 23:12, Fujii Masao wrote:

But I wonder why the current unit of this GUC is minutes (GUC_UNIT_MIN).
Since at least two users (including myself) tried to set it to a value
less than 1 minute, it might be worth considering changing the unit to seconds
(GUC_UNIT_S). That would also reduce the chance of the reported trouble.

Attached patch changes unit of idle_replication_slot_timeout to seconds.

-1

I think that the reason that several users tried to set it to less than a minute
is that they were trying to test the feature and didn't want to wait long.
I cannot imagine that anybody will want to abandon a standby server just
because it is idle for more than 30 seconds.

Maybe. But changing the unit to seconds doesn't make things worse, does it?
It still allows users to set values greater than 1 minute, and also less than
1 minute for debugging or testing purposes, if needed.

We expect the value of this variable to be in hours or, in some cases,
days. Specifying in seconds would be inconvenient for users. Now, I
agree there is a value in testing/debugging to allow it to be seconds,
but the same could be said for other variables whose units are in
minutes, like log_rotation_age and wal_summary_keep_time.

Even if we change the unit to seconds (GUC_UNIT_S), it's still possible
to set the timeout values such as hours or days. For example, even with
the patch, we can set the timeout to 365 days:

=# ALTER SYSTEM SET idle_replication_slot_timeout TO '365d';
=# SELECT pg_reload_conf();
=# SHOW idle_replication_slot_timeout ;
idle_replication_slot_timeout
-------------------------------
365d
(1 row)

Do you see any serious downside to switching the unit to seconds? I don't
think it introduces any serious issues. On the contrary, it gives users finer
control over the timeout, and additionally works around the issue
that we're discussing here.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

#17Fujii Masao
masao.fujii@gmail.com
In reply to: David G. Johnston (#13)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On 2025/07/05 15:16, David G. Johnston wrote:

On Fri, Jul 4, 2025 at 10:35 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On the other hand, if the behavior is clearly documented, as I have
tried to do with my patch, it should be fine.  So I'll rest my case if
you apply my patch.

We should clearly document how rounding works in section 19.1.1 (which we mostly do; "If the parameter is of integer type, a final rounding to integer occurs after any unit conversion.") and not in every time-related setting that chooses to use something larger than microseconds.  So, 30s is 'unit converted' up to 0.5 minutes (not explicitly explained...) then rounded to zero (which is odd, half normally rounds up...).

This happens because in this case rounding is done using rint(3),
which uses banker's rounding by default. While the rint(3)'s rounding method
can be changed with fesetround(), PostgreSQL doesn't seem to change it.
So 0.5 rounds to 0, 1.5 and 2.5 round to 2, 3.5 and 4.5 round to 4, and so on.
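
The half-to-even rule can be seen side by side with the "schoolbook" half-up rule in a short Python sketch. Python's built-in round() happens to use the same half-to-even rule, so it stands in for rint(3) here; this is an illustration of the rounding rule only, not of PostgreSQL's code path.

```python
import math


def round_half_even(x: float) -> int:
    # Built-in round() sends exact halves to the nearest even integer.
    return round(x)


def round_half_up(x: float) -> int:
    # Schoolbook rule for non-negative inputs: exact halves always go up.
    return math.floor(x + 0.5)


halves = [0.5, 1.5, 2.5, 3.5, 4.5]
print("half-even:", [round_half_even(x) for x in halves])
print("half-up:  ", [round_half_up(x) for x in halves])
```

Only the half-even column reproduces the sequence 0, 2, 2, 4, 4 listed above.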

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

#18David G. Johnston
david.g.johnston@gmail.com
In reply to: Fujii Masao (#16)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Saturday, July 5, 2025, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Do you see any serious downside to switching the unit to seconds? I don't
think it introduces any serious issues. On the contrary, it gives users finer
control over the timeout, and additionally works around the issue
that we're discussing here.

I do not, and would rather we make the change. Minutes are an
unconventional base unit for time in our world and should be avoided.

David J.

#19David G. Johnston
david.g.johnston@gmail.com
In reply to: Fujii Masao (#17)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

On Saturday, July 5, 2025, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2025/07/05 15:16, David G. Johnston wrote:

On Fri, Jul 4, 2025 at 10:35 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On the other hand, if the behavior is clearly documented, as I have
tried to do with my patch, it should be fine. So I'll rest my case if
you apply my patch.

We should clearly document how rounding works in section 19.1.1 (which we
mostly do; "If the parameter is of integer type, a final rounding to
integer occurs after any unit conversion.") and not in every time-related
setting that chooses to use something larger than microseconds. So, 30s is
'unit converted' up to 0.5 minutes (not explicitly explained...) then
rounded to zero (which is odd, half normally rounds up...).

This happens because in this case rounding is done using rint(3),
which uses banker's rounding by default. While the rint(3)'s rounding method
can be changed with fesetround(), PostgreSQL doesn't seem to change it.
So 0.5 rounds to 0, 1.5 and 2.5 round to 2, 3.5 and 4.5 round to 4, and so on.

Right, a.k.a., half-even rounding. So, not odd.

David J.

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#18)
Re: Unexpected behavior when setting "idle_replication_slot_timeout"

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Saturday, July 5, 2025, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Do you see any serious downside to switching the unit to seconds? I don't
think it introduces any serious issues. On the contrary, it gives users finer
control over the timeout, and additionally works around the issue
that we're discussing here.

I do not, and would rather we make the change. Minutes are an
unconventional base unit for time in our world and should be avoided.

By my count, there are ten GUCs declared with GUC_UNIT_S and three
with GUC_UNIT_MIN. I'd say that there may be some lean towards
seconds but David's argument seems like pure hyperbole.

I'm kind of down on changing the unit, because it will *silently*
break configuration files where the value was set without a unit.
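
The compatibility concern can be made concrete with a small Python sketch. It shows how the same unitless entry would be interpreted under the two base units; the function effective_seconds and the sample value 10 are hypothetical and only serve the illustration.

```python
SECONDS_PER_UNIT = {"s": 1, "min": 60}


def effective_seconds(raw: str, base_unit: str) -> int:
    """Interpret a unitless integer setting in the given base unit."""
    return int(raw) * SECONDS_PER_UNIT[base_unit]


# Hypothetical postgresql.conf entry: idle_replication_slot_timeout = 10
raw = "10"
print("base unit minutes:", effective_seconds(raw, "min"), "seconds")
print("base unit seconds:", effective_seconds(raw, "s"), "seconds")
```

The file itself stays unchanged, but the effective timeout shrinks by a factor of 60 without any error or warning.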

May I suggest an alternative? We could change the variable from int
to float type and continue to specify it in minutes. That will have
exactly zero compatibility impact, it allows sub-minute values to
be selected at need, and it removes the need for hair-splitting
documentation about what the rounding rules are.
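
A minimal Python sketch of the difference between an integer and a float parameter, both with minutes as the base unit. The parser below is a deliberately simplified stand-in (it accepts only s, min, h and d, and the name parse_minutes is made up); it is not how PostgreSQL parses settings.

```python
import re

SECONDS_PER_UNIT = {"s": 1, "min": 60, "h": 3600, "d": 86400}


def parse_minutes(text: str, as_float: bool):
    """Parse a time setting into minutes; unitless input is taken as minutes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(s|min|h|d)?\s*", text)
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    number, unit = match.groups()
    minutes = float(number) * SECONDS_PER_UNIT[unit or "min"] / 60
    # An integer parameter needs a final rounding step; a float one does not.
    return minutes if as_float else round(minutes)


for text in ("30s", "45s", "5", "2h"):
    print(text, "int:", parse_minutes(text, False), "float:", parse_minutes(text, True))
```

With the float variant "30s" stays 0.5 minutes instead of collapsing to 0, while a unitless "5" still means five minutes in both cases.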

We previously did the same with vacuum_cost_delay to avoid worries
about how to specify sub-millisecond precision for that. So the
infrastructure is already in place, I think. The patch will be
different from what is proposed but should need to touch pretty
much the same places.

regards, tom lane

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#20)
#22Fujii Masao
masao.fujii@gmail.com
In reply to: Tom Lane (#21)
#23Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#21)
#24Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#21)
#25Amit Kapila
amit.kapila16@gmail.com
In reply to: Laurenz Albe (#23)
#26Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Amit Kapila (#25)
#27Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Laurenz Albe (#26)
#28Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Hayato Kuroda (Fujitsu) (#27)
#29Fujii Masao
masao.fujii@gmail.com
In reply to: Laurenz Albe (#28)
#30Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#29)
#31Amit Kapila
amit.kapila16@gmail.com
In reply to: Laurenz Albe (#30)
#32Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#31)
#33Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Hayato Kuroda (Fujitsu) (#32)
#34Fujii Masao
masao.fujii@gmail.com
In reply to: Laurenz Albe (#33)
#35Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#34)
#36Fujii Masao
masao.fujii@gmail.com
In reply to: Laurenz Albe (#35)
#37Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#36)
#38Gunnar Morling
gunnar.morling@googlemail.com
In reply to: Laurenz Albe (#37)