Reject negative max_retention_duration values

Started by Chao Li15 days ago9 messageshackers

li.evan.chao@gmail.com

15 days ago

Hi,

While testing “[a850be2fe] Add max_retention_duration option to subscriptions”, I found a small problem where max_retention_duration accepts negative values.

See this repro:
```
evantest=# create subscription sub connection 'dbname=postgres' publication p with (connect=false, max_retention_duration=-100);
WARNING: subscription was created, but is not connected
HINT: To initiate replication, you must manually create the replication slot, enable the subscription, and alter the subscription to refresh publications.
CREATE SUBSCRIPTION
evantest=# select subname, submaxretention from pg_subscription where subname='sub';
subname | submaxretention
---------+-----------------
sub | -100
(1 row)
```

I gave it -100, and it was stored in pg_subscription.

I went through the docs and didn’t find anything mentioning whether a negative value is acceptable. But from the code, I think this indicates that a negative value is meaningless:
```
/*
* Ensure that system configuration parameters are set appropriately to
* support retain_dead_tuples and max_retention_duration.
*/
CheckSubDeadTupleRetention(true, !opts.enabled, WARNING,
opts.retaindeadtuples, opts.retaindeadtuples,
(opts.maxretention > 0));
```

The last parameter of CheckSubDeadTupleRetention checks (opts.maxretention > 0), which implies that max_retention_duration is only meaningful when it is greater than 0. Also, the docs say that 0 means “unlimited”.

From the following function, I think a negative value actually hurts:
```
/*
* Check whether conflict information retention should be stopped due to
* exceeding the maximum wait time (max_retention_duration).
*
* If retention should be stopped, return true. Otherwise, return false.
*/
static bool
should_stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)
{
TimestampTz now;

Assert(TransactionIdIsValid(rdt_data->candidate_xid));
Assert(rdt_data->phase == RDT_WAIT_FOR_PUBLISHER_STATUS ||
rdt_data->phase == RDT_WAIT_FOR_LOCAL_FLUSH);

if (!MySubscription->maxretention)
return false;

/*
* Use last_recv_time when applying changes in the loop to avoid
* unnecessary system time retrieval. If last_recv_time is not available,
* obtain the current timestamp.
*/
now = rdt_data->last_recv_time ? rdt_data->last_recv_time : GetCurrentTimestamp();

/*
* Return early if the wait time has not exceeded the configured maximum
* (max_retention_duration). Time spent waiting for table synchronization
* is excluded from this calculation, as it occurs infrequently.
*/
if (!TimestampDifferenceExceeds(rdt_data->candidate_xid_time, now,
MySubscription->maxretention +
rdt_data->table_sync_wait_time))
return false;

return true;
}
```

It only checks !MySubscription->maxretention, so a negative value will pass. Then, in TimestampDifferenceExceeds, the negative value is counted, causing the actual wait time to be shorter than the intended value.

So, I think we should reject negative values in the first place. The attached tiny patch rejects negative max_retention_duration.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Amit Kapila

amit.kapila16@gmail.com

14 days ago

In reply to: Chao Li (#1)

Re: Reject negative max_retention_duration values

On Tue, Jun 9, 2026 at 1:35 PM Chao Li <li.evan.chao@gmail.com> wrote:

```

It only checks !MySubscription->maxretention, so a negative value will pass. Then, in TimestampDifferenceExceeds, the negative value is counted, causing the actual wait time to be shorter than the intended value.

Thanks for the report and analysis. On first read, it sounds correct
to me, I'll spend some more time and share my feedback.

--
With Regards,
Amit Kapila.

Hayato Kuroda (Fujitsu)

kuroda.hayato@fujitsu.com

14 days ago

In reply to: Chao Li (#1)

RE: Reject negative max_retention_duration values

Dear Chao,

+1 the idea to reject the negative value.

```
@@ -4796,7 +4796,7 @@ should_stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)
Assert(rdt_data->phase == RDT_WAIT_FOR_PUBLISHER_STATUS ||
rdt_data->phase == RDT_WAIT_FOR_LOCAL_FLUSH);

-	if (!MySubscription->maxretention)
+	if (MySubscription->maxretention <= 0)
 		return false;
```

IIUC, the negative value can be rejected at CREATE/ALTER SUBSCRIPTION phase, right?
Why do we have to update even here? For the clarification purpose?

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Chao Li

li.evan.chao@gmail.com

14 days ago

In reply to: Hayato Kuroda (Fujitsu) (#3)

Re: Reject negative max_retention_duration values

On Jun 9, 2026, at 22:44, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:

Dear Chao,

+1 the idea to reject the negative value.

```
@@ -4796,7 +4796,7 @@ should_stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)
Assert(rdt_data->phase == RDT_WAIT_FOR_PUBLISHER_STATUS ||
rdt_data->phase == RDT_WAIT_FOR_LOCAL_FLUSH);
- if (!MySubscription->maxretention)
+ if (MySubscription->maxretention <= 0)
return false;
```
IIUC, the negative value can be rejected at CREATE/ALTER SUBSCRIPTION phase, right?
Why do we have to update even here? For the clarification purpose?

Thanks for your review.

Yes, this patch rejects negative values at CREATE/ALTER SUBSCRIPTION time, so in theory the if (MySubscription->maxretention <= 0) change is not strictly necessary. I made that change for a few reasons (from strong do weak):

* The new check is equivalent to if (!(MySubscription->maxretention > 0)), meaning MySubscription->maxretention is not meaningful for this function unless it is positive. So semantically, I think it is correct, or at least it does not hurt anything.
* The new check makes the purpose more explicit in this function: only positive values are meaningful here.
* It also keeps the worker logic self-contained: this function only performs timeout checks when max_retention_duration is positive, instead of relying entirely on option parsing to ensure that.
* Since v19beta1 has been released, there is a very small possibility that some databases, most likely test deployments, may already have negative max_retention_duration values, so this also acts as defensive coding. I understand beta1 can never be a production deployment, so this is just a weak reason.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Hayato Kuroda (Fujitsu)

kuroda.hayato@fujitsu.com

14 days ago

In reply to: Chao Li (#4)

RE: Reject negative max_retention_duration values

Dear Chao,

Yes, this patch rejects negative values at CREATE/ALTER SUBSCRIPTION time,
so in theory the if (MySubscription->maxretention <= 0) change is not strictly
necessary. I made that change for a few reasons (from strong do weak):

I personal preference is to use Assert() for detecting cannot-happen case,
but it's not very strong opinion. Let's see how others say.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Michael Paquier

michael@paquier.xyz

14 days ago

In reply to: Hayato Kuroda (Fujitsu) (#5)

Re: Reject negative max_retention_duration values

On Wed, Jun 10, 2026 at 05:12:26AM +0000, Hayato Kuroda (Fujitsu) wrote:

Dear Chao,

Yes, this patch rejects negative values at CREATE/ALTER SUBSCRIPTION time,
so in theory the if (MySubscription->maxretention <= 0) change is not strictly
necessary. I made that change for a few reasons (from strong do weak):

I personal preference is to use Assert() for detecting cannot-happen case,
but it's not very strong opinion. Let's see how others say.

An assertion offers less protection than an elog(ERROR) if a value is
read from catalogs, which could be the case here? Think for example
corrupted catalog data. (I did not read the patch in details, so I
may have missed something, of course, but I was under the impression
that this could apply for this case with MySubscription.)
--
Michael

Chao Li

li.evan.chao@gmail.com

14 days ago

In reply to: Michael Paquier (#6)

Re: Reject negative max_retention_duration values

On Jun 10, 2026, at 13:40, Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Jun 10, 2026 at 05:12:26AM +0000, Hayato Kuroda (Fujitsu) wrote:

Dear Chao,

Yes, this patch rejects negative values at CREATE/ALTER SUBSCRIPTION time,
so in theory the if (MySubscription->maxretention <= 0) change is not strictly
necessary. I made that change for a few reasons (from strong do weak):

I personal preference is to use Assert() for detecting cannot-happen case,
but it's not very strong opinion. Let's see how others say.

An assertion offers less protection than an elog(ERROR) if a value is
read from catalogs, which could be the case here? Think for example
corrupted catalog data. (I did not read the patch in details, so I
may have missed something, of course, but I was under the impression
that this could apply for this case with MySubscription.)
--
Michael

AFAIK, Assert is mostly for catching code bugs, so I don’t think it is the right tool here. I agree that an elog(ERROR) would catch an invalid catalog value at runtime.

However, looking at the nearly code where MySubscription->maxretention is read:

LogicalRepApplyLoop()
```
else if (MySubscription->maxretention > 0)
wait_time = Min(wait_time, MySubscription->maxretention);
```

adjust_xid_advance_interval()
```
if (MySubscription->retentionactive && MySubscription->maxretention > 0)
rdt_data->xid_advance_interval = Min(rdt_data->xid_advance_interval,
MySubscription->maxretention);
```

Both treat the timeout as active only when maxretention > 0. So making should_stop_conflict_info_retention() return false when maxretention <= 0 is consistent with the existing pattern. If we add an elog(ERROR) for MySubscription->maxretention < 0 here, then the question is why we don’t add the same check in the other places too.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Amit Kapila

amit.kapila16@gmail.com

13 days ago

In reply to: Chao Li (#7)

Re: Reject negative max_retention_duration values

On Wed, Jun 10, 2026 at 12:47 PM Chao Li <li.evan.chao@gmail.com> wrote:

On Jun 10, 2026, at 13:40, Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Jun 10, 2026 at 05:12:26AM +0000, Hayato Kuroda (Fujitsu) wrote:

Dear Chao,

Yes, this patch rejects negative values at CREATE/ALTER SUBSCRIPTION time,
so in theory the if (MySubscription->maxretention <= 0) change is not strictly
necessary. I made that change for a few reasons (from strong do weak):

I personal preference is to use Assert() for detecting cannot-happen case,
but it's not very strong opinion. Let's see how others say.

An assertion offers less protection than an elog(ERROR) if a value is
read from catalogs, which could be the case here? Think for example
corrupted catalog data. (I did not read the patch in details, so I
may have missed something, of course, but I was under the impression
that this could apply for this case with MySubscription.)
--
Michael

AFAIK, Assert is mostly for catching code bugs, so I don’t think it is the right tool here. I agree that an elog(ERROR) would catch an invalid catalog value at runtime.

However, looking at the nearly code where MySubscription->maxretention is read:

LogicalRepApplyLoop()
```
else if (MySubscription->maxretention > 0)
wait_time = Min(wait_time, MySubscription->maxretention);
```

adjust_xid_advance_interval()
```
if (MySubscription->retentionactive && MySubscription->maxretention > 0)
rdt_data->xid_advance_interval = Min(rdt_data->xid_advance_interval,
MySubscription->maxretention);
```

Both treat the timeout as active only when maxretention > 0. So making should_stop_conflict_info_retention() return false when maxretention <= 0 is consistent with the existing pattern. If we add an elog(ERROR) for MySubscription->maxretention < 0 here, then the question is why we don’t add the same check in the other places too.

You have made some good points for this defensive coding and I do see
value in those but I still feel we can leave the original code as-is.
The new people who try to understand this code area could question why
this < is required here similar to Kuroda-san, we can try to address
that via a comment but I feel there is no strong appeal for the same.
If you still want to make the case, you can extract that part of the
code and we can discuss it separately, if we reach consensus on the
same, I would be happy to commit the same.

--
With Regards,
Amit Kapila.

Chao Li

li.evan.chao@gmail.com

13 days ago

In reply to: Amit Kapila (#8)

Re: Reject negative max_retention_duration values

On Jun 10, 2026, at 18:31, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 10, 2026 at 12:47 PM Chao Li <li.evan.chao@gmail.com> wrote:

On Jun 10, 2026, at 13:40, Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Jun 10, 2026 at 05:12:26AM +0000, Hayato Kuroda (Fujitsu) wrote:

Dear Chao,

Yes, this patch rejects negative values at CREATE/ALTER SUBSCRIPTION time,
so in theory the if (MySubscription->maxretention <= 0) change is not strictly
necessary. I made that change for a few reasons (from strong do weak):

I personal preference is to use Assert() for detecting cannot-happen case,
but it's not very strong opinion. Let's see how others say.

An assertion offers less protection than an elog(ERROR) if a value is
read from catalogs, which could be the case here? Think for example
corrupted catalog data. (I did not read the patch in details, so I
may have missed something, of course, but I was under the impression
that this could apply for this case with MySubscription.)
--
Michael

AFAIK, Assert is mostly for catching code bugs, so I don’t think it is the right tool here. I agree that an elog(ERROR) would catch an invalid catalog value at runtime.

However, looking at the nearly code where MySubscription->maxretention is read:

LogicalRepApplyLoop()
```
else if (MySubscription->maxretention > 0)
wait_time = Min(wait_time, MySubscription->maxretention);
```

adjust_xid_advance_interval()
```
if (MySubscription->retentionactive && MySubscription->maxretention > 0)
rdt_data->xid_advance_interval = Min(rdt_data->xid_advance_interval,
MySubscription->maxretention);
```

Both treat the timeout as active only when maxretention > 0. So making should_stop_conflict_info_retention() return false when maxretention <= 0 is consistent with the existing pattern. If we add an elog(ERROR) for MySubscription->maxretention < 0 here, then the question is why we don’t add the same check in the other places too.

You have made some good points for this defensive coding and I do see
value in those but I still feel we can leave the original code as-is.
The new people who try to understand this code area could question why
this < is required here similar to Kuroda-san, we can try to address
that via a comment but I feel there is no strong appeal for the same.
If you still want to make the case, you can extract that part of the
code and we can discuss it separately, if we reach consensus on the
same, I would be happy to commit the same.

--
With Regards,
Amit Kapila.

I don’t have a strong opinion there. So I reverted the change from worker.c. Other parts are unchanged from v1.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Reject negative max_retention_duration values

Attachments:

Attachments: