BRIN minmax multi - incorrect distance for infinite timestamp/date

Started by Tomas Vondraover 2 years ago27 messages

tomas.vondra@enterprisedb.com

over 2 years ago

1 attachment(s)

Hi,

Ashutosh Bapat reported me off-list a possible issue in how BRIN
minmax-multi calculate distance for infinite timestamp/date values.

The current code does this:

if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
PG_RETURN_FLOAT8(0);

so means infinite values are "very close" to any other value, and thus
likely to be merged into a summary range. That's exactly the opposite of
what we want to do, possibly resulting in inefficient indexes.

Consider this example

create table test (a timestamptz) with (fillfactor=50);

insert into test
select (now() + ((10000 * random())::int || ' seconds')::interval)
from generate_series(1,1000000) s(i);

update test set a = '-infinity'::timestamptz where random() < 0.01;
update test set a = 'infinity'::timestamptz where random() < 0.01;

explain (analyze, timing off, costs off)
select * from test where a = '2024-01-01'::timestamptz;

QUERY PLAN

------------------------------------------------------------------------------
Bitmap Heap Scan on test (actual rows=0 loops=1)
Recheck Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone)
Rows Removed by Index Recheck: 680662
Heap Blocks: lossy=6024
-> Bitmap Index Scan on test_a_idx (actual rows=60240 loops=1)
Index Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time
zone)
Planning Time: 0.075 ms
Execution Time: 106.871 ms
(8 rows)

Clearly, large part of the table gets scanned - this happens because
when building the index, we end up with ranges like this:

[-infinity,a,b,c,...,x,y,z,infinity]

and we conclude that distance for [-infinity,a] is 0, and we combine
these values into a range. And the same for [z,infinity]. But we should
do exactly the opposite thing - never merge those.

Attached is a patch fixing this, with which the plan looks like this:

QUERY PLAN

------------------------------------------------------------------------------
Bitmap Heap Scan on test (actual rows=0 loops=1)
Recheck Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone)
-> Bitmap Index Scan on test_a_idx (actual rows=0 loops=1)
Index Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time
zone)
Planning Time: 0.289 ms
Execution Time: 9.432 ms
(6 rows)

Which seems much better.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

brin-infinity-fix.patchtext/x-patch; charset=UTF-8; name=brin-infinity-fix.patchDownload

diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index f8b2a3f9bc..c8775c274e 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2084,8 +2084,14 @@ brin_minmax_multi_distance_date(PG_FUNCTION_ARGS)
 	DateADT		dateVal1 = PG_GETARG_DATEADT(0);
 	DateADT		dateVal2 = PG_GETARG_DATEADT(1);
 
+	/*
+	 * If either value is infinite, we treat them as in infinite distance.
+	 * We deduplicate the values before calculating distances for them, so
+	 * either one value is finite, or the sign is different - so the
+	 * inifinite distance is appropriate for both cases.
+	 */
 	if (DATE_NOT_FINITE(dateVal1) || DATE_NOT_FINITE(dateVal2))
-		PG_RETURN_FLOAT8(0);
+		PG_RETURN_FLOAT8(get_float8_infinity());
 
 	PG_RETURN_FLOAT8(dateVal1 - dateVal2);
 }
@@ -2141,8 +2147,14 @@ brin_minmax_multi_distance_timestamp(PG_FUNCTION_ARGS)
 	Timestamp	dt1 = PG_GETARG_TIMESTAMP(0);
 	Timestamp	dt2 = PG_GETARG_TIMESTAMP(1);
 
+	/*
+	 * If either value is infinite, we treat them as in infinite distance.
+	 * We deduplicate the values before calculating distances for them, so
+	 * either one value is finite, or the sign is different - so the
+	 * inifinite distance is appropriate for both cases.
+	 */
 	if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
-		PG_RETURN_FLOAT8(0);
+		PG_RETURN_FLOAT8(get_float8_infinity());
 
 	delta = dt2 - dt1;

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Tomas Vondra (#1)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Thu, 12 Oct 2023 at 23:43, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Ashutosh Bapat reported me off-list a possible issue in how BRIN
minmax-multi calculate distance for infinite timestamp/date values.

The current code does this:

if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
PG_RETURN_FLOAT8(0);

Yes indeed, that looks wrong. I noticed the same thing while reviewing
the infinite interval patch.

so means infinite values are "very close" to any other value, and thus
likely to be merged into a summary range. That's exactly the opposite of
what we want to do, possibly resulting in inefficient indexes.

Is this only inefficient? Or can it also lead to wrong query results?

Attached is a patch fixing this

I wonder if it's actually necessary to give infinity any special
handling at all for dates and timestamps. For those types, "infinity"
is actually just INT_MIN/MAX, which compares correctly with any finite
value, and will be much larger/smaller than any common value, so it
seems like it isn't necessary to give "infinite" values any special
treatment. That would be consistent with date_cmp() and
timestamp_cmp().

Something else that looks wrong about that BRIN code is that the
integer subtraction might lead to overflow -- it is subtracting two
integer values, then casting the result to float8. It should cast each
input before subtracting, more like brin_minmax_multi_distance_int8().

IOW, I think brin_minmax_multi_distance_date/timestamp could be made
basically identical to brin_minmax_multi_distance_int4/8.

Regards,
Dean

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Dean Rasheed (#2)

1 attachment(s)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On 10/13/23 11:21, Dean Rasheed wrote:

On Thu, 12 Oct 2023 at 23:43, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Ashutosh Bapat reported me off-list a possible issue in how BRIN
minmax-multi calculate distance for infinite timestamp/date values.

The current code does this:

if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
PG_RETURN_FLOAT8(0);

Yes indeed, that looks wrong. I noticed the same thing while reviewing
the infinite interval patch.

so means infinite values are "very close" to any other value, and thus
likely to be merged into a summary range. That's exactly the opposite of
what we want to do, possibly resulting in inefficient indexes.

Is this only inefficient? Or can it also lead to wrong query results?

I don't think it can produce incorrect results. It only affects which
values we "merge" into an interval when building the summaries.

Attached is a patch fixing this

I wonder if it's actually necessary to give infinity any special
handling at all for dates and timestamps. For those types, "infinity"
is actually just INT_MIN/MAX, which compares correctly with any finite
value, and will be much larger/smaller than any common value, so it
seems like it isn't necessary to give "infinite" values any special
treatment. That would be consistent with date_cmp() and
timestamp_cmp().

Right, but ....

Something else that looks wrong about that BRIN code is that the
integer subtraction might lead to overflow -- it is subtracting two
integer values, then casting the result to float8. It should cast each
input before subtracting, more like brin_minmax_multi_distance_int8().

... it also needs to fix this, otherwise it overflows. Consider

delta = dt2 - dt1;

and assume dt1 is INT64_MIN, or that dt2 is INT64_MAX.

IOW, I think brin_minmax_multi_distance_date/timestamp could be made
basically identical to brin_minmax_multi_distance_int4/8.

Right. Attached is a patch doing it this way.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

brin-infinity-fix-v2.patchtext/x-patch; charset=UTF-8; name=brin-infinity-fix-v2.patchDownload

diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index f8b2a3f9bc6..3d5e6d0c8f6 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2084,10 +2084,13 @@ brin_minmax_multi_distance_date(PG_FUNCTION_ARGS)
 	DateADT		dateVal1 = PG_GETARG_DATEADT(0);
 	DateADT		dateVal2 = PG_GETARG_DATEADT(1);
 
-	if (DATE_NOT_FINITE(dateVal1) || DATE_NOT_FINITE(dateVal2))
-		PG_RETURN_FLOAT8(0);
+	/*
+	 * We know the values are range boundaries, but the range may be collapsed
+	 * (i.e. single points), with equal values.
+	 */
+	Assert(dateVal1 <= dateVal2);
 
-	PG_RETURN_FLOAT8(dateVal1 - dateVal2);
+	PG_RETURN_FLOAT8((double) dateVal2 - (double) dateVal1);
 }
 
 /*
@@ -2136,19 +2139,16 @@ brin_minmax_multi_distance_timetz(PG_FUNCTION_ARGS)
 Datum
 brin_minmax_multi_distance_timestamp(PG_FUNCTION_ARGS)
 {
-	float8		delta = 0;
-
 	Timestamp	dt1 = PG_GETARG_TIMESTAMP(0);
 	Timestamp	dt2 = PG_GETARG_TIMESTAMP(1);
 
-	if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
-		PG_RETURN_FLOAT8(0);
-
-	delta = dt2 - dt1;
-
-	Assert(delta >= 0);
+	/*
+	 * We know the values are range boundaries, but the range may be collapsed
+	 * (i.e. single points), with equal values.
+	 */
+	Assert(dt1 <= dt2);
 
-	PG_RETURN_FLOAT8(delta);
+	PG_RETURN_FLOAT8((double) dt2 - (double) dt1);
 }
 
 /*

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Tomas Vondra (#3)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Fri, 13 Oct 2023 at 11:44, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 10/13/23 11:21, Dean Rasheed wrote:

Is this only inefficient? Or can it also lead to wrong query results?

I don't think it can produce incorrect results. It only affects which
values we "merge" into an interval when building the summaries.

Ah, I get it now. These "distance" support functions are only used to
see how far apart 2 ranges are, for the purposes of the algorithm that
merges the 2 closest ranges. So if it gets it wrong, it only leads to
a poor choice of ranges to merge, making the query inefficient, but
still correct.

Presumably, that also makes this kind of change safe to back-patch
(not sure if you were planning to do that?), since it will only affect
range merging choices when inserting new values into existing indexes.

Regards,
Dean

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Dean Rasheed (#4)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On 10/13/23 14:04, Dean Rasheed wrote:

On Fri, 13 Oct 2023 at 11:44, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 10/13/23 11:21, Dean Rasheed wrote:

Is this only inefficient? Or can it also lead to wrong query results?

I don't think it can produce incorrect results. It only affects which
values we "merge" into an interval when building the summaries.

Ah, I get it now. These "distance" support functions are only used to
see how far apart 2 ranges are, for the purposes of the algorithm that
merges the 2 closest ranges. So if it gets it wrong, it only leads to
a poor choice of ranges to merge, making the query inefficient, but
still correct.

Right.

Presumably, that also makes this kind of change safe to back-patch
(not sure if you were planning to do that?), since it will only affect
range merging choices when inserting new values into existing indexes.

I do plan to backpatch this, yes. I don't think there are many people
affected by this (few people are using infinite dates/timestamps, but
maybe the overflow could be more common).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Tomas Vondra (#5)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Fri, 13 Oct 2023 at 13:17, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I do plan to backpatch this, yes. I don't think there are many people
affected by this (few people are using infinite dates/timestamps, but
maybe the overflow could be more common).

OK, though I doubt that such values are common in practice.

There's also an overflow problem in
brin_minmax_multi_distance_interval() though, and I think that's worse
because overflows there throw "interval out of range" errors, which
can prevent index creation or inserts.

There's a patch (0009 in [1]/messages/by-id/CAExHW5u1JE7dxK=WLzqhCszNToxQzJdieRmhREpW6r8w6kcRGQ@mail.gmail.com) as part of the infinite interval work,
but that just uses interval_mi(), and so doesn't fix the
interval-out-of-range errors, except for infinite intervals, which are
treated as special cases, which I don't think is really necessary.

I think this should be rewritten to compute delta from ia and ib
without going via an intermediate Interval value. I.e., instead of
computing "result", just do something like

dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
days += (int64) ib->day - (int64) ia->day;
days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);

then convert to double precision as it does now:

delta = (double) days + dayfraction / (double) USECS_PER_DAY;

So the first part is exact 64-bit integer arithmetic, with no chance
of overflow, and it'll handle "infinite" intervals just fine, when
that feature gets added.

Regards,
Dean

[1]: /messages/by-id/CAExHW5u1JE7dxK=WLzqhCszNToxQzJdieRmhREpW6r8w6kcRGQ@mail.gmail.com

Ashutosh Bapat

ashutosh.bapat.oss@gmail.com

about 2 years ago

In reply to: Dean Rasheed (#6)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

Thanks Tomas for bringing this discussion to hackers.

On Fri, Oct 13, 2023 at 8:58 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On Fri, 13 Oct 2023 at 13:17, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I do plan to backpatch this, yes. I don't think there are many people
affected by this (few people are using infinite dates/timestamps, but
maybe the overflow could be more common).

The example you gave is missing CREATE INDEX command. Is it "create
index test_idx_a on test using brin(a);"

Do already create indexes have this issue? Do they need to rebuilt
after upgrading?

OK, though I doubt that such values are common in practice.

There's also an overflow problem in
brin_minmax_multi_distance_interval() though, and I think that's worse
because overflows there throw "interval out of range" errors, which
can prevent index creation or inserts.

There's a patch (0009 in [1]) as part of the infinite interval work,
but that just uses interval_mi(), and so doesn't fix the
interval-out-of-range errors, except for infinite intervals, which are
treated as special cases, which I don't think is really necessary.

Right. I used interval_mi() to preserve the finite value behaviour as
is. But ...

I think this should be rewritten to compute delta from ia and ib
without going via an intermediate Interval value. I.e., instead of
computing "result", just do something like

dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
days += (int64) ib->day - (int64) ia->day;
days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);

then convert to double precision as it does now:

delta = (double) days + dayfraction / (double) USECS_PER_DAY;

Given Tomas's explanation of how these functions are supposed to work,
I think your suggestions is better.

I was worried that above calculations may not produce the same result
as the current code when there is no error because modulo and integer
division are not distributive over subtraction. But it looks like
together they behave as normal division which is distributive over
subtraction. I couldn't find an example where that is not true.

Tomas, you may want to incorporate this in your patch. If not, I will
incorporate it in my infinite interval patchset in [1]/messages/by-id/CAExHW5u1JE7dxK=WLzqhCszNToxQzJdieRmhREpW6r8w6kcRGQ@mail.gmail.com.

[1]: /messages/by-id/CAExHW5u1JE7dxK=WLzqhCszNToxQzJdieRmhREpW6r8w6kcRGQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Ashutosh Bapat (#7)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On 10/16/23 11:25, Ashutosh Bapat wrote:

Thanks Tomas for bringing this discussion to hackers.

On Fri, Oct 13, 2023 at 8:58 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On Fri, 13 Oct 2023 at 13:17, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I do plan to backpatch this, yes. I don't think there are many people
affected by this (few people are using infinite dates/timestamps, but
maybe the overflow could be more common).

The example you gave is missing CREATE INDEX command. Is it "create
index test_idx_a on test using brin(a);"

Ah, you're right - apologies. FWIW when testing I usually use 1-page
ranges, to possibly hit more combinations / problems. More importantly,
it needs to specify the opclass, otherwise it'll use the default minmax
opclass which does not use distance at all:

create index test_idx_a on test
using brin (a timestamptz_minmax_multi_ops) with (pages_per_range=1);

Do already create indexes have this issue? Do they need to rebuilt
after upgrading?

Yes, existing indexes will have inefficient ranges. I'm not sure we want
to push people to reindex everything, the issue seem somewhat unlikely
in practice.

OK, though I doubt that such values are common in practice.

There's also an overflow problem in
brin_minmax_multi_distance_interval() though, and I think that's worse
because overflows there throw "interval out of range" errors, which
can prevent index creation or inserts.

There's a patch (0009 in [1]) as part of the infinite interval work,
but that just uses interval_mi(), and so doesn't fix the
interval-out-of-range errors, except for infinite intervals, which are
treated as special cases, which I don't think is really necessary.

Right. I used interval_mi() to preserve the finite value behaviour as
is. But ...

I think this should be rewritten to compute delta from ia and ib
without going via an intermediate Interval value. I.e., instead of
computing "result", just do something like

dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
days += (int64) ib->day - (int64) ia->day;
days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);

then convert to double precision as it does now:

delta = (double) days + dayfraction / (double) USECS_PER_DAY;

Given Tomas's explanation of how these functions are supposed to work,
I think your suggestions is better.

I was worried that above calculations may not produce the same result
as the current code when there is no error because modulo and integer
division are not distributive over subtraction. But it looks like
together they behave as normal division which is distributive over
subtraction. I couldn't find an example where that is not true.

Tomas, you may want to incorporate this in your patch. If not, I will
incorporate it in my infinite interval patchset in [1].

I'd rather keep it as separate patch, although maybe let's deal with it
separately from the larger patches. It's a bug, and having it in a patch
set that adds a feature does not seem like a good idea (or maybe I don't
understand what the other thread does, I haven't looked very closely).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Ashutosh Bapat

ashutosh.bapat.oss@gmail.com

about 2 years ago

In reply to: Tomas Vondra (#8)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Mon, Oct 16, 2023 at 7:33 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 10/16/23 11:25, Ashutosh Bapat wrote:

Thanks Tomas for bringing this discussion to hackers.

On Fri, Oct 13, 2023 at 8:58 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On Fri, 13 Oct 2023 at 13:17, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I do plan to backpatch this, yes. I don't think there are many people
affected by this (few people are using infinite dates/timestamps, but
maybe the overflow could be more common).

The example you gave is missing CREATE INDEX command. Is it "create
index test_idx_a on test using brin(a);"

Ah, you're right - apologies. FWIW when testing I usually use 1-page
ranges, to possibly hit more combinations / problems. More importantly,
it needs to specify the opclass, otherwise it'll use the default minmax
opclass which does not use distance at all:

create index test_idx_a on test
using brin (a timestamptz_minmax_multi_ops) with (pages_per_range=1);

Thanks.

Do already create indexes have this issue? Do they need to rebuilt
after upgrading?

Yes, existing indexes will have inefficient ranges. I'm not sure we want
to push people to reindex everything, the issue seem somewhat unlikely
in practice.

If the column has infinity values only then they need to rebuild the
index. Such users may notice this bug fix in the release notes and
decide to rebuild the index themselves.

I think this should be rewritten to compute delta from ia and ib
without going via an intermediate Interval value. I.e., instead of
computing "result", just do something like

dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
days += (int64) ib->day - (int64) ia->day;
days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);

then convert to double precision as it does now:

delta = (double) days + dayfraction / (double) USECS_PER_DAY;

Given Tomas's explanation of how these functions are supposed to work,
I think your suggestions is better.

I was worried that above calculations may not produce the same result
as the current code when there is no error because modulo and integer
division are not distributive over subtraction. But it looks like
together they behave as normal division which is distributive over
subtraction. I couldn't find an example where that is not true.

Tomas, you may want to incorporate this in your patch. If not, I will
incorporate it in my infinite interval patchset in [1].

I'd rather keep it as separate patch, although maybe let's deal with it
separately from the larger patches. It's a bug, and having it in a patch
set that adds a feature does not seem like a good idea (or maybe I don't
understand what the other thread does, I haven't looked very closely).

If you incorporate these changes, I will need to remove 0009, which
mostly rewrites that function, from my patchset. If you don't, my
patch rewrites anyway. Either way is fine with me.

--
Best Wishes,
Ashutosh Bapat

#10

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Ashutosh Bapat (#9)

8 attachment(s)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

Hi,

Here's a couple cleaned-up patches fixing the various discussed here.
I've tried to always add a regression test demonstrating the issue
first, and then fix it in the next patch.

In particular, this deals with these issues:

1) overflows in distance calculation for large timestamp values (0002)

2) incorrect subtraction in distance for date values (0003)

3) incorrect distance for infinite date/timestamp values (0005)

4) failing distance for extreme interval values (0007)

All the problems except "2" have been discussed earlier, but this seems
a bit more serious than the other issues, as it's easier to hit. It
subtracts the values in the opposite order (smaller - larger), so the
distances are negated. Which means we actually merge the values from the
most distant ones, and thus are "guaranteed" to build very a very
inefficient summary. People with multi-minmax indexes on "date" columns
probably will need to reindex.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Tests-for-overflows-with-dates-and-timestamps-in-BRI.patchtext/x-patch; charset=UTF-8; name=0001-Tests-for-overflows-with-dates-and-timestamps-in-BRI.patchDownload

From e23ca4e53d352e05951fb314dea347682794c25b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue, 17 Oct 2023 18:18:53 +0200
Subject: [PATCH 1/8] Tests for overflows with dates and timestamps in BRIN

When calculating distances for date and timestamp values for BRIN
minmax-multi indexes, we need to be careful about overflows for extreme
values. In that case the distance is negative, resulting in building of
inefficient summaries.

The new regression tests check this for date and timestamp data types.
It adds tables with data close to the allowed min/max values, and builds
a minmax-multi index on it.
---
 src/test/regress/expected/brin_multi.out | 61 ++++++++++++++++++++++
 src/test/regress/sql/brin_multi.sql      | 65 ++++++++++++++++++++++++
 2 files changed, 126 insertions(+)

diff --git a/src/test/regress/expected/brin_multi.out b/src/test/regress/expected/brin_multi.out
index 9f46934c9be..d5bd600f8fd 100644
--- a/src/test/regress/expected/brin_multi.out
+++ b/src/test/regress/expected/brin_multi.out
@@ -823,3 +823,64 @@ SELECT COUNT(*) FROM brin_test_multi_2 WHERE a = 'aab32389-22bc-c25a-6f60-6eb525
 
 DROP TABLE brin_test_multi_2;
 RESET enable_seqscan;
+-- test overflows during CREATE INDEX with extreme timestamp values
+CREATE TABLE brin_timestamp_test(a TIMESTAMPTZ);
+SET datestyle TO iso;
+INSERT INTO brin_timestamp_test VALUES
+('4713-01-01 00:00:01 BC'), ('4713-01-01 00:00:02 BC'), ('4713-01-01 00:00:03 BC'),
+('4713-01-01 00:00:04 BC'), ('4713-01-01 00:00:05 BC'), ('4713-01-01 00:00:06 BC'),
+('4713-01-01 00:00:07 BC'), ('4713-01-01 00:00:08 BC'), ('4713-01-01 00:00:09 BC'),
+('4713-01-01 00:00:10 BC'), ('4713-01-01 00:00:11 BC'), ('4713-01-01 00:00:12 BC'),
+('4713-01-01 00:00:13 BC'), ('4713-01-01 00:00:14 BC'), ('4713-01-01 00:00:15 BC'),
+('4713-01-01 00:00:16 BC'), ('4713-01-01 00:00:17 BC'), ('4713-01-01 00:00:18 BC'),
+('4713-01-01 00:00:19 BC'), ('4713-01-01 00:00:20 BC'), ('4713-01-01 00:00:21 BC'),
+('4713-01-01 00:00:22 BC'), ('4713-01-01 00:00:23 BC'), ('4713-01-01 00:00:24 BC'),
+('4713-01-01 00:00:25 BC'), ('4713-01-01 00:00:26 BC'), ('4713-01-01 00:00:27 BC'),
+('4713-01-01 00:00:28 BC'), ('4713-01-01 00:00:29 BC'), ('4713-01-01 00:00:30 BC'),
+('294276-12-01 00:00:01'), ('294276-12-01 00:00:02'), ('294276-12-01 00:00:03'),
+('294276-12-01 00:00:04'), ('294276-12-01 00:00:05'), ('294276-12-01 00:00:06'),
+('294276-12-01 00:00:07'), ('294276-12-01 00:00:08'), ('294276-12-01 00:00:09'),
+('294276-12-01 00:00:10'), ('294276-12-01 00:00:11'), ('294276-12-01 00:00:12'),
+('294276-12-01 00:00:13'), ('294276-12-01 00:00:14'), ('294276-12-01 00:00:15'),
+('294276-12-01 00:00:16'), ('294276-12-01 00:00:17'), ('294276-12-01 00:00:18'),
+('294276-12-01 00:00:19'), ('294276-12-01 00:00:20'), ('294276-12-01 00:00:21'),
+('294276-12-01 00:00:22'), ('294276-12-01 00:00:23'), ('294276-12-01 00:00:24'),
+('294276-12-01 00:00:25'), ('294276-12-01 00:00:26'), ('294276-12-01 00:00:27'),
+('294276-12-01 00:00:28'), ('294276-12-01 00:00:29'), ('294276-12-01 00:00:30');
+CREATE INDEX ON brin_timestamp_test USING brin (a timestamptz_minmax_multi_ops) WITH (pages_per_range=1);
+DROP TABLE brin_timestamp_test;
+-- test overflows during CREATE INDEX with extreme date values
+CREATE TABLE brin_date_test(a DATE);
+INSERT INTO brin_date_test VALUES
+('4713-01-01 BC'), ('4713-01-02 BC'), ('4713-01-03 BC'), ('4713-01-04 BC'),
+('4713-01-05 BC'), ('4713-01-06 BC'), ('4713-01-07 BC'), ('4713-01-08 BC'),
+('4713-01-09 BC'), ('4713-01-10 BC'), ('4713-01-11 BC'), ('4713-01-12 BC'),
+('4713-01-13 BC'), ('4713-01-14 BC'), ('4713-01-15 BC'), ('4713-01-16 BC'),
+('4713-01-17 BC'), ('4713-01-18 BC'), ('4713-01-19 BC'), ('4713-01-20 BC'),
+('4713-01-21 BC'), ('4713-01-22 BC'), ('4713-01-23 BC'), ('4713-01-24 BC'),
+('4713-01-25 BC'), ('4713-01-26 BC'), ('4713-01-27 BC'), ('4713-01-28 BC'),
+('4713-01-29 BC'), ('4713-01-30 BC'), ('4713-01-31 BC'),
+('5874897-12-01'), ('5874897-12-02'), ('5874897-12-03'), ('5874897-12-04'),
+('5874897-12-05'), ('5874897-12-06'), ('5874897-12-07'), ('5874897-12-08'),
+('5874897-12-09'), ('5874897-12-10'), ('5874897-12-11'), ('5874897-12-12'),
+('5874897-12-13'), ('5874897-12-14'), ('5874897-12-15'), ('5874897-12-16'),
+('5874897-12-17'), ('5874897-12-18'), ('5874897-12-19'), ('5874897-12-20'),
+('5874897-12-21'), ('5874897-12-22'), ('5874897-12-23'), ('5874897-12-24'),
+('5874897-12-25'), ('5874897-12-26'), ('5874897-12-27'), ('5874897-12-28'),
+('5874897-12-29'), ('5874897-12-30'), ('5874897-12-31');
+CREATE INDEX ON brin_date_test USING brin (a date_minmax_multi_ops) WITH (pages_per_range=1);
+SET enable_seqscan = off;
+-- make sure the ranges were built correctly and 2023-01-01 eliminates all
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_date_test WHERE a = '2023-01-01'::date;
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
+ Bitmap Heap Scan on brin_date_test (actual rows=0 loops=1)
+   Recheck Cond: (a = '2023-01-01'::date)
+   ->  Bitmap Index Scan on brin_date_test_a_idx (actual rows=0 loops=1)
+         Index Cond: (a = '2023-01-01'::date)
+(4 rows)
+
+DROP TABLE brin_date_test;
+RESET enable_seqscan;
+RESET datestyle;
diff --git a/src/test/regress/sql/brin_multi.sql b/src/test/regress/sql/brin_multi.sql
index d50dbdee682..63d35eacd2d 100644
--- a/src/test/regress/sql/brin_multi.sql
+++ b/src/test/regress/sql/brin_multi.sql
@@ -586,3 +586,68 @@ SELECT COUNT(*) FROM brin_test_multi_2 WHERE a = 'aab32389-22bc-c25a-6f60-6eb525
 
 DROP TABLE brin_test_multi_2;
 RESET enable_seqscan;
+
+-- test overflows during CREATE INDEX with extreme timestamp values
+CREATE TABLE brin_timestamp_test(a TIMESTAMPTZ);
+
+SET datestyle TO iso;
+
+INSERT INTO brin_timestamp_test VALUES
+('4713-01-01 00:00:01 BC'), ('4713-01-01 00:00:02 BC'), ('4713-01-01 00:00:03 BC'),
+('4713-01-01 00:00:04 BC'), ('4713-01-01 00:00:05 BC'), ('4713-01-01 00:00:06 BC'),
+('4713-01-01 00:00:07 BC'), ('4713-01-01 00:00:08 BC'), ('4713-01-01 00:00:09 BC'),
+('4713-01-01 00:00:10 BC'), ('4713-01-01 00:00:11 BC'), ('4713-01-01 00:00:12 BC'),
+('4713-01-01 00:00:13 BC'), ('4713-01-01 00:00:14 BC'), ('4713-01-01 00:00:15 BC'),
+('4713-01-01 00:00:16 BC'), ('4713-01-01 00:00:17 BC'), ('4713-01-01 00:00:18 BC'),
+('4713-01-01 00:00:19 BC'), ('4713-01-01 00:00:20 BC'), ('4713-01-01 00:00:21 BC'),
+('4713-01-01 00:00:22 BC'), ('4713-01-01 00:00:23 BC'), ('4713-01-01 00:00:24 BC'),
+('4713-01-01 00:00:25 BC'), ('4713-01-01 00:00:26 BC'), ('4713-01-01 00:00:27 BC'),
+('4713-01-01 00:00:28 BC'), ('4713-01-01 00:00:29 BC'), ('4713-01-01 00:00:30 BC'),
+
+('294276-12-01 00:00:01'), ('294276-12-01 00:00:02'), ('294276-12-01 00:00:03'),
+('294276-12-01 00:00:04'), ('294276-12-01 00:00:05'), ('294276-12-01 00:00:06'),
+('294276-12-01 00:00:07'), ('294276-12-01 00:00:08'), ('294276-12-01 00:00:09'),
+('294276-12-01 00:00:10'), ('294276-12-01 00:00:11'), ('294276-12-01 00:00:12'),
+('294276-12-01 00:00:13'), ('294276-12-01 00:00:14'), ('294276-12-01 00:00:15'),
+('294276-12-01 00:00:16'), ('294276-12-01 00:00:17'), ('294276-12-01 00:00:18'),
+('294276-12-01 00:00:19'), ('294276-12-01 00:00:20'), ('294276-12-01 00:00:21'),
+('294276-12-01 00:00:22'), ('294276-12-01 00:00:23'), ('294276-12-01 00:00:24'),
+('294276-12-01 00:00:25'), ('294276-12-01 00:00:26'), ('294276-12-01 00:00:27'),
+('294276-12-01 00:00:28'), ('294276-12-01 00:00:29'), ('294276-12-01 00:00:30');
+
+CREATE INDEX ON brin_timestamp_test USING brin (a timestamptz_minmax_multi_ops) WITH (pages_per_range=1);
+DROP TABLE brin_timestamp_test;
+
+-- test overflows during CREATE INDEX with extreme date values
+CREATE TABLE brin_date_test(a DATE);
+
+INSERT INTO brin_date_test VALUES
+('4713-01-01 BC'), ('4713-01-02 BC'), ('4713-01-03 BC'), ('4713-01-04 BC'),
+('4713-01-05 BC'), ('4713-01-06 BC'), ('4713-01-07 BC'), ('4713-01-08 BC'),
+('4713-01-09 BC'), ('4713-01-10 BC'), ('4713-01-11 BC'), ('4713-01-12 BC'),
+('4713-01-13 BC'), ('4713-01-14 BC'), ('4713-01-15 BC'), ('4713-01-16 BC'),
+('4713-01-17 BC'), ('4713-01-18 BC'), ('4713-01-19 BC'), ('4713-01-20 BC'),
+('4713-01-21 BC'), ('4713-01-22 BC'), ('4713-01-23 BC'), ('4713-01-24 BC'),
+('4713-01-25 BC'), ('4713-01-26 BC'), ('4713-01-27 BC'), ('4713-01-28 BC'),
+('4713-01-29 BC'), ('4713-01-30 BC'), ('4713-01-31 BC'),
+
+('5874897-12-01'), ('5874897-12-02'), ('5874897-12-03'), ('5874897-12-04'),
+('5874897-12-05'), ('5874897-12-06'), ('5874897-12-07'), ('5874897-12-08'),
+('5874897-12-09'), ('5874897-12-10'), ('5874897-12-11'), ('5874897-12-12'),
+('5874897-12-13'), ('5874897-12-14'), ('5874897-12-15'), ('5874897-12-16'),
+('5874897-12-17'), ('5874897-12-18'), ('5874897-12-19'), ('5874897-12-20'),
+('5874897-12-21'), ('5874897-12-22'), ('5874897-12-23'), ('5874897-12-24'),
+('5874897-12-25'), ('5874897-12-26'), ('5874897-12-27'), ('5874897-12-28'),
+('5874897-12-29'), ('5874897-12-30'), ('5874897-12-31');
+
+CREATE INDEX ON brin_date_test USING brin (a date_minmax_multi_ops) WITH (pages_per_range=1);
+
+SET enable_seqscan = off;
+
+-- make sure the ranges were built correctly and 2023-01-01 eliminates all
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_date_test WHERE a = '2023-01-01'::date;
+
+DROP TABLE brin_date_test;
+RESET enable_seqscan;
+RESET datestyle;
-- 
2.41.0

0002-Fix-overflow-in-brin_minmax_multi_distance_timestamp.patchtext/x-patch; charset=UTF-8; name=0002-Fix-overflow-in-brin_minmax_multi_distance_timestamp.patchDownload

From 3b45932614c9b0d3afaeddd06d9f3f53a521ba32 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue, 17 Oct 2023 18:20:45 +0200
Subject: [PATCH 2/8] Fix overflow in brin_minmax_multi_distance_timestamp

When calculating the distance between timestamp values, make sure to
convert the int64 values to double first. This prevents an overflow when
calculating the difference.
---
 src/backend/access/brin/brin_minmax_multi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index f8b2a3f9bc6..8c72a0a0366 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2144,7 +2144,7 @@ brin_minmax_multi_distance_timestamp(PG_FUNCTION_ARGS)
 	if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
 		PG_RETURN_FLOAT8(0);
 
-	delta = dt2 - dt1;
+	delta = (float8) dt2 - (float8) dt1;
 
 	Assert(delta >= 0);
 
-- 
2.41.0

0003-Fix-calculation-in-brin_minmax_multi_distance_date.patchtext/x-patch; charset=UTF-8; name=0003-Fix-calculation-in-brin_minmax_multi_distance_date.patchDownload

From 48dc7638395512168c6beeeb38fa15037769540c Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue, 17 Oct 2023 18:31:29 +0200
Subject: [PATCH 3/8] Fix calculation in brin_minmax_multi_distance_date

When calculating the distance between dates, make sure to subtract the
values in the right order. This can't overflow, because the min/max
values are not sufficiently far away.
---
 src/backend/access/brin/brin_minmax_multi.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 8c72a0a0366..cadfb4481ef 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2081,13 +2081,18 @@ brin_minmax_multi_distance_uuid(PG_FUNCTION_ARGS)
 Datum
 brin_minmax_multi_distance_date(PG_FUNCTION_ARGS)
 {
+	float8		delta = 0;
 	DateADT		dateVal1 = PG_GETARG_DATEADT(0);
 	DateADT		dateVal2 = PG_GETARG_DATEADT(1);
 
 	if (DATE_NOT_FINITE(dateVal1) || DATE_NOT_FINITE(dateVal2))
 		PG_RETURN_FLOAT8(0);
 
-	PG_RETURN_FLOAT8(dateVal1 - dateVal2);
+	delta = (dateVal2 - dateVal1);
+
+	Assert(delta >= 0);
+
+	PG_RETURN_FLOAT8(delta);
 }
 
 /*
-- 
2.41.0

0004-Add-tests-for-inifite-date-timestamp-values.patchtext/x-patch; charset=UTF-8; name=0004-Add-tests-for-inifite-date-timestamp-values.patchDownload

From 7f608536adeaecc0817edfb81b85c91e09e543a9 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue, 17 Oct 2023 19:15:40 +0200
Subject: [PATCH 4/8] Add tests for inifite date/timestamp values

Make sure that infinite values in date/timestamp columns are treated as
if in infinite distance. This means the values should not be merged with
any other values, leaving them as outliers. The test queries a value in
the "gap" and checks the range was eliminated by the BRIN index.
---
 src/test/regress/expected/brin_multi.out | 72 ++++++++++++++++++++++++
 src/test/regress/sql/brin_multi.sql      | 54 ++++++++++++++++++
 2 files changed, 126 insertions(+)

diff --git a/src/test/regress/expected/brin_multi.out b/src/test/regress/expected/brin_multi.out
index d5bd600f8fd..fad0c536b9a 100644
--- a/src/test/regress/expected/brin_multi.out
+++ b/src/test/regress/expected/brin_multi.out
@@ -881,6 +881,78 @@ SELECT * FROM brin_date_test WHERE a = '2023-01-01'::date;
          Index Cond: (a = '2023-01-01'::date)
 (4 rows)
 
+DROP TABLE brin_date_test;
+RESET enable_seqscan;
+-- test handling of infinite timestamp values
+CREATE TABLE brin_timestamp_test(a TIMESTAMP);
+INSERT INTO brin_timestamp_test VALUES
+ ('-infinity'),   ('infinity'), ('2000-01-01'), ('2000-01-02'), ('2000-01-03'),
+('2000-01-04'), ('2000-01-05'), ('2000-01-06'), ('2000-01-07'), ('2000-01-08'),
+('2000-01-09'), ('2000-01-10'), ('2000-01-11'), ('2000-01-12'), ('2000-01-13'),
+('2000-01-14'), ('2000-01-15'), ('2000-01-16'), ('2000-01-17'), ('2000-01-18'),
+('2000-01-19'), ('2000-01-20'), ('2000-01-21'), ('2000-01-22'), ('2000-01-23'),
+('2000-01-24'), ('2000-01-25'), ('2000-01-26'), ('2000-01-27'), ('2000-01-28'),
+('2000-01-29'), ('2000-01-30'), ('2000-01-31'), ('2000-02-01'), ('2000-02-02'),
+('2000-02-03'), ('2000-02-04'), ('2000-02-05'), ('2000-02-06'), ('2000-02-07'),
+('2000-02-08'), ('2000-02-09');
+CREATE INDEX ON brin_timestamp_test USING brin (a timestamp_minmax_multi_ops) WITH (pages_per_range=1);
+SET enable_seqscan = off;
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_timestamp_test WHERE a = '2023-01-01'::timestamp;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Bitmap Heap Scan on brin_timestamp_test (actual rows=0 loops=1)
+   Recheck Cond: (a = '2023-01-01 00:00:00'::timestamp without time zone)
+   ->  Bitmap Index Scan on brin_timestamp_test_a_idx (actual rows=0 loops=1)
+         Index Cond: (a = '2023-01-01 00:00:00'::timestamp without time zone)
+(4 rows)
+
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_timestamp_test WHERE a = '1900-01-01'::timestamp;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Bitmap Heap Scan on brin_timestamp_test (actual rows=0 loops=1)
+   Recheck Cond: (a = '1900-01-01 00:00:00'::timestamp without time zone)
+   ->  Bitmap Index Scan on brin_timestamp_test_a_idx (actual rows=0 loops=1)
+         Index Cond: (a = '1900-01-01 00:00:00'::timestamp without time zone)
+(4 rows)
+
+DROP TABLE brin_timestamp_test;
+RESET enable_seqscan;
+-- test handling of infinite date values
+CREATE TABLE brin_date_test(a DATE);
+INSERT INTO brin_date_test VALUES
+('-infinity'),   ('infinity'), ('2000-01-01'), ('2000-01-02'), ('2000-01-03'),
+('2000-01-04'), ('2000-01-05'), ('2000-01-06'), ('2000-01-07'), ('2000-01-08'),
+('2000-01-09'), ('2000-01-10'), ('2000-01-11'), ('2000-01-12'), ('2000-01-13'),
+('2000-01-14'), ('2000-01-15'), ('2000-01-16'), ('2000-01-17'), ('2000-01-18'),
+('2000-01-19'), ('2000-01-20'), ('2000-01-21'), ('2000-01-22'), ('2000-01-23'),
+('2000-01-24'), ('2000-01-25'), ('2000-01-26'), ('2000-01-27'), ('2000-01-28'),
+('2000-01-29'), ('2000-01-30'), ('2000-01-31'), ('2000-02-01'), ('2000-02-02'),
+('2000-02-03'), ('2000-02-04'), ('2000-02-05'), ('2000-02-06'), ('2000-02-07'),
+('2000-02-08'), ('2000-02-09');
+CREATE INDEX ON brin_date_test USING brin (a date_minmax_multi_ops) WITH (pages_per_range=1);
+SET enable_seqscan = off;
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_date_test WHERE a = '2023-01-01'::date;
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
+ Bitmap Heap Scan on brin_date_test (actual rows=0 loops=1)
+   Recheck Cond: (a = '2023-01-01'::date)
+   ->  Bitmap Index Scan on brin_date_test_a_idx (actual rows=0 loops=1)
+         Index Cond: (a = '2023-01-01'::date)
+(4 rows)
+
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_date_test WHERE a = '1900-01-01'::date;
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
+ Bitmap Heap Scan on brin_date_test (actual rows=0 loops=1)
+   Recheck Cond: (a = '1900-01-01'::date)
+   ->  Bitmap Index Scan on brin_date_test_a_idx (actual rows=0 loops=1)
+         Index Cond: (a = '1900-01-01'::date)
+(4 rows)
+
 DROP TABLE brin_date_test;
 RESET enable_seqscan;
 RESET datestyle;
diff --git a/src/test/regress/sql/brin_multi.sql b/src/test/regress/sql/brin_multi.sql
index 63d35eacd2d..25f939325f9 100644
--- a/src/test/regress/sql/brin_multi.sql
+++ b/src/test/regress/sql/brin_multi.sql
@@ -648,6 +648,60 @@ SET enable_seqscan = off;
 EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
 SELECT * FROM brin_date_test WHERE a = '2023-01-01'::date;
 
+DROP TABLE brin_date_test;
+RESET enable_seqscan;
+
+-- test handling of infinite timestamp values
+CREATE TABLE brin_timestamp_test(a TIMESTAMP);
+
+INSERT INTO brin_timestamp_test VALUES
+ ('-infinity'),   ('infinity'), ('2000-01-01'), ('2000-01-02'), ('2000-01-03'),
+('2000-01-04'), ('2000-01-05'), ('2000-01-06'), ('2000-01-07'), ('2000-01-08'),
+('2000-01-09'), ('2000-01-10'), ('2000-01-11'), ('2000-01-12'), ('2000-01-13'),
+('2000-01-14'), ('2000-01-15'), ('2000-01-16'), ('2000-01-17'), ('2000-01-18'),
+('2000-01-19'), ('2000-01-20'), ('2000-01-21'), ('2000-01-22'), ('2000-01-23'),
+('2000-01-24'), ('2000-01-25'), ('2000-01-26'), ('2000-01-27'), ('2000-01-28'),
+('2000-01-29'), ('2000-01-30'), ('2000-01-31'), ('2000-02-01'), ('2000-02-02'),
+('2000-02-03'), ('2000-02-04'), ('2000-02-05'), ('2000-02-06'), ('2000-02-07'),
+('2000-02-08'), ('2000-02-09');
+
+CREATE INDEX ON brin_timestamp_test USING brin (a timestamp_minmax_multi_ops) WITH (pages_per_range=1);
+
+SET enable_seqscan = off;
+
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_timestamp_test WHERE a = '2023-01-01'::timestamp;
+
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_timestamp_test WHERE a = '1900-01-01'::timestamp;
+
+DROP TABLE brin_timestamp_test;
+RESET enable_seqscan;
+
+-- test handling of infinite date values
+CREATE TABLE brin_date_test(a DATE);
+
+INSERT INTO brin_date_test VALUES
+('-infinity'),   ('infinity'), ('2000-01-01'), ('2000-01-02'), ('2000-01-03'),
+('2000-01-04'), ('2000-01-05'), ('2000-01-06'), ('2000-01-07'), ('2000-01-08'),
+('2000-01-09'), ('2000-01-10'), ('2000-01-11'), ('2000-01-12'), ('2000-01-13'),
+('2000-01-14'), ('2000-01-15'), ('2000-01-16'), ('2000-01-17'), ('2000-01-18'),
+('2000-01-19'), ('2000-01-20'), ('2000-01-21'), ('2000-01-22'), ('2000-01-23'),
+('2000-01-24'), ('2000-01-25'), ('2000-01-26'), ('2000-01-27'), ('2000-01-28'),
+('2000-01-29'), ('2000-01-30'), ('2000-01-31'), ('2000-02-01'), ('2000-02-02'),
+('2000-02-03'), ('2000-02-04'), ('2000-02-05'), ('2000-02-06'), ('2000-02-07'),
+('2000-02-08'), ('2000-02-09');
+
+CREATE INDEX ON brin_date_test USING brin (a date_minmax_multi_ops) WITH (pages_per_range=1);
+
+SET enable_seqscan = off;
+
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_date_test WHERE a = '2023-01-01'::date;
+
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_date_test WHERE a = '1900-01-01'::date;
+
 DROP TABLE brin_date_test;
 RESET enable_seqscan;
 RESET datestyle;
-- 
2.41.0

0005-Fix-handling-of-infinity-date-timestamp-values.patchtext/x-patch; charset=UTF-8; name=0005-Fix-handling-of-infinity-date-timestamp-values.patchDownload

From 1a8109457a631e5da94182189d60e7f826387ce4 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue, 17 Oct 2023 19:19:38 +0200
Subject: [PATCH 5/8] Fix handling of infinity date/timestamp values

We don't need explicit handling of infinite date/timestamp values when
calculating distances, because those values are represented as extreme
but regular values (e.g. INT64_MIN/MAX for the timestamp type).

We don't need an exact distance, just a value that is much larger than
distanced between regular values. With the added cast to double values,
we can simply calculate the "regular" distance.
---
 src/backend/access/brin/brin_minmax_multi.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index cadfb4481ef..37706e5bf28 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2085,10 +2085,7 @@ brin_minmax_multi_distance_date(PG_FUNCTION_ARGS)
 	DateADT		dateVal1 = PG_GETARG_DATEADT(0);
 	DateADT		dateVal2 = PG_GETARG_DATEADT(1);
 
-	if (DATE_NOT_FINITE(dateVal1) || DATE_NOT_FINITE(dateVal2))
-		PG_RETURN_FLOAT8(0);
-
-	delta = (dateVal2 - dateVal1);
+	delta = (float8) dateVal2 - (float8) dateVal1;
 
 	Assert(delta >= 0);
 
@@ -2146,9 +2143,6 @@ brin_minmax_multi_distance_timestamp(PG_FUNCTION_ARGS)
 	Timestamp	dt1 = PG_GETARG_TIMESTAMP(0);
 	Timestamp	dt2 = PG_GETARG_TIMESTAMP(1);
 
-	if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
-		PG_RETURN_FLOAT8(0);
-
 	delta = (float8) dt2 - (float8) dt1;
 
 	Assert(delta >= 0);
-- 
2.41.0

0006-Add-test-for-BRIN-minmax-multi-with-extreme-interval.patchtext/x-patch; charset=UTF-8; name=0006-Add-test-for-BRIN-minmax-multi-with-extreme-interval.patchDownload

From aefd39fcb7be90aee8d55c23d71730e27ecf66b3 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue, 17 Oct 2023 20:32:33 +0200
Subject: [PATCH 6/8] Add test for BRIN minmax-multi with extreme interval
 values

Tries to build index on interval columns with extreme values, to make
sure the distance calculation works.
---
 src/test/regress/expected/brin_multi.out | 21 +++++++++++++++++++++
 src/test/regress/sql/brin_multi.sql      | 24 ++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/src/test/regress/expected/brin_multi.out b/src/test/regress/expected/brin_multi.out
index fad0c536b9a..e07042cd8a1 100644
--- a/src/test/regress/expected/brin_multi.out
+++ b/src/test/regress/expected/brin_multi.out
@@ -956,3 +956,24 @@ SELECT * FROM brin_date_test WHERE a = '1900-01-01'::date;
 DROP TABLE brin_date_test;
 RESET enable_seqscan;
 RESET datestyle;
+-- test handling of overflow for interval values
+CREATE TABLE brin_interval_test(a INTERVAL);
+INSERT INTO brin_interval_test VALUES
+('-177999985 years'), ('177999985 years'),
+('-177999986 years'), ('177999986 years'),
+('-177999987 years'), ('177999987 years'),
+('-177999988 years'), ('177999988 years'),
+('-177999989 years'), ('177999989 years'),
+('-177999990 years'), ('177999990 years'),
+('-177999991 years'), ('177999991 years'),
+('-177999992 years'), ('177999992 years'),
+('-177999993 years'), ('177999993 years'),
+('-177999994 years'), ('177999994 years'),
+('-177999995 years'), ('177999995 years'),
+('-177999996 years'), ('177999996 years'),
+('-177999997 years'), ('177999997 years'),
+('-177999998 years'), ('177999998 years'),
+('-177999999 years'), ('177999999 years'),
+('-178000000 years'), ('178000000 years');
+CREATE INDEX ON brin_interval_test USING brin (a interval_minmax_multi_ops) WITH (pages_per_range=1);
+DROP TABLE brin_interval_test;
diff --git a/src/test/regress/sql/brin_multi.sql b/src/test/regress/sql/brin_multi.sql
index 25f939325f9..302b41e7fd0 100644
--- a/src/test/regress/sql/brin_multi.sql
+++ b/src/test/regress/sql/brin_multi.sql
@@ -705,3 +705,27 @@ SELECT * FROM brin_date_test WHERE a = '1900-01-01'::date;
 DROP TABLE brin_date_test;
 RESET enable_seqscan;
 RESET datestyle;
+
+-- test handling of overflow for interval values
+CREATE TABLE brin_interval_test(a INTERVAL);
+
+INSERT INTO brin_interval_test VALUES
+('-177999985 years'), ('177999985 years'),
+('-177999986 years'), ('177999986 years'),
+('-177999987 years'), ('177999987 years'),
+('-177999988 years'), ('177999988 years'),
+('-177999989 years'), ('177999989 years'),
+('-177999990 years'), ('177999990 years'),
+('-177999991 years'), ('177999991 years'),
+('-177999992 years'), ('177999992 years'),
+('-177999993 years'), ('177999993 years'),
+('-177999994 years'), ('177999994 years'),
+('-177999995 years'), ('177999995 years'),
+('-177999996 years'), ('177999996 years'),
+('-177999997 years'), ('177999997 years'),
+('-177999998 years'), ('177999998 years'),
+('-177999999 years'), ('177999999 years'),
+('-178000000 years'), ('178000000 years');
+
+CREATE INDEX ON brin_interval_test USING brin (a interval_minmax_multi_ops) WITH (pages_per_range=1);
+DROP TABLE brin_interval_test;
-- 
2.41.0

0007-Fix-distance-calculation-for-extreme-interval-values.patchtext/x-patch; charset=UTF-8; name=0007-Fix-distance-calculation-for-extreme-interval-values.patchDownload

From 40566b81ff8d34f9a400d9b426b0301e8fbd577f Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue, 17 Oct 2023 20:38:01 +0200
Subject: [PATCH 7/8] Fix distance calculation for extreme interval values

Make sure we can calculate distance even for extreme interval values,
which for extreme values was triggering 'interval out of range' errors.
Instead of building a new interval, calculate the distance directly.
---
 src/backend/access/brin/brin_minmax_multi.c | 33 +++------------------
 1 file changed, 4 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 37706e5bf28..9811451b542 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2160,45 +2160,20 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
 
 	Interval   *ia = PG_GETARG_INTERVAL_P(0);
 	Interval   *ib = PG_GETARG_INTERVAL_P(1);
-	Interval   *result;
 
 	int64		dayfraction;
 	int64		days;
 
-	result = (Interval *) palloc(sizeof(Interval));
-
-	result->month = ib->month - ia->month;
-	/* overflow check copied from int4mi */
-	if (!SAMESIGN(ib->month, ia->month) &&
-		!SAMESIGN(result->month, ib->month))
-		ereport(ERROR,
-				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
-				 errmsg("interval out of range")));
-
-	result->day = ib->day - ia->day;
-	if (!SAMESIGN(ib->day, ia->day) &&
-		!SAMESIGN(result->day, ib->day))
-		ereport(ERROR,
-				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
-				 errmsg("interval out of range")));
-
-	result->time = ib->time - ia->time;
-	if (!SAMESIGN(ib->time, ia->time) &&
-		!SAMESIGN(result->time, ib->time))
-		ereport(ERROR,
-				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
-				 errmsg("interval out of range")));
-
 	/*
 	 * Delta is (fractional) number of days between the intervals. Assume
 	 * months have 30 days for consistency with interval_cmp_internal. We
 	 * don't need to be exact, in the worst case we'll build a bit less
 	 * efficient ranges. But we should not contradict interval_cmp.
 	 */
-	dayfraction = result->time % USECS_PER_DAY;
-	days = result->time / USECS_PER_DAY;
-	days += result->month * INT64CONST(30);
-	days += result->day;
+	dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
+	days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
+	days += (int64) ib->day - (int64) ia->day;
+	days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);
 
 	/* convert to double precision */
 	delta = (double) days + dayfraction / (double) USECS_PER_DAY;
-- 
2.41.0

0008-Add-more-tests-for-BRIN-on-interval-values.patchtext/x-patch; charset=UTF-8; name=0008-Add-more-tests-for-BRIN-on-interval-values.patchDownload

From 401bed518f89454eac4bf0a8b871ccae57cd53f9 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue, 17 Oct 2023 20:49:01 +0200
Subject: [PATCH 8/8] Add more tests for BRIN on interval values

Make sure we don't build inefficient ranges on large interval values.
---
 src/test/regress/expected/brin_multi.out | 49 ++++++++++++++++++++++++
 src/test/regress/sql/brin_multi.sql      | 40 +++++++++++++++++++
 2 files changed, 89 insertions(+)

diff --git a/src/test/regress/expected/brin_multi.out b/src/test/regress/expected/brin_multi.out
index e07042cd8a1..d361b344d42 100644
--- a/src/test/regress/expected/brin_multi.out
+++ b/src/test/regress/expected/brin_multi.out
@@ -977,3 +977,52 @@ INSERT INTO brin_interval_test VALUES
 ('-178000000 years'), ('178000000 years');
 CREATE INDEX ON brin_interval_test USING brin (a interval_minmax_multi_ops) WITH (pages_per_range=1);
 DROP TABLE brin_interval_test;
+-- test handling of long intervals
+CREATE TABLE brin_interval_test(a INTERVAL);
+INSERT INTO brin_interval_test VALUES
+('-1 years'), ('1 years'),
+('-2 years'), ('2 years'),
+('-3 years'), ('3 years'),
+('-4 years'), ('4 years'),
+('-5 years'), ('5 years'),
+('-6 years'), ('6 years'),
+('-7 years'), ('7 years'),
+('-8 years'), ('8 years'),
+('-9 years'), ('9 years'),
+('-10 years'), ('10 years'),
+('-11 years'), ('11 years'),
+('-12 years'), ('12 years'),
+('-13 years'), ('13 years'),
+('-14 years'), ('14 years'),
+('-15 years'), ('15 years'),
+('-16 years'), ('16 years'),
+('-17 years'), ('17 years'),
+('-18 years'), ('18 years'),
+('-19 years'), ('19 years'),
+('-20 years'), ('20 years'),
+('-178000000 years'), ('178000000 years');
+CREATE INDEX ON brin_interval_test USING brin (a interval_minmax_multi_ops) WITH (pages_per_range=1);
+SET enable_seqscan = off;
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_interval_test WHERE a = '-30 years'::interval;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Bitmap Heap Scan on brin_interval_test (actual rows=0 loops=1)
+   Recheck Cond: (a = '@ 30 years ago'::interval)
+   ->  Bitmap Index Scan on brin_interval_test_a_idx (actual rows=0 loops=1)
+         Index Cond: (a = '@ 30 years ago'::interval)
+(4 rows)
+
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_interval_test WHERE a = '30 years'::interval;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Bitmap Heap Scan on brin_interval_test (actual rows=0 loops=1)
+   Recheck Cond: (a = '@ 30 years'::interval)
+   ->  Bitmap Index Scan on brin_interval_test_a_idx (actual rows=0 loops=1)
+         Index Cond: (a = '@ 30 years'::interval)
+(4 rows)
+
+DROP TABLE brin_interval_test;
+RESET enable_seqscan;
+RESET datestyle;
diff --git a/src/test/regress/sql/brin_multi.sql b/src/test/regress/sql/brin_multi.sql
index 302b41e7fd0..e22ce0c138b 100644
--- a/src/test/regress/sql/brin_multi.sql
+++ b/src/test/regress/sql/brin_multi.sql
@@ -729,3 +729,43 @@ INSERT INTO brin_interval_test VALUES
 
 CREATE INDEX ON brin_interval_test USING brin (a interval_minmax_multi_ops) WITH (pages_per_range=1);
 DROP TABLE brin_interval_test;
+
+-- test handling of long intervals
+CREATE TABLE brin_interval_test(a INTERVAL);
+
+INSERT INTO brin_interval_test VALUES
+('-1 years'), ('1 years'),
+('-2 years'), ('2 years'),
+('-3 years'), ('3 years'),
+('-4 years'), ('4 years'),
+('-5 years'), ('5 years'),
+('-6 years'), ('6 years'),
+('-7 years'), ('7 years'),
+('-8 years'), ('8 years'),
+('-9 years'), ('9 years'),
+('-10 years'), ('10 years'),
+('-11 years'), ('11 years'),
+('-12 years'), ('12 years'),
+('-13 years'), ('13 years'),
+('-14 years'), ('14 years'),
+('-15 years'), ('15 years'),
+('-16 years'), ('16 years'),
+('-17 years'), ('17 years'),
+('-18 years'), ('18 years'),
+('-19 years'), ('19 years'),
+('-20 years'), ('20 years'),
+('-178000000 years'), ('178000000 years');
+
+CREATE INDEX ON brin_interval_test USING brin (a interval_minmax_multi_ops) WITH (pages_per_range=1);
+
+SET enable_seqscan = off;
+
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_interval_test WHERE a = '-30 years'::interval;
+
+EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF)
+SELECT * FROM brin_interval_test WHERE a = '30 years'::interval;
+
+DROP TABLE brin_interval_test;
+RESET enable_seqscan;
+RESET datestyle;
-- 
2.41.0

#11

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Tomas Vondra (#10)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On 10/17/23 22:25, Tomas Vondra wrote:

Hi,

Here's a couple cleaned-up patches fixing the various discussed here.
I've tried to always add a regression test demonstrating the issue
first, and then fix it in the next patch.

In particular, this deals with these issues:

1) overflows in distance calculation for large timestamp values (0002)

2) incorrect subtraction in distance for date values (0003)

3) incorrect distance for infinite date/timestamp values (0005)

4) failing distance for extreme interval values (0007)

All the problems except "2" have been discussed earlier, but this seems
a bit more serious than the other issues, as it's easier to hit. It
subtracts the values in the opposite order (smaller - larger), so the
distances are negated. Which means we actually merge the values from the
most distant ones, and thus are "guaranteed" to build very a very
inefficient summary. People with multi-minmax indexes on "date" columns
probably will need to reindex.

BTW when adding the tests with extreme values, I noticed this:

test=# select '5874897-01-01'::date;
date
---------------
5874897-01-01
(1 row)

test=# select '5874897-01-01'::date + '1 second'::interval;
ERROR: date out of range for timestamp

IIUC this happens because the first thing date_pl_interval does is
date2timestamp, ignoring the fact that the ranges of those data types
are different - dates allow values up to '5874897 AD' while timestamps
only allows values up to '294276 AD'.

This seems to be a long-standing behavior, added by a9e08392dd6f in
2004. Not sure how serious it is, I just noticed when I tried to do
arithmetics on the extreme values in tests.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Tomas Vondra (#10)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Tue, 17 Oct 2023 at 21:25, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Here's a couple cleaned-up patches fixing the various discussed here.
I've tried to always add a regression test demonstrating the issue
first, and then fix it in the next patch.

This looks good to me.

2) incorrect subtraction in distance for date values (0003)

All the problems except "2" have been discussed earlier, but this seems
a bit more serious than the other issues, as it's easier to hit. It
subtracts the values in the opposite order (smaller - larger), so the
distances are negated. Which means we actually merge the values from the
most distant ones, and thus are "guaranteed" to build very a very
inefficient summary.

Yeah, that's not good. Amusingly this accidentally made infinite dates
behave correctly, since they were distance 0 away from anything else,
which was larger than all the other negative distances! But yes, that
needed fixing properly.

Thanks for taking care of this.

Regards,
Dean

#13

Ashutosh Bapat

ashutosh.bapat.oss@gmail.com

about 2 years ago

In reply to: Tomas Vondra (#10)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Wed, Oct 18, 2023 at 1:55 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

Here's a couple cleaned-up patches fixing the various discussed here.
I've tried to always add a regression test demonstrating the issue
first, and then fix it in the next patch.

It will be good to commit the test changes as well.

In particular, this deals with these issues:

1) overflows in distance calculation for large timestamp values (0002)

I could reduce the SQL for timestamp overflow test to just
-- test overflows during CREATE INDEX with extreme timestamp values
CREATE TEMPORARY TABLE brin_timestamp_test(a TIMESTAMPTZ);

SET datestyle TO iso;

INSERT INTO brin_timestamp_test VALUES
('4713-01-01 00:00:30 BC'),
('294276-12-01 00:00:01');

CREATE INDEX ON brin_timestamp_test USING brin (a
timestamptz_minmax_multi_ops) WITH (pages_per_range=1);

I didn't understand the purpose of adding 60 odd values to the table.
It didn't tell which of those values triggers the overflow. Minimal
set above is much easier to understand IMO. Using a temporary table
just avoids DROP TABLE statement. But I am ok if you want to use
non-temporary table with DROP.

Code changes in 0002 look fine. Do we want to add a comment "cast to a
wider datatype to avoid overflow"? Or is that too explicit?

The code changes fix the timestamp issue but there's a diff in case of

2) incorrect subtraction in distance for date values (0003)

The test case for date brin index didn't crash though. Even after
applying 0003 patch. The reason why date subtraction can't overflow is
a bit obscure. PostgreSQL doesn't allow dates beyond 4714-12-31 BC
because of the code below
#define IS_VALID_DATE(d) \
((DATETIME_MIN_JULIAN - POSTGRES_EPOCH_JDATE) <= (d) && \
(d) < (DATE_END_JULIAN - POSTGRES_EPOCH_JDATE))
This prevents the lower side to be well within the negative int32
overflow threshold and we always subtract higher value from the lower
one. May be good to elaborate this? A later patch does use float 8
casting eliminating "any" possibility of overflow. So the comment may
not be necessary after squashing all the changes.

3) incorrect distance for infinite date/timestamp values (0005)

The tests could use a minimal set of rows here too.

The code changes look fine and fix the problem seen with the tests alone.

4) failing distance for extreme interval values (0007)

I could reproduce the issue with a minimal set of values
-- test handling of overflow for interval values
CREATE TABLE brin_interval_test(a INTERVAL);

INSERT INTO brin_interval_test VALUES
('177999985 years'),
('-178000000 years');

CREATE INDEX ON brin_interval_test USING brin (a
interval_minmax_multi_ops) WITH (pages_per_range=1);
DROP TABLE brin_interval_test;

The code looks fine and fixed the issue seen with the test.

We may want to combine various test cases though. Like the test adding
infinity and extreme values could be combined. Also the number of
values it inserts in the table for the reasons stated above.

All the problems except "2" have been discussed earlier, but this seems
a bit more serious than the other issues, as it's easier to hit. It
subtracts the values in the opposite order (smaller - larger), so the
distances are negated. Which means we actually merge the values from the
most distant ones, and thus are "guaranteed" to build very a very
inefficient summary. People with multi-minmax indexes on "date" columns
probably will need to reindex.

Right. Do we highlight that in the commit message so that the person
writing release notes picks it up from there?

--
Best Wishes,
Ashutosh Bapat

#14

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Tomas Vondra (#11)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Wed, 18 Oct 2023 at 09:16, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

BTW when adding the tests with extreme values, I noticed this:

test=# select '5874897-01-01'::date;
date
---------------
5874897-01-01
(1 row)

test=# select '5874897-01-01'::date + '1 second'::interval;
ERROR: date out of range for timestamp

That's correct because date + interval returns timestamp, and the
value is out of range for a timestamp. This is equivalent to:

select '5874897-01-01'::date::timestamp + '1 second'::interval;
ERROR: date out of range for timestamp

and I think it's good that it gives a different error from this:

select '294276-01-01'::date::timestamp + '1 year'::interval;
ERROR: timestamp out of range

so you can tell that the overflow in the first case happens before the addition.

Of course a side effect of internally casting first is that you can't
do things like this:

select '5874897-01-01'::date - '5872897 years'::interval;
ERROR: date out of range for timestamp

which arguably ought to return '2000-01-01 00:00:00'. In practice
though, I think it would be far more trouble than it's worth trying to
change that.

Regards,
Dean

#15

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Dean Rasheed (#12)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On 10/18/23 12:13, Dean Rasheed wrote:

On Tue, 17 Oct 2023 at 21:25, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Here's a couple cleaned-up patches fixing the various discussed here.
I've tried to always add a regression test demonstrating the issue
first, and then fix it in the next patch.

This looks good to me.

2) incorrect subtraction in distance for date values (0003)

All the problems except "2" have been discussed earlier, but this seems
a bit more serious than the other issues, as it's easier to hit. It
subtracts the values in the opposite order (smaller - larger), so the
distances are negated. Which means we actually merge the values from the
most distant ones, and thus are "guaranteed" to build very a very
inefficient summary.

Yeah, that's not good. Amusingly this accidentally made infinite dates
behave correctly, since they were distance 0 away from anything else,
which was larger than all the other negative distances! But yes, that
needed fixing properly.

Right. Apparently two wrongs can make a right, after all ;-)

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Ashutosh Bapat (#13)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On 10/18/23 12:47, Ashutosh Bapat wrote:

On Wed, Oct 18, 2023 at 1:55 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

Here's a couple cleaned-up patches fixing the various discussed here.
I've tried to always add a regression test demonstrating the issue
first, and then fix it in the next patch.

It will be good to commit the test changes as well.

I do plan to commit them, ofc. I was just explaining why I'm adding the
tests first, and then fixing the issue in a separate commit.

In particular, this deals with these issues:

1) overflows in distance calculation for large timestamp values (0002)

I could reduce the SQL for timestamp overflow test to just
-- test overflows during CREATE INDEX with extreme timestamp values
CREATE TEMPORARY TABLE brin_timestamp_test(a TIMESTAMPTZ);

SET datestyle TO iso;

INSERT INTO brin_timestamp_test VALUES
('4713-01-01 00:00:30 BC'),
('294276-12-01 00:00:01');

CREATE INDEX ON brin_timestamp_test USING brin (a
timestamptz_minmax_multi_ops) WITH (pages_per_range=1);

I didn't understand the purpose of adding 60 odd values to the table.
It didn't tell which of those values triggers the overflow. Minimal
set above is much easier to understand IMO. Using a temporary table
just avoids DROP TABLE statement. But I am ok if you want to use
non-temporary table with DROP.

Code changes in 0002 look fine. Do we want to add a comment "cast to a
wider datatype to avoid overflow"? Or is that too explicit?

The code changes fix the timestamp issue but there's a diff in case of

I did use that many values to actually force "compaction" and merging of
points into ranges. We only keep 32 values per page range, so with 2
values we'll not build a range. You're right it may still trigger the
overflow (we probably still calculate distances, I didn't realize that),
but without the compaction we can't check the query plans.

However, I agree 60 values may be a bit too much. And I realized we can
reduce the count quite a bit by using the values_per_range option. We
could set it to 8 (which is the minimum).

2) incorrect subtraction in distance for date values (0003)

The test case for date brin index didn't crash though. Even after
applying 0003 patch. The reason why date subtraction can't overflow is
a bit obscure. PostgreSQL doesn't allow dates beyond 4714-12-31 BC
because of the code below
#define IS_VALID_DATE(d) \
((DATETIME_MIN_JULIAN - POSTGRES_EPOCH_JDATE) <= (d) && \
(d) < (DATE_END_JULIAN - POSTGRES_EPOCH_JDATE))
This prevents the lower side to be well within the negative int32
overflow threshold and we always subtract higher value from the lower
one. May be good to elaborate this? A later patch does use float 8
casting eliminating "any" possibility of overflow. So the comment may
not be necessary after squashing all the changes.

Not sure what you mean by "crash". Yes, it doesn't hit an assert,
because there's none when calculating distance for date. It however
should fail in the query plan check due to the incorrect order of
subtractions.

Also, the commit message does not claim to fix overflow. In fact, it
says it can't overflow ...

3) incorrect distance for infinite date/timestamp values (0005)

The tests could use a minimal set of rows here too.

The code changes look fine and fix the problem seen with the tests alone.

4) failing distance for extreme interval values (0007)

I could reproduce the issue with a minimal set of values
-- test handling of overflow for interval values
CREATE TABLE brin_interval_test(a INTERVAL);

INSERT INTO brin_interval_test VALUES
('177999985 years'),
('-178000000 years');

CREATE INDEX ON brin_interval_test USING brin (a
interval_minmax_multi_ops) WITH (pages_per_range=1);
DROP TABLE brin_interval_test;

The code looks fine and fixed the issue seen with the test.

We may want to combine various test cases though. Like the test adding
infinity and extreme values could be combined. Also the number of
values it inserts in the table for the reasons stated above.

I prefer not to do that. I find it more comprehensible to keep the tests
separate / testing different things. If the tests were expensive to
setup or something like that, that'd be a different situation.

All the problems except "2" have been discussed earlier, but this seems
a bit more serious than the other issues, as it's easier to hit. It
subtracts the values in the opposite order (smaller - larger), so the
distances are negated. Which means we actually merge the values from the
most distant ones, and thus are "guaranteed" to build very a very
inefficient summary. People with multi-minmax indexes on "date" columns
probably will need to reindex.

Right. Do we highlight that in the commit message so that the person
writing release notes picks it up from there?

Yes, I think I'll mention what impact each issue can have on indexes.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17

Ashutosh Bapat

ashutosh.bapat.oss@gmail.com

about 2 years ago

In reply to: Tomas Vondra (#16)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Wed, Oct 18, 2023 at 8:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I did use that many values to actually force "compaction" and merging of
points into ranges. We only keep 32 values per page range, so with 2
values we'll not build a range. You're right it may still trigger the
overflow (we probably still calculate distances, I didn't realize that),
but without the compaction we can't check the query plans.

However, I agree 60 values may be a bit too much. And I realized we can
reduce the count quite a bit by using the values_per_range option. We
could set it to 8 (which is the minimum).

I haven't read BRIN code, except the comments in the beginning of the
file. From what you describe it seems we will store first 32 values as
is, but later as the number of values grow create ranges from those?
Please point me to the relevant source code/documentation. Anyway, if
we can reduce the number of rows we insert, that will be good.

Not sure what you mean by "crash". Yes, it doesn't hit an assert,
because there's none when calculating distance for date. It however
should fail in the query plan check due to the incorrect order of
subtractions.

Also, the commit message does not claim to fix overflow. In fact, it
says it can't overflow ...

Reading the commit message
"Tests for overflows with dates and timestamps in BRIN ...

...

The new regression tests check this for date and timestamp data types.
It adds tables with data close to the allowed min/max values, and builds
a minmax-multi index on it."

I expected the CREATE INDEX statement to throw an error or fail the
"Assert(delta >= 0);" in brin_minmax_multi_distance_date(). But a
later commit mentions that the overflow is not possible.

We may want to combine various test cases though. Like the test adding
infinity and extreme values could be combined. Also the number of
values it inserts in the table for the reasons stated above.

I prefer not to do that. I find it more comprehensible to keep the tests
separate / testing different things. If the tests were expensive to
setup or something like that, that'd be a different situation.

Fair enough.

Right. Do we highlight that in the commit message so that the person
writing release notes picks it up from there?

Yes, I think I'll mention what impact each issue can have on indexes.

Thanks.

--
Best Wishes,
Ashutosh Bapat

#18

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Ashutosh Bapat (#17)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Thu, 19 Oct 2023, 05:32 Ashutosh Bapat, <ashutosh.bapat.oss@gmail.com>
wrote:

On Wed, Oct 18, 2023 at 8:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I did use that many values to actually force "compaction" and merging of
points into ranges. We only keep 32 values per page range, so with 2
values we'll not build a range. You're right it may still trigger the
overflow (we probably still calculate distances, I didn't realize that),
but without the compaction we can't check the query plans.

However, I agree 60 values may be a bit too much. And I realized we can
reduce the count quite a bit by using the values_per_range option. We
could set it to 8 (which is the minimum).

I haven't read BRIN code, except the comments in the beginning of the
file. From what you describe it seems we will store first 32 values as
is, but later as the number of values grow create ranges from those?
Please point me to the relevant source code/documentation. Anyway, if
we can reduce the number of rows we insert, that will be good.

I don't think 60 values is excessive, but instead of listing them out by
hand, perhaps use generate_series().

Regards,
Dean

#19

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Ashutosh Bapat (#17)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On 10/19/23 06:32, Ashutosh Bapat wrote:

On Wed, Oct 18, 2023 at 8:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I did use that many values to actually force "compaction" and merging of
points into ranges. We only keep 32 values per page range, so with 2
values we'll not build a range. You're right it may still trigger the
overflow (we probably still calculate distances, I didn't realize that),
but without the compaction we can't check the query plans.

However, I agree 60 values may be a bit too much. And I realized we can
reduce the count quite a bit by using the values_per_range option. We
could set it to 8 (which is the minimum).

I haven't read BRIN code, except the comments in the beginning of the
file. From what you describe it seems we will store first 32 values as
is, but later as the number of values grow create ranges from those?
Please point me to the relevant source code/documentation. Anyway, if
we can reduce the number of rows we insert, that will be good.

I don't think we have documentation other than what's at the beginning
of the file. What the comment tries to explain is that the summary has a
maximum size (32 values by default), and each value can be either a
point or a range. A point requires one value, range requires two. So we
accumulate values one by one - until 32 values that's fine. Once we get
33, we have to merge some of the points into ranges, and we do that in a
greedy way by distance.

For example, this may happen:

33 values
-> 31 values + 1 range [requires 33]
-> 30 values + 1 range [requires 32]
...

The exact steps depend on which values/ranges are picked for the merge,
of course. In any case, there's no difference between the initial 32
values and the values added later.

Does that explain the algorithm? I'm not against clarifying the comment,
of course.

Not sure what you mean by "crash". Yes, it doesn't hit an assert,
because there's none when calculating distance for date. It however
should fail in the query plan check due to the incorrect order of
subtractions.

Also, the commit message does not claim to fix overflow. In fact, it
says it can't overflow ...

Reading the commit message
"Tests for overflows with dates and timestamps in BRIN ...

...

The new regression tests check this for date and timestamp data types.
It adds tables with data close to the allowed min/max values, and builds
a minmax-multi index on it."

I expected the CREATE INDEX statement to throw an error or fail the
"Assert(delta >= 0);" in brin_minmax_multi_distance_date(). But a
later commit mentions that the overflow is not possible.

Hmmm, yeah. The comment should mention the date doesn't have issue with
overflows, but other bugs.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Dean Rasheed (#18)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On 10/19/23 09:04, Dean Rasheed wrote:

On Thu, 19 Oct 2023, 05:32 Ashutosh Bapat, <ashutosh.bapat.oss@gmail.com
<mailto:ashutosh.bapat.oss@gmail.com>> wrote:

On Wed, Oct 18, 2023 at 8:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com
<mailto:tomas.vondra@enterprisedb.com>> wrote:

I did use that many values to actually force "compaction" and

merging of

points into ranges. We only keep 32 values per page range, so with 2
values we'll not build a range. You're right it may still trigger the
overflow (we probably still calculate distances, I didn't realize

that),

but without the compaction we can't check the query plans.

However, I agree 60 values may be a bit too much. And I realized

we can

reduce the count quite a bit by using the values_per_range option. We
could set it to 8 (which is the minimum).

I haven't read BRIN code, except the comments in the beginning of the
file. From what you describe it seems we will store first 32 values as
is, but later as the number of values grow create ranges from those?
Please point me to the relevant source code/documentation. Anyway, if
we can reduce the number of rows we insert, that will be good.

I don't think 60 values is excessive, but instead of listing them out by
hand, perhaps use generate_series().

I tried to do that, but I ran into troubles with the "date" tests. I
needed to build values that close to the min/max values, so I did
something like

SELECT '4713-01-01 BC'::date + (i || ' days')::interval FROM
generate_series(1,10) s(i);

And then the same for the max date, but that fails because of the
date/timestamp conversion in date plus operator.

However, maybe two simple generate_series() would work ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21

Ashutosh Bapat

ashutosh.bapat.oss@gmail.com

about 2 years ago

In reply to: Tomas Vondra (#19)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Thu, Oct 19, 2023 at 2:31 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Does that explain the algorithm? I'm not against clarifying the comment,
of course.

Thanks a lot for this explanation. It's clear now.

I tried to do that, but I ran into troubles with the "date" tests. I
needed to build values that close to the min/max values, so I did
something like

SELECT '4713-01-01 BC'::date + (i || ' days')::interval FROM
generate_series(1,10) s(i);

And then the same for the max date, but that fails because of the
date/timestamp conversion in date plus operator.

However, maybe two simple generate_series() would work ...

Something like this? select i::date from generate_series('4713-02-01
BC'::date, '4713-01-01 BC'::date, '-1 day'::interval) i;
i
---------------
4713-02-01 BC
4713-01-31 BC
4713-01-30 BC
4713-01-29 BC
4713-01-28 BC
4713-01-27 BC
4713-01-26 BC
4713-01-25 BC
4713-01-24 BC
4713-01-23 BC
4713-01-22 BC
4713-01-21 BC
4713-01-20 BC
4713-01-19 BC
4713-01-18 BC
4713-01-17 BC
4713-01-16 BC
4713-01-15 BC
4713-01-14 BC
4713-01-13 BC
4713-01-12 BC
4713-01-11 BC
4713-01-10 BC
4713-01-09 BC
4713-01-08 BC
4713-01-07 BC
4713-01-06 BC
4713-01-05 BC
4713-01-04 BC
4713-01-03 BC
4713-01-02 BC
4713-01-01 BC
(32 rows)

--
Best Wishes,
Ashutosh Bapat

#22

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Ashutosh Bapat (#21)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On 10/19/23 11:22, Ashutosh Bapat wrote:

On Thu, Oct 19, 2023 at 2:31 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Does that explain the algorithm? I'm not against clarifying the comment,
of course.

Thanks a lot for this explanation. It's clear now.

I tried to do that, but I ran into troubles with the "date" tests. I
needed to build values that close to the min/max values, so I did
something like

SELECT '4713-01-01 BC'::date + (i || ' days')::interval FROM
generate_series(1,10) s(i);

And then the same for the max date, but that fails because of the
date/timestamp conversion in date plus operator.

However, maybe two simple generate_series() would work ...

Something like this? select i::date from generate_series('4713-02-01
BC'::date, '4713-01-01 BC'::date, '-1 day'::interval) i;

That works, but if you try the same thing with the largest date, that'll
fail

select i::date from generate_series('5874896-12-01'::date,
'5874897-01-01'::date,
'1 day'::interval) i;

ERROR: date out of range for timestamp

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#23

Ashutosh Bapat

ashutosh.bapat.oss@gmail.com

about 2 years ago

In reply to: Tomas Vondra (#22)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Thu, Oct 19, 2023 at 4:51 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 10/19/23 11:22, Ashutosh Bapat wrote:

On Thu, Oct 19, 2023 at 2:31 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Does that explain the algorithm? I'm not against clarifying the comment,
of course.

Thanks a lot for this explanation. It's clear now.

I tried to do that, but I ran into troubles with the "date" tests. I
needed to build values that close to the min/max values, so I did
something like

SELECT '4713-01-01 BC'::date + (i || ' days')::interval FROM
generate_series(1,10) s(i);

And then the same for the max date, but that fails because of the
date/timestamp conversion in date plus operator.

However, maybe two simple generate_series() would work ...

Something like this? select i::date from generate_series('4713-02-01
BC'::date, '4713-01-01 BC'::date, '-1 day'::interval) i;

That works, but if you try the same thing with the largest date, that'll
fail

select i::date from generate_series('5874896-12-01'::date,
'5874897-01-01'::date,
'1 day'::interval) i;

ERROR: date out of range for timestamp

Hmm, I see. This uses generate_series(timestamp, timestamp, interval) version.

date + integer -> date though, so the following works. It's also an
example at https://www.postgresql.org/docs/16/functions-srf.html.
#SELECT '5874896-12-01'::date + i FROM
generate_series(1,10) s(i);

I think we should provide generate_series(date, date, integer) which
will use date + integer -> date.

--
Best Wishes,
Ashutosh Bapat

#24

Ashutosh Bapat

ashutosh.bapat.oss@gmail.com

about 2 years ago

In reply to: Ashutosh Bapat (#23)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Thu, Oct 19, 2023 at 6:11 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

I think we should provide generate_series(date, date, integer) which
will use date + integer -> date.

Just to be clear, I don't mean that this patch should add it.

--
Best Wishes,
Ashutosh Bapat

#25

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Ashutosh Bapat (#24)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On 10/20/23 11:52, Ashutosh Bapat wrote:

On Thu, Oct 19, 2023 at 6:11 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

I think we should provide generate_series(date, date, integer) which
will use date + integer -> date.

Just to be clear, I don't mean that this patch should add it.

I'm not against adding such generate_series() variant. For this patch
I'll use something like the query you proposed, I think.

I was thinking about the (date + interval) failure a bit more, and while
I think it's confusing it's not quite wrong. The problem is that the
interval may have hours/minutes, so it makes sense that the operator
returns timestamp. That's not what most operators do, where the data
type does not change. So a bit unexpected, but seems correct.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#26

Tomas Vondra

tomas.vondra@enterprisedb.com

about 2 years ago

In reply to: Tomas Vondra (#25)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

FWIW I've cleaned up and pushed all the patches we came up with this
thread. And I've backpatched all of them to 14+.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27

Ashutosh Bapat

ashutosh.bapat.oss@gmail.com

about 2 years ago

In reply to: Tomas Vondra (#26)

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

On Fri, Oct 27, 2023 at 10:32 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

FWIW I've cleaned up and pushed all the patches we came up with this
thread. And I've backpatched all of them to 14+.

Thanks a lot Tomas.

--
Best Wishes,
Ashutosh Bapat