Missing update of all_hasnulls in BRIN opclasses
Hi,
While working on some BRIN code, I discovered a bug in handling NULL
values - when inserting a non-NULL value into a NULL-only range, we
reset the all_nulls flag but don't update the has_nulls flag. And
because of that we then fail to return the range for IS NULL queries.
Reproducing this is trivial:
create table t (a int);
create index on t using brin (a);
insert into t values (null);
insert into t values (1);
set enable_seqscan = off;
select * from t where a is null;
This should return 1 row, but actually it returns no rows.
Attached is a patch fixing this by properly updating the has_nulls flag.
I reproduced this all the way back to 9.5, so it's a long-standing bug.
It's interesting no one noticed / reported it so far, it doesn't seem
like a particularly rare corner case.
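
To make the flag handling concrete, here is a minimal C sketch of the failure. It is a toy model, not PostgreSQL code: ToySummary and the toy_* functions are hypothetical names, and the IS NULL check is a simplified assumption about how the two flags are consulted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two NULL flags kept per column in a range summary. */
typedef struct ToySummary
{
    bool allnulls;   /* range holds only NULL values */
    bool hasnulls;   /* range holds NULLs next to non-NULL values */
} ToySummary;

/* Summary of a range that so far contains only NULLs. */
ToySummary
toy_all_nulls_summary(void)
{
    ToySummary s = {.allnulls = true, .hasnulls = false};

    return s;
}

/* Buggy update: a non-NULL value clears allnulls but leaves hasnulls alone. */
void
toy_add_nonnull_buggy(ToySummary *s)
{
    /* in this model the two flags are never both set */
    assert(!(s->allnulls && s->hasnulls));
    s->allnulls = false;
}

/* Simplified IS NULL check: the range qualifies if either flag is set. */
bool
toy_may_match_is_null(const ToySummary *s)
{
    return s->allnulls || s->hasnulls;
}
```

Starting from toy_all_nulls_summary() and calling toy_add_nonnull_buggy() once leaves both flags false, so toy_may_match_is_null() no longer reports the range even though it still contains a NULL.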
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
brin-hasnulls-fix.patch (text/x-patch)
diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
index 6b0af7267d5..60315450b41 100644
--- a/src/backend/access/brin/brin_bloom.c
+++ b/src/backend/access/brin/brin_bloom.c
@@ -539,6 +539,7 @@ brin_bloom_add_value(PG_FUNCTION_ARGS)
BloomGetFalsePositiveRate(opts));
column->bv_values[0] = PointerGetDatum(filter);
column->bv_allnulls = false;
+ column->bv_hasnulls = true;
updated = true;
}
else
diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c
index 4b02d374f23..e0f44d3e62c 100644
--- a/src/backend/access/brin/brin_inclusion.c
+++ b/src/backend/access/brin/brin_inclusion.c
@@ -164,6 +164,7 @@ brin_inclusion_add_value(PG_FUNCTION_ARGS)
column->bv_values[INCLUSION_UNMERGEABLE] = BoolGetDatum(false);
column->bv_values[INCLUSION_CONTAINS_EMPTY] = BoolGetDatum(false);
column->bv_allnulls = false;
+ column->bv_hasnulls = true;
new = true;
}
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 9e8a8e056cc..8a5661a8952 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -90,6 +90,7 @@ brin_minmax_add_value(PG_FUNCTION_ARGS)
column->bv_values[0] = datumCopy(newval, attr->attbyval, attr->attlen);
column->bv_values[1] = datumCopy(newval, attr->attbyval, attr->attlen);
column->bv_allnulls = false;
+ column->bv_hasnulls = true;
PG_RETURN_BOOL(true);
}
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 9a0bcf6698d..4e7119e2d78 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2500,6 +2500,7 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS)
MemoryContextSwitchTo(oldctx);
column->bv_allnulls = false;
+ column->bv_hasnulls = true;
modified = true;
column->bv_mem_value = PointerGetDatum(ranges);
diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out
index 73fa38396e4..cc896c2d9d4 100644
--- a/src/test/regress/expected/brin.out
+++ b/src/test/regress/expected/brin.out
@@ -572,3 +572,39 @@ CREATE UNLOGGED TABLE brintest_unlogged (n numrange);
CREATE INDEX brinidx_unlogged ON brintest_unlogged USING brin (n);
INSERT INTO brintest_unlogged VALUES (numrange(0, 2^1000::numeric));
DROP TABLE brintest_unlogged;
+-- test that we properly update has_nulls when inserting something into
+-- a range that only had NULLs before
+CREATE TABLE brintest_4 (a INT, b INT, c INT, d INET);
+CREATE INDEX brintest_4_idx ON brintest_4 USING brin (a, b int4_minmax_multi_ops, c int4_bloom_ops, d inet_inclusion_ops);
+-- insert a NULL value, so that we get an all-nulls range
+INSERT INTO brintest_4 VALUES (NULL, NULL, NULL, NULL);
+-- now insert a non-NULL value
+INSERT INTO brintest_4 VALUES (1, 1, 1, '127.0.0.1');
+-- see that we can still match the value when using the brin index
+SET enable_seqscan = off;
+SELECT * FROM brintest_4 WHERE a IS NULL;
+ a | b | c | d
+---+---+---+---
+ | | |
+(1 row)
+
+SELECT * FROM brintest_4 WHERE b IS NULL;
+ a | b | c | d
+---+---+---+---
+ | | |
+(1 row)
+
+SELECT * FROM brintest_4 WHERE c IS NULL;
+ a | b | c | d
+---+---+---+---
+ | | |
+(1 row)
+
+SELECT * FROM brintest_4 WHERE d IS NULL;
+ a | b | c | d
+---+---+---+---
+ | | |
+(1 row)
+
+DROP TABLE brintest_4;
+SET enable_seqscan = off;
diff --git a/src/test/regress/sql/brin.sql b/src/test/regress/sql/brin.sql
index e68e9e18df5..17a01a4b82f 100644
--- a/src/test/regress/sql/brin.sql
+++ b/src/test/regress/sql/brin.sql
@@ -515,3 +515,24 @@ CREATE UNLOGGED TABLE brintest_unlogged (n numrange);
CREATE INDEX brinidx_unlogged ON brintest_unlogged USING brin (n);
INSERT INTO brintest_unlogged VALUES (numrange(0, 2^1000::numeric));
DROP TABLE brintest_unlogged;
+
+-- test that we properly update has_nulls when inserting something into
+-- a range that only had NULLs before
+CREATE TABLE brintest_4 (a INT, b INT, c INT, d INET);
+CREATE INDEX brintest_4_idx ON brintest_4 USING brin (a, b int4_minmax_multi_ops, c int4_bloom_ops, d inet_inclusion_ops);
+
+-- insert a NULL value, so that we get an all-nulls range
+INSERT INTO brintest_4 VALUES (NULL, NULL, NULL, NULL);
+
+-- now insert a non-NULL value
+INSERT INTO brintest_4 VALUES (1, 1, 1, '127.0.0.1');
+
+-- see that we can still match the value when using the brin index
+SET enable_seqscan = off;
+SELECT * FROM brintest_4 WHERE a IS NULL;
+SELECT * FROM brintest_4 WHERE b IS NULL;
+SELECT * FROM brintest_4 WHERE c IS NULL;
+SELECT * FROM brintest_4 WHERE d IS NULL;
+
+DROP TABLE brintest_4;
+SET enable_seqscan = off;
On Fri, 21 Oct 2022 at 17:24, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
Hi,
While working on some BRIN code, I discovered a bug in handling NULL
values - when inserting a non-NULL value into a NULL-only range, we
reset the all_nulls flag but don't update the has_nulls flag. And
because of that we then fail to return the range for IS NULL queries.
Ah, that's bad.
One question though: doesn't (shouldn't?) column->bv_allnulls already
imply column->bv_hasnulls? The column has nulls if all of the values
are null, right? Or is the description of the field deceptive, and
does bv_hasnulls actually mean "has nulls bitmap"?
Attached is a patch fixing this by properly updating the has_nulls flag.
One comment on the patch:
+SET enable_seqscan = off;
+ [...]
+SET enable_seqscan = off;
Looks like duplicated SETs. Should that last one be RESET instead?
Apart from that, this patch looks good.
- Matthias
On 10/21/22 17:50, Matthias van de Meent wrote:
On Fri, 21 Oct 2022 at 17:24, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
Hi,
While working on some BRIN code, I discovered a bug in handling NULL
values - when inserting a non-NULL value into a NULL-only range, we
reset the all_nulls flag but don't update the has_nulls flag. And
because of that we then fail to return the range for IS NULL queries.
Ah, that's bad.
Yeah, I guess we'll need to inform the users to consider rebuilding BRIN
indexes on NULL-able columns.
One question though: doesn't (shouldn't?) column->bv_allnulls already
imply column->bv_hasnulls? The column has nulls if all of the values
are null, right? Or is the description of the field deceptive, and
does bv_hasnulls actually mean "has nulls bitmap"?
What null bitmap do you mean? We're talking about summary for a page
range - IIRC we translate this to nullbitmap for a BRIN tuple, but there
may be multiple columns, and "has nulls bitmap" is an aggregate over all
of them.
Yeah, maybe it'd make sense to also have has_nulls=true whenever
all_nulls=true, and maybe it'd be simpler because it'd be enough to
check just one flag in consistent function etc. But we still need to
track 2 different states - "has nulls" and "has summary".
In any case, this ship sailed long ago - at least for the existing
opclasses.
Attached is a patch fixing this by properly updating the has_nulls flag.
One comment on the patch:
+SET enable_seqscan = off;
+ [...]
+SET enable_seqscan = off;
Looks like duplicated SETs. Should that last one be RESET instead?
Yeah, should have been RESET.
Apart from that, this patch looks good.
Thanks!
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 10/21/22 18:44, Tomas Vondra wrote:
...
Apart from that, this patch looks good.
Sadly, I don't think we can fix it like this :-(
The problem is that all ranges start with all_nulls=true, because the
new range gets initialized by brin_memtuple_initialize() like that. But
this happens for *every* range before we even start processing the rows.
So this way all the ranges would end up with has_nulls=true, making that
flag pretty useless.
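
A small C sketch of why the first fix over-reports NULLs. This is a toy model under stated assumptions: NaiveSummary and the naive_* functions are hypothetical names, and naive_init() only mirrors the initial flag values described above (all_nulls=true, has_nulls=false).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-column NULL flags of a range summary. */
typedef struct NaiveSummary
{
    bool allnulls;
    bool hasnulls;
} NaiveSummary;

/* Fresh in-memory summary, before any row has been processed. */
void
naive_init(NaiveSummary *s)
{
    assert(s != NULL);
    s->allnulls = true;
    s->hasnulls = false;
}

/*
 * First attempted fix: whenever a non-NULL value resets allnulls, also set
 * hasnulls. Returns true if the flags changed.
 */
bool
naive_add_nonnull(NaiveSummary *s)
{
    assert(s != NULL);
    if (!s->allnulls)
        return false;
    s->allnulls = false;
    s->hasnulls = true;     /* set even if no NULL was ever added */
    return true;
}
```

Calling naive_init() and then naive_add_nonnull() ends with hasnulls set, although the range never saw a NULL. Since every range starts out the same way, every range would carry the flag.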
Actually, even just doing "truncate" on the table creates such all-nulls
range for the first range, and serializes it to disk.
I wondered why we even write such tuples for "empty" ranges to disk, for
example after "TRUNCATE" - the table is empty by definition, so how come
we write all-nulls brin summary for the first range?
For example brininsert() checks if the brin tuple was modified and needs
to be written back, but brinbuild() just ignores that, and initializes
(and writes) the tuple to disk anyway. I think we should not do
that - there should be a flag in BrinBuildState, tracking if the BRIN
tuple was modified, and we should only write it if it's true.
That means we should never get an on-disk summary representing nothing.
That doesn't fix the issue, though, because we still need to pass the
memtuple to the add_value opclass procedure, and whether it sets
the has_nulls flag depends on whether it's a new tuple representing no
other rows (in which case has_nulls remains false) or whether it was
read from disk (in which case it needs to be flipped to 'true').
But the opclass has no way to tell the difference at the moment - it
just gets the BrinMemTuple. So we'd have to extend this, somehow.
I wonder how to do this in a back-patchable way - we can't add
parameters to the opclass procedure, and the other solution seems to be
storing it right in the BrinMemTuple, somehow. But that's likely an ABI
break :-(
The only solution I can think of is actually passing it using all_nulls
and has_nulls - we could set both flags to true (which we never do now)
and teach the opclass that it signifies "empty" (and thus not to update
has_nulls after resetting all_nulls).
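
The encoding proposed here can be sketched in C as follows. This is a toy model with hypothetical names (EncSummary, enc_*), covering only the non-NULL insert path; a later message in this thread reports that the approach does not handle all cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-column NULL flags of a range summary. */
typedef struct EncSummary
{
    bool allnulls;
    bool hasnulls;
} EncSummary;

/* New summary: both flags set encodes "no rows summarized yet". */
void
enc_init_empty(EncSummary *s)
{
    assert(s != NULL);
    s->allnulls = true;
    s->hasnulls = true;
}

/* Non-NULL value arrives: reset allnulls and decode the old state. */
void
enc_add_nonnull(EncSummary *s)
{
    assert(s != NULL);
    if (s->allnulls)
    {
        s->allnulls = false;

        /*
         * (true, true) meant "empty", so no NULLs were seen: hasnulls
         * becomes false. (true, false) meant a real NULL-only range, so
         * hasnulls becomes true.
         */
        s->hasnulls = !s->hasnulls;
    }
}
```

With this model an empty summary ends up as (allnulls=false, hasnulls=false) after the first non-NULL value, while a NULL-only summary ends up as (false, true).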
Something like the attached (I haven't added any more tests, not sure
what would those look like - I can't think of a query testing this,
although maybe we could check how the flags change using pageinspect).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-fixup-brin-has_nulls.patch (text/x-patch)
From a99fd6a737cec24bb4063e99a241ff3e04c6ebb8 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Thu, 20 Oct 2022 19:55:23 +0200
Subject: [PATCH 1/9] fixup: brin has_nulls
---
src/backend/access/brin/brin_bloom.c | 1 +
src/backend/access/brin/brin_inclusion.c | 1 +
src/backend/access/brin/brin_minmax.c | 1 +
src/backend/access/brin/brin_minmax_multi.c | 1 +
4 files changed, 4 insertions(+)
diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
index 6b0af7267d5..60315450b41 100644
--- a/src/backend/access/brin/brin_bloom.c
+++ b/src/backend/access/brin/brin_bloom.c
@@ -539,6 +539,7 @@ brin_bloom_add_value(PG_FUNCTION_ARGS)
BloomGetFalsePositiveRate(opts));
column->bv_values[0] = PointerGetDatum(filter);
column->bv_allnulls = false;
+ column->bv_hasnulls = true;
updated = true;
}
else
diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c
index 4b02d374f23..e0f44d3e62c 100644
--- a/src/backend/access/brin/brin_inclusion.c
+++ b/src/backend/access/brin/brin_inclusion.c
@@ -164,6 +164,7 @@ brin_inclusion_add_value(PG_FUNCTION_ARGS)
column->bv_values[INCLUSION_UNMERGEABLE] = BoolGetDatum(false);
column->bv_values[INCLUSION_CONTAINS_EMPTY] = BoolGetDatum(false);
column->bv_allnulls = false;
+ column->bv_hasnulls = true;
new = true;
}
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 9e8a8e056cc..8a5661a8952 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -90,6 +90,7 @@ brin_minmax_add_value(PG_FUNCTION_ARGS)
column->bv_values[0] = datumCopy(newval, attr->attbyval, attr->attlen);
column->bv_values[1] = datumCopy(newval, attr->attbyval, attr->attlen);
column->bv_allnulls = false;
+ column->bv_hasnulls = true;
PG_RETURN_BOOL(true);
}
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 9a0bcf6698d..4e7119e2d78 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2500,6 +2500,7 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS)
MemoryContextSwitchTo(oldctx);
column->bv_allnulls = false;
+ column->bv_hasnulls = true;
modified = true;
column->bv_mem_value = PointerGetDatum(ranges);
--
2.37.3
0002-fixup-brin-has_nulls-2.patch (text/x-patch)
From 57e53d34f2f7bba91fcc0de6f4eff551669554fb Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sat, 22 Oct 2022 02:26:48 +0200
Subject: [PATCH 2/9] fixup: brin has_nulls 2
---
src/backend/access/brin/brin.c | 22 +++++++++++++--------
src/backend/access/brin/brin_bloom.c | 10 +++++++++-
src/backend/access/brin/brin_inclusion.c | 10 +++++++++-
src/backend/access/brin/brin_minmax.c | 10 +++++++++-
src/backend/access/brin/brin_minmax_multi.c | 10 +++++++++-
src/backend/access/brin/brin_tuple.c | 2 +-
6 files changed, 51 insertions(+), 13 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 20b7d65b948..6fabd14c263 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -56,6 +56,7 @@ typedef struct BrinBuildState
BrinRevmap *bs_rmAccess;
BrinDesc *bs_bdesc;
BrinMemTuple *bs_dtuple;
+ bool bs_modified;
} BrinBuildState;
/*
@@ -793,6 +794,7 @@ brinbuildCallback(Relation index,
/* set state to correspond to the next range */
state->bs_currRangeStart += state->bs_pagesPerRange;
+ state->bs_modified = false;
/* re-initialize state for it */
brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
@@ -801,6 +803,7 @@ brinbuildCallback(Relation index,
/* Accumulate the current tuple into the running state */
(void) add_values_to_range(index, state->bs_bdesc, state->bs_dtuple,
values, isnull);
+ state->bs_modified = true;
}
/*
@@ -1287,6 +1290,7 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
state->bs_rmAccess = revmap;
state->bs_bdesc = brin_build_desc(idxRel);
state->bs_dtuple = brin_new_memtuple(state->bs_bdesc);
+ state->bs_modified = false;
return state;
}
@@ -1569,14 +1573,16 @@ form_and_insert_tuple(BrinBuildState *state)
BrinTuple *tup;
Size size;
- tup = brin_form_tuple(state->bs_bdesc, state->bs_currRangeStart,
- state->bs_dtuple, &size);
- brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
- &state->bs_currentInsertBuf, state->bs_currRangeStart,
- tup, size);
- state->bs_numtuples++;
-
- pfree(tup);
+ if (state->bs_modified)
+ {
+ tup = brin_form_tuple(state->bs_bdesc, state->bs_currRangeStart,
+ state->bs_dtuple, &size);
+ brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
+ &state->bs_currentInsertBuf, state->bs_currRangeStart,
+ tup, size);
+ state->bs_numtuples++;
+ pfree(tup);
+ }
}
/*
diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
index 60315450b41..96e5961a408 100644
--- a/src/backend/access/brin/brin_bloom.c
+++ b/src/backend/access/brin/brin_bloom.c
@@ -539,7 +539,15 @@ brin_bloom_add_value(PG_FUNCTION_ARGS)
BloomGetFalsePositiveRate(opts));
column->bv_values[0] = PointerGetDatum(filter);
column->bv_allnulls = false;
- column->bv_hasnulls = true;
+
+ /*
+ * When both bv_allnulls and bv_hasnulls are set to true, it means this
+ * summary was representing no rows. So we just set bv_hasnulls=false.
+ * Otherwise we need to set it to true, because there already were some
+ * NULL values, apparently.
+ */
+ column->bv_hasnulls = !column->bv_hasnulls;
+
updated = true;
}
else
diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c
index e0f44d3e62c..dd8fe379c7b 100644
--- a/src/backend/access/brin/brin_inclusion.c
+++ b/src/backend/access/brin/brin_inclusion.c
@@ -164,7 +164,15 @@ brin_inclusion_add_value(PG_FUNCTION_ARGS)
column->bv_values[INCLUSION_UNMERGEABLE] = BoolGetDatum(false);
column->bv_values[INCLUSION_CONTAINS_EMPTY] = BoolGetDatum(false);
column->bv_allnulls = false;
- column->bv_hasnulls = true;
+
+ /*
+ * When both bv_allnulls and bv_hasnulls are set to true, it means this
+ * summary was representing no rows. So we just set bv_hasnulls=false.
+ * Otherwise we need to set it to true, because there already were some
+ * NULL values, apparently.
+ */
+ column->bv_hasnulls = !column->bv_hasnulls;
+
new = true;
}
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 8a5661a8952..ead9e8f4e36 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -90,7 +90,15 @@ brin_minmax_add_value(PG_FUNCTION_ARGS)
column->bv_values[0] = datumCopy(newval, attr->attbyval, attr->attlen);
column->bv_values[1] = datumCopy(newval, attr->attbyval, attr->attlen);
column->bv_allnulls = false;
- column->bv_hasnulls = true;
+
+ /*
+ * When both bv_allnulls and bv_hasnulls are set to true, it means this
+ * summary was representing no rows. So we just set bv_hasnulls=false.
+ * Otherwise we need to set it to true, because there already were some
+ * NULL values, apparently.
+ */
+ column->bv_hasnulls = !column->bv_hasnulls;
+
PG_RETURN_BOOL(true);
}
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 4e7119e2d78..410bfdcfa79 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2500,7 +2500,15 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS)
MemoryContextSwitchTo(oldctx);
column->bv_allnulls = false;
- column->bv_hasnulls = true;
+
+ /*
+ * When both bv_allnulls and bv_hasnulls are set to true, it means this
+ * summary was representing no rows. So we just set bv_hasnulls=false.
+ * Otherwise we need to set it to true, because there already were some
+ * NULL values, apparently.
+ */
+ column->bv_hasnulls = !column->bv_hasnulls;
+
modified = true;
column->bv_mem_value = PointerGetDatum(ranges);
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index c0e2dbd23ba..4d2a45ddcb6 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -517,7 +517,7 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
{
dtuple->bt_columns[i].bv_attno = i + 1;
dtuple->bt_columns[i].bv_allnulls = true;
- dtuple->bt_columns[i].bv_hasnulls = false;
+ dtuple->bt_columns[i].bv_hasnulls = true;
dtuple->bt_columns[i].bv_values = (Datum *) currdatum;
dtuple->bt_columns[i].bv_mem_value = PointerGetDatum(NULL);
--
2.37.3
On 2022-Oct-22, Tomas Vondra wrote:
I wonder how to do this in a back-patchable way - we can't add
parameters to the opclass procedure, and the other solution seems to be
storing it right in the BrinMemTuple, somehow. But that's likely an ABI
break :-(
Hmm, I don't see the ABI incompatibility. BrinMemTuple is an in-memory
structure, so you can add new members at the end of the struct and it
will pose no problems to existing code.
The only solution I can think of is actually passing it using all_nulls
and has_nulls - we could set both flags to true (which we never do now)
and teach the opclass that it signifies "empty" (and thus not to update
has_nulls after resetting all_nulls).
Something like the attached (I haven't added any more tests, not sure
what would those look like - I can't think of a query testing this,
although maybe we could check how the flags change using pageinspect).
I'll try to have a look at these patches tomorrow or on Monday.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell." (L. Torvalds)
On 10/22/22 10:00, Alvaro Herrera wrote:
On 2022-Oct-22, Tomas Vondra wrote:
I wonder how to do this in a back-patchable way - we can't add
parameters to the opclass procedure, and the other solution seems to be
storing it right in the BrinMemTuple, somehow. But that's likely an ABI
break :-(
Hmm, I don't see the ABI incompatibility. BrinMemTuple is an in-memory
structure, so you can add new members at the end of the struct and it
will pose no problems to existing code.
But we're not passing BrinMemTuple to the opclass procedures - we're
passing a pointer to BrinValues, so we'd have to add the flag there. And
we're storing an array of those, so adding a field may shift the array
even if you add it at the end. Not sure if that's OK or not.
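
The concern about arrays can be illustrated with a short C sketch. The two structs are simplified, hypothetical stand-ins (not the real BrinValues definition), and was_empty is an invented field name used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the per-column struct as existing code sees it. */
typedef struct OldValues
{
    short attno;
    bool  hasnulls;
    bool  allnulls;
    void *values;
} OldValues;

/* Same struct with a hypothetical extra field appended at the end. */
typedef struct NewValues
{
    short attno;
    bool  hasnulls;
    bool  allnulls;
    void *values;
    bool  was_empty;    /* hypothetical new member */
} NewValues;

/* Existing members keep their offsets inside a single element. */
static_assert(offsetof(NewValues, values) == offsetof(OldValues, values),
              "appending a member does not move earlier members");

/* Byte offset of array element idx under each layout. */
size_t
old_element_offset(size_t idx)
{
    return idx * sizeof(OldValues);
}

size_t
new_element_offset(size_t idx)
{
    return idx * sizeof(NewValues);
}
```

Element 0 sits at the same place under both layouts, but every later element moves because the stride grows. Code compiled against the old definition would therefore compute the wrong address for any element past the first one in such an array.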
The only solution I can think of is actually passing it using all_nulls
and has_nulls - we could set both flags to true (which we never do now)
and teach the opclass that it signifies "empty" (and thus not to update
has_nulls after resetting all_nulls).
Something like the attached (I haven't added any more tests, not sure
what would those look like - I can't think of a query testing this,
although maybe we could check how the flags change using pageinspect).
I'll try to have a look at these patches tomorrow or on Monday.
I was experimenting with this a bit more, and unfortunately the latest
patch is still a few bricks shy - it did fix this particular issue, but
there were other cases that remained/got broken. See the first patch,
that adds a bunch of pageinspect tests testing different combinations.
After thinking about it a bit more, I think we can't quite fix this at
the opclass level, so yesterday's patches are wrong. Instead, this
should be fixed in add_values_to_range() - the whole trick is we need to
remember the range was empty at the beginning, and only set the flag
when allnulls is false.
The reworked patch does that. And we can use the same logic (both flags
set mean no tuples were added to the range) when building the index, a
separate flag is not needed.
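
One possible reading of this logic, as a C sketch. It is a toy model and not the code from the attached patch: RangeFlags and the rng_* functions are hypothetical names, and treating "both flags set" as the empty state follows the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-column NULL flags of a range summary. */
typedef struct RangeFlags
{
    bool allnulls;
    bool hasnulls;
} RangeFlags;

/* New range: both flags set means no tuples were added yet. */
void
rng_init_empty(RangeFlags *r)
{
    r->allnulls = true;
    r->hasnulls = true;
}

bool
rng_is_empty(const RangeFlags *r)
{
    return r->allnulls && r->hasnulls;
}

/* Add one value, remembering whether the range was empty beforehand. */
void
rng_add(RangeFlags *r, bool value_is_null)
{
    bool was_empty = rng_is_empty(r);
    /* true only if real NULLs were seen before this value */
    bool had_nulls = !was_empty && (r->allnulls || r->hasnulls);

    if (value_is_null)
    {
        /* NULL-only ranges keep allnulls; mixed ranges get hasnulls */
        r->hasnulls = !r->allnulls;
    }
    else
    {
        r->allnulls = false;
        r->hasnulls = had_nulls;
    }

    /* once a value was added, the range must not look empty */
    assert(!rng_is_empty(r));
}
```

In this model, NULL followed by 1 and 1 followed by NULL both end as (allnulls=false, hasnulls=true), while a range that only ever saw non-NULL values ends as (false, false). That matches the flag combinations the pageinspect tests in the first patch expect.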
This slightly affects existing regression tests, because we won't create
any ranges for an empty table (currently we create one, because we initialize a
tuple in brinbuild and then write it to disk). This means that
brin_summarize_range now returns 0, but I think that's fine.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-pageinspect-brinbugs-test-20221022.patch (text/x-patch)
From 5456cf819426d3f90c004f673dfc863903e568a1 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sat, 22 Oct 2022 12:47:33 +0200
Subject: [PATCH 1/2] pageinspect brinbugs test
---
contrib/pageinspect/Makefile | 2 +-
contrib/pageinspect/expected/brinbugs.out | 222 ++++++++++++++++++++++
contrib/pageinspect/sql/brinbugs.sql | 114 +++++++++++
3 files changed, 337 insertions(+), 1 deletion(-)
create mode 100644 contrib/pageinspect/expected/brinbugs.out
create mode 100644 contrib/pageinspect/sql/brinbugs.sql
diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index 5c0736564ab..92305e981f7 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -21,7 +21,7 @@ DATA = pageinspect--1.9--1.10.sql pageinspect--1.8--1.9.sql \
pageinspect--1.0--1.1.sql
PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
-REGRESS = page btree brin gin gist hash checksum oldextversions
+REGRESS = page btree brin gin gist hash checksum oldextversions brinbugs
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/pageinspect/expected/brinbugs.out b/contrib/pageinspect/expected/brinbugs.out
new file mode 100644
index 00000000000..23843caa138
--- /dev/null
+++ b/contrib/pageinspect/expected/brinbugs.out
@@ -0,0 +1,222 @@
+create extension pageinspect;
+create table t (a int, b int);
+create index on t using brin (a, b);
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1,1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+-- first column should have all_nulls=true, second has_nulls=false and [1,1] range
+truncate t;
+insert into t values (null, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have all_nulls=true
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+-------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, null);
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (1, 1);
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, 1);
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1, 1);
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have all_nulls=true only
+truncate t;
+insert into t values (null, null);
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+-------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+-- first column should have has_nulls=false and [1,1] range, second all_nulls=true
+truncate t;
+insert into t values (1, null);
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+-------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+insert into t values (1,1);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+insert into t values (null, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+insert into t values (1, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+insert into t values (1, 1);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have all_nulls=true
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+-------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+insert into t values (null, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+-------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+-- first column should have has_nulls=false and [1,1] range, second all_nulls=true
+truncate t;
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+insert into t values (1, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
diff --git a/contrib/pageinspect/sql/brinbugs.sql b/contrib/pageinspect/sql/brinbugs.sql
new file mode 100644
index 00000000000..a141aed5adc
--- /dev/null
+++ b/contrib/pageinspect/sql/brinbugs.sql
@@ -0,0 +1,114 @@
+create extension pageinspect;
+
+create table t (a int, b int);
+create index on t using brin (a, b);
+
+
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1,1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- first column should have all_nulls=true, second has_nulls=false and [1,1] range
+truncate t;
+insert into t values (null, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have all_nulls=true
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, null);
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (1, 1);
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, 1);
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1, 1);
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have all_nulls=true only
+truncate t;
+insert into t values (null, null);
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- first column should have has_nulls=false and [1,1] range, second all_nulls=true
+truncate t;
+insert into t values (1, null);
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (1,1);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (null, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (1, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (1, 1);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have all_nulls=true
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (null, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- first column should have has_nulls=false and [1,1] range, second all_nulls=true
+truncate t;
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (1, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
--
2.37.3
0002-fixup-brin-has_nulls-20221022.patch (text/x-patch; charset=UTF-8)
From 3f7e2f05570a11b430e40c184b867775cce5efe9 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Thu, 20 Oct 2022 19:55:23 +0200
Subject: [PATCH 2/2] fixup: brin has_nulls
---
src/backend/access/brin/brin.c | 76 ++++++++++++++++++---
src/backend/access/brin/brin_minmax_multi.c | 1 +
src/backend/access/brin/brin_tuple.c | 13 +++-
src/test/regress/expected/brin.out | 2 +-
src/test/regress/expected/brin_bloom.out | 2 +-
src/test/regress/expected/brin_multi.out | 2 +-
6 files changed, 83 insertions(+), 13 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 20b7d65b948..ee9b3bb0574 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1568,15 +1568,36 @@ form_and_insert_tuple(BrinBuildState *state)
{
BrinTuple *tup;
Size size;
+ bool modified = false;
+ BrinMemTuple *dtuple = state->bs_dtuple;
+ int i;
- tup = brin_form_tuple(state->bs_bdesc, state->bs_currRangeStart,
- state->bs_dtuple, &size);
- brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
- &state->bs_currentInsertBuf, state->bs_currRangeStart,
- tup, size);
- state->bs_numtuples++;
+ /*
+ * Was the memtuple modified (any tuples added to it)?
+ *
+ * Should be enough to check just the first attribute - either we add a row
+ * to all columns or none of them.
+ */
+ for (i = 0; i < state->bs_bdesc->bd_tupdesc->natts; i++)
+ {
+ if (!(dtuple->bt_columns[i].bv_allnulls &&
+ dtuple->bt_columns[i].bv_hasnulls))
+ {
+ modified = true;
+ break;
+ }
+ }
- pfree(tup);
+ if (modified)
+ {
+ tup = brin_form_tuple(state->bs_bdesc, state->bs_currRangeStart,
+ state->bs_dtuple, &size);
+ brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
+ &state->bs_currentInsertBuf, state->bs_currRangeStart,
+ tup, size);
+ state->bs_numtuples++;
+ pfree(tup);
+ }
}
/*
@@ -1710,24 +1731,53 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
Datum result;
BrinValues *bval;
FmgrInfo *addValue;
+ bool first_row;
+ bool has_nulls = false;
bval = &dtup->bt_columns[keyno];
+ /*
+ * Is this the first tuple we're adding to the range? We track
+ * that by setting both bv_hasnulls and bval->bv_allnulls to true
+ * during initialization. But it's not a valid combination (at most
+ * one of those flags should be set), so we reset the second flag.
+ */
+ first_row = (bval->bv_hasnulls && bval->bv_allnulls);
+
+ if (bval->bv_hasnulls && bval->bv_allnulls)
+ {
+ bval->bv_hasnulls = false;
+ modified = true;
+ }
+
if (bdesc->bd_info[keyno]->oi_regular_nulls && nulls[keyno])
{
/*
* If the new value is null, we record that we saw it if it's the
* first one; otherwise, there's nothing to do.
+ *
+ * XXX This used to check "hasnulls" but now that might result in
+ * having both flags set. That used to be OK, because we just
+ * ignore hasnulls flag in brin_form_tuple when allnulls=true.
+ * But now we interpret this combination as "first row" so it
+ * would confuse following calls. So make sure to only set one
+ * of the flags - when allnulls=true we're done, as it already
+ * marks the range as containing NULLs.
*/
- if (!bval->bv_hasnulls)
+ if (!bval->bv_allnulls)
{
bval->bv_hasnulls = true;
modified = true;
}
-
continue;
}
+ /*
+ * Does the range already have NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ */
+ has_nulls = (bval->bv_hasnulls || bval->bv_allnulls) && !first_row;
+
addValue = index_getprocinfo(idxRel, keyno + 1,
BRIN_PROCNUM_ADDVALUE);
result = FunctionCall4Coll(addValue,
@@ -1736,8 +1786,16 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
PointerGetDatum(bval),
values[keyno],
nulls[keyno]);
+
/* if that returned true, we need to insert the updated tuple */
modified |= DatumGetBool(result);
+
+ /*
+ * If we had NULLS, and the opclass didn't set allnulls=true, set
+ * the hasnulls so that we know there are NULL values.
+ */
+ if (has_nulls && !bval->bv_allnulls)
+ bval->bv_hasnulls = true;
}
return modified;
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 9a0bcf6698d..d5ce5b47ff4 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2500,6 +2500,7 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS)
MemoryContextSwitchTo(oldctx);
column->bv_allnulls = false;
+
modified = true;
column->bv_mem_value = PointerGetDatum(ranges);
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index c0e2dbd23ba..7ea272c2f52 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -136,6 +136,9 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
{
int datumno;
+ Assert(!(tuple->bt_columns[keyno].bv_hasnulls &&
+ tuple->bt_columns[keyno].bv_allnulls));
+
/*
* "allnulls" is set when there's no nonnull value in any row in the
* column; when this happens, there is no data to store. Thus set the
@@ -517,7 +520,7 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
{
dtuple->bt_columns[i].bv_attno = i + 1;
dtuple->bt_columns[i].bv_allnulls = true;
- dtuple->bt_columns[i].bv_hasnulls = false;
+ dtuple->bt_columns[i].bv_hasnulls = true;
dtuple->bt_columns[i].bv_values = (Datum *) currdatum;
dtuple->bt_columns[i].bv_mem_value = PointerGetDatum(NULL);
@@ -585,6 +588,14 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
{
int i;
+ Assert(!(allnulls[keyno] && hasnulls[keyno]));
+
+ /*
+ * Make sure to overwrite the hasnulls flag, because it might have
+ * been initialized as true by brin_memtuple_initialize.
+ */
+ dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
+
if (allnulls[keyno])
{
valueno += brdesc->bd_info[keyno]->oi_nstored;
diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out
index 73fa38396e4..ebc31222354 100644
--- a/src/test/regress/expected/brin.out
+++ b/src/test/regress/expected/brin.out
@@ -454,7 +454,7 @@ $$;
SELECT brin_summarize_range('brin_summarize_idx', 0);
brin_summarize_range
----------------------
- 0
+ 1
(1 row)
-- nothing: already summarized
diff --git a/src/test/regress/expected/brin_bloom.out b/src/test/regress/expected/brin_bloom.out
index 32c56a996a2..6e847f9113d 100644
--- a/src/test/regress/expected/brin_bloom.out
+++ b/src/test/regress/expected/brin_bloom.out
@@ -373,7 +373,7 @@ $$;
SELECT brin_summarize_range('brin_summarize_bloom_idx', 0);
brin_summarize_range
----------------------
- 0
+ 1
(1 row)
-- nothing: already summarized
diff --git a/src/test/regress/expected/brin_multi.out b/src/test/regress/expected/brin_multi.out
index f3309f433f8..e65f1c20d4f 100644
--- a/src/test/regress/expected/brin_multi.out
+++ b/src/test/regress/expected/brin_multi.out
@@ -407,7 +407,7 @@ $$;
SELECT brin_summarize_range('brin_summarize_multi_idx', 0);
brin_summarize_range
----------------------
- 0
+ 1
(1 row)
-- nothing: already summarized
--
2.37.3
Hi, Tomas:
For 0002-fixup-brin-has_nulls-20221022.patch :
+ first_row = (bval->bv_hasnulls && bval->bv_allnulls);
+
+ if (bval->bv_hasnulls && bval->bv_allnulls)
It seems the if condition can be changed to `if (first_row)` which is more
readable.
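For illustration only, the simplified check could look like the
standalone sketch below. SketchValues and begin_range_update are
hypothetical stand-ins for BrinValues and the start of the per-column
loop in add_values_to_range; this is not the patch code itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two BrinValues flags discussed here. */
typedef struct SketchValues
{
	bool		bv_hasnulls;
	bool		bv_allnulls;
} SketchValues;

/*
 * Compute first_row once and reuse it in the condition.
 * Returns true if this was the first row added to the range.
 */
static bool
begin_range_update(SketchValues *bval, bool *modified)
{
	bool		first_row = (bval->bv_hasnulls && bval->bv_allnulls);

	if (first_row)
	{
		/* both flags set only marks "no rows yet", so clear hasnulls */
		bval->bv_hasnulls = false;
		*modified = true;
	}

	return first_row;
}
```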
Cheers
Here's an improved version of the fix I posted about a month ago.
0001
Adds tests demonstrating the issue, as before. I realized there's an
isolation test in src/test/modules/brin that can demonstrate this, so I
modified it too, not just the pageinspect test as before.
0002
Uses the combination of all_nulls/has_nulls to identify "empty" ranges,
and does not store them to disk. However, I realized that not storing
"empty" ranges is probably not desirable. Imagine a table with a "gap"
(e.g. due to a batch DELETE) of pages with no rows:
create table x (a int) with (fillfactor = 10);
insert into x select i from generate_series(1,1000) s(i);
delete from x where a < 1000;
create index on x using brin(a) with (pages_per_range=1);
Any bitmap index scan using this index would have to scan all those
empty ranges, because there are no summaries.
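To make the cost concrete, here is a rough standalone model of that
decision. SketchRange and sketch_ranges_to_scan are hypothetical names
used only for this illustration, not PostgreSQL code; the one assumption
carried over is that a range without a summary has to be treated as
possibly matching.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-range state, for illustration only. */
typedef struct SketchRange
{
	bool		has_summary;	/* is there an index tuple for the range? */
	bool		matches;		/* would the summary match the scan keys? */
} SketchRange;

/* Count the page ranges a bitmap scan would have to read from the heap. */
static size_t
sketch_ranges_to_scan(const SketchRange *ranges, size_t n)
{
	size_t		count = 0;

	for (size_t i = 0; i < n; i++)
	{
		/* no summary means nothing is known, so the range must be read */
		if (!ranges[i].has_summary || ranges[i].matches)
			count++;
	}

	return count;
}
```

With the table above, nearly all ranges would fall into the
"no summary" case, so the scan reads them even though they hold no rows.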
0003
Still uses the all_nulls/has_nulls flags to identify empty ranges, but
stores them - and then we check the combination in bringetbitmap() to
skip those ranges as not matching any scan keys.
This also restores some of the existing behavior - for example creating
a BRIN index on an entirely empty table (no pages at all) still
allocates a 48kB index (3 index pages, 3 fsm pages). Seems a bit
strange, but it's the existing behavior.
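As a rough sketch of how the two flags are interpreted under this
scheme (the enum and function names are made up for this example, the
patch does not define them):

```c
#include <stdbool.h>

/* Hypothetical names for the four flag combinations. */
typedef enum SketchRangeState
{
	SKETCH_RANGE_EMPTY,			/* allnulls && hasnulls: no rows yet */
	SKETCH_RANGE_ALL_NULLS,		/* only NULL values */
	SKETCH_RANGE_HAS_NULLS,		/* non-NULL values and at least one NULL */
	SKETCH_RANGE_NO_NULLS		/* non-NULL values only */
} SketchRangeState;

static SketchRangeState
sketch_range_state(bool allnulls, bool hasnulls)
{
	if (allnulls && hasnulls)
		return SKETCH_RANGE_EMPTY;
	if (allnulls)
		return SKETCH_RANGE_ALL_NULLS;
	if (hasnulls)
		return SKETCH_RANGE_HAS_NULLS;
	return SKETCH_RANGE_NO_NULLS;
}

/* An empty range cannot match any scan key, so a scan may skip it. */
static bool
sketch_range_can_be_skipped(bool allnulls, bool hasnulls)
{
	return sketch_range_state(allnulls, hasnulls) == SKETCH_RANGE_EMPTY;
}

/* IS NULL can only match ranges that actually saw a NULL value. */
static bool
sketch_range_may_contain_nulls(bool allnulls, bool hasnulls)
{
	SketchRangeState state = sketch_range_state(allnulls, hasnulls);

	return state == SKETCH_RANGE_ALL_NULLS ||
		state == SKETCH_RANGE_HAS_NULLS;
}
```

The point of the encoding is that the "empty" state reuses a flag
combination that was previously not meaningful, so no extra bit is
needed in the tuple.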
As explained before, I've considered adding a new flag to one of the
BRIN structs - BrinMemTuple or BrinValues. But we can't add it as the
last field of BrinMemTuple because there already is a
FLEXIBLE_ARRAY_MEMBER, and adding a field to BrinValues would change the
stride of the bt_columns array. So this would break ABI, making this not
backpatchable.
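The stride problem can be shown with a pair of simplified structs. These
are hypothetical layouts, not the real BrinValues definition, and the
sketch assumes the new flag is appended at the end of the struct:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "before" layout of one array element. */
typedef struct SketchValuesOld
{
	short		attno;
	bool		hasnulls;
	bool		allnulls;
	void	   *values;
} SketchValuesOld;

/* Same layout with one more flag appended (the assumed change). */
typedef struct SketchValuesNew
{
	short		attno;
	bool		hasnulls;
	bool		allnulls;
	void	   *values;
	bool		empty;
} SketchValuesNew;

/* Byte offset of element i in an array whose elements are stride bytes. */
static size_t
sketch_column_offset(size_t stride, size_t i)
{
	return stride * i;
}
```

Code compiled against the old struct would compute element offsets with
sizeof(SketchValuesOld), while the server would lay the array out with
sizeof(SketchValuesNew), so every element after the first would be read
from the wrong place.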
Furthermore, if we want to store summaries for empty ranges (which is
what 0003 does), we need to store the flag in the BRIN index tuple. And
we can't change the on-disk representation in backbranches, so encoding
this in the existing tuple seems like the only way.
So using the combination of all_nulls/has_nulls flag seems like the only
viable option, unfortunately.
Opinions? Considering this will need to be backpatched, it'd be good to
get some feedback on the approach. I think it's fine, but it would be
unfortunate to fix one issue but break BRIN in a different way.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-pageinspect-brinbugs-test.patch (text/x-patch; charset=UTF-8)
From dea5e7aa821ddf745e509371f33bf1953ff6e853 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sat, 22 Oct 2022 12:47:33 +0200
Subject: [PATCH 1/3] pageinspect brinbugs test
Introduce a brinbugs.sql test suite into pageinspect, demonstrating the
issue with forgetting about initial NULL values. Ultimately this should
be added to the existing brin.sql suite.
Furthermore, this tweaks an existing isolation test, originally intended
to test concurrent inserts and summarization, to also test this - it's
enough to ensure the first value added to the table is NULL.
---
contrib/pageinspect/Makefile | 2 +-
contrib/pageinspect/expected/brinbugs.out | 222 ++++++++++++++++++
contrib/pageinspect/sql/brinbugs.sql | 114 +++++++++
...summarization-and-inprogress-insertion.out | 6 +-
...ummarization-and-inprogress-insertion.spec | 1 +
5 files changed, 341 insertions(+), 4 deletions(-)
create mode 100644 contrib/pageinspect/expected/brinbugs.out
create mode 100644 contrib/pageinspect/sql/brinbugs.sql
diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index ad5a3ac5112..67eb02b78fd 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -22,7 +22,7 @@ DATA = pageinspect--1.10--1.11.sql \
pageinspect--1.0--1.1.sql
PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
-REGRESS = page btree brin gin gist hash checksum oldextversions
+REGRESS = page btree brin gin gist hash checksum oldextversions brinbugs
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/pageinspect/expected/brinbugs.out b/contrib/pageinspect/expected/brinbugs.out
new file mode 100644
index 00000000000..23843caa138
--- /dev/null
+++ b/contrib/pageinspect/expected/brinbugs.out
@@ -0,0 +1,222 @@
+create extension pageinspect;
+create table t (a int, b int);
+create index on t using brin (a, b);
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1,1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+-- first column should have all_nulls=true, second has_nulls=false and [1,1] range
+truncate t;
+insert into t values (null, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have all_nulls=true
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+-------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, null);
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (1, 1);
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, 1);
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1, 1);
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have all_nulls=true only
+truncate t;
+insert into t values (null, null);
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+-------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+-- first column should have has_nulls=false and [1,1] range, second all_nulls=true
+truncate t;
+insert into t values (1, null);
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+-------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+insert into t values (1,1);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+insert into t values (null, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+insert into t values (1, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+ 1 | 0 | 2 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+insert into t values (1, 1);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | f | f | f | {1 .. 1}
+(2 rows)
+
+-- both columns should have all_nulls=true
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+-------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+insert into t values (null, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+-------
+ 1 | 0 | 1 | t | f | f |
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+-- first column should have has_nulls=false and [1,1] range, second all_nulls=true
+truncate t;
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
+insert into t values (1, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | f | f | {1 .. 1}
+ 1 | 0 | 2 | t | f | f |
+(2 rows)
+
diff --git a/contrib/pageinspect/sql/brinbugs.sql b/contrib/pageinspect/sql/brinbugs.sql
new file mode 100644
index 00000000000..a141aed5adc
--- /dev/null
+++ b/contrib/pageinspect/sql/brinbugs.sql
@@ -0,0 +1,114 @@
+create extension pageinspect;
+
+create table t (a int, b int);
+create index on t using brin (a, b);
+
+
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1,1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- first column should have all_nulls=true, second has_nulls=false and [1,1] range
+truncate t;
+insert into t values (null, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have all_nulls=true
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, null);
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (1, 1);
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, 1);
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1, 1);
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have all_nulls=true only
+truncate t;
+insert into t values (null, null);
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- first column should have has_nulls=false and [1,1] range, second all_nulls=true
+truncate t;
+insert into t values (1, null);
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (1,1);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (null, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=true and [1,1] range
+truncate t;
+insert into t values (null, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (1, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (1, 1);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- both columns should have all_nulls=true
+truncate t;
+insert into t values (null, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (null, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+
+-- first column should have has_nulls=false and [1,1] range, second all_nulls=true
+truncate t;
+insert into t values (1, null);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+insert into t values (1, null);
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2a4755d0998..02ef52d299a 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 19ac18a2e88..18ba92b7ba1 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -9,6 +9,7 @@ setup
) WITH (fillfactor=10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
-- this fills the first page
+ INSERT INTO brin_iso VALUES (NULL);
DO $$
DECLARE curtid tid;
BEGIN
--
2.38.1
0002-Fix-handling-of-NULLs-when-building-BRIN-summaries.patch (text/x-patch; charset=UTF-8)
From 846f4a434a1cc7a72a5beb88326f9c03c9d599f1 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Thu, 20 Oct 2022 19:55:23 +0200
Subject: [PATCH 2/3] Fix handling of NULLs when building BRIN summaries
The existing code initializes all_nulls = true for new summaries, but
this poses a problem when adding the first non-NULL value. We have to reset
all_nulls to false, but we don't know whether to modify has_nulls.
If there was a NULL value before, we need to set has_nulls=true. But if
the range was empty and we're adding the first value, we must not set
has_nulls.
The current code resets all_nulls=false, without setting has_nulls, but
that means we may forget the range contains NULL values. So this is an
index corruption, producing incorrect results for IS NULL conditions.
We might always set has_nulls=true whenever resetting all_nulls, which
would resolve the index corruption, but it'd mean all ranges have either
all_nulls or has_nulls set, making the index useless for IS [NOT] NULL
queries.
Ideally, we'd add a new flag to identify empty summaries, but that does
not really work for backpatching - we'd need to add the flag to some
struct, e.g. BrinValues, but that'd change stride of the bt_columns
array, i.e. an ABI break.
So instead we use an "impossible" combination with both all_nulls and
has_nulls set to true to identify this case. And we never store index
tuples with this combination.
Note: This may be an issue because we won't store summaries for empty
ranges, making them match any condition. So far we had the same issue
for IS NULL conditions only.
---
.../expected/pg_freespacemap.out | 11 ++-
.../pg_freespacemap/sql/pg_freespacemap.sql | 8 +-
src/backend/access/brin/brin.c | 88 +++++++++++++++++--
src/backend/access/brin/brin_tuple.c | 19 +++-
...summarization-and-inprogress-insertion.out | 20 +----
...ummarization-and-inprogress-insertion.spec | 4 +-
src/test/modules/brin/t/01_workitems.pl | 11 +--
src/test/regress/expected/brin.out | 2 +-
src/test/regress/expected/brin_bloom.out | 2 +-
src/test/regress/expected/brin_multi.out | 2 +-
10 files changed, 126 insertions(+), 41 deletions(-)
diff --git a/contrib/pg_freespacemap/expected/pg_freespacemap.out b/contrib/pg_freespacemap/expected/pg_freespacemap.out
index eb574c23736..fa0d78c88a4 100644
--- a/contrib/pg_freespacemap/expected/pg_freespacemap.out
+++ b/contrib/pg_freespacemap/expected/pg_freespacemap.out
@@ -1,8 +1,11 @@
CREATE EXTENSION pg_freespacemap;
-CREATE TABLE freespace_tab (c1 int) WITH (autovacuum_enabled = off);
-CREATE INDEX freespace_brin ON freespace_tab USING brin (c1);
+CREATE TABLE freespace_tab (c1 int) WITH (autovacuum_enabled = off, fillfactor = 10);
+CREATE INDEX freespace_brin ON freespace_tab USING brin (c1) WITH (pages_per_range=1);
CREATE INDEX freespace_btree ON freespace_tab USING btree (c1);
CREATE INDEX freespace_hash ON freespace_tab USING hash (c1);
+-- necessary to build the first BRIN index tuple
+INSERT INTO freespace_tab VALUES (1);
+VACUUM;
-- report all the sizes of the FSMs for all the relation blocks.
WITH rel AS (SELECT oid::regclass AS id FROM pg_class WHERE relname ~ 'freespace')
SELECT rel.id, fsm.blkno, (fsm.avail > 0) AS is_avail
@@ -10,10 +13,12 @@ WITH rel AS (SELECT oid::regclass AS id FROM pg_class WHERE relname ~ 'freespace
ORDER BY 1, 2;
id | blkno | is_avail
-----------------+-------+----------
+ freespace_tab | 0 | t
freespace_brin | 0 | f
freespace_brin | 1 | f
freespace_brin | 2 | t
freespace_btree | 0 | f
+ freespace_btree | 1 | f
freespace_hash | 0 | f
freespace_hash | 1 | f
freespace_hash | 2 | f
@@ -24,7 +29,7 @@ WITH rel AS (SELECT oid::regclass AS id FROM pg_class WHERE relname ~ 'freespace
freespace_hash | 7 | f
freespace_hash | 8 | f
freespace_hash | 9 | f
-(14 rows)
+(16 rows)
INSERT INTO freespace_tab VALUES (1);
VACUUM freespace_tab;
diff --git a/contrib/pg_freespacemap/sql/pg_freespacemap.sql b/contrib/pg_freespacemap/sql/pg_freespacemap.sql
index 06275d8fac8..efc0699aa6f 100644
--- a/contrib/pg_freespacemap/sql/pg_freespacemap.sql
+++ b/contrib/pg_freespacemap/sql/pg_freespacemap.sql
@@ -1,10 +1,14 @@
CREATE EXTENSION pg_freespacemap;
-CREATE TABLE freespace_tab (c1 int) WITH (autovacuum_enabled = off);
-CREATE INDEX freespace_brin ON freespace_tab USING brin (c1);
+CREATE TABLE freespace_tab (c1 int) WITH (autovacuum_enabled = off, fillfactor = 10);
+CREATE INDEX freespace_brin ON freespace_tab USING brin (c1) WITH (pages_per_range=1);
CREATE INDEX freespace_btree ON freespace_tab USING btree (c1);
CREATE INDEX freespace_hash ON freespace_tab USING hash (c1);
+-- necessary to build the first BRIN index tuple
+INSERT INTO freespace_tab VALUES (1);
+VACUUM;
+
-- report all the sizes of the FSMs for all the relation blocks.
WITH rel AS (SELECT oid::regclass AS id FROM pg_class WHERE relname ~ 'freespace')
SELECT rel.id, fsm.blkno, (fsm.avail > 0) AS is_avail
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 7e386250ae9..3ed8eefab86 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1568,15 +1568,48 @@ form_and_insert_tuple(BrinBuildState *state)
{
BrinTuple *tup;
Size size;
+ bool modified = false;
+ BrinMemTuple *dtuple = state->bs_dtuple;
+ int i;
- tup = brin_form_tuple(state->bs_bdesc, state->bs_currRangeStart,
- state->bs_dtuple, &size);
- brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
- &state->bs_currentInsertBuf, state->bs_currRangeStart,
- tup, size);
- state->bs_numtuples++;
+ /*
+ * Check if any rows were processed for the page range represented by this
+ * memtuple. We initially set both allnulls/hasnulls to true to identify
+ * if the range is in this initial/empty state.
+ *
+ * XXX It should be enough to check only the first summary - either all
+ * summaries are empty or none of them.
+ */
+ for (i = 0; i < state->bs_bdesc->bd_tupdesc->natts; i++)
+ {
+ if (!(dtuple->bt_columns[i].bv_allnulls &&
+ dtuple->bt_columns[i].bv_hasnulls))
+ {
+ modified = true;
+ break;
+ }
+ }
- pfree(tup);
+ /*
+ * If the memtuple was modified (i.e. we added any rows to it), insert it
+ * into the index. That is, we don't store index tuples not representing
+ * any rows from table.
+ *
+ * XXX This has the undesirable consequence, that if the table has a gap
+ * (a long sequence of pages with no remaining tuples), we won't have any
+ * BRIN summaries for this part of the table. Which means that we'll have
+ * to scan this gap for each bitmap index scan.
+ */
+ if (modified)
+ {
+ tup = brin_form_tuple(state->bs_bdesc, state->bs_currRangeStart,
+ state->bs_dtuple, &size);
+ brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
+ &state->bs_currentInsertBuf, state->bs_currRangeStart,
+ tup, size);
+ state->bs_numtuples++;
+ pfree(tup);
+ }
}
/*
@@ -1710,24 +1743,53 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
Datum result;
BrinValues *bval;
FmgrInfo *addValue;
+ bool first_row;
+ bool has_nulls = false;
bval = &dtup->bt_columns[keyno];
+ /*
+ * Is this the first tuple we're adding to the range? We track
+ * that by setting both bv_hasnulls and bv_allnulls to true
+ * during initialization. But it's not a valid combination (at most
+ * one of those flags should be set), so we reset bv_hasnulls.
+ */
+ first_row = (bval->bv_hasnulls && bval->bv_allnulls);
+
+ if (bval->bv_hasnulls && bval->bv_allnulls)
+ {
+ bval->bv_hasnulls = false;
+ modified = true;
+ }
+
if (bdesc->bd_info[keyno]->oi_regular_nulls && nulls[keyno])
{
/*
* If the new value is null, we record that we saw it if it's the
* first one; otherwise, there's nothing to do.
+ *
+ * XXX This used to check "hasnulls" but now that might result in
+ * having both flags set. That used to be OK, because we just
+ * ignore the hasnulls flag in brin_form_tuple when allnulls=true.
+ * But now we interpret this combination as "first row" so it
+ * would confuse following calls. So make sure to only set one
+ * of the flags - when allnulls=true we're done, as it already
+ * marks the range as containing NULLs.
*/
- if (!bval->bv_hasnulls)
+ if (!bval->bv_allnulls)
{
bval->bv_hasnulls = true;
modified = true;
}
-
continue;
}
+ /*
+ * Does the range already have NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ */
+ has_nulls = (bval->bv_hasnulls || bval->bv_allnulls) && !first_row;
+
addValue = index_getprocinfo(idxRel, keyno + 1,
BRIN_PROCNUM_ADDVALUE);
result = FunctionCall4Coll(addValue,
@@ -1736,8 +1798,16 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
PointerGetDatum(bval),
values[keyno],
nulls[keyno]);
+
/* if that returned true, we need to insert the updated tuple */
modified |= DatumGetBool(result);
+
+ /*
+ * If we had NULLS, and the opclass didn't set allnulls=true, set
+ * the hasnulls so that we know there are NULL values.
+ */
+ if (has_nulls && !bval->bv_allnulls)
+ bval->bv_hasnulls = true;
}
return modified;
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index c0e2dbd23ba..1b5e72cde24 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -136,6 +136,13 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
{
int datumno;
+ /*
+ * We should never get here for memtuples in the initial state, i.e.
+ * before any rows were added to it.
+ */
+ Assert(!(tuple->bt_columns[keyno].bv_hasnulls &&
+ tuple->bt_columns[keyno].bv_allnulls));
+
/*
* "allnulls" is set when there's no nonnull value in any row in the
* column; when this happens, there is no data to store. Thus set the
@@ -516,8 +523,10 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
for (i = 0; i < brdesc->bd_tupdesc->natts; i++)
{
dtuple->bt_columns[i].bv_attno = i + 1;
+
+ /* each memtuple starts as if it represents no rows */
dtuple->bt_columns[i].bv_allnulls = true;
- dtuple->bt_columns[i].bv_hasnulls = false;
+ dtuple->bt_columns[i].bv_hasnulls = true;
dtuple->bt_columns[i].bv_values = (Datum *) currdatum;
dtuple->bt_columns[i].bv_mem_value = PointerGetDatum(NULL);
@@ -585,6 +594,14 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
{
int i;
+ Assert(!(allnulls[keyno] && hasnulls[keyno]));
+
+ /*
+ * Make sure to overwrite the hasnulls flag, because it might have
+ * been initialized as true by brin_memtuple_initialize.
+ */
+ dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
+
if (allnulls[keyno])
{
valueno += brdesc->bd_info[keyno]->oi_nstored;
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 02ef52d299a..2266012eac7 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -1,12 +1,6 @@
Parsed test spec with 2 sessions
-starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
-step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |t |f |{1 .. 1}
-(1 row)
-
+starting permutation: s1b s2b s1i s2summ s1c s2c s2check
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
step s2b: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1;
?column?
@@ -18,7 +12,7 @@ step s1i: INSERT INTO brin_iso VALUES (1000);
step s2summ: SELECT brin_summarize_new_values('brinidx'::regclass);
brin_summarize_new_values
-------------------------
- 1
+ 2
(1 row)
step s1c: COMMIT;
@@ -31,13 +25,7 @@ itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
(2 rows)
-starting permutation: s2check s1b s1i s2vacuum s1c s2check
-step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |t |f |{1 .. 1}
-(1 row)
-
+starting permutation: s1b s1i s2vacuum s1c s2check
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
step s1i: INSERT INTO brin_iso VALUES (1000);
step s2vacuum: VACUUM brin_iso;
@@ -45,7 +33,7 @@ step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 18ba92b7ba1..6319ae4c38d 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -41,5 +41,5 @@ step "s2vacuum" { VACUUM brin_iso; }
step "s2check" { SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass); }
-permutation "s2check" "s1b" "s2b" "s1i" "s2summ" "s1c" "s2c" "s2check"
-permutation "s2check" "s1b" "s1i" "s2vacuum" "s1c" "s2check"
+permutation "s1b" "s2b" "s1i" "s2summ" "s1c" "s2c" "s2check"
+permutation "s1b" "s1i" "s2vacuum" "s1c" "s2check"
diff --git a/src/test/modules/brin/t/01_workitems.pl b/src/test/modules/brin/t/01_workitems.pl
index 3108c02cf4d..eeec44b0060 100644
--- a/src/test/modules/brin/t/01_workitems.pl
+++ b/src/test/modules/brin/t/01_workitems.pl
@@ -24,20 +24,21 @@ $node->safe_psql(
create index brin_wi_idx on brin_wi using brin (a) with (pages_per_range=1, autosummarize=on);
'
);
-my $count = $node->safe_psql('postgres',
- "select count(*) from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)"
-);
-is($count, '1', "initial index state is correct");
$node->safe_psql('postgres',
'insert into brin_wi select * from generate_series(1, 100)');
+$node->poll_query_until(
+ 'postgres',
+ "select pg_relation_size('brin_wi_idx'::regclass) / current_setting('block_size')::int = 3",
+ 't');
+
$node->poll_query_until(
'postgres',
"select count(*) > 1 from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)",
't');
-$count = $node->safe_psql('postgres',
+my $count = $node->safe_psql('postgres',
"select count(*) > 1 from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)"
);
is($count, 't', "index got summarized");
diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out
index 73fa38396e4..ebc31222354 100644
--- a/src/test/regress/expected/brin.out
+++ b/src/test/regress/expected/brin.out
@@ -454,7 +454,7 @@ $$;
SELECT brin_summarize_range('brin_summarize_idx', 0);
brin_summarize_range
----------------------
- 0
+ 1
(1 row)
-- nothing: already summarized
diff --git a/src/test/regress/expected/brin_bloom.out b/src/test/regress/expected/brin_bloom.out
index 32c56a996a2..6e847f9113d 100644
--- a/src/test/regress/expected/brin_bloom.out
+++ b/src/test/regress/expected/brin_bloom.out
@@ -373,7 +373,7 @@ $$;
SELECT brin_summarize_range('brin_summarize_bloom_idx', 0);
brin_summarize_range
----------------------
- 0
+ 1
(1 row)
-- nothing: already summarized
diff --git a/src/test/regress/expected/brin_multi.out b/src/test/regress/expected/brin_multi.out
index f3309f433f8..e65f1c20d4f 100644
--- a/src/test/regress/expected/brin_multi.out
+++ b/src/test/regress/expected/brin_multi.out
@@ -407,7 +407,7 @@ $$;
SELECT brin_summarize_range('brin_summarize_multi_idx', 0);
brin_summarize_range
----------------------
- 0
+ 1
(1 row)
-- nothing: already summarized
--
2.38.1
0003-Store-BRIN-summaries-for-empty-ranges.patch (text/x-patch; charset=UTF-8)
From 39ff4c701b619aee0feba6767d71ffec6ae256ca Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 27 Nov 2022 22:44:56 +0100
Subject: [PATCH 3/3] Store BRIN summaries for empty ranges
Instead of ignoring summaries representing ranges with no tuples (e.g.
after a large batch DELETE), store the tuple with the "impossible"
combination of all_nulls/has_nulls flags.
When querying the index, we then consider these ranges as not matching
any condition.
---
src/backend/access/brin/brin.c | 58 ++++++-------------
src/backend/access/brin/brin_tuple.c | 14 +----
...summarization-and-inprogress-insertion.out | 18 +++++-
...ummarization-and-inprogress-insertion.spec | 4 +-
src/test/modules/brin/t/01_workitems.pl | 11 ++--
src/test/regress/expected/brin.out | 2 +-
src/test/regress/expected/brin_bloom.out | 2 +-
src/test/regress/expected/brin_multi.out | 2 +-
8 files changed, 46 insertions(+), 65 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 3ed8eefab86..8cf17156f50 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -591,6 +591,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
bval = &dtup->bt_columns[attno - 1];
+ /*
+ * If the range has both allnulls and hasnulls set, it means
+ * there are no rows in the range, so we can skip it (we have
+ * scan keys, and we know there's nothing to match).
+ */
+ if (bval->bv_allnulls && bval->bv_hasnulls)
+ {
+ addrange = false;
+ break;
+ }
+
/*
* First check if there are any IS [NOT] NULL scan keys,
* and if we're violating them. In that case we can
@@ -1568,48 +1579,15 @@ form_and_insert_tuple(BrinBuildState *state)
{
BrinTuple *tup;
Size size;
- bool modified = false;
- BrinMemTuple *dtuple = state->bs_dtuple;
- int i;
- /*
- * Check if any rows were processed for the page range represented by this
- * memtuple. We initially set both allnulls/hasnulls to true to identify
- * if the range is in this initial/empty state.
- *
- * XXX It should be enough to check only the first summary - either all
- * summaries are empty or none of them.
- */
- for (i = 0; i < state->bs_bdesc->bd_tupdesc->natts; i++)
- {
- if (!(dtuple->bt_columns[i].bv_allnulls &&
- dtuple->bt_columns[i].bv_hasnulls))
- {
- modified = true;
- break;
- }
- }
+ tup = brin_form_tuple(state->bs_bdesc, state->bs_currRangeStart,
+ state->bs_dtuple, &size);
+ brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
+ &state->bs_currentInsertBuf, state->bs_currRangeStart,
+ tup, size);
+ state->bs_numtuples++;
- /*
- * If the memtuple was modified (i.e. we added any rows to it), insert it
- * into the index. That is, we don't store index tuples not representing
- * any rows from table.
- *
- * XXX This has the undesirable consequence, that if the table has a gap
- * (a long sequence of pages with no remaining tuples), we won't have any
- * BRIN summaries for this part of the table. Which means that we'll have
- * to scan this gap for each bitmap index scan.
- */
- if (modified)
- {
- tup = brin_form_tuple(state->bs_bdesc, state->bs_currRangeStart,
- state->bs_dtuple, &size);
- brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
- &state->bs_currentInsertBuf, state->bs_currRangeStart,
- tup, size);
- state->bs_numtuples++;
- pfree(tup);
- }
+ pfree(tup);
}
/*
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index 1b5e72cde24..308c12a255b 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -136,13 +136,6 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
{
int datumno;
- /*
- * We should never get here for memtuples in the initial state, i.e.
- * before any rows were added to it.
- */
- Assert(!(tuple->bt_columns[keyno].bv_hasnulls &&
- tuple->bt_columns[keyno].bv_allnulls));
-
/*
* "allnulls" is set when there's no nonnull value in any row in the
* column; when this happens, there is no data to store. Thus set the
@@ -594,11 +587,10 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
{
int i;
- Assert(!(allnulls[keyno] && hasnulls[keyno]));
-
/*
- * Make sure to overwrite the hasnulls flag, because it might have
- * been initialized as true by brin_memtuple_initialize.
+ * Make sure to overwrite the hasnulls flag, because it was initialized
+ * to true by brin_memtuple_initialize and we don't want to skip it if
+ * allnulls.
*/
dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2266012eac7..584ac2602f7 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -1,6 +1,12 @@
Parsed test spec with 2 sessions
-starting permutation: s1b s2b s1i s2summ s1c s2c s2check
+starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
+step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
+----------+------+------+--------+--------+-----------+--------
+ 1| 0| 1|f |t |f |{1 .. 1}
+(1 row)
+
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
step s2b: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1;
?column?
@@ -12,7 +18,7 @@ step s1i: INSERT INTO brin_iso VALUES (1000);
step s2summ: SELECT brin_summarize_new_values('brinidx'::regclass);
brin_summarize_new_values
-------------------------
- 2
+ 1
(1 row)
step s1c: COMMIT;
@@ -25,7 +31,13 @@ itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
(2 rows)
-starting permutation: s1b s1i s2vacuum s1c s2check
+starting permutation: s2check s1b s1i s2vacuum s1c s2check
+step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
+----------+------+------+--------+--------+-----------+--------
+ 1| 0| 1|f |t |f |{1 .. 1}
+(1 row)
+
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
step s1i: INSERT INTO brin_iso VALUES (1000);
step s2vacuum: VACUUM brin_iso;
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 6319ae4c38d..18ba92b7ba1 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -41,5 +41,5 @@ step "s2vacuum" { VACUUM brin_iso; }
step "s2check" { SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass); }
-permutation "s1b" "s2b" "s1i" "s2summ" "s1c" "s2c" "s2check"
-permutation "s1b" "s1i" "s2vacuum" "s1c" "s2check"
+permutation "s2check" "s1b" "s2b" "s1i" "s2summ" "s1c" "s2c" "s2check"
+permutation "s2check" "s1b" "s1i" "s2vacuum" "s1c" "s2check"
diff --git a/src/test/modules/brin/t/01_workitems.pl b/src/test/modules/brin/t/01_workitems.pl
index eeec44b0060..3108c02cf4d 100644
--- a/src/test/modules/brin/t/01_workitems.pl
+++ b/src/test/modules/brin/t/01_workitems.pl
@@ -24,21 +24,20 @@ $node->safe_psql(
create index brin_wi_idx on brin_wi using brin (a) with (pages_per_range=1, autosummarize=on);
'
);
+my $count = $node->safe_psql('postgres',
+ "select count(*) from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)"
+);
+is($count, '1', "initial index state is correct");
$node->safe_psql('postgres',
'insert into brin_wi select * from generate_series(1, 100)');
-$node->poll_query_until(
- 'postgres',
- "select pg_relation_size('brin_wi_idx'::regclass) / current_setting('block_size')::int = 3",
- 't');
-
$node->poll_query_until(
'postgres',
"select count(*) > 1 from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)",
't');
-my $count = $node->safe_psql('postgres',
+$count = $node->safe_psql('postgres',
"select count(*) > 1 from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)"
);
is($count, 't', "index got summarized");
diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out
index ebc31222354..73fa38396e4 100644
--- a/src/test/regress/expected/brin.out
+++ b/src/test/regress/expected/brin.out
@@ -454,7 +454,7 @@ $$;
SELECT brin_summarize_range('brin_summarize_idx', 0);
brin_summarize_range
----------------------
- 1
+ 0
(1 row)
-- nothing: already summarized
diff --git a/src/test/regress/expected/brin_bloom.out b/src/test/regress/expected/brin_bloom.out
index 6e847f9113d..32c56a996a2 100644
--- a/src/test/regress/expected/brin_bloom.out
+++ b/src/test/regress/expected/brin_bloom.out
@@ -373,7 +373,7 @@ $$;
SELECT brin_summarize_range('brin_summarize_bloom_idx', 0);
brin_summarize_range
----------------------
- 1
+ 0
(1 row)
-- nothing: already summarized
diff --git a/src/test/regress/expected/brin_multi.out b/src/test/regress/expected/brin_multi.out
index e65f1c20d4f..f3309f433f8 100644
--- a/src/test/regress/expected/brin_multi.out
+++ b/src/test/regress/expected/brin_multi.out
@@ -407,7 +407,7 @@ $$;
SELECT brin_summarize_range('brin_summarize_multi_idx', 0);
brin_summarize_range
----------------------
- 1
+ 0
(1 row)
-- nothing: already summarized
--
2.38.1
On Mon, Nov 28, 2022 at 01:13:14AM +0100, Tomas Vondra wrote:
Opinions? Considering this will need to be backpatched, it'd be good to
get some feedback on the approach. I think it's fine, but it would be
unfortunate to fix one issue but break BRIN in a different way.
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -22,7 +22,7 @@ DATA = pageinspect--1.10--1.11.sql \
 pageinspect--1.0--1.1.sql
 PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
-REGRESS = page btree brin gin gist hash checksum oldextversions
+REGRESS = page btree brin gin gist hash checksum oldextversions brinbugs
I can't comment on the patch itself, but:
These changes to ./Makefile will also need to be made in ./meson.build.
Also (per cirrusci), the tests sometimes fail since two parallel tests
are doing "CREATE EXTENSION".
Hi,
here's an improved and cleaned-up version of the fix.
I removed brinbugs.sql from pageinspect, because it seems enough to have
the other tests (I added brinbugs first, before realizing those exist).
This also means meson.build is fine and there are no tests doing CREATE
EXTENSION concurrently etc.
I decided to go with the 0003 approach, which stores summaries for empty
ranges. That seems to be less intrusive (it's more like what we do now),
and works better for tables with a lot of bulk deletes. It means we can
have ranges with allnulls=hasnulls=true, which wasn't the case before,
but I don't see why this should break e.g. custom opclasses (if it does,
it probably means the opclass is wrong).
Finally, I realized union_tuples needs to be tweaked to deal with empty
ranges properly. The changes are fairly limited, though.
I plan to push this into master right at the beginning of January, and
then backpatch a couple days later.
I still feel a bit uneasy about tweaking this, but I don't think there's
a better way than reusing the existing flags.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-Fix-handling-of-NULLs-in-BRIN-summaries-20221230.patch (text/x-patch; charset=UTF-8)
From 10c03ef1b41f69b54db761b16f8bb1e0642d815b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Thu, 29 Dec 2022 22:41:36 +0100
Subject: [PATCH] Fix handling of NULLs in BRIN summaries
BRIN did not properly distinguish empty (new) and all-NULL ranges. All
ranges were initialized with all_nulls=true and opclasses simply reset
this to false when adding the first non-NULL value. This fails if the
first value in the range is NULL, and there are no other NULLs in the
range - we forget the range contains a NULL value.
This happens because the "all_nulls" flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values. The opclass can't know which of
those cases it is dealing with.
The opclass might also set has_nulls=true when resetting the all_nulls
flag - that would make it correct, but the indexes would be useless for
IS NULL conditions as all ranges start with all_nulls=true (and so all
ranges would have one of those flags set to true).
Ideally we'd introduce a new "is_empty" flag marking empty summaries,
but that would break ABI and/or on-disk format, depending on where we
add the flag. Considering we need to backpatch this, that doesn't seem
particularly great.
So instead we use an "impossible" combination of both flags (all_nulls
and has_nulls) set to true to mark "empty" ranges. It'd be better to
have a single flag for the whole index tuple (and not per-summary one),
because "range is empty" applies to all ranges in a multi-column index,
but this is where the existing flags are.
We could skip storing index tuples with this combination, but then we'd
have to always read/process this range - even if there are no rows, it
would still require reading the pages etc. So we store them, but ignore
them when building the bitmap.
---
src/backend/access/brin/brin.c | 78 ++++++++++++++++++-
src/backend/access/brin/brin_tuple.c | 11 ++-
...summarization-and-inprogress-insertion.out | 8 +-
...ummarization-and-inprogress-insertion.spec | 1 +
4 files changed, 89 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 7e386250ae..c7bf8963fa 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -591,6 +591,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
bval = &dtup->bt_columns[attno - 1];
+ /*
+ * If the range has both allnulls and hasnulls set, it means
+ * there are no rows in the range, so we can skip it (we have
+ * scan keys, and we know there's nothing to match).
+ */
+ if (bval->bv_allnulls && bval->bv_hasnulls)
+ {
+ addrange = false;
+ break;
+ }
+
/*
* First check if there are any IS [NOT] NULL scan keys,
* and if we're violating them. In that case we can
@@ -1608,11 +1619,32 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
if (opcinfo->oi_regular_nulls)
{
- /* Adjust "hasnulls". */
+ /*
+ * If B is empty (represents no rows), ignore it and just keep
+ * A as is (might be empty etc.).
+ */
+ if (col_b->bv_allnulls && col_b->bv_hasnulls)
+ continue;
+
+ /*
+ * Adjust "hasnulls".
+ *
+ * It may happen A has allnulls=true, and we should reset it. We
+ * need to copy the values from B first, which happens later.
+ * We know the next condition can't trigger, because B is not
+ * empty so only one of the flags is true.
+ */
if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
col_a->bv_hasnulls = true;
- /* If there are no values in B, there's nothing left to do. */
+ /*
+ * If there are no values in B, there's nothing left to do.
+ *
+ * Note this is mutually exclusive with the preceding condition.
+ * We have skipped "empty" B, so hasnulls and allnulls can't be
+ * both true. So if we adjusted hasnulls for A, there have to be
+ * values for B (i.e. we're not terminating here).
+ */
if (col_b->bv_allnulls)
continue;
@@ -1626,6 +1658,7 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
{
int i;
+ /* This also applies if we adjusted hasnulls=true earlier. */
col_a->bv_allnulls = false;
for (i = 0; i < opcinfo->oi_nstored; i++)
@@ -1710,24 +1743,53 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
Datum result;
BrinValues *bval;
FmgrInfo *addValue;
+ bool first_row;
+ bool has_nulls = false;
bval = &dtup->bt_columns[keyno];
+ /*
+ * Is this the first tuple we're adding to the range? We track
+ * that by setting both bv_hasnulls and bval->bv_allnulls to true
+ * during initialization. But it's not a valid combination (at most
+ * one of those flags should be set), so we reset the second flag.
+ */
+ first_row = (bval->bv_hasnulls && bval->bv_allnulls);
+
+ if (bval->bv_hasnulls && bval->bv_allnulls)
+ {
+ bval->bv_hasnulls = false;
+ modified = true;
+ }
+
if (bdesc->bd_info[keyno]->oi_regular_nulls && nulls[keyno])
{
/*
* If the new value is null, we record that we saw it if it's the
* first one; otherwise, there's nothing to do.
+ *
+ * XXX This used to check "hasnulls" but now that might result in
+ * having both flags set. That used to be OK, because we just
+ * ignore hasnulls flag in brin_form_tuple when allnulls=true.
+ * But now we interpret this combination as "first row" so it
+ * would confuse following calls. So make sure to only set one
+ * of the flags - when allnulls=true we're done, as it already
+ * marks the range as containing NULL values.
*/
- if (!bval->bv_hasnulls)
+ if (!bval->bv_allnulls)
{
bval->bv_hasnulls = true;
modified = true;
}
-
continue;
}
+ /*
+ * Does the range already has NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ */
+ has_nulls = (bval->bv_hasnulls || bval->bv_allnulls) && !first_row;
+
addValue = index_getprocinfo(idxRel, keyno + 1,
BRIN_PROCNUM_ADDVALUE);
result = FunctionCall4Coll(addValue,
@@ -1736,8 +1798,16 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
PointerGetDatum(bval),
values[keyno],
nulls[keyno]);
+
/* if that returned true, we need to insert the updated tuple */
modified |= DatumGetBool(result);
+
+ /*
+ * If we had NULLS, and the opclass didn't set allnulls=true, set
+ * the hasnulls so that we know there are NULL values.
+ */
+ if (has_nulls && !bval->bv_allnulls)
+ bval->bv_hasnulls = true;
}
return modified;
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index c0e2dbd23b..308c12a255 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -516,8 +516,10 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
for (i = 0; i < brdesc->bd_tupdesc->natts; i++)
{
dtuple->bt_columns[i].bv_attno = i + 1;
+
+ /* each memtuple starts as if it represents no rows */
dtuple->bt_columns[i].bv_allnulls = true;
- dtuple->bt_columns[i].bv_hasnulls = false;
+ dtuple->bt_columns[i].bv_hasnulls = true;
dtuple->bt_columns[i].bv_values = (Datum *) currdatum;
dtuple->bt_columns[i].bv_mem_value = PointerGetDatum(NULL);
@@ -585,6 +587,13 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
{
int i;
+ /*
+ * Make sure to overwrite the hasnulls flag, because it was initialized
+ * to true by brin_memtuple_initialize and we don't want to skip it if
+ * allnulls.
+ */
+ dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
+
if (allnulls[keyno])
{
valueno += brdesc->bd_info[keyno]->oi_nstored;
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2a4755d099..584ac2602f 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -45,7 +45,7 @@ step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 19ac18a2e8..18ba92b7ba 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -9,6 +9,7 @@ setup
) WITH (fillfactor=10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
-- this fills the first page
+ INSERT INTO brin_iso VALUES (NULL);
DO $$
DECLARE curtid tid;
BEGIN
--
2.38.1
On Fri, Dec 30, 2022 at 01:18:36AM +0100, Tomas Vondra wrote:
+ * Does the range already has NULL values? Either of the flags can
should say: "already have NULL values"
+ * If we had NULLS, and the opclass didn't set allnulls=true, set
+ * the hasnulls so that we know there are NULL values.
You could remove "the" before "hasnulls".
Or say "clear hasnulls so that.."
@@ -585,6 +587,13 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
{
int i;
+ /*
+ * Make sure to overwrite the hasnulls flag, because it was initialized
+ * to true by brin_memtuple_initialize and we don't want to skip it if
+ * allnulls.
Does "if allnulls" mean "if allnulls is true" ?
It's a bit unclear.
+ */
+ dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
+
if (allnulls[keyno])
{
valueno += brdesc->bd_info[keyno]->oi_nstored;
--
Justin
Thanks Justin! I've applied all the fixes you proposed, and (hopefully)
improved a couple other comments.
I've been working on this over the past couple days, trying to polish
and commit it over the weekend - both into master and backbranches.
Sadly, the backpatching part turned out to be a bit more complicated
than I expected, because of the BRIN reworks in PG14 (done by me, as
foundation for the new opclasses, so ... well).
Anyway, I got it done, but it's a bit uglier than I hoped for and I
don't feel like pushing this on Sunday midnight. I think it's correct,
but maybe another pass to polish it a bit more is better.
So here are two patches - one for 11-13, the other for 14-master.
There's also a separate patch with pageinspect tests, but only as a
demonstration of various (non)broken cases, not for commit. And then
also a bash script generating indexes with random data, randomized
summarization etc. - on unpatched systems this happens to fail in about
1/3 of the runs (at least for me). I haven't seen any failures with the
patches attached (on any branch).
As for the issue / fix, I don't think there's a better solution than
what the patch does - we need to distinguish empty / all-nulls ranges,
but we can't add a flag because of on-disk format / ABI. So using the
existing flags seems like the only option - I haven't heard any other
ideas so far, and I couldn't come up with any myself either.
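To make the flag handling easier to follow, here is a rough standalone model of what the patched add_values_to_range does for a single column. It is only a sketch: the Flags struct and the function names are made up for illustration, the opclass addValue call is reduced to "reset allnulls", and the oi_regular_nulls check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the two BrinValues flags; not the real struct. */
typedef struct
{
    bool allnulls;
    bool hasnulls;
} Flags;

/* A freshly initialized summary: (true, true) means "no rows yet". */
static Flags
empty_summary(void)
{
    Flags f = {true, true};

    return f;
}

/*
 * Model of the patched per-column logic when one value is added.
 * Assumption: for a non-NULL value the opclass only resets allnulls.
 */
static void
add_value(Flags *f, bool isnull)
{
    bool first_row = f->allnulls && f->hasnulls;
    bool had_nulls;

    /* Leave the "empty" state; allnulls alone now means "all NULL". */
    if (first_row)
        f->hasnulls = false;

    if (isnull)
    {
        /* With allnulls=true the NULL is already accounted for. */
        if (!f->allnulls)
            f->hasnulls = true;
    }
    else
    {
        /* Did the range hold NULLs before this non-NULL value? */
        had_nulls = (f->hasnulls || f->allnulls) && !first_row;

        f->allnulls = false;    /* what the opclass does */

        if (had_nulls)
            f->hasnulls = true;
    }

    /* After adding a row the summary can no longer look "empty". */
    assert(!(f->allnulls && f->hasnulls));
}
```

Starting from empty_summary(), adding a NULL gives (allnulls=true, hasnulls=false), and adding a non-NULL value after that gives (allnulls=false, hasnulls=true), so the NULL is no longer forgotten.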
I've also thought about alternative "encodings" into allnulls/hasnulls,
instead of treating (true,true) as "empty" - but none of that ended up
being any simpler, quite the opposite actually, as it would change what
the individual flags mean etc. So AFAICS this is the best / least
disruptive option.
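Spelled out, the encoding gives the two flags four meanings. The enum and function names below are hypothetical and exist only to illustrate the interpretation; the patch itself tests the flags directly.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; not part of the patch. */
typedef enum
{
    RANGE_EMPTY,        /* allnulls && hasnulls: summary represents no rows */
    RANGE_ALL_NULLS,    /* allnulls only: every row in the range is NULL */
    RANGE_HAS_NULLS,    /* hasnulls only: both NULL and non-NULL rows */
    RANGE_NO_NULLS      /* neither flag: only non-NULL rows */
} RangeState;

static RangeState
range_state(bool allnulls, bool hasnulls)
{
    if (allnulls && hasnulls)
        return RANGE_EMPTY;
    if (allnulls)
        return RANGE_ALL_NULLS;
    if (hasnulls)
        return RANGE_HAS_NULLS;
    return RANGE_NO_NULLS;
}

/* Could the range contain a row matching "IS NULL"? */
static bool
range_may_match_is_null(bool allnulls, bool hasnulls)
{
    RangeState state = range_state(allnulls, hasnulls);

    return state == RANGE_ALL_NULLS || state == RANGE_HAS_NULLS;
}
```

An empty range never matches any scan key, which is why bringetbitmap can skip it outright when both flags are set.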
I went over all the places touching these flags, to double check if any
of those needs some tweaks (similar to union_tuples, which I missed for
a long time). But I haven't found anything else, so I think this version
of the patches is complete.
As for assessing how many indexes are affected - in principle, any index
on columns with NULLs may be broken. But it only matters if the index is
used for IS NULL queries; other queries are not affected.
I also realized that this only affects insertion of individual tuples
into existing all-null summaries, not "bulk" summarization that sees all
values at once. Bulk summarization works because add_values_to_range
sets hasnulls=true for the first (NULL) value, and then calls the
addValue procedure for the second (non-NULL) one, which resets the
allnulls flag to false while hasnulls stays set.
But when inserting individual rows, we first set hasnulls=true, but
brin_form_tuple ignores that because of allnulls=true. And then when
inserting the second row, we start with hasnulls=false again, and the
opclass quietly resets the allnulls flag.
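The difference between the two paths can be reproduced with a small model of the unpatched behavior. This is a sketch under simplifying assumptions: the names are invented, the opclass is reduced to "reset allnulls on a non-NULL value", and writing and re-reading the index tuple is reduced to dropping hasnulls whenever allnulls is set (which is what brin_form_tuple effectively did).

```c
#include <stdbool.h>

/* Simplified stand-in for the two BrinValues flags; not the real struct. */
typedef struct
{
    bool allnulls;
    bool hasnulls;
} Flags;

/* Unpatched flag handling for one added value. */
static void
old_add_value(Flags *f, bool isnull)
{
    if (isnull)
    {
        if (!f->hasnulls)
            f->hasnulls = true;
        return;
    }

    /* opclass: the first non-NULL value resets allnulls */
    f->allnulls = false;
}

/* Unpatched round trip to disk: hasnulls is lost when allnulls=true. */
static Flags
old_store_and_reload(Flags f)
{
    if (f.allnulls)
        f.hasnulls = false;
    return f;
}

/* Bulk summarization: all values are seen before the tuple is formed. */
static Flags
old_bulk(const bool *isnull, int nvalues)
{
    Flags f = {true, false};    /* unpatched initial state */
    int   i;

    for (i = 0; i < nvalues; i++)
        old_add_value(&f, isnull[i]);

    return old_store_and_reload(f);
}

/* Individual inserts: the tuple is formed and re-read after each value. */
static Flags
old_row_by_row(const bool *isnull, int nvalues)
{
    Flags f = {true, false};    /* unpatched initial state */
    int   i;

    for (i = 0; i < nvalues; i++)
    {
        old_add_value(&f, isnull[i]);
        f = old_store_and_reload(f);
    }

    return f;
}
```

For the sequence (NULL, 1), old_bulk ends with hasnulls=true, while old_row_by_row ends with both flags false, which is the broken summary from the original report.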
I guess this further reduces the number of broken indexes, especially
for data sets with small null_frac, or for append-only (or -mostly)
tables where most of the summarization is bulk.
I still feel a bit uneasy about this, but I think the patch is solid.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-extra-tests.patch (text/x-patch; charset=UTF-8)
From 033286cff9733c24fdc7c3f774d947c9f1072aa0 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 8 Jan 2023 23:42:41 +0100
Subject: [PATCH] extra tests
---
contrib/pageinspect/Makefile | 2 +-
contrib/pageinspect/expected/brin-fails.out | 152 ++++++++++++++++++++
contrib/pageinspect/sql/brin-fails.sql | 85 +++++++++++
3 files changed, 238 insertions(+), 1 deletion(-)
create mode 100644 contrib/pageinspect/expected/brin-fails.out
create mode 100644 contrib/pageinspect/sql/brin-fails.sql
diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index 95e030b396..69a28a6d3d 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -22,7 +22,7 @@ DATA = pageinspect--1.11--1.12.sql pageinspect--1.10--1.11.sql \
pageinspect--1.0--1.1.sql
PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
-REGRESS = page btree brin gin gist hash checksum oldextversions
+REGRESS = page btree brin gin gist hash checksum oldextversions brin-fails
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/pageinspect/expected/brin-fails.out b/contrib/pageinspect/expected/brin-fails.out
new file mode 100644
index 0000000000..a12c761fc8
--- /dev/null
+++ b/contrib/pageinspect/expected/brin-fails.out
@@ -0,0 +1,152 @@
+create table t (a int);
+create extension pageinspect;
+-- works
+drop index if exists t_a_idx;
+NOTICE: index "t_a_idx" does not exist, skipping
+truncate t;
+insert into t values (null), (1);
+create index on t using brin (a);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+(1 row)
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (1), (null);
+create index on t using brin (a);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+(1 row)
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (null);
+create index on t using brin (a);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+(1 row)
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a);
+insert into t values (null);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | f | t | f | {1 .. 1}
+(1 row)
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null), (1);
+select brin_summarize_new_values('t_a_idx');
+ brin_summarize_new_values
+---------------------------
+ 1
+(1 row)
+
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | t | f | f |
+ 2 | 1 | 1 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+ brin_summarize_new_values
+---------------------------
+ 1
+(1 row)
+
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | t | f | f |
+ 2 | 1 | 1 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (1);
+select brin_summarize_new_values('t_a_idx');
+ brin_summarize_new_values
+---------------------------
+ 1
+(1 row)
+
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | t | f | f |
+ 2 | 1 | 1 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+ brin_summarize_new_values
+---------------------------
+ 1
+(1 row)
+
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | t | f | f |
+ 2 | 1 | 1 | f | t | f | {1 .. 1}
+(2 rows)
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+ brin_summarize_new_values
+---------------------------
+ 1
+(1 row)
+
+insert into t values (null);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
+------------+--------+--------+----------+----------+-------------+----------
+ 1 | 0 | 1 | t | f | f |
+ 2 | 1 | 1 | f | t | f | {1 .. 1}
+(2 rows)
+
diff --git a/contrib/pageinspect/sql/brin-fails.sql b/contrib/pageinspect/sql/brin-fails.sql
new file mode 100644
index 0000000000..ca57ba7e03
--- /dev/null
+++ b/contrib/pageinspect/sql/brin-fails.sql
@@ -0,0 +1,85 @@
+create table t (a int);
+create extension pageinspect;
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (null), (1);
+create index on t using brin (a);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (1), (null);
+create index on t using brin (a);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (null);
+create index on t using brin (a);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a);
+insert into t values (null);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null), (1);
+select brin_summarize_new_values('t_a_idx');
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (1);
+select brin_summarize_new_values('t_a_idx');
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+insert into t values (null);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
--
2.39.0
0001-Fix-handling-of-NULLs-in-BRIN-indexes-14-master.patch (text/x-patch; charset=UTF-8)
From 26905146ebb93422152f0f6ec3a835f62b1b8327 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 8 Jan 2023 16:43:06 +0100
Subject: [PATCH] Fix handling of NULLs in BRIN indexes
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges. All summaries were initialized with
allnulls=true, and the opclasses simply reset allnulls to false when
processing the first non-NULL value. This however fails if the range
starts with a NULL value (or a sequence of NULL values), in which case
we forget the range contains NULL values.
This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.
Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting hasnulls=true in both cases would
make it correct, but it would also make BRIN indexes useless for queries
with IS NULL clauses - all ranges start empty (and thus allnulls=true),
so all ranges would end up with either allnulls=true or hasnulls=true.
The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true,
not when the summarization is processing values in bulk (e.g. during
CREATE INDEX or automatic summarization). In this case the flags were
updated in a slightly different way, not forgetting the NULL values.
The best solution would be to introduce a new flag marking index tuples
representing ranges with no rows, but that would break on-disk format
and/or ABI, depending on where we put the flag. Considering we need to
backpatch this, that's not acceptable.
So instead we use an "impossible" combination of both flags (allnulls
and hasnulls) set to true, to mark "empty" ranges with no rows. In
principle "empty" is a feature of the whole index tuple, which may
contain multiple summaries in a multi-column index, but this is where
the flags are, unfortunately.
We could also skip storing index tuples for empty summaries, but then
we'd have to always process such ranges - even if there are no rows in
large parts of the table (e.g. after a bulk DELETE), it would still
require reading the pages etc. So we store them, but ignore them when
building the bitmap.
Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.
Backpatch-through: 11
Reviewed-by: Justin Pryzby, Matthias van de Meent
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
---
src/backend/access/brin/brin.c | 77 ++++++++++++++++++-
src/backend/access/brin/brin_tuple.c | 16 +++-
...summarization-and-inprogress-insertion.out | 8 +-
...ummarization-and-inprogress-insertion.spec | 1 +
4 files changed, 94 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index de1427a1e0..aa8d4017a7 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -591,6 +591,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
bval = &dtup->bt_columns[attno - 1];
+ /*
+ * If the range has both allnulls and hasnulls set, it means
+ * there are no rows in the range, so we can skip it (we have
+ * scan keys, and we know there's nothing to match).
+ */
+ if (bval->bv_allnulls && bval->bv_hasnulls)
+ {
+ addrange = false;
+ break;
+ }
+
/*
* First check if there are any IS [NOT] NULL scan keys,
* and if we're violating them. In that case we can
@@ -1608,11 +1619,32 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
if (opcinfo->oi_regular_nulls)
{
- /* Adjust "hasnulls". */
+ /*
+ * If B is empty (represents no rows), ignore it and just keep
+ * A as is (might be empty etc.).
+ */
+ if (col_b->bv_allnulls && col_b->bv_hasnulls)
+ continue;
+
+ /*
+ * Adjust "hasnulls".
+ *
+ * It may happen A has allnulls=true, and we should reset it. We
+ * need to copy the values from B first, which happens later.
+ * We know the next condition can't trigger, because B is not
+ * empty so only one of the flags is true.
+ */
if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
col_a->bv_hasnulls = true;
- /* If there are no values in B, there's nothing left to do. */
+ /*
+ * If there are no values in B, there's nothing left to do.
+ *
+ * Note this is mutually exclusive with the preceding condition.
+ * We have skipped "empty" B, so hasnulls and allnulls can't be
+ * both true. So if we adjusted hasnulls for A, there have to be
+ * values for B (i.e. we're not terminating here).
+ */
if (col_b->bv_allnulls)
continue;
@@ -1626,6 +1658,7 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
{
int i;
+ /* This also applies if we adjusted hasnulls=true earlier. */
col_a->bv_allnulls = false;
for (i = 0; i < opcinfo->oi_nstored; i++)
@@ -1710,16 +1743,41 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
Datum result;
BrinValues *bval;
FmgrInfo *addValue;
+ bool first_row;
+ bool hasnulls = false;
bval = &dtup->bt_columns[keyno];
+ /*
+ * Is this the first tuple we're adding to the range? We track
+ * that by setting both bv_hasnulls and bval->bv_allnulls to true
+ * during initialization. But it's not a valid combination (at most
+ * one of those flags should be set), so we reset the second flag.
+ */
+ first_row = (bval->bv_hasnulls && bval->bv_allnulls);
+
+ if (bval->bv_hasnulls && bval->bv_allnulls)
+ {
+ bval->bv_hasnulls = false;
+ modified = true;
+ }
+
if (bdesc->bd_info[keyno]->oi_regular_nulls && nulls[keyno])
{
/*
* If the new value is null, we record that we saw it if it's the
* first one; otherwise, there's nothing to do.
+ *
+ * We used to check "bv_hasnulls" which might result in having both
+ * flags set. That used to be OK, because we just ignore hasnulls
+ * flag in brin_form_tuple when bv_allnulls=true.
+ *
+ * But now we interpret this combination as "first row" so it would
+ * confuse following calls. So make sure to only set one of these
+ * flags - when allnulls=true we're done, as it already marks the
+ * range as containing values.
*/
- if (!bval->bv_hasnulls)
+ if (!bval->bv_allnulls)
{
bval->bv_hasnulls = true;
modified = true;
@@ -1728,6 +1786,12 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
continue;
}
+ /*
+ * Does the range already have NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ */
+ hasnulls = (bval->bv_hasnulls || bval->bv_allnulls) && !first_row;
+
addValue = index_getprocinfo(idxRel, keyno + 1,
BRIN_PROCNUM_ADDVALUE);
result = FunctionCall4Coll(addValue,
@@ -1738,6 +1802,13 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
nulls[keyno]);
/* if that returned true, we need to insert the updated tuple */
modified |= DatumGetBool(result);
+
+ /*
+ * If we had NULLS, and the opclass didn't set allnulls=true, set
+ * hasnulls so that we remember there are NULL values.
+ */
+ if (hasnulls && !bval->bv_allnulls)
+ bval->bv_hasnulls = true;
}
return modified;
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index 84b79dbfc0..5078754f1e 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -516,8 +516,15 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
for (i = 0; i < brdesc->bd_tupdesc->natts; i++)
{
dtuple->bt_columns[i].bv_attno = i + 1;
+
+ /*
+ * Each memtuple starts as if it represents no rows, which is indicated
+ * by having both allnulls and hasnulls set to true. We track this for
+ * all columns, because we don't have a flag for the whole memtuple.
+ */
dtuple->bt_columns[i].bv_allnulls = true;
- dtuple->bt_columns[i].bv_hasnulls = false;
+ dtuple->bt_columns[i].bv_hasnulls = true;
+
dtuple->bt_columns[i].bv_values = (Datum *) currdatum;
dtuple->bt_columns[i].bv_mem_value = PointerGetDatum(NULL);
@@ -585,6 +592,13 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
{
int i;
+ /*
+ * Make sure to overwrite the hasnulls flag, because it was initialized
+ * to true by brin_memtuple_initialize and we don't want to skip it if
+ * allnulls=true.
+ */
+ dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
+
if (allnulls[keyno])
{
valueno += brdesc->bd_info[keyno]->oi_nstored;
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2a4755d099..584ac2602f 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -45,7 +45,7 @@ step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 19ac18a2e8..18ba92b7ba 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -9,6 +9,7 @@ setup
) WITH (fillfactor=10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
-- this fills the first page
+ INSERT INTO brin_iso VALUES (NULL);
DO $$
DECLARE curtid tid;
BEGIN
--
2.39.0
0001-Fix-handling-of-NULLs-in-BRIN-indexes-11-13.patch (text/x-patch; charset=UTF-8)
From 16047a0dabca1a0a31cc8d86b274859cd1166438 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 8 Jan 2023 22:04:41 +0100
Subject: [PATCH] Fix handling of NULLs in BRIN indexes
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges. All summaries were initialized with
allnulls=true, and the opclasses simply reset allnulls to false when
processing the first non-NULL value. This however fails if the range
starts with a NULL value (or a sequence of NULL values), in which case
we forget the range contains NULL values.
This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.
Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting hasnulls=true in both cases would
make it correct, but it would also make BRIN indexes useless for queries
with IS NULL clauses - all ranges start empty (and thus allnulls=true),
so all ranges would end up with either allnulls=true or hasnulls=true.
The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true,
not when the summarization is processing values in bulk (e.g. during
CREATE INDEX or automatic summarization). In this case the flags were
updated in a slightly different way, not forgetting the NULL values.
The best solution would be to introduce a new flag marking index tuples
representing ranges with no rows, but that would break on-disk format
and/or ABI, depending on where we put the flag. Considering we need to
backpatch this, that's not acceptable.
So instead we use an "impossible" combination of both flags (allnulls
and hasnulls) set to true, to mark "empty" ranges with no rows. In
principle "empty" is a feature of the whole index tuple, which may
contain multiple summaries in a multi-column index, but this is where
the flags are, unfortunately.
We could also skip storing index tuples for empty summaries, but then
we'd have to always process such ranges - even if there are no rows in
large parts of the table (e.g. after a bulk DELETE), it would still
require reading the pages etc. So we store them, but ignore them when
building the bitmap.
Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.
Backpatch-through: 11
Reviewed-by: Justin Pryzby, Matthias van de Meent
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
---
src/backend/access/brin/brin.c | 46 ++++++++++++++++
src/backend/access/brin/brin_minmax.c | 54 +++++++++++++++++--
src/backend/access/brin/brin_tuple.c | 16 +++++-
...summarization-and-inprogress-insertion.out | 8 +--
...ummarization-and-inprogress-insertion.spec | 1 +
5 files changed, 116 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 0becfde113..f1dd39e016 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -253,8 +253,19 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
Datum result;
BrinValues *bval;
FmgrInfo *addValue;
+ bool first_row;
+ bool hasnulls;
bval = &dtup->bt_columns[keyno];
+
+ first_row = (bval->bv_hasnulls && bval->bv_allnulls);
+
+ /*
+ * Does the range already have NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ */
+ hasnulls = (bval->bv_hasnulls || bval->bv_allnulls) && !first_row;
+
addValue = index_getprocinfo(idxRel, keyno + 1,
BRIN_PROCNUM_ADDVALUE);
result = FunctionCall4Coll(addValue,
@@ -265,6 +276,13 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
nulls[keyno]);
/* if that returned true, we need to insert the updated tuple */
need_insert |= DatumGetBool(result);
+
+ /*
+ * If we had NULLS, and the opclass didn't set allnulls=true, set
+ * hasnulls so that we remember there are NULL values.
+ */
+ if (hasnulls && !bval->bv_allnulls)
+ bval->bv_hasnulls = true;
}
if (!need_insert)
@@ -508,6 +526,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
CurrentMemoryContext);
}
+ /*
+ * If the range has both allnulls and hasnulls set, it means
+ * there are no rows in the range, so we can skip it (we have
+ * scan keys, and we know there's nothing to match).
+ */
+ if (bval->bv_allnulls && bval->bv_hasnulls)
+ {
+ addrange = false;
+ break;
+ }
+
/*
* Check whether the scan key is consistent with the page
* range values; if so, have the pages in the range added
@@ -645,8 +674,19 @@ brinbuildCallback(Relation index,
FmgrInfo *addValue;
BrinValues *col;
Form_pg_attribute attr = TupleDescAttr(state->bs_bdesc->bd_tupdesc, i);
+ bool first_row;
+ bool hasnulls;
col = &state->bs_dtuple->bt_columns[i];
+
+ first_row = (col->bv_hasnulls && col->bv_allnulls);
+
+ /*
+ * Does the range already have NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ */
+ hasnulls = (col->bv_hasnulls || col->bv_allnulls) && !first_row;
+
addValue = index_getprocinfo(index, i + 1,
BRIN_PROCNUM_ADDVALUE);
@@ -658,6 +698,12 @@ brinbuildCallback(Relation index,
PointerGetDatum(state->bs_bdesc),
PointerGetDatum(col),
values[i], isnull[i]);
+ /*
+ * If we had NULLS, and the opclass didn't set allnulls=true, set
+ * hasnulls so that we remember there are NULL values.
+ */
+ if (hasnulls && !col->bv_allnulls)
+ col->bv_hasnulls = true;
}
}
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 4b5d6a7213..fe19536c90 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -75,14 +75,38 @@ brin_minmax_add_value(PG_FUNCTION_ARGS)
Form_pg_attribute attr;
AttrNumber attno;
+ /*
+ * Is this the first tuple we're adding to the range? We track
+ * that by setting both bv_hasnulls and bval->bv_allnulls to true
+ * during initialization. But it's not a valid combination (at most
+ * one of those flags should be set), so we reset the second flag.
+ *
+ * XXX The caller is responsible for tracking first_row=true.
+ */
+ if (column->bv_hasnulls && column->bv_allnulls)
+ {
+ column->bv_hasnulls = false;
+ updated = true;
+ }
+
/*
* If the new value is null, we record that we saw it if it's the first
* one; otherwise, there's nothing to do.
*/
if (isnull)
{
- if (column->bv_hasnulls)
- PG_RETURN_BOOL(false);
+ /*
+ * We used to check "bv_hasnulls" which might result in having both
+ * flags set. That used to be OK, because we just ignore hasnulls
+ * flag in brin_form_tuple when bv_allnulls=true.
+ *
+ * But now we interpret this combination as "first row" so it would
+ * confuse following calls. So make sure to only set one of these
+ * flags - when allnulls=true we're done, as it already marks the
+ * range as containing values.
+ */
+ if (column->bv_allnulls)
+ PG_RETURN_BOOL(updated);
column->bv_hasnulls = true;
PG_RETURN_BOOL(true);
@@ -250,11 +274,32 @@ brin_minmax_union(PG_FUNCTION_ARGS)
Assert(col_a->bv_attno == col_b->bv_attno);
- /* Adjust "hasnulls" */
+ /*
+ * If B is empty (represents no rows), ignore it and just keep
+ * A as is (might be empty etc.).
+ */
+ if (col_b->bv_allnulls && col_b->bv_hasnulls)
+ PG_RETURN_VOID();
+
+ /*
+ * Adjust "hasnulls".
+ *
+ * It may happen A has allnulls=true, and we should reset it. We
+ * need to copy the values from B first, which happens later.
+ * We know the next condition can't trigger, because B is not
+ * empty so only one of the flags is true.
+ */
if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
col_a->bv_hasnulls = true;
- /* If there are no values in B, there's nothing left to do */
+ /*
+ * If there are no values in B, there's nothing left to do.
+ *
+ * Note this is mutually exclusive with the preceding condition.
+ * We have skipped "empty" B, so hasnulls and allnulls can't be
+ * both true. So if we adjusted hasnulls for A, there have to be
+ * values for B (i.e. we're not terminating here).
+ */
if (col_b->bv_allnulls)
PG_RETURN_VOID();
@@ -269,6 +314,7 @@ brin_minmax_union(PG_FUNCTION_ARGS)
*/
if (col_a->bv_allnulls)
{
+ /* This also applies if we adjusted hasnulls=true earlier. */
col_a->bv_allnulls = false;
col_a->bv_values[0] = datumCopy(col_b->bv_values[0],
attr->attbyval, attr->attlen);
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index b3b453aed1..fa0bfd8699 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -493,8 +493,15 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
for (i = 0; i < brdesc->bd_tupdesc->natts; i++)
{
dtuple->bt_columns[i].bv_attno = i + 1;
+
+ /*
+ * Each memtuple starts as if it represents no rows, which is indicated
+ * by having both allnulls and hasnulls set to true. We track this for
+ * all columns, because we don't have a flag for the whole memtuple.
+ */
dtuple->bt_columns[i].bv_allnulls = true;
- dtuple->bt_columns[i].bv_hasnulls = false;
+ dtuple->bt_columns[i].bv_hasnulls = true;
+
dtuple->bt_columns[i].bv_values = (Datum *) currdatum;
currdatum += sizeof(Datum) * brdesc->bd_info[i]->oi_nstored;
}
@@ -557,6 +564,13 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
{
int i;
+ /*
+ * Make sure to overwrite the hasnulls flag, because it was initialized
+ * to true by brin_memtuple_initialize and we don't want to skip it if
+ * allnulls=true.
+ */
+ dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
+
if (allnulls[keyno])
{
valueno += brdesc->bd_info[keyno]->oi_nstored;
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2a4755d099..584ac2602f 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -45,7 +45,7 @@ step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 19ac18a2e8..18ba92b7ba 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -9,6 +9,7 @@ setup
) WITH (fillfactor=10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
-- this fills the first page
+ INSERT INTO brin_iso VALUES (NULL);
DO $$
DECLARE curtid tid;
BEGIN
--
2.39.0
On 1/9/23 00:34, Tomas Vondra wrote:
I've been working on this over the past couple days, trying to polish
and commit it over the weekend - both into master and backbranches.
Sadly, the backpatching part turned out to be a bit more complicated
than I expected, because of the BRIN reworks in PG14 (done by me, as
foundation for the new opclasses, so ... well).
Anyway, I got it done, but it's a bit uglier than I hoped for and I
don't feel like pushing this on Sunday midnight. I think it's correct,
but maybe another pass to polish it a bit more is better.
So here are two patches - one for 11-13, the other for 14-master.
I spent a bit more time on this fix. I realized there are two more
places that need fixes.
Firstly, the placeholder tuple needs to be marked as "empty" too, so
that it can be correctly updated by other backends etc.
Secondly, union_tuples had a couple bugs in handling empty ranges (this
is related to the placeholder tuple changes). I wonder what's the best
way to test this in an automated way - it's very dependent on timing of
the concurrent updates. For example we need to do something like this:
T1: run pg_summarize_range() until it inserts the placeholder tuple
T2: do an insert into the page range (updates placeholder)
T1: continue pg_summarize_range() to merge into the placeholder
But there are no convenient ways to do this, I think. I had to check the
various cases using breakpoints in gdb etc.
I'm not very happy with the union_tuples() changes - it's quite verbose,
perhaps a bit too verbose. We have to check for empty ranges first, and
then various combinations of allnulls/hasnulls flags for both BRIN
tuples. There are 9 combinations, and the current code just checks them
one by one - I was getting repeatedly confused by the original code, but
maybe it's too much.
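To make the combinations easier to follow, here is a minimal standalone sketch of the flag merging only. It is not the code from the patch: NullFlags, MergeAction and merge_null_flags are hypothetical names, and the summarized values are left out. It collapses the 9 cases (plus the two "empty" checks) into the action the caller still has to take.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two NULL flags kept per summarized column. */
typedef struct
{
	bool		allnulls;
	bool		hasnulls;
} NullFlags;

/* What the caller still has to do with the summarized values. */
typedef enum
{
	MERGE_DONE,					/* flags settled, no values to touch */
	MERGE_COPY_B,				/* A had no values: copy B's values into A */
	MERGE_CALL_UNION			/* both sides have values: call union proc */
} MergeAction;

/* Both flags set is the "impossible" combination marking an empty range. */
static bool
range_is_empty(const NullFlags *f)
{
	return f->allnulls && f->hasnulls;
}

/* Merge the NULL flags of B into A and report what to do with the values. */
static MergeAction
merge_null_flags(NullFlags *a, const NullFlags *b)
{
	/* B represents no rows: keep A as is (it may be empty too). */
	if (range_is_empty(b))
		return MERGE_DONE;

	/* A represents no rows: take over everything from B. */
	if (range_is_empty(a))
	{
		*a = *b;
		return b->allnulls ? MERGE_DONE : MERGE_COPY_B;
	}

	/* From here on only the 9 non-empty combinations remain. */
	assert(!range_is_empty(a) && !range_is_empty(b));

	if (a->allnulls)
	{
		/* Both sides NULL-only: nothing to do. */
		if (b->allnulls)
			return MERGE_DONE;

		/* A is NULL-only, B has values: result has NULLs and values. */
		a->allnulls = false;
		a->hasnulls = true;
		return MERGE_COPY_B;
	}

	/* A has values, B is NULL-only: just remember the NULLs. */
	if (b->allnulls)
	{
		a->hasnulls = true;
		return MERGE_DONE;
	}

	/* Both sides have values: keep track of NULLs seen on either side. */
	a->hasnulls = a->hasnulls || b->hasnulls;
	return MERGE_CALL_UNION;
}
```

This compact form gives up the 1:1 mapping between branches and rows of the table of combinations, which is the trade-off discussed above.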
As for the backpatch, I tried to keep it as close to the 14+ fixes as
possible, but it effectively backports some of the 14+ BRIN changes. In
particular, 14+ moved most of the NULL-handling logic from opclasses to
brin.c, and I think it's reasonable to do that for the backbranches too.
The alternative is to apply the same fix to every BRIN_PROCNUM_UNION
opclass procedure out there. I guess doing that for minmax+inclusion is
not a huge deal, but what about external opclasses? And without the fix
the indexes are effectively broken. Fixing this outside in brin.c (in
the union procedure) fixes this for every opclass procedure, without any
actual limitation of functionality (14+ does that anyway).
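As an illustration of why handling the NULL flags in the caller covers every opclass, here is a minimal standalone sketch. It is an assumption-laden model, not the backpatched code: Summary, AddValueFn, legacy_minmax_add_value and add_value_to_range are hypothetical names, and the "opclass" is a toy integer minmax that, like an unfixed external opclass, clears allnulls on the first value and never touches hasnulls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified summary of one column of one page range. */
typedef struct
{
	bool		allnulls;
	bool		hasnulls;
	int			minval;
	int			maxval;
} Summary;

/* Hypothetical signature of an opclass "add value" callback. */
typedef bool (*AddValueFn) (Summary *s, int value);

/* Both flags set marks a range that represents no rows yet. */
static bool
range_is_empty(const Summary *s)
{
	return s->allnulls && s->hasnulls;
}

/* Toy opclass without the fix: resets allnulls, ignores hasnulls. */
static bool
legacy_minmax_add_value(Summary *s, int value)
{
	bool		updated = false;

	if (s->allnulls)
	{
		s->allnulls = false;
		s->minval = s->maxval = value;
		return true;
	}
	if (value < s->minval)
	{
		s->minval = value;
		updated = true;
	}
	if (value > s->maxval)
	{
		s->maxval = value;
		updated = true;
	}
	return updated;
}

/* Caller-side NULL handling wrapped around an arbitrary opclass callback. */
static bool
add_value_to_range(Summary *s, AddValueFn opclass_add, int value, bool isnull)
{
	bool		modified = false;

	/* Remember real NULLs; the initial "empty" state does not count. */
	bool		had_nulls = !range_is_empty(s) &&
		(s->hasnulls || s->allnulls);

	/* After adding a row the range is no longer empty. */
	if (range_is_empty(s))
	{
		s->hasnulls = false;
		modified = true;
	}

	/* NULLs are handled here, without calling the opclass at all. */
	if (isnull)
	{
		if (!s->allnulls)
		{
			modified |= !s->hasnulls;
			s->hasnulls = true;
		}
		return modified;
	}

	modified |= opclass_add(s, value);

	/* The opclass cleared allnulls, so record the NULLs seen earlier. */
	if (had_nulls && !s->allnulls)
	{
		modified |= !s->hasnulls;
		s->hasnulls = true;
	}

	assert(!range_is_empty(s));
	return modified;
}
```

Adding a NULL and then the value 1 to an initially empty summary leaves allnulls=false and hasnulls=true, which is the case from the original report, even though the toy opclass itself never sets hasnulls.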
But maybe someone thinks this is a bad idea and we should do something
else in the backbranches?
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-Fix-handling-of-NULLs-in-BRIN-ind-14-master-20230224.patch (text/x-patch; charset=UTF-8)
From 0abf47f311bfb0b03e5349b12c8e67ad3d5c0842 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 8 Jan 2023 16:43:06 +0100
Subject: [PATCH] Fix handling of NULLs in BRIN indexes
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges. All summaries were initialized with
allnulls=true, and the opclasses simply reset allnulls to false when
processing the first non-NULL value. This however fails if the range
starts with a NULL value (or a sequence of NULL values), in which case
we forget the range contains NULL values.
This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.
Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting hasnulls=true in both cases would
make it correct, but it would also make BRIN indexes useless for queries
with IS NULL clauses - all ranges start empty (and thus allnulls=true),
so all ranges would end up with either allnulls=true or hasnulls=true.
The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true,
not when the summarization is processing values in bulk (e.g. during
CREATE INDEX or automatic summarization). In this case the flags were
updated in a slightly different way, not forgetting the NULL values.
The best solution would be to introduce a new flag marking index tuples
representing ranges with no rows, but that would break on-disk format
and/or ABI, depending on where we put the flag. Considering we need to
backpatch this, that's not acceptable.
So instead we use an "impossible" combination of both flags (allnulls
and hasnulls) set to true, to mark "empty" ranges with no rows. In
principle "empty" is a feature of the whole index tuple, which may
contain multiple summaries in a multi-column index, but this is where
the flags are, unfortunately.
We could also skip storing index tuples for empty summaries, but then
we'd have to always process such ranges - even if there are no rows in
large parts of the table (e.g. after a bulk DELETE), it would still
require reading the pages etc. So we store them, but ignore them when
building the bitmap.
Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.
Backpatch-through: 11
Reviewed-by: Justin Pryzby, Matthias van de Meent
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
---
src/backend/access/brin/brin.c | 223 +++++++++++++++++-
src/backend/access/brin/brin_tuple.c | 31 ++-
...summarization-and-inprogress-insertion.out | 8 +-
...ummarization-and-inprogress-insertion.spec | 1 +
4 files changed, 244 insertions(+), 19 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b5a5fa7b334..a7c2c072bd4 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -70,6 +70,8 @@ typedef struct BrinOpaque
#define BRIN_ALL_BLOCKRANGES InvalidBlockNumber
+#define BRIN_RANGE_IS_EMPTY(col) ((col)->bv_allnulls && (col)->bv_hasnulls)
+
static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
BrinRevmap *revmap, BlockNumber pagesPerRange);
static void terminate_brin_buildstate(BrinBuildState *state);
@@ -591,6 +593,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
bval = &dtup->bt_columns[attno - 1];
+ /*
+ * If the range has both allnulls and hasnulls set, it means
+ * there are no rows in the range, so we can skip it (we know
+ * there's nothing to match).
+ */
+ if (BRIN_RANGE_IS_EMPTY(bval))
+ {
+ addrange = false;
+ break;
+ }
+
/*
* First check if there are any IS [NOT] NULL scan keys,
* and if we're violating them. In that case we can
@@ -1615,26 +1628,99 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
if (opcinfo->oi_regular_nulls)
{
- /* Adjust "hasnulls". */
- if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
- col_a->bv_hasnulls = true;
+ /*
+ * If B is empty (represents no rows), ignore it and just keep
+ * A as is (might be empty etc.).
+ */
+ if (BRIN_RANGE_IS_EMPTY(col_b))
+ continue;
+
+ /*
+ * Now we know B is not empty - it has either NULLs or data, or
+ * some combination of it. We need to merge it into A somehow.
+ *
+ * If A is empty, we simply copy all the flags and data from B.
+ */
+ if (BRIN_RANGE_IS_EMPTY(col_a))
+ {
+ int i;
+
+ col_a->bv_allnulls = col_b->bv_allnulls;
+ col_a->bv_hasnulls = col_b->bv_hasnulls;
+
+ /* If B has no data, we're done. */
+ if (col_b->bv_allnulls)
+ continue;
+
+ for (i = 0; i < opcinfo->oi_nstored; i++)
+ col_a->bv_values[i] =
+ datumCopy(col_b->bv_values[i],
+ opcinfo->oi_typcache[i]->typbyval,
+ opcinfo->oi_typcache[i]->typlen);
- /* If there are no values in B, there's nothing left to do. */
- if (col_b->bv_allnulls)
continue;
+ }
/*
- * Adjust "allnulls". If A doesn't have values, just copy the
- * values from B into A, and we're done. We cannot run the
- * operators in this case, because values in A might contain
- * garbage. Note we already established that B contains values.
+ * Both A and B are not empty, and we need to merge B into A.
+ * There are multiple combinations of allnulls/hasnulls flags.
+ * We've handled the "empty" case on either side above, so we
+ * can ignore those cases - which leaves 3 flag combinations
+ * on each side, so 9 combinations in total.
+ *
+ * A:all A:has B:all B:has
+ * true false true false - nothing to do
+ * true false false true - set A:has=true, copy from B
+ * true false false false - set A:has=true, copy from B
+ *
+ * false true true false - nothing to do
+ * false true false true - flags OK, call union proc
+ * false true false false - flags OK, call union proc
+ *
+ * false false true false - set A:has=true
+ * false false false true - set A:has=true, call union proc
+ * false false false false - flags OK, call union proc
*/
- if (col_a->bv_allnulls)
+ if (col_a->bv_allnulls && col_b->bv_allnulls)
+ {
+ /* nothing to do - both sides are NULL-only */
+ continue;
+ }
+ else if (col_a->bv_allnulls && col_b->bv_hasnulls)
{
- int i;
+ int i;
+ /*
+ * A is NULL-only, but B has some non-NULL values too. So the
+ * result has both NULLs and non-NULL values.
+ */
+ col_a->bv_allnulls = false;
+ col_a->bv_hasnulls = true;
+ /* copy data from B to A */
+ for (i = 0; i < opcinfo->oi_nstored; i++)
+ col_a->bv_values[i] =
+ datumCopy(col_b->bv_values[i],
+ opcinfo->oi_typcache[i]->typbyval,
+ opcinfo->oi_typcache[i]->typlen);
+
+ continue;
+ }
+ else if (col_a->bv_allnulls) /* B has no NULLs */
+ {
+ int i;
+
+ /*
+ * A is NULL-only, but B has some non-NULL values too. So the
+ * result has both NULLs and non-NULL values.
+ *
+ * XXX This is the same as the preceding branch, but I've left
+ * it here to keep the branches mapped 1:1 to the table of
+ * combinations.
+ */
col_a->bv_allnulls = false;
+ col_a->bv_hasnulls = true;
+ /* copy data from B to A */
for (i = 0; i < opcinfo->oi_nstored; i++)
col_a->bv_values[i] =
datumCopy(col_b->bv_values[i],
@@ -1643,6 +1729,55 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
continue;
}
+ else if (col_a->bv_hasnulls && col_b->bv_allnulls)
+ {
+ /* Nothing to do (flags are correct, no data to copy). */
+ continue;
+ }
+ else if (col_a->bv_hasnulls && col_b->bv_hasnulls)
+ {
+ /*
+ * Flags are correct, but both A and B have non-NULL values.
+ * So we have to call the support proc BRIN_PROCNUM_UNION
+ * (so no 'continue' here).
+ */
+ }
+ else if (col_a->bv_hasnulls) /* B has no NULLs */
+ {
+ /*
+ * B has no NULL values, so flags are OK. But both sides have
+ * some non-NULL values, so we have to call the support proc
+ * (so no 'continue' here).
+ *
+ * XXX Same as the preceding branch, but kept for 1:1 mapping.
+ */
+ }
+ else if (col_b->bv_allnulls) /* A has no NULLs */
+ {
+ /*
+ * Just update the hasnulls flag to remember B has NULL values
+ * and we're done (no non-NULL values to copy/merge).
+ */
+ col_a->bv_hasnulls = true;
+ continue;
+ }
+ else if (col_b->bv_hasnulls) /* A has no NULLs */
+ {
+ /*
+ * Update the hasnulls flag to remember B has NULL values, but
+ * both sides have some non-NULL data so we need to call the
+ * BRIN_PROCNUM_UNION procedure (so no 'continue' here).
+ */
+ col_a->bv_hasnulls = true;
+ }
+ else
+ {
+ /*
+ * Neither side has any NULL values, both sides have non-NULL
+ * values, so we need to call the BRIN_PROCNUM_UNION proc (so
+ * no 'continue' here).
+ */
+ }
}
unionFn = index_getprocinfo(bdesc->bd_index, keyno + 1,
@@ -1717,19 +1852,67 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
Datum result;
BrinValues *bval;
FmgrInfo *addValue;
+ bool hasnulls;
bval = &dtup->bt_columns[keyno];
+ /*
+ * Does the range have actual NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ *
+ * We have to remember this, because we'll modify the flags and we
+ * need to know if the range started as empty.
+ */
+ hasnulls = (!BRIN_RANGE_IS_EMPTY(bval)) &&
+ (bval->bv_hasnulls || bval->bv_allnulls);
+
+ /*
+ * We need to consider whether the range is empty (not representing
+ * any rows yet), i.e. if it has both flags (allnulls and hasnulls) set
+ * to true.
+ *
+ * If the range is empty, we clear the hasnulls flag - after adding
+ * a value it won't be empty anymore. Either it'll be all-NULL (and
+ * leaving allnulls=true covers that), or it will have no NULLs at
+ * all (but building the state is up to the opclass).
+ *
+ * If the range is not empty, we remember if there are NULL values.
+ * In this case both flags can't be set to true (that'd be empty
+ * range), so it's either allnulls=true or hasnulls=true. But the
+ * opclasses clear allnulls when adding the first non-NULL value,
+ * so we need to remember this.
+ *
+ * When adding a null value we can do everything locally, without
+ * calling BRIN_PROCNUM_ADDVALUE.
+ */
+ if (BRIN_RANGE_IS_EMPTY(bval))
+ {
+ bval->bv_hasnulls = false;
+ modified = true;
+ }
+
+ /*
+ * If the value we're adding is NULL, handle it locally. Otherwise
+ * call the BRIN_PROCNUM_ADDVALUE procedure.
+ */
if (bdesc->bd_info[keyno]->oi_regular_nulls && nulls[keyno])
{
/*
* If the new value is null, we record that we saw it if it's the
* first one; otherwise, there's nothing to do.
+ *
+ * We can't check "bv_hasnulls" because then we might end up with
+ * both flags set to true, which is interpreted as empty range.
+ * But that'd be wrong, because we've just added a value.
+ *
+ * So either the range has allnulls=true, or we have to set the
+ * hasnulls flag. Check if we're changing the value to determine
+ * if the index tuple was modified.
*/
- if (!bval->bv_hasnulls)
+ if (!bval->bv_allnulls)
{
+ modified |= (!bval->bv_hasnulls);
bval->bv_hasnulls = true;
- modified = true;
}
continue;
@@ -1745,6 +1928,20 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
nulls[keyno]);
/* if that returned true, we need to insert the updated tuple */
modified |= DatumGetBool(result);
+
+ /*
+ * If the range was not empty and had NULL values, make sure we don't
+ * forget about the NULL values. Either the allnulls flag is still set
+ * to true, or (if the opclass cleared it) we need to set hasnulls=true.
+ */
+ if (hasnulls && !bval->bv_allnulls)
+ {
+ modified |= (!bval->bv_hasnulls);
+ bval->bv_hasnulls = true;
+ }
+
+ /* We've added a row, so the summary should not be empty. */
+ Assert(!BRIN_RANGE_IS_EMPTY(bval));
}
return modified;
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index 84b79dbfc0d..b2292a0c1a9 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -417,7 +417,20 @@ brin_form_placeholder_tuple(BrinDesc *brdesc, BlockNumber blkno, Size *size)
*bitP |= bitmask;
}
- /* no need to set hasnulls */
+ /* set hasnulls true for all attributes */
+ for (keyno = 0; keyno < brdesc->bd_tupdesc->natts; keyno++)
+ {
+ if (bitmask != HIGHBIT)
+ bitmask <<= 1;
+ else
+ {
+ bitP += 1;
+ *bitP = 0x0;
+ bitmask = 1;
+ }
+
+ *bitP |= bitmask;
+ }
*size = len;
return rettuple;
@@ -516,8 +529,15 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
for (i = 0; i < brdesc->bd_tupdesc->natts; i++)
{
dtuple->bt_columns[i].bv_attno = i + 1;
+
+ /*
+ * Each memtuple starts as if it represents no rows, which is indicated
+ * by having both allnulls and hasnulls set to true. We track this for
+ * all columns, because we don't have a flag for the whole memtuple.
+ */
dtuple->bt_columns[i].bv_allnulls = true;
- dtuple->bt_columns[i].bv_hasnulls = false;
+ dtuple->bt_columns[i].bv_hasnulls = true;
+
dtuple->bt_columns[i].bv_values = (Datum *) currdatum;
dtuple->bt_columns[i].bv_mem_value = PointerGetDatum(NULL);
@@ -585,6 +605,13 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
{
int i;
+ /*
+ * Make sure to overwrite the hasnulls flag, because it was initialized
+ * to true by brin_memtuple_initialize and we don't want to skip it if
+ * allnulls=true.
+ */
+ dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
+
if (allnulls[keyno])
{
valueno += brdesc->bd_info[keyno]->oi_nstored;
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2a4755d0998..584ac2602f7 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -45,7 +45,7 @@ step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 19ac18a2e88..18ba92b7ba1 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -9,6 +9,7 @@ setup
) WITH (fillfactor=10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
-- this fills the first page
+ INSERT INTO brin_iso VALUES (NULL);
DO $$
DECLARE curtid tid;
BEGIN
--
2.39.2
0001-Fix-handling-of-NULLs-in-BRIN-indexes-11-13-20230224.patch (text/x-patch; charset=UTF-8)
From fd5f37eafc27f42674768ea5593e3309f5ad07a7 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 8 Jan 2023 22:04:41 +0100
Subject: [PATCH] Fix handling of NULLs in BRIN indexes
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges. All summaries were initialized with
allnulls=true, and the opclasses simply reset allnulls to false when
processing the first non-NULL value. This however fails if the range
starts with a NULL value (or a sequence of NULL values), in which case
we forget the range contains NULL values.
This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.
Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting hasnulls=true in both cases would
make it correct, but it would also make BRIN indexes useless for queries
with IS NULL clauses - all ranges start empty (and thus allnulls=true),
so all ranges would end up with either allnulls=true or hasnulls=true.
The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true,
not when the summarization is processing values in bulk (e.g. during
CREATE INDEX or automatic summarization). In this case the flags were
updated in a slightly different way, not forgetting the NULL values.
The best solution would be to introduce a new flag marking index tuples
representing ranges with no rows, but that would break on-disk format
and/or ABI, depending on where we put the flag. Considering we need to
backpatch this, that's not acceptable.
So instead we use an "impossible" combination of both flags (allnulls
and hasnulls) set to true, to mark "empty" ranges with no rows. In
principle "empty" is a feature of the whole index tuple, which may
contain multiple summaries in a multi-column index, but this is where
the flags are, unfortunately.
We could also skip storing index tuples for empty summaries, but then
we'd have to always process such ranges - even if there are no rows in
large parts of the table (e.g. after a bulk DELETE), it would still
require reading the pages etc. So we store them, but ignore them when
building the bitmap.
Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.
Backpatch-through: 11
Reviewed-by: Justin Pryzby, Matthias van de Meent
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
---
src/backend/access/brin/brin.c | 337 +++++++++++++++++-
src/backend/access/brin/brin_inclusion.c | 46 +--
src/backend/access/brin/brin_minmax.c | 43 +--
src/backend/access/brin/brin_tuple.c | 31 +-
...summarization-and-inprogress-insertion.out | 8 +-
...ummarization-and-inprogress-insertion.spec | 1 +
6 files changed, 369 insertions(+), 97 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 0becfde1133..5ede5a88367 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -35,6 +35,7 @@
#include "storage/freespace.h"
#include "utils/acl.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/index_selfuncs.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -68,6 +69,8 @@ typedef struct BrinOpaque
#define BRIN_ALL_BLOCKRANGES InvalidBlockNumber
+#define BRIN_RANGE_IS_EMPTY(col) ((col)->bv_allnulls && (col)->bv_hasnulls)
+
static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
BrinRevmap *revmap, BlockNumber pagesPerRange);
static void terminate_brin_buildstate(BrinBuildState *state);
@@ -253,18 +256,94 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
Datum result;
BrinValues *bval;
FmgrInfo *addValue;
+ bool hasnulls;
bval = &dtup->bt_columns[keyno];
- addValue = index_getprocinfo(idxRel, keyno + 1,
- BRIN_PROCNUM_ADDVALUE);
- result = FunctionCall4Coll(addValue,
- idxRel->rd_indcollation[keyno],
- PointerGetDatum(bdesc),
- PointerGetDatum(bval),
- values[keyno],
- nulls[keyno]);
- /* if that returned true, we need to insert the updated tuple */
- need_insert |= DatumGetBool(result);
+
+ /*
+ * Does the range have actual NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ *
+ * We have to remember this, because we'll modify the flags and we
+ * need to know if the range started as empty.
+ */
+ hasnulls = (!BRIN_RANGE_IS_EMPTY(bval)) &&
+ (bval->bv_hasnulls || bval->bv_allnulls);
+
+ /*
+ * We need to consider whether the range is empty (not representing
+ * any rows yet), i.e. if it has both flags (allnulls hasnulls) set
+ * to true.
+ *
+ * If the range is empty, we clear the hasnulls flag - after adding
+ * a value it won't be empty anymore. Either it'll be all-NULL (and
+ * leaving allnulls=true covers that), or it will have no NULLs at
+ * all (but building the state is up to the opclass).
+ *
+ * If the range is not empty, we remember if there are NULL values.
+ * In this case both flags can't be set to true (that'd be empty
+ * range), so it's either allnulls=true or hasnulls=true. But the
+ * opclasses clear allnulls when adding the first non-NULL value,
+ * so we need to remember this.
+ *
+ * When adding a null value we can do everything locally, without
+ * calling BRIN_PROCNUM_ADDVALUE.
+ */
+ if (BRIN_RANGE_IS_EMPTY(bval))
+ {
+ bval->bv_hasnulls = false;
+ need_insert = true;
+ }
+
+ /*
+ * If the value we're adding is NULL, handle it locally. Otherwise
+ * call the BRIN_PROCNUM_ADDVALUE procedure.
+ */
+ if (nulls[keyno])
+ {
+ /*
+ * We can't check "bv_hasnulls" because then we might end up with
+ * both flags set to true, which is interpreted as empty range.
+ * But that'd be wrong, because we've just added a value.
+ *
+ * So either the range has allnulls=true, or we have to set the
+ * hasnulls flag. Check if we're changing the value to determine
+ * if the index tuple was modified.
+ */
+ if (!bval->bv_allnulls)
+ {
+ /* Are we changing the tuple? */
+ need_insert |= (!bval->bv_hasnulls);
+ bval->bv_hasnulls = true;
+ }
+ }
+ else
+ {
+ addValue = index_getprocinfo(idxRel, keyno + 1,
+ BRIN_PROCNUM_ADDVALUE);
+ result = FunctionCall4Coll(addValue,
+ idxRel->rd_indcollation[keyno],
+ PointerGetDatum(bdesc),
+ PointerGetDatum(bval),
+ values[keyno],
+ nulls[keyno]);
+ /* if that returned true, we need to insert the updated tuple */
+ need_insert |= DatumGetBool(result);
+ }
+
+ /*
+ * If the range was not an empty range (it'd have hasnulls=false),
+ * make sure we remember there were NULL values. Either the allnulls
+ * flag is still set to true, or we need to set the hasnulls flag.
+ */
+ if (hasnulls && !bval->bv_allnulls)
+ {
+ need_insert |= (!bval->bv_hasnulls);
+ bval->bv_hasnulls = true;
+ }
+
+ /* We've added a row, so the summary should not be empty. */
+ Assert(!BRIN_RANGE_IS_EMPTY(bval));
}
if (!need_insert)
@@ -508,6 +587,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
CurrentMemoryContext);
}
+ /*
+ * If the range has both allnulls and hasnulls set, it means
+ * there are no rows in the range, so we can skip it (we know
+ * there's nothing to match).
+ */
+ if (BRIN_RANGE_IS_EMPTY(bval))
+ {
+ addrange = false;
+ break;
+ }
+
/*
* Check whether the scan key is consistent with the page
* range values; if so, have the pages in the range added
@@ -645,19 +735,80 @@ brinbuildCallback(Relation index,
FmgrInfo *addValue;
BrinValues *col;
Form_pg_attribute attr = TupleDescAttr(state->bs_bdesc->bd_tupdesc, i);
+ bool hasnulls;
col = &state->bs_dtuple->bt_columns[i];
- addValue = index_getprocinfo(index, i + 1,
- BRIN_PROCNUM_ADDVALUE);
+
+ /*
+ * Does the range have actual NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ *
+ * We have to remember this, because we'll modify the flags and we
+ * need to know if the range started as empty.
+ */
+ hasnulls = (!BRIN_RANGE_IS_EMPTY(col)) &&
+ (col->bv_hasnulls || col->bv_allnulls);
+
+ /*
+ * We need to consider whether the range is empty (not representing
+ * any rows yet), i.e. if it has both flags (allnulls hasnulls) set
+ * to true.
+ *
+ * If the range is empty, we clear the hasnulls flag - after adding
+ * a value it won't be empty anymore. Either it'll be all-NULL (and
+ * leaving allnulls=true covers that), or it will have no NULLs at
+ * all (but building the state is up to the opclass).
+ *
+ * If the range is not empty, we remember if there are NULL values.
+ * In this case both flags can't be set to true (that'd be empty
+ * range), so it's either allnulls=true or hasnulls=true. But the
+ * opclasses clear allnulls when adding the first non-NULL value,
+ * so we need to remember this.
+ *
+ * When adding a null value we can do everything locally, without
+ * calling BRIN_PROCNUM_ADDVALUE.
+ */
+ if (BRIN_RANGE_IS_EMPTY(col))
+ col->bv_hasnulls = false;
+
+ if (isnull[i])
+ {
+ /*
+ * We can't check "bv_hasnulls" because then we might end up with
+ * both flags set to true, which is interpreted as empty range.
+ * But that'd be wrong, because we've just added a value.
+ *
+ * So either the range has allnulls=true, or we have to set the
+ * hasnulls flag.
+ */
+ if (!col->bv_allnulls)
+ col->bv_hasnulls = true;
+ }
+ else
+ {
+ addValue = index_getprocinfo(index, i + 1,
+ BRIN_PROCNUM_ADDVALUE);
+
+ /*
+ * Update dtuple state, if and as necessary.
+ */
+ FunctionCall4Coll(addValue,
+ attr->attcollation,
+ PointerGetDatum(state->bs_bdesc),
+ PointerGetDatum(col),
+ values[i], isnull[i]);
+ }
/*
- * Update dtuple state, if and as necessary.
+ * If the range was not an empty range (it'd have hasnulls=false),
+ * make sure we remember there were NULL values. Either the allnulls
+ * flag is still set to true, or we need to set the hasnulls flag.
*/
- FunctionCall4Coll(addValue,
- attr->attcollation,
- PointerGetDatum(state->bs_bdesc),
- PointerGetDatum(col),
- values[i], isnull[i]);
+ if (hasnulls && !col->bv_allnulls)
+ col->bv_hasnulls = true;
+
+ /* We've added a row, so the summary should not be empty. */
+ Assert(!BRIN_RANGE_IS_EMPTY(col));
}
}
@@ -1468,9 +1619,159 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
{
FmgrInfo *unionFn;
+ BrinOpcInfo *opcinfo = bdesc->bd_info[keyno];
BrinValues *col_a = &a->bt_columns[keyno];
BrinValues *col_b = &db->bt_columns[keyno];
+ /*
+ * If B is empty (represents no rows), ignore it and just keep
+ * A as is (might be empty etc.).
+ */
+ if (BRIN_RANGE_IS_EMPTY(col_b))
+ continue;
+
+ /*
+ * Now we know B is not empty - it has either NULLs or data, or
+ * some combination of it. We need to merge it into A somehow.
+ *
+ * If A is empty, we simply copy all the flags and data from B.
+ */
+ if (BRIN_RANGE_IS_EMPTY(col_a))
+ {
+ int i;
+
+ col_a->bv_allnulls = col_b->bv_allnulls;
+ col_a->bv_hasnulls = col_b->bv_hasnulls;
+
+ /* If B has no data, we're done. */
+ if (col_b->bv_allnulls)
+ continue;
+
+ for (i = 0; i < opcinfo->oi_nstored; i++)
+ col_a->bv_values[i] =
+ datumCopy(col_b->bv_values[i],
+ opcinfo->oi_typcache[i]->typbyval,
+ opcinfo->oi_typcache[i]->typlen);
+ }
+
+ /*
+ * Both A and B are not empty, and we need to merge B into A.
+ * There are multiple combinations of allnulls/hasnulls flags.
+ * We've handled the "empty" case on either side above, so we
+ * can ignore those cases - which leaves 3 flag combinations
+ * on each side, so 9 combinations in total.
+ *
+ * A:all A:has B:all B:has
+ * true false true false - nothing to do
+ * true false false true - set A:has=true, copy from B
+ * true false false false - set A:has=true, copy from B
+ *
+ * false true true false - nothing to do
+ * false true false true - flags OK, call union proc
+ * false true false false - flags OK, call union proc
+ *
+ * false false true false - set A:has=true
+ * false false false true - set A:has=true, call union proc
+ * false false false false - flags OK, call union proc
+ */
+ if (col_a->bv_allnulls && col_b->bv_allnulls)
+ {
+ /* nothing to do - both sides are NULL-only */
+ continue;
+ }
+ else if (col_a->bv_allnulls && col_b->bv_hasnulls)
+ {
+ int i;
+ /*
+ * A is NULL-only, but B has some non-NULL values too. So the
+ * result has both NULLs and non-NULL values.
+ */
+ col_a->bv_allnulls = false;
+ col_a->bv_hasnulls = true;
+
+ /* copy data from B to A */
+ for (i = 0; i < opcinfo->oi_nstored; i++)
+ col_a->bv_values[i] =
+ datumCopy(col_b->bv_values[i],
+ opcinfo->oi_typcache[i]->typbyval,
+ opcinfo->oi_typcache[i]->typlen);
+
+ continue;
+ }
+ else if (col_a->bv_allnulls) /* B has no NULLs */
+ {
+ int i;
+
+ /*
+ * A is NULL-only, but B has some non-NULL values too. So the
+ * result has both NULLs and non-NULL values.
+ *
+ * XXX This is the same as the preceding branch, but I've left
+ * it here to keep the branches mapped 1:1 to the table of
+ * combinations.
+ */
+ col_a->bv_allnulls = false;
+ col_a->bv_hasnulls = true;
+
+ /* copy data from B to A */
+ for (i = 0; i < opcinfo->oi_nstored; i++)
+ col_a->bv_values[i] =
+ datumCopy(col_b->bv_values[i],
+ opcinfo->oi_typcache[i]->typbyval,
+ opcinfo->oi_typcache[i]->typlen);
+
+ continue;
+ }
+ else if (col_a->bv_hasnulls && col_b->bv_allnulls)
+ {
+ /* Nothing to do (flags are correct, no data to copy). */
+ continue;
+ }
+ else if (col_a->bv_hasnulls && col_b->bv_hasnulls)
+ {
+ /*
+ * Flags are correct, but both A and B have non-NULL values.
+ * So we have to call the support proc BRIN_PROCNUM_UNION
+ * (so no 'continue' here).
+ */
+ }
+ else if (col_a->bv_hasnulls) /* B has no NULLs */
+ {
+ /*
+ * B has no NULL values, so flags are OK. But both sides have
+ * some non-NULL values, so we have to call the support proc
+ * (so no 'continue' here).
+ *
+ * XXX Same as the preceding branch, but kept for 1:1 mapping.
+ */
+ }
+ else if (col_b->bv_allnulls) /* A has no NULLs */
+ {
+ /*
+ * Just update the hasnulls flag to remember B has NULL values
+ * and we're done (no data non-NULL values to copy/merge).
+ */
+ col_a->bv_hasnulls = true;
+ continue;
+ }
+ else if (col_b->bv_hasnulls) /* A has no NULLs */
+ {
+ /*
+ * Update the hasnulls flag to remember B has NULL values, but
+ * both sides have some non-NULL data so we need to call the
+ * BRIN_PROCNUM_UNION procedure (so no 'continue' here).
+ */
+ col_a->bv_hasnulls = true;
+ }
+ else
+ {
+ /*
+ * Neither side has any NULL values, both sides have non-NULL
+ * values, so we need to call the BRIN_PROCNUM_UNION proc (so
+ * no 'continue' here).
+ */
+ }
+
unionFn = index_getprocinfo(bdesc->bd_index, keyno + 1,
BRIN_PROCNUM_UNION);
FunctionCall3Coll(unionFn,
diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c
index 7e380d66ed5..f9217ca8254 100644
--- a/src/backend/access/brin/brin_inclusion.c
+++ b/src/backend/access/brin/brin_inclusion.c
@@ -147,18 +147,8 @@ brin_inclusion_add_value(PG_FUNCTION_ARGS)
AttrNumber attno;
Form_pg_attribute attr;
- /*
- * If the new value is null, we record that we saw it if it's the first
- * one; otherwise, there's nothing to do.
- */
- if (isnull)
- {
- if (column->bv_hasnulls)
- PG_RETURN_BOOL(false);
-
- column->bv_hasnulls = true;
- PG_RETURN_BOOL(true);
- }
+ /* We're not passing NULL values to the opclass anymore. */
+ Assert(!isnull);
attno = column->bv_attno;
attr = TupleDescAttr(bdesc->bd_tupdesc, attno - 1);
@@ -517,36 +507,16 @@ brin_inclusion_union(PG_FUNCTION_ARGS)
Assert(col_a->bv_attno == col_b->bv_attno);
- /* Adjust "hasnulls". */
- if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
- col_a->bv_hasnulls = true;
-
- /* If there are no values in B, there's nothing left to do. */
- if (col_b->bv_allnulls)
- PG_RETURN_VOID();
+ /*
+ * All-null summaries are no longer passed to the union proc (this also
+ * implies the summaries are not empty).
+ */
+ Assert(!col_a->bv_allnulls);
+ Assert(!col_b->bv_allnulls);
attno = col_a->bv_attno;
attr = TupleDescAttr(bdesc->bd_tupdesc, attno - 1);
- /*
- * Adjust "allnulls". If A doesn't have values, just copy the values from
- * B into A, and we're done. We cannot run the operators in this case,
- * because values in A might contain garbage. Note we already established
- * that B contains values.
- */
- if (col_a->bv_allnulls)
- {
- col_a->bv_allnulls = false;
- col_a->bv_values[INCLUSION_UNION] =
- datumCopy(col_b->bv_values[INCLUSION_UNION],
- attr->attbyval, attr->attlen);
- col_a->bv_values[INCLUSION_UNMERGEABLE] =
- col_b->bv_values[INCLUSION_UNMERGEABLE];
- col_a->bv_values[INCLUSION_CONTAINS_EMPTY] =
- col_b->bv_values[INCLUSION_CONTAINS_EMPTY];
- PG_RETURN_VOID();
- }
-
/* If B includes empty elements, mark A similarly, if needed. */
if (!DatumGetBool(col_a->bv_values[INCLUSION_CONTAINS_EMPTY]) &&
DatumGetBool(col_b->bv_values[INCLUSION_CONTAINS_EMPTY]))
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 4b5d6a72135..f2748d2e267 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -75,18 +75,8 @@ brin_minmax_add_value(PG_FUNCTION_ARGS)
Form_pg_attribute attr;
AttrNumber attno;
- /*
- * If the new value is null, we record that we saw it if it's the first
- * one; otherwise, there's nothing to do.
- */
- if (isnull)
- {
- if (column->bv_hasnulls)
- PG_RETURN_BOOL(false);
-
- column->bv_hasnulls = true;
- PG_RETURN_BOOL(true);
- }
+ /* We're not passing NULL values to the opclass anymore. */
+ Assert(!isnull);
attno = column->bv_attno;
attr = TupleDescAttr(bdesc->bd_tupdesc, attno - 1);
@@ -250,33 +240,16 @@ brin_minmax_union(PG_FUNCTION_ARGS)
Assert(col_a->bv_attno == col_b->bv_attno);
- /* Adjust "hasnulls" */
- if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
- col_a->bv_hasnulls = true;
-
- /* If there are no values in B, there's nothing left to do */
- if (col_b->bv_allnulls)
- PG_RETURN_VOID();
+ /*
+ * All-null summaries are no longer passed to the union proc (this also
+ * implies the summaries are not empty).
+ */
+ Assert(!col_a->bv_allnulls);
+ Assert(!col_b->bv_allnulls);
attno = col_a->bv_attno;
attr = TupleDescAttr(bdesc->bd_tupdesc, attno - 1);
- /*
- * Adjust "allnulls". If A doesn't have values, just copy the values from
- * B into A, and we're done. We cannot run the operators in this case,
- * because values in A might contain garbage. Note we already established
- * that B contains values.
- */
- if (col_a->bv_allnulls)
- {
- col_a->bv_allnulls = false;
- col_a->bv_values[0] = datumCopy(col_b->bv_values[0],
- attr->attbyval, attr->attlen);
- col_a->bv_values[1] = datumCopy(col_b->bv_values[1],
- attr->attbyval, attr->attlen);
- PG_RETURN_VOID();
- }
-
/* Adjust minimum, if B's min is less than A's min */
finfo = minmax_get_strategy_procinfo(bdesc, attno, attr->atttypid,
BTLessStrategyNumber);
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index b3b453aed12..861dc76b7d3 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -394,7 +394,20 @@ brin_form_placeholder_tuple(BrinDesc *brdesc, BlockNumber blkno, Size *size)
*bitP |= bitmask;
}
- /* no need to set hasnulls */
+ /* set hasnulls true for all attributes */
+ for (keyno = 0; keyno < brdesc->bd_tupdesc->natts; keyno++)
+ {
+ if (bitmask != HIGHBIT)
+ bitmask <<= 1;
+ else
+ {
+ bitP += 1;
+ *bitP = 0x0;
+ bitmask = 1;
+ }
+
+ *bitP |= bitmask;
+ }
*size = len;
return rettuple;
@@ -493,8 +506,15 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
for (i = 0; i < brdesc->bd_tupdesc->natts; i++)
{
dtuple->bt_columns[i].bv_attno = i + 1;
+
+ /*
+ * Each memtuple starts as if it represents no rows, which is indicated
+ * by having bot allnulls and hasnulls set to true. We track this for
+ * all columns, because we don't have a flag for the whole memtuple.
+ */
dtuple->bt_columns[i].bv_allnulls = true;
- dtuple->bt_columns[i].bv_hasnulls = false;
+ dtuple->bt_columns[i].bv_hasnulls = true;
+
dtuple->bt_columns[i].bv_values = (Datum *) currdatum;
currdatum += sizeof(Datum) * brdesc->bd_info[i]->oi_nstored;
}
@@ -557,6 +577,13 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
{
int i;
+ /*
+ * Make sure to overwrite the hasnulls flag, because it was initialized
+ * to true by brin_memtuple_initialize and we don't want to skip it if
+ * allnulls=true.
+ */
+ dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
+
if (allnulls[keyno])
{
valueno += brdesc->bd_info[keyno]->oi_nstored;
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2a4755d0998..584ac2602f7 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -45,7 +45,7 @@ step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 19ac18a2e88..18ba92b7ba1 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -9,6 +9,7 @@ setup
) WITH (fillfactor=10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
-- this fills the first page
+ INSERT INTO brin_iso VALUES (NULL);
DO $$
DECLARE curtid tid;
BEGIN
--
2.39.2
Thanks for doing all this. (Do I understand correctly that this patch
is not in the commitfest?)
I think my mental model for this was that "allnulls" meant that either
there are no values for the column in question or that the values were
all nulls. (For minmax without NULL handling, which is where this all
started, these two things are essentially the same: the range is not to
be returned. So this became a bug the instant I added handling for NULL
values.) I failed to realize that these were two different things, and
this is likely the origin of all these troubles.
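To make the distinction concrete, here is a minimal sketch of how the per-column flags are interpreted under the posted patch, where allnulls=true together with hasnulls=true means "no rows". The `BRIN_RANGE_IS_EMPTY` macro is taken from the patch; the `SketchValues` struct and the two `sketch_*` helpers are hypothetical stand-ins for illustration only and do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags carried by BrinValues. */
typedef struct SketchValues
{
    bool bv_allnulls;   /* no non-NULL values summarized */
    bool bv_hasnulls;   /* at least one NULL value seen */
} SketchValues;

/* Same definition as in the patch: both flags set means "no rows". */
#define BRIN_RANGE_IS_EMPTY(col) ((col)->bv_allnulls && (col)->bv_hasnulls)

/* Could the range contain a row matching "col IS NULL"? */
static bool
sketch_may_match_is_null(const SketchValues *col)
{
    if (BRIN_RANGE_IS_EMPTY(col))
        return false;           /* no rows at all, nothing can match */
    return col->bv_allnulls || col->bv_hasnulls;
}

/* Could the range contain a row matching "col IS NOT NULL"? */
static bool
sketch_may_match_is_not_null(const SketchValues *col)
{
    if (BRIN_RANGE_IS_EMPTY(col))
        return false;
    return !col->bv_allnulls;
}
```

With this encoding an empty range matches neither IS NULL nor IS NOT NULL, while an all-NULL range (allnulls=true, hasnulls=false) still matches IS NULL, which is exactly the difference the old single-flag model could not express.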
What do you think of using the unused bit in BrinTuple->bt_info to
denote a range that contains no heap tuples? This also means we need it
in BrinMemTuple, I think we can do this:
@@ -44,6 +44,7 @@ typedef struct BrinValues
typedef struct BrinMemTuple
{
bool bt_placeholder; /* this is a placeholder tuple */
+ bool bt_empty_range; /* range has no tuples */
BlockNumber bt_blkno; /* heap blkno that the tuple is for */
MemoryContext bt_context; /* memcxt holding the bt_columns values */
/* output arrays for brin_deform_tuple: */
@@ -69,7 +70,7 @@ typedef struct BrinTuple
*
* 7th (high) bit: has nulls
* 6th bit: is placeholder tuple
- * 5th bit: unused
+ * 5th bit: range has no tuples
* 4-0 bit: offset of data
* ---------------
*/
@@ -82,7 +83,7 @@ typedef struct BrinTuple
* bt_info manipulation macros
*/
#define BRIN_OFFSET_MASK 0x1F
-/* bit 0x20 is not used at present */
+#define BRIN_EMPTY_RANGE 0x20
#define BRIN_PLACEHOLDER_MASK 0x40
#define BRIN_NULLS_MASK 0x80
(Note that bt_empty_range uses a hole in the struct, so there's no ABI
change.)
This is BRIN-tuple-level, not column-level, so conceptually it seems
more appropriate. (In the case where both are empty in union_tuples, we
can return without entering the per-attribute loop at all, though I
admit it's not a very interesting case.) This approach avoids having to
invent the strange combination of all+has to mean empty.
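As a rough illustration of the tuple-level alternative, the sketch below tests and sets the proposed 0x20 bit. The mask values come from the diff above; the `sketch_*` helper names are hypothetical, and treating `bt_info` as a plain `uint8_t` is an assumption made so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks as in the proposal; BRIN_EMPTY_RANGE is the newly assigned bit. */
#define BRIN_OFFSET_MASK        0x1F
#define BRIN_EMPTY_RANGE        0x20
#define BRIN_PLACEHOLDER_MASK   0x40
#define BRIN_NULLS_MASK         0x80

/* Hypothetical helper: does this bt_info byte mark a range with no tuples? */
static bool
sketch_tuple_is_empty_range(uint8_t bt_info)
{
    return (bt_info & BRIN_EMPTY_RANGE) != 0;
}

/* Hypothetical helper: set or clear the flag, leaving other bits untouched. */
static uint8_t
sketch_tuple_set_empty_range(uint8_t bt_info, bool empty)
{
    if (empty)
        return (uint8_t) (bt_info | BRIN_EMPTY_RANGE);
    return (uint8_t) (bt_info & ~BRIN_EMPTY_RANGE);
}
```

Because the flag lives in its own bit, the data offset (low five bits) and the placeholder and nulls bits are unaffected, and a zero-initialized bt_info reads as "not empty".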
On 2023-Feb-24, Tomas Vondra wrote:
I wonder what's the best
way to test this in an automated way - it's very dependent on timing of
the concurrent updates. For example we need to do something like this:
T1: run pg_summarize_range() until it inserts the placeholder tuple
T2: do an insert into the page range (updates placeholder)
T1: continue pg_summarize_range() to merge into the placeholder
But there are no convenient ways to do this, I think. I had to check the
various cases using breakpoints in gdb etc.
Yeah, I struggled with this during initial development but in the end
did nothing. I think we would need to introduce some new framework,
perhaps Korotkov stop-events stuff at
/messages/by-id/CAPpHfdsTeb+hBT5=qxghjNG_cHcJLDaNQ9sdy9vNwBF2E2PuZA@mail.gmail.com
which seemed to me a good fit -- we would add a stop point after the
placeholder tuple is inserted.
I'm not very happy with the union_tuples() changes - it's quite verbose,
perhaps a bit too verbose. We have to check for empty ranges first, and
then various combinations of allnulls/hasnulls flags for both BRIN
tuples. There are 9 combinations, and the current code just checks them
one by one - I was getting repeatedly confused by the original code, but
maybe it's too much.
I think it's okay. I tried to make it more compact (by saying "these
two combinations here are case 2, and these two other are case 4", and
keeping each of the other combinations a separate case; so there are
really 7 cases). But that doesn't make it any easier to follow, on the
contrary it was more convoluted. I think a dozen extra lines of source
is not a problem.
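For readers who want to check the nine-row table in the patch comment, the following sketch computes the resulting flags on A and which action union_tuples() would take. It is deliberately more compact than the patch and is not the posted code: the `SketchFlags` and `SketchAction` types and the `sketch_merge_flags` name are hypothetical, and it assumes empty ranges (both flags set) have already been handled by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the allnulls/hasnulls pair of one summary. */
typedef struct SketchFlags
{
    bool allnulls;
    bool hasnulls;
} SketchFlags;

typedef enum SketchAction
{
    SKETCH_NOTHING,         /* flags updated (if needed), no data to touch */
    SKETCH_COPY_FROM_B,     /* A had no data: copy B's values into A */
    SKETCH_CALL_UNION       /* both sides have data: call the union proc */
} SketchAction;

/* Merge B's flags into A; neither side may be an empty range. */
static SketchAction
sketch_merge_flags(SketchFlags *a, const SketchFlags *b)
{
    bool a_nulls = a->allnulls || a->hasnulls;
    bool b_nulls = b->allnulls || b->hasnulls;

    assert(!(a->allnulls && a->hasnulls));
    assert(!(b->allnulls && b->hasnulls));

    if (b->allnulls)
    {
        /* B contributes only NULLs; remember them unless A is all-NULL. */
        if (!a->allnulls)
            a->hasnulls = true;
        return SKETCH_NOTHING;
    }

    if (a->allnulls)
    {
        /* A is NULL-only and B has data: result has NULLs and data. */
        a->allnulls = false;
        a->hasnulls = true;
        return SKETCH_COPY_FROM_B;
    }

    /* Both sides have data; the result has NULLs if either side did. */
    a->hasnulls = a_nulls || b_nulls;
    return SKETCH_CALL_UNION;
}
```

Each of the nine combinations in the table maps onto one of the three branches, which makes the table easy to verify mechanically even if the one-branch-per-row layout in the patch remains easier to read.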
The alternative is to apply the same fix to every BRIN_PROCNUM_UNION
opclass procedure out there. I guess doing that for minmax+inclusion is
not a huge deal, but what about external opclasses? And without the fix
the indexes are effectively broken. Fixing this outside in brin.c (in
the union procedure) fixes this for every opclass procedure, without any
actual limitation of functionality (14+ does that anyway).
About the hypothetical question, you could as well ask what about
unicorns. I have never seen any hint that any external opclass exists.
I am all for maintaining compatibility, but I think this concern is
overblown for BRIN. Anyway, I think your proposed fix is better than
changing individual 'union' support procs, so it doesn't matter.
As far as I understood, you're now worried that there will be an
incompatibility because we will fail to call the 'union' procedure in
cases where we previously called it? In other words, you fear that some
hypothetical opclass was handling the NULL values in some way that's
incompatible with this? I haven't thought terribly hard about this, but
I can't see a way for this to cause incompatibilities.
But maybe someone thinks this is a bad idea and we should do something
else in the backbranches?
I think the new handling of NULLs in commit 72ccf55cb99c ("Move IS [NOT]
NULL handling from BRIN support functions") is better than what was
there before, so I don't object to backpatching it now that we know it's
necessary to fix a bug, and also we have field experience that the
approach is solid.
The attached patch is just a pointer to comments that I think need light
editing. There's also a typo "bot" (for "both") in a comment that I
think would go away if you accept my suggestion to store 'empty' at the
tuple level. Note that I worked with the REL_14_STABLE sources, because
for some reason I thought that that was the newest that needed
backpatching of 72ccf55cb99c, but now that I'm finishing this email I
realize that I should have used 13 instead /facepalm
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La persona que no quería pecar / estaba obligada a sentarse
en duras y empinadas sillas / desprovistas, por cierto
de blandos atenuantes" (Patricio Vogel)
Attachments:
minor-fixes.patch.txt (text/plain; charset=us-ascii)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 95ed4ef362..0dddc6fa9c 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -594,9 +594,9 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
bval = &dtup->bt_columns[attno - 1];
/*
- * If the range has both allnulls and hasnulls set, it means
- * there are no rows in the range, so we can skip it (we know
- * there's nothing to match).
+ * If the BRIN tuple indicates that this range is empty,
+ * we can skip it: there's nothing to match. We don't
+ * need to examine the next columns.
*/
if (BRIN_RANGE_IS_EMPTY(bval))
{
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index 7355e330f9..b3ba5ac365 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -607,8 +607,8 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
/*
* Make sure to overwrite the hasnulls flag, because it was initialized
- * to true by brin_memtuple_initialize and we don't want to skip it if
- * allnulls=true.
+ * to true by brin_memtuple_initialize and we don't want to skip [it] if
+ * allnulls=true. (XXX "it" what?)
*/
dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
On 3/3/23 11:32, Alvaro Herrera wrote:
Thanks for doing all this. (Do I understand correctly that this patch
is not in the commitfest?)
I think my mental model for this was that "allnulls" meant that either
there are no values for the column in question or that the values were
all nulls (For minmax without NULL handling, which is where this all
started, these two things are essentially the same: the range is not to
be returned. So this became a bug the instant I added handling for NULL
values.) I failed to realize that these were two different things, and
this is likely the origin of all these troubles.
What do you think of using the unused bit in BrinTuple->bt_info to
denote a range that contains no heap tuples? This also means we need it
in BrinMemTuple, I think we can do this:
@@ -44,6 +44,7 @@ typedef struct BrinValues
typedef struct BrinMemTuple
{
bool bt_placeholder; /* this is a placeholder tuple */
+ bool bt_empty_range; /* range has no tuples */
BlockNumber bt_blkno; /* heap blkno that the tuple is for */
MemoryContext bt_context; /* memcxt holding the bt_columns values */
/* output arrays for brin_deform_tuple: */
@@ -69,7 +70,7 @@ typedef struct BrinTuple
*
* 7th (high) bit: has nulls
* 6th bit: is placeholder tuple
- * 5th bit: unused
+ * 5th bit: range has no tuples
* 4-0 bit: offset of data
* ---------------
*/
@@ -82,7 +83,7 @@ typedef struct BrinTuple
* bt_info manipulation macros
*/
#define BRIN_OFFSET_MASK 0x1F
-/* bit 0x20 is not used at present */
+#define BRIN_EMPTY_RANGE 0x20
#define BRIN_PLACEHOLDER_MASK 0x40
#define BRIN_NULLS_MASK 0x80
(Note that bt_empty_range uses a hole in the struct, so there's no ABI
change.)
This is BRIN-tuple-level, not column-level, so conceptually it seems
more appropriate. (In the case where both are empty in union_tuples, we
can return without entering the per-attribute loop at all, though I
admit it's not a very interesting case.) This approach avoids having to
invent the strange combination of all+has to mean empty.
Oh, that's an interesting idea! I hadn't realized there's an unused bit
at the tuple level, and I absolutely agree it'd be a better match than
having this in individual summaries (like now).
It'd mean we'd not have the option to fix this within the opclasses,
because we only pass them the BrinValues and not the tuple. But if you
think that's reasonable, that'd be OK.
The other thing I was unsure about is whether the bit could be set for any
existing tuples, but AFAICS that shouldn't be possible - brin_form_tuple
does palloc0, so it should be 0.
I suspect doing this might make the patch quite a bit simpler, actually.
On 2023-Feb-24, Tomas Vondra wrote:
I wonder what's the best
way to test this in an automated way - it's very dependent on timing of
the concurrent updates. For example we need to do something like this:
T1: run pg_summarize_range() until it inserts the placeholder tuple
T2: do an insert into the page range (updates placeholder)
T1: continue pg_summarize_range() to merge into the placeholder
But there are no convenient ways to do this, I think. I had to check the
various cases using breakpoints in gdb etc.
Yeah, I struggled with this during initial development but in the end
did nothing. I think we would need to introduce some new framework,
perhaps Korotkov stop-events stuff at
/messages/by-id/CAPpHfdsTeb+hBT5=qxghjNG_cHcJLDaNQ9sdy9vNwBF2E2PuZA@mail.gmail.com
which seemed to me a good fit -- we would add a stop point after the
placeholder tuple is inserted.
Yeah, but we don't have that at the moment. I actually ended up adding a
couple of sleeps during development, which allowed me to hit just the right
order of operations - a poor man's version of those stop-events. It did
work, but it's hardly an acceptable approach.
I'm not very happy with the union_tuples() changes - it's quite verbose,
perhaps a bit too verbose. We have to check for empty ranges first, and
then various combinations of allnulls/hasnulls flags for both BRIN
tuples. There are 9 combinations, and the current code just checks them
one by one - I was getting repeatedly confused by the original code, but
maybe it's too much.
I think it's okay. I tried to make it more compact (by saying "these
two combinations here are case 2, and these two other are case 4", and
keeping each of the other combinations a separate case; so there are
really 7 cases). But that doesn't make it any easier to follow, on the
contrary it was more convoluted. I think a dozen extra lines of source
is not a problem.
OK
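For readers trying to picture the nine combinations: each side of the union is in one of three NULL states (no NULLs, some NULLs, only NULLs), giving 3 x 3 cases. The sketch below is an illustration only, not the code from the patch; DemoNullFlags and the demo_* functions are made-up names, and it assumes both ranges are non-empty and that the flags are normalized so that hasnulls means "some but not all rows are NULL".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-column NULL flags of a non-empty range summary. */
typedef struct DemoNullFlags
{
    bool allnulls;  /* every row in the range is NULL */
    bool hasnulls;  /* some, but not all, rows are NULL */
} DemoNullFlags;

/* True if the summary represents at least one NULL row. */
bool
demo_any_nulls(DemoNullFlags f)
{
    return f.allnulls || f.hasnulls;
}

/*
 * Combine the NULL flags of two non-empty summaries.
 *
 * *need_value_union is set to true only when both sides carry non-NULL
 * values, i.e. when an opclass union procedure would have real work to do.
 */
DemoNullFlags
demo_union_flags(DemoNullFlags a, DemoNullFlags b, bool *need_value_union)
{
    DemoNullFlags r;

    /* The result is all-NULL only if both inputs are all-NULL. */
    r.allnulls = a.allnulls && b.allnulls;

    /* Otherwise it has NULLs if either input saw any NULL row. */
    r.hasnulls = !r.allnulls && (demo_any_nulls(a) || demo_any_nulls(b));

    *need_value_union = !a.allnulls && !b.allnulls;
    return r;
}
```

Written this way the nine cases collapse into two boolean expressions, but that only works for the flags; the values still need the copy-or-merge distinction that the explicit case analysis spells out.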
The alternative is to apply the same fix to every BRIN_PROCNUM_UNION
opclass procedure out there. I guess doing that for minmax+inclusion is
not a huge deal, but what about external opclasses? And without the fix
the indexes are effectively broken. Fixing this outside in brin.c (in
the union procedure) fixes this for every opclass procedure, without any
actual limitation of functionality (14+ does that anyway).
About the hypothetical question, you could as well ask what about
unicorns. I have never seen any hint that any external opclass exists.
I am all for maintaining compatibility, but I think this concern is
overblown for BRIN. Anyway, I think your proposed fix is better than
changing individual 'union' support procs, so it doesn't matter.
OK
As far as I understood, you're now worried that there will be an
incompatibility because we will fail to call the 'union' procedure in
cases where we previously called it? In other words, you fear that some
hypothetical opclass was handling the NULL values in some way that's
incompatible with this? I haven't thought terribly hard about this, but
I can't see a way for this to cause incompatibilities.
Yeah, the possible incompatibility is one concern - I have a hard time
imagining such an opclass, because it'd have to handle NULLs in some
strange way. But as you noted, we're not aware of any external BRIN
opclasses, so maybe this is OK.
The other concern is more generic - as I mentioned, moving the NULL
handling from opclasses to brin.c is what we did in PG14, so this feels
a bit like a backport, and I dislike that a little bit.
But maybe someone thinks this is a bad idea and we should do something
else in the backbranches?
I think the new handling of NULLs in commit 72ccf55cb99c ("Move IS [NOT]
NULL handling from BRIN support functions") is better than what was
there before, so I don't object to backpatching it now that we know it's
necessary to fix a bug, and also we have field experience that the
approach is solid.
OK, good to hear.
The attached patch is just a pointer to comments that I think need light
editing. There's also a typo "bot" (for "both") in a comment that I
think would go away if you accept my suggestion to store 'empty' at the
tuple level. Note that I worked with the REL_14_STABLE sources, because
for some reason I thought that that was the newest that needed
backpatching of 72ccf55cb99c, but now that I'm finishing this email I
realize that I should have used 13 instead /facepalm
Thanks. I'll try to rework the patches to use the bt_info unused bit,
and report back in a week or two.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
It took me a while but I finally got back to reworking this to use the
bt_info bit, as proposed by Alvaro. And it turned out to work great,
because (a) it's a tuple-level flag, i.e. the right place, and (b) it
does not overload existing flags.
This greatly simplified the code in add_values_to_range and (especially)
union_tuples, making it much easier to understand, I think.
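As a rough model of what the tuple-level flag buys in add_values_to_range, the following standalone sketch reduces a summary to one int column. It is a simplification under stated assumptions, not the patched function: DemoSummary and all demo_* names are hypothetical, and demo_opclass_add_value stands in for a minmax-style opclass that clears allnulls without knowing whether the range was empty or all-NULL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-column summary. */
typedef struct DemoSummary
{
    bool empty;     /* tuple-level: range represents no rows yet */
    bool allnulls;
    bool hasnulls;
    int  min;       /* min/max are valid only when !allnulls */
    int  max;
} DemoSummary;

/* New summaries start empty, with allnulls=true as before. */
void
demo_init(DemoSummary *s)
{
    s->empty = true;
    s->allnulls = true;
    s->hasnulls = false;
    s->min = s->max = 0;
}

/* Stand-in for an opclass add_value; returns true if it changed anything. */
bool
demo_opclass_add_value(DemoSummary *s, int v)
{
    bool changed = false;

    if (s->allnulls)
    {
        /* First non-NULL value: the opclass just clears allnulls. */
        s->min = s->max = v;
        s->allnulls = false;
        return true;
    }
    if (v < s->min) { s->min = v; changed = true; }
    if (v > s->max) { s->max = v; changed = true; }
    return changed;
}

/* Add one row to the range; returns true if the summary was modified. */
bool
demo_add_to_range(DemoSummary *s, bool isnull, int v)
{
    /* An empty range is certainly going to be modified. */
    bool modified = s->empty;

    /* Remember whether real NULL rows exist, ignoring the initial state. */
    bool had_nulls = !s->empty && (s->hasnulls || s->allnulls);

    if (isnull)
    {
        /* Only a range that already holds values needs hasnulls set. */
        if (!s->allnulls && !s->hasnulls)
        {
            s->hasnulls = true;
            modified = true;
        }
    }
    else
    {
        modified |= demo_opclass_add_value(s, v);

        /* If the opclass cleared allnulls, don't forget the NULL rows. */
        if (had_nulls && !(s->hasnulls || s->allnulls))
        {
            assert(modified);
            s->hasnulls = true;
        }
    }

    s->empty = false;
    return modified;
}
```

Feeding it NULL and then 1 corresponds to the original reproducer; feeding it 1 into a fresh summary shows that an empty range does not pick up a spurious hasnulls.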
One disadvantage is we are unable to see which ranges are empty in
current pageinspect, but 0002 addresses that by adding "empty" column to
the brin_page_items() output. That's a matter for master only, though.
It's a trivial patch and it makes it easier/possible to test this, so we
should consider squeezing it into PG16.
I did quite a bit of testing - the attached 0003 adds extra tests, but I
don't propose to get this committed as is - it's rather overkill. Maybe
some reduced version of it ...
The hardest thing to test is the union_tuples() part, as it requires
concurrent operations with "correct" timing. Easy to simulate by
breakpoints in GDB, not so much in plain regression/TAP tests.
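The logic those concurrent scenarios exercise can at least be checked in isolation. Below is a hedged, standalone model of the empty-range short-circuit in a union: if "b" is empty keep "a", if "a" is empty copy "b", otherwise merge. DemoRange and demo_union are hypothetical names, the summary is reduced to a single int min/max column, and none of this touches real BRIN structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-column range summary. */
typedef struct DemoRange
{
    bool empty;     /* range represents no rows */
    bool allnulls;
    bool hasnulls;
    int  min;       /* min/max are valid only when !allnulls */
    int  max;
} DemoRange;

/* Adjust "a" so that it summarizes the rows of both "a" and "b". */
void
demo_union(DemoRange *a, const DemoRange *b)
{
    /* "b" represents no rows: keep "a" as it is, even if it is empty too. */
    if (b->empty)
        return;

    /* "a" represents no rows but "b" does: the result is a copy of "b". */
    if (a->empty)
    {
        *a = *b;            /* also copies empty=false */
        return;
    }

    /* Neither range is empty from here on. */
    if (b->allnulls)
    {
        /* "b" contributes only NULL rows. */
        if (!a->allnulls)
            a->hasnulls = true;
        return;
    }

    if (a->allnulls)
    {
        /* "a" had only NULL rows: take the values of "b", keep the NULLs. */
        a->min = b->min;
        a->max = b->max;
        a->allnulls = false;
        a->hasnulls = true;
        return;
    }

    /* Both sides carry values: merge flags and widen the interval. */
    a->hasnulls = a->hasnulls || b->hasnulls;
    if (b->min < a->min)
        a->min = b->min;
    if (b->max > a->max)
        a->max = b->max;
}
```

The first two branches are where the separate flag matters: an empty "a" still has allnulls=true, and without the empty check it would be indistinguishable from a range holding only NULL rows.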
There's also a stress test, doing a lot of randomized summarizations,
etc. Without the fix this failed in maybe 30% of runs, now I did ~100
runs without a single failure.
I haven't done any backporting, but I think it should be simpler than
with the earlier approach. I wonder if we need to care about starting to
use the previously unused bit - I don't think so, in the worst case
we'll just ignore it, but maybe I'm missing something (e.g. when using
physical replication).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-Fix-handling-of-NULLs-in-BRIN-indexes.patch (text/x-patch; charset=UTF-8)
From 10efaf9964806e5a30818994e3cfda879bb90171 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 8 Jan 2023 16:43:06 +0100
Subject: [PATCH 1/3] Fix handling of NULLs in BRIN indexes
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges. All summaries were initialized with
allnulls=true, and the opclasses simply reset allnulls to false when
processing the first non-NULL value. This however fails if the range
starts with a NULL value (or a sequence of NULL values), in which case
we forget the range contains NULL values.
This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.
Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting hasnulls=true in both cases would
make it correct, but it would also make BRIN indexes useless for queries
with IS NULL clauses - all ranges start empty (and thus allnulls=true),
so all ranges would end up with either allnulls=true or hasnulls=true.
The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true,
not when the summarization is processing values in bulk (e.g. during
CREATE INDEX or automatic summarization). In this case the flags were
updated in a slightly different way, not forgetting the NULL values.
This introduces a new flag marking index tuples representing
ranges with no rows. Luckily we have an unused bit in the BRIN tuple
header that we can use for this.
We still store index tuples for empty ranges, because otherwise we'd not
be able to say whether a range is empty or not yet summarized, and we'd
have to process them for any query.
Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.
Backpatch-through: 11
Reviewed-by: Alvaro Herrera, Justin Pryzby, Matthias van de Meent
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
---
src/backend/access/brin/brin.c | 115 +++++++++++++++++-
src/backend/access/brin/brin_tuple.c | 15 ++-
src/include/access/brin_tuple.h | 6 +-
...summarization-and-inprogress-insertion.out | 8 +-
...ummarization-and-inprogress-insertion.spec | 1 +
5 files changed, 137 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 53e4721a54e..162a0c052aa 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -592,6 +592,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
bval = &dtup->bt_columns[attno - 1];
+ /*
+ * If the BRIN tuple indicates that this range is empty,
+ * we can skip it: there's nothing to match. We don't
+ * need to examine the next columns.
+ */
+ if (dtup->bt_empty_range)
+ {
+ addrange = false;
+ break;
+ }
+
/*
* First check if there are any IS [NOT] NULL scan keys,
* and if we're violating them. In that case we can
@@ -1590,6 +1601,8 @@ form_and_insert_tuple(BrinBuildState *state)
/*
* Given two deformed tuples, adjust the first one so that it's consistent
* with the summary values in both.
+ *
+ * XXX I'm not sure we can actually get empty "b".
*/
static void
union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
@@ -1607,6 +1620,64 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
db = brin_deform_tuple(bdesc, b, NULL);
MemoryContextSwitchTo(oldcxt);
+ /*
+ * Check if the ranges are empty.
+ *
+ * If at least one of them is empty, we don't need to call per-key union
+ * functions at all. If "b" is empty, we just use "a" as the result (it
+ * might be empty too, but that's fine). If "a" is empty but "b" is not,
+ * we use "b" as the result (but we have to copy the data into "a" first).
+ *
+ * Only when both ranges are non-empty, we actually do the per-key merge.
+ */
+
+ /* If "b" is empty - ignore it and just use "a" (even if it's empty etc.). */
+ if (db->bt_empty_range)
+ {
+ /* skip the per-key merge */
+ MemoryContextDelete(cxt);
+ return;
+ }
+
+ /*
+ * Now we know "b" is not empty. If "a" is empty, then "b" is the result.
+ * But we need to copy the data from "b" to "a" first, because that's how
+ * we pass result out.
+ *
+ * We have to copy all the global/per-key flags etc. too.
+ */
+ if (a->bt_empty_range)
+ {
+ for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
+ {
+ int i;
+ BrinValues *col_a = &a->bt_columns[keyno];
+ BrinValues *col_b = &db->bt_columns[keyno];
+ BrinOpcInfo *opcinfo = bdesc->bd_info[keyno];
+
+ col_a->bv_allnulls = col_b->bv_allnulls;
+ col_a->bv_hasnulls = col_b->bv_hasnulls;
+
+ /* If "b" has no data, we're done. */
+ if (col_b->bv_allnulls)
+ continue;
+
+ for (i = 0; i < opcinfo->oi_nstored; i++)
+ col_a->bv_values[i] =
+ datumCopy(col_b->bv_values[i],
+ opcinfo->oi_typcache[i]->typbyval,
+ opcinfo->oi_typcache[i]->typlen);
+ }
+
+ /* "a" started empty, but "b" was not empty, so remember that */
+ a->bt_empty_range = false;
+
+ /* skip the per-key merge */
+ MemoryContextDelete(cxt);
+ return;
+ }
+
+ /* Now we know neither range is empty. */
for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
{
FmgrInfo *unionFn;
@@ -1704,7 +1775,9 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
Datum *values, bool *nulls)
{
int keyno;
- bool modified = false;
+
+ /* If the range starts empty, we're certainly going to modify it. */
+ bool modified = dtup->bt_empty_range;
/*
* Compare the key values of the new tuple to the stored index values; our
@@ -1718,9 +1791,24 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
Datum result;
BrinValues *bval;
FmgrInfo *addValue;
+ bool has_nulls;
bval = &dtup->bt_columns[keyno];
+ /*
+ * Does the range have actual NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ *
+ * We have to remember this, because we'll modify the flags and we
+ * need to know if the range started as empty.
+ */
+ has_nulls = ((!dtup->bt_empty_range) &&
+ (bval->bv_hasnulls || bval->bv_allnulls));
+
+ /*
+ * If the value we're adding is NULL, handle it locally. Otherwise
+ * call the BRIN_PROCNUM_ADDVALUE procedure.
+ */
if (bdesc->bd_info[keyno]->oi_regular_nulls && nulls[keyno])
{
/*
@@ -1746,8 +1834,33 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
nulls[keyno]);
/* if that returned true, we need to insert the updated tuple */
modified |= DatumGetBool(result);
+
+ /*
+ * If the range had actual NULL values (i.e. did not start empty),
+ * make sure we don't forget about the NULL values. Either the allnulls
+ * flag is still set to true, or (if the opclass cleared it) we need to
+ * set hasnulls=true.
+ *
+ * XXX This can only happen when the opclass modified the tuple, so the
+ * modified flag should be set.
+ */
+ if (has_nulls && !(bval->bv_hasnulls || bval->bv_allnulls))
+ {
+ Assert(modified);
+ bval->bv_hasnulls = true;
+ }
}
+ /*
+ * After updating summaries for all the keys, mark it as not empty.
+ *
+ * If we're actually changing the flag value (i.e. tuple started as empty),
+ * we should have modified the tuple. So we should not see empty range that
+ * was not modified.
+ */
+ Assert(!dtup->bt_empty_range || modified);
+ dtup->bt_empty_range = false;
+
return modified;
}
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index 84b79dbfc0d..b81247a262c 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -372,6 +372,9 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
if (tuple->bt_placeholder)
rettuple->bt_info |= BRIN_PLACEHOLDER_MASK;
+ if (tuple->bt_empty_range)
+ rettuple->bt_info |= BRIN_EMPTY_RANGE_MASK;
+
*size = len;
return rettuple;
}
@@ -399,7 +402,7 @@ brin_form_placeholder_tuple(BrinDesc *brdesc, BlockNumber blkno, Size *size)
rettuple = palloc0(len);
rettuple->bt_blkno = blkno;
rettuple->bt_info = hoff;
- rettuple->bt_info |= BRIN_NULLS_MASK | BRIN_PLACEHOLDER_MASK;
+ rettuple->bt_info |= BRIN_NULLS_MASK | BRIN_PLACEHOLDER_MASK | BRIN_EMPTY_RANGE_MASK;
bitP = ((bits8 *) ((char *) rettuple + SizeOfBrinTuple)) - 1;
bitmask = HIGHBIT;
@@ -489,6 +492,8 @@ brin_new_memtuple(BrinDesc *brdesc)
dtup->bt_allnulls = palloc(sizeof(bool) * brdesc->bd_tupdesc->natts);
dtup->bt_hasnulls = palloc(sizeof(bool) * brdesc->bd_tupdesc->natts);
+ dtup->bt_empty_range = true;
+
dtup->bt_context = AllocSetContextCreate(CurrentMemoryContext,
"brin dtuple",
ALLOCSET_DEFAULT_SIZES);
@@ -527,6 +532,9 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
currdatum += sizeof(Datum) * brdesc->bd_info[i]->oi_nstored;
}
+ /* FIXME Shouldn't this reset bt_placeholder too? */
+ dtuple->bt_empty_range = true;
+
return dtuple;
}
@@ -560,6 +568,11 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
if (BrinTupleIsPlaceholder(tuple))
dtup->bt_placeholder = true;
+
+ /* ranges start as empty, depends on the BrinTuple */
+ if (!BrinTupleIsEmptyRange(tuple))
+ dtup->bt_empty_range = false;
+
dtup->bt_blkno = tuple->bt_blkno;
values = dtup->bt_values;
diff --git a/src/include/access/brin_tuple.h b/src/include/access/brin_tuple.h
index 732f91edf11..c56747aca4a 100644
--- a/src/include/access/brin_tuple.h
+++ b/src/include/access/brin_tuple.h
@@ -44,6 +44,7 @@ typedef struct BrinValues
typedef struct BrinMemTuple
{
bool bt_placeholder; /* this is a placeholder tuple */
+ bool bt_empty_range; /* range represents no tuples */
BlockNumber bt_blkno; /* heap blkno that the tuple is for */
MemoryContext bt_context; /* memcxt holding the bt_columns values */
/* output arrays for brin_deform_tuple: */
@@ -69,7 +70,7 @@ typedef struct BrinTuple
*
* 7th (high) bit: has nulls
* 6th bit: is placeholder tuple
- * 5th bit: unused
+ * 5th bit: range is empty
* 4-0 bit: offset of data
* ---------------
*/
@@ -82,13 +83,14 @@ typedef struct BrinTuple
* bt_info manipulation macros
*/
#define BRIN_OFFSET_MASK 0x1F
-/* bit 0x20 is not used at present */
+#define BRIN_EMPTY_RANGE_MASK 0x20
#define BRIN_PLACEHOLDER_MASK 0x40
#define BRIN_NULLS_MASK 0x80
#define BrinTupleDataOffset(tup) ((Size) (((BrinTuple *) (tup))->bt_info & BRIN_OFFSET_MASK))
#define BrinTupleHasNulls(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_NULLS_MASK)) != 0)
#define BrinTupleIsPlaceholder(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_PLACEHOLDER_MASK)) != 0)
+#define BrinTupleIsEmptyRange(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_EMPTY_RANGE_MASK)) != 0)
extern BrinTuple *brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno,
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2a4755d0998..584ac2602f7 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -45,7 +45,7 @@ step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 19ac18a2e88..18ba92b7ba1 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -9,6 +9,7 @@ setup
) WITH (fillfactor=10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
-- this fills the first page
+ INSERT INTO brin_iso VALUES (NULL);
DO $$
DECLARE curtid tid;
BEGIN
--
2.39.2
0002-Show-empty-ranges-in-brin_page_items.patch (text/x-patch; charset=UTF-8)
From a7ed39285a7c8f3655091346755b81b0d79b2f3e Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Mon, 27 Mar 2023 22:47:12 +0200
Subject: [PATCH 2/3] Show empty ranges in brin_page_items
Show which BRIN ranges are empty (no rows), as indicated by the newly
introduced flag.
---
contrib/pageinspect/brinfuncs.c | 10 ++++---
contrib/pageinspect/expected/brin.out | 6 ++--
.../pageinspect/pageinspect--1.11--1.12.sql | 17 +++++++++++
...summarization-and-inprogress-insertion.out | 28 +++++++++----------
4 files changed, 40 insertions(+), 21 deletions(-)
diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
index 000dcd8f5d8..a781f265514 100644
--- a/contrib/pageinspect/brinfuncs.c
+++ b/contrib/pageinspect/brinfuncs.c
@@ -201,8 +201,8 @@ brin_page_items(PG_FUNCTION_ARGS)
dtup = NULL;
for (;;)
{
- Datum values[7];
- bool nulls[7] = {0};
+ Datum values[8];
+ bool nulls[8] = {0};
/*
* This loop is called once for every attribute of every tuple in the
@@ -239,6 +239,7 @@ brin_page_items(PG_FUNCTION_ARGS)
nulls[4] = true;
nulls[5] = true;
nulls[6] = true;
+ nulls[7] = true;
}
else
{
@@ -261,6 +262,7 @@ brin_page_items(PG_FUNCTION_ARGS)
values[3] = BoolGetDatum(dtup->bt_columns[att].bv_allnulls);
values[4] = BoolGetDatum(dtup->bt_columns[att].bv_hasnulls);
values[5] = BoolGetDatum(dtup->bt_placeholder);
+ values[6] = BoolGetDatum(dtup->bt_empty_range);
if (!dtup->bt_columns[att].bv_allnulls)
{
BrinValues *bvalues = &dtup->bt_columns[att];
@@ -286,12 +288,12 @@ brin_page_items(PG_FUNCTION_ARGS)
}
appendStringInfoChar(&s, '}');
- values[6] = CStringGetTextDatum(s.data);
+ values[7] = CStringGetTextDatum(s.data);
pfree(s.data);
}
else
{
- nulls[6] = true;
+ nulls[7] = true;
}
}
diff --git a/contrib/pageinspect/expected/brin.out b/contrib/pageinspect/expected/brin.out
index e12fbeb4774..098ddc202f4 100644
--- a/contrib/pageinspect/expected/brin.out
+++ b/contrib/pageinspect/expected/brin.out
@@ -43,9 +43,9 @@ SELECT * FROM brin_revmap_data(get_raw_page('test1_a_idx', 1)) LIMIT 5;
SELECT * FROM brin_page_items(get_raw_page('test1_a_idx', 2), 'test1_a_idx')
ORDER BY blknum, attnum LIMIT 5;
- itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
-------------+--------+--------+----------+----------+-------------+----------
- 1 | 0 | 1 | f | f | f | {1 .. 1}
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | f | f | f | {1 .. 1}
(1 row)
-- Mask DETAIL messages as these are not portable across architectures.
diff --git a/contrib/pageinspect/pageinspect--1.11--1.12.sql b/contrib/pageinspect/pageinspect--1.11--1.12.sql
index 70c3abccf57..a20d67a9e82 100644
--- a/contrib/pageinspect/pageinspect--1.11--1.12.sql
+++ b/contrib/pageinspect/pageinspect--1.11--1.12.sql
@@ -21,3 +21,20 @@ CREATE FUNCTION bt_multi_page_stats(IN relname text, IN blkno int8, IN blk_count
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'bt_multi_page_stats'
LANGUAGE C STRICT PARALLEL RESTRICTED;
+
+--
+-- add information about BRIN empty ranges
+--
+DROP FUNCTION brin_page_items(IN page bytea, IN index_oid regclass);
+CREATE FUNCTION brin_page_items(IN page bytea, IN index_oid regclass,
+ OUT itemoffset int,
+ OUT blknum int8,
+ OUT attnum int,
+ OUT allnulls bool,
+ OUT hasnulls bool,
+ OUT placeholder bool,
+ OUT empty bool,
+ OUT value text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'brin_page_items'
+LANGUAGE C STRICT PARALLEL RESTRICTED;
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 584ac2602f7..201786c82c0 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -2,9 +2,9 @@ Parsed test spec with 2 sessions
starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |t |f |{1 .. 1}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+--------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -24,18 +24,18 @@ brin_summarize_new_values
step s1c: COMMIT;
step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |t |f |{1 .. 1}
- 2| 1| 1|f |f |f |{1 .. 1000}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+-----------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
+ 2| 1| 1|f |f |f |f |{1 .. 1000}
(2 rows)
starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |t |f |{1 .. 1}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+--------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -43,9 +43,9 @@ step s1i: INSERT INTO brin_iso VALUES (1000);
step s2vacuum: VACUUM brin_iso;
step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |t |f |{1 .. 1}
- 2| 1| 1|f |f |f |{1 .. 1000}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+-----------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
+ 2| 1| 1|f |f |f |f |{1 .. 1000}
(2 rows)
--
2.39.2
0003-extra-tests.patch (text/x-patch; charset=UTF-8)
From 3906763f4cd896c5fad68248eb77fbd59299f162 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Mon, 27 Mar 2023 22:50:46 +0200
Subject: [PATCH 3/3] extra tests
---
contrib/pageinspect/Makefile | 2 +-
contrib/pageinspect/expected/brin-fails.out | 152 +++++++++++++++
contrib/pageinspect/expected/brin2.out | 201 ++++++++++++++++++++
contrib/pageinspect/sql/brin-fails.sql | 86 +++++++++
contrib/pageinspect/sql/brin2.sql | 117 ++++++++++++
5 files changed, 557 insertions(+), 1 deletion(-)
create mode 100644 contrib/pageinspect/expected/brin-fails.out
create mode 100644 contrib/pageinspect/expected/brin2.out
create mode 100644 contrib/pageinspect/sql/brin-fails.sql
create mode 100644 contrib/pageinspect/sql/brin2.sql
diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index 95e030b3969..6a5795fdfd9 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -22,7 +22,7 @@ DATA = pageinspect--1.11--1.12.sql pageinspect--1.10--1.11.sql \
pageinspect--1.0--1.1.sql
PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
-REGRESS = page btree brin gin gist hash checksum oldextversions
+REGRESS = page btree brin brin2 gin gist hash checksum brin-fails oldextversions
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/pageinspect/expected/brin-fails.out b/contrib/pageinspect/expected/brin-fails.out
new file mode 100644
index 00000000000..08479894ec3
--- /dev/null
+++ b/contrib/pageinspect/expected/brin-fails.out
@@ -0,0 +1,152 @@
+create table t (a int);
+-- works
+drop index if exists t_a_idx;
+NOTICE: index "t_a_idx" does not exist, skipping
+truncate t;
+insert into t values (null), (1);
+create index on t using brin (a);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 1}
+(1 row)
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (1), (null);
+create index on t using brin (a);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 1}
+(1 row)
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (null);
+create index on t using brin (a);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 1}
+(1 row)
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a);
+insert into t values (null);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 1}
+(1 row)
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null), (1);
+select brin_summarize_new_values('t_a_idx');
+ brin_summarize_new_values
+---------------------------
+ 1
+(1 row)
+
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | t | f | f | f |
+ 2 | 1 | 1 | f | t | f | f | {1 .. 1}
+(2 rows)
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+ brin_summarize_new_values
+---------------------------
+ 1
+(1 row)
+
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | t | f | f | f |
+ 2 | 1 | 1 | f | t | f | f | {1 .. 1}
+(2 rows)
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (1);
+select brin_summarize_new_values('t_a_idx');
+ brin_summarize_new_values
+---------------------------
+ 1
+(1 row)
+
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | t | f | f | f |
+ 2 | 1 | 1 | f | t | f | f | {1 .. 1}
+(2 rows)
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+ brin_summarize_new_values
+---------------------------
+ 1
+(1 row)
+
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | t | f | f | f |
+ 2 | 1 | 1 | f | t | f | f | {1 .. 1}
+(2 rows)
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+ brin_summarize_new_values
+---------------------------
+ 1
+(1 row)
+
+insert into t values (null);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | t | f | f | f |
+ 2 | 1 | 1 | f | t | f | f | {1 .. 1}
+(2 rows)
+
+drop table t;
diff --git a/contrib/pageinspect/expected/brin2.out b/contrib/pageinspect/expected/brin2.out
new file mode 100644
index 00000000000..8406a792cd3
--- /dev/null
+++ b/contrib/pageinspect/expected/brin2.out
@@ -0,0 +1,201 @@
+create table t (a int);
+--
+drop index if exists t_a_idx;
+NOTICE: index "t_a_idx" does not exist, skipping
+truncate t;
+create index on t using brin (a);
+-- empty range, all_nulls=true (default)
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+-------
+ 1 | 0 | 1 | t | f | f | t |
+(1 row)
+
+-- insert NULL value, range no longer empty, all_nulls=true
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+-------
+ 1 | 0 | 1 | t | f | f | f |
+(1 row)
+
+-- another NULL value, still all_nulls=true
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+-------
+ 1 | 0 | 1 | t | f | f | f |
+(1 row)
+
+-- not-NULL value, switches to has_nulls=true
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 1}
+(1 row)
+
+-- reinsert the not-NULL value, stays the same
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 1}
+(1 row)
+
+-- another not-NULL value, still has_nulls=true, range extends
+insert into t values (2);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 2}
+(1 row)
+
+-- another NULL value, still has_nulls=true
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 2}
+(1 row)
+
+--
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a);
+-- empty range, all_nulls=true (default)
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+-------
+ 1 | 0 | 1 | t | f | f | t |
+(1 row)
+
+-- insert non-NULL value, range no longer empty, all_nulls=false
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | f | f | f | {1 .. 1}
+(1 row)
+
+-- re-insert non-NULL value, stays the same
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | f | f | f | {1 .. 1}
+(1 row)
+
+-- insert NULL value, has_nulls=true
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 1}
+(1 row)
+
+-- another not-NULL value, still has_nulls=true, range expands
+insert into t values (2);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 2}
+(1 row)
+
+-- another NULL value, still has_nulls=true
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 2}
+(1 row)
+
+--
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (1); -- start with one non-NULL value
+create index on t using brin (a);
+-- non-empty range, all_nulls/has_nulls=false
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | f | f | f | {1 .. 1}
+(1 row)
+
+-- re-insert the non-NULL value, stays the same
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | f | f | f | {1 .. 1}
+(1 row)
+
+-- insert another non-NULL value, null flags stay the same
+insert into t values (2);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | f | f | f | {1 .. 2}
+(1 row)
+
+-- insert NULL value, has_nulls=true
+insert into t values (null);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 2}
+(1 row)
+
+--
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (NULL); -- start with one non-NULL value
+create index on t using brin (a);
+-- non-empty range, all_nulls=true
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+-------
+ 1 | 0 | 1 | t | f | f | f |
+(1 row)
+
+-- re-insert NULL, stays the same
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+-------
+ 1 | 0 | 1 | t | f | f | f |
+(1 row)
+
+-- insert a non-NULL value, switches to has_nulls=true
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 1}
+(1 row)
+
+-- re-insert the non-NULL value, stays the same
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 1}
+(1 row)
+
+-- insert another the non-NULL value, stays the same, range updated
+insert into t values (2);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 2}
+(1 row)
+
+-- insert NULL value, stays the same
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | t | f | f | {1 .. 2}
+(1 row)
+
+drop table t;
diff --git a/contrib/pageinspect/sql/brin-fails.sql b/contrib/pageinspect/sql/brin-fails.sql
new file mode 100644
index 00000000000..e5b37fa6b12
--- /dev/null
+++ b/contrib/pageinspect/sql/brin-fails.sql
@@ -0,0 +1,86 @@
+create table t (a int);
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (null), (1);
+create index on t using brin (a);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (1), (null);
+create index on t using brin (a);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (null);
+create index on t using brin (a);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a);
+insert into t values (null);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null), (1);
+select brin_summarize_new_values('t_a_idx');
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- works
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (1);
+select brin_summarize_new_values('t_a_idx');
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- fails
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a) with (pages_per_range=1);
+insert into t select null from generate_series(1,291); -- fill first page
+insert into t values (null);
+insert into t values (null);
+select brin_summarize_new_values('t_a_idx');
+insert into t values (null);
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+drop table t;
diff --git a/contrib/pageinspect/sql/brin2.sql b/contrib/pageinspect/sql/brin2.sql
new file mode 100644
index 00000000000..501252e7b8c
--- /dev/null
+++ b/contrib/pageinspect/sql/brin2.sql
@@ -0,0 +1,117 @@
+create table t (a int);
+
+--
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a);
+
+-- empty range, all_nulls=true (default)
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- insert NULL value, range no longer empty, all_nulls=true
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- another NULL value, still all_nulls=true
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- not-NULL value, switches to has_nulls=true
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- reinsert the not-NULL value, stays the same
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- another not-NULL value, still has_nulls=true, range extends
+insert into t values (2);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- another NULL value, still has_nulls=true
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+
+--
+drop index if exists t_a_idx;
+truncate t;
+create index on t using brin (a);
+
+-- empty range, all_nulls=true (default)
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- insert non-NULL value, range no longer empty, all_nulls=false
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- re-insert non-NULL value, stays the same
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- insert NULL value, has_nulls=true
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- another not-NULL value, still has_nulls=true, range expands
+insert into t values (2);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- another NULL value, still has_nulls=true
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+
+--
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (1); -- start with one non-NULL value
+create index on t using brin (a);
+
+-- non-empty range, all_nulls/has_nulls=false
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- re-insert the non-NULL value, stays the same
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- insert another non-NULL value, null flags stay the same
+insert into t values (2);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- insert NULL value, has_nulls=true
+insert into t values (null);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+
+--
+drop index if exists t_a_idx;
+truncate t;
+insert into t values (NULL); -- start with one non-NULL value
+create index on t using brin (a);
+
+-- non-empty range, all_nulls=true
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- re-insert NULL, stays the same
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- insert a non-NULL value, switches to has_nulls=true
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- re-insert the non-NULL value, stays the same
+insert into t values (1);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- insert another the non-NULL value, stays the same, range updated
+insert into t values (2);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+-- insert NULL value, stays the same
+insert into t values (NULL);
+select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx'::regclass);
+
+
+drop table t;
--
2.39.2
Hi,
here's an updated version of the patch, including a backport version. I
ended up making the code a bit closer to master by introducing
add_values_to_range(). The current pre-14 code has the loop adding data
to the BRIN tuple in two places, but with the "fixed" logic handling
NULLs and the empty_range flag the amount of duplicated code got too
high, so this seems reasonable.
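To make the NULL handling easier to review, here is a minimal standalone sketch of the flag logic that add_values_to_range() centralizes. It is only a model, not the patch code: RangeModel, range_init(), opclass_add_value() and range_add() are hypothetical names, the opclass is reduced to an int minmax that clears allnulls and never touches hasnulls, and the empty flag is kept per column instead of per index tuple.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a single-column BRIN range summary. */
typedef struct
{
	bool	empty_range;	/* no rows summarized yet */
	bool	allnulls;		/* only NULLs seen (also the initial state) */
	bool	hasnulls;		/* NULLs seen next to non-NULL values */
	int		minval;
	int		maxval;
} RangeModel;

/* New summaries start empty, with allnulls=true as in the existing code. */
static void
range_init(RangeModel *r)
{
	r->empty_range = true;
	r->allnulls = true;
	r->hasnulls = false;
	r->minval = 0;
	r->maxval = 0;
}

/*
 * Stand-in for a minmax opclass add_value procedure: it clears allnulls on
 * the first non-NULL value and never touches hasnulls.
 */
static bool
opclass_add_value(RangeModel *r, int value)
{
	bool	updated = false;

	if (r->allnulls)
	{
		r->minval = r->maxval = value;
		r->allnulls = false;
		return true;
	}
	if (value < r->minval)
	{
		r->minval = value;
		updated = true;
	}
	if (value > r->maxval)
	{
		r->maxval = value;
		updated = true;
	}
	return updated;
}

/* Model of the per-key logic; returns true if the summary changed. */
static bool
range_add(RangeModel *r, bool isnull, int value)
{
	/* If the range starts empty, we are certainly going to modify it. */
	bool	modified = r->empty_range;

	/* Did the range contain actual NULLs before this call? */
	bool	had_nulls = !r->empty_range && (r->hasnulls || r->allnulls);

	if (isnull)
	{
		/* NULLs are handled locally, the opclass is not called. */
		if (!r->allnulls)
		{
			modified |= !r->hasnulls;
			r->hasnulls = true;
		}
	}
	else
		modified |= opclass_add_value(r, value);

	/* Don't forget earlier NULLs if the opclass cleared allnulls. */
	if (had_nulls && !(r->hasnulls || r->allnulls))
	{
		assert(modified);
		r->hasnulls = true;
	}

	r->empty_range = false;
	return modified;
}
```

In this model, adding a NULL and then the value 1 to a fresh range ends with allnulls=false and hasnulls=true, which is the state the brin2 expected output shows for the same sequence of inserts.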
Both versions come with a patch extending pageinspect to show the new
flag, but obviously we should commit that only to master. In the back
branches it's meant only to make testing easier.
I plan to do a bit more testing, and I'd welcome some feedback - it's a
long-standing bug, and it'd be good to finally get this fixed. I don't
think the patch can be made any simpler.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-Fix-handling-of-NULLs-in-BRIN-indexes-20230423-11-13.patch (text/x-patch; charset=UTF-8)
From 42603e32bd0ad456a53a96ac2d05ce714f97e1ba Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 23 Apr 2023 19:26:18 +0200
Subject: [PATCH 1/2] Fix handling of NULLs in BRIN indexes
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges. All summaries were initialized with
allnulls=true, and the opclasses simply reset allnulls to false when
processing the first non-NULL value. This however fails if the range
starts with a NULL value (or a sequence of NULL values), in which case
we forget the range contains NULL values.
This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.
Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting hasnulls=true in both cases would
make it correct, but it would also make BRIN indexes useless for queries
with IS NULL clauses - all ranges start empty (and thus allnulls=true),
so all ranges would end up with either allnulls=true or hasnulls=true.
The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true,
not when the summarization is processing values in bulk (e.g. during
CREATE INDEX or automatic summarization). In this case the flags were
updated in a slightly different way, not forgetting the NULL values.
The best solution would be to introduce a new flag marking index tuples
representing ranges with no rows, but that would break on-disk format
and/or ABI, depending on where we put the flag. Considering we need to
backpatch this, that's not acceptable.
So instead we use an "impossible" combination of both flags (allnulls
and hasnulls) set to true, to mark "empty" ranges with no rows. In
principle "empty" is a feature of the whole index tuple, which may
contain multiple summaries in a multi-column index, but this is where
the flags are, unfortunately.
We could also skip storing index tuples for empty summaries, but then
we'd have to always process such ranges - even if there are no rows in
large parts of the table (e.g. after a bulk DELETE), it would still
require reading the pages etc. So we store them, but ignore them when
building the bitmap.
Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.
Backpatch-through: 11
Reviewed-by: Justin Pryzby, Matthias van de Meent
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
---
src/backend/access/brin/brin.c | 226 ++++++++++++++----
src/backend/access/brin/brin_tuple.c | 15 +-
src/include/access/brin_tuple.h | 6 +-
...summarization-and-inprogress-insertion.out | 8 +-
...ummarization-and-inprogress-insertion.spec | 1 +
5 files changed, 201 insertions(+), 55 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 0becfde1133..f99a9ba5cf9 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -35,6 +35,7 @@
#include "storage/freespace.h"
#include "utils/acl.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/index_selfuncs.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -77,7 +78,8 @@ static void form_and_insert_tuple(BrinBuildState *state);
static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
BrinTuple *b);
static void brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy);
-
+static bool add_values_to_range(Relation idxRel, BrinDesc *bdesc,
+ BrinMemTuple *dtup, Datum *values, bool *nulls);
/*
* BRIN handler function: return IndexAmRoutine with access method parameters
@@ -173,11 +175,10 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
for (;;)
{
- bool need_insert = false;
+ bool need_insert;
OffsetNumber off;
BrinTuple *brtup;
BrinMemTuple *dtup;
- int keyno;
CHECK_FOR_INTERRUPTS();
@@ -241,31 +242,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
dtup = brin_deform_tuple(bdesc, brtup, NULL);
- /*
- * Compare the key values of the new tuple to the stored index values;
- * our deformed tuple will get updated if the new tuple doesn't fit
- * the original range (note this means we can't break out of the loop
- * early). Make a note of whether this happens, so that we know to
- * insert the modified tuple later.
- */
- for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
- {
- Datum result;
- BrinValues *bval;
- FmgrInfo *addValue;
-
- bval = &dtup->bt_columns[keyno];
- addValue = index_getprocinfo(idxRel, keyno + 1,
- BRIN_PROCNUM_ADDVALUE);
- result = FunctionCall4Coll(addValue,
- idxRel->rd_indcollation[keyno],
- PointerGetDatum(bdesc),
- PointerGetDatum(bval),
- values[keyno],
- nulls[keyno]);
- /* if that returned true, we need to insert the updated tuple */
- need_insert |= DatumGetBool(result);
- }
+ need_insert = add_values_to_range(idxRel, bdesc, dtup, values, nulls);
if (!need_insert)
{
@@ -508,6 +485,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
CurrentMemoryContext);
}
+ /*
+ * If the BRIN tuple indicates that this range is empty,
+ * we can skip it: there's nothing to match. We don't
+ * need to examine the next columns.
+ */
+ if (dtup->bt_empty_range)
+ {
+ addrange = false;
+ break;
+ }
+
/*
* Check whether the scan key is consistent with the page
* range values; if so, have the pages in the range added
@@ -611,7 +599,6 @@ brinbuildCallback(Relation index,
{
BrinBuildState *state = (BrinBuildState *) brstate;
BlockNumber thisblock;
- int i;
thisblock = ItemPointerGetBlockNumber(tid);
@@ -640,25 +627,8 @@ brinbuildCallback(Relation index,
}
/* Accumulate the current tuple into the running state */
- for (i = 0; i < state->bs_bdesc->bd_tupdesc->natts; i++)
- {
- FmgrInfo *addValue;
- BrinValues *col;
- Form_pg_attribute attr = TupleDescAttr(state->bs_bdesc->bd_tupdesc, i);
-
- col = &state->bs_dtuple->bt_columns[i];
- addValue = index_getprocinfo(index, i + 1,
- BRIN_PROCNUM_ADDVALUE);
-
- /*
- * Update dtuple state, if and as necessary.
- */
- FunctionCall4Coll(addValue,
- attr->attcollation,
- PointerGetDatum(state->bs_bdesc),
- PointerGetDatum(col),
- values[i], isnull[i]);
- }
+ (void) add_values_to_range(index, state->bs_bdesc, state->bs_dtuple,
+ values, isnull);
}
/*
@@ -1448,6 +1418,8 @@ form_and_insert_tuple(BrinBuildState *state)
/*
* Given two deformed tuples, adjust the first one so that it's consistent
* with the summary values in both.
+ *
+ * XXX I'm not sure we can actually get empty "b".
*/
static void
union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
@@ -1465,6 +1437,64 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
db = brin_deform_tuple(bdesc, b, NULL);
MemoryContextSwitchTo(oldcxt);
+ /*
+ * Check if the ranges are empty.
+ *
+ * If at least one of them is empty, we don't need to call per-key union
+ * functions at all. If "b" is empty, we just use "a" as the result (it
+ * might be empty too, but that's fine). If "a" is empty but "b" is not,
+ * we use "b" as the result (but we have to copy the data into "a" first).
+ *
+ * Only when both ranges are non-empty, we actually do the per-key merge.
+ */
+
+ /* If "b" is empty - ignore it and just use "a" (even if it's empty etc.). */
+ if (db->bt_empty_range)
+ {
+ /* skip the per-key merge */
+ MemoryContextDelete(cxt);
+ return;
+ }
+
+ /*
+ * Now we know "b" is not empty. If "a" is empty, then "b" is the result.
+ * But we need to copy the data from "b" to "a" first, because that's how
+ * we pass result out.
+ *
+ * We have to copy all the global/per-key flags etc. too.
+ */
+ if (a->bt_empty_range)
+ {
+ for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
+ {
+ int i;
+ BrinValues *col_a = &a->bt_columns[keyno];
+ BrinValues *col_b = &db->bt_columns[keyno];
+ BrinOpcInfo *opcinfo = bdesc->bd_info[keyno];
+
+ col_a->bv_allnulls = col_b->bv_allnulls;
+ col_a->bv_hasnulls = col_b->bv_hasnulls;
+
+ /* If "b" has no data, we're done. */
+ if (col_b->bv_allnulls)
+ continue;
+
+ for (i = 0; i < opcinfo->oi_nstored; i++)
+ col_a->bv_values[i] =
+ datumCopy(col_b->bv_values[i],
+ opcinfo->oi_typcache[i]->typbyval,
+ opcinfo->oi_typcache[i]->typlen);
+ }
+
+ /* "a" started empty, but "b" was not empty, so remember that */
+ a->bt_empty_range = false;
+
+ /* skip the per-key merge */
+ MemoryContextDelete(cxt);
+ return;
+ }
+
+ /* Now we know neither range is empty. */
for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
{
FmgrInfo *unionFn;
@@ -1523,3 +1553,103 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
*/
FreeSpaceMapVacuum(idxrel);
}
+
+static bool
+add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
+ Datum *values, bool *nulls)
+{
+ int keyno;
+
+ /* If the range starts empty, we're certainly going to modify it. */
+ bool modified = dtup->bt_empty_range;
+
+ /*
+ * Compare the key values of the new tuple to the stored index values;
+ * our deformed tuple will get updated if the new tuple doesn't fit
+ * the original range (note this means we can't break out of the loop
+ * early). Make a note of whether this happens, so that we know to
+ * insert the modified tuple later.
+ */
+ for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
+ {
+ Datum result;
+ BrinValues *bval;
+ FmgrInfo *addValue;
+ bool has_nulls;
+
+ bval = &dtup->bt_columns[keyno];
+
+ /*
+ * Does the range have actual NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ *
+ * We have to remember this, because we'll modify the flags and we
+ * need to know if the range started as empty.
+ */
+ has_nulls = ((!dtup->bt_empty_range) &&
+ (bval->bv_hasnulls || bval->bv_allnulls));
+
+ /*
+ * If the value we're adding is NULL, handle it locally. Otherwise
+ * call the BRIN_PROCNUM_ADDVALUE procedure.
+ */
+ if (nulls[keyno])
+ {
+ /*
+ * We can't check "bv_hasnulls" because then we might end up with
+ * both flags set to true, which is interpreted as empty range.
+ * But that'd be wrong, because we've just added a value.
+ *
+ * So either the range has allnulls=true, or we have to set the
+ * hasnulls flag. Check if we're changing the value to determine
+ * if the index tuple was modified.
+ */
+ if (!bval->bv_allnulls)
+ {
+ /* Are we changing the tuple? */
+ modified |= (!bval->bv_hasnulls);
+ bval->bv_hasnulls = true;
+ }
+ }
+ else
+ {
+ addValue = index_getprocinfo(idxRel, keyno + 1,
+ BRIN_PROCNUM_ADDVALUE);
+ result = FunctionCall4Coll(addValue,
+ idxRel->rd_indcollation[keyno],
+ PointerGetDatum(bdesc),
+ PointerGetDatum(bval),
+ values[keyno],
+ nulls[keyno]);
+ /* if that returned true, we need to insert the updated tuple */
+ modified |= DatumGetBool(result);
+ }
+
+ /*
+ * If the range had actual NULL values (i.e. did not start empty),
+ * make sure we don't forget about the NULL values. Either the allnulls
+ * flag is still set to true, or (if the opclass cleared it) we need to
+ * set hasnulls=true.
+ *
+ * XXX This can only happen when the opclass modified the tuple, so the
+ * modified flag should be set.
+ */
+ if (has_nulls && !(bval->bv_hasnulls || bval->bv_allnulls))
+ {
+ Assert(modified);
+ bval->bv_hasnulls = true;
+ }
+ }
+
+ /*
+ * After updating summaries for all the keys, mark it as not empty.
+ *
+ * If we're actually changing the flag value (i.e. tuple started as
+ * empty), we should have modified the tuple. So we should not see
+ * empty range that was not modified.
+ */
+ Assert(!dtup->bt_empty_range || modified);
+ dtup->bt_empty_range = false;
+
+ return modified;
+}
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index b3b453aed12..f9877980f4d 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -349,6 +349,9 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
if (tuple->bt_placeholder)
rettuple->bt_info |= BRIN_PLACEHOLDER_MASK;
+ if (tuple->bt_empty_range)
+ rettuple->bt_info |= BRIN_EMPTY_RANGE_MASK;
+
*size = len;
return rettuple;
}
@@ -376,7 +379,7 @@ brin_form_placeholder_tuple(BrinDesc *brdesc, BlockNumber blkno, Size *size)
rettuple = palloc0(len);
rettuple->bt_blkno = blkno;
rettuple->bt_info = hoff;
- rettuple->bt_info |= BRIN_NULLS_MASK | BRIN_PLACEHOLDER_MASK;
+ rettuple->bt_info |= BRIN_NULLS_MASK | BRIN_PLACEHOLDER_MASK | BRIN_EMPTY_RANGE_MASK;
bitP = ((bits8 *) ((char *) rettuple + SizeOfBrinTuple)) - 1;
bitmask = HIGHBIT;
@@ -466,6 +469,8 @@ brin_new_memtuple(BrinDesc *brdesc)
dtup->bt_allnulls = palloc(sizeof(bool) * brdesc->bd_tupdesc->natts);
dtup->bt_hasnulls = palloc(sizeof(bool) * brdesc->bd_tupdesc->natts);
+ dtup->bt_empty_range = true;
+
dtup->bt_context = AllocSetContextCreate(CurrentMemoryContext,
"brin dtuple",
ALLOCSET_DEFAULT_SIZES);
@@ -499,6 +504,9 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
currdatum += sizeof(Datum) * brdesc->bd_info[i]->oi_nstored;
}
+ /* FIXME Shouldn't this reset bt_placeholder too? */
+ dtuple->bt_empty_range = true;
+
return dtuple;
}
@@ -532,6 +540,11 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
if (BrinTupleIsPlaceholder(tuple))
dtup->bt_placeholder = true;
+
+ /* ranges start as empty, depends on the BrinTuple */
+ if (!BrinTupleIsEmptyRange(tuple))
+ dtup->bt_empty_range = false;
+
dtup->bt_blkno = tuple->bt_blkno;
values = dtup->bt_values;
diff --git a/src/include/access/brin_tuple.h b/src/include/access/brin_tuple.h
index a9ccc3995b4..5280872707a 100644
--- a/src/include/access/brin_tuple.h
+++ b/src/include/access/brin_tuple.h
@@ -36,6 +36,7 @@ typedef struct BrinValues
typedef struct BrinMemTuple
{
bool bt_placeholder; /* this is a placeholder tuple */
+ bool bt_empty_range; /* range represents no tuples */
BlockNumber bt_blkno; /* heap blkno that the tuple is for */
MemoryContext bt_context; /* memcxt holding the bt_columns values */
/* output arrays for brin_deform_tuple: */
@@ -61,7 +62,7 @@ typedef struct BrinTuple
*
* 7th (high) bit: has nulls
* 6th bit: is placeholder tuple
- * 5th bit: unused
+ * 5th bit: range is empty
* 4-0 bit: offset of data
* ---------------
*/
@@ -74,13 +75,14 @@ typedef struct BrinTuple
* bt_info manipulation macros
*/
#define BRIN_OFFSET_MASK 0x1F
-/* bit 0x20 is not used at present */
+#define BRIN_EMPTY_RANGE_MASK 0x20
#define BRIN_PLACEHOLDER_MASK 0x40
#define BRIN_NULLS_MASK 0x80
#define BrinTupleDataOffset(tup) ((Size) (((BrinTuple *) (tup))->bt_info & BRIN_OFFSET_MASK))
#define BrinTupleHasNulls(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_NULLS_MASK)) != 0)
#define BrinTupleIsPlaceholder(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_PLACEHOLDER_MASK)) != 0)
+#define BrinTupleIsEmptyRange(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_EMPTY_RANGE_MASK)) != 0)
extern BrinTuple *brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno,
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2a4755d0998..584ac2602f7 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -45,7 +45,7 @@ step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 19ac18a2e88..18ba92b7ba1 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -9,6 +9,7 @@ setup
) WITH (fillfactor=10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
-- this fills the first page
+ INSERT INTO brin_iso VALUES (NULL);
DO $$
DECLARE curtid tid;
BEGIN
--
2.40.0
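For readers following the brin_tuple.h change, the bt_info bit layout can be modelled in a few lines of standalone C. This is only a sketch: the mask values are copied from the patch, but the `info_*` helpers, `placeholder_info()`, and the use of a bare `uint8_t` instead of a BrinTuple are assumptions made for illustration, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask values as they appear in the patch to brin_tuple.h. */
#define BRIN_OFFSET_MASK       0x1F
#define BRIN_EMPTY_RANGE_MASK  0x20
#define BRIN_PLACEHOLDER_MASK  0x40
#define BRIN_NULLS_MASK        0x80

/* Hypothetical helpers working on a bare info byte, not on a BrinTuple. */
static unsigned
info_offset(uint8_t info)
{
    return info & BRIN_OFFSET_MASK;
}

static bool
info_has_nulls(uint8_t info)
{
    return (info & BRIN_NULLS_MASK) != 0;
}

static bool
info_is_placeholder(uint8_t info)
{
    return (info & BRIN_PLACEHOLDER_MASK) != 0;
}

static bool
info_is_empty_range(uint8_t info)
{
    return (info & BRIN_EMPTY_RANGE_MASK) != 0;
}

/* Mirrors the flag combination set by the patched brin_form_placeholder_tuple. */
static uint8_t
placeholder_info(uint8_t hoff)
{
    /* the data offset has to fit into the low five bits */
    assert((hoff & ~BRIN_OFFSET_MASK) == 0);
    return (uint8_t) (hoff | BRIN_NULLS_MASK | BRIN_PLACEHOLDER_MASK |
                      BRIN_EMPTY_RANGE_MASK);
}
```

The point of the sketch is that the new flag takes the one bit that was still free, so the data offset and the two existing flags keep their positions.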
0002-pageinspect-tweak-20230423-11-13.patch (text/x-patch; charset=UTF-8)
From 157923fb3d55095d8d45fb1e7fe56a992d667b32 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 23 Apr 2023 22:19:05 +0200
Subject: [PATCH 2/2] pageinspect tweak
---
contrib/pageinspect/brinfuncs.c | 10 ++++++----
contrib/pageinspect/pageinspect--1.7--1.8.sql | 20 +++++++++++++++++++
2 files changed, 26 insertions(+), 4 deletions(-)
diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
index 04a90c4782d..8db6f1bcc7d 100644
--- a/contrib/pageinspect/brinfuncs.c
+++ b/contrib/pageinspect/brinfuncs.c
@@ -227,8 +227,8 @@ brin_page_items(PG_FUNCTION_ARGS)
dtup = NULL;
for (;;)
{
- Datum values[7];
- bool nulls[7];
+ Datum values[8];
+ bool nulls[8];
/*
* This loop is called once for every attribute of every tuple in the
@@ -267,6 +267,7 @@ brin_page_items(PG_FUNCTION_ARGS)
nulls[4] = true;
nulls[5] = true;
nulls[6] = true;
+ nulls[7] = true;
}
else
{
@@ -278,6 +279,7 @@ brin_page_items(PG_FUNCTION_ARGS)
values[3] = BoolGetDatum(dtup->bt_columns[att].bv_allnulls);
values[4] = BoolGetDatum(dtup->bt_columns[att].bv_hasnulls);
values[5] = BoolGetDatum(dtup->bt_placeholder);
+ values[6] = BoolGetDatum(dtup->bt_empty_range);
if (!dtup->bt_columns[att].bv_allnulls)
{
BrinValues *bvalues = &dtup->bt_columns[att];
@@ -303,12 +305,12 @@ brin_page_items(PG_FUNCTION_ARGS)
}
appendStringInfoChar(&s, '}');
- values[6] = CStringGetTextDatum(s.data);
+ values[7] = CStringGetTextDatum(s.data);
pfree(s.data);
}
else
{
- nulls[6] = true;
+ nulls[7] = true;
}
}
diff --git a/contrib/pageinspect/pageinspect--1.7--1.8.sql b/contrib/pageinspect/pageinspect--1.7--1.8.sql
index 45031a026a6..edfb580a4ec 100644
--- a/contrib/pageinspect/pageinspect--1.7--1.8.sql
+++ b/contrib/pageinspect/pageinspect--1.7--1.8.sql
@@ -67,3 +67,23 @@ CREATE FUNCTION bt_page_items(IN page bytea,
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'bt_page_items_bytea'
LANGUAGE C STRICT PARALLEL SAFE;
+
+--
+-- add information about BRIN empty ranges
+--
+DROP FUNCTION brin_page_items(IN page bytea, IN index_oid regclass);
+--
+-- brin_page_items()
+--
+CREATE FUNCTION brin_page_items(IN page bytea, IN index_oid regclass,
+ OUT itemoffset int,
+ OUT blknum int,
+ OUT attnum int,
+ OUT allnulls bool,
+ OUT hasnulls bool,
+ OUT placeholder bool,
+ OUT empty bool,
+ OUT value text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'brin_page_items'
+LANGUAGE C STRICT PARALLEL SAFE;
--
2.40.0
0001-Fix-handling-of-NULLs-in-BRIN-indexe-20230423-master.patch (text/x-patch; charset=UTF-8)
From 75ca742082cd50cef4609ecf0021e7bf69f34de6 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 8 Jan 2023 16:43:06 +0100
Subject: [PATCH 1/2] Fix handling of NULLs in BRIN indexes
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges. All summaries were initialized with
allnulls=true, and the opclasses simply reset allnulls to false when
processing the first non-NULL value. This however fails if the range
starts with a NULL value (or a sequence of NULL values), in which case
we forget the range contains NULL values.
This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.
Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting hasnulls=true in both cases would
make it correct, but it would also make BRIN indexes useless for queries
with IS NULL clauses - all ranges start empty (and thus allnulls=true),
so all ranges would end up with either allnulls=true or hasnulls=true.
The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true,
not when the summarization is processing values in bulk (e.g. during
CREATE INDEX or automatic summarization). In this case the flags were
updated in a slightly different way, not forgetting the NULL values.
This introduces a new flag marking index tuples representing
ranges with no rows. Luckily we have an unused bit in the BRIN tuple
header that we can use for this.
We still store index tuples for empty ranges, because otherwise we'd not
be able to say whether a range is empty or not yet summarized, and we'd
have to process them for any query.
Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.
Backpatch-through: 11
Reviewed-by: Alvaro Herrera, Justin Pryzby, Matthias van de Meent
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
---
src/backend/access/brin/brin.c | 115 +++++++++++++++++-
src/backend/access/brin/brin_tuple.c | 15 ++-
src/include/access/brin_tuple.h | 6 +-
...summarization-and-inprogress-insertion.out | 8 +-
...ummarization-and-inprogress-insertion.spec | 1 +
5 files changed, 137 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 41bf950a4af..2e20f318e95 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -592,6 +592,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
bval = &dtup->bt_columns[attno - 1];
+ /*
+ * If the BRIN tuple indicates that this range is empty,
+ * we can skip it: there's nothing to match. We don't
+ * need to examine the next columns.
+ */
+ if (dtup->bt_empty_range)
+ {
+ addrange = false;
+ break;
+ }
+
/*
* First check if there are any IS [NOT] NULL scan keys,
* and if we're violating them. In that case we can
@@ -1589,6 +1600,8 @@ form_and_insert_tuple(BrinBuildState *state)
/*
* Given two deformed tuples, adjust the first one so that it's consistent
* with the summary values in both.
+ *
+ * XXX I'm not sure we can actually get empty "b".
*/
static void
union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
@@ -1606,6 +1619,64 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
db = brin_deform_tuple(bdesc, b, NULL);
MemoryContextSwitchTo(oldcxt);
+ /*
+ * Check if the ranges are empty.
+ *
+ * If at least one of them is empty, we don't need to call per-key union
+ * functions at all. If "b" is empty, we just use "a" as the result (it
+ * might be empty too, but that's fine). If "a" is empty but "b" is not,
+ * we use "b" as the result (but we have to copy the data into "a" first).
+ *
+ * Only when both ranges are non-empty, we actually do the per-key merge.
+ */
+
+ /* If "b" is empty - ignore it and just use "a" (even if it's empty etc.). */
+ if (db->bt_empty_range)
+ {
+ /* skip the per-key merge */
+ MemoryContextDelete(cxt);
+ return;
+ }
+
+ /*
+ * Now we know "b" is not empty. If "a" is empty, then "b" is the result.
+ * But we need to copy the data from "b" to "a" first, because that's how
+ * we pass result out.
+ *
+ * We have to copy all the global/per-key flags etc. too.
+ */
+ if (a->bt_empty_range)
+ {
+ for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
+ {
+ int i;
+ BrinValues *col_a = &a->bt_columns[keyno];
+ BrinValues *col_b = &db->bt_columns[keyno];
+ BrinOpcInfo *opcinfo = bdesc->bd_info[keyno];
+
+ col_a->bv_allnulls = col_b->bv_allnulls;
+ col_a->bv_hasnulls = col_b->bv_hasnulls;
+
+ /* If "b" has no data, we're done. */
+ if (col_b->bv_allnulls)
+ continue;
+
+ for (i = 0; i < opcinfo->oi_nstored; i++)
+ col_a->bv_values[i] =
+ datumCopy(col_b->bv_values[i],
+ opcinfo->oi_typcache[i]->typbyval,
+ opcinfo->oi_typcache[i]->typlen);
+ }
+
+ /* "a" started empty, but "b" was not empty, so remember that */
+ a->bt_empty_range = false;
+
+ /* skip the per-key merge */
+ MemoryContextDelete(cxt);
+ return;
+ }
+
+ /* Now we know neither range is empty. */
for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
{
FmgrInfo *unionFn;
@@ -1703,7 +1774,9 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
Datum *values, bool *nulls)
{
int keyno;
- bool modified = false;
+
+ /* If the range starts empty, we're certainly going to modify it. */
+ bool modified = dtup->bt_empty_range;
/*
* Compare the key values of the new tuple to the stored index values; our
@@ -1717,9 +1790,24 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
Datum result;
BrinValues *bval;
FmgrInfo *addValue;
+ bool has_nulls;
bval = &dtup->bt_columns[keyno];
+ /*
+ * Does the range have actual NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ *
+ * We have to remember this, because we'll modify the flags and we
+ * need to know if the range started as empty.
+ */
+ has_nulls = ((!dtup->bt_empty_range) &&
+ (bval->bv_hasnulls || bval->bv_allnulls));
+
+ /*
+ * If the value we're adding is NULL, handle it locally. Otherwise
+ * call the BRIN_PROCNUM_ADDVALUE procedure.
+ */
if (bdesc->bd_info[keyno]->oi_regular_nulls && nulls[keyno])
{
/*
@@ -1745,8 +1833,33 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
nulls[keyno]);
/* if that returned true, we need to insert the updated tuple */
modified |= DatumGetBool(result);
+
+ /*
+ * If the range was had actual NULL values (i.e. did not start empty),
+ * make sure we don't forget about the NULL values. Either the allnulls
+ * flag is still set to true, or (if the opclass cleared it) we need to
+ * set hasnulls=true.
+ *
+ * XXX This can only happen when the opclass modified the tuple, so the
+ * modified flag should be set.
+ */
+ if (has_nulls && !(bval->bv_hasnulls || bval->bv_allnulls))
+ {
+ Assert(modified);
+ bval->bv_hasnulls = true;
+ }
}
+ /*
+ * After updating summaries for all the keys, mark it as not empty.
+ *
+ * If we're actually changing the flag value (i.e. tuple started as empty),
+ * we should have modified the tuple. So we should not see empty range that
+ * was not modified.
+ */
+ Assert(!dtup->bt_empty_range || modified);
+ dtup->bt_empty_range = false;
+
return modified;
}
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index 84b79dbfc0d..b81247a262c 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -372,6 +372,9 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
if (tuple->bt_placeholder)
rettuple->bt_info |= BRIN_PLACEHOLDER_MASK;
+ if (tuple->bt_empty_range)
+ rettuple->bt_info |= BRIN_EMPTY_RANGE_MASK;
+
*size = len;
return rettuple;
}
@@ -399,7 +402,7 @@ brin_form_placeholder_tuple(BrinDesc *brdesc, BlockNumber blkno, Size *size)
rettuple = palloc0(len);
rettuple->bt_blkno = blkno;
rettuple->bt_info = hoff;
- rettuple->bt_info |= BRIN_NULLS_MASK | BRIN_PLACEHOLDER_MASK;
+ rettuple->bt_info |= BRIN_NULLS_MASK | BRIN_PLACEHOLDER_MASK | BRIN_EMPTY_RANGE_MASK;
bitP = ((bits8 *) ((char *) rettuple + SizeOfBrinTuple)) - 1;
bitmask = HIGHBIT;
@@ -489,6 +492,8 @@ brin_new_memtuple(BrinDesc *brdesc)
dtup->bt_allnulls = palloc(sizeof(bool) * brdesc->bd_tupdesc->natts);
dtup->bt_hasnulls = palloc(sizeof(bool) * brdesc->bd_tupdesc->natts);
+ dtup->bt_empty_range = true;
+
dtup->bt_context = AllocSetContextCreate(CurrentMemoryContext,
"brin dtuple",
ALLOCSET_DEFAULT_SIZES);
@@ -527,6 +532,9 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
currdatum += sizeof(Datum) * brdesc->bd_info[i]->oi_nstored;
}
+ /* FIXME Shouldn't this reset bt_placeholder too? */
+ dtuple->bt_empty_range = true;
+
return dtuple;
}
@@ -560,6 +568,11 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
if (BrinTupleIsPlaceholder(tuple))
dtup->bt_placeholder = true;
+
+ /* ranges start as empty, depends on the BrinTuple */
+ if (!BrinTupleIsEmptyRange(tuple))
+ dtup->bt_empty_range = false;
+
dtup->bt_blkno = tuple->bt_blkno;
values = dtup->bt_values;
diff --git a/src/include/access/brin_tuple.h b/src/include/access/brin_tuple.h
index 732f91edf11..c56747aca4a 100644
--- a/src/include/access/brin_tuple.h
+++ b/src/include/access/brin_tuple.h
@@ -44,6 +44,7 @@ typedef struct BrinValues
typedef struct BrinMemTuple
{
bool bt_placeholder; /* this is a placeholder tuple */
+ bool bt_empty_range; /* range represents no tuples */
BlockNumber bt_blkno; /* heap blkno that the tuple is for */
MemoryContext bt_context; /* memcxt holding the bt_columns values */
/* output arrays for brin_deform_tuple: */
@@ -69,7 +70,7 @@ typedef struct BrinTuple
*
* 7th (high) bit: has nulls
* 6th bit: is placeholder tuple
- * 5th bit: unused
+ * 5th bit: range is empty
* 4-0 bit: offset of data
* ---------------
*/
@@ -82,13 +83,14 @@ typedef struct BrinTuple
* bt_info manipulation macros
*/
#define BRIN_OFFSET_MASK 0x1F
-/* bit 0x20 is not used at present */
+#define BRIN_EMPTY_RANGE_MASK 0x20
#define BRIN_PLACEHOLDER_MASK 0x40
#define BRIN_NULLS_MASK 0x80
#define BrinTupleDataOffset(tup) ((Size) (((BrinTuple *) (tup))->bt_info & BRIN_OFFSET_MASK))
#define BrinTupleHasNulls(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_NULLS_MASK)) != 0)
#define BrinTupleIsPlaceholder(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_PLACEHOLDER_MASK)) != 0)
+#define BrinTupleIsEmptyRange(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_EMPTY_RANGE_MASK)) != 0)
extern BrinTuple *brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno,
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2a4755d0998..584ac2602f7 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -45,7 +45,7 @@ step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 19ac18a2e88..18ba92b7ba1 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -9,6 +9,7 @@ setup
) WITH (fillfactor=10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
-- this fills the first page
+ INSERT INTO brin_iso VALUES (NULL);
DO $$
DECLARE curtid tid;
BEGIN
--
2.40.0
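The flag handling in the patched add_values_to_range() can be tried out in isolation with a small model. What follows is a sketch under stated assumptions, not the patch itself: `RangeSummary`, `summary_init()`, `summary_add()`, `summary_may_contain_null()` and the minmax-style `opclass_add_value()` are hypothetical names, the summary has a single int column, and the NULL branch (whose body is not visible in the diff context) is assumed to simply set hasnulls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-column summary; a stand-in for BrinMemTuple + BrinValues. */
typedef struct RangeSummary
{
    bool empty_range;   /* summary represents no rows yet (the new flag) */
    bool allnulls;      /* no non-NULL value has been summarized */
    bool hasnulls;      /* a NULL has been seen in this range */
    int  min;           /* minmax-style bounds, valid when !allnulls */
    int  max;
} RangeSummary;

static void
summary_init(RangeSummary *s)
{
    s->empty_range = true;
    s->allnulls = true;
    s->hasnulls = false;
    s->min = s->max = 0;
}

/* Stand-in for an opclass add_value: clears allnulls, never sets hasnulls. */
static bool
opclass_add_value(RangeSummary *s, int v)
{
    bool changed = false;

    if (s->allnulls)
    {
        s->min = s->max = v;
        s->allnulls = false;
        return true;
    }
    if (v < s->min) { s->min = v; changed = true; }
    if (v > s->max) { s->max = v; changed = true; }
    return changed;
}

/* Follows the flag handling of the patched add_values_to_range. */
static bool
summary_add(RangeSummary *s, bool isnull, int v)
{
    bool modified = s->empty_range;
    bool had_nulls = !s->empty_range && (s->hasnulls || s->allnulls);

    if (isnull)
    {
        /* assumed NULL branch: just remember that a NULL was seen */
        if (!s->hasnulls)
        {
            s->hasnulls = true;
            modified = true;
        }
    }
    else
    {
        modified |= opclass_add_value(s, v);
        /* the opclass cleared allnulls; do not forget the earlier NULLs */
        if (had_nulls && !(s->hasnulls || s->allnulls))
        {
            assert(modified);
            s->hasnulls = true;
        }
    }

    s->empty_range = false;
    return modified;
}

/* Could an IS NULL scan key match this range? */
static bool
summary_may_contain_null(const RangeSummary *s)
{
    return !s->empty_range && (s->allnulls || s->hasnulls);
}
```

In this model a non-empty summary with allnulls=true and hasnulls=false that receives the value 1 ends up with hasnulls=true, while an empty summary that receives 1 ends up with both NULL flags false, which is the distinction the commit message describes.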
0002-Show-empty-ranges-in-brin_page_items-20230423-master.patch (text/x-patch; charset=UTF-8)
From da108ee9bcfd4b8a4ac8b9fba2cc8b6ab4f757f7 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Mon, 27 Mar 2023 22:47:12 +0200
Subject: [PATCH 2/2] Show empty ranges in brin_page_items
Show which BRIN ranges are empty (no rows), as indicated by the newly
introduced flag.
---
contrib/pageinspect/brinfuncs.c | 10 ++++---
contrib/pageinspect/expected/brin.out | 6 ++--
.../pageinspect/pageinspect--1.11--1.12.sql | 17 +++++++++++
...summarization-and-inprogress-insertion.out | 28 +++++++++----------
4 files changed, 40 insertions(+), 21 deletions(-)
diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
index 000dcd8f5d8..a781f265514 100644
--- a/contrib/pageinspect/brinfuncs.c
+++ b/contrib/pageinspect/brinfuncs.c
@@ -201,8 +201,8 @@ brin_page_items(PG_FUNCTION_ARGS)
dtup = NULL;
for (;;)
{
- Datum values[7];
- bool nulls[7] = {0};
+ Datum values[8];
+ bool nulls[8] = {0};
/*
* This loop is called once for every attribute of every tuple in the
@@ -239,6 +239,7 @@ brin_page_items(PG_FUNCTION_ARGS)
nulls[4] = true;
nulls[5] = true;
nulls[6] = true;
+ nulls[7] = true;
}
else
{
@@ -261,6 +262,7 @@ brin_page_items(PG_FUNCTION_ARGS)
values[3] = BoolGetDatum(dtup->bt_columns[att].bv_allnulls);
values[4] = BoolGetDatum(dtup->bt_columns[att].bv_hasnulls);
values[5] = BoolGetDatum(dtup->bt_placeholder);
+ values[6] = BoolGetDatum(dtup->bt_empty_range);
if (!dtup->bt_columns[att].bv_allnulls)
{
BrinValues *bvalues = &dtup->bt_columns[att];
@@ -286,12 +288,12 @@ brin_page_items(PG_FUNCTION_ARGS)
}
appendStringInfoChar(&s, '}');
- values[6] = CStringGetTextDatum(s.data);
+ values[7] = CStringGetTextDatum(s.data);
pfree(s.data);
}
else
{
- nulls[6] = true;
+ nulls[7] = true;
}
}
diff --git a/contrib/pageinspect/expected/brin.out b/contrib/pageinspect/expected/brin.out
index e12fbeb4774..098ddc202f4 100644
--- a/contrib/pageinspect/expected/brin.out
+++ b/contrib/pageinspect/expected/brin.out
@@ -43,9 +43,9 @@ SELECT * FROM brin_revmap_data(get_raw_page('test1_a_idx', 1)) LIMIT 5;
SELECT * FROM brin_page_items(get_raw_page('test1_a_idx', 2), 'test1_a_idx')
ORDER BY blknum, attnum LIMIT 5;
- itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
-------------+--------+--------+----------+----------+-------------+----------
- 1 | 0 | 1 | f | f | f | {1 .. 1}
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | f | f | f | {1 .. 1}
(1 row)
-- Mask DETAIL messages as these are not portable across architectures.
diff --git a/contrib/pageinspect/pageinspect--1.11--1.12.sql b/contrib/pageinspect/pageinspect--1.11--1.12.sql
index 70c3abccf57..a20d67a9e82 100644
--- a/contrib/pageinspect/pageinspect--1.11--1.12.sql
+++ b/contrib/pageinspect/pageinspect--1.11--1.12.sql
@@ -21,3 +21,20 @@ CREATE FUNCTION bt_multi_page_stats(IN relname text, IN blkno int8, IN blk_count
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'bt_multi_page_stats'
LANGUAGE C STRICT PARALLEL RESTRICTED;
+
+--
+-- add information about BRIN empty ranges
+--
+DROP FUNCTION brin_page_items(IN page bytea, IN index_oid regclass);
+CREATE FUNCTION brin_page_items(IN page bytea, IN index_oid regclass,
+ OUT itemoffset int,
+ OUT blknum int8,
+ OUT attnum int,
+ OUT allnulls bool,
+ OUT hasnulls bool,
+ OUT placeholder bool,
+ OUT empty bool,
+ OUT value text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'brin_page_items'
+LANGUAGE C STRICT PARALLEL RESTRICTED;
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 584ac2602f7..201786c82c0 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -2,9 +2,9 @@ Parsed test spec with 2 sessions
starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |t |f |{1 .. 1}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+--------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -24,18 +24,18 @@ brin_summarize_new_values
step s1c: COMMIT;
step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |t |f |{1 .. 1}
- 2| 1| 1|f |f |f |{1 .. 1000}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+-----------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
+ 2| 1| 1|f |f |f |f |{1 .. 1000}
(2 rows)
starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |t |f |{1 .. 1}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+--------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -43,9 +43,9 @@ step s1i: INSERT INTO brin_iso VALUES (1000);
step s2vacuum: VACUUM brin_iso;
step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |t |f |{1 .. 1}
- 2| 1| 1|f |f |f |{1 .. 1000}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+-----------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
+ 2| 1| 1|f |f |f |f |{1 .. 1000}
(2 rows)
--
2.40.0
On 2023-Apr-23, Tomas Vondra wrote:
here's an updated version of the patch, including a backport version. I
ended up making the code yet a bit closer to master by introducing
add_values_to_range(). The current pre-14 code has the loop adding data
to the BRIN tuple in two places, but with the "fixed" logic handling
NULLs and the empty_range flag the amount of duplicated code got too
high, so this seems reasonable.
In backbranches, the new field to BrinMemTuple needs to be at the end of
the struct, to avoid ABI breakage.
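As a generic illustration of the ABI concern (not a statement about the exact layout of BrinMemTuple, where padding may or may not absorb a new bool), the following sketch compares field offsets when a member is added in the middle of a struct versus at the end. The three struct types and the two helper functions are hypothetical and use same-sized members so that padding does not hide the effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct as already-compiled extensions know it. */
typedef struct Original
{
    uint32_t flags;
    uint32_t blkno;
} Original;

/* New member added in the middle: later members move. */
typedef struct FieldInMiddle
{
    uint32_t flags;
    uint32_t extra;
    uint32_t blkno;
} FieldInMiddle;

/* New member added at the end: existing members stay where they were. */
typedef struct FieldAtEnd
{
    uint32_t flags;
    uint32_t blkno;
    uint32_t extra;
} FieldAtEnd;

/* True if code compiled against Original still finds blkno in the right place. */
static bool
middle_keeps_offsets(void)
{
    return offsetof(FieldInMiddle, blkno) == offsetof(Original, blkno);
}

static bool
end_keeps_offsets(void)
{
    return offsetof(FieldAtEnd, blkno) == offsetof(Original, blkno);
}
```

Appending keeps the offsets of existing members stable for code built against the old header, although sizeof can still grow, which matters to anything that allocates the struct itself.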
There's a comment in add_values_to_range with a typo "If the range was had".
Also, "So we should not see empty range that was not modified" should
perhaps be "should not see an empty range".
(As for your FIXME comment in brin_memtuple_initialize, I think you're
correct: we definitely need to reset bt_placeholder. Otherwise, we risk
places that call it in a loop using a BrinMemTuple with one range with
the flag set, in a range where that doesn't hold. It might be
impossible for this to happen, given how narrow the conditions are on
which bt_placeholder is used; but it seems safer to reset it anyway.)
Some pgindent noise would be induced by this patch. I think it's worth
cleaning up ahead of time.
I did a quick experiment of turning the patches over -- applying test
first, code fix after (requires some conflict fixing). With that I
verified that the behavior of 'hasnulls' indeed changes with the code
fix.
Both cases have a patch extending pageinspect to show the new flag, but
obviously we should commit that only in master. In backbranches it's
meant only to make testing easier.
In backbranches, I think it should be reasonably easy to add a
--1.7--1.7.1.sql file and set the default version to 1.7.1; that then
enables us to have the functionality (and the tests) in older branches
too. If you just add a --1.X.1--1.12.sql version to each branch that's
identical to that branch's current pageinspect version upgrade script,
that would let us have working upgrade paths for all branches. This is
a bit laborious but straightforward enough.
If you don't feel like adding that, I volunteer to add the necessary
scripts to all branches after you commit the bugfix, and ensure that all
upgrade paths work correctly.
I plan to do a bit more testing, I'd welcome some feedback - it's a
long-standing bug, and it'd be good to finally get this fixed. I don't
think the patch can be made any simpler.
The approach looks good to me.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2023-Apr-23, Tomas Vondra wrote:
Both cases have a patch extending pageinspect to show the new flag, but
obviously we should commit that only in master. In backbranches it's
meant only to make testing easier.
In backbranches, I think it should be reasonably easy to add a
--1.7--1.7.1.sql file and set the default version to 1.7.1; that then
enables us to have the functionality (and the tests) in older branches
too. If you just add a --1.X.1--1.12.sql version to each branch that's
identical to that branch's current pageinspect version upgrade script,
that would let us have working upgrade paths for all branches. This is
a bit laborious but straightforward enough.
"A bit laborious"? That seems enormously out of proportion to the
benefit of putting this test case into back branches. It will have
costs for end users too, not only us. As an example, it would
unnecessarily block some upgrade paths, if the upgraded-to installation
is slightly older and lacks the necessary --1.X.1--1.12 script.
If you don't feel like adding that, I volunteer to add the necessary
scripts to all branches after you commit the bugfix, and ensure that all
upgrade paths work correctly.
I do not think this should happen at all, whether you're willing to
do the work or not.
regards, tom lane
On 4/24/23 17:46, Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2023-Apr-23, Tomas Vondra wrote:
Both cases have a patch extending pageinspect to show the new flag, but
obviously we should commit that only in master. In backbranches it's
meant only to make testing easier.
In backbranches, I think it should be reasonably easy to add a
--1.7--1.7.1.sql file and set the default version to 1.7.1; that then
enables us to have the functionality (and the tests) in older branches
too. If you just add a --1.X.1--1.12.sql version to each branch that's
identical to that branch's current pageinspect version upgrade script,
that would let us have working upgrade paths for all branches. This is
a bit laborious but straightforward enough.
"A bit laborious"? That seems enormously out of proportion to the
benefit of putting this test case into back branches. It will have
costs for end users too, not only us. As an example, it would
unnecessarily block some upgrade paths, if the upgraded-to installation
is slightly older and lacks the necessary --1.X.1--1.12 script.
Why would that block the upgrade? Presumably we'd add two upgrade
scripts in the master, to allow upgrade both from 1.X and 1.X.1.
If you don't feel like adding that, I volunteer to add the necessary
scripts to all branches after you commit the bugfix, and ensure that all
upgrade paths work correctly.
I do not think this should happen at all, whether you're willing to
do the work or not.
FWIW I'm fine with not doing that. As mentioned, I only included this
patch to make testing the patch easier (otherwise the flag is impossible
to inspect directly).
regards
--
Tomas Vondra
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
On 4/24/23 17:46, Tom Lane wrote:
"A bit laborious"? That seems enormously out of proportion to the
benefit of putting this test case into back branches. It will have
costs for end users too, not only us. As an example, it would
unnecessarily block some upgrade paths, if the upgraded-to installation
is slightly older and lacks the necessary --1.X.1--1.12 script.
Why would that block the upgrade? Presumably we'd add two upgrade
scripts in the master, to allow upgrade both from 1.X and 1.X.1.
It would for example block updating from 14.8 or later to 15.2, since
a 15.2 installation would not have the script to update from 1.X.1.
Yeah, people could work around that by only installing the latest
version, but there are plenty of real-world scenarios where you'd be
creating friction, or at least confusion. I do not think that this
test case is worth it.
regards, tom lane
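The upgrade-path problem described here can be pictured as a reachability question over the update scripts that the target installation ships. The sketch below is a simplified model, not how PostgreSQL chooses update paths internally: `UpdateScript`, `can_update()` and both script lists are hypothetical, the version labels reuse the thread's "1.X" / "1.X.1" placeholders plus an invented "1.Y" for the target's default version, and the search is a plain depth-first walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One update script, e.g. pageinspect--1.X--1.Y.sql, reduced to its endpoints. */
typedef struct UpdateScript
{
    const char *from;
    const char *to;
} UpdateScript;

/*
 * Depth-first search for a chain of scripts leading from "from" to "to".
 * Assumes the script graph is acyclic (versions only move forward).
 */
static bool
can_update(const UpdateScript *scripts, int nscripts,
           const char *from, const char *to)
{
    if (strcmp(from, to) == 0)
        return true;

    for (int i = 0; i < nscripts; i++)
    {
        if (strcmp(scripts[i].from, from) == 0 &&
            can_update(scripts, nscripts, scripts[i].to, to))
            return true;
    }
    return false;
}

/* Scripts shipped by an older minor release of the target major version. */
static const UpdateScript older_target[] = {
    {"1.X", "1.Y"},
};

/* Scripts shipped by a newer minor release that knows about 1.X.1. */
static const UpdateScript newer_target[] = {
    {"1.X", "1.Y"},
    {"1.X.1", "1.Y"},
};
```

With these lists, `can_update(older_target, 1, "1.X.1", "1.Y")` is false while the same call against `newer_target` succeeds, which is the friction being pointed out: a source cluster already at 1.X.1 has no way forward on a target that predates the extra script.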
On 4/24/23 23:05, Tomas Vondra wrote:
On 4/24/23 17:46, Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2023-Apr-23, Tomas Vondra wrote:
Both cases have a patch extending pageinspect to show the new flag, but
obviously we should commit that only in master. In backbranches it's
meant only to make testing easier.
In backbranches, I think it should be reasonably easy to add a
--1.7--1.7.1.sql file and set the default version to 1.7.1; that then
enables us to have the functionality (and the tests) in older branches
too. If you just add a --1.X.1--1.12.sql version to each branch that's
identical to that branch's current pageinspect version upgrade script,
that would let us have working upgrade paths for all branches. This is
a bit laborious but straightforward enough.
"A bit laborious"? That seems enormously out of proportion to the
benefit of putting this test case into back branches. It will have
costs for end users too, not only us. As an example, it would
unnecessarily block some upgrade paths, if the upgraded-to installation
is slightly older and lacks the necessary --1.X.1--1.12 script.
Why would that block the upgrade? Presumably we'd add two upgrade
scripts in the master, to allow upgrade both from 1.X and 1.X.1.
D'oh! I just realized I misunderstood the issue. Yes, if the target
version is missing the new script, that won't work. I'm not sure how
likely that is - in my experience people refresh versions at the same
time - but it's certainly an assumption we shouldn't make, I guess.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 4/24/23 17:36, Alvaro Herrera wrote:
On 2023-Apr-23, Tomas Vondra wrote:
here's an updated version of the patch, including a backport version. I
ended up making the code yet a bit closer to master by introducing
add_values_to_range(). The current pre-14 code has the loop adding data
to the BRIN tuple in two places, but with the "fixed" logic handling
NULLs and the empty_range flag the amount of duplicated code got too
high, so this seems reasonable.
In backbranches, the new field to BrinMemTuple needs to be at the end of
the struct, to avoid ABI breakage.
Good point.
There's a comment in add_values_to_range with a typo "If the range was had".
Also, "So we should not see empty range that was not modified" should
perhaps be "should not see an empty range".
OK, I'll check the wording of the comments.
(As for your FIXME comment in brin_memtuple_initialize, I think you're
correct: we definitely need to reset bt_placeholder. Otherwise, we risk
places that call it in a loop using a BrinMemTuple with one range with
the flag set, in a range where that doesn't hold. It might be
impossible for this to happen, given how narrow the conditions are on
which bt_placeholder is used; but it seems safer to reset it anyway.)
Yeah. But isn't that a separate preexisting issue, strictly speaking?
Some pgindent noise would be induced by this patch. I think it's worth
cleaning up ahead of time.
True. Will do.
I did a quick experiment of turning the patches over -- applying test
first, code fix after (requires some conflict fixing). With that I
verified that the behavior of 'hasnulls' indeed changes with the code
fix.
Thanks. Could you do some testing of the union_tuples stuff too? It's a
somewhat tricky part - the behavior is timing-sensitive, so testing it
requires gdb breakpoints or something like that. I think
it's correct, but it'd be nice to check that.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2023-Apr-24, Tomas Vondra wrote:
On 4/24/23 17:36, Alvaro Herrera wrote:
(As for your FIXME comment in brin_memtuple_initialize, I think you're
correct: we definitely need to reset bt_placeholder. Otherwise, we risk
places that call it in a loop using a BrinMemTuple with one range with
the flag set, in a range where that doesn't hold. It might be
impossible for this to happen, given how narrow the conditions are on
which bt_placeholder is used; but it seems safer to reset it anyway.)
Yeah. But isn't that a separate preexisting issue, strictly speaking?
Yes.
I did a quick experiment of turning the patches over -- applying test
first, code fix after (requires some conflict fixing). With that I
verified that the behavior of 'hasnulls' indeed changes with the code
fix.
Thanks. Could you do some testing of the union_tuples stuff too? It's a
somewhat tricky part - the behavior is timing-sensitive, so testing it
requires gdb breakpoints or something like that. I think
it's correct, but it'd be nice to check that.
I'll have a look.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
<inflex> really, I see PHP as like a strange amalgamation of C, Perl, Shell
<crab> inflex: you know that "amalgam" means "mixture with mercury",
more or less, right?
<crab> i.e., "deadly poison"
On 4/24/23 23:20, Tomas Vondra wrote:
On 4/24/23 17:36, Alvaro Herrera wrote:
On 2023-Apr-23, Tomas Vondra wrote:
here's an updated version of the patch, including a backport version. I
ended up making the code yet a bit closer to master by introducing
add_values_to_range(). The current pre-14 code has the loop adding data
to the BRIN tuple in two places, but with the "fixed" logic handling
NULLs and the empty_range flag the amount of duplicated code got too
high, so this seems reasonable.
In backbranches, the new field to BrinMemTuple needs to be at the end of
the struct, to avoid ABI breakage.
Unfortunately, this is not actually possible :-(
The BrinMemTuple has a FLEXIBLE_ARRAY_MEMBER at the end, so we can't
place anything after it. I think we have three options:
a) some other approach? - I really can't see any, except maybe for going
back to the previous approach (i.e. encoding the info using the existing
BrinValues allnulls/hasnulls flags)
b) encoding the info in existing BrinMemTuple flags - e.g. we could use
bt_placeholder to store two bits, not just one. Seems a bit ugly.
c) ignore the issue - AFAICS this would be an issue only for (external)
code accessing BrinMemTuple structs, but I don't think we're aware of
any out-of-core BRIN opclasses or anything like that ...
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On Sun, May 07, 2023 at 12:13:07AM +0200, Tomas Vondra wrote:
c) ignore the issue - AFAICS this would be an issue only for (external)
code accessing BrinMemTuple structs, but I don't think we're aware of
any out-of-core BRIN opclasses or anything like that ...
FTR there's at least postgis that implements BRIN opclasses (for geometries and
geographies), but there's nothing fancy in the implementation and it doesn't
access BrinMemTuple struct.
On 5/7/23 07:08, Julien Rouhaud wrote:
Hi,
On Sun, May 07, 2023 at 12:13:07AM +0200, Tomas Vondra wrote:
c) ignore the issue - AFAICS this would be an issue only for (external)
code accessing BrinMemTuple structs, but I don't think we're aware of
any out-of-core BRIN opclasses or anything like that ...
FTR there's at least postgis that implements BRIN opclasses (for geometries and
geographies), but there's nothing fancy in the implementation and it doesn't
access BrinMemTuple struct.
Right. I believe that should be fine, because opclasses don't access the
tuple directly - instead we pass pointers to individual pieces. But
maybe it'd be a good idea to test this.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2023-May-07, Tomas Vondra wrote:
Álvaro wrote:
In backbranches, the new field to BrinMemTuple needs to be at the end of
the struct, to avoid ABI breakage.
Unfortunately, this is not actually possible :-(
The BrinMemTuple has a FLEXIBLE_ARRAY_MEMBER at the end, so we can't
place anything after it. I think we have three options:
a) some other approach? - I really can't see any, except maybe for going
back to the previous approach (i.e. encoding the info using the existing
BrinValues allnulls/hasnulls flags)
Actually, mine was quite the stupid suggestion: the BrinMemTuple already
has a 3 byte hole in the place where you originally wanted to add the
flag:
struct BrinMemTuple {
_Bool bt_placeholder; /* 0 1 */
/* XXX 3 bytes hole, try to pack */
BlockNumber bt_blkno; /* 4 4 */
MemoryContext bt_context; /* 8 8 */
Datum * bt_values; /* 16 8 */
_Bool * bt_allnulls; /* 24 8 */
_Bool * bt_hasnulls; /* 32 8 */
BrinValues bt_columns[]; /* 40 0 */
/* size: 40, cachelines: 1, members: 7 */
/* sum members: 37, holes: 1, sum holes: 3 */
/* last cacheline: 40 bytes */
};
so putting it there was already not causing any ABI breakage. So, the
solution to this problem of not being able to put it at the end is just
to return the struct to your original formulation.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
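The padding argument above can be checked mechanically. The following is a minimal sketch, assuming stand-in types (BlockNumber as a 32-bit integer, a BrinValuesStub placeholder for BrinValues, void pointers for the remaining members) and a platform where a 32-bit integer is aligned to at least 2 bytes. The struct names and the layout_unchanged() helper are hypothetical; only the member names come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL types; only size and alignment matter here. */
typedef uint32_t BlockNumber;
typedef struct BrinValuesStub
{
	void	   *ptr;
} BrinValuesStub;

/* Layout before the fix, mirroring the pahole output quoted above. */
typedef struct MemTupleBefore
{
	bool		bt_placeholder;
	/* padding hole here on common ABIs */
	BlockNumber bt_blkno;
	void	   *bt_context;
	void	   *bt_values;
	bool	   *bt_allnulls;
	bool	   *bt_hasnulls;
	BrinValuesStub bt_columns[];	/* flexible array member, must stay last */
} MemTupleBefore;

/* Same struct with the new flag placed inside the padding hole. */
typedef struct MemTupleAfter
{
	bool		bt_placeholder;
	bool		bt_empty_range; /* consumes one byte of the former hole */
	BlockNumber bt_blkno;
	void	   *bt_context;
	void	   *bt_values;
	bool	   *bt_allnulls;
	bool	   *bt_hasnulls;
	BrinValuesStub bt_columns[];
} MemTupleAfter;

/* True if every pre-existing member keeps its offset and the size is equal. */
bool
layout_unchanged(void)
{
	return offsetof(MemTupleBefore, bt_blkno) == offsetof(MemTupleAfter, bt_blkno) &&
		offsetof(MemTupleBefore, bt_context) == offsetof(MemTupleAfter, bt_context) &&
		offsetof(MemTupleBefore, bt_values) == offsetof(MemTupleAfter, bt_values) &&
		offsetof(MemTupleBefore, bt_allnulls) == offsetof(MemTupleAfter, bt_allnulls) &&
		offsetof(MemTupleBefore, bt_hasnulls) == offsetof(MemTupleAfter, bt_hasnulls) &&
		offsetof(MemTupleBefore, bt_columns) == offsetof(MemTupleAfter, bt_columns) &&
		sizeof(MemTupleBefore) == sizeof(MemTupleAfter);
}
```

Code compiled against the old layout still finds every existing member at the same offset, which is why placing the flag in the hole does not break the ABI.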
On 5/15/23 12:06, Alvaro Herrera wrote:
On 2023-May-07, Tomas Vondra wrote:
Álvaro wrote:
In backbranches, the new field to BrinMemTuple needs to be at the end of
the struct, to avoid ABI breakage.
Unfortunately, this is not actually possible :-(
The BrinMemTuple has a FLEXIBLE_ARRAY_MEMBER at the end, so we can't
place anything after it. I think we have three options:
a) some other approach? - I really can't see any, except maybe for going
back to the previous approach (i.e. encoding the info using the existing
BrinValues allnulls/hasnulls flags)
Actually, mine was quite the stupid suggestion: the BrinMemTuple already
has a 3 byte hole in the place where you originally wanted to add the
flag:
struct BrinMemTuple {
_Bool bt_placeholder; /* 0 1 */
/* XXX 3 bytes hole, try to pack */
BlockNumber bt_blkno; /* 4 4 */
MemoryContext bt_context; /* 8 8 */
Datum * bt_values; /* 16 8 */
_Bool * bt_allnulls; /* 24 8 */
_Bool * bt_hasnulls; /* 32 8 */
BrinValues bt_columns[]; /* 40 0 */
/* size: 40, cachelines: 1, members: 7 */
/* sum members: 37, holes: 1, sum holes: 3 */
/* last cacheline: 40 bytes */
};
so putting it there was already not causing any ABI breakage. So, the
solution to this problem of not being able to put it at the end is just
to return the struct to your original formulation.
Thanks, that's pretty lucky. It means we're not breaking on-disk format
nor ABI, which is great.
Attached is a final version of the patches - I intend to do a bit more
testing, go through the comments once more, and get this committed today
or perhaps tomorrow morning, so that it gets into beta1.
Unfortunately, while polishing the patches, I realized union_tuples()
has yet another long-standing bug with handling NULL values, because it
does this:
/* Adjust "hasnulls". */
if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
col_a->bv_hasnulls = true;
but let's assume "col_a" is a summary representing "1" and "col_b"
represents NULL (col_b->bv_hasnulls=false col_b->bv_allnulls=true).
Well, in that case we fail to "remember" col_a should represent NULL
values too :-(
This is a somewhat separate issue, because it's unrelated to empty ranges
(neither of the two ranges is empty). It's hard to test it, because it
requires a particular timing of the concurrent actions, but a breakpoint
in brin.c on the brin_can_do_samepage_update call (in summarize_range)
does the trick for manual testing.
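The flag handling can be reduced to a small standalone sketch. It models only the two per-column booleans and ignores the stored values entirely; the NullFlags struct and both function names are hypothetical. The first function follows the snippet quoted above, the second considers both flags on the source summary.

```c
#include <assert.h>
#include <stdbool.h>

/* Only the NULL-related state of one column summary (hypothetical type). */
typedef struct NullFlags
{
	bool		allnulls;
	bool		hasnulls;
} NullFlags;

/* Merge "b" into "a" using the pre-fix rule quoted above. */
NullFlags
merge_flags_old(NullFlags a, NullFlags b)
{
	if (!a.hasnulls && b.hasnulls)
		a.hasnulls = true;

	/* If there are no values in B, there's nothing left to do. */
	if (b.allnulls)
		return a;

	/* A had no values: it takes over B's values, so it is no longer all-NULL. */
	if (a.allnulls)
		a.allnulls = false;

	return a;
}

/* Merge "b" into "a", looking at both flags of the source summary. */
NullFlags
merge_flags_fixed(NullFlags a, NullFlags b)
{
	/* Does the "b" summary represent any NULL values? */
	bool		b_has_nulls = (b.hasnulls || b.allnulls);

	if (!a.allnulls && b_has_nulls)
		a.hasnulls = true;

	if (b.allnulls)
		return a;

	if (a.allnulls)
	{
		/* A represented only NULLs; remember them once values arrive. */
		a.allnulls = false;
		a.hasnulls = true;
	}

	return a;
}
```

For a = {allnulls=false, hasnulls=false} (a summary representing "1") and b = {allnulls=true, hasnulls=false}, the old rule leaves both flags false on the result, while the second variant sets hasnulls.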
0001 fixes the issue. 0002 is the original fix, and 0003 is just the
pageinspect changes (for master only).
For the backbranches, I thought about making the code more like master
(by moving some of the handling from opclasses to brin.c), but decided
not to. It'd be low-risk, but it feels wrong to kinda do what the master
does under "oi_regular_nulls" flag.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-Fix-handling-of-NULLs-when-merging-BRIN-su-14-master.patchtext/x-patch; charset=UTF-8; name=0001-Fix-handling-of-NULLs-when-merging-BRIN-su-14-master.patchDownload
From 698609154324f64831ac44e4440a2283691184e2 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Thu, 18 May 2023 13:00:31 +0200
Subject: [PATCH 1/3] Fix handling of NULLs when merging BRIN summaries
When merging BRIN summaries, union_tuples() did not correctly update the
target hasnulls/allnulls flags. When merging all-NULL summary into a
summary without any NULL values, the result had both flags set to false
(instead of having hasnulls=true).
This happened because the code only considered the hasnulls flags,
ignoring the possibility the source summary has allnulls=true.
Discovered while investigating issues with handling empty BRIN ranges
and handling of NULL values, but it's a separate problem (has nothing to
do with empty ranges).
Fixed by considering both flags on the source summary, and updating the
hasnulls flag on the target summary.
Backpatch to 11. The bug exists since 9.5 (where BRIN indexes were
introduced), but those releases are EOL already.
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b%40enterprisedb.com
---
src/backend/access/brin/brin.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 41bf950a4af..a155525b7df 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1613,10 +1613,13 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
BrinValues *col_b = &db->bt_columns[keyno];
BrinOpcInfo *opcinfo = bdesc->bd_info[keyno];
if (opcinfo->oi_regular_nulls)
{
+ /* Does the "b" summary represent any NULL values? */
+ bool b_has_nulls = (col_b->bv_hasnulls || col_b->bv_allnulls);
+
/* Adjust "hasnulls". */
- if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
+ if (!col_a->bv_allnulls && b_has_nulls)
col_a->bv_hasnulls = true;
/* If there are no values in B, there's nothing left to do. */
@@ -1628,12 +1631,17 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
* values from B into A, and we're done. We cannot run the
* operators in this case, because values in A might contain
* garbage. Note we already established that B contains values.
+ *
+ * Also adjust "hasnulls" in order not to forget the summary
+ * represents NULL values. This is not redundant with the earlier
+ * update, because that only happens when allnulls=false.
*/
if (col_a->bv_allnulls)
{
int i;
col_a->bv_allnulls = false;
+ col_a->bv_hasnulls = true;
for (i = 0; i < opcinfo->oi_nstored; i++)
col_a->bv_values[i] =
--
2.40.1
0001-Fix-handling-of-NULLs-when-merging-BRIN-summar-11-13.patchtext/x-patch; charset=UTF-8; name=0001-Fix-handling-of-NULLs-when-merging-BRIN-summar-11-13.patchDownload
From 57ab42cfc55ff7f078de00af8d0e0c44a5354658 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Thu, 18 May 2023 13:33:10 +0200
Subject: [PATCH 1/2] Fix handling of NULLs when merging BRIN summaries
When merging BRIN summaries, union_tuples() did not correctly update the
target hasnulls/allnulls flags. When merging all-NULL summary into a
summary without any NULL values, the result had both flags set to false
(instead of having hasnulls=true).
This happened because the code only considered the hasnulls flags,
ignoring the possibility the source summary has allnulls=true.
Discovered while investigating issues with handling empty BRIN ranges
and handling of NULL values, but it's a separate problem (has nothing to
do with empty ranges).
Fixed by considering both flags on the source summary, and updating the
hasnulls flag on the target summary.
Backpatch to 11. The bug exists since 9.5 (where BRIN indexes were
introduced), but those releases are EOL already.
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b%40enterprisedb.com
---
src/backend/access/brin/brin_inclusion.c | 10 +++++++++-
src/backend/access/brin/brin_minmax.c | 10 +++++++++-
2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c
index 7e380d66ed5..ca45178fbff 100644
--- a/src/backend/access/brin/brin_inclusion.c
+++ b/src/backend/access/brin/brin_inclusion.c
@@ -515,10 +515,13 @@ brin_inclusion_union(PG_FUNCTION_ARGS)
FmgrInfo *finfo;
Datum result;
+ /* Does the "b" summary represent any NULL values? */
+ bool b_has_nulls = (col_b->bv_hasnulls || col_b->bv_allnulls);
+
Assert(col_a->bv_attno == col_b->bv_attno);
/* Adjust "hasnulls". */
- if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
+ if (!col_a->bv_allnulls && b_has_nulls)
col_a->bv_hasnulls = true;
/* If there are no values in B, there's nothing left to do. */
@@ -533,10 +536,15 @@ brin_inclusion_union(PG_FUNCTION_ARGS)
* B into A, and we're done. We cannot run the operators in this case,
* because values in A might contain garbage. Note we already established
* that B contains values.
+ *
+ * Also adjust "hasnulls" in order not to forget the summary represents NULL
+ * values. This is not redundant with the earlier update, because that only
+ * happens when allnulls=false.
*/
if (col_a->bv_allnulls)
{
col_a->bv_allnulls = false;
+ col_a->bv_hasnulls = true;
col_a->bv_values[INCLUSION_UNION] =
datumCopy(col_b->bv_values[INCLUSION_UNION],
attr->attbyval, attr->attlen);
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 4b5d6a72135..40da0c8094a 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -248,10 +248,13 @@ brin_minmax_union(PG_FUNCTION_ARGS)
FmgrInfo *finfo;
bool needsadj;
+ /* Does the "b" summary represent any NULL values? */
+ bool b_has_nulls = (col_b->bv_hasnulls || col_b->bv_allnulls);
+
Assert(col_a->bv_attno == col_b->bv_attno);
/* Adjust "hasnulls" */
- if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
+ if (!col_a->bv_allnulls && b_has_nulls)
col_a->bv_hasnulls = true;
/* If there are no values in B, there's nothing left to do */
@@ -266,10 +269,15 @@ brin_minmax_union(PG_FUNCTION_ARGS)
* B into A, and we're done. We cannot run the operators in this case,
* because values in A might contain garbage. Note we already established
* that B contains values.
+ *
+ * Also adjust "hasnulls" in order not to forget the summary represents NULL
+ * values. This is not redundant with the earlier update, because that only
+ * happens when allnulls=false.
*/
if (col_a->bv_allnulls)
{
col_a->bv_allnulls = false;
+ col_a->bv_hasnulls = true;
col_a->bv_values[0] = datumCopy(col_b->bv_values[0],
attr->attbyval, attr->attlen);
col_a->bv_values[1] = datumCopy(col_b->bv_values[1],
--
2.40.1
0002-Fix-handling-of-NULLs-in-BRIN-indexes-11-13.patchtext/x-patch; charset=UTF-8; name=0002-Fix-handling-of-NULLs-in-BRIN-indexes-11-13.patchDownload
From 638fe7f96d21bc90622dab8a48bb8018fed9150e Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 23 Apr 2023 19:26:18 +0200
Subject: [PATCH 2/2] Fix handling of NULLs in BRIN indexes
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges, treating them as essentially the same
thing. Summaries were initialized with allnulls=true, and opclasses
simply reset allnulls to false when processing the first non-NULL value.
This however produces incorrect results if the range starts with a NULL
value (or a sequence of NULL values), in which case we forget the range
contains NULL values when adding the first non-NULL value.
This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.
Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting the flag in both cases would make
it correct, but it would also make BRIN indexes useless for queries with
IS NULL clauses. All ranges start empty (and thus allnulls=true), so all
ranges would end up with either allnulls=true or hasnulls=true.
The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true.
This can happen e.g. for small tables (because a summary for the first
range exists for all BRIN indexes), or for tables with large fraction of
NULL values in the indexed columns.
Bulk summarization (e.g. during CREATE INDEX or automatic summarization)
that processes all values at once is not affected by this issue. In this
case the flags were updated in a slightly different way, not forgetting
the NULL values.
To identify empty ranges we use a new flag, stored in an unused bit in
the BRIN tuple header so the on-disk format remains the same. A matching
flag is added to BrinMemTuple, into a 3B gap after bt_placeholder.
That means there's no risk of ABI breakage, although we don't actually
pass the BrinMemTuple to any public API.
We could also skip storing index tuples for empty summaries, but then
we'd have to always process such ranges - even if there are no rows in
large parts of the table (e.g. after a bulk DELETE), it would still
require reading the pages etc. So we store them, but ignore them when
building the bitmap.
Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.
Backpatch-through: 11
Reviewed-by: Justin Pryzby, Matthias van de Meent, Alvaro Herrera
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
---
src/backend/access/brin/brin.c | 176 +++++++++++++++++-
src/backend/access/brin/brin_tuple.c | 14 +-
src/include/access/brin_tuple.h | 6 +-
...summarization-and-inprogress-insertion.out | 8 +-
...ummarization-and-inprogress-insertion.spec | 1 +
5 files changed, 191 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 0becfde1133..176ae0099ff 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -35,6 +35,7 @@
#include "storage/freespace.h"
#include "utils/acl.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/index_selfuncs.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -173,7 +174,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
for (;;)
{
- bool need_insert = false;
+ bool need_insert;
OffsetNumber off;
BrinTuple *brtup;
BrinMemTuple *dtup;
@@ -241,6 +242,9 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
dtup = brin_deform_tuple(bdesc, brtup, NULL);
+ /* If the range starts empty, we're certainly going to modify it. */
+ need_insert = dtup->bt_empty_range;
+
/*
* Compare the key values of the new tuple to the stored index values;
* our deformed tuple will get updated if the new tuple doesn't fit
@@ -253,8 +257,20 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
Datum result;
BrinValues *bval;
FmgrInfo *addValue;
+ bool has_nulls;
bval = &dtup->bt_columns[keyno];
+
+ /*
+ * Does the range have actual NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ *
+ * We have to remember this, because we'll modify the flags and we
+ * need to know if the range started as empty.
+ */
+ has_nulls = ((!dtup->bt_empty_range) &&
+ (bval->bv_hasnulls || bval->bv_allnulls));
+
addValue = index_getprocinfo(idxRel, keyno + 1,
BRIN_PROCNUM_ADDVALUE);
result = FunctionCall4Coll(addValue,
@@ -265,8 +281,33 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
nulls[keyno]);
/* if that returned true, we need to insert the updated tuple */
need_insert |= DatumGetBool(result);
+
+ /*
+ * If the range was had actual NULL values (i.e. did not start empty),
+ * make sure we don't forget about the NULL values. Either the allnulls
+ * flag is still set to true, or (if the opclass cleared it) we need to
+ * set hasnulls=true.
+ *
+ * XXX This can only happen when the opclass modified the tuple, so the
+ * modified flag should be set.
+ */
+ if (has_nulls && !(bval->bv_hasnulls || bval->bv_allnulls))
+ {
+ Assert(need_insert);
+ bval->bv_hasnulls = true;
+ }
}
+ /*
+ * After updating summaries for all the keys, mark it as not empty.
+ *
+ * If we're actually changing the flag value (i.e. tuple started as
+ * empty), we should have modified the tuple. So we should not see
+ * empty range that was not modified.
+ */
+ Assert(!dtup->bt_empty_range || need_insert);
+ dtup->bt_empty_range = false;
+
if (!need_insert)
{
/*
@@ -508,6 +549,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
CurrentMemoryContext);
}
+ /*
+ * If the BRIN tuple indicates that this range is empty,
+ * we can skip it: there's nothing to match. We don't
+ * need to examine the next columns.
+ */
+ if (dtup->bt_empty_range)
+ {
+ addrange = false;
+ break;
+ }
+
/*
* Check whether the scan key is consistent with the page
* range values; if so, have the pages in the range added
@@ -612,6 +664,7 @@ brinbuildCallback(Relation index,
BrinBuildState *state = (BrinBuildState *) brstate;
BlockNumber thisblock;
int i;
+ bool modified;
thisblock = ItemPointerGetBlockNumber(tid);
@@ -640,25 +693,76 @@ brinbuildCallback(Relation index,
}
/* Accumulate the current tuple into the running state */
+
+ /* If the range starts empty, we're certainly going to modify it. */
+ modified = state->bs_dtuple->bt_empty_range;
+
+ /*
+ * Compare the key values of the new tuple to the stored index values;
+ * our deformed tuple will get updated if the new tuple doesn't fit
+ * the original range (note this means we can't break out of the loop
+ * early). Make a note of whether this happens, so that we know to
+ * insert the modified tuple later.
+ */
for (i = 0; i < state->bs_bdesc->bd_tupdesc->natts; i++)
{
FmgrInfo *addValue;
BrinValues *col;
Form_pg_attribute attr = TupleDescAttr(state->bs_bdesc->bd_tupdesc, i);
+ bool has_nulls;
+ Datum result;
col = &state->bs_dtuple->bt_columns[i];
+
+ /*
+ * Does the range have actual NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ *
+ * We have to remember this, because we'll modify the flags and we
+ * need to know if the range started as empty.
+ */
+ has_nulls = ((!state->bs_dtuple->bt_empty_range) &&
+ (col->bv_hasnulls || col->bv_allnulls));
+
+ /*
+ * Call the BRIN_PROCNUM_ADDVALUE procedure. We do this even for NULL
+ * values, because who knows what the opclass is doing.
+ */
addValue = index_getprocinfo(index, i + 1,
BRIN_PROCNUM_ADDVALUE);
+ result = FunctionCall4Coll(addValue,
+ attr->attcollation,
+ PointerGetDatum(state->bs_bdesc),
+ PointerGetDatum(col),
+ values[i], isnull[i]);
+ /* if that returned true, we need to insert the updated tuple */
+ modified |= DatumGetBool(result);
/*
- * Update dtuple state, if and as necessary.
+ * If the range was had actual NULL values (i.e. did not start empty),
+ * make sure we don't forget about the NULL values. Either the allnulls
+ * flag is still set to true, or (if the opclass cleared it) we need to
+ * set hasnulls=true.
+ *
+ * XXX This can only happen when the opclass modified the tuple, so the
+ * modified flag should be set.
*/
- FunctionCall4Coll(addValue,
- attr->attcollation,
- PointerGetDatum(state->bs_bdesc),
- PointerGetDatum(col),
- values[i], isnull[i]);
+ if (has_nulls && !(col->bv_hasnulls || col->bv_allnulls))
+ {
+ Assert(modified);
+ col->bv_hasnulls = true;
+ }
}
+
+ /*
+ * After updating summaries for all the keys, mark it as not empty.
+ *
+ * If we're actually changing the flag value (i.e. tuple started as
+ * empty), we should have modified the tuple. So we should not see
+ * empty range that was not modified.
+ */
+ Assert(!state->bs_dtuple->bt_empty_range || modified);
+ state->bs_dtuple->bt_empty_range = false;
}
/*
@@ -1465,6 +1569,64 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
db = brin_deform_tuple(bdesc, b, NULL);
MemoryContextSwitchTo(oldcxt);
+ /*
+ * Check if the ranges are empty.
+ *
+ * If at least one of them is empty, we don't need to call per-key union
+ * functions at all. If "b" is empty, we just use "a" as the result (it
+ * might be empty fine, but that's fine). If "a" is empty but "b" is not,
+ * we use "b" as the result (but we have to copy the data into "a" first).
+ *
+ * Only when both ranges are non-empty, we actually do the per-key merge.
+ */
+
+ /* If "b" is empty - ignore it and just use "a" (even if it's empty etc.). */
+ if (db->bt_empty_range)
+ {
+ /* skip the per-key merge */
+ MemoryContextDelete(cxt);
+ return;
+ }
+
+ /*
+ * Now we know "b" is not empty. If "a" is empty, then "b" is the result.
+ * But we need to copy the data from "b" to "a" first, because that's how
+ * we pass result out.
+ *
+ * We have to copy all the global/per-key flags etc. too.
+ */
+ if (a->bt_empty_range)
+ {
+ for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
+ {
+ int i;
+ BrinValues *col_a = &a->bt_columns[keyno];
+ BrinValues *col_b = &db->bt_columns[keyno];
+ BrinOpcInfo *opcinfo = bdesc->bd_info[keyno];
+
+ col_a->bv_allnulls = col_b->bv_allnulls;
+ col_a->bv_hasnulls = col_b->bv_hasnulls;
+
+ /* If "b" has no data, we're done. */
+ if (col_b->bv_allnulls)
+ continue;
+
+ for (i = 0; i < opcinfo->oi_nstored; i++)
+ col_a->bv_values[i] =
+ datumCopy(col_b->bv_values[i],
+ opcinfo->oi_typcache[i]->typbyval,
+ opcinfo->oi_typcache[i]->typlen);
+ }
+
+ /* "a" started empty, but "b" was not empty, so remember that */
+ a->bt_empty_range = false;
+
+ /* skip the per-key merge */
+ MemoryContextDelete(cxt);
+ return;
+ }
+
+ /* Neither range is empty, so call the union proc. */
for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
{
FmgrInfo *unionFn;
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index b3b453aed12..aafd2f17ca0 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -349,6 +349,9 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
if (tuple->bt_placeholder)
rettuple->bt_info |= BRIN_PLACEHOLDER_MASK;
+ if (tuple->bt_empty_range)
+ rettuple->bt_info |= BRIN_EMPTY_RANGE_MASK;
+
*size = len;
return rettuple;
}
@@ -376,7 +379,7 @@ brin_form_placeholder_tuple(BrinDesc *brdesc, BlockNumber blkno, Size *size)
rettuple = palloc0(len);
rettuple->bt_blkno = blkno;
rettuple->bt_info = hoff;
- rettuple->bt_info |= BRIN_NULLS_MASK | BRIN_PLACEHOLDER_MASK;
+ rettuple->bt_info |= BRIN_NULLS_MASK | BRIN_PLACEHOLDER_MASK | BRIN_EMPTY_RANGE_MASK;
bitP = ((bits8 *) ((char *) rettuple + SizeOfBrinTuple)) - 1;
bitmask = HIGHBIT;
@@ -466,6 +469,8 @@ brin_new_memtuple(BrinDesc *brdesc)
dtup->bt_allnulls = palloc(sizeof(bool) * brdesc->bd_tupdesc->natts);
dtup->bt_hasnulls = palloc(sizeof(bool) * brdesc->bd_tupdesc->natts);
+ dtup->bt_empty_range = true;
+
dtup->bt_context = AllocSetContextCreate(CurrentMemoryContext,
"brin dtuple",
ALLOCSET_DEFAULT_SIZES);
@@ -499,6 +504,8 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
currdatum += sizeof(Datum) * brdesc->bd_info[i]->oi_nstored;
}
+ dtuple->bt_empty_range = true;
+
return dtuple;
}
@@ -532,6 +539,11 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
if (BrinTupleIsPlaceholder(tuple))
dtup->bt_placeholder = true;
+
+ /* ranges start as empty, depends on the BrinTuple */
+ if (!BrinTupleIsEmptyRange(tuple))
+ dtup->bt_empty_range = false;
+
dtup->bt_blkno = tuple->bt_blkno;
values = dtup->bt_values;
diff --git a/src/include/access/brin_tuple.h b/src/include/access/brin_tuple.h
index a9ccc3995b4..5280872707a 100644
--- a/src/include/access/brin_tuple.h
+++ b/src/include/access/brin_tuple.h
@@ -36,6 +36,7 @@ typedef struct BrinValues
typedef struct BrinMemTuple
{
bool bt_placeholder; /* this is a placeholder tuple */
+ bool bt_empty_range; /* range represents no tuples */
BlockNumber bt_blkno; /* heap blkno that the tuple is for */
MemoryContext bt_context; /* memcxt holding the bt_columns values */
/* output arrays for brin_deform_tuple: */
@@ -61,7 +62,7 @@ typedef struct BrinTuple
*
* 7th (high) bit: has nulls
* 6th bit: is placeholder tuple
- * 5th bit: unused
+ * 5th bit: range is empty
* 4-0 bit: offset of data
* ---------------
*/
@@ -74,13 +75,14 @@ typedef struct BrinTuple
* bt_info manipulation macros
*/
#define BRIN_OFFSET_MASK 0x1F
-/* bit 0x20 is not used at present */
+#define BRIN_EMPTY_RANGE_MASK 0x20
#define BRIN_PLACEHOLDER_MASK 0x40
#define BRIN_NULLS_MASK 0x80
#define BrinTupleDataOffset(tup) ((Size) (((BrinTuple *) (tup))->bt_info & BRIN_OFFSET_MASK))
#define BrinTupleHasNulls(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_NULLS_MASK)) != 0)
#define BrinTupleIsPlaceholder(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_PLACEHOLDER_MASK)) != 0)
+#define BrinTupleIsEmptyRange(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_EMPTY_RANGE_MASK)) != 0)
extern BrinTuple *brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno,
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2a4755d0998..584ac2602f7 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -45,7 +45,7 @@ step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 19ac18a2e88..18ba92b7ba1 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -9,6 +9,7 @@ setup
) WITH (fillfactor=10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
-- this fills the first page
+ INSERT INTO brin_iso VALUES (NULL);
DO $$
DECLARE curtid tid;
BEGIN
--
2.40.1
0002-Fix-handling-of-NULLs-in-BRIN-indexes-14-master.patch (text/x-patch)
From d82099ce68e58b28f5f7331f0dcbe013dbac65b0 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 8 Jan 2023 16:43:06 +0100
Subject: [PATCH 2/3] Fix handling of NULLs in BRIN indexes
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges, treating them as essentially the same
thing. Summaries were initialized with allnulls=true, and opclasses
simply reset allnulls to false when processing the first non-NULL value.
This however produces incorrect results if the range starts with a NULL
value (or a sequence of NULL values), in which case we forget the range
contains NULL values when adding the first non-NULL value.
This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.
Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting the flag in both cases would make
it correct, but it would also make BRIN indexes useless for queries with
IS NULL clauses. All ranges start empty (and thus allnulls=true), so all
ranges would end up with either allnulls=true or hasnulls=true.
The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true.
This can happen e.g. for small tables (because a summary for the first
range exists for all BRIN indexes), or for tables with a large fraction of
NULL values in the indexed columns.
Bulk summarization (e.g. during CREATE INDEX or automatic summarization)
that processes all values at once is not affected by this issue. In this
case the flags were updated in a slightly different way, not forgetting
the NULL values.
To identify empty ranges we use a new flag, stored in an unused bit in
the BRIN tuple header so the on-disk format remains the same. A matching
flag is added to BrinMemTuple, into a 3B gap after bt_placeholder.
That means there's no risk of ABI breakage, although we don't actually
pass the BrinMemTuple to any public API.
We could also skip storing index tuples for empty summaries, but then
we'd have to always process such ranges - even if there are no rows in
large parts of the table (e.g. after a bulk DELETE), it would still
require reading the pages etc. So we store them, but ignore them when
building the bitmap.
Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.
Backpatch-through: 11
Reviewed-by: Justin Pryzby, Matthias van de Meent, Alvaro Herrera
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
---
src/backend/access/brin/brin.c | 113 +++++++++++++++++-
src/backend/access/brin/brin_tuple.c | 14 ++-
src/include/access/brin_tuple.h | 6 +-
...summarization-and-inprogress-insertion.out | 8 +-
...ummarization-and-inprogress-insertion.spec | 1 +
5 files changed, 134 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index a155525b7df..46aa1f1bc80 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -592,6 +592,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
bval = &dtup->bt_columns[attno - 1];
+ /*
+ * If the BRIN tuple indicates that this range is empty,
+ * we can skip it: there's nothing to match. We don't
+ * need to examine the next columns.
+ */
+ if (dtup->bt_empty_range)
+ {
+ addrange = false;
+ break;
+ }
+
/*
* First check if there are any IS [NOT] NULL scan keys,
* and if we're violating them. In that case we can
@@ -1606,6 +1617,64 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
db = brin_deform_tuple(bdesc, b, NULL);
MemoryContextSwitchTo(oldcxt);
+ /*
+ * Check if the ranges are empty.
+ *
+ * If at least one of them is empty, we don't need to call per-key union
+ * functions at all. If "b" is empty, we just use "a" as the result (it
+ * might be empty too, but that's fine). If "a" is empty but "b" is not,
+ * we use "b" as the result (but we have to copy the data into "a" first).
+ *
+ * Only when both ranges are non-empty, we actually do the per-key merge.
+ */
+
+ /* If "b" is empty - ignore it and just use "a" (even if it's empty etc.). */
+ if (db->bt_empty_range)
+ {
+ /* skip the per-key merge */
+ MemoryContextDelete(cxt);
+ return;
+ }
+
+ /*
+ * Now we know "b" is not empty. If "a" is empty, then "b" is the result.
+ * But we need to copy the data from "b" to "a" first, because that's how
+ * we pass result out.
+ *
+ * We have to copy all the global/per-key flags etc. too.
+ */
+ if (a->bt_empty_range)
+ {
+ for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
+ {
+ int i;
+ BrinValues *col_a = &a->bt_columns[keyno];
+ BrinValues *col_b = &db->bt_columns[keyno];
+ BrinOpcInfo *opcinfo = bdesc->bd_info[keyno];
+
+ col_a->bv_allnulls = col_b->bv_allnulls;
+ col_a->bv_hasnulls = col_b->bv_hasnulls;
+
+ /* If "b" has no data, we're done. */
+ if (col_b->bv_allnulls)
+ continue;
+
+ for (i = 0; i < opcinfo->oi_nstored; i++)
+ col_a->bv_values[i] =
+ datumCopy(col_b->bv_values[i],
+ opcinfo->oi_typcache[i]->typbyval,
+ opcinfo->oi_typcache[i]->typlen);
+ }
+
+ /* "a" started empty, but "b" was not empty, so remember that */
+ a->bt_empty_range = false;
+
+ /* skip the per-key merge */
+ MemoryContextDelete(cxt);
+ return;
+ }
+
+ /* Now we know neither range is empty. */
for (keyno = 0; keyno < bdesc->bd_tupdesc->natts; keyno++)
{
FmgrInfo *unionFn;
@@ -1711,7 +1780,9 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
Datum *values, bool *nulls)
{
int keyno;
- bool modified = false;
+
+ /* If the range starts empty, we're certainly going to modify it. */
+ bool modified = dtup->bt_empty_range;
/*
* Compare the key values of the new tuple to the stored index values; our
@@ -1725,9 +1796,24 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
Datum result;
BrinValues *bval;
FmgrInfo *addValue;
+ bool has_nulls;
bval = &dtup->bt_columns[keyno];
+ /*
+ * Does the range have actual NULL values? Either of the flags can
+ * be set, but we ignore the state before adding first row.
+ *
+ * We have to remember this, because we'll modify the flags and we
+ * need to know if the range started as empty.
+ */
+ has_nulls = ((!dtup->bt_empty_range) &&
+ (bval->bv_hasnulls || bval->bv_allnulls));
+
+ /*
+ * If the value we're adding is NULL, handle it locally. Otherwise
+ * call the BRIN_PROCNUM_ADDVALUE procedure.
+ */
if (bdesc->bd_info[keyno]->oi_regular_nulls && nulls[keyno])
{
/*
@@ -1753,8 +1839,33 @@ add_values_to_range(Relation idxRel, BrinDesc *bdesc, BrinMemTuple *dtup,
nulls[keyno]);
/* if that returned true, we need to insert the updated tuple */
modified |= DatumGetBool(result);
+
+ /*
+ * If the range had actual NULL values (i.e. did not start empty),
+ * make sure we don't forget about the NULL values. Either the allnulls
+ * flag is still set to true, or (if the opclass cleared it) we need to
+ * set hasnulls=true.
+ *
+ * XXX This can only happen when the opclass modified the tuple, so the
+ * modified flag should be set.
+ */
+ if (has_nulls && !(bval->bv_hasnulls || bval->bv_allnulls))
+ {
+ Assert(modified);
+ bval->bv_hasnulls = true;
+ }
}
+ /*
+ * After updating summaries for all the keys, mark it as not empty.
+ *
+ * If we're actually changing the flag value (i.e. tuple started as empty),
+ * we should have modified the tuple. So we should not see empty range that
+ * was not modified.
+ */
+ Assert(!dtup->bt_empty_range || modified);
+ dtup->bt_empty_range = false;
+
return modified;
}
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index 84b79dbfc0d..23dfeab7de8 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -372,6 +372,9 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
if (tuple->bt_placeholder)
rettuple->bt_info |= BRIN_PLACEHOLDER_MASK;
+ if (tuple->bt_empty_range)
+ rettuple->bt_info |= BRIN_EMPTY_RANGE_MASK;
+
*size = len;
return rettuple;
}
@@ -399,7 +402,7 @@ brin_form_placeholder_tuple(BrinDesc *brdesc, BlockNumber blkno, Size *size)
rettuple = palloc0(len);
rettuple->bt_blkno = blkno;
rettuple->bt_info = hoff;
- rettuple->bt_info |= BRIN_NULLS_MASK | BRIN_PLACEHOLDER_MASK;
+ rettuple->bt_info |= BRIN_NULLS_MASK | BRIN_PLACEHOLDER_MASK | BRIN_EMPTY_RANGE_MASK;
bitP = ((bits8 *) ((char *) rettuple + SizeOfBrinTuple)) - 1;
bitmask = HIGHBIT;
@@ -489,6 +492,8 @@ brin_new_memtuple(BrinDesc *brdesc)
dtup->bt_allnulls = palloc(sizeof(bool) * brdesc->bd_tupdesc->natts);
dtup->bt_hasnulls = palloc(sizeof(bool) * brdesc->bd_tupdesc->natts);
+ dtup->bt_empty_range = true;
+
dtup->bt_context = AllocSetContextCreate(CurrentMemoryContext,
"brin dtuple",
ALLOCSET_DEFAULT_SIZES);
@@ -527,6 +532,8 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
currdatum += sizeof(Datum) * brdesc->bd_info[i]->oi_nstored;
}
+ dtuple->bt_empty_range = true;
+
return dtuple;
}
@@ -560,6 +567,11 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
if (BrinTupleIsPlaceholder(tuple))
dtup->bt_placeholder = true;
+
+ /* ranges start as empty, depends on the BrinTuple */
+ if (!BrinTupleIsEmptyRange(tuple))
+ dtup->bt_empty_range = false;
+
dtup->bt_blkno = tuple->bt_blkno;
values = dtup->bt_values;
diff --git a/src/include/access/brin_tuple.h b/src/include/access/brin_tuple.h
index 732f91edf11..c56747aca4a 100644
--- a/src/include/access/brin_tuple.h
+++ b/src/include/access/brin_tuple.h
@@ -44,6 +44,7 @@ typedef struct BrinValues
typedef struct BrinMemTuple
{
bool bt_placeholder; /* this is a placeholder tuple */
+ bool bt_empty_range; /* range represents no tuples */
BlockNumber bt_blkno; /* heap blkno that the tuple is for */
MemoryContext bt_context; /* memcxt holding the bt_columns values */
/* output arrays for brin_deform_tuple: */
@@ -69,7 +70,7 @@ typedef struct BrinTuple
*
* 7th (high) bit: has nulls
* 6th bit: is placeholder tuple
- * 5th bit: unused
+ * 5th bit: range is empty
* 4-0 bit: offset of data
* ---------------
*/
@@ -82,13 +83,14 @@ typedef struct BrinTuple
* bt_info manipulation macros
*/
#define BRIN_OFFSET_MASK 0x1F
-/* bit 0x20 is not used at present */
+#define BRIN_EMPTY_RANGE_MASK 0x20
#define BRIN_PLACEHOLDER_MASK 0x40
#define BRIN_NULLS_MASK 0x80
#define BrinTupleDataOffset(tup) ((Size) (((BrinTuple *) (tup))->bt_info & BRIN_OFFSET_MASK))
#define BrinTupleHasNulls(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_NULLS_MASK)) != 0)
#define BrinTupleIsPlaceholder(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_PLACEHOLDER_MASK)) != 0)
+#define BrinTupleIsEmptyRange(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_EMPTY_RANGE_MASK)) != 0)
extern BrinTuple *brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno,
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 2a4755d0998..584ac2602f7 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -45,7 +45,7 @@ step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |f |f |{1 .. 1}
+ 1| 0| 1|f |t |f |{1 .. 1}
2| 1| 1|f |f |f |{1 .. 1000}
(2 rows)
diff --git a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
index 19ac18a2e88..18ba92b7ba1 100644
--- a/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
+++ b/src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec
@@ -9,6 +9,7 @@ setup
) WITH (fillfactor=10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
-- this fills the first page
+ INSERT INTO brin_iso VALUES (NULL);
DO $$
DECLARE curtid tid;
BEGIN
--
2.40.1
0003-Show-empty-ranges-in-brin_page_items-14-master.patch (text/x-patch)
From 736811865435feaf98b2c7d60d83fa113ff2aa67 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Mon, 27 Mar 2023 22:47:12 +0200
Subject: [PATCH 3/3] Show empty ranges in brin_page_items
Show which BRIN ranges are empty (no rows), as indicated by the newly
introduced flag.
---
contrib/pageinspect/brinfuncs.c | 10 ++++---
contrib/pageinspect/expected/brin.out | 6 ++--
.../pageinspect/pageinspect--1.11--1.12.sql | 17 +++++++++++
doc/src/sgml/pageinspect.sgml | 16 +++++------
...summarization-and-inprogress-insertion.out | 28 +++++++++----------
5 files changed, 48 insertions(+), 29 deletions(-)
diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
index 000dcd8f5d8..a781f265514 100644
--- a/contrib/pageinspect/brinfuncs.c
+++ b/contrib/pageinspect/brinfuncs.c
@@ -201,8 +201,8 @@ brin_page_items(PG_FUNCTION_ARGS)
dtup = NULL;
for (;;)
{
- Datum values[7];
- bool nulls[7] = {0};
+ Datum values[8];
+ bool nulls[8] = {0};
/*
* This loop is called once for every attribute of every tuple in the
@@ -239,6 +239,7 @@ brin_page_items(PG_FUNCTION_ARGS)
nulls[4] = true;
nulls[5] = true;
nulls[6] = true;
+ nulls[7] = true;
}
else
{
@@ -261,6 +262,7 @@ brin_page_items(PG_FUNCTION_ARGS)
values[3] = BoolGetDatum(dtup->bt_columns[att].bv_allnulls);
values[4] = BoolGetDatum(dtup->bt_columns[att].bv_hasnulls);
values[5] = BoolGetDatum(dtup->bt_placeholder);
+ values[6] = BoolGetDatum(dtup->bt_empty_range);
if (!dtup->bt_columns[att].bv_allnulls)
{
BrinValues *bvalues = &dtup->bt_columns[att];
@@ -286,12 +288,12 @@ brin_page_items(PG_FUNCTION_ARGS)
}
appendStringInfoChar(&s, '}');
- values[6] = CStringGetTextDatum(s.data);
+ values[7] = CStringGetTextDatum(s.data);
pfree(s.data);
}
else
{
- nulls[6] = true;
+ nulls[7] = true;
}
}
diff --git a/contrib/pageinspect/expected/brin.out b/contrib/pageinspect/expected/brin.out
index e12fbeb4774..098ddc202f4 100644
--- a/contrib/pageinspect/expected/brin.out
+++ b/contrib/pageinspect/expected/brin.out
@@ -43,9 +43,9 @@ SELECT * FROM brin_revmap_data(get_raw_page('test1_a_idx', 1)) LIMIT 5;
SELECT * FROM brin_page_items(get_raw_page('test1_a_idx', 2), 'test1_a_idx')
ORDER BY blknum, attnum LIMIT 5;
- itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
-------------+--------+--------+----------+----------+-------------+----------
- 1 | 0 | 1 | f | f | f | {1 .. 1}
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | f | f | f | {1 .. 1}
(1 row)
-- Mask DETAIL messages as these are not portable across architectures.
diff --git a/contrib/pageinspect/pageinspect--1.11--1.12.sql b/contrib/pageinspect/pageinspect--1.11--1.12.sql
index 70c3abccf57..a20d67a9e82 100644
--- a/contrib/pageinspect/pageinspect--1.11--1.12.sql
+++ b/contrib/pageinspect/pageinspect--1.11--1.12.sql
@@ -21,3 +21,20 @@ CREATE FUNCTION bt_multi_page_stats(IN relname text, IN blkno int8, IN blk_count
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'bt_multi_page_stats'
LANGUAGE C STRICT PARALLEL RESTRICTED;
+
+--
+-- add information about BRIN empty ranges
+--
+DROP FUNCTION brin_page_items(IN page bytea, IN index_oid regclass);
+CREATE FUNCTION brin_page_items(IN page bytea, IN index_oid regclass,
+ OUT itemoffset int,
+ OUT blknum int8,
+ OUT attnum int,
+ OUT allnulls bool,
+ OUT hasnulls bool,
+ OUT placeholder bool,
+ OUT empty bool,
+ OUT value text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'brin_page_items'
+LANGUAGE C STRICT PARALLEL RESTRICTED;
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml
index 01f1e96204b..e4225ecd485 100644
--- a/doc/src/sgml/pageinspect.sgml
+++ b/doc/src/sgml/pageinspect.sgml
@@ -613,14 +613,14 @@ test=# SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 2)) LIMIT 5;
test=# SELECT * FROM brin_page_items(get_raw_page('brinidx', 5),
'brinidx')
ORDER BY blknum, attnum LIMIT 6;
- itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
-------------+--------+--------+----------+----------+-------------+--------------
- 137 | 0 | 1 | t | f | f |
- 137 | 0 | 2 | f | f | f | {1 .. 88}
- 138 | 4 | 1 | t | f | f |
- 138 | 4 | 2 | f | f | f | {89 .. 176}
- 139 | 8 | 1 | t | f | f |
- 139 | 8 | 2 | f | f | f | {177 .. 264}
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+--------------
+ 137 | 0 | 1 | t | f | f | f |
+ 137 | 0 | 2 | f | f | f | f | {1 .. 88}
+ 138 | 4 | 1 | t | f | f | f |
+ 138 | 4 | 2 | f | f | f | f | {89 .. 176}
+ 139 | 8 | 1 | t | f | f | f |
+ 139 | 8 | 2 | f | f | f | f | {177 .. 264}
</screen>
The returned columns correspond to the fields in the
<structname>BrinMemTuple</structname> and <structname>BrinValues</structname> structs.
diff --git a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
index 584ac2602f7..201786c82c0 100644
--- a/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
+++ b/src/test/modules/brin/expected/summarization-and-inprogress-insertion.out
@@ -2,9 +2,9 @@ Parsed test spec with 2 sessions
starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |t |f |{1 .. 1}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+--------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -24,18 +24,18 @@ brin_summarize_new_values
step s1c: COMMIT;
step s2c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |t |f |{1 .. 1}
- 2| 1| 1|f |f |f |{1 .. 1000}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+-----------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
+ 2| 1| 1|f |f |f |f |{1 .. 1000}
(2 rows)
starting permutation: s2check s1b s1i s2vacuum s1c s2check
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+--------
- 1| 0| 1|f |t |f |{1 .. 1}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+--------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
(1 row)
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -43,9 +43,9 @@ step s1i: INSERT INTO brin_iso VALUES (1000);
step s2vacuum: VACUUM brin_iso;
step s1c: COMMIT;
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
-itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
-----------+------+------+--------+--------+-----------+-----------
- 1| 0| 1|f |t |f |{1 .. 1}
- 2| 1| 1|f |f |f |{1 .. 1000}
+itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|empty|value
+----------+------+------+--------+--------+-----------+-----+-----------
+ 1| 0| 1|f |t |f |f |{1 .. 1}
+ 2| 1| 1|f |f |f |f |{1 .. 1000}
(2 rows)
--
2.40.1
On 5/18/23 20:45, Tomas Vondra wrote:
...
0001 fixes the issue. 0002 is the original fix, and 0003 is just the
pageinspect changes (for master only).
For the backbranches, I thought about making the code more like master
(by moving some of the handling from opclasses to brin.c), but decided
not to. It'd be low-risk, but it feels wrong to kinda do what master
does under the "oi_regular_nulls" flag.
I've now pushed all these patches into relevant branches, after some
minor last-minute tweaks, and so far it didn't cause any buildfarm
issues. Assuming this fully fixes the NULL-handling for BRIN, this
leaves just the deadlock issue discussed in [1].
It seems rather unfortunate all these issues went unnoticed / unreported
essentially since BRIN was introduced in 9.5. To some extent it might be
explained by the fairly low likelihood of actually hitting the issue (just
the right timing, concurrency with summarization, NULL values, ...). It
took me quite a bit of time and luck to (accidentally) hit these issues
while stress testing the code.
But there's also the problem of writing tests for this kind of thing. To
exercise the interesting parts (e.g. the union_tuples), it's necessary
to coordinate the order of concurrent steps - but what's a good generic
way to do that (which we could do in TAP tests)? In manual testing it's
doable by setting breakpoints on particular lines, and stepping through
the concurrent processes that way.
But that doesn't seem like a particularly great solution for regression
tests. I can imagine adding some sort of "probes" into the code and then
attaching breakpoints to those, but surely we're not the first project
needing this ...
regards
[1]: /messages/by-id/261e68bc-f5f5-5234-fb2c-af4f583513c0@enterprisedb.com
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company