Fix for pageinspect bug in PG 17
Hi,
Here's a fix for pageinspect bug in PG17, reported in [1]. The bug turns
out to be introduced by my commit
commit dae761a87edae444d11a411f711f1d679bed5941
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri Dec 8 17:07:30 2023 +0100
Add empty BRIN ranges during CREATE INDEX
...
This adds an out argument to brin_page_items, but I failed to consider
the user may still run with an older version of the extension - either
after pg_upgrade (as in the report), or when the CREATE EXTENSION
command specifies VERSION.
The new "empty" field is added in the middle of the output tuple, which
shifts the values, causing segfault when accessing a text field at the
end of the array.
Of course, we add arguments to existing functions pretty often, and we
know how to do that in backwards-compatible way - pg_stat_statements is
a good example of how to do that nicely. Unfortunately, it's too late to
do that for brin_page_items :-( There may already be upgraded systems or
with installed pageinspect, etc.
The only fix I can think of is explicitly looking at TupleDesc->natts,
as in the attached fix.
regards
--
Tomas Vondra
Attachments:
pageinspect-fix.patchtext/x-patch; charset=UTF-8; name=pageinspect-fix.patchDownload
diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
index 22621d584fa..6086b02445b 100644
--- a/contrib/pageinspect/brinfuncs.c
+++ b/contrib/pageinspect/brinfuncs.c
@@ -240,25 +240,29 @@ brin_page_items(PG_FUNCTION_ARGS)
else
{
int att = attno - 1;
+ int idx = 0;
- values[0] = UInt16GetDatum(offset);
+ values[idx++] = UInt16GetDatum(offset);
switch (TupleDescAttr(rsinfo->setDesc, 1)->atttypid)
{
case INT8OID:
- values[1] = Int64GetDatum((int64) dtup->bt_blkno);
+ values[idx++] = Int64GetDatum((int64) dtup->bt_blkno);
break;
case INT4OID:
/* support for old extension version */
- values[1] = UInt32GetDatum(dtup->bt_blkno);
+ values[idx++] = UInt32GetDatum(dtup->bt_blkno);
break;
default:
elog(ERROR, "incorrect output types");
}
- values[2] = UInt16GetDatum(attno);
- values[3] = BoolGetDatum(dtup->bt_columns[att].bv_allnulls);
- values[4] = BoolGetDatum(dtup->bt_columns[att].bv_hasnulls);
- values[5] = BoolGetDatum(dtup->bt_placeholder);
- values[6] = BoolGetDatum(dtup->bt_empty_range);
+ values[idx++] = UInt16GetDatum(attno);
+ values[idx++] = BoolGetDatum(dtup->bt_columns[att].bv_allnulls);
+ values[idx++] = BoolGetDatum(dtup->bt_columns[att].bv_hasnulls);
+ values[idx++] = BoolGetDatum(dtup->bt_placeholder);
+
+ if (rsinfo->setDesc->natts >= 8)
+ values[idx++] = BoolGetDatum(dtup->bt_empty_range);
+
if (!dtup->bt_columns[att].bv_allnulls)
{
BrinValues *bvalues = &dtup->bt_columns[att];
@@ -284,13 +288,15 @@ brin_page_items(PG_FUNCTION_ARGS)
}
appendStringInfoChar(&s, '}');
- values[7] = CStringGetTextDatum(s.data);
+ values[idx++] = CStringGetTextDatum(s.data);
pfree(s.data);
}
else
{
- nulls[7] = true;
+ nulls[idx++] = true;
}
+
+ Assert(idx == rsinfo->setDesc->natts);
}
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
On Mon, Nov 11, 2024 at 07:32:10PM +0100, Tomas Vondra wrote:
This adds an out argument to brin_page_items, but I failed to consider
the user may still run with an older version of the extension - either
after pg_upgrade (as in the report), or when the CREATE EXTENSION
command specifies VERSION.
Perhaps it would be worth adding a test where the function is called
in the context of a past extension version? (I know, I'm noisy when
it comes to that.)
The only fix I can think of is explicitly looking at TupleDesc->natts,
as in the attached fix.
How about some consistency instead as this is the same error as
691e8b2e1889? btreefuncs.c is deciding to issue an error if we are
trying to call one of its functions in the context of a past version
of the extension. pageinspect is a superuser module, it does not seem
that bad to me to tell users that they should force an upgrade when
using it.
--
Michael
Dear Tomas,
Here's a fix for pageinspect bug in PG17, reported in [1]. The bug turns
out to be introduced by my commit
I could not see the link, but I think it is [1]/messages/by-id/VI1PR02MB63331C3D90E2104FD12399D38A5D2@VI1PR02MB6333.eurprd02.prod.outlook.com, right?
commit dae761a87edae444d11a411f711f1d679bed5941
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri Dec 8 17:07:30 2023 +0100Add empty BRIN ranges during CREATE INDEX
...
This adds an out argument to brin_page_items, but I failed to consider
the user may still run with an older version of the extension - either
after pg_upgrade (as in the report), or when the CREATE EXTENSION
command specifies VERSION.
I confirmed that 428c0ca added new output entries (version is bumped 11->12),
and the same C function is used for both 11 and 12.
The new "empty" field is added in the middle of the output tuple, which
shifts the values, causing segfault when accessing a text field at the
end of the array.
Agreed and I could reproduce by steps:
```
postgres=# CREATE EXTENSION pageinspect VERSION "1.11" ;
CREATE EXTENSION
postgres=# CREATE TABLE foo (id int);
CREATE TABLE
postgres=# CREATE INDEX ON foo USING brin ( id );
CREATE INDEX
postgres=# SELECT * FROM brin_page_items(get_raw_page('foo_id_idx', 2), 'foo_id_idx');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
```
Of course, we add arguments to existing functions pretty often, and we
know how to do that in backwards-compatible way - pg_stat_statements is
a good example of how to do that nicely. Unfortunately, it's too late to
do that for brin_page_items :-( There may already be upgraded systems or
with installed pageinspect, etc.The only fix I can think of is explicitly looking at TupleDesc->natts,
as in the attached fix.
I've tested your patch and it worked well. Also, I tried to upgrade from 11 and 12
and ran again, it could execute well.
Few points:
1.
Do you think we should follow the approach like pg_stat_statements for master?
Or, do you want to apply the patch both PG17 and HEAD? I think latter one is OK
because we should avoid code branches as much as possible.
2.
Later readers may surprise the part for checking `rsinfo->setDesc->natts`.
Can you use the boolean and add comments, something like
```
+ /* Support older API version */
+ bool output_empty_attr = (rsinfo->setDesc->natts >= 8);
```
3.
Do we have to add the test for the fix?
[1]: /messages/by-id/VI1PR02MB63331C3D90E2104FD12399D38A5D2@VI1PR02MB6333.eurprd02.prod.outlook.com
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On 11/12/24 08:39, Michael Paquier wrote:
On Mon, Nov 11, 2024 at 07:32:10PM +0100, Tomas Vondra wrote:
This adds an out argument to brin_page_items, but I failed to consider
the user may still run with an older version of the extension - either
after pg_upgrade (as in the report), or when the CREATE EXTENSION
command specifies VERSION.Perhaps it would be worth adding a test where the function is called
in the context of a past extension version? (I know, I'm noisy when
it comes to that.)The only fix I can think of is explicitly looking at TupleDesc->natts,
as in the attached fix.How about some consistency instead as this is the same error as
691e8b2e1889? btreefuncs.c is deciding to issue an error if we are
trying to call one of its functions in the context of a past version
of the extension. pageinspect is a superuser module, it does not seem
that bad to me to tell users that they should force an upgrade when
using it.
Yeah, a test with an older version is a good idea. And pageinspect
already has such tests in oldextversions.sql, so it was simple to just
add it there.
And I think you're right it's probably better to just error out. Yes, we
could check the natts value and build the right tuple, but if btree
functions already error out, it'd be a bit futile to do it differently
(although I'm not sure why bt_meta() says it can't be done reliably).
regards
--
Tomas Vondra
Attachments:
pageinspect-fix-v2.patchtext/x-patch; charset=UTF-8; name=pageinspect-fix-v2.patchDownload
diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
index 22621d584fa..525c82600f1 100644
--- a/contrib/pageinspect/brinfuncs.c
+++ b/contrib/pageinspect/brinfuncs.c
@@ -117,6 +117,8 @@ verify_brin_page(bytea *raw_page, uint16 type, const char *strtype)
return page;
}
+/* Number of output arguments (columns) for brin_page_items() */
+#define BRIN_PAGE_ITEMS_COLS_V1_12 8
/*
* Extract all item values from a BRIN index page
@@ -153,6 +155,20 @@ brin_page_items(PG_FUNCTION_ARGS)
errmsg("\"%s\" is not a %s index",
RelationGetRelationName(indexRel), "BRIN")));
+ /*
+ * We need a kluge here to detect API versions prior to 1.12. Earlier
+ * versions don't have the "empty" flag as o
+ *
+ * In principle we could skip the 'empty' flag and build the tuple with
+ * the old tuple descriptor, but functions like bt_meta() already error
+ * out in this case, so follow that precedent.
+ */
+ if (rsinfo->setDesc->natts < BRIN_PAGE_ITEMS_COLS_V1_12)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
+ errmsg("function has wrong number of declared columns"),
+ errhint("To resolve the problem, update the \"pageinspect\" extension to the latest version.")));
+
bdesc = brin_build_desc(indexRel);
/* minimally verify the page we got */
diff --git a/contrib/pageinspect/expected/oldextversions.out b/contrib/pageinspect/expected/oldextversions.out
index f5c4b61bd79..c0abd7ab3ca 100644
--- a/contrib/pageinspect/expected/oldextversions.out
+++ b/contrib/pageinspect/expected/oldextversions.out
@@ -52,5 +52,20 @@ SELECT pagesize, version FROM page_header(get_raw_page('test1', 0));
8192 | 4
(1 row)
+DROP TABLE test1;
+-- brin_page_items got a new OUT parameter in 1.12
+CREATE TABLE test1 (a int, b text);
+INSERT INTO test1 VALUES (1, 'one');
+CREATE INDEX test1_a_idx ON test1 USING brin (a);
+SELECT * FROM brin_page_items(get_raw_page('test1_a_idx', 2), 'test1_a_idx');
+ERROR: function has wrong number of declared columns
+HINT: To resolve the problem, update the "pageinspect" extension to the latest version.
+ALTER EXTENSION pageinspect UPDATE TO '1.12';
+SELECT * FROM brin_page_items(get_raw_page('test1_a_idx', 2), 'test1_a_idx');
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty | value
+------------+--------+--------+----------+----------+-------------+-------+----------
+ 1 | 0 | 1 | f | f | f | f | {1 .. 1}
+(1 row)
+
DROP TABLE test1;
DROP EXTENSION pageinspect;
diff --git a/contrib/pageinspect/sql/oldextversions.sql b/contrib/pageinspect/sql/oldextversions.sql
index 9f953492c23..b5342d3397d 100644
--- a/contrib/pageinspect/sql/oldextversions.sql
+++ b/contrib/pageinspect/sql/oldextversions.sql
@@ -22,5 +22,18 @@ ALTER EXTENSION pageinspect UPDATE TO '1.9';
\df page_header
SELECT pagesize, version FROM page_header(get_raw_page('test1', 0));
+DROP TABLE test1;
+
+-- brin_page_items got a new OUT parameter in 1.12
+CREATE TABLE test1 (a int, b text);
+INSERT INTO test1 VALUES (1, 'one');
+CREATE INDEX test1_a_idx ON test1 USING brin (a);
+
+SELECT * FROM brin_page_items(get_raw_page('test1_a_idx', 2), 'test1_a_idx');
+
+ALTER EXTENSION pageinspect UPDATE TO '1.12';
+
+SELECT * FROM brin_page_items(get_raw_page('test1_a_idx', 2), 'test1_a_idx');
+
DROP TABLE test1;
DROP EXTENSION pageinspect;
On 11/12/24 09:04, Hayato Kuroda (Fujitsu) wrote:
Dear Tomas,
Here's a fix for pageinspect bug in PG17, reported in [1]. The bug turns
out to be introduced by my commitI could not see the link, but I think it is [1], right?
Right. Apologies, I forgot to include the link.
commit dae761a87edae444d11a411f711f1d679bed5941
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri Dec 8 17:07:30 2023 +0100Add empty BRIN ranges during CREATE INDEX
...
This adds an out argument to brin_page_items, but I failed to consider
the user may still run with an older version of the extension - either
after pg_upgrade (as in the report), or when the CREATE EXTENSION
command specifies VERSION.I confirmed that 428c0ca added new output entries (version is bumped 11->12),
and the same C function is used for both 11 and 12.The new "empty" field is added in the middle of the output tuple, which
shifts the values, causing segfault when accessing a text field at the
end of the array.Agreed and I could reproduce by steps:
```
postgres=# CREATE EXTENSION pageinspect VERSION "1.11" ;
CREATE EXTENSION
postgres=# CREATE TABLE foo (id int);
CREATE TABLE
postgres=# CREATE INDEX ON foo USING brin ( id );
CREATE INDEX
postgres=# SELECT * FROM brin_page_items(get_raw_page('foo_id_idx', 2), 'foo_id_idx');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
```Of course, we add arguments to existing functions pretty often, and we
know how to do that in backwards-compatible way - pg_stat_statements is
a good example of how to do that nicely. Unfortunately, it's too late to
do that for brin_page_items :-( There may already be upgraded systems or
with installed pageinspect, etc.The only fix I can think of is explicitly looking at TupleDesc->natts,
as in the attached fix.I've tested your patch and it worked well. Also, I tried to upgrade from 11 and 12
and ran again, it could execute well.Few points:
1.
Do you think we should follow the approach like pg_stat_statements for master?
Or, do you want to apply the patch both PG17 and HEAD? I think latter one is OK
because we should avoid code branches as much as possible.
My plan was to apply the patch to both 17 and HEAD, and then maybe do
something smarter in HEAD in a separate commit. But then Michael pointed
out other pageinspect functions just error out in this version-mismatch
cases, so I think it's better to just do it the same way.
Maybe we could be smarter, but it seems pointless to do it only for one
pageinspect function, while still requiring an upgrade for other more
widely used functions in the same situation (albeit for a much older
version of the extension).
2.
Later readers may surprise the part for checking `rsinfo->setDesc->natts`.
Can you use the boolean and add comments, something like``` + /* Support older API version */ + bool output_empty_attr = (rsinfo->setDesc->natts >= 8); ```
Doesn't matter, if we error out.
3.
Do we have to add the test for the fix?
Yes. See the patch I shared a couple minutes ago.
regards
--
Tomas Vondra
On Wed, Nov 13, 2024 at 11:07 AM Tomas Vondra <tomas@vondra.me> wrote:
My plan was to apply the patch to both 17 and HEAD, and then maybe do
something smarter in HEAD in a separate commit. But then Michael pointed
out other pageinspect functions just error out in this version-mismatch
cases, so I think it's better to just do it the same way.
FWIW I didn't actually backpatch commit 691e8b2e18. I decided that it
was better to just paper-over the issue on backbranches instead -- see
commit c788115b.
The problem that I fixed back in 2020 was a problem with the data
types used -- not a failure to consider older versions of the
extension at all. It was just convenient to use the number of columns
to detect the version of the extension to detect a problematic
(incorrectly typed) function.
--
Peter Geoghegan
On 11/13/24 18:20, Peter Geoghegan wrote:
On Wed, Nov 13, 2024 at 11:07 AM Tomas Vondra <tomas@vondra.me> wrote:
My plan was to apply the patch to both 17 and HEAD, and then maybe do
something smarter in HEAD in a separate commit. But then Michael pointed
out other pageinspect functions just error out in this version-mismatch
cases, so I think it's better to just do it the same way.FWIW I didn't actually backpatch commit 691e8b2e18. I decided that it
was better to just paper-over the issue on backbranches instead -- see
commit c788115b.The problem that I fixed back in 2020 was a problem with the data
types used -- not a failure to consider older versions of the
extension at all. It was just convenient to use the number of columns
to detect the version of the extension to detect a problematic
(incorrectly typed) function.
Does that mean you think we should fix the issue at hand differently?
Say, by looking at number of columns and building the correct tuple,
like I did in my initial patch?
regards
--
Tomas Vondra
On Wed, Nov 13, 2024 at 09:00:30PM +0100, Tomas Vondra wrote:
Does that mean you think we should fix the issue at hand differently?
Say, by looking at number of columns and building the correct tuple,
like I did in my initial patch?
691e8b2e18 is not something I would have done when it comes to
pageinspect, FWIW. There is the superuser argument for this module,
so I'd vote for an error and apply the same policy across all branches
as a matter of consistency.
A second argument in favor of raising an error is that it makes future
changes a bit easier to implement so as you can reuse the existing
routine as an "internal" workhorse called by the SQL functions
depending on the version of the extension involved at runtime, as we
do in pgss. Data type changes are easier to track, for example. If
you add a solution based on the number of arguments, you may have to
mix it with a per-version approach, at least on HEAD. As long as
there is a test to check on behavior or another, we're going to be
fine because we'll know if a previous assumption suddenly breaks.
Anyway, your solution to use the number of columns to determine what
should be done at runtime works as well, so feel free to just ignore
me. That's just an approach I'd prefer taking here because that's
easier to work with IMO, but I can also see why people don't like the
versioning logic a-la-PGSS, as well.
--
Michael
On Wed, Nov 13, 2024 at 7:48 PM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Nov 13, 2024 at 09:00:30PM +0100, Tomas Vondra wrote:
Does that mean you think we should fix the issue at hand differently?
Say, by looking at number of columns and building the correct tuple,
like I did in my initial patch?691e8b2e18 is not something I would have done when it comes to
pageinspect, FWIW. There is the superuser argument for this module,
so I'd vote for an error and apply the same policy across all branches
as a matter of consistency.
691e8b2e18 was the one that threw the error?
--
Peter Geoghegan
On Wed, Nov 13, 2024 at 3:00 PM Tomas Vondra <tomas@vondra.me> wrote:
Does that mean you think we should fix the issue at hand differently?
Say, by looking at number of columns and building the correct tuple,
like I did in my initial patch?
I don't feel strongly about it either way. But if it was my fix I'd
probably make it work on both versions. There's plenty of precedent
for that.
--
Peter Geoghegan
On Wed, Nov 13, 2024 at 07:54:03PM -0500, Peter Geoghegan wrote:
On Wed, Nov 13, 2024 at 7:48 PM Michael Paquier <michael@paquier.xyz> wrote:
691e8b2e18 is not something I would have done when it comes to
pageinspect, FWIW. There is the superuser argument for this module,
so I'd vote for an error and apply the same policy across all branches
as a matter of consistency.691e8b2e18 was the one that threw the error?
Oops. Sorry about that. I meant c788115b5eb5 in this sentence, not
691e8b2e18.
--
Michael
Dear Tomas,
Thanks for updating the patch.
I've tested new patch and confirmed the brin_pgage_items() could error out:
```
postgres=# SELECT * FROM brin_page_items(get_raw_page('foo_id_idx', 2), 'foo_id_idx');
ERROR: function has wrong number of declared columns
HINT: To resolve the problem, update the "pageinspect" extension to the latest version.
```
I also do not have strong opinions for both approaches.
If I had to say... it's OK to do error out. The source becomes simpler and I cannot
find use-case that user strongly wants to use the older pageinspect extension.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
I've pushed (and backpatched) a fix for this.
I ended up doing the simplest thing -- error out if the number of
columns does not match, suggesting to update to latest extension version.
I considered handling it in a nicer way, but I didn't like the result
very much and I think that's sufficient for superuser-only extension.
And 691e8b2e18 seems like a reasonable precedent (even though the
backbranches did do a different thing).
I also considered introducing pg_stat_statements-style versioning, but
it's too late to do that in backbranches, and I don't think we expect
the function to change very often to justify this.
regards
--
Tomas Vondra
On Tue, Dec 17, 2024 at 06:01:39PM +0100, Tomas Vondra wrote:
I also considered introducing pg_stat_statements-style versioning, but
it's too late to do that in backbranches, and I don't think we expect
the function to change very often to justify this.
Sounds good to me, thanks!
--
Michael