Fw: Re: heap_force_common in contrib/pg_surgery/heap_surgery.c has an off by one stack buffer overflow
Hi 王跃林,
Thank you for reporting the issue, I am able to reproduce it on master.
The include_this_tid[] array is sized MaxHeapTuplesPerPage but indexed
using 1-based OffsetNumber,
so the largest legal offset (MaxHeapTuplesPerPage itself) lands one slot
past the end.
psql (19beta1)
Type "help" for help.
postgres=# CREATE EXTENSION IF NOT EXISTS pg_surgery;
CREATE EXTENSION
postgres=# CREATE TABLE vuln_005_t();
CREATE TABLE
postgres=# INSERT INTO vuln_005_t SELECT FROM generate_series(1, 291);
INSERT 0 291
postgres=# SELECT heap_force_freeze('vuln_005_t'::regclass, ARRAY['(0,
291)']::tid[]);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!?> q
-?> q
-?>
!?> quit
Proposed patch attached. It does two things:
1. Resize include_this_tid[] to MaxHeapTuplesPerPage + 1 so every legal
1-based offset has a slot. This removes the structural off-by-one
2. Extend the per-TID input check to also reject offno > MaxHeapTuplesPerPage,
so a corrupted page whose pd_lower lets max offset exceed the structural
maximum cannot reach the array either.
With the patch I no longer see the crash
postgres=# DROP TABLE IF EXISTS vuln_005_t;
DROP TABLE
postgres=# DROP EXTENSION IF EXISTS pg_surgery;
DROP EXTENSION
postgres=#
postgres=# CREATE EXTENSION pg_surgery;
CREATE EXTENSION
postgres=# CREATE TABLE vuln_005_t();
CREATE TABLE
postgres=# INSERT INTO vuln_005_t SELECT FROM generate_series(1, 291);
INSERT 0 291
postgres=# SELECT count(*) FROM vuln_005_t;
count
-------
291
(1 row)
postgres=# SELECT heap_force_freeze('vuln_005_t'::regclass, ARRAY['(0,
291)']::tid[]);
heap_force_freeze
-------------------
(1 row)
Regards,
Surya Poondla
Attachments:
0001-Fix-off-by-one-stack-buffer-overflow-in-pg_surgery.patchapplication/octet-stream; name=0001-Fix-off-by-one-stack-buffer-overflow-in-pg_surgery.patchDownload+6-4
Thanks for your reply!
王跃林
3020001251@tju.edu.cn
Original:
From:surya poondla <suryapoondla4@gmail.com>Date:2026-06-04 06:31:27(中国 (GMT+08:00))To:violin0613@tju.edu.cn<violin0613@tju.edu.cn>Cc:pgsql-bugs<pgsql-bugs@postgresql.org>Subject:Re: Fw: Re: heap_force_common in contrib/pg_surgery/heap_surgery.c has an off by one stack buffer overflowHi 王跃林,
Thank you for reporting the issue, I am able to reproduce it on master.
The include_this_tid[] array is sized MaxHeapTuplesPerPage but indexed using 1-based OffsetNumber, so the largest legal offset (MaxHeapTuplesPerPage itself) lands one slot past the end.
psql (19beta1)
Type "help" for help.
postgres=# CREATE EXTENSION IF NOT EXISTS pg_surgery;
CREATE EXTENSION
postgres=# CREATE TABLE vuln_005_t();
CREATE TABLE
postgres=# INSERT INTO vuln_005_t SELECT FROM generate_series(1, 291);
INSERT 0 291
postgres=# SELECT heap_force_freeze('vuln_005_t'::regclass, ARRAY['(0, 291)']::tid[]);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!?> q
-?> q
-?>
!?> quit
Proposed patch attached. It does two things:
1. Resize include_this_tid[] to MaxHeapTuplesPerPage + 1 so every legal 1-based offset has a slot. This removes the structural off-by-one
2. Extend the per-TID input check to also reject offno > MaxHeapTuplesPerPage, so a corrupted page whose pd_lower lets max offset exceed the structural maximum cannot reach the array either.
With the patch I no longer see the crash
postgres=# DROP TABLE IF EXISTS vuln_005_t;
DROP TABLE
postgres=# DROP EXTENSION IF EXISTS pg_surgery;
DROP EXTENSION
postgres=#
postgres=# CREATE EXTENSION pg_surgery;
CREATE EXTENSION
postgres=# CREATE TABLE vuln_005_t();
CREATE TABLE
postgres=# INSERT INTO vuln_005_t SELECT FROM generate_series(1, 291);
INSERT 0 291
postgres=# SELECT count(*) FROM vuln_005_t;
count
-------
291
(1 row)
postgres=# SELECT heap_force_freeze('vuln_005_t'::regclass, ARRAY['(0, 291)']::tid[]);
heap_force_freeze
-------------------
(1 row)
Regards,
Surya Poondla
On Wed, Jun 03, 2026 at 03:31:27PM -0700, surya poondla wrote:
Thank you for reporting the issue, I am able to reproduce it on master.
The include_this_tid[] array is sized MaxHeapTuplesPerPage but indexed
using 1-based OffsetNumber,
so the largest legal offset (MaxHeapTuplesPerPage itself) lands one slot
past the end.
- bool include_this_tid[MaxHeapTuplesPerPage];
+ /* Sized +1 because OffsetNumbers are 1-based and can reach MaxHeapTuplesPerPage. */
+ bool include_this_tid[MaxHeapTuplesPerPage + 1];
The offset number begins at 1. Hence, instead of making this array
larger by one, you could keep it at the same size and adjust the array
index to use (offno - 1) instead.
--
Michael
Hi Michael,
On Thu, Jun 4, 2026 at 1:01 PM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jun 03, 2026 at 03:31:27PM -0700, surya poondla wrote:
Thank you for reporting the issue, I am able to reproduce it on master.
The include_this_tid[] array is sized MaxHeapTuplesPerPage but indexed
using 1-based OffsetNumber,
so the largest legal offset (MaxHeapTuplesPerPage itself) lands one slot
past the end.- bool include_this_tid[MaxHeapTuplesPerPage]; + /* Sized +1 because OffsetNumbers are 1-based and can reach MaxHeapTuplesPerPage. */ + bool include_this_tid[MaxHeapTuplesPerPage + 1];The offset number begins at 1. Hence, instead of making this array
larger by one, you could keep it at the same size and adjust the array
index to use (offno - 1) instead.
I think Surya's approach is worth considering here. Making the helper
array 1-based aligns it naturally with PostgreSQL's convention, where
page offsets are 1-based, and that consistency has real readability
benefits.
With a 1-based helper array:
bool include_this_tid[MaxHeapTuplesPerPage + 1];
we can write:
include_this_tid[offno] = true;
if (!include_this_tid[curoff])
continue;
i.e. we can simply "mark this offset number" and "check whether this
offset number is included" - no mental translation required.
With a zero-based helper array:
bool include_this_tid[MaxHeapTuplesPerPage];
every access has to do a conversion:
include_this_tid[offno - 1] = true;
if (!include_this_tid[curoff - 1])
continue;
This works correctly, but it places an ongoing burden on anyone
reading or modifying the code - they need to keep in mind that page
offsets are 1-based, that this particular array is 0-based, that the
subtraction must be applied consistently, that it should be skipped
for InvalidOffsetNumber, and that the two index spaces should not be
inadvertently mixed in future edits.
These are admittedly small risks, but they are real ones. Keeping the
array 1-based eliminates that entire class of potential confusion and
makes the code easier to maintain going forward. I'd lean toward
Surya's approach for that reason.
--
With Regards,
Ashutosh Sharma.
On Thu, Jun 04, 2026 at 02:42:23PM +0530, Ashutosh Sharma wrote:
These are admittedly small risks, but they are real ones. Keeping the
array 1-based eliminates that entire class of potential confusion and
makes the code easier to maintain going forward. I'd lean toward
Surya's approach for that reason.
That depends on the code path involved:
- pruneheap.c has "processed" and "htsv", that use a +1 index to avoid
the substract, where we also worry about performance.
- heapam_handler.c has in_index, that uses a -1 index.
At the end, the first pattern is an outlier, we don't need to worry
about performance in pg_surgery, and we're talking about three lines
of code in pg_surgery to change (two for include_this_tid, one for the
assertion). With all that in mind, I'd just do a -1 conversion and
call it a day. :)
--
Michael
On Fri, Jun 05, 2026 at 08:17:15AM +0900, Michael Paquier wrote:
At the end, the first pattern is an outlier, we don't need to worry
about performance in pg_surgery, and we're talking about three lines
of code in pg_surgery to change (two for include_this_tid, one for the
assertion). With all that in mind, I'd just do a -1 conversion and
call it a day. :)
Which implies something like the simpler patch attached.
--
Michael
Attachments:
surgery-oneoff.patchtext/plain; charset=us-asciiDownload+3-3
Hi Michael,
Thanks for the patch.
On Fri, Jun 5, 2026 at 12:27 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Jun 05, 2026 at 08:17:15AM +0900, Michael Paquier wrote:
At the end, the first pattern is an outlier, we don't need to worry
about performance in pg_surgery, and we're talking about three lines
of code in pg_surgery to change (two for include_this_tid, one for the
assertion). With all that in mind, I'd just do a -1 conversion and
call it a day. :)Which implies something like the simpler patch attached.
I have one small comment:
"+ Assert((offno - 1) < MaxHeapTuplesPerPage);"
I think this can be simplified to:
Assert(offno <= MaxHeapTuplesPerPage);
Since "offno" is already 1-based, there doesn't seem to be a need to
subtract 1 from it and adjust the comparison accordingly.
--
With Regards,
Ashutosh Sharma.
On Fri, Jun 05, 2026 at 01:30:42PM +0530, Ashutosh Sharma wrote:
Since "offno" is already 1-based, there doesn't seem to be a need to
subtract 1 from it and adjust the comparison accordingly.
Sure. Changed as you have suggested here, and backpatched down to
v14.
--
Michael