TRAP: failed Assert("offsets[i] > offsets[i - 1]"), File: "tidstore.c"
Hi,
While experimenting with query plans, I periodically see test_tidstore
fail on the assertion in TidStoreSetBlockOffsets().
The cause is that the harness function do_set_block_offsets() forwards
the SQL array straight to TidStoreSetBlockOffsets(), which has an
explicit contract:
"The offset numbers 'offsets' must be sorted in ascending order."
array_agg() without ORDER BY gives no such guarantee, and plan shapes
that reshuffle the input can deliver the offsets out of order and trip
the Assert.
The issue is minor and doesn't expose any underlying bug, but it is
still worth fixing: it removes a source of flaky test runs and makes
life easier for extension developers who reuse the same pattern.
Patch attached.
--
regards, Andrei Lepikhov,
pgEdge
Attachments:
v0-0001-Sort-offsets-in-test_tidstore-s-do_set_block_offs.patch (text/plain)
On Wed, Apr 15, 2026 at 5:48 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
Thank you for the report.
Could you provide a reproducer for the assertion failure? IIRC there
have been no such reports in the community so far, and the test should
be included in the patch anyway.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 15/04/2026 22:50, Masahiko Sawada wrote:
On Wed, Apr 15, 2026 at 5:48 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
Could you provide a reproducer for the assertion failure? IIRC there
have been no such reports in the community so far, and the test should
be included in the patch anyway.
Sure! See in attachment.
--
regards, Andrei Lepikhov,
pgEdge
Attachments:
v1-0001-Sort-offsets-in-test_tidstore-s-do_set_block_offs.patch (text/plain)
On Thu, Apr 16, 2026 at 12:13 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
Thank you for updating the patch.
IIUC the assertion failure could happen only in the random TIDs test,
like the one below, because it is not guaranteed that the offset
numbers in the array are sorted after applying DISTINCT.
-- Random TIDs test. The offset numbers are randomized and must be
-- unique and ordered.
INSERT INTO hideblocks (blockno)
  SELECT do_set_block_offsets(blkno,
           array_agg(DISTINCT greatest((random() * :maxoffset)::int, 1))::int2[])
  FROM generate_series(1, 100) num_offsets,
       generate_series(1000, 1100, 1) blkno
  GROUP BY blkno;
While I agree that we need to sort the offset numbers, I think it
would be better to ensure the offset numbers in the array are sorted
in the test_tidstore.sql file where required, instead of doing so for
all cases.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 16/04/2026 10:11, Masahiko Sawada wrote:
On Thu, Apr 16, 2026 at 12:13 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
-- Random TIDs test. The offset numbers are randomized and must be
-- unique and ordered. [...]
Alright, I used an explicit sort in reverse order to make sure the test is
stable. I usually create modules that may change different paths, costs, and
orders, and using random can make things unpredictable. But for this specific
test, I don't see any risk.
While I agree that we need to sort the offset numbers, I think it
would be better to ensure the offset numbers in the array are sorted
in the test_tidstore.sql file where required, instead of doing so for
all cases.
I'm not sure I follow. Are you saying that do_set_block_offsets shouldn't sort
the incoming offsets? I made this change mainly to meet the
TidStoreSetBlockOffsets contract. Since this is just a simple test, performance
isn't really a concern.
--
regards, Andrei Lepikhov,
pgEdge
On Thu, Apr 16, 2026 at 1:26 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
I'm not sure I follow. Are you saying that do_set_block_offsets
shouldn't sort the incoming offsets?
No, I meant that if we sort the given array in do_set_block_offsets()
as the proposed patch does, we end up always sorting arrays even when
sorting is not actually required (e.g., when executing "SELECT
do_set_block_offsets(1, array[1,2,3,4,100]::int2[]);"). So an
alternative idea to stabilize the regression test would be to create a
SQL function that returns a list of sorted offsets and use it where
it's required. While the patch gets a little bigger, it would also
help simplify the tests somewhat by removing redundant code. I've
attached a patch for this idea.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
generate_random_offsets.patch (application/octet-stream)
On 16/04/2026 19:58, Masahiko Sawada wrote:
So an alternative idea to stabilize the regression test would be to
create a SQL function that returns a list of sorted offsets and use it
where it's required.
Ok. No objections. Both changes are just test routines registered by the
test_tidstore module.
I decided to add C code, mostly following the idea that we reuse
examples from the Postgres codebase when writing our patches and
extensions. An explicit demonstration of the sort contract at the
TidStoreSetBlockOffsets() call site might help developers who don't
read function comments each time.
--
regards, Andrei Lepikhov,
pgEdge
On Fri, Apr 17, 2026 at 2:26 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
I decided to add C code, mostly following the idea that we reuse
examples from the Postgres codebase when writing our patches and
extensions. An explicit demonstration of the sort contract at the
TidStoreSetBlockOffsets() call site might help developers who don't
read function comments each time.
Understood. After more thought, I think your idea would be better.
One thing still unclear to me is in which situation the query in the
test produces an array of unsorted offset numbers. While I understand
that DISTINCT is not guaranteed to return sorted results, DISTINCT
inside an aggregate function uses sort-based deduplication. I'd like
to confirm that the queries in the test can actually produce results
that violate the assertion. Is it possible to trigger that by changing
GUC parameters or something?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 22/04/2026 18:51, Masahiko Sawada wrote:
I'd like to confirm that the queries in the test can actually produce
results that violate the assertion. Is it possible to trigger that by
changing GUC parameters or something?
No; this came out of ongoing research into Postgres optimizer
vulnerabilities. I used two tools: pg_pathcheck [1] and pg-chaos-mode
[2]. The first tool finds hidden dangling pointers in pathlists, which
we are currently discussing in another thread. The second is a patch
that makes cost-based decisions random, to help uncover hidden or
unwritten coding contracts.
Both tools are experimental and not meant for core use; they are only used to
trigger potential issues. In this case, I think the query picked a costly sorted
path, which led to the crash.
[1]: https://github.com/danolivo/pg_pathcheck
[2]: https://github.com/danolivo/pg-chaos-test
--
regards, Andrei Lepikhov,
pgEdge
On Wed, Apr 22, 2026 at 10:23 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
Thank you for the clarification!
Both tools are experimental and not meant for core use; they are only used to
trigger potential issues. In this case, I think the query picked a costly sorted
path, which led to the crash.
Does this imply that array_agg() could return unsorted results
depending on the plan the optimizer chooses? Or is such a path
currently never selected by the optimizer?
I’m asking because if this scenario never occurs with the current
optimizer, it might make sense to apply the patch only to HEAD (i.e.,
for PG20). On the other hand, backpatching to PG17 might be justified,
given that DISTINCT does not guarantee sorted results in principle,
and the fix could benefit extension development on stable branches.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 25/04/2026 01:23, Masahiko Sawada wrote:
Does this imply that array_agg() could return unsorted results
depending on the plan the optimizer chooses? Or is such a path
currently never selected by the optimizer?
The array_agg() function does not sort its output. In theory, this means the
join could return results in any order, but in practice, I have not seen this
happen.
I’m asking because if this scenario never occurs with the current
optimizer, it might make sense to apply the patch only to HEAD (i.e.,
for PG20). On the other hand, backpatching to PG17 might be justified,
given that DISTINCT does not guarantee sorted results in principle,
and the fix could benefit extension development on stable branches.
The planner's logic is unchanged in the stable versions, so it seems
reliable there. However, backpatching could help extension developers
a little. Since this fix addresses a real issue and does not break
anything complex, I would backpatch it. Still, I am fine with
committing it only to master if you prefer.
P.S.
I looked into the issue further. The problem happens when the join sides are
shuffled. Here is what I found:
EXPLAIN of the successful execution (unnecessary details stripped):
Insert on pg_temp.hideblocks (cost=1.21..1.66 rows=0 width=0)
-> Subquery Scan on unnamed_subquery (cost=1.21..1.66 rows=5 width=8)
Output: unnamed_subquery.do_set_block_offsets
-> GroupAggregate (cost=1.21..1.61 rows=5 width=16)
Output: do_set_block_offsets("*VALUES*".column1,
(array_agg("*VALUES*_1".column1))::smallint[]), ...
Group Key: "*VALUES*".column1
-> Sort (cost=1.21..1.27 rows=25 width=12)
Output: "*VALUES*".column1, "*VALUES*_1".column1
Sort Key: "*VALUES*".column1
-> Nested Loop (cost=0.00..0.62 rows=25 width=12)
Output: "*VALUES*".column1, "*VALUES*_1".column1
-> Values Scan on "*VALUES*"
(cost=0.00..0.06 rows=5 width=8)
Output: "*VALUES*".column1
-> Values Scan on "*VALUES*_1"
(cost=0.00..0.06 rows=5 width=4)
Output: "*VALUES*_1".column1
EXPLAIN that causes assertion:
Insert on pg_temp.hideblocks (cost=1.03..1.48 rows=0 width=0)
-> Subquery Scan on unnamed_subquery (cost=1.03..1.48 rows=5 width=8)
Output: unnamed_subquery.do_set_block_offsets
-> GroupAggregate (cost=1.03..1.43 rows=5 width=16)
Output: do_set_block_offsets("*VALUES*".column1,
(array_agg("*VALUES*_1".column1))::smallint[]),...
Group Key: "*VALUES*".column1
-> Sort (cost=1.03..1.09 rows=25 width=12)
Output: "*VALUES*".column1, "*VALUES*_1".column1
Sort Key: "*VALUES*".column1
-> Nested Loop (cost=0.00..0.45 rows=25 width=12)
Output: "*VALUES*".column1, "*VALUES*_1".column1
-> Values Scan on "*VALUES*_1"
(cost=0.00..0.06 rows=5 width=4)
Output: "*VALUES*_1".column1
-> Materialize (cost=0.00..0.09 rows=5 width=8)
Output: "*VALUES*".column1
-> Values Scan on "*VALUES*"
(cost=0.00..0.06 rows=5 width=8)
Output: "*VALUES*".column1
In the second case, the offsets arrive at the aggregation without any
ordering, which exposed the issue.
--
regards, Andrei Lepikhov,
pgEdge
On Sun, Apr 26, 2026 at 2:06 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
Thank you for sharing the details.
The assertion failure is not observed during regular regression tests
because the query is simple enough that the optimizer consistently
chooses plans that produce sorted results. However, since DISTINCT
without ORDER BY does not in theory guarantee sorted results, I think
it makes sense to apply the proposed patch. It would also make sense
to backpatch it to PG17, where TidStore was introduced, for the
benefit of extension development on back branches.
I've attached the patches. I'm going to push them, barring any objections.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
REL_17_0001-test_tidstore-Stabilize-regression-tests-by-sorting-.patch (text/x-patch)
master_0001-test_tidstore-Stabilize-regression-tests-by-sorting-.patch (text/x-patch)
REL_18_0001-test_tidstore-Stabilize-regression-tests-by-sorting-.patch (text/x-patch)
On Tue, Apr 28, 2026 at 10:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Pushed.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com