BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

Started by PG Bug reporting form11 months ago15 messagesbugs

noreply@postgresql.org

11 months ago

The following bug has been logged on the website:

Bug reference: 18947
Logged by: Alexander Kozhemyakin
Email address: a.kozhemyakin@postgrespro.ru
PostgreSQL version: 18beta1
Operating system: Ubuntu 24.04
Description:

Hi, I encountered an assertion failure when running the following query on
the master branch (commit 112e40b8) when building with --enable-cassert
reproduce
initdb -D data
echo "
shared_preload_libraries = 'pg_stat_statements'
" > data/postgresql.auto.conf
pg_ctl -D data -l logfile start
psql <<EOF
(values(1)) limit 1;
EOF
bt
(gdb)
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized
out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at
./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3 0x00007b9ecba4527e in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4 0x00007b9ecba288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x00005ee635e0da6f in ExceptionalCondition
(conditionName=conditionName@entry=0x7b9ecc55f97d "len_to_wrt >= 0",
fileName=fileName@entry=0x7b9ecc55f000
"/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c",
lineNumber=lineNumber@entry=2929)
at /home/postgres/test/postgres/src/backend/utils/error/assert.c:66
#6 0x00007b9ecc55c5a6 in generate_normalized_query
(query_len_p=0x7fff53b9a348, query_loc=1, query=0x5ee6575d0df1 "values(1))
limit 1;", jstate=0x5ee657606e88) at
/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:2929
#7 pgss_store (query=0x5ee6575d0df1 "values(1)) limit 1;",
queryId=<optimized out>, query_location=<optimized out>,
query_len=<optimized out>, kind=kind@entry=PGSS_INVALID,
total_time=total_time@entry=0, rows=rows@entry=0, bufusage=0x0,
walusage=0x0, jitusage=0x0, jstate=0x5ee657606e88,
parallel_workers_to_launch=0, parallel_workers_launched=0) at
/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:1350
#8 0x00007b9ecc55c6ca in pgss_post_parse_analyze (pstate=0x5ee6575d1a98,
query=0x5ee6575d1ba8, jstate=0x5ee657606e88) at
/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:868
#9 0x00005ee6359f46cb in parse_analyze_fixedparams
(parseTree=parseTree@entry=0x5ee6575d1a18,
sourceText=sourceText@entry=0x5ee6575d0df0 "(values(1)) limit 1;",
paramTypes=paramTypes@entry=0x0, numParams=numParams@entry=0,
queryEnv=queryEnv@entry=0x0)
at /home/postgres/test/postgres/src/backend/parser/analyze.c:128
#10 0x00005ee635cb71a0 in pg_analyze_and_rewrite_fixedparams (queryEnv=0x0,
numParams=0, paramTypes=0x0, query_string=<optimized out>,
parsetree=<optimized out>) at
/home/postgres/test/postgres/src/backend/tcop/postgres.c:682
#11 exec_simple_query (query_string=0x5ee6575d0df0 "(values(1)) limit 1;")
at /home/postgres/test/postgres/src/backend/tcop/postgres.c:1189
#12 0x00005ee635cb8eb0 in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at
/home/postgres/test/postgres/src/backend/tcop/postgres.c:4766
#13 0x00005ee635cb3293 in BackendMain (startup_data=<optimized out>,
startup_data_len=<optimized out>) at
/home/postgres/test/postgres/src/backend/tcop/backend_startup.c:124
#14 0x00005ee635c021e2 in postmaster_child_launch (child_type=<optimized
out>, child_slot=1, startup_data=startup_data@entry=0x7fff53b9a9d0,
startup_data_len=startup_data_len@entry=24,
client_sock=client_sock@entry=0x7fff53b9a9f0) at
/home/postgres/test/postgres/src/backend/postmaster/launch_backend.c:290
#15 0x00005ee635c06014 in BackendStartup (client_sock=0x7fff53b9a9f0) at
/home/postgres/test/postgres/src/backend/postmaster/postmaster.c:3580
#16 ServerLoop () at
/home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1702
#17 0x00005ee635c07b0e in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x5ee65758a0b0) at
/home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1400
#18 0x00005ee6358d3f4e in main (argc=3, argv=0x5ee65758a0b0) at
/home/postgres/test/postgres/src/backend/main/main.c:227

Dilip Kumar

dilipbalaut@gmail.com

10 months ago

In reply to: PG Bug reporting form (#1)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On Sat, Jun 7, 2025 at 9:10 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 18947
Logged by: Alexander Kozhemyakin
Email address: a.kozhemyakin@postgrespro.ru
PostgreSQL version: 18beta1
Operating system: Ubuntu 24.04
Description:

Hi, I encountered an assertion failure when running the following query on
the master branch (commit 112e40b8) when building with --enable-cassert
reproduce
initdb -D data
echo "
shared_preload_libraries = 'pg_stat_statements'
" > data/postgresql.auto.conf
pg_ctl -D data -l logfile start
psql <<EOF
(values(1)) limit 1;
EOF
bt
(gdb)
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized
out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at
./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3 0x00007b9ecba4527e in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4 0x00007b9ecba288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x00005ee635e0da6f in ExceptionalCondition
(conditionName=conditionName@entry=0x7b9ecc55f97d "len_to_wrt >= 0",
fileName=fileName@entry=0x7b9ecc55f000
"/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c",
lineNumber=lineNumber@entry=2929)
at /home/postgres/test/postgres/src/backend/utils/error/assert.c:66
#6 0x00007b9ecc55c5a6 in generate_normalized_query
(query_len_p=0x7fff53b9a348, query_loc=1, query=0x5ee6575d0df1 "values(1))
limit 1;", jstate=0x5ee657606e88) at
/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:2929
#7 pgss_store (query=0x5ee6575d0df1 "values(1)) limit 1;",
queryId=<optimized out>, query_location=<optimized out>,
query_len=<optimized out>, kind=kind@entry=PGSS_INVALID,
total_time=total_time@entry=0, rows=rows@entry=0, bufusage=0x0,
walusage=0x0, jitusage=0x0, jstate=0x5ee657606e88,
parallel_workers_to_launch=0, parallel_workers_launched=0) at
/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:1350
#8 0x00007b9ecc55c6ca in pgss_post_parse_analyze (pstate=0x5ee6575d1a98,
query=0x5ee6575d1ba8, jstate=0x5ee657606e88) at
/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:868
#9 0x00005ee6359f46cb in parse_analyze_fixedparams
(parseTree=parseTree@entry=0x5ee6575d1a18,
sourceText=sourceText@entry=0x5ee6575d0df0 "(values(1)) limit 1;",
paramTypes=paramTypes@entry=0x0, numParams=numParams@entry=0,
queryEnv=queryEnv@entry=0x0)
at /home/postgres/test/postgres/src/backend/parser/analyze.c:128
#10 0x00005ee635cb71a0 in pg_analyze_and_rewrite_fixedparams (queryEnv=0x0,
numParams=0, paramTypes=0x0, query_string=<optimized out>,
parsetree=<optimized out>) at
/home/postgres/test/postgres/src/backend/tcop/postgres.c:682
#11 exec_simple_query (query_string=0x5ee6575d0df0 "(values(1)) limit 1;")
at /home/postgres/test/postgres/src/backend/tcop/postgres.c:1189
#12 0x00005ee635cb8eb0 in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at
/home/postgres/test/postgres/src/backend/tcop/postgres.c:4766
#13 0x00005ee635cb3293 in BackendMain (startup_data=<optimized out>,
startup_data_len=<optimized out>) at
/home/postgres/test/postgres/src/backend/tcop/backend_startup.c:124
#14 0x00005ee635c021e2 in postmaster_child_launch (child_type=<optimized
out>, child_slot=1, startup_data=startup_data@entry=0x7fff53b9a9d0,
startup_data_len=startup_data_len@entry=24,
client_sock=client_sock@entry=0x7fff53b9a9f0) at
/home/postgres/test/postgres/src/backend/postmaster/launch_backend.c:290
#15 0x00005ee635c06014 in BackendStartup (client_sock=0x7fff53b9a9f0) at
/home/postgres/test/postgres/src/backend/postmaster/postmaster.c:3580
#16 ServerLoop () at
/home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1702
#17 0x00005ee635c07b0e in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x5ee65758a0b0) at
/home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1400
#18 0x00005ee6358d3f4e in main (argc=3, argv=0x5ee65758a0b0) at
/home/postgres/test/postgres/src/backend/main/main.c:227

I didn't get time to debug, I might look into it tomorrow if someone
doesn't do it before that, but I am able to reproduce the issue and
seems like problem is with respect to computing the quer_loc in below
loop, and due to that 'quer_loc' is becoming bigger than query_len and
hitting the assert.

for (i = 0; i < jstate->clocations_count; i++)
{
...
quer_loc = off + tok_len;
}

len_to_wrt = query_len - quer_loc;

--
Regards,
Dilip Kumar
Google

Michael Paquier

michael@paquier.xyz

10 months ago

In reply to: Dilip Kumar (#2)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On Sat, Jun 07, 2025 at 10:19:53PM +0530, Dilip Kumar wrote:

I didn't get time to debug, I might look into it tomorrow if someone
doesn't do it before that, but I am able to reproduce the issue and
seems like problem is with respect to computing the quer_loc in below
loop, and due to that 'quer_loc' is becoming bigger than query_len and
hitting the assert.

The loop in charge of assigning the constants in the queries needs to
be smarter in terms of the new locations assigned to sub-queries, here
the VALUES one.

499edb09741b is the origin of the regression, so open item assigned to
me (will look into it later), adding Anthonin in CC.
--
Michael

Dilip Kumar

dilipbalaut@gmail.com

10 months ago

In reply to: Michael Paquier (#3)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On Sun, Jun 8, 2025 at 8:09 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sat, Jun 07, 2025 at 10:19:53PM +0530, Dilip Kumar wrote:

I didn't get time to debug, I might look into it tomorrow if someone
doesn't do it before that, but I am able to reproduce the issue and
seems like problem is with respect to computing the quer_loc in below
loop, and due to that 'quer_loc' is becoming bigger than query_len and
hitting the assert.

The loop in charge of assigning the constants in the queries needs to
be smarter in terms of the new locations assigned to sub-queries, here
the VALUES one.

499edb09741b is the origin of the regression, so open item assigned to
me (will look into it later), adding Anthonin in CC.

Thanks, I was looking into the commit pointed by you, I found this
change problematic,

transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)
{
Query *result;

+       /* Store RawStmt's length and location in pstate */
+       pstate->p_stmt_len = parseTree->stmt_len;
+       pstate->p_stmt_location = parseTree->stmt_location;
+
        /* We're at top level, so allow SELECT INTO */
        result = transformOptionalSelectInto(pstate, parseTree->stmt);

- result->stmt_location = parseTree->stmt_location;
- result->stmt_len = parseTree->stmt_len;

So here before this commit we were setting the 'query->stmt_len' as
parseTree->stmt_len from the Top level statement, but after this fix,
we are setting the length of the select statement, in query->stmt_len,
because now we are doing that in
transformOptionalSelectInto->transformStmt->setQueryLocationAndLength
so that is just transferring the select statement and has length of
that. Due to that in generate_normalized_query() we are getting
'quer_loc' bigger than the 'query_len'.

--
Regards,
Dilip Kumar
Google

Fujii Masao

masao.fujii@gmail.com

10 months ago

In reply to: Michael Paquier (#3)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On 2025/06/08 11:39, Michael Paquier wrote:

On Sat, Jun 07, 2025 at 10:19:53PM +0530, Dilip Kumar wrote:

I didn't get time to debug, I might look into it tomorrow if someone
doesn't do it before that, but I am able to reproduce the issue and
seems like problem is with respect to computing the quer_loc in below
loop, and due to that 'quer_loc' is becoming bigger than query_len and
hitting the assert.

The loop in charge of assigning the constants in the queries needs to
be smarter in terms of the new locations assigned to sub-queries, here
the VALUES one.

499edb09741b is the origin of the regression, so open item assigned to
me (will look into it later), adding Anthonin in CC.

The issue seems to be that the commit missed handling cases where
a clause follows a SELECT query wrapped in parentheses.

If the following clause includes a constant (e.g., LIMIT 1 or
FETCH FIRST 1 ROW ONLY), it can trigger the reported assertion failure.
If it doesn't include a constant (e.g., FETCH FIRST ROW ONLY),
the assertion failure doesn't occur, but only the inner SELECT query (inside
the parentheses) appears in pg_stat_statements, i.e.,
the "FETCH FIRST ROW ONLY" part is missing from pg_stat_statements.query.

+					SelectStmt *n = (SelectStmt *) $2;
+
+					/*
+					 * As SelectStmt's location starts at the SELECT keyword,
+					 * we need to track the length of the SelectStmt within
+					 * parentheses to be able to extract the relevant part
+					 * of the query.  Without this, the RawStmt's length would
+					 * be used and would include the closing parenthesis.
+					 */
+					n->stmt_len = @3 - @2;

Shouldn't this part be skipped in cases where the SELECT with parens is
followed by a clause? At least in those cases, this logic doesn't seem
appropriate.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

Michael Paquier

michael@paquier.xyz

10 months ago

In reply to: Fujii Masao (#5)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On Mon, Jun 09, 2025 at 07:07:42PM +0900, Fujii Masao wrote:

Shouldn't this part be skipped in cases where the SELECT with parens is
followed by a clause? At least in those cases, this logic doesn't seem
appropriate.

Touché. That means messing with any parent Node that includes or
touches select_with_parens, select_no_parens (JSON_ARRAY has one) or
SelectStmt (few of these, like in CTAS). That's not cool in the long
term because we would expect any new node that englobes a SelectStmt
to be able to set up their inner length and location as they should,
as far as I get it. I was wondering first if we could have done
something with the top-level statement depending on the nesting level
of PGSS, but that's not going to fly high, for example with cases like
this one (any case that has an inner SELECT):
copy ((select 1) union ((select 1)) fetch first 1 row only) to stdout

Even with that in mind, a second thing I was wondering is to compile
the length within select_with_parens, still there seems to be an extra
parenthesis pending at the end of some of the strings. Perhaps I'm
missing a simpler concept, of course.

Anyway, this is the second bug that we have in this area, and this one
is worse. Now that we are in the middle of beta, this is gathering
the signs that we should revert the change from the tree for the
moment and potentially revisit the whole in v19 with these edge cases
handled. So attached is a patch doing a revert of:
499edb09741b, nested statement tracking.
06450c7b8c70, follow-up fix for 499edb09741b.

There was also 6b652e6ce85a for the addition of query ID for CREATE
TABLE AS and DECLARE, and this is still OK as a standalone change.
Its tests are updated with an extra entry added in the non-top-level
case, with the query string matching the top-level string as a result
of the attached.

If you have comments, feel free.
--
Michael

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

10 months ago

In reply to: Michael Paquier (#6)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On Tue, Jun 10, 2025 at 7:46 AM Michael Paquier <michael@paquier.xyz> wrote:

Touché. That means messing with any parent Node that includes or
touches select_with_parens, select_no_parens (JSON_ARRAY has one) or
SelectStmt (few of these, like in CTAS). That's not cool in the long
term because we would expect any new node that englobes a SelectStmt
to be able to set up their inner length and location as they should,
as far as I get it. I was wondering first if we could have done
something with the top-level statement depending on the nesting level
of PGSS, but that's not going to fly high, for example with cases like
this one (any case that has an inner SELECT):
copy ((select 1) union ((select 1)) fetch first 1 row only) to stdout

One wrong assumption I made was that parenthesis aren't part of the
select statement, which is definitely wrong with queries like '(SELECT
1) limit 1;'. As we don't have end location tracking, we don't have
the possibility to get the length of the select statement.

I've joined a possible (and very rough, this would definitely require
more tests) fix:
- SelectStmt's location is now the outermost '(' position
- Don't track the length anymore in 'select_no_parens'
- Nodes that are embedding a SelectStatement would need to update its
length if necessary. For COPY, this is straightforward as we can use
the parenthesis location surrounding the statement. With CTAS,
SelectStmt's length should be updated if there's an existing
opt_with_data.

Anyway, this is the second bug that we have in this area, and this one
is worse. Now that we are in the middle of beta, this is gathering
the signs that we should revert the change from the tree for the
moment and potentially revisit the whole in v19 with these edge cases
handled. So attached is a patch doing a revert of:
499edb09741b, nested statement tracking.
06450c7b8c70, follow-up fix for 499edb09741b.

That's fair. I can get a better patch with the approach mentioned, but
I don't know if that would be enough to cover all edge cases. Though
reverting this is also going to break pgAudit tests if I remember
correctly.

Regards,
Anthonin

Michael Paquier

michael@paquier.xyz

10 months ago

In reply to: Anthonin Bonnefoy (#7)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On Tue, Jun 10, 2025 at 12:02:06PM +0200, Anthonin Bonnefoy wrote:

One wrong assumption I made was that parenthesis aren't part of the
select statement, which is definitely wrong with queries like '(SELECT
1) limit 1;'. As we don't have end location tracking, we don't have
the possibility to get the length of the select statement.

Yeah.

I've joined a possible (and very rough, this would definitely require
more tests) fix:
- SelectStmt's location is now the outermost '(' position
- Don't track the length anymore in 'select_no_parens'
- Nodes that are embedding a SelectStatement would need to update its
length if necessary. For COPY, this is straightforward as we can use
the parenthesis location surrounding the statement. With CTAS,
SelectStmt's length should be updated if there's an existing
opt_with_data.

Hmm. I am wondering whether this would be enough; these are not the
only places where we use select_no_parens and select_with_parens. I'd
be curious to see what you think a fully-bloomed fix should look like.
--
Michael

Michael Paquier

michael@paquier.xyz

10 months ago

In reply to: Michael Paquier (#8)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On Tue, Jun 10, 2025 at 08:14:40PM +0900, Michael Paquier wrote:

Hmm. I am wondering whether this would be enough; these are not the
only places where we use select_no_parens and select_with_parens. I'd
be curious to see what you think a fully-bloomed fix should look like.

By the way, this report also means that none of the problematic query
patterns are run in the main regression test suite, or we would have
noticed something immediately when running the recovery test
027_stream_regress.pl that loads PGSS.

So we have a problem in test coverage with query normalization
patterns. But we also have a second, older, problem in terms of
coverage regarding the parentheses patterns that are assumed as
supported in the parser. Even if we finish by reverting the nested
query tracking support for now (let's settle things before beta2 not
after it), we should definitely improve the main regression test suite
in this area to make sure that reintroducing it safe.
--
Michael

#10

Fujii Masao

masao.fujii@gmail.com

10 months ago

In reply to: Michael Paquier (#9)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On 2025/06/11 7:39, Michael Paquier wrote:

On Tue, Jun 10, 2025 at 08:14:40PM +0900, Michael Paquier wrote:

Hmm. I am wondering whether this would be enough; these are not the
only places where we use select_no_parens and select_with_parens. I'd
be curious to see what you think a fully-bloomed fix should look like.

By the way, this report also means that none of the problematic query
patterns are run in the main regression test suite, or we would have
noticed something immediately when running the recovery test
027_stream_regress.pl that loads PGSS.

So we have a problem in test coverage with query normalization
patterns. But we also have a second, older, problem in terms of
coverage regarding the parentheses patterns that are assumed as
supported in the parser. Even if we finish by reverting the nested
query tracking support for now (let's settle things before beta2 not
after it), we should definitely improve the main regression test suite
in this area to make sure that reintroducing it safe.

Sounds reasonable to me.

Also I agree with reverting the commit and revisiting the feature in v19.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

#11

Michael Paquier

michael@paquier.xyz

10 months ago

In reply to: Fujii Masao (#10)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On Wed, Jun 11, 2025 at 07:57:19AM +0900, Fujii Masao wrote:

Also I agree with reverting the commit and revisiting the feature in v19.

Yeah, thanks for the input. If we rely on last year's schedule, v17
beta2 was tagged on the 24th of June, meaning that we should have
approximately a bit more than one week to settle entirely the matter
of this thread. That's just too short in my experience seeing the
query patterns to analyze, and we are dealing with what is going to be
a stable branch very soon.
--
Michael

#12

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

10 months ago

In reply to: Michael Paquier (#11)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

Here's a v2 for the attempted fix. This is using the mentioned approach:
- SelectStmt's location is set to Select's position when the node if built
- If there are surrounding parentheses, we include them and move the
start of the statement to the outermost '('

The Statement length is only set for:
- COPY: We use the PreparableStatement's surrounding '()' to set both
location and length (setting location should be redundant here though)
- CTAS: we set the statement's length if there's a tailing 'WITH (NO) DATA'.
- Other than those, the statement length will fallback to use the top
RawStmt's remaining length.

I haven't done it for Views and JSON_ARRAY. Their nested queries
aren't currently tracked and reported by pgss so there's no way to
test those. This could be done but there's no available way to test
this is correct AFAIK?

#13

Michael Paquier

michael@paquier.xyz

10 months ago

In reply to: Anthonin Bonnefoy (#12)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On Wed, Jun 11, 2025 at 11:04:36AM +0200, Anthonin Bonnefoy wrote:

Here's a v2 for the attempted fix. This is using the mentioned approach:
- SelectStmt's location is set to Select's position when the node if built
- If there are surrounding parentheses, we include them and move the
start of the statement to the outermost '('

The Statement length is only set for:
- COPY: We use the PreparableStatement's surrounding '()' to set both
location and length (setting location should be redundant here though)
- CTAS: we set the statement's length if there's a tailing 'WITH (NO) DATA'.
- Other than those, the statement length will fallback to use the top
RawStmt's remaining length.

@@ -3417,9 +3418,9 @@ CopyStmt: COPY opt_binary qualified_name opt_column_list 
[...]
-     updatePreparableStmtEnd($3, @2, @4 - @2);
+     updatePreparableStmtEnd($3, @3, @4 - @3);

For the case of COPY, it is possible to get a much better report by
setting the location and length to not include the first parenthesis,
like that.

I haven't done it for Views and JSON_ARRAY. Their nested queries
aren't currently tracked and reported by pgss so there's no way to
test those.

I have to admit that it is inconsistent to set a location in their
respective inner nodes while we don't use updateSelectStmtEnd() to
adjust the length based on the location assigned by the parser.
There's a risk of having a code path in the future looking at the
location later on based on the parsed state and taking a wrong
decision because the location and the length do not match. Such cases
could be easier to reach outside of core, for example imagine an
extension looking at a ViewStmt where we set the location but not a
matching length. We would consider as inner query a string that
begins at the beginning of the location until the end of the string
because the length is not set to match with what the parser finds.

This could be done but there's no available way to test
this is correct AFAIK?

Yes, perhaps we should think harder about this part. There are two
patterns among a few others like UNION, which can also be used in
views.

Anyway, I am going to pull the plug on this one today, because we need
to take a decision with beta2 approaching fast but it's mostly me
being not fully confident that this is right as-is. If we're wrong in
one path, that's an out-of-bound memory write waiting for us in PGSS
for the query normalization, as this report proves, and that would be
bad.

The first step would be to improve the main regression test suite for
all these grammar "parenthesis" patterns we have never had coverage
for in core. I am wondering if it is actually possible to reach an
inconsistent state even in older stable branches because we've never
tested all these grammar patterns. Perhaps I'm wrong, but we would be
more confident with all these extra patterns stressed first by
027_stream_regress.pl which forces queries to have some normalization
applied.
--
Michael

#14

Michael Paquier

michael@paquier.xyz

10 months ago

In reply to: Michael Paquier (#13)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On Thu, Jun 12, 2025 at 08:46:26AM +0900, Michael Paquier wrote:

Anyway, I am going to pull the plug on this one today, because we need
to take a decision with beta2 approaching fast but it's mostly me
being not fully confident that this is right as-is. If we're wrong in
one path, that's an out-of-bound memory write waiting for us in PGSS
for the query normalization, as this report proves, and that would be
bad.

Two things that I have in mind regarding all that, worth dumping now:
- 06450c7b8c70 is reverted, but I have kept the test introduced in
this commit to check the location of multiple queries separated by
semicolons. This was something that we failed to track in the tree,
with David Steele's pgaudit only providing coverage.
- About the tests we could do to validate more in depth the locations
and lengths assigned to the parsed node without relying on PGSS and
EXPLAIN, which will support never support the full range of things
like views. I was wondering about the addition of a test module that
plugs into one or more hooks (only the post-parse one should be
enough), where we simply print the Nodes generated to a string using
the facility in src/backend/nodes/print.c. It would be then possible
to filter the output generated with some regex magic to print the
fields we want to check. That would work for the CREATE VIEW case,
for example, and it could be used for other things than just the
statement length and/or locations set in a gIven Node.
--
Michael

#15

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

10 months ago

In reply to: Michael Paquier (#14)

Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

On Fri, Jun 13, 2025 at 1:51 AM Michael Paquier <michael@paquier.xyz> wrote:

- About the tests we could do to validate more in depth the locations
and lengths assigned to the parsed node without relying on PGSS and
EXPLAIN, which will support never support the full range of things
like views. I was wondering about the addition of a test module that
plugs into one or more hooks (only the post-parse one should be
enough), where we simply print the Nodes generated to a string using
the facility in src/backend/nodes/print.c. It would be then possible
to filter the output generated with some regex magic to print the
fields we want to check. That would work for the CREATE VIEW case,
for example, and it could be used for other things than just the
statement length and/or locations set in a gIven Node.

Agree on the tests. Having a test module that dumps the parse tree in
an usable way for the tests was definitely something I had in mind.
Relying on PGSS to test parse behaviour definitely has its limit.

BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

Attachments:

Attachments:

Attachments: