Regression in statement locations

Started by David Steele11 months ago12 messageshackers
Jump to latest
#1David Steele
david@pgmasters.net

Hackers,

499edb0 introduced more precise locations for nested statements. In
general it works quite well and has made the pgAudit output much more
readable -- so kudos for that.

However, I have noticed one regression in the pgAudit tests. We have
this somewhat odd statement intentionally crafted for catch issues with
logging:

--
-- Test obfuscated dynamic sql for clean logging
DO $$
DECLARE
table_name TEXT = 'do_table';
BEGIN
EXECUTE E'\t\n\r CREATE TABLE ' || table_name || E' ("weird name"
INT)\t\n\r ; DROP table ' || table_name;
END $$;

In PG17 pgAudit logs the expected output:

NOTICE: AUDIT: SESSION,35,2,DDL,CREATE
TABLE,TABLE,public.do_table,"CREATE TABLE do_table (""weird name""
INT)",<none>
NOTICE: AUDIT: SESSION,35,3,DDL,DROP TABLE,TABLE,public.do_table,DROP
table do_table,<none>

But in PG18 we now get:

NOTICE: AUDIT: SESSION,35,2,DDL,CREATE
TABLE,TABLE,public.do_table,"CREATE TABLE do_table (""weird name""
INT)",<none>
NOTICE: AUDIT: SESSION,35,3,DDL,DROP
TABLE,TABLE,public.do_table,"CREATE TABLE do_table (""weird name"" INT)

; DROP table do_table",<none>

The create table statement is fine but the drop table statement has
create table glued to it. I tried removing the extra quotes, whitespace,
etc. from our test and the result is the same.

I had a look through 499edb0 and did not see anything obvious. It is
possible that I am missing a required change on my side but since the
commit did not make any code changes to pg_stat_statements I do not
think so. It is also possible that the regression is not coming from
499edb0 but I do not see another obvious candidate.

Thoughts?

Thanks,
-David

#2Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#1)
Re: Regression in statement locations

On Mon, May 19, 2025 at 02:31:24PM +0000, David Steele wrote:

But in PG18 we now get:

NOTICE: AUDIT: SESSION,35,2,DDL,CREATE TABLE,TABLE,public.do_table,"CREATE
TABLE do_table (""weird name"" INT)",<none>
NOTICE: AUDIT: SESSION,35,3,DDL,DROP TABLE,TABLE,public.do_table,"CREATE
TABLE do_table (""weird name"" INT)

; DROP table do_table",<none>

The create table statement is fine but the drop table statement has create
table glued to it. I tried removing the extra quotes, whitespace, etc. from
our test and the result is the same.

Yes, I've noticed this one while looking at the output generated by
pgaudit through the new logic when I've posted this message:
/messages/by-id/ZzqXU-2uTWzC38jO@paquier.xyz

If I remember what was on my mind back then (don't have this \r\n case
specifically written down on my notes from last November as far as I
can see now), I was thinking that this new output is expected as this
is the full query string for the nested statement passed down to the
EXECUTE clause done in this PL function. Anthonin, what's your take?

I am adding Jian in CC, who also participated on the thread.
--
Michael

#3Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#2)
Re: Regression in statement locations

It is also possible that the regression is not coming from
499edb0 but I do not see another obvious candidate.

I used pg_stat_statements to repro the issue, and a bisect
resulted in 499edb0 being the source of the regression.

```
select pg_stat_statements_reset();

set pg_stat_statements.track='all';
DO $$
DECLARE
BEGIN
EXECUTE 'CREATE TABLE do_table (weird_name INT); DROP table do_table';
END $$;
select query from pg_stat_statements where not toplevel;
```

WITHOUT a semicolon at the end of the statements, as reported by David

```
EXECUTE 'CREATE TABLE do_table (weird_name INT); DROP table do_table';
```

I can reproduce the issue

query
-------------------------------------------------------------
CREATE TABLE do_table (weird_name INT); DROP table do_table
CREATE TABLE do_table (weird_name INT)
(2 rows)

WITH a semicolon at the end of the statements
```
EXECUTE 'CREATE TABLE do_table (weird_name INT); DROP table do_table;';
```

I cannot reproduce the issue

query
----------------------------------------
DROP table do_table
CREATE TABLE do_table (weird_name INT)
(2 rows)

I am still not sure why this is the case, but wanted to share this
for now.

--
Sami Imseih
Amazon Web Services (AWS)

#4Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#3)
Re: Regression in statement locations

On Mon, May 19, 2025 at 05:10:14PM -0500, Sami Imseih wrote:

I am still not sure why this is the case, but wanted to share this
for now.

Hmm. Something seems to not be compiling well for the final query of
a stmtmulti in gram.y with updateRawStmtEnd(), as we rely on the
position of the semicolon to decide what the stmt_len should be with
multiple querues. We don't set the stmt_len without the semicolon,
causing pgss to reuse the full query length when storing the entry of
the last query.

With the semicolon in place, stmt_len gets set for the last query of
the string. Still digging more..
--
Michael

#5Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#4)
Re: Regression in statement locations

On Tue, May 20, 2025 at 08:38:47AM +0900, Michael Paquier wrote:

With the semicolon in place, stmt_len gets set for the last query of
the string. Still digging more..

And got it. The problem is that we are failing to update the
statement location in a couple of cases with subqueries, and that we
should handle (p_stmt_len == 0) as of using the remaining bytes in
the string when a location is available, but the code was too
aggressive in thinking that the length = 0 case should be always
discarded.

Once I have fixed that, I've been a bit puzzled by the difference in
output in the tests of pg_overexplain, but I think that the new output
is actually the correct one: the queries whose plan outputs have
changed are passed as arguments of the explain_filter() function,
hence the location of the inner queries point at the start location of
the inner query instead of the start of the top-level query. Note
that if you add a semicolon at the end of these three queries in the
pg_overexplain tests, we finish with an end location reported.

I have also played with 499edb0 reverted and noted that the results of
pg_overexplain were inconsistent when the module has been originally
introduced, with two queries choking a bit.
--
Michael

Attachments:

0001-Fix-regression-with-statement-location-of-some-subqu.patchtext/x-diff; charset=us-asciiDownload+102-16
#6jian he
jian.universality@gmail.com
In reply to: Michael Paquier (#5)
Re: Regression in statement locations

On Tue, May 20, 2025 at 11:59 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, May 20, 2025 at 08:38:47AM +0900, Michael Paquier wrote:

With the semicolon in place, stmt_len gets set for the last query of
the string. Still digging more..

And got it. The problem is that we are failing to update the
statement location in a couple of cases with subqueries, and that we
should handle (p_stmt_len == 0) as of using the remaining bytes in
the string when a location is available, but the code was too
aggressive in thinking that the length = 0 case should be always
discarded.

I was thinking of using strlen(pstate->p_sourcetext) - qry->stmt_location;
in setQueryLocationAndLength,
then I saw your changes in the pg_overexplain module.

maybe put some comments on top of setQueryLocationAndLength
for (qry->stmt_len == 0) case, but i saw the comments on the bottom
of setQueryLocationAndLength.

Overall, it looks good to me.

#7Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: jian he (#6)
Re: Regression in statement locations

On Tue, May 20, 2025 at 5:59 AM Michael Paquier <michael@paquier.xyz> wrote:

Once I have fixed that, I've been a bit puzzled by the difference in
output in the tests of pg_overexplain, but I think that the new output
is actually the correct one: the queries whose plan outputs have
changed are passed as arguments of the explain_filter() function,
hence the location of the inner queries point at the start location of
the inner query instead of the start of the top-level query. Note
that if you add a semicolon at the end of these three queries in the
pg_overexplain tests, we finish with an end location reported.

Indeed, by dumping more details on the query string and parsed string
in pg_overexplain, the reported location does match the inner SELECT.
This seems appropriate since it is for the planned select statement.

Executor Parameter Types: none
Query String: EXPLAIN (DEBUG, RANGE_TABLE, COSTS OFF) SELECT
genus, array_agg(name ORDER BY name) FROM vegetables GROUP BY genus
Parse Location: 41 to end
Parsed String: SELECT genus, array_agg(name ORDER BY name) FROM
vegetables GROUP BY genus

Looking at the tests, there are 2 additionals empty DO blocks:
+DO $$
+DECLARE BEGIN
+END $$;

What's the point of those? They won't be visible in the output since
we have 'toplevel IS FALSE' in the pg_stat_statements calls and they
don't fit the "DO block --- multiple inner queries with separators".

Other than that, the patch looks good.

#8Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#7)
Re: Regression in statement locations

On Tue, May 20, 2025 at 09:58:04AM +0200, Anthonin Bonnefoy wrote:

Looking at the tests, there are 2 additionals empty DO blocks:
+DO $$
+DECLARE BEGIN
+END $$;

What's the point of those? They won't be visible in the output since
we have 'toplevel IS FALSE' in the pg_stat_statements calls and they
don't fit the "DO block --- multiple inner queries with separators".

That's a copy-pasto. Will remove.

Other than that, the patch looks good.

Thanks for the review, Anthonin and Jian.
--
Michael

#9Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#8)
Re: Regression in statement locations

Tested the patch and it looks good to me.

Not that I thought it would fail, but I also confirmed the pgaudit case
works as expected.

```
LOG: AUDIT: SESSION,10,2,DDL,CREATE TABLE,,,"CREATE TABLE do_table
(""weird name""
INT)",<not logged>
LOG: AUDIT: SESSION,10,3,DDL,DROP TABLE,,,DROP table do_table,<not logged>
DO
```

--
Sami

#10David Steele
david@pgmasters.net
In reply to: Sami Imseih (#9)
Re: Regression in statement locations

On 5/20/25 07:34, Sami Imseih wrote:

Tested the patch and it looks good to me.

Not that I thought it would fail, but I also confirmed the pgaudit case
works as expected.

I also tested and everything looks good with the patch.

I did a careful examination of the remaining diffs (there are quite a
few) and in every case I consider them to be beneficial, i.e. they make
the output more targeted and readable.

I did not do a real code review, but I did notice that the test table
column is called weird_name as in our tests. I would argue that since it
is missing the quotes and space it is not really all that weird and
should maybe get a normal name so developers in the future don't wonder
what is weird about it.

Thank you for this improvement and the quick fix!

Regards,
-David

#11Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#10)
Re: Regression in statement locations

On Tue, May 20, 2025 at 02:04:20PM +0000, David Steele wrote:

I did a careful examination of the remaining diffs (there are quite a few)
and in every case I consider them to be beneficial, i.e. they make the
output more targeted and readable.

I did not do a real code review, but I did notice that the test table column
is called weird_name as in our tests. I would argue that since it is missing
the quotes and space it is not really all that weird and should maybe get a
normal name so developers in the future don't wonder what is weird about it.

I have fixed that, as it is not a weird attribute, removed the
unnedeed DO blocks in the tests as pointed out by Anthonin, and moved
one comment as pointed out by Jian.

Thank you for this improvement and the quick fix!

Yeah, thanks all for pointing out that sometimes my analysis of things
can go off tracks.

The fix has been applied now on HEAD. I've also checked the state of
pgaudit on branch dev-pg18, with the regression getting fixed. Things
look clear now, at least from my side.
--
Michael

#12David Steele
david@pgmasters.net
In reply to: Michael Paquier (#11)
Re: Regression in statement locations

On 5/20/25 21:28, Michael Paquier wrote:

The fix has been applied now on HEAD. I've also checked the state of
pgaudit on branch dev-pg18, with the regression getting fixed. Things
look clear now, at least from my side.

Just retested and it looks good from my side, too. Thanks again!

-David