BUG: pg_stat_statements query normalization issues with combined queries

Started by Fabien COELHOover 9 years ago52 messageshackers
Jump to latest
#1Fabien COELHO
coelho@cri.ensmp.fr

Hello,

While investigating a performance issue, I tried to get informations from
pg_stat_statements, however I ran into another issue: it seems that when
using combined queries pg_stat_statements query normalization does not
work properly... 2 queries that should have been mapped to only one are
instead map to... 3 cases, as constants are not all ignored:

query

BEGIN ; +
SELECT data FROM Stuff WHERE id = 1 ; +
SELECT data FROM Stuff WHERE id = 2 ; +
SELECT data FROM Stuff WHERE id = 3 ; +
COMMIT;

BEGIN ; +
SELECT data FROM Stuff WHERE id = 4 ; +
SELECT data FROM Stuff WHERE id = 5 ; +
SELECT data FROM Stuff WHERE id = 6 ; +
COMMIT;

BEGIN ; +
SELECT data FROM Stuff WHERE id = ? +
SELECT data FROM Stuff WHERE id = 2 +
SELECT data FROM Stuff WHERE id = 3 +
COMMIT;

I was expecting the 2 combined queries either to be separated in
individual queries "SELECT data FROM Stuff WHERE id = ?" or in one large
queries with three ?, but not the above result...

--
Fabien

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fabien COELHO (#1)
Re: BUG: pg_stat_statements query normalization issues with combined queries

I was expecting the 2 combined queries either to be separated in individual
queries "SELECT data FROM Stuff WHERE id = ?" or in one large queries with
three ?, but not the above result...

Oops, I forgot the attachement, see repeat script on 9.6.1

--
Fabien.

Attachments:

combined_statement_issue.sqlapplication/x-sql; name=combined_statement_issue.sqlDownload
#3Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fabien COELHO (#2)
Re: BUG: pg_stat_statements query normalization issues with combined queries

I was expecting the 2 combined queries either to be separated in individual
queries "SELECT data FROM Stuff WHERE id = ?" or in one large queries with
three ?, but not the above result...

Oops, I forgot the attachement, see repeat script on 9.6.1

After some quick investigation, I concluded that the issue is that the
whole combined query string is used instead of the part refering to the
actual query being processed.

As a result:

- the full un-normalized string is used for both BEGIN & COMMIT
=> n shared entries, one for each actual query, begin & commit are mixed together

- the fist-part only normalized string is used for each SELECT
=> 1 shared entry with "query" is the first partially
normalied encountered combined query

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fabien COELHO (#3)
Re: BUG: pg_stat_statements query normalization issues with combined queries

Hello again pgdevs,

More investigations conclude that the information is lost by the parser.

From a ;-separated string raw_parser returns a List<Node> which are
directly the nodes of the statements, with empty statements silently
skipped.

The Node entry of high level statements (*Stmt) does NOT contain any
location information, only some lower level nodes have it.

A possible fix of this would be to add the query string location
information systematically at the head of every high level *Stmt, which
would then share both NodeTag and this information (say location and
length). This would be a new intermediate kind of node of which all these
statements would "inherit", say a "ParseNode" structure.

The actual location information may be filled-in at quite a high level in
the parser, maybe around "stmtmulti" and "stmt" rules, so it would not
necessarily be too intrusive in the overall parser grammar.

Then once the information is available, the query string may be cut where
appropriate to only store the relevant string in pg_stat_statements or
above.

Would this approach be acceptable, or is modifying Nodes a no-go area?

If it is acceptable, I can probably put together a patch and submit it.

If not, I suggest to update the documentation to tell that
pg_stat_statements does not work properly with combined queries.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Robert Haas
robertmhaas@gmail.com
In reply to: Fabien COELHO (#4)
Re: BUG: pg_stat_statements query normalization issues with combined queries

On Tue, Dec 20, 2016 at 6:18 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Would this approach be acceptable, or is modifying Nodes a no-go area?

If it is acceptable, I can probably put together a patch and submit it.

If not, I suggest to update the documentation to tell that
pg_stat_statements does not work properly with combined queries.

I think you've found a bug, but I'm a little doubtful about your
proposed fix. However, I haven't studied the code, so I don't know
what other approach might be better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#5)
Re: BUG: pg_stat_statements query normalization issues with combined queries

At Tue, 20 Dec 2016 22:42:48 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZT94brLAGK7gCmxB4mO=C-Cdz1N8KN8Xen4sexHozN=Q@mail.gmail.com>

On Tue, Dec 20, 2016 at 6:18 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Would this approach be acceptable, or is modifying Nodes a no-go area?

If it is acceptable, I can probably put together a patch and submit it.

If not, I suggest to update the documentation to tell that
pg_stat_statements does not work properly with combined queries.

I think you've found a bug, but I'm a little doubtful about your
proposed fix. However, I haven't studied the code, so I don't know
what other approach might be better.

That will work and doable, but the more fundamental issue here
seems to be that post_parse_analyze_hook or other similar hooks
are called with a Query incosistent with query_debug_string. It
is very conveniently used but the name seems to me to be
suggesting that such usage is out of purpose. I'm not sure,
though.

A similar behavior is shown in error messages but this harms far
less than the case of pg_stat_statements.

ERROR: column "b" does not exist
LINE 1: ...elect * from t where a = 1; select * from t where b = 2; com...
^

1. Let all of the parse node have location in
debug_query_string. (The original proposal)

2. Let the parser return once for each statement (like psql
parser) and corresponding query string is stored in a
varialble other than debug_query_string.

...

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Kyotaro Horiguchi (#6)
Re: BUG: pg_stat_statements query normalization issues with combined queries

Hello Robert & Kyotaro,

I'm a little doubtful about your proposed fix. However, I haven't
studied the code, so I don't know what other approach might be better.

That is indeed the question!

The key point is that the parser parses several statements from a string,
but currently there is no clue about where the queries was found in the
string. Only a parser may know about what is being parsed so can generate
the location information. Now the possible solutions I see are:

- the string is split per query before parsing, but this requires at
least a lexer... it looks pretty uneffective to have another lexing
phase involved, even if an existing lexer is reused.

- the parser processes one query at a time and keep the "remaining
unparse string" in some state that can be queried to check up to
where it proceeded, but then the result must be stored somewhere
and it would make sense that it would be in the statement anyway,
just the management of the location information would be outside
the parser. Also that would add the cost of relaunching the parser,
not sure how bad or insignificant this is. This is (2) below.

- the parser still generates a List<Node> but keep track of the location
of statements doing so, somehow... propably as I outlined.

The query string information can be at some way of pointing in the initial
string, or the substring itself that could be extracted at some point.
I initially suggested the former because this is already what the parser
does for some nodes, and because it seems simpler to do so.

Extracting the string instead would suggest that the location of tokens
within this statement are relative to this string rather than the initial
one, but that involves a lot more changes and it is easier to miss
something doing so.

That will work and doable, but the more fundamental issue here
seems to be that post_parse_analyze_hook or other similar hooks
are called with a Query incosistent with query_debug_string.

Sure.

It is very conveniently used but the name seems to me to be suggesting
that such usage is out of purpose. I'm not sure, though.

A similar behavior is shown in error messages but this harms far
less than the case of pg_stat_statements.

ERROR: column "b" does not exist
LINE 1: ...elect * from t where a = 1; select * from t where b = 2; com...
^

Ah, I wrote this piece of code a long time ago:-) The location is relative
to the full string, see comment above, changing that would involve much
more changes, and I'm not sure whether it is desirable.

Also, think of:

SELECT * FROM foo; DROP TABLE foo; SELECT * FROM foo;

Maybe the context to know which "SELECT * FROM foo" generates an error
should be kept.

1. Let all of the parse node have location in
debug_query_string. (The original proposal)

Yep.

2. Let the parser return once for each statement (like psql
parser)

I'm not sure it does... "\;" generates ";" in the output and the psql
lexer keeps on lexing.

and corresponding query string is stored in a
varialble other than debug_query_string.

I think that would involve many changes because of the way postgres is
written, the list is expected and returned by quite a few functions.
Moreover query rewriting may generate several queries out of one anyway,
so the list would be kept.

So I'm rather still in favor of my initial proposal, that is extend the
existing location information to statements, not only some tokens.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fabien COELHO (#7)
Re: BUG: pg_stat_statements query normalization issues with combined queries

At Wed, 21 Dec 2016 09:28:58 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.20.1612210841370.3892@lancre>

Hello Robert & Kyotaro,

I'm a little doubtful about your proposed fix. However, I haven't
studied the code, so I don't know what other approach might be better.

That is indeed the question!

The key point is that the parser parses several statements from a
string, but currently there is no clue about where the queries was
found in the string. Only a parser may know about what is being parsed
so can generate the location information. Now the possible solutions I
see are:

- the string is split per query before parsing, but this requires at
least a lexer... it looks pretty uneffective to have another lexing
phase involved, even if an existing lexer is reused.

I don't see this is promising. Apparently a waste of CPU cycles.

- the parser processes one query at a time and keep the "remaining
unparse string" in some state that can be queried to check up to
where it proceeded, but then the result must be stored somewhere
and it would make sense that it would be in the statement anyway,
just the management of the location information would be outside
the parser. Also that would add the cost of relaunching the parser,
not sure how bad or insignificant this is. This is (2) below.

I raised this as a spoiler, I see this is somewhat too invasive
for the problem to be solved.

- the parser still generates a List<Node> but keep track of the location
of statements doing so, somehow... propably as I outlined.

Yeah, this seems most reasonable so far. It seems to me to be
better that the statement location is conveyed as a part of a
parse tree so as not to need need a side channel for location.

I'd like to rename debug_query_string to more reasonable name if
we are allowed but perhaps not.

The query string information can be at some way of pointing in the
initial
string, or the substring itself that could be extracted at some point.
I initially suggested the former because this is already what the
parser does for some nodes, and because it seems simpler to do so.

Extracting the string instead would suggest that the location of
tokens
within this statement are relative to this string rather than the
initial one, but that involves a lot more changes and it is easier to
miss something doing so.

Agreed that copying statement string would be too much. But the
new *Stmt node should somehow have also the end location of the
statement since the user of a parse tree cannot look into another
one.

That will work and doable, but the more fundamental issue here
seems to be that post_parse_analyze_hook or other similar hooks
are called with a Query incosistent with query_debug_string.

Sure.

It is very conveniently used but the name seems to me to be suggesting
that such usage is out of purpose. I'm not sure, though.

A similar behavior is shown in error messages but this harms far
less than the case of pg_stat_statements.

ERROR: column "b" does not exist
LINE 1: ...elect * from t where a = 1; select * from t where b = 2;
com...
^

Ah, I wrote this piece of code a long time ago:-) The location is
relative to the full string, see comment above, changing that would
involve much
more changes, and I'm not sure whether it is desirable.

Also, think of:

SELECT * FROM foo; DROP TABLE foo; SELECT * FROM foo;

Maybe the context to know which "SELECT * FROM foo" generates an error
should be kept.

1. Let all of the parse node have location in
debug_query_string. (The original proposal)

Yep.

2. Let the parser return once for each statement (like psql
parser)

I'm not sure it does... "\;" generates ";" in the output and the psql
lexer keeps on lexing.

and corresponding query string is stored in a
varialble other than debug_query_string.

I think that would involve many changes because of the way postgres is
written, the list is expected and returned by quite a few
functions. Moreover query rewriting may generate several queries out
of one anyway, so the list would be kept.

So I'm rather still in favor of my initial proposal, that is extend
the existing location information to statements, not only some tokens.

I thought that it's too much to let all types of parse node have
location but grepping the gram.y with "makeNode" pursuaded me to
agree with you. After changing all *Stmt nodes, only several
types of nodes seems missing it.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Kyotaro Horiguchi (#8)
Re: BUG: pg_stat_statements query normalization issues with combined queries

Hello Kyotaro-san,

[...] Agreed that copying statement string would be too much. But the
new *Stmt node should somehow have also the end location of the
statement since the user of a parse tree cannot look into another one.

Yes. I was thinking of adding a "length" field next to "location", where
appropriate.

So I'm rather still in favor of my initial proposal, that is extend
the existing location information to statements, not only some tokens.

I thought that it's too much to let all types of parse node have
location but grepping the gram.y with "makeNode" pursuaded me to
agree with you.

Indeed...

After changing all *Stmt nodes, only several types of nodes seems
missing it.

Yes. I'll try to put together a patch and submit it to the next CF.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fabien COELHO (#9)
Re: BUG: pg_stat_statements query normalization issues with combined queries

Yes. I'll try to put together a patch and submit it to the next CF.

Here it is. I'll add this to the next CF.

This patch fixes the handling of compound/combined queries by
pg_stat_statements (i.e. several queries sent together, eg with psql:
"SELECT 1 \; SELECT 2;").

This bug was found in passing while investigating a strange performance
issue I had with compound statements.

Collect query location information in parser:

* create a ParseNode for statements with location information fields
(qlocation & qlength).

* add these fields to all parsed statements (*Stmt), Query and PlannedStmt.

* statements location is filled in:

- the lexer keep track of the last ';' encountered.

- the information is used by the parser to compute the query length,
which depends on whether the query ended on ';' or on the input end.

- the query location is propagated to Query and PlannedStmt when built.

Fix pg_stat_statement:

* the information is used by pg_stat_statement so as to extract the relevant part
of the query string, including some trimming.

* pg_stat_statement validation is greatly extended so as to test options
and exercise various cases, including compound statements.

note 1: two non-statements tags (use for alter table commands) have been
moved so that all statements tags are contiguous and easy to check.

note 2: the impact on the lexer & parser is quite minimal with this
approach, about 30 LOC, most of which in one function. The largest changes
are in the node header to add location fields.

note 3: the query length excludes the final separator, so that
;-terminated and end of input terminated queries show the same.

note 4: the added test suggests that when tracking "all", queries in SQL
user functions are not tracked, this might be a bug.

--
Fabien.

Attachments:

parsenodes-1.patchtext/x-diff; name=parsenodes-1.patchDownload+808-31
#11Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fabien COELHO (#10)
Re: BUG: pg_stat_statements query normalization issues with combined queries

Yes. I'll try to put together a patch and submit it to the next CF.

Here it is. I'll add this to the next CF.

Oops... better without a stupid overflow. Shame on me!

--
Fabien.

Attachments:

parsenodes-2.patchtext/x-diff; name=parsenodes-2.patchDownload+808-31
#12Craig Ringer
craig@2ndquadrant.com
In reply to: Robert Haas (#5)
Re: BUG: pg_stat_statements query normalization issues with combined queries

On 21 Dec. 2016 11:44, "Robert Haas" <robertmhaas@gmail.com> wrote:

On Tue, Dec 20, 2016 at 6:18 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Would this approach be acceptable, or is modifying Nodes a no-go area?

If it is acceptable, I can probably put together a patch and submit it.

If not, I suggest to update the documentation to tell that
pg_stat_statements does not work properly with combined queries.

I think you've found a bug, but I'm a little doubtful about your
proposed fix. However, I haven't studied the code, so I don't know
what other approach might be better.

FWIW this issue with multi-statements also causes issues with
ProcessUtility_hook. It gets the char* querytext of the whole
multistatement. Then gets invoked once for each utility command within. It
has no information about which statement text the current invocation
corresponds to.

Having a.pointer into the query text for the start and end would be good
there too. Not as good as doing away with multis entirely as a bad hack but
that's not practical for BC and protocol reasons.

BTW we should be sure the somewhat wacky semantics of multi-statements with
embedded commits are documented. I'll check tomorrow.

#13Craig Ringer
craig@2ndquadrant.com
In reply to: Fabien COELHO (#10)
Re: BUG: pg_stat_statements query normalization issues with combined queries

On 23 Dec. 2016 17:53, "Fabien COELHO" <coelho@cri.ensmp.fr> wrote:

Yes. I'll try to put together a patch and submit it to the next CF.

Here it is. I'll add this to the next CF.

Great. Please put me (ringerc) down as a reviewer. I'll get to this as soon
as holidays settle down. It'd be very useful for some things I'm working on
too, and is much better then my early draft of similar functionality.

#14Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Craig Ringer (#13)
Re: BUG: pg_stat_statements query normalization issues with combined queries

Hello Craig,

Please put me (ringerc) down as a reviewer.

Done.

Please do not loose time reviewing stupid version 1... skip to version 2
directly:-)

Also, note that the query-location part may be considered separately from
the pg_stat_statements part which uses it.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fabien COELHO (#14)
Re: BUG: pg_stat_statements query normalization issues with combined queries

Hello,

At Mon, 26 Dec 2016 16:00:57 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.20.1612261554510.4911@lancre>

Hello Craig,

Please put me (ringerc) down as a reviewer.

Done.

Please do not loose time reviewing stupid version 1... skip to version
2 directly:-)

Also, note that the query-location part may be considered separately
from the pg_stat_statements part which uses it.

In nonterminal stmtmulti, setQueryLocation is added and used to
calcualte and store the length of every stmt's. This needs an
extra storage in bse_yy_extra_type and introduces a bit
complicated stuff. But I think raw_parser can do that without
such extra storage and labor, then gram.y gets far simpler.

The struct member to store the location and length is named
'qlocation', and 'qlength' in the added ParseNode. But many nodes
already have 'location' of exactly the same purpopse. I don't see
a necessity to differentiate them.

Putting the two above together, the following is my suggestion
for the parser part.

- Make all parse nodes have the following two members at the
beginning. This unifies all parse node from the view of setting
their location. Commenting about this would be better.

| NodeTag type;
| int location;

- Remove struct ParseNode and setQueryLocation. stmtmulti sets
location in the same manner to other kind of nodes, just doing
'= @n'. base_yyparse() doesn't calculate statement length.

- Make raw_parser caluclate statement length then store it into
each stmt scanning the yyextra.parsetree.

What do you think about this?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#15)
Re: BUG: pg_stat_statements query normalization issues with combined queries

Hello,

At Tue, 27 Dec 2016 10:28:53 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20161227.102853.204155513.horiguchi.kyotaro@lab.ntt.co.jp>

Putting the two above together, the following is my suggestion
for the parser part.

- Make all parse nodes have the following two members at the
beginning. This unifies all parse node from the view of setting
their location. Commenting about this would be better.

| NodeTag type;
| int location;

- Remove struct ParseNode and setQueryLocation. stmtmulti sets
location in the same manner to other kind of nodes, just doing
'= @n'. base_yyparse() doesn't calculate statement length.

- Make raw_parser caluclate statement length then store it into
each stmt scanning the yyextra.parsetree.

An additional comment on parser(planner?) part.

This patch make planner() copy the location and length from
parse to result, but copying such stuffs is a job of
standard_planner.

Then the following is a comment on pg_stat_statements.c

- pg_stat_statements.c:977 - isParseNode(node)
node should be parsenode.

- The name for the addional parameter query_loc is written as
query_len in its prototype.

- In pgss_store, "else if (strlen(query)) != query_len)" doesn't
work as expected because of one-off error. query_len here is
the length of the query *excluding* the last semicolon.

- qtext_store doesn't rely on terminating nul character and the
function already takes query length as a parameter. So, there's
no need to copy the partial debug_query_string into palloc'ed
memory. Just replacing the query with query_loc will be
sufficient.

Have a nice holidays.

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Kyotaro Horiguchi (#15)
Re: BUG: pg_stat_statements query normalization issues with combined queries

Hello Kyotaro-san,

In nonterminal stmtmulti, setQueryLocation is added and used to
calcualte and store the length of every stmt's. This needs an
extra storage in bse_yy_extra_type

The extra storage is one int.

and introduces a bit complicated stuff. But I think raw_parser can do
that without such extra storage and labor,

How? The issue is that stmtmulti silently skip some ';' when empty
statements are found, so I need to keep track of that to know where to
stop, using the beginning location of the next statement, which is
probably your idea, does not work.

then gram.y gets far simpler.

The struct member to store the location and length is named
'qlocation', and 'qlength' in the added ParseNode. But many nodes
already have 'location' of exactly the same purpopse. I don't see
a necessity to differentiate them.

Alas, the "CreateTableSpaceStmt" struct already had a "char *location"
that I did not want to change... If I use "location", then I have to
change it, why not...

Another reason for the name difference is that qlocation/qlength
convention is slightly different from location when not set: location is
set manually to -1 when the information is not available, but this
convention is not possible for statements without changing all their
allocations and initializations (there are hundreds...), so the convention
I used for qlocation/qlength is that it is not set if qlength is zero,
which is the default value set by makeNode, so no change was needed...

Changing this point would mean a *lot* of possibly error prone changes in
the parser code everywhere statements are allocated...

Putting the two above together, the following is my suggestion
for the parser part.

- Make all parse nodes have the following two members at the
beginning. This unifies all parse node from the view of setting
their location. Commenting about this would be better.

| NodeTag type;
| int location;

Currently only a few parse nodes have locations, those about variables and
constants that can be designated by an error message. I added the
information to about 100 statements, but for the many remaining parse
nodes the information does not exist and is not needed, so I would prefer
to avoid adding a field both unset and unused.

- Remove struct ParseNode

"ParseNode" is needed to access the location and length of statements,
otherwise they are only "Node", which does not have a location.

and setQueryLocation.

The function is called twice, I wanted to avoid duplicating code.

stmtmulti sets location in the same manner to other kind of nodes, just
doing '= @n'. base_yyparse() doesn't calculate statement length.

But...

- Make raw_parser caluclate statement length then store it into
each stmt scanning the yyextra.parsetree.

... I cannot calculate the statement length cleanly because of empty
statements. If I know where the last ';' is, then the length computation
must be when the reduction occurs, hence the function called from the
stmtmulti rule.

What do you think about this?

I think that I do not know how to compute the length without knowing where
the last ';' was, because of how empty statements are silently skipped
around the stmtmulti/stmt rules, so I think that the length computation
should stay where it is.

What I can certainly do is:

- add more comments to explain why it is done like that

What I could do with some inputs from reviewers/committers is:

- rename the "char *location" field of the create table space statement
to "directory" or something else... but what?

- change qlocation/qlength to location/length...

However, then the -1 convention for location not set is not true, which
is annoying. Using this convention means hundreds of changes because every
statement allocation & initialization must be changed. This is
obviously possible, but is a much larger patch, which I cannot say
would be much better than this one, it would just be different.

What I'm reserved about:

- adding a location field to nodes that do not need it.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Kyotaro Horiguchi (#16)
Re: BUG: pg_stat_statements query normalization issues with combined queries

An additional comment on parser(planner?) part.

This patch make planner() copy the location and length from
parse to result, but copying such stuffs is a job of
standard_planner.

I put the copy in planner because standard_planner may not be called by
planner, and in all cases I think that the fields should be copied...

Otherwise it is the responsability of the planner hook to do the copy, it
would is ok if the planner hook calls standard_planner, but the fact is
not warranted, the comments says "the plugin would normally call
standard_planner". What if it does not?

So it seemed safer this way.

Then the following is a comment on pg_stat_statements.c

- pg_stat_statements.c:977 - isParseNode(node)
node should be parsenode.

Could be. ParseNode is a Node, it is just a pointer cast. I can assert on
pn instead of node.

- The name for the addional parameter query_loc is written as
query_len in its prototype.

Indeed, a typo on "generate_normalized_query" prototype.

- In pgss_store, "else if (strlen(query)) != query_len)" doesn't
work as expected because of one-off error. query_len here is
the length of the query *excluding* the last semicolon.

It was somehow intentional: if the query is not the same as the string,
then a copy is performed to have the right null-terminated string...
but

- qtext_store doesn't rely on terminating nul character and the
function already takes query length as a parameter. So, there's
no need to copy the partial debug_query_string into palloc'ed
memory. Just replacing the query with query_loc will be
sufficient.

Hmmm... it seemed good so I tried it, and it did not work...

The subtle reason is that qtext_store does assume that the query string is
null terminated... it just does not recompute the length its length.

However it can be changed to behave correctly in this case, so as to avoid
the copy.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fabien COELHO (#17)
Re: BUG: pg_stat_statements query normalization issues with combined queries

Fabien COELHO <coelho@cri.ensmp.fr> writes:

How? The issue is that stmtmulti silently skip some ';' when empty
statements are found, so I need to keep track of that to know where to
stop, using the beginning location of the next statement, which is
probably your idea, does not work.

Maybe you should undo that. I've generally found that trying to put
optimizations into the grammar is a bad idea; it's better to KISS in
gram.y and apply optimizations later on.

- Make all parse nodes have the following two members at the
beginning. This unifies all parse node from the view of setting
their location. Commenting about this would be better.

| NodeTag type;
| int location;

I do not like this. You are making what should be a small patch into
a giant, invasive one. I'd think about adding a single parse node type
that corresponds to "stmt" in the grammar and really doesn't do anything
but carry the location info for the overall statement. That info could
then be transferred into the resulting Query node. Problem solved with
(I think) very little code touched.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#19)
Re: BUG: pg_stat_statements query normalization issues with combined queries

Hello Tom,

How? The issue is that stmtmulti silently skip some ';' when empty
statements are found, [...]

Maybe you should undo that.

I was trying to be minimally invasive in the parser, i.e. not to change
any rules.

I've generally found that trying to put optimizations into the grammar
is a bad idea; it's better to KISS in gram.y and apply optimizations
later on.

I can change rules, but I'm not sure it will be better. It would allow to
remove the last ';' location in the lexer which Kyotar does not like, but
it would add some new stretching to compute the length of statements and
remove the empty statements.

I would say that it would be different, but not necessarily better.

- Make all parse nodes have the following two members at the
beginning. This unifies all parse node from the view of setting
their location. Commenting about this would be better.

| NodeTag type;
| int location;

I do not like this. You are making what should be a small patch into
a giant, invasive one.

I would not say that the current patch is giant & invasive, if you
abstract the two added fields to high-level statements.

I'd think about adding a single parse node type that corresponds to
"stmt" in the grammar and really doesn't do anything but carry the
location info for the overall statement. That info could then be
transferred into the resulting Query node.

I'm not sure I understand your suggestion. Currently I have added the
location & length information to all high-level statements nodes, plus
query and planned. I think that planned is necessary for utility
statements.

I understand that you suggest to add a new intermediate structure:

typedef struct {
NodeTag tag;
int location, length;
Node *stmt;
} ParseNode;

So that raw_parser would return a List<ParseNode> instead of a List<Node>,
and the statements would be unmodified.

Problem solved with (I think) very little code touched.

Hmmm...

Then raw_parser() callers have to manage a List<ParseNode> instead of the
List<Node>, I'm not sure of the size of the propagation to manage the
added indirection level appropriately: raw_parser is called 4 times (in
parser, tcop, commands, plpgsql...). The first call in tcop is
copyObject(), equal(), check_log_statement(), errdetail_execute()...

Probably it can be done, but I'm moderately unthousiastic: it looks like
starting unwinding a ball of wool, and I'm not sure where it would stop,
as it means touching at least the 4 uses of raw_parser in 4 distinct part
of postgres, whereas the current approach did not change anything for
raw_parser callers, although at the price of modifying all statement
nodes.

Did I read you correctly?

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fabien COELHO (#20)
#22Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#21)
#23Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#21)
#24Craig Ringer
craig@2ndquadrant.com
In reply to: Fabien COELHO (#23)
#25Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Craig Ringer (#24)
#26Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fabien COELHO (#23)
#27Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fabien COELHO (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kyotaro Horiguchi (#27)
#29Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fabien COELHO (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#30)
#32Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#30)
#33Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#31)
#34Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fabien COELHO (#29)
#35Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fabien COELHO (#33)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kyotaro Horiguchi (#34)
#37Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#30)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#37)
#39Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#38)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fabien COELHO (#39)
#41Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fabien COELHO (#41)
#43Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#43)
#45Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#44)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#45)
#47Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tom Lane (#46)
#48Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#46)
#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#48)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kyotaro Horiguchi (#47)
#51Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#50)
#52Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Craig Ringer (#51)