[PROPOSAL] extend the object names to the qualified names in pg_stat_statements

Started by Sergei Agalakovover 7 years ago14 messageshackers

sergei.agalakov@gmail.com

over 7 years ago

Hi,

It would help to analyze performance issues if pg_stat_statements would
extend the object names to the qualified names.
Currently if we have two schemas ( say s1 and s2) with the objects with
the same name ( say tables t1) then after the next executions:

set schema 's1';
select count(*) from t1; // returns 10
set schema 's2';
select count(*) from t1; // returns 1000000000

we see in
select queryid, query from pg_stat_statements where query like '%from t1%'
something like this
3004391594 select count(*) from t1
1336375111 select count(*) from t1

We do see that the queries are different but we can't see why they are
so much different in the execution time.
If the pg_stat_statements module would extend the object name to the
qualified names like s1.t1 and s2.t2 in the new column query_qn then we
would see the report as
select queryid, query_qn from pg_stat_statements where query like '%from
t1%'
3004391594 select count(*) from s1.t1
1336375111 select count(*) from s2.t1
with an immediate understanding of what's going on.

This problem isn't only about table names. The SQL statement may refer
to the views, functions, operands etc. from the 'wrong' schema.
Obviously it would be even bigger help in the situations with the more
complex queries where it will be mush more difficult to find
that the query was executed with the incorrect search_path settings.

This change doesn't brake any existing functionality and will be easily
utilized in the monitoring scripts and tools.

Thank you,

Sergei Agalakov

It was discussed in pgsql-general, and now it seems to be ready as a
proposal to pgsql-hackers.
/messages/by-id/372d75cc-053b-1a07-948f-089408d3cd3a@gmail.com

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Sergei Agalakov (#1)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

Sergei Agalakov <sergei.agalakov@gmail.com> writes:

It would help to analyze performance issues if pg_stat_statements would
extend the object names to the qualified names.

What pg_stat_statements puts out is the original query text. As was
already pointed out to you, changing that text is likely to break
use-cases in which people are trying to match entries to actual
queries or log entries. This would also entail rather significant
overhead to find out schema names and interpolate them into the text.

regards, tom lane

Alvaro Herrera

alvherre@2ndquadrant.com

over 7 years ago

In reply to: Tom Lane (#2)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

On 2018-Nov-28, Tom Lane wrote:

Sergei Agalakov <sergei.agalakov@gmail.com> writes:

It would help to analyze performance issues if pg_stat_statements would
extend the object names to the qualified names.

What pg_stat_statements puts out is the original query text. As was
already pointed out to you, changing that text is likely to break
use-cases in which people are trying to match entries to actual
queries or log entries.

It's not immediately obvious, but he is proposing a _new_ column
query_qn that has qualified names, leaving the current query column
unchanged.

This would also entail rather significant overhead to find out schema
names and interpolate them into the text.

True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 7 years ago

In reply to: Alvaro Herrera (#3)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

On 11/28/18 10:46 PM, Alvaro Herrera wrote:

On 2018-Nov-28, Tom Lane wrote:

Sergei Agalakov <sergei.agalakov@gmail.com> writes:

It would help to analyze performance issues if pg_stat_statements would
extend the object names to the qualified names.

What pg_stat_statements puts out is the original query text. As was
already pointed out to you, changing that text is likely to break
use-cases in which people are trying to match entries to actual
queries or log entries.

It's not immediately obvious, but he is proposing a _new_ column
query_qn that has qualified names, leaving the current query column
unchanged.

This would also entail rather significant overhead to find out schema
names and interpolate them into the text.

True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.

Wouldn't it be sufficient / better to just store the search_path used
when executing the query? That should be enough to resolve the object
names correctly, and the overhead would be much lower (both in terms
extra space and CPU overhead).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Alvaro Herrera (#3)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-28, Tom Lane wrote:

This would also entail rather significant overhead to find out schema
names and interpolate them into the text.

True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.

Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)

regards, tom lane

Sergei Agalakov

sergei.agalakov@gmail.com

over 7 years ago

In reply to: Tom Lane (#5)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

It's a real problem. I saw this pattern more than once already.
The people have several schemas with identical data structures as a
preparation to eventual migration of the schema to its own server in the
cloud.
So you have ten schemas, one generic user from connection pool and
pg_stat_statements is useless to identify a performance problem for a
schema.
You see ten different records with the different queryid and identical
query text, you see that one record indicates a performance issue
for this query in one particular schema, and you can't say what schema
it is. A report that groups queries by the text may hide this problem
altogether
because in average the statistics are OK.
I hoped that it would be possible to reassemble the query text after
parsing where the names were already uniquely resolved.

Thank you,

Sergei

Show quoted text

On 11/28/2018 2:59 PM, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-28, Tom Lane wrote:

This would also entail rather significant overhead to find out schema
names and interpolate them into the text.

True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.

Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)

regards, tom lane

legrand legrand

legrand_legrand@hotmail.com

over 7 years ago

In reply to: Alvaro Herrera (#3)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

Alvaro Herrera-9 wrote

On 2018-Nov-28, Tom Lane wrote:

This would also entail rather significant overhead to find out schema
names and interpolate them into the text.

True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.

maybe that explain on "table with many columns" suffers from the same
problem, see:

/messages/by-id/1537816948278-0.post@n3.nabble.com

/messages/by-id/20180924195048.7dhnjvq5ggwjtiwi@alap3.anarazel.de

regards
PAscal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Christoph Berg

myon@debian.org

over 7 years ago

In reply to: Tomas Vondra (#4)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

Re: Tomas Vondra 2018-11-28 <1b4e4c5e-7007-cd61-aae5-4a1c248e385c@2ndquadrant.com>

Wouldn't it be sufficient / better to just store the search_path used
when executing the query? That should be enough to resolve the object
names correctly, and the overhead would be much lower (both in terms
extra space and CPU overhead).

That would also work better for sub-queries from functions, if you
wanted to re-run the query in question. It's closer to reality.

Christoph

Alvaro Herrera

alvherre@2ndquadrant.com

over 7 years ago

In reply to: Tom Lane (#5)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

On 2018-Nov-28, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-28, Tom Lane wrote:

This would also entail rather significant overhead to find out schema
names and interpolate them into the text.

True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.

Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)

Good point.

Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.

Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10

Stephen Frost

sfrost@snowman.net

over 7 years ago

In reply to: Alvaro Herrera (#9)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

Greetings,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

On 2018-Nov-28, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-28, Tom Lane wrote:

This would also entail rather significant overhead to find out schema
names and interpolate them into the text.

True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.

Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)

Good point.

Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.

Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.

Seems like what you'd really want is to store all the environment, not
just the search_path (consider the $user case...). Maybe saving just
the OIDs of the search_path and then using them later would also work
but it seems like we're just building up to tracking everything and
doing it piecemeal with an extra column added in this release, another
in the next release, etc..

Thanks!

Stephen

#11

Sergei Agalakov

sergei.agalakov@gmail.com

over 7 years ago

In reply to: Alvaro Herrera (#9)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

On 11/29/2018 10:47 AM, Alvaro Herrera wrote:

On 2018-Nov-28, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-28, Tom Lane wrote:

This would also entail rather significant overhead to find out schema
names and interpolate them into the text.

True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.

Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)

Good point.

Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.

Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.

I thought about just saving the search_path. It has all the necessary
information for a DBA or a developer, but creates problems for reporting
tools.
If we have the new query_qn column then we can group statistics by
query_qn and display it on the charts and graphs.
If instead we use a combination of query and search_path then a
reporting tools has to figure out the way to show both values to
distinguish between
different versions of query.

If it is easier/faster to add search_path then let's add search_path
instead of query_qn. It is already documented that query text isn't
unique by itself,
and reporting tools have to use query+queryid for uniqueness, and
search_path will provide the currently unavailable information to the
DBAs/developers.
Good.

#12

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Alvaro Herrera (#9)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-28, Tom Lane wrote:

Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.

Good point.

Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.

Yeah, and any subsequent DDL on relevant tables would break it.

Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.

This'd be OK by me, though I'm not sure that it represents a convenient
solution for the original problem.

regards, tom lane

#13

Sergei Agalakov

sergei.agalakov@gmail.com

over 7 years ago

In reply to: Stephen Frost (#10)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

On 11/29/2018 10:59 AM, Stephen Frost wrote:

Greetings,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

On 2018-Nov-28, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-28, Tom Lane wrote:

This would also entail rather significant overhead to find out schema
names and interpolate them into the text.

True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.

Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)

Good point.

Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.

Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.

Seems like what you'd really want is to store all the environment, not
just the search_path (consider the $user case...). Maybe saving just
the OIDs of the search_path and then using them later would also work
but it seems like we're just building up to tracking everything and
doing it piecemeal with an extra column added in this release, another
in the next release, etc..

Thanks!

Stephen

It's a valid concern. Instead of the adding just search_path column we
can add a column session_info jsonb.
Its content can be defined by the new configuration parameter

pg_stat_statements.session_info ('current_schemas, current_user,
session_user') // a subset of the data from the system information functions

and it will have data like
{
ï¿½ï¿½ï¿½ "current_schemas" : ["pg_catalog", "s1", "s2", "public"],
ï¿½ï¿½ï¿½ "current_user" : "user1",
ï¿½ï¿½ï¿½ "session_user" : "user1"
}

It will allow the DBA/developer to reproduce a performance issue, and
will allow the deeper level of granularity for the reporting tools.
It's more complex than I have anticipated but doesn't break backward
compatibility and extensible.

Thank you,

Sergei

#14

Sergei Agalakov

sergei.agalakov@gmail.com

over 7 years ago

In reply to: Tom Lane (#12)

Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

On 11/29/2018 12:46 PM, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-28, Tom Lane wrote:

Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.

Good point.
Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.

Yeah, and any subsequent DDL on relevant tables would break it.

Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.

This'd be OK by me, though I'm not sure that it represents a convenient
solution for the original problem.

regards, tom lane

It's less convenient, but it presents a solution. Either that or even
better a new session_info column that
may capture together with search_path also current_user, session_user
etc. that would allow to distinct
between the different versions of the query based on the real context of
execution.

Thank you,

Sergei