[PROPOSAL] extend the object names to the qualified names in pg_stat_statements
Hi,
It would help to analyze performance issues if pg_stat_statements would
extend the object names to the qualified names.
Currently if we have two schemas ( say s1 and s2) with the objects with
the same name ( say tables t1) then after the next executions:
set schema 's1';
select count(*) from t1; // returns 10
set schema 's2';
select count(*) from t1; // returns 1000000000
we see in
select queryid, query from pg_stat_statements where query like '%from t1%'
something like this
3004391594 select count(*) from t1
1336375111 select count(*) from t1
We do see that the queries are different but we can't see why they are
so much different in the execution time.
If the pg_stat_statements module would extend the object name to the
qualified names like s1.t1 and s2.t2 in the new column query_qn then we
would see the report as
select queryid, query_qn from pg_stat_statements where query like '%from
t1%'
3004391594 select count(*) from s1.t1
1336375111 select count(*) from s2.t1
with an immediate understanding of what's going on.
This problem isn't only about table names. The SQL statement may refer
to the views, functions, operands etc. from the 'wrong' schema.
Obviously it would be even bigger help in the situations with the more
complex queries where it will be mush more difficult to find
that the query was executed with the incorrect search_path settings.
This change doesn't brake any existing functionality and will be easily
utilized in the monitoring scripts and tools.
Thank you,
Sergei Agalakov
It was discussed in pgsql-general, and now it seems to be ready as a
proposal to pgsql-hackers.
/messages/by-id/372d75cc-053b-1a07-948f-089408d3cd3a@gmail.com
Sergei Agalakov <sergei.agalakov@gmail.com> writes:
It would help to analyze performance issues if pg_stat_statements would
extend the object names to the qualified names.
What pg_stat_statements puts out is the original query text. As was
already pointed out to you, changing that text is likely to break
use-cases in which people are trying to match entries to actual
queries or log entries. This would also entail rather significant
overhead to find out schema names and interpolate them into the text.
regards, tom lane
On 2018-Nov-28, Tom Lane wrote:
Sergei Agalakov <sergei.agalakov@gmail.com> writes:
It would help to analyze performance issues if pg_stat_statements would
extend the object names to the qualified names.What pg_stat_statements puts out is the original query text. As was
already pointed out to you, changing that text is likely to break
use-cases in which people are trying to match entries to actual
queries or log entries.
It's not immediately obvious, but he is proposing a _new_ column
query_qn that has qualified names, leaving the current query column
unchanged.
This would also entail rather significant overhead to find out schema
names and interpolate them into the text.
True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/28/18 10:46 PM, Alvaro Herrera wrote:
On 2018-Nov-28, Tom Lane wrote:
Sergei Agalakov <sergei.agalakov@gmail.com> writes:
It would help to analyze performance issues if pg_stat_statements would
extend the object names to the qualified names.What pg_stat_statements puts out is the original query text. As was
already pointed out to you, changing that text is likely to break
use-cases in which people are trying to match entries to actual
queries or log entries.It's not immediately obvious, but he is proposing a _new_ column
query_qn that has qualified names, leaving the current query column
unchanged.This would also entail rather significant overhead to find out schema
names and interpolate them into the text.True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.
Wouldn't it be sufficient / better to just store the search_path used
when executing the query? That should be enough to resolve the object
names correctly, and the overhead would be much lower (both in terms
extra space and CPU overhead).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Nov-28, Tom Lane wrote:
This would also entail rather significant overhead to find out schema
names and interpolate them into the text.
True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.
Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)
regards, tom lane
It's a real problem. I saw this pattern more than once already.
The people have several schemas with identical data structures as a
preparation to eventual migration of the schema to its own server in the
cloud.
So you have ten schemas, one generic user from connection pool and
pg_stat_statements is useless to identify a performance problem for a
schema.
You see ten different records with the different queryid and identical
query text, you see that one record indicates a performance issue
for this query in one particular schema, and you can't say what schema
it is. A report that groups queries by the text may hide this problem
altogether
because in average the statistics are OK.
I hoped that it would be possible to reassemble the query text after
parsing where the names were already uniquely resolved.
Thank you,
Sergei
Show quoted text
On 11/28/2018 2:59 PM, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Nov-28, Tom Lane wrote:
This would also entail rather significant overhead to find out schema
names and interpolate them into the text.True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)regards, tom lane
Alvaro Herrera-9 wrote
On 2018-Nov-28, Tom Lane wrote:
This would also entail rather significant overhead to find out schema
names and interpolate them into the text.True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.
maybe that explain on "table with many columns" suffers from the same
problem, see:
/messages/by-id/1537816948278-0.post@n3.nabble.com
/messages/by-id/20180924195048.7dhnjvq5ggwjtiwi@alap3.anarazel.de
regards
PAscal
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Tomas Vondra 2018-11-28 <1b4e4c5e-7007-cd61-aae5-4a1c248e385c@2ndquadrant.com>
Wouldn't it be sufficient / better to just store the search_path used
when executing the query? That should be enough to resolve the object
names correctly, and the overhead would be much lower (both in terms
extra space and CPU overhead).
That would also work better for sub-queries from functions, if you
wanted to re-run the query in question. It's closer to reality.
Christoph
On 2018-Nov-28, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Nov-28, Tom Lane wrote:
This would also entail rather significant overhead to find out schema
names and interpolate them into the text.True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)
Good point.
Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.
Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings,
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
On 2018-Nov-28, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Nov-28, Tom Lane wrote:
This would also entail rather significant overhead to find out schema
names and interpolate them into the text.True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)Good point.
Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.
Seems like what you'd really want is to store all the environment, not
just the search_path (consider the $user case...). Maybe saving just
the OIDs of the search_path and then using them later would also work
but it seems like we're just building up to tracking everything and
doing it piecemeal with an extra column added in this release, another
in the next release, etc..
Thanks!
Stephen
On 11/29/2018 10:47 AM, Alvaro Herrera wrote:
On 2018-Nov-28, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Nov-28, Tom Lane wrote:
This would also entail rather significant overhead to find out schema
names and interpolate them into the text.True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)Good point.
Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.
I thought about just saving the search_path. It has all the necessary
information for a DBA or a developer, but creates problems for reporting
tools.
If we have the new query_qn column then we can group statistics by
query_qn and display it on the charts and graphs.
If instead we use a combination of query and search_path then a
reporting tools has to figure out the way to show both values to
distinguish between
different versions of query.
If it is easier/faster to add search_path then let's add search_path
instead of query_qn. It is already documented that query text isn't
unique by itself,
and reporting tools have to use query+queryid for uniqueness, and
search_path will provide the currently unavailable information to the
DBAs/developers.
Good.
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Nov-28, Tom Lane wrote:
Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
Good point.
Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.
Yeah, and any subsequent DDL on relevant tables would break it.
Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.
This'd be OK by me, though I'm not sure that it represents a convenient
solution for the original problem.
regards, tom lane
On 11/29/2018 10:59 AM, Stephen Frost wrote:
Greetings,
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
On 2018-Nov-28, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Nov-28, Tom Lane wrote:
This would also entail rather significant overhead to find out schema
names and interpolate them into the text.True. I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible. This would be enabled using a
GUC that defaults to off.Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)Good point.
Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.Seems like what you'd really want is to store all the environment, not
just the search_path (consider the $user case...). Maybe saving just
the OIDs of the search_path and then using them later would also work
but it seems like we're just building up to tracking everything and
doing it piecemeal with an extra column added in this release, another
in the next release, etc..Thanks!
Stephen
It's a valid concern. Instead of the adding just search_path column we
can add a column session_info jsonb.
Its content can be defined by the new configuration parameter
pg_stat_statements.session_info ('current_schemas, current_user,
session_user') // a subset of the data from the system information functions
and it will have data like
{
��� "current_schemas" : ["pg_catalog", "s1", "s2", "public"],
��� "current_user" : "user1",
��� "session_user" : "user1"
}
It will allow the DBA/developer to reproduce a performance issue, and
will allow the deeper level of granularity for the reporting tools.
It's more complex than I have anticipated but doesn't break backward
compatibility and extensible.
Thank you,
Sergei
On 11/29/2018 12:46 PM, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Nov-28, Tom Lane wrote:
Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.Good point.
Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query). ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.Yeah, and any subsequent DDL on relevant tables would break it.
Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case? Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.This'd be OK by me, though I'm not sure that it represents a convenient
solution for the original problem.regards, tom lane
It's less convenient, but it presents a solution. Either that or even
better a new session_info column that
may capture together with search_path also current_user, session_user
etc. that would allow to distinct
between the different versions of the query based on the real context of
execution.
Thank you,
Sergei