Multiple Query IDs for a rewritten parse tree
On 5/1/2022 10:13, Tom Lane wrote:
I feel like we need to get away from the idea that there is just
one query hash, and somehow let different extensions attach
differently-calculated hashes to a query. I don't have any immediate
ideas about how to do that in a reasonably inexpensive way.
Currently, the queryId field represents a query class (depending on the jumbling
implementation). Extensions use it as a simple way to track a query from the
creation of its parse tree to the end of its life, across all the hook calls an
extension uses (keeping in mind possible plan caching).
I know of at least two different use cases for queryId:
1) monitoring purposes - pg_stat_statements watches how often queries of a
class appear in the database and collects statistics for each class.
2) adaptive purposes - some extensions influence planner decisions during the
optimization stage and want to learn from performance shifts observed at the
end of the execution stage.
Different purposes may require different jumbling implementations, but users
may want to use such extensions at the same time. So we should allow many
different query IDs to be attached to a query (maybe 'query label' would be a
better name?).
After thinking about it for a while, I see three different ways to implement this:
1. Make queryId a trivial 64-bit counter. Each extension can then distinguish
any query from any other, track it across all hooks, run its own jumbling code
and store the queryId internally. The one big problem I see here is the
increased overhead when there are many consumers of the queryId feature.
2. Instead of a single queryId, store a list of (queryId, funcOid) pairs. An
extension can register a callback for queryId generation, and the core will
build the list of queryIds right after query tree rewriting. The funcOid is
needed to distinguish jumbling implementations. Here we would have to invent an
additional node type for the list elements.
3. Instead of queryId, add a multi-purpose 'private' list to the Query struct.
Any extension could add additional object(s) to this list (with a registered
object type, of course). As an example, I can imagine a convention for
queryIds in that case - store a String node with the value
'<extension name> - <Query ID>'.
This approach would need a registered-callback mechanism too.
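As a standalone sketch of how such a 'private' list might behave (the types and function names here are purely illustrative, not actual PostgreSQL API - in the real Query struct this would be a List of registered node types):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an entry of the proposed 'private' list:
 * each extension stores its own labelled value under its name. */
typedef struct PrivateEntry
{
	const char *owner;		/* e.g. "pg_stat_statements" */
	uint64_t	label;		/* extension-specific query label */
	struct PrivateEntry *next;
} PrivateEntry;

/* Prepend a new (owner, label) pair to the list. */
static PrivateEntry *
private_list_add(PrivateEntry *head, const char *owner, uint64_t label)
{
	PrivateEntry *e = malloc(sizeof(PrivateEntry));

	e->owner = owner;
	e->label = label;
	e->next = head;
	return e;
}

/* Look up the label stored by a given extension; 0 if absent. */
static uint64_t
private_list_find(const PrivateEntry *head, const char *owner)
{
	for (; head != NULL; head = head->next)
		if (strcmp(head->owner, owner) == 0)
			return head->label;
	return 0;
}
```

Each consumer looks up only its own entry, so two extensions with different jumbling implementations never clash.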
I think the third way is the cheapest, most flexible and simplest to implement.
Any thoughts, comments, criticism?
--
regards,
Andrey Lepikhov
Postgres Professional
Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
On 5/1/2022 10:13, Tom Lane wrote:
I feel like we need to get away from the idea that there is just
one query hash, and somehow let different extensions attach
differently-calculated hashes to a query. I don't have any immediate
ideas about how to do that in a reasonably inexpensive way.
After thinking about it for a while, I see three different ways to implement this:
1. Make queryId a trivial 64-bit counter.
This seems pretty useless. The whole point of the query hash, at
least for many use-cases, is to allow recognizing queries that are
the same or similar.
2. Instead of a single queryId, store a list of (queryId, funcOid) pairs. An
extension can register a callback for queryId generation, and the core will
build the list of queryIds right after query tree rewriting. The funcOid is
needed to distinguish jumbling implementations. Here we would have to invent an
additional node type for the list elements.
I'm not sure that funcOid is a reasonable way to tag different hash
calculation methods, because it isn't necessarily stable across
installations. For the same reason, it'd be hard for two extensions
to collaborate on a common query-hash definition.
3. Instead of queryId, add a multi-purpose 'private' list to the Query struct.
Any extension could add additional object(s) to this list (with a registered
object type, of course). As an example, I can imagine a convention for
queryIds in that case - store a String node with the value
'<extension name> - <Query ID>'.
Again, this is presuming that every extension is totally independent
and has no interest in what any other code is doing. But I don't
think we want to make every extension that wants a hash duplicate
the whole of queryjumble.c.
The idea I'd been vaguely thinking about is to allow attaching a list
of query-hash nodes to a Query, where each node would contain a "tag"
identifying the specific hash calculation method, and also the value
of the query's hash calculated according to that method. We could
probably get away with saying that all such hash values must be uint64.
The main difference from your function-OID idea, I think, is that
I'm envisioning the tags as being small integers with well-known
values, similarly to the way we manage stakind values in pg_statistic.
In this way, an extension that wants a hash that the core knows how
to calculate doesn't need its own copy of the code, and similarly
one extension could publish a calculation method for use by other
extensions.
We'd also need some mechanism for registering a function to be
used to calculate the hash for any given tag value, of course.
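The scheme above can be modelled in a standalone way roughly like this (the tag values, names and signatures are all hypothetical, just to illustrate the registration idea):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical well-known tags, analogous to stakind values. */
#define QUERYHASH_CORE_JUMBLE	1
#define QUERYHASH_MAX_METHODS	8

/* A (tag, value) node, as would be attached in a list on Query. */
typedef struct QueryHashNode
{
	int			tag;
	uint64_t	value;
} QueryHashNode;

typedef uint64_t (*query_hash_fn) (const char *query_text);

static query_hash_fn hash_methods[QUERYHASH_MAX_METHODS + 1];

/* An extension registers the calculation method for a given tag. */
static void
register_query_hash(int tag, query_hash_fn fn)
{
	hash_methods[tag] = fn;
}

/* Run every registered method once and fill the node array;
 * returns the number of hashes computed. */
static int
compute_query_hashes(const char *query_text, QueryHashNode *out)
{
	int			n = 0;

	for (int tag = 1; tag <= QUERYHASH_MAX_METHODS; tag++)
		if (hash_methods[tag] != NULL)
		{
			out[n].tag = tag;
			out[n].value = hash_methods[tag](query_text);
			n++;
		}
	return n;
}

/* Trivial FNV-1a stand-in for the core jumble calculation. */
static uint64_t
toy_core_hash(const char *s)
{
	uint64_t	h = UINT64_C(1469598103934665603);

	for (; *s; s++)
		h = (h ^ (unsigned char) *s) * UINT64_C(1099511628211);
	return h;
}
```

With well-known small-integer tags, an extension that wants the core jumble just reads the node carrying QUERYHASH_CORE_JUMBLE instead of duplicating queryjumble.c.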
regards, tom lane
On Sat, Jan 08, 2022 at 07:49:59PM -0500, Tom Lane wrote:
The idea I'd been vaguely thinking about is to allow attaching a list
of query-hash nodes to a Query, where each node would contain a "tag"
identifying the specific hash calculation method, and also the value
of the query's hash calculated according to that method. We could
probably get away with saying that all such hash values must be uint64.
The main difference from your function-OID idea, I think, is that
I'm envisioning the tags as being small integers with well-known
values, similarly to the way we manage stakind values in pg_statistic.
In this way, an extension that wants a hash that the core knows how
to calculate doesn't need its own copy of the code, and similarly
one extension could publish a calculation method for use by other
extensions.
An extension that wants a slightly modified version of the core's hash
calculation implementation would still need to copy everything. The core
probably has to provide more than one (hash, method) pair to cover some basic
needs.
On Sun, Jan 09, 2022 at 12:43:06PM +0100, Dmitry Dolgov wrote:
An extension that wants a slightly modified version of the core's hash
calculation implementation would still need to copy everything. The core
probably has to provide more than one (hash, method) pair to cover some basic
needs.
Or just provide GUC(s) to adapt the behavior. But in any case, there isn't much
that can be done that won't result in a huge performance drop (e.g. the desired
stability across logical replication or backup/restore).
On Sat, Jan 08, 2022 at 07:49:59PM -0500, Tom Lane wrote:
The idea I'd been vaguely thinking about is to allow attaching a list
of query-hash nodes to a Query, where each node would contain a "tag"
identifying the specific hash calculation method, and also the value
of the query's hash calculated according to that method.
For now the queryid mixes two different things: fingerprinting and query text
normalization. Should each calculation method be allowed to do a different
normalization too, and if so, where should the state data needed for that be
stored? If not, we would need some kind of primary hash for that purpose.
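To make the distinction concrete, here is a simplified standalone model of the normalization half (not pg_stat_statements' actual code): given the source text plus the constant locations recorded while jumbling - the "state data" in question - the literals are replaced with $n placeholders:

```c
#include <stdio.h>
#include <string.h>

/* Location of one constant in the source text, as a jumbling pass
 * would have recorded it. */
typedef struct ConstLoc
{
	int			offset;
	int			length;
} ConstLoc;

/*
 * Copy 'src' into 'dst', replacing each recorded constant with a
 * $1, $2, ... placeholder.  'dst' must be large enough.
 */
static void
normalize_query(const char *src, const ConstLoc *locs, int nlocs,
				char *dst, size_t dstlen)
{
	int			pos = 0;
	size_t		out = 0;

	for (int i = 0; i < nlocs; i++)
	{
		int			copy = locs[i].offset - pos;

		memcpy(dst + out, src + pos, copy);
		out += copy;
		out += snprintf(dst + out, dstlen - out, "$%d", i + 1);
		pos = locs[i].offset + locs[i].length;
	}
	strcpy(dst + out, src + pos);	/* trailing part of the query */
}
```

A fingerprint-only method could skip recording these locations entirely, which is why a "primary" method that does record them may be needed for the query text.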
Looking at Andrey's use case for wanting multiple hashes, I don't think
adaptive optimization needs a normalized query string. The only use would be
to output some statistics, but that could be achieved by storing a list of
"primary queryid" values for each adaptive entry. That's probably also true for
anything not intended for monitoring. Also, all monitoring consumers should
probably agree on the same queryid, both fingerprint and normalized string, as
otherwise it's impossible to cross-reference metric data.
On 1/9/22 5:13 PM, Julien Rouhaud wrote:
For now the queryid mixes two different things: fingerprinting and query text
normalization. Should each calculation method be allowed to do a different
normalization too, and if so, where should the state data needed for that be
stored? If not, we would need some kind of primary hash for that purpose.
Do you mean JumbleState?
I think that when registering a queryId generator we should also store a
pointer (void **args) to an additional data entry, as usual.
Looking at Andrey's use case for wanting multiple hashes, I don't think
adaptive optimization needs a normalized query string. The only use would be
to output some statistics, but that could be achieved by storing a list of
"primary queryid" values for each adaptive entry. That's probably also true for
anything not intended for monitoring. Also, all monitoring consumers should
probably agree on the same queryid, both fingerprint and normalized string, as
otherwise it's impossible to cross-reference metric data.
I can add one more use case.
Our extension for freezing query plans uses a query tree comparison technique
to prove that a plan can be applied (so we don't need to run the planning
procedure at all).
Tree equality checking is expensive, so we use a cheaper queryId comparison to
identify possible candidates. Here, for better performance and query coverage,
we need query tree normalization - the queryId should be stable under
modifications of the query text that don't change its semantics.
As an example, a query plan with external parameters can be used to execute a
query with constants if those constants correspond by position and type to the
parameters. So the queryId calculation also returns pointers to all constants
and parameters found during the calculation.
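A standalone toy model of that idea (the real extension walks a Query tree; here a flattened node array stands in for it, and all names are invented for illustration):

```c
#include <stdint.h>

/* Toy node kinds for a flattened query tree. */
typedef enum NodeKindToy
{
	NODE_OP,
	NODE_CONST,
	NODE_PARAM
} NodeKindToy;

typedef struct ToyNode
{
	NodeKindToy kind;
	uint64_t	value;		/* constant value; ignored by the hash */
} ToyNode;

#define MAX_LOCS 16

/* Result of label calculation: the hash plus the positions of every
 * Const/Param node seen, so a frozen plan can later be matched against
 * them by place and type. */
typedef struct LabelResult
{
	uint64_t	hash;
	int			locs[MAX_LOCS];
	int			nlocs;
} LabelResult;

static void
calc_label(const ToyNode *nodes, int n, LabelResult *res)
{
	res->hash = 0;
	res->nlocs = 0;
	for (int i = 0; i < n; i++)
	{
		/* Mix in only the node kind, not constant values, so the label
		 * is stable under changes to the literals. */
		res->hash = res->hash * UINT64_C(1099511628211) + (uint64_t) nodes[i].kind;
		if (nodes[i].kind == NODE_CONST || nodes[i].kind == NODE_PARAM)
			res->locs[res->nlocs++] = i;
	}
}
```

Two queries of the same shape but different literals get the same label and the same candidate positions, which is exactly what the cheap pre-filter before the expensive tree-equality check needs.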
--
regards,
Andrey Lepikhov
Postgres Professional
On Mon, Jan 10, 2022 at 09:10:59AM +0500, Andrey V. Lepikhov wrote:
On 1/9/22 5:13 PM, Julien Rouhaud wrote:
For now the queryid mixes two different things: fingerprinting and query text
normalization. Should each calculation method be allowed to do a different
normalization too, and if so, where should the state data needed for that be
stored? If not, we would need some kind of primary hash for that purpose.
Do you mean JumbleState?
Yes, or some other abstraction. For instance, with the proposed patch to handle
IN clauses differently, you need to change JumbleState.
I think that when registering a queryId generator we should also store a
pointer (void **args) to an additional data entry, as usual.
Well, yes but only if you need to produce different versions of the normalized
query text, and I'm not convinced that's really necessary.
Looking at Andrey's use case for wanting multiple hashes, I don't think
adaptive optimization needs a normalized query string. The only use would be
to output some statistics, but that could be achieved by storing a list of
"primary queryid" values for each adaptive entry. That's probably also true for
anything not intended for monitoring. Also, all monitoring consumers should
probably agree on the same queryid, both fingerprint and normalized string, as
otherwise it's impossible to cross-reference metric data.
I can add one more use case.
Our extension for freezing query plans uses a query tree comparison technique
to prove that a plan can be applied (so we don't need to run the planning
procedure at all).
Tree equality checking is expensive, so we use a cheaper queryId comparison to
identify possible candidates. Here, for better performance and query coverage,
we need query tree normalization - the queryId should be stable under
modifications of the query text that don't change its semantics.
As an example, a query plan with external parameters can be used to execute a
query with constants if those constants correspond by position and type to the
parameters. So the queryId calculation also returns pointers to all constants
and parameters found during the calculation.
I'm also working on a similar extension, and yes, you can't accept just any
fingerprinting approach for that. I don't know what the exact heuristics of
your cheaper queryid calculation are, but is it reasonable to use it with
something like pg_stat_statements? If so, you don't really need a two-queryid
approach for the sake of this single extension, and therefore don't need to
store multiple jumble states or similar per statement. Especially since
requiring another one would mean a performance drop as soon as you want to use
something as common as pg_stat_statements.
On 1/10/22 9:51 AM, Julien Rouhaud wrote:
On Mon, Jan 10, 2022 at 09:10:59AM +0500, Andrey V. Lepikhov wrote:
I can add one more use case.
Our extension for freezing query plans uses a query tree comparison technique
to prove that a plan can be applied (so we don't need to run the planning
procedure at all).
Tree equality checking is expensive, so we use a cheaper queryId comparison to
identify possible candidates. Here, for better performance and query coverage,
we need query tree normalization - the queryId should be stable under
modifications of the query text that don't change its semantics.
As an example, a query plan with external parameters can be used to execute a
query with constants if those constants correspond by position and type to the
parameters. So the queryId calculation also returns pointers to all constants
and parameters found during the calculation.
I'm also working on a similar extension, and yes, you can't accept just any
fingerprinting approach for that. I don't know what the exact heuristics of
your cheaper queryid calculation are, but is it reasonable to use it with
something like pg_stat_statements? If so, you don't really need a two-queryid
approach for the sake of this single extension, and therefore don't need to
store multiple jumble states or similar per statement. Especially since
requiring another one would mean a performance drop as soon as you want to use
something as common as pg_stat_statements.
I think pg_stat_statements could live with the queryId generator of the
sr_plan extension. But it replaces all constants with $XXX parameters in the
query string. In our extension the user defines which plan is optimal and
which constants can be used as parameters in the plan.
One drawback I see here - creating or dropping my extension changes the
behavior of pg_stat_statements, which distorts the DB load profile. Also, we
have no guarantee that another extension will work correctly (or optimally)
with such a queryId.
--
regards,
Andrey Lepikhov
Postgres Professional
On Mon, Jan 10, 2022 at 12:37:34PM +0500, Andrey V. Lepikhov wrote:
I think pg_stat_statements could live with the queryId generator of the
sr_plan extension. But it replaces all constants with $XXX parameters in the
query string. In our extension the user defines which plan is optimal and
which constants can be used as parameters in the plan.
I don't know the details of that extension, but I think that it should work as
long as you have the constants' information in the jumble state, whatever the
resulting normalized query string is, right?
One drawback I see here - creating or dropping my extension changes the
behavior of pg_stat_statements, which distorts the DB load profile. Also, we
have no guarantee that another extension will work correctly (or optimally)
with such a queryId.
But then, if generating two queryids is better for performance, does the
extension really need an additional jumble state and/or normalized query
string?
On 10/1/2022 13:56, Julien Rouhaud wrote:
On Mon, Jan 10, 2022 at 12:37:34PM +0500, Andrey V. Lepikhov wrote:
I think pg_stat_statements could live with the queryId generator of the
sr_plan extension. But it replaces all constants with $XXX parameters in the
query string. In our extension the user defines which plan is optimal and
which constants can be used as parameters in the plan.
I don't know the details of that extension, but I think that it should work as
long as you have the constants' information in the jumble state, whatever the
resulting normalized query string is, right?
Yes. The same input query string doesn't prove that a frozen query plan can be
used, because the rewrite rules could have changed. So we use only the query
tree. Here we must have a custom jumbling implementation.
The queryId in this extension determines two things:
1. How many input queries will be compared with the query tree template of
the frozen statement.
2. As a result, the performance overhead of unsuccessful comparisons.
One drawback I see here - creating or dropping my extension changes the
behavior of pg_stat_statements, which distorts the DB load profile. Also, we
have no guarantee that another extension will work correctly (or optimally)
with such a queryId.
But then, if generating two queryids is better for performance, does the
extension really need an additional jumble state and/or normalized query
string?
An additional JumbleState isn't necessary, but it would be better for
performance to collect pointers to all constant nodes during hash generation.
--
regards,
Andrey Lepikhov
Postgres Professional
On Mon, Jan 10, 2022 at 03:24:46PM +0500, Andrey Lepikhov wrote:
On 10/1/2022 13:56, Julien Rouhaud wrote:
I don't know the details of that extension, but I think that it should work as
long as you have the constants' information in the jumble state, whatever the
resulting normalized query string is, right?
Yes. The same input query string doesn't prove that a frozen query plan can be
used, because the rewrite rules could have changed. So we use only the query
tree.
Yes, I'm fully aware of that. I wasn't asking about using the input query
string, but about the need to generate a dedicated normalized output query
string that could differ from the one generated by pg_stat_statements (or
similar).
But then, if generating two queryids is better for performance, does the
extension really need an additional jumble state and/or normalized query
string?
An additional JumbleState isn't necessary, but it would be better for
performance to collect pointers to all constant nodes during hash generation.
I agree, but it's even better for performance if this is collected only once.
I still don't know whether this extension (or any extension) would require
something different from a common jumble state serving all purposes, or
whether we can work with a single jumble state and only handle multiple
queryids.
On 10/1/2022 15:39, Julien Rouhaud wrote:
On Mon, Jan 10, 2022 at 03:24:46PM +0500, Andrey Lepikhov wrote:
On 10/1/2022 13:56, Julien Rouhaud wrote:
Yes. The same input query string doesn't prove that a frozen query plan can be
used, because the rewrite rules could have changed. So we use only the query
tree.
Yes, I'm fully aware of that. I wasn't asking about using the input query
string, but about the need to generate a dedicated normalized output query
string that could differ from the one generated by pg_stat_statements (or
similar).
Thanks, now I get it. I don't remember a single situation where we would
need to normalize a query string.
But then, if generating two queryids is better for performance, does the
extension really need an additional jumble state and/or normalized query
string?
An additional JumbleState isn't necessary, but it would be better for
performance to collect pointers to all constant nodes during hash generation.
I agree, but it's even better for performance if this is collected only once.
I still don't know whether this extension (or any extension) would require
something different from a common jumble state serving all purposes, or
whether we can work with a single jumble state and only handle multiple
queryids.
I think JumbleState is highly monitoring-specific, maybe even
pg_stat_statements-specific (maybe you can point to other examples). I have no
idea how it could be used by an arbitrary extension.
But it is not so difficult to imagine implementing (as part of Tom's approach,
described earlier) such a kind of storage for each generation method.
--
regards,
Andrey Lepikhov
Postgres Professional
On 1/9/22 5:49 AM, Tom Lane wrote:
The idea I'd been vaguely thinking about is to allow attaching a list
of query-hash nodes to a Query, where each node would contain a "tag"
identifying the specific hash calculation method, and also the value
of the query's hash calculated according to that method. We could
probably get away with saying that all such hash values must be uint64.
The main difference from your function-OID idea, I think, is that
I'm envisioning the tags as being small integers with well-known
values, similarly to the way we manage stakind values in pg_statistic.
In this way, an extension that wants a hash that the core knows how
to calculate doesn't need its own copy of the code, and similarly
one extension could publish a calculation method for use by other
extensions.
To move forward, I have made a patch that implements this idea (see
attachment). It is a POC, but it passes all regression tests.
Registration of a queryId generator is implemented by analogy with the
extensible methods machinery. Also, I switched queryId to the int64 type and
renamed it to 'label'.
Some lessons learned:
1. The single-queryId implementation is deeply entangled with the core code
(the stats reporting machinery and parallel workers, for example).
2. We need a custom queryId that is based on a generated queryId (following
the logic of pg_stat_statements).
3. We should think about the safety of the de-registration procedure.
4. We should reserve a position for the default in-core generator and think
about the logic of enabling/disabling it.
5. We should add an EXPLAIN hook to allow an extension to print this custom
queryId.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
POC-multiple-query_id.patch (text/x-patch)
From f54bb60bd71b49ac8e1b85cd2ad86332a8a81e84 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Thu, 20 Jan 2022 16:05:17 +0500
Subject: [PATCH] Initial commit
---
.../pg_stat_statements/pg_stat_statements.c | 57 ++++--
src/backend/commands/explain.c | 11 +-
src/backend/executor/execMain.c | 4 +-
src/backend/executor/execParallel.c | 3 +-
src/backend/nodes/copyfuncs.c | 21 ++-
src/backend/nodes/outfuncs.c | 17 +-
src/backend/nodes/readfuncs.c | 17 +-
src/backend/optimizer/plan/planner.c | 2 +-
src/backend/parser/analyze.c | 15 +-
src/backend/rewrite/rewriteHandler.c | 4 +-
src/backend/tcop/postgres.c | 11 +-
src/backend/utils/misc/guc.c | 2 +-
src/backend/utils/misc/queryjumble.c | 164 ++++++++++++++++--
src/include/nodes/nodes.h | 1 +
src/include/nodes/parsenodes.h | 10 +-
src/include/nodes/plannodes.h | 2 +-
src/include/parser/analyze.h | 3 +-
src/include/utils/queryjumble.h | 12 +-
18 files changed, 294 insertions(+), 62 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 082bfa8f77..ebfc3331df 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -307,8 +307,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
static void pgss_shmem_startup(void);
static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
- JumbleState *jstate);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
@@ -813,13 +812,29 @@ error:
}
/*
- * Post-parse-analysis hook: mark query with a queryId
+ * Post-parse-analysis hook: create a label for the query.
*/
static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
+pgss_post_parse_analyze(ParseState *pstate, Query *query)
{
+ JumbleState *jstate;
+ QueryLabel *label = get_query_label(query->queryIds, 0);
+
if (prev_post_parse_analyze_hook)
- prev_post_parse_analyze_hook(pstate, query, jstate);
+ prev_post_parse_analyze_hook(pstate, query);
+
+ if (label)
+ {
+ add_custom_query_label(&query->queryIds, -1, label->hash);
+ jstate = (JumbleState *) label->context;
+ }
+ else
+ {
+ add_custom_query_label(&query->queryIds, -1, UINT64CONST(0));
+ jstate = NULL;
+ }
+
+ label = get_query_label(query->queryIds, -1);
/* Safety check... */
if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -833,7 +848,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
if (query->utilityStmt)
{
if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
- query->queryId = UINT64CONST(0);
+ label->hash = UINT64CONST(0);
return;
}
@@ -846,7 +861,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
*/
if (jstate && jstate->clocations_count > 0)
pgss_store(pstate->p_sourcetext,
- query->queryId,
+ label->hash,
query->stmt_location,
query->stmt_len,
PGSS_INVALID,
@@ -868,6 +883,9 @@ pgss_planner(Query *parse,
ParamListInfo boundParams)
{
PlannedStmt *result;
+ int64 queryId;
+
+ queryId = get_query_label_hash(parse->queryIds, -1);
/*
* We can't process the query if no query_string is provided, as
@@ -883,7 +901,7 @@ pgss_planner(Query *parse,
*/
if (pgss_enabled(plan_nested_level + exec_nested_level)
&& pgss_track_planning && query_string
- && parse->queryId != UINT64CONST(0))
+ && queryId != UINT64CONST(0))
{
instr_time start;
instr_time duration;
@@ -930,7 +948,7 @@ pgss_planner(Query *parse,
WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
pgss_store(query_string,
- parse->queryId,
+ queryId,
parse->stmt_location,
parse->stmt_len,
PGSS_PLAN,
@@ -959,17 +977,21 @@ pgss_planner(Query *parse,
static void
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ int64 queryId;
+
if (prev_ExecutorStart)
prev_ExecutorStart(queryDesc, eflags);
else
standard_ExecutorStart(queryDesc, eflags);
+ queryId = get_query_label_hash(queryDesc->plannedstmt->queryIds, -1);
+
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
* utility statements.
*/
- if (pgss_enabled(exec_nested_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0))
+ if (pgss_enabled(exec_nested_level) && queryId != UINT64CONST(0))
{
/*
* Set up to track total elapsed time in ExecutorRun. Make sure the
@@ -1036,7 +1058,7 @@ pgss_ExecutorFinish(QueryDesc *queryDesc)
static void
pgss_ExecutorEnd(QueryDesc *queryDesc)
{
- uint64 queryId = queryDesc->plannedstmt->queryId;
+ uint64 queryId = get_query_label_hash(queryDesc->plannedstmt->queryIds, -1);
if (queryId != UINT64CONST(0) && queryDesc->totaltime &&
pgss_enabled(exec_nested_level))
@@ -1076,7 +1098,16 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
DestReceiver *dest, QueryCompletion *qc)
{
Node *parsetree = pstmt->utilityStmt;
- uint64 saved_queryId = pstmt->queryId;
+ QueryLabel *label = get_query_label(pstmt->queryIds, -1);
+ uint64 saved_queryId;
+
+ if (!label)
+ {
+ add_custom_query_label(&pstmt->queryIds, -1, UINT64CONST(0));
+ label = get_query_label(pstmt->queryIds, -1);
+ }
+
+ saved_queryId = label->hash;
/*
* Force utility statements to get queryId zero. We do this even in cases
@@ -1093,7 +1124,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
* only.
*/
if (pgss_enabled(exec_nested_level) && pgss_track_utility)
- pstmt->queryId = UINT64CONST(0);
+ label->hash = UINT64CONST(0);
/*
* If it's an EXECUTE statement, we don't track it and don't increment the
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index b970997c34..b31a870156 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -166,7 +166,6 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
{
ExplainState *es = NewExplainState();
TupOutputState *tstate;
- JumbleState *jstate = NULL;
Query *query;
List *rewritten;
ListCell *lc;
@@ -246,10 +245,10 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
query = castNode(Query, stmt->query);
if (IsQueryIdEnabled())
- jstate = JumbleQuery(query, pstate->p_sourcetext);
+ GenerateQueryLabels(query, pstate->p_sourcetext);
if (post_parse_analyze_hook)
- (*post_parse_analyze_hook) (pstate, query, jstate);
+ (*post_parse_analyze_hook) (pstate, query);
/*
* Parse analysis was done already, but we still have to run the rule
@@ -604,14 +603,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create textual dump of plan tree */
ExplainPrintPlan(es, queryDesc);
- if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+ if (es->verbose && plannedstmt->queryIds != NIL)
{
/*
* Output the queryid as an int64 rather than a uint64 so we match
* what would be seen in the BIGINT pg_stat_statements.queryid column.
*/
- ExplainPropertyInteger("Query Identifier", NULL, (int64)
- plannedstmt->queryId, es);
+ ExplainPropertyInteger("Query Identifier", NULL,
+ get_query_label_hash(plannedstmt->queryIds, 0), es);
}
/* Show buffer usage in planning */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 549d9eb696..cbb97cf91f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -62,6 +62,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/partcache.h"
+#include "utils/queryjumble.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
@@ -135,7 +136,8 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
* that it's harmless to report the query_id multiple time, as the call
* will be ignored if the top level query_id has already been reported.
*/
- pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
+ pgstat_report_query_id(
+ get_query_label_hash(queryDesc->plannedstmt->queryIds, 0), false);
if (ExecutorStart_hook)
(*ExecutorStart_hook) (queryDesc, eflags);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5dd8ab7db2..d025860f45 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -49,6 +49,7 @@
#include "utils/dsa.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/queryjumble.h"
#include "utils/snapmgr.h"
/*
@@ -175,7 +176,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
*/
pstmt = makeNode(PlannedStmt);
pstmt->commandType = CMD_SELECT;
- pstmt->queryId = pgstat_get_my_query_id();
+ add_custom_query_label(&pstmt->queryIds, 0, pgstat_get_my_query_id());
pstmt->hasReturning = false;
pstmt->hasModifyingCTE = false;
pstmt->canSetTag = true;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 90b5da51c9..6f159e0cbd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -87,7 +87,7 @@ _copyPlannedStmt(const PlannedStmt *from)
PlannedStmt *newnode = makeNode(PlannedStmt);
COPY_SCALAR_FIELD(commandType);
- COPY_SCALAR_FIELD(queryId);
+ COPY_NODE_FIELD(queryIds);
COPY_SCALAR_FIELD(hasReturning);
COPY_SCALAR_FIELD(hasModifyingCTE);
COPY_SCALAR_FIELD(canSetTag);
@@ -3159,6 +3159,20 @@ _copyTriggerTransition(const TriggerTransition *from)
return newnode;
}
+static QueryLabel *
+_copyQueryLabel(const QueryLabel *from)
+{
+ QueryLabel *newnode = makeNode(QueryLabel);
+
+ COPY_SCALAR_FIELD(kind);
+ COPY_SCALAR_FIELD(hash);
+
+ /* Caller should re-generate labels if it wants to use them. */
+ newnode->context = NULL;
+
+ return newnode;
+}
+
static Query *
_copyQuery(const Query *from)
{
@@ -3166,7 +3180,7 @@ _copyQuery(const Query *from)
COPY_SCALAR_FIELD(commandType);
COPY_SCALAR_FIELD(querySource);
- COPY_SCALAR_FIELD(queryId);
+ COPY_NODE_FIELD(queryIds);
COPY_SCALAR_FIELD(canSetTag);
COPY_NODE_FIELD(utilityStmt);
COPY_SCALAR_FIELD(resultRelation);
@@ -5405,6 +5419,9 @@ copyObjectImpl(const void *from)
/*
* PARSE NODES
*/
+ case T_QueryLabel:
+ retval = _copyQueryLabel(from);
+ break;
case T_Query:
retval = _copyQuery(from);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 2b0236937a..c86eb636b6 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -305,7 +305,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_NODE_TYPE("PLANNEDSTMT");
WRITE_ENUM_FIELD(commandType, CmdType);
- WRITE_UINT64_FIELD(queryId);
+ WRITE_NODE_FIELD(queryIds);
WRITE_BOOL_FIELD(hasReturning);
WRITE_BOOL_FIELD(hasModifyingCTE);
WRITE_BOOL_FIELD(canSetTag);
@@ -3030,6 +3030,16 @@ _outStatsElem(StringInfo str, const StatsElem *node)
WRITE_NODE_FIELD(expr);
}
+/* Needed for dump of a PlannedStmt node */
+static void
+_outQueryLabel(StringInfo str, const QueryLabel *node)
+{
+ WRITE_NODE_TYPE("QUERYLABEL");
+
+ WRITE_INT_FIELD(kind);
+ WRITE_UINT64_FIELD(hash);
+}
+
static void
_outQuery(StringInfo str, const Query *node)
{
@@ -3037,7 +3047,7 @@ _outQuery(StringInfo str, const Query *node)
WRITE_ENUM_FIELD(commandType, CmdType);
WRITE_ENUM_FIELD(querySource, QuerySource);
- /* we intentionally do not print the queryId field */
+ /* we intentionally do not print the queryIds field */
WRITE_BOOL_FIELD(canSetTag);
/*
@@ -4403,6 +4413,9 @@ outNode(StringInfo str, const void *obj)
case T_StatsElem:
_outStatsElem(str, obj);
break;
+ case T_QueryLabel:
+ _outQueryLabel(str, obj);
+ break;
case T_Query:
_outQuery(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3f68f7c18d..47c35707d4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -238,6 +238,17 @@ readBitmapset(void)
return _readBitmapset();
}
+static QueryLabel *
+_readQueryLabel(void)
+{
+ READ_LOCALS(QueryLabel);
+ READ_INT_FIELD(kind);
+ READ_UINT64_FIELD(hash);
+ local_node->context = NULL;
+
+ READ_DONE();
+}
+
/*
* _readQuery
*/
@@ -248,7 +259,7 @@ _readQuery(void)
READ_ENUM_FIELD(commandType, CmdType);
READ_ENUM_FIELD(querySource, QuerySource);
- local_node->queryId = UINT64CONST(0); /* not saved in output format */
+ local_node->queryIds = NIL; /* not saved in output format */
READ_BOOL_FIELD(canSetTag);
READ_NODE_FIELD(utilityStmt);
READ_INT_FIELD(resultRelation);
@@ -1578,7 +1589,7 @@ _readPlannedStmt(void)
READ_LOCALS(PlannedStmt);
READ_ENUM_FIELD(commandType, CmdType);
- READ_UINT64_FIELD(queryId);
+ READ_NODE_FIELD(queryIds);
READ_BOOL_FIELD(hasReturning);
READ_BOOL_FIELD(hasModifyingCTE);
READ_BOOL_FIELD(canSetTag);
@@ -2728,6 +2739,8 @@ parseNodeString(void)
if (MATCH("QUERY", 5))
return_value = _readQuery();
+ else if (MATCH("QUERYLABEL", 10))
+ return_value = _readQueryLabel();
else if (MATCH("WITHCHECKOPTION", 15))
return_value = _readWithCheckOption();
else if (MATCH("SORTGROUPCLAUSE", 15))
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd09f85aea..2ef8a97997 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -510,7 +510,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result = makeNode(PlannedStmt);
result->commandType = parse->commandType;
- result->queryId = parse->queryId;
+ result->queryIds = list_copy(parse->queryIds);
result->hasReturning = (parse->returningList != NIL);
result->hasModifyingCTE = parse->hasModifyingCTE;
result->canSetTag = parse->canSetTag;
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6ac2e9ce23..5925641795 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -113,7 +113,6 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
{
ParseState *pstate = make_parsestate(NULL);
Query *query;
- JumbleState *jstate = NULL;
Assert(sourceText != NULL); /* required as of 8.4 */
@@ -127,14 +126,15 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
query = transformTopLevelStmt(pstate, parseTree);
if (IsQueryIdEnabled())
- jstate = JumbleQuery(query, sourceText);
+ GenerateQueryLabels(query, sourceText);
if (post_parse_analyze_hook)
- (*post_parse_analyze_hook) (pstate, query, jstate);
+ (*post_parse_analyze_hook) (pstate, query);
free_parsestate(pstate);
- pgstat_report_query_id(query->queryId, false);
+ /* Report id of default generator. */
+ pgstat_report_query_id(get_query_label_hash(query->queryIds, 0), false);
return query;
}
@@ -152,7 +152,6 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
{
ParseState *pstate = make_parsestate(NULL);
Query *query;
- JumbleState *jstate = NULL;
Assert(sourceText != NULL); /* required as of 8.4 */
@@ -166,14 +165,14 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
check_variable_parameters(pstate, query);
if (IsQueryIdEnabled())
- jstate = JumbleQuery(query, sourceText);
+ GenerateQueryLabels(query, sourceText);
if (post_parse_analyze_hook)
- (*post_parse_analyze_hook) (pstate, query, jstate);
+ (*post_parse_analyze_hook) (pstate, query);
free_parsestate(pstate);
- pgstat_report_query_id(query->queryId, false);
+ pgstat_report_query_id(get_query_label_hash(query->queryIds, 0), false);
return query;
}
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 3d82138cb3..1cce0949b9 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -4079,7 +4079,7 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
List *
QueryRewrite(Query *parsetree)
{
- uint64 input_query_id = parsetree->queryId;
+ List *input_query_ids = parsetree->queryIds;
List *querylist;
List *results;
ListCell *l;
@@ -4114,7 +4114,7 @@ QueryRewrite(Query *parsetree)
query = fireRIRrules(query, NIL);
- query->queryId = input_query_id;
+ query->queryIds = input_query_ids;
results = lappend(results, query);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index fda2e9360e..76a8312b47 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -686,7 +686,6 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
ParseState *pstate;
Query *query;
List *querytree_list;
- JumbleState *jstate = NULL;
Assert(query_string != NULL); /* required as of 8.4 */
@@ -706,14 +705,14 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
query = transformTopLevelStmt(pstate, parsetree);
if (IsQueryIdEnabled())
- jstate = JumbleQuery(query, query_string);
+ GenerateQueryLabels(query, query_string);
if (post_parse_analyze_hook)
- (*post_parse_analyze_hook) (pstate, query, jstate);
+ (*post_parse_analyze_hook) (pstate, query);
free_parsestate(pstate);
- pgstat_report_query_id(query->queryId, false);
+ pgstat_report_query_id(get_query_label_hash(query->queryIds, 0), false);
if (log_parser_stats)
ShowUsage("PARSE ANALYSIS STATISTICS");
@@ -797,7 +796,7 @@ pg_rewrite_query(Query *query)
* queryId is not saved in stored rules, but we must preserve
* it here to avoid breaking pg_stat_statements.
*/
- new_query->queryId = query->queryId;
+ new_query->queryIds = list_copy(query->queryIds);
new_list = lappend(new_list, new_query);
pfree(str);
@@ -933,7 +932,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
stmt->utilityStmt = query->utilityStmt;
stmt->stmt_location = query->stmt_location;
stmt->stmt_len = query->stmt_len;
- stmt->queryId = query->queryId;
+ stmt->queryIds = list_copy(query->queryIds);
}
else
{
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4c94f09c64..fb5dd1cdce 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -4695,7 +4695,7 @@ static struct config_enum ConfigureNamesEnum[] =
},
&compute_query_id,
COMPUTE_QUERY_ID_AUTO, compute_query_id_options,
- NULL, NULL, NULL
+ NULL, assign_query_id, NULL
},
{
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index a67487e5fe..59bf53bc46 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -97,18 +97,18 @@ CleanQuerytext(const char *query, int *location, int *len)
return query;
}
-JumbleState *
-JumbleQuery(Query *query, const char *querytext)
+int64
+JumbleQuery(Query *query, const char *querytext, void **context)
{
- JumbleState *jstate = NULL;
+ JumbleState *jstate = NULL;
+ int64 queryId;
Assert(IsQueryIdEnabled());
if (query->utilityStmt)
{
- query->queryId = compute_utility_query_id(querytext,
- query->stmt_location,
- query->stmt_len);
+ queryId = compute_utility_query_id(querytext, query->stmt_location,
+ query->stmt_len);
}
else
{
@@ -125,19 +125,20 @@ JumbleQuery(Query *query, const char *querytext)
/* Compute query ID and mark the Query node with it */
JumbleQueryInternal(jstate, query);
- query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
- jstate->jumble_len,
- 0));
+ queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+ jstate->jumble_len,
+ 0));
/*
* If we are unlucky enough to get a hash of zero, use 1 instead, to
* prevent confusion with the utility-statement case.
*/
- if (query->queryId == UINT64CONST(0))
- query->queryId = UINT64CONST(1);
+ if (queryId == UINT64CONST(0))
+ queryId = UINT64CONST(1);
}
- return jstate;
+ *context = jstate;
+ return queryId;
}
/*
@@ -150,7 +151,29 @@ void
EnableQueryId(void)
{
if (compute_query_id != COMPUTE_QUERY_ID_OFF)
+ {
+ (void) RegisterQueryIdGen(0, JumbleQuery);
query_id_enabled = true;
+ }
+}
+
+void
+assign_query_id(int newval, void *extra)
+{
+ switch (newval)
+ {
+ case COMPUTE_QUERY_ID_OFF:
+ /* TODO: a de-registration routine should be implemented */
+ break;
+ case COMPUTE_QUERY_ID_AUTO:
+ break;
+ case COMPUTE_QUERY_ID_ON:
+ (void) RegisterQueryIdGen(0, JumbleQuery);
+ break;
+ default:
+ elog(ERROR, "unexpected compute_query_id value: %d", newval);
+ break;
+ }
}
/*
@@ -856,3 +879,120 @@ RecordConstLocation(JumbleState *jstate, int location)
jstate->clocations_count++;
}
}
+
+/* *******************************************************************************
+ *
+ * Query label custom generation machinery.
+ *
+ * ******************************************************************************/
+
+#include "utils/hsearch.h"
+
+static HTAB *QueryLabelGens = NULL;
+
+typedef struct QLGeneratorEntry
+{
+ int16 kind;
+ query_generator_type callback;
+} QLGeneratorEntry;
+
+/*
+ * An internal function to register a new callback structure
+ * Use a positive value for the 'kind' field to register a query label generator.
+ * The caller is responsible for avoiding collisions of this value with other extensions.
+ */
+bool
+RegisterQueryIdGen(const int16 kind, query_generator_type callback)
+{
+ QLGeneratorEntry *entry;
+ bool found;
+
+ if (QueryLabelGens == NULL)
+ {
+ HASHCTL ctl;
+
+ ctl.keysize = sizeof(int16);
+ ctl.entrysize = sizeof(QLGeneratorEntry);
+
+ QueryLabelGens = hash_create("Query Label Generators", 8, &ctl,
+ HASH_ELEM | HASH_BLOBS);
+ }
+
+ if (kind < 0)
+ elog(ERROR, "cannot use value %d as a query label generator kind", kind);
+
+ entry = (QLGeneratorEntry *) hash_search(QueryLabelGens, (void *) &kind,
+ HASH_ENTER, &found);
+ if (found)
+ /* Give the caller a chance to handle the conflict. */
+ return false;
+
+ entry->callback = callback;
+ return true;
+}
+
+/*
+ * Create a QueryLabel entry for each registered generator.
+ * All memory allocations are made in the current memory context.
+ */
+void
+GenerateQueryLabels(Query *query, const char *querytext)
+{
+ HASH_SEQ_STATUS hash_seq;
+ QLGeneratorEntry *entry;
+
+ /* A newly generated query doesn't have any labels yet. */
+ Assert(query->queryIds == NIL);
+
+ if (QueryLabelGens == NULL || hash_get_num_entries(QueryLabelGens) == 0)
+ return;
+
+ hash_seq_init(&hash_seq, QueryLabelGens);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ QueryLabel *qlabel = makeNode(QueryLabel);
+
+ qlabel->kind = entry->kind;
+ qlabel->hash = entry->callback(query, querytext, &qlabel->context);
+ query->queryIds = lappend(query->queryIds, qlabel);
+ }
+}
+
+int64
+get_query_label_hash(List *queryIds, const int16 kind)
+{
+ QueryLabel *label;
+
+ label = get_query_label(queryIds, kind);
+ return (label) ? label->hash : 0;
+}
+
+QueryLabel *
+get_query_label(List *queryIds, const int16 kind)
+{
+ ListCell *lc;
+
+ foreach(lc, queryIds)
+ {
+ QueryLabel *label = lfirst_node(QueryLabel, lc);
+ if (label->kind == kind)
+ return label;
+ }
+ return NULL;
+}
+
+bool
+add_custom_query_label(List **queryIds, int16 kind, int64 hash)
+{
+ QueryLabel *label;
+
+ if (get_query_label(*queryIds, kind) != NULL)
+ elog(ERROR, "duplicate custom label " INT64_FORMAT " for kind %d", hash, kind);
+
+ label = makeNode(QueryLabel);
+ label->kind = kind;
+ label->hash = hash;
+ label->context = NULL;
+ *queryIds = lappend(*queryIds, label);
+ return true;
+}
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index f9ddafd345..0835034c2d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -314,6 +314,7 @@ typedef enum NodeTag
T_RawStmt,
T_Query,
T_PlannedStmt,
+ T_QueryLabel,
T_InsertStmt,
T_DeleteStmt,
T_UpdateStmt,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 3e9bdc781f..bc9ca32273 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -70,6 +70,14 @@ typedef enum SetQuantifier
SET_QUANTIFIER_DISTINCT
} SetQuantifier;
+typedef struct QueryLabel
+{
+ NodeTag type;
+ int16 kind; /* unique generator ID; 0 is reserved for the in-core jumbling routine */
+ int64 hash; /* generated hash value */
+ void *context; /* internal generator-related data */
+} QueryLabel;
+
/*
* Grantable rights are encoded so that we can OR them together in a bitmask.
* The present representation of AclItem limits us to 16 distinct rights,
@@ -121,7 +129,7 @@ typedef struct Query
QuerySource querySource; /* where did I come from? */
- uint64 queryId; /* query identifier (can be set by plugins) */
+ List *queryIds; /* query identifiers (can be added by plugins) */
bool canSetTag; /* do I set the command result tag? */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0b518ce6b2..f588311de2 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -45,7 +45,7 @@ typedef struct PlannedStmt
CmdType commandType; /* select|insert|update|delete|utility */
- uint64 queryId; /* query identifier (copied from Query) */
+ List *queryIds; /* query identifiers (copied from Query) */
bool hasReturning; /* is it insert|update|delete RETURNING? */
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 0022184de0..455396c054 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -19,8 +19,7 @@
/* Hook for plugins to get control at end of parse analysis */
typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
- Query *query,
- JumbleState *jstate);
+ Query *query);
extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index a4c277269e..f09b88e3ec 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -65,7 +65,7 @@ extern int compute_query_id;
extern const char *CleanQuerytext(const char *query, int *location, int *len);
-extern JumbleState *JumbleQuery(Query *query, const char *querytext);
+extern int64 JumbleQuery(Query *query, const char *querytext, void **context);
extern void EnableQueryId(void);
extern bool query_id_enabled;
@@ -84,4 +84,14 @@ IsQueryIdEnabled(void)
return query_id_enabled;
}
+typedef int64 (*query_generator_type) (Query *query, const char *querytext,
+ void **context);
+
+extern void assign_query_id(int newval, void *extra);
+extern bool RegisterQueryIdGen(const int16 kind, query_generator_type callback);
+extern void GenerateQueryLabels(Query *query, const char *querytext);
+extern int64 get_query_label_hash(List *queryIds, const int16 kind);
+extern QueryLabel *get_query_label(List *queryIds, const int16 kind);
+extern bool add_custom_query_label(List **queryIds, int16 kind, int64 hash);
+
#endif /* QUERYJUMBLE_H */
--
2.25.1
On Fri, Jan 21, 2022 at 11:33:22AM +0500, Andrey V. Lepikhov wrote:
On 1/9/22 5:49 AM, Tom Lane wrote:
The idea I'd been vaguely thinking about is to allow attaching a list
of query-hash nodes to a Query, where each node would contain a "tag"
identifying the specific hash calculation method, and also the value
of the query's hash calculated according to that method. We could
probably get away with saying that all such hash values must be uint64.
The main difference from your function-OID idea, I think, is that
I'm envisioning the tags as being small integers with well-known
values, similarly to the way we manage stakind values in pg_statistic.
In this way, an extension that wants a hash that the core knows how
to calculate doesn't need its own copy of the code, and similarly
one extension could publish a calculation method for use by other
extensions.

To move forward, I have made a patch that implements this idea (see
attachment). It is a POC, but passes all regression tests.
Thanks. Couple of comments off the top of my head:
Registration of a queryId generator is implemented by analogy with the
extensible methods machinery.
Why not more like suggested with stakind and slots in some data
structure? All of those generators have to be iterated anyway, so not
sure if a hash table makes sense.
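That stakind-style alternative can be sketched standalone: a fixed array of slots indexed by the well-known kind, with all registered generators run in one pass. Everything below is illustrative plain C, not PostgreSQL code; the names are made up for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative model: the slot index doubles as the well-known "kind". */
#define MAX_GENERATORS 8

typedef int64_t (*label_gen) (const char *querytext);

static label_gen slots[MAX_GENERATORS];

/* Register a generator under a fixed small-integer kind; 0 if the slot is taken. */
static int
register_generator(int kind, label_gen cb)
{
    if (kind < 0 || kind >= MAX_GENERATORS || slots[kind] != NULL)
        return 0;
    slots[kind] = cb;
    return 1;
}

/* All registered generators are iterated at once, as GenerateQueryLabels() does. */
static void
generate_all(const char *querytext, int64_t out[MAX_GENERATORS])
{
    for (int kind = 0; kind < MAX_GENERATORS; kind++)
        out[kind] = slots[kind] ? slots[kind](querytext) : 0;
}

/* Two toy generators: a length-based "hash" and a constant tag. */
static int64_t len_gen(const char *q) { int64_t n = 0; while (q[n]) n++; return n ? n : 1; }
static int64_t const_gen(const char *q) { (void) q; return 42; }
```

With a small fixed number of well-known kinds, the lookup is a direct array index and the "iterate all generators" loop touches contiguous memory, which is the point of the slot suggestion.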
Also, I switched queryId to the int64 type and renamed it to
'label'.
A name with "id" in it would be better I believe. Label could be thought
of as "the query belongs to a certain category", while the purpose is
identification.
2. We need a custom queryId, that is based on a generated queryId (according
to the logic of pg_stat_statements).
Could you clarify?
4. We should reserve position of default in-core generator
From the discussion above I was under the impression that the core
generator should be distinguished by a predefined kind.
5. We should add an EXPLAIN hook, to allow an extension to print this custom
queryId.
Why? It would make sense if custom generation code would be generating
some complex structure, but the queryId itself is still a hash.
Hi,
On Fri, Jan 28, 2022 at 05:51:56PM +0100, Dmitry Dolgov wrote:
On Fri, Jan 21, 2022 at 11:33:22AM +0500, Andrey V. Lepikhov wrote:
On 1/9/22 5:49 AM, Tom Lane wrote:
The idea I'd been vaguely thinking about is to allow attaching a list
of query-hash nodes to a Query, where each node would contain a "tag"
identifying the specific hash calculation method, and also the value
of the query's hash calculated according to that method. We could
probably get away with saying that all such hash values must be uint64.
The main difference from your function-OID idea, I think, is that
I'm envisioning the tags as being small integers with well-known
values, similarly to the way we manage stakind values in pg_statistic.
In this way, an extension that wants a hash that the core knows how
to calculate doesn't need its own copy of the code, and similarly
one extension could publish a calculation method for use by other
extensions.

To move forward, I have made a patch that implements this idea (see
attachment). It is a POC, but passes all regression tests.

[...]
4. We should reserve position of default in-core generator
From the discussion above I was under the impression that the core
generator should be distinguished by a predefined kind.
I don't really like this approach. IIUC, this patch as-is is meant to break
pg_stat_statements extensibility. If kind == 0 is reserved for in-core queryid
then you can't use pg_stat_statements with a different queryid generator
anymore. Unless you meant that kind == 0 is reserved for monitoring /
pg_stat_statements purposes and a custom extension should register that specific
kind too if that's what they are supposed to implement?
The patch also reserves kind == -1 for pg_stat_statements internal purpose, and
I'm not really sure why that's needed. Are additional extensions that want to
agree with pg_stat_statements on what the monitoring queryid is
supposed to do that too?
I'm also unsure of how extensions are supposed to cooperate in general, as
I feel that queryids should be implemented for some "intent" (like monitoring,
planning optimization...). That being said I'm not sure if e.g. AQO heuristics
are too specific for its need or if it could be shared with other extension
that might be dealing with similar concerns.
On Sat, Jan 29, 2022 at 03:51:33PM +0800, Julien Rouhaud wrote:
Hi,On Fri, Jan 28, 2022 at 05:51:56PM +0100, Dmitry Dolgov wrote:
On Fri, Jan 21, 2022 at 11:33:22AM +0500, Andrey V. Lepikhov wrote:
On 1/9/22 5:49 AM, Tom Lane wrote:
The idea I'd been vaguely thinking about is to allow attaching a list
of query-hash nodes to a Query, where each node would contain a "tag"
identifying the specific hash calculation method, and also the value
of the query's hash calculated according to that method. We could
probably get away with saying that all such hash values must be uint64.
The main difference from your function-OID idea, I think, is that
I'm envisioning the tags as being small integers with well-known
values, similarly to the way we manage stakind values in pg_statistic.
In this way, an extension that wants a hash that the core knows how
to calculate doesn't need its own copy of the code, and similarly
one extension could publish a calculation method for use by other
extensions.

To move forward, I have made a patch that implements this idea (see
attachment). It is a POC, but passes all regression tests.

[...]
4. We should reserve position of default in-core generator
From the discussion above I was under the impression that the core
generator should be distinguished by a predefined kind.

I don't really like this approach. IIUC, this patch as-is is meant to break
pg_stat_statements extensibility. If kind == 0 is reserved for in-core queryid
then you can't use pg_stat_statements with a different queryid generator
anymore. Unless you meant that kind == 0 is reserved for monitoring /
pg_stat_statement purpose and custom extension should register that specific
kind too if that's what they are supposed to implement?

[...]
I'm also unsure of how extensions are supposed to cooperate in general, as
I feel that queryids should be implemented for some "intent" (like monitoring,
planning optimization...). That being said I'm not sure if e.g. AQO heuristics
are too specific for its need or if it could be shared with other extension
that might be dealing with similar concerns.
Assuming there are multiple providers and consumers of queryIds, every
such consumer extension needs to know which type of queryId it wants to
use. E.g. in case of pg_stat_statements, it needs to be somehow
configured to know which of those kinds to take, to preserve
extensibility you're talking about. Does the answer make sense, or did
you mean something else?
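The consumer-side selection described above can be modeled standalone, mirroring the patch's get_query_label_hash() lookup. Plain C sketch; the `monitoring_kind` variable is a hypothetical stand-in for whatever setting would tell the consumer which kind to read.

```c
#include <stddef.h>
#include <stdint.h>

/* Models the patch's QueryLabel node: a (kind, hash) pair. */
typedef struct QueryLabelModel
{
    int16_t kind;
    int64_t hash;
} QueryLabelModel;

/* Hypothetical knob: which label kind this consumer is configured to read. */
static int16_t monitoring_kind = 0;

/* Return the hash of the configured kind, or 0 when no such label exists. */
static int64_t
consumer_query_id(const QueryLabelModel *labels, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (labels[i].kind == monitoring_kind)
            return labels[i].hash;
    return 0;
}
```

The extensibility question then reduces to who sets `monitoring_kind` (the DBA, the extension itself, or a convention) and whether it may change at runtime.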
Btw, the approach in this thread still doesn't give a clue what to do
when an extension needs to reuse some parts of core queryId generator,
as in case with pg_stat_statements and "IN" condition merging.
Hi,
On Sat, Jan 29, 2022 at 06:12:05PM +0100, Dmitry Dolgov wrote:
On Sat, Jan 29, 2022 at 03:51:33PM +0800, Julien Rouhaud wrote:
I'm also unsure of how extensions are supposed to cooperate in general, as
I feel that queryids should be implemented for some "intent" (like monitoring,
planning optimization...). That being said I'm not sure if e.g. AQO heuristics
are too specific for its need or if it could be shared with other extension
that might be dealing with similar concerns.

Assuming there are multiple providers and consumers of queryIds, every
such consumer extension needs to know which type of queryId it wants to
use. E.g. in case of pg_stat_statements, it needs to be somehow
configured to know which of those kinds to take, to preserve
extensibility you're talking about. Does the answer make sense, or did
you mean something else?
I guess, but I don't think that the proposed approach does that.
The DBA should be able to configure a monitoring queryid provider, a planning
queryid provider... and the extensions should have a way to know which is
which. And also I don't think that the DBA should be allowed to set up multiple
monitoring queryid providers, nor change them dynamically.
Btw, the approach in this thread still doesn't give a clue what to do
when an extension needs to reuse some parts of core queryId generator,
as in case with pg_stat_statements and "IN" condition merging.
Indeed, as the query text normalization is not extensible.
On Sun, Jan 30, 2022 at 01:48:20AM +0800, Julien Rouhaud wrote:
Hi,On Sat, Jan 29, 2022 at 06:12:05PM +0100, Dmitry Dolgov wrote:
On Sat, Jan 29, 2022 at 03:51:33PM +0800, Julien Rouhaud wrote:
I'm also unsure of how extensions are supposed to cooperate in general, as
I feel that queryids should be implemented for some "intent" (like monitoring,
planning optimization...). That being said I'm not sure if e.g. AQO heuristics
are too specific for its need or if it could be shared with other extension
that might be dealing with similar concerns.Assuming there are multiple providers and consumers of queryIds, every
such consumer extension needs to know which type of queryId it wants to
use. E.g. in case of pg_stat_statements, it needs to be somehow
configured to know which of those kinds to take, to preserve
extensibility you're talking about. Does the answer make sense, or did
you mean something else?

I guess, but I don't think that the proposed approach does that.
Yes, it doesn't, I'm just channeling my understanding of the problem.
On 1/28/22 9:51 PM, Dmitry Dolgov wrote:
On Fri, Jan 21, 2022 at 11:33:22AM +0500, Andrey V. Lepikhov wrote:
Registration of a queryId generator is implemented by analogy with the
extensible methods machinery.

Why not more like suggested with stakind and slots in some data
structure? All of those generators have to be iterated anyway, so not
sure if a hash table makes sense.
Maybe. But it is not obvious. We don't really know how many extensions
could set a queryId.
For example, adaptive planning extensions definitely want to set a
unique id (a simplistic counter, for example) to trace a specific
{query,plan} pair across all executions (remember the plancache too). And
they would register a personal generator for that purpose.
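Such a counter-based generator is tiny; a standalone sketch in plain C (a real extension would expose this through the patch's query_generator_type signature instead):

```c
#include <stdint.h>

/* Per-backend sequence; every parsed query gets a fresh, unique label. */
static int64_t query_seq = 0;

/* Content-independent generator: ignores the query text entirely. */
static int64_t
counter_generator(const char *querytext)
{
    (void) querytext;
    return ++query_seq;         /* starts at 1, so 0 still means "no label" */
}
```

Note this is deliberately the opposite of jumbling: two identical query texts get different labels, which is exactly what an adaptive extension tracing one {query,plan} pair wants.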
Also, I switched queryId to the int64 type and renamed it to
'label'.

A name with "id" in it would be better I believe. Label could be thought
of as "the query belongs to a certain category", while the purpose is
identification.
I think that's not entirely true. The current jumbling intentionally (I
hope) generates a non-unique queryId, and pg_stat_statements uses the
queryId to group queries into classes.
To track a specific query along the execution path it makes additional
efforts (remembering the query nesting level, for example).
BTW, before [1], I tried to improve queryId so that it can be stable for
permutations of tables in the 'FROM' section and so on. It would allow
reducing the number of pg_stat_statements entries (a critical factor when
you use an ORM, like 1C for example).
So, I think queryId is an id and a category at the same time.
2. We need a custom queryId, that is based on a generated queryId (according
to the logic of pg_stat_statements).

Could you clarify?
pg_stat_statements takes the original queryId and changes it for a reason
(sometimes zeroing it, sometimes not). So you can't use this value in
another extension and be confident that you are using the original value
generated by JumbleQuery(). A custom queryId solves this problem.
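One way an extension could get that without clobbering the in-core value is to derive its own label from the kind-0 hash and store the result under a private kind. A standalone sketch of such a derivation (the splitmix64 finalizer here is only an example mixer, not what pg_stat_statements actually does):

```c
#include <stdint.h>

/*
 * Derive an extension-private label from the in-core (kind 0) hash.
 * The original hash stays untouched; the derived value would be stored
 * under the extension's own kind.
 */
static int64_t
derive_label(int64_t core_hash, uint64_t salt)
{
    uint64_t h = (uint64_t) core_hash ^ salt;

    /* splitmix64 finalizer: a cheap, invertible 64-bit mix */
    h ^= h >> 30; h *= UINT64_C(0xbf58476d1ce4e5b9);
    h ^= h >> 27; h *= UINT64_C(0x94d049bb133111eb);
    h ^= h >> 31;

    return (int64_t) (h ? h : 1);   /* reserve 0 for "no label" */
}
```

Because the original kind-0 label is never mutated, any number of extensions can derive from it independently, each under its own kind.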
4. We should reserve position of default in-core generator
From the discussion above I was under the impression that the core
generator should be distinguished by a predefined kind.
Yes, but I think we should have a range of values large enough for use
by third-party extensions.
5. We should add an EXPLAIN hook, to allow an extension to print this custom
queryId.

Why? It would make sense if custom generation code would be generating
some complex structure, but the queryId itself is still a hash.
An extension can print not only the queryId but also an explanation of a
kind, maybe with additional logic.
Moreover, why can't an extension show some useful monitoring data,
collected during query execution, in verbose mode?
[1] /messages/by-id/e50c1e8f-e5d6-5988-48fa-63dd992e9565@postgrespro.ru
--
regards,
Andrey Lepikhov
Postgres Professional
On Mon, Jan 31, 2022 at 02:59:17PM +0500, Andrey V. Lepikhov wrote:
On 1/28/22 9:51 PM, Dmitry Dolgov wrote:
On Fri, Jan 21, 2022 at 11:33:22AM +0500, Andrey V. Lepikhov wrote:
Registration of a queryId generator is implemented by analogy with the
extensible methods machinery.

Why not more like suggested with stakind and slots in some data
structure? All of those generators have to be iterated anyway, so not
sure if a hash table makes sense.

Maybe. But it is not obvious. We don't really know how many extensions
could set a queryId.
For example, adaptive planning extensions definitely want to set a unique
id (a simplistic counter, for example) to trace a specific {query,plan}
pair across all executions (remember the plancache too). And they would
register a personal generator for that purpose.
I don't see how the number of extensions justifies a hash table. I mean,
all of the generators will still most likely be executed at once (not on
demand) and store the result somewhere.
In any case I would like to point out that this functionality sits on a
pretty hot path, which means strong evidence of no significant
performance overhead is needed.
multiple queryId consumers, but in the vast majority of cases only one
queryId will be generated.
2. We need a custom queryId, that is based on a generated queryId (according
to the logic of pg_stat_statements).

Could you clarify?

pg_stat_statements takes the original queryId and changes it for a reason
(sometimes zeroing it, sometimes not).
IIRC it does that only for utility statements. I still fail to see the
problem, why would a custom extension not register a new generator if it
needs something different?
5. We should add an EXPLAIN hook, to allow an extension to print this custom
queryId.

Why? It would make sense if custom generation code would be generating
some complex structure, but the queryId itself is still a hash.

An extension can print not only the queryId but also an explanation of a
kind, maybe with additional logic.
Moreover, why can't an extension show some useful monitoring data,
collected during query execution, in verbose mode?
A good recommendation, which pops up every now and then in hackers, is to
concentrate on what is immediately useful, leaving "maybe useful" things
for later. I have a strong feeling this is the case here.
On 29/1/2022 12:51, Julien Rouhaud wrote:
4. We should reserve position of default in-core generator
From the discussion above I was under the impression that the core
generator should be distinguished by a predefined kind.

I don't really like this approach. IIUC, this patch as-is is meant to break
pg_stat_statements extensibility. If kind == 0 is reserved for in-core queryid
then you can't use pg_stat_statements with a different queryid generator
anymore.
Yes, it is one more problem. Maybe if we want to make it extensible, we
could think about hooks in pg_stat_statements too?
The patch also reserves kind == -1 for pg_stat_statements internal purpose, and
I'm not really sure why that's needed.
My idea: tags with positive numbers are reserved for generation
results, which are performance-critical.
As I saw during the implementation, pg_stat_statements makes additional
changes to the queryId (no matter which ones). Because our purpose is to
avoid interference here, I introduced negative values, where extensions
can store their queryIds, whether based on a generator or not.
Maybe it is redundant; the main idea here was to highlight the issue.
I'm also unsure of how extensions are supposed to cooperate in general, as
I feel that queryids should be implemented for some "intent" (like monitoring,
planning optimization...). That being said I'm not sure if e.g. AQO heuristics
are too specific for its need or if it could be shared with other extension
that might be dealing with similar concerns.
I think it depends on the specific purpose of an extension.
--
regards,
Andrey Lepikhov
Postgres Professional