Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Started by Yun Li, almost 7 years ago, 222 messages

liyunjuanyong@gmail.com

almost 7 years ago

Hey pg developers,

Do you think we could add queryId to the pg_stat_get_activity function
and ultimately expose it in the view? It would make it easier to track the
performance of "similar" queries over time.

Thanks a lot!
Yun

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Yun Li (#1)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Yun Li <liyunjuanyong@gmail.com> writes:

Do you think we could add queryId to the pg_stat_get_activity function
and ultimately expose it in the view? It would make it easier to track the
performance of "similar" queries over time.

No, we're not likely to do that, because it would mean (1) baking one
single definition of "query ID" into the core system and (2) paying
the cost to calculate that ID all the time.

pg_stat_statements has a notion of query ID, but that notion might be
quite inappropriate for other usages, which is why it's an extension
and not core.

regards, tom lane

robertmhaas@gmail.com

almost 7 years ago

In reply to: Tom Lane (#2)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yun Li <liyunjuanyong@gmail.com> writes:

Do you think we could add queryId to the pg_stat_get_activity function
and ultimately expose it in the view? It would make it easier to track the
performance of "similar" queries over time.

No, we're not likely to do that, because it would mean (1) baking one
single definition of "query ID" into the core system and (2) paying
the cost to calculate that ID all the time.

pg_stat_statements has a notion of query ID, but that notion might be
quite inappropriate for other usages, which is why it's an extension
and not core.

Having written an extension that also wanted a query ID, I disagree
with this position. There's only one query ID field available, and
you can't use two extensions that care about query ID unless they
compute it the same way, and replicating all the code that computes
the query ID into each new extension that wants one sucks. I think we
should actually bite the bullet and move all of that code into core,
and then just let extensions say whether they care about it getting
set.

Also, I think this is now the third independent request to expose
query ID in pg_stat_statements. I think we should give the people
what they want.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Robert Haas (#3)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

pg_stat_statements has a notion of query ID, but that notion might be
quite inappropriate for other usages, which is why it's an extension
and not core.

Having written an extension that also wanted a query ID, I disagree
with this position.

[ shrug... ] The fact remains that pg_stat_statements's definition is
pretty lame. There's a lot of judgment calls in which query fields
it chooses to examine or ignore, and there's been no attempt at all
to make the ID PG-version-independent, and I rather doubt that it's
platform-independent either. Nor will the IDs survive a dump/reload
even on the same server, since object OIDs will likely change.

These things are OK, or at least mostly tolerable, for pg_stat_statements'
usage ... but I don't think it's a good idea to have the core code
dictating that definition to all extensions. Right now, if you have
an extension that needs some other query-ID definition, you can do it,
you just can't run that extension alongside pg_stat_statements.
But you'll be out of luck if the core code starts filling that field.

I'd be happier about having the core code compute a query ID if we
had a definition that was not so obviously slapped together.

regards, tom lane

rjuju123@gmail.com

almost 7 years ago

In reply to: Tom Lane (#4)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Sat, Mar 16, 2019 at 5:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

pg_stat_statements has a notion of query ID, but that notion might be
quite inappropriate for other usages, which is why it's an extension
and not core.

Having written an extension that also wanted a query ID, I disagree
with this position.

[ shrug... ] The fact remains that pg_stat_statements's definition is
pretty lame. There's a lot of judgment calls in which query fields
it chooses to examine or ignore, and there's been no attempt at all
to make the ID PG-version-independent, and I rather doubt that it's
platform-independent either. Nor will the IDs survive a dump/reload
even on the same server, since object OIDs will likely change.

These things are OK, or at least mostly tolerable, for pg_stat_statements'
usage ... but I don't think it's a good idea to have the core code
dictating that definition to all extensions. Right now, if you have
an extension that needs some other query-ID definition, you can do it,
you just can't run that extension alongside pg_stat_statements.
But you'll be out of luck if the core code starts filling that field.

I'd be happier about having the core code compute a query ID if we
had a definition that was not so obviously slapped together.

But the queryId itself is stored in core. Exposing it in
pg_stat_activity or log_line_prefix would still allow users to choose
the implementation of their choice, or none. That seems like a
different complaint from asking for pgss integration in core to have all
its metrics available by default (or at least without a restart).

Maybe we could add a GUC for pg_stat_statements to choose whether it
should set the queryid itself or not, if anyone wants to have its
metrics but with different queryid semantics?

legrand legrand

legrand_legrand@hotmail.com

almost 7 years ago

In reply to: Robert Haas (#3)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hello,

This is available in the https://github.com/legrandlegrand/pg_stat_sql_plans
extension, which provides a dedicated function,
pgssp_backend_queryid(pid), that makes it possible to join pg_stat_activity
with pg_stat_sql_plans (which is similar to pg_stat_statements) and also to
collect samples of wait events per query ID.

This extension computes its own queryid based on the normalized query text
(which doesn't change after a table is dropped and recreated).

Maybe the queryid calculation should stay in a dedicated extension,
allowing users to choose their queryid definition.

Regards
Pascal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

liyunjuanyong@gmail.com

almost 7 years ago

In reply to: Julien Rouhaud (#5)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Thanks a lot for the really good points!! I did not expect to get this
many points of view. :P

I have had the same experience as Robert: when another extension calculates
the ID differently from PGSS, PGSS will overwrite that ID when it is on. But
Tom has a point that if we centralize the logic that PGSS has, then other
extensions will have no way to change it unless we add some new config to
toggle it, as pointed out by Julien. Tom is also concerned that the current
PGSS query jumbling logic is not bulletproof and may take time, which would
impact performance.

Let's take one step back. Since queryId is stored in core, as Julien pointed
out, can we just add that global to pg_stat_get_activity and ultimately
expose it in the pg_stat_activity view? Then, no matter whether PGSS is on or
off, or however the customer extensions are updating that field, we expose
that field in the view and let users leverage that ID to join with
PGSS or their extension. Does this sound like a good idea?

Thanks again,
Yun

On Sat, Mar 16, 2019 at 11:01 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Sat, Mar 16, 2019 at 5:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

pg_stat_statements has a notion of query ID, but that notion might be
quite inappropriate for other usages, which is why it's an extension
and not core.

Having written an extension that also wanted a query ID, I disagree
with this position.

[ shrug... ] The fact remains that pg_stat_statements's definition is
pretty lame. There's a lot of judgment calls in which query fields
it chooses to examine or ignore, and there's been no attempt at all
to make the ID PG-version-independent, and I rather doubt that it's
platform-independent either. Nor will the IDs survive a dump/reload
even on the same server, since object OIDs will likely change.

These things are OK, or at least mostly tolerable, for pg_stat_statements'
usage ... but I don't think it's a good idea to have the core code
dictating that definition to all extensions. Right now, if you have
an extension that needs some other query-ID definition, you can do it,
you just can't run that extension alongside pg_stat_statements.
But you'll be out of luck if the core code starts filling that field.

I'd be happier about having the core code compute a query ID if we
had a definition that was not so obviously slapped together.

But the queryId itself is stored in core. Exposing it in
pg_stat_activity or log_line_prefix would still allow users to choose
the implementation of their choice, or none. That seems like a
different complaint from asking for pgss integration in core to have all
its metrics available by default (or at least without a restart).

Maybe we could add a GUC for pg_stat_statements to choose whether it
should set the queryid itself or not, if anyone wants to have its
metrics but with different queryid semantics?

Nikolay Samokhvalov

samokhvalov@gmail.com

almost 7 years ago

In reply to: Robert Haas (#3)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hello

On Sat, Mar 16, 2019 at 7:32 AM Robert Haas <robertmhaas@gmail.com> wrote:

Also, I think this is now the third independent request to expose
query ID in pg_stat_statements. I think we should give the people
what they want.

Count me as the 4th.

This would be a very important feature for automated query analysis.
pg_stat_statements lacks query examples, and the only way to get them is
from the logs, where we don't have queryid either. So people end up either
doing it manually or writing yet another set of nasty regular expressions.

Routine query analysis is crucial for any large project. If there is a
chance to implement queryid for pg_stat_activity (or anything that would
allow automating query analysis) in Postgres 12 or later, this would be
great news and a huge support for engineers, on the same level as the
recently implemented sampling for statement logging.

By the way, if queryid goes into core someday, I'm sure it is worth
considering using it in logs as well.

Thanks,
Nik

rjuju123@gmail.com

almost 7 years ago

In reply to: Yun Li (#7)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote:

Let's take one step back. Since queryId is stored in core, as Julien pointed out, can we just add that global to pg_stat_get_activity and ultimately expose it in the pg_stat_activity view? Then, no matter whether PGSS is on or off, or however the customer extensions are updating that field, we expose that field in the view and let users leverage that ID to join with PGSS or their extension. Does this sound like a good idea?

I'd greatly welcome queryid exposure in pg_stat_activity, and
also in log_line_prefix. I'm afraid that it's too late for pg12
inclusion, but I'll be happy to provide a patch for that for pg13.

Maksim Milyutin

milyutinma@gmail.com

almost 7 years ago

In reply to: Robert Haas (#3)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 3/16/19 5:32 PM, Robert Haas wrote:

There's only one query ID field available, and
you can't use two extensions that care about query ID unless they
compute it the same way, and replicating all the code that computes
the query ID into each new extension that wants one sucks. I think we
should actually bite the bullet and move all of that code into core,
and then just let extensions say whether they care about it getting
set.

+1.

But I think that it's enough to integrate the query normalization
routine into core and store generalized query strings (from which the queryId
is produced) in shared memory (for example, a hash table that maps queryId to
the text representation of the generalized query). And activate the
normalization routine and the filling of the table of generalized queries
with a specified GUC.

This would allow unbinding extensions that require queryId from
pg_stat_statements and treating this computation of queryId as canonical.
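
As a rough picture of the mapping described above (queryId to generalized query text), here is a small sketch. A fixed-size static array stands in for shared memory, there is no locking or eviction, and all names (DemoEntry, demo_store, demo_lookup) are hypothetical; it is not how pg_stat_statements stores its texts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SLOTS      64      /* must be a power of two */
#define DEMO_TEXT_MAX   128     /* longer texts are truncated */

typedef struct DemoEntry
{
    uint64_t    query_id;       /* 0 marks an empty slot */
    char        text[DEMO_TEXT_MAX];
} DemoEntry;

/* Stand-in for a shared-memory area; zero-initialized, so all slots empty. */
static DemoEntry demo_table[DEMO_SLOTS];

/*
 * Remember the generalized text for query_id, using linear probing.
 * Returns 1 on success, 0 when the table is full.
 */
int
demo_store(uint64_t query_id, const char *text)
{
    assert(query_id != 0 && text != NULL);

    for (unsigned probe = 0; probe < DEMO_SLOTS; probe++)
    {
        DemoEntry  *e = &demo_table[(query_id + probe) & (DEMO_SLOTS - 1)];

        if (e->query_id == 0 || e->query_id == query_id)
        {
            e->query_id = query_id;
            strncpy(e->text, text, DEMO_TEXT_MAX - 1);
            e->text[DEMO_TEXT_MAX - 1] = '\0';
            return 1;
        }
    }
    return 0;
}

/* Return the stored text for query_id, or NULL if it is unknown. */
const char *
demo_lookup(uint64_t query_id)
{
    if (query_id == 0)
        return NULL;

    for (unsigned probe = 0; probe < DEMO_SLOTS; probe++)
    {
        const DemoEntry *e = &demo_table[(query_id + probe) & (DEMO_SLOTS - 1)];

        if (e->query_id == query_id)
            return e->text;
        if (e->query_id == 0)
            return NULL;        /* hit an empty slot: not present */
    }
    return NULL;
}
```

The fixed text length is the weak point of keeping query strings in a preallocated area: long statements are either truncated or force a large per-entry allocation.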

--
Regards,
Maksim Milyutin

rjuju123@gmail.com

almost 7 years ago

In reply to: Maksim Milyutin (#10)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Mar 19, 2019 at 2:45 PM Maksim Milyutin <milyutinma@gmail.com> wrote:

But I think that it's enough to integrate the query normalization
routine into core and store generalized query strings (from which the queryId
is produced) in shared memory (for example, a hash table that maps queryId to
the text representation of the generalized query).

That's more or less how pg_stat_statements previously behaved,
and it had too many problems. The current implementation, with an
external file, is a better alternative.

And activate the
normalization routine and the filling of the table of generalized queries
with a specified GUC.

This would allow unbinding extensions that require queryId from
pg_stat_statements and treating this computation of queryId as canonical.

The problem I see with this approach is that if you want a different
implementation, you'll have to reimplement the in-core normalised
queries saving and retrieval, but with a different set of SQL-visible
functions. I don't think that it's acceptable, unless we add a
specific hook for query normalisation and queryid computing. But it
isn't ideal either, as it would be a total mess if someone changes the
implementation without resetting the previously saved normalised
queries.

rjuju123@gmail.com

almost 7 years ago

In reply to: Julien Rouhaud (#9)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Mar 18, 2019 at 7:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote:

Let's take one step back. Since queryId is stored in core, as Julien pointed out, can we just add that global to pg_stat_get_activity and ultimately expose it in the pg_stat_activity view? Then, no matter whether PGSS is on or off, or however the customer extensions are updating that field, we expose that field in the view and let users leverage that ID to join with PGSS or their extension. Does this sound like a good idea?

I'd greatly welcome queryid exposure in pg_stat_activity, and
also in log_line_prefix. I'm afraid that it's too late for pg12
inclusion, but I'll be happy to provide a patch for that for pg13.

Here's a prototype patch for queryid exposure in pg_stat_activity and
log_line_prefix.

Attachments:

queryid_exposure-v1.diff (text/x-patch; charset=US-ASCII; name=queryid_exposure-v1.diff)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d383de2512..37570825be 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6260,6 +6260,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ac2721c8ad..726c9430d5 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -800,6 +800,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      <entry><type>xid</type></entry>
      <entry>The current backend's <literal>xmin</literal> horizon.</entry>
     </row>
+    <row>
+     <entry><structfield>queryid</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed, unless
+      an error occurred, which will reset this field to 0.  By default, query
+      identifiers are not computed, so this field will always display 0, unless
+      an additional module that computes query identifiers, such as <xref
+      linkend="pgstatstatements"/>, is configured.
+     </entry>
+    </row>
     <row>
      <entry><structfield>query</structfield></entry>
      <entry><type>text</type></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index d962648bc5..6b62c7db1c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -708,6 +708,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 63a34760ee..955722d3a4 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -140,6 +140,8 @@ static void EvalPlanQualStart(EPQState *epqstate, EState *parentestate,
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
@@ -300,6 +302,8 @@ ExecutorRun(QueryDesc *queryDesc,
 			ScanDirection direction, uint64 count,
 			bool execute_once)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorRun_hook)
 		(*ExecutorRun_hook) (queryDesc, direction, count, execute_once);
 	else
@@ -399,6 +403,8 @@ standard_ExecutorRun(QueryDesc *queryDesc,
 void
 ExecutorFinish(QueryDesc *queryDesc)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorFinish_hook)
 		(*ExecutorFinish_hook) (queryDesc);
 	else
@@ -459,6 +465,8 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
 void
 ExecutorEnd(QueryDesc *queryDesc)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorEnd_hook)
 		(*ExecutorEnd_hook) (queryDesc);
 	else
@@ -538,6 +546,8 @@ ExecutorRewind(QueryDesc *queryDesc)
 	/* It's probably not sensible to rescan updating queries */
 	Assert(queryDesc->operation == CMD_SELECT);
 
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	/*
 	 * Switch into per-query memory context
 	 */
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index d898f4ca78..0729c2f1a3 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -24,6 +24,7 @@
 #include "executor/executor.h"
 #include "executor/spi_priv.h"
 #include "miscadmin.h"
+#include "storage/proc.h"
 #include "tcop/pquery.h"
 #include "tcop/utility.h"
 #include "utils/builtins.h"
@@ -1879,6 +1880,7 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
 	List	   *plancache_list;
 	ListCell   *list_item;
 	ErrorContextCallback spierrcontext;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 
 	/*
 	 * Setup error traceback support for ereport()
@@ -1935,6 +1937,8 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
 											   _SPI_current->queryEnv);
 		}
 
+		pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 		/* Finish filling in the CachedPlanSource */
 		CompleteCachedPlan(plansource,
 						   stmt_list,
@@ -2046,6 +2050,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 	int			res = 0;
 	bool		pushed_active_snap = false;
 	ErrorContextCallback spierrcontext;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 	CachedPlan *cplan = NULL;
 	ListCell   *lc1;
 
@@ -2135,6 +2140,8 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 												   _SPI_current->queryEnv);
 			}
 
+			pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 			/* Finish filling in the CachedPlanSource */
 			CompleteCachedPlan(plansource,
 							   stmt_list,
@@ -2305,6 +2312,8 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 				}
 			}
 
+			pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 			/*
 			 * The last canSetTag query sets the status values returned to the
 			 * caller.  Be careful to free any tuptables not returned, to
@@ -2408,6 +2417,7 @@ static int
 _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
 {
 	int			operation = queryDesc->operation;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 	int			eflags;
 	int			res;
 
@@ -2472,6 +2482,8 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
 	ExecutorEnd(queryDesc);
 	/* FreeQueryDesc is done by the caller */
 
+	pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 #ifdef SPI_EXECUTOR_STATS
 	if (ShowExecutorStats)
 		ShowUsage("SPI EXECUTOR STATS");
@@ -2519,6 +2531,7 @@ _SPI_cursor_operation(Portal portal, FetchDirection direction, long count,
 					  DestReceiver *dest)
 {
 	uint64		nfetched;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 
 	/* Check that the portal is valid */
 	if (!PortalIsValid(portal))
@@ -2553,6 +2566,8 @@ _SPI_cursor_operation(Portal portal, FetchDirection direction, long count,
 	if (dest->mydest == DestSPI && _SPI_checktuples())
 		elog(ERROR, "consistency check on SPI tuple count failed");
 
+	pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 	/* Put the result into place for access by caller */
 	SPI_processed = _SPI_current->processed;
 	SPI_tuptable = _SPI_current->tuptable;
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index d6cdd16607..9ee0d72746 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "storage/proc.h"
 #include "utils/rel.h"
 
 
@@ -118,6 +119,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query);
 
+	pg_atomic_write_u64(&MyProc->queryId, query->queryId);
+
 	free_parsestate(pstate);
 
 	return query;
@@ -151,6 +154,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query);
 
+	pg_atomic_write_u64(&MyProc->queryId, query->queryId);
+
 	free_parsestate(pstate);
 
 	return query;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 0da5b19719..4693597aa1 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -284,6 +284,7 @@ InitProcGlobal(void)
 		 */
 		pg_atomic_init_u32(&(procs[i].procArrayGroupNext), INVALID_PGPROCNO);
 		pg_atomic_init_u32(&(procs[i].clogGroupNext), INVALID_PGPROCNO);
+		pg_atomic_init_u64(&(procs[i].queryId), 0);
 	}
 
 	/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f9ce3d8f22..73b92d243f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -744,6 +744,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query);
 
+	pg_atomic_write_u64(&MyProc->queryId, query->queryId);
+
 	free_parsestate(pstate);
 
 	if (log_parser_stats)
@@ -4027,6 +4029,12 @@ PostgresMain(int argc, char *argv[],
 		 */
 		debug_query_string = NULL;
 
+		/*
+		 * Also reset the queryId, as any new error encountered before a
+		 * specific query is executed isn't linked to the last saved value
+		 */
+		pg_atomic_write_u64(&MyProc->queryId, 0);
+
 		/*
 		 * Abort the current transaction in order to recover.
 		 */
@@ -4106,6 +4114,12 @@ PostgresMain(int argc, char *argv[],
 		 */
 		doing_extended_query_message = false;
 
+		/*
+		 * Also reset the queryId, so any error encountered before a specific
+		 * query is executed won't display the last saved value
+		 */
+		pg_atomic_write_u64(&MyProc->queryId, 0);
+
 		/*
 		 * Release storage left over from prior query cycle, and create a new
 		 * query input buffer in the cleared MessageContext.
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index da1d685c08..b8ba5819d2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -541,7 +541,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	26
+#define PG_STAT_GET_ACTIVITY_COLS	27
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -855,6 +855,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[18] = BoolGetDatum(false);	/* ssl */
 				nulls[19] = nulls[20] = nulls[21] = nulls[22] = nulls[23] = nulls[24] = nulls[25] = true;
 			}
+			values[26] = UInt64GetDatum(pg_atomic_read_u64(&proc->queryId));
 		}
 		else
 		{
@@ -879,6 +880,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[23] = true;
 			nulls[24] = true;
 			nulls[25] = true;
+			nulls[26] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8b4720ef3a..8e611bd239 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2594,6 +2594,20 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (MyProc != NULL)
+				{
+					if (padding != 0)
+						appendStringInfo(buf, "%*lld", padding,
+								(long long) pg_atomic_read_u64(&MyProc->queryId));
+					else
+						appendStringInfo(buf, "%lld",
+								(long long) pg_atomic_read_u64(&MyProc->queryId));
+				}
+				else if (padding != 0)
+					appendStringInfoSpaces(buf,
+										   padding > 0 ? padding : -padding);
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cccb5f145a..1c3efdff9c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -515,6 +515,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 84120de362..d796e4905d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5089,9 +5089,10 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn}',
+  proallargtypes =>
+  '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 1cee7db89d..8e3a6ae9ca 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -173,6 +173,7 @@ struct PGPROC
 	 */
 	TransactionId procArrayGroupMemberXid;
 
+	pg_atomic_uint64	queryId;	/* current queryid if any */
 	uint32		wait_event_info;	/* proc's wait information */
 
 	/* Support for group transaction status update. */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index f104dc4a62..b10c827507 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1737,9 +1737,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1872,7 +1873,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_ssl| SELECT s.pid,
@@ -1884,7 +1885,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn);
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, queryid);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
     st.pid,

jfinnert@amazon.com

almost 7 years ago

In reply to: Julien Rouhaud (#12)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

The queryId depends on oids, so it is not stable enough for some purposes.
For example, to create a SQL identifier that survives across a server
upgrade, or that can be shipped to another database, the queryId isn't
usable.

The apg_plan_mgmt extension keeps both its own stable SQL identifier as
well as the queryId, so it can be used to join to pg_stat_statements if
desired. If we were to standardize on one SQL identifier, it should be
stable enough to survive a major version upgrade or to be the same in
different databases.

-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
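
To illustrate the stability concern raised above, here is a minimal, self-contained sketch. It is not the jumbling code used by pg_stat_statements: the `toy_rel` struct, the `fnv1a` helper, and the two fingerprint functions are hypothetical names chosen for illustration. It only shows why an identifier derived from a relation's OID changes when the same table ends up with a different OID (for example on another server), while one derived from the qualified name does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a relation reference inside a parsed query. */
typedef struct toy_rel
{
	uint32_t	oid;		/* catalog OID: may differ between clusters */
	const char *qualname;	/* schema-qualified name: same everywhere */
} toy_rel;

#define FNV_OFFSET	UINT64_C(14695981039346656037)
#define FNV_PRIME	UINT64_C(1099511628211)

/* 64-bit FNV-1a over an arbitrary byte range, continuing from state h. */
static uint64_t
fnv1a(const void *data, size_t len, uint64_t h)
{
	const unsigned char *p = data;

	assert(data != NULL);
	for (size_t i = 0; i < len; i++)
	{
		h ^= p[i];
		h *= FNV_PRIME;
	}
	return h;
}

/* Fingerprint that depends on the OID, so it is only stable per cluster. */
static uint64_t
id_from_oid(const toy_rel *rel)
{
	return fnv1a(&rel->oid, sizeof(rel->oid), FNV_OFFSET);
}

/* Fingerprint that depends on the name, so it survives an OID change. */
static uint64_t
id_from_name(const toy_rel *rel)
{
	return fnv1a(rel->qualname, strlen(rel->qualname), FNV_OFFSET);
}
```

Feeding both functions two `toy_rel` values that share a name but carry different OIDs gives different results from `id_from_oid` and identical results from `id_from_name`. A name-based scheme has its own trade-offs (renames, search_path), which this sketch does not address.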

robertmhaas@gmail.com

almost 7 years ago

In reply to: Jim Finnerty (#13)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Mar 19, 2019 at 1:24 PM Jim Finnerty <jfinnert@amazon.com> wrote:

The queryId depends on oids, so it is not stable enough for some purposes.
For example, to create a SQL identifier that survives across a server
upgrade, or that can be shipped to another database, the queryId isn't
usable.

The apg_plan_mgmt extension keeps both its own stable SQL identifier as
well as the queryId, so it can be used to join to pg_stat_statements if
desired. If we were to standardize on one SQL identifier, it should be
stable enough to survive a major version upgrade or to be the same in
different databases.

If Amazon would like to open-source its (AIUI) proprietary technology
for computing query IDs and propose it for inclusion in PostgreSQL,
cool, but I think that is a separate question from whether people
would like more convenient access to the query ID technology that we
have today. I think it's 100% clear that they would like that, even
as things stand, and therefore it does not make sense to block that
behind Amazon deciding to share what it already has or somebody else
trying to reimplement it.

If we need to have a space for both a core-standard query ID and
another query ID that is available for extension use, adding one more
field to struct Query, so we can have both coreQueryId and
extensionQueryId or whatever, would be easy to do. It appears that
there's more use case than I would have guessed for custom query IDs.
On the other hand, it also appears that a lot of people would be very,
very happy to just be able to see the query ID field that already
exists, both in pg_stat_statements and in pg_stat_activity, and we
shouldn't throw up unnecessary impediments in the way of making that
happen, at least IMHO.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

legrand legrand

legrand_legrand@hotmail.com

almost 7 years ago

In reply to: Julien Rouhaud (#12)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Great,
thank you Julien !

Would it make sense to add it to auto_explain?
I don't know about EXPLAIN itself, but maybe ...

Regards
PAscal


rjuju123@gmail.com

almost 7 years ago

In reply to: legrand legrand (#15)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Mar 19, 2019 at 8:38 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:

Would it make sense to add it to auto_explain?
I don't know about EXPLAIN itself, but maybe ...

I'd think that people interested in getting the queryid in the logs
would configure the log_line_prefix to display it consistently rather
than having it in only a subset of cases, so that's probably not
really needed.
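
As a rough illustration of what a `%Q`-style escape in log_line_prefix has to do, here is a self-contained sketch of the formatting step. It assumes the escape proposed by the patch in this thread; `toy_format_query_id` is a hypothetical helper, it uses plain `snprintf` rather than the backend's StringInfo routines, and printing the value as a signed 64-bit number (to match a `bigint` column) is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a query ID the way a log-prefix escape might: right-aligned for
 * positive padding, left-aligned for negative padding, bare for zero.
 * Returns the number of characters snprintf produced (excluding the NUL).
 */
static int
toy_format_query_id(char *buf, size_t buflen, int padding, uint64_t query_id)
{
	assert(buf != NULL && buflen > 0);

	/* A negative field width passed via '*' means left-justify in C. */
	if (padding != 0)
		return snprintf(buf, buflen, "%*lld", padding, (long long) query_id);

	return snprintf(buf, buflen, "%lld", (long long) query_id);
}
```

Casting to `long long` and using `%lld` avoids relying on `long` being 64 bits wide, which does not hold on every platform.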

legrand legrand

legrand_legrand@hotmail.com

almost 7 years ago

In reply to: Robert Haas (#14)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi Jim, Robert,

As this is a distinct subject from adding QueryId to pg_stat_activity,
would it be possible to continue the discussion "new QueryId definition"
(for postgres open source software) here:

/messages/by-id/1553029215728-0.post@n3.nabble.com

Thanks in advance.
Regards
PAscal


legrand legrand

legrand_legrand@hotmail.com

almost 7 years ago

In reply to: Julien Rouhaud (#16)

RE: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Would it make sense to add it to auto_explain?
I don't know about EXPLAIN itself, but maybe ...

I'd think that people interested in getting the queryid in the logs
would configure the log_line_prefix to display it consistently rather
than having it in only a subset of cases, so that's probably not
really needed.

Ok.
Shouldn't you add this to the commitfest?

rjuju123@gmail.com

almost 7 years ago

In reply to: legrand legrand (#18)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Mar 25, 2019 at 12:36 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:

Would it make sense to add it to auto_explain?
I don't know about EXPLAIN itself, but maybe ...

I'd think that people interested in getting the queryid in the logs
would configure the log_line_prefix to display it consistently rather
than having it in only a subset of cases, so that's probably not
really needed.

Ok.
Shouldn't you add this to the commitfest?

I added it last week, see https://commitfest.postgresql.org/23/2069/

legrand legrand

legrand_legrand@hotmail.com

almost 7 years ago

In reply to: Julien Rouhaud (#19)

RE: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Shouldn't you add this to the commitfest?

I added it last week, see https://commitfest.postgresql.org/23/2069/

Oops, sorry for the noise.

rjuju123@gmail.com

over 6 years ago

In reply to: Julien Rouhaud (#12)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Mar 19, 2019 at 3:51 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Mar 18, 2019 at 7:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote:

Let's take one step back. Since queryId is stored in core, as Julien pointed out, can we just add that global to pg_stat_get_activity and ultimately expose it in the pg_stat_activity view? Then, no matter whether PGSS is on or off, or however customer extensions update that field, we expose it in the view and let users leverage that ID to join with PGSS or their own extension. Does this sound like a good idea?

I'd greatly welcome queryid exposure in pg_stat_activity, and
also in log_line_prefix. I'm afraid that it's too late for pg12
inclusion, but I'll be happy to provide a patch for that for pg13.

Here's a prototype patch for queryid exposure in pg_stat_activity and
log_line_prefix.

Patch doesn't apply anymore, PFA rebased v2.

Attachments:

queryid_exposure-v2.diff (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 84341a30e5..d68b492c25 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6353,6 +6353,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..d4e3d70933 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -824,6 +824,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      <entry><type>xid</type></entry>
      <entry>The current backend's <literal>xmin</literal> horizon.</entry>
     </row>
+    <row>
+     <entry><structfield>queryid</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed, unless
+      an error occurred, which will reset this field to 0.  By default, query
+      identifiers are not computed, so this field will always display 0, unless
+      an additional module that computes query identifiers, such as <xref
+      linkend="pgstatstatements"/>, is configured.
+     </entry>
+    </row>
     <row>
      <entry><structfield>query</structfield></entry>
      <entry><type>text</type></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..f30098c2cd 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -749,6 +749,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27f0345515..44c9525a59 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -143,6 +143,8 @@ static void EvalPlanQualStart(EPQState *epqstate, EState *parentestate,
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
@@ -303,6 +305,8 @@ ExecutorRun(QueryDesc *queryDesc,
 			ScanDirection direction, uint64 count,
 			bool execute_once)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorRun_hook)
 		(*ExecutorRun_hook) (queryDesc, direction, count, execute_once);
 	else
@@ -402,6 +406,8 @@ standard_ExecutorRun(QueryDesc *queryDesc,
 void
 ExecutorFinish(QueryDesc *queryDesc)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorFinish_hook)
 		(*ExecutorFinish_hook) (queryDesc);
 	else
@@ -462,6 +468,8 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
 void
 ExecutorEnd(QueryDesc *queryDesc)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorEnd_hook)
 		(*ExecutorEnd_hook) (queryDesc);
 	else
@@ -541,6 +549,8 @@ ExecutorRewind(QueryDesc *queryDesc)
 	/* It's probably not sensible to rescan updating queries */
 	Assert(queryDesc->operation == CMD_SELECT);
 
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	/*
 	 * Switch into per-query memory context
 	 */
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 8eedb613a1..1d8c859a88 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -24,6 +24,7 @@
 #include "executor/executor.h"
 #include "executor/spi_priv.h"
 #include "miscadmin.h"
+#include "storage/proc.h"
 #include "tcop/pquery.h"
 #include "tcop/utility.h"
 #include "utils/builtins.h"
@@ -1940,6 +1941,7 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
 	List	   *plancache_list;
 	ListCell   *list_item;
 	ErrorContextCallback spierrcontext;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 
 	/*
 	 * Setup error traceback support for ereport()
@@ -1996,6 +1998,8 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
 											   _SPI_current->queryEnv);
 		}
 
+		pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 		/* Finish filling in the CachedPlanSource */
 		CompleteCachedPlan(plansource,
 						   stmt_list,
@@ -2107,6 +2111,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 	int			res = 0;
 	bool		pushed_active_snap = false;
 	ErrorContextCallback spierrcontext;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 	CachedPlan *cplan = NULL;
 	ListCell   *lc1;
 
@@ -2196,6 +2201,8 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 												   _SPI_current->queryEnv);
 			}
 
+			pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 			/* Finish filling in the CachedPlanSource */
 			CompleteCachedPlan(plansource,
 							   stmt_list,
@@ -2366,6 +2373,8 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 				}
 			}
 
+			pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 			/*
 			 * The last canSetTag query sets the status values returned to the
 			 * caller.  Be careful to free any tuptables not returned, to
@@ -2469,6 +2478,7 @@ static int
 _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
 {
 	int			operation = queryDesc->operation;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 	int			eflags;
 	int			res;
 
@@ -2533,6 +2543,8 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
 	ExecutorEnd(queryDesc);
 	/* FreeQueryDesc is done by the caller */
 
+	pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 #ifdef SPI_EXECUTOR_STATS
 	if (ShowExecutorStats)
 		ShowUsage("SPI EXECUTOR STATS");
@@ -2580,6 +2592,7 @@ _SPI_cursor_operation(Portal portal, FetchDirection direction, long count,
 					  DestReceiver *dest)
 {
 	uint64		nfetched;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 
 	/* Check that the portal is valid */
 	if (!PortalIsValid(portal))
@@ -2614,6 +2627,8 @@ _SPI_cursor_operation(Portal portal, FetchDirection direction, long count,
 	if (dest->mydest == DestSPI && _SPI_checktuples())
 		elog(ERROR, "consistency check on SPI tuple count failed");
 
+	pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 	/* Put the result into place for access by caller */
 	SPI_processed = _SPI_current->processed;
 	SPI_tuptable = _SPI_current->tuptable;
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index b13c246183..cca506674c 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "storage/proc.h"
 #include "utils/rel.h"
 
 
@@ -118,6 +119,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query);
 
+	pg_atomic_write_u64(&MyProc->queryId, query->queryId);
+
 	free_parsestate(pstate);
 
 	return query;
@@ -151,6 +154,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query);
 
+	pg_atomic_write_u64(&MyProc->queryId, query->queryId);
+
 	free_parsestate(pstate);
 
 	return query;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 498373fd0e..b080764165 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -284,6 +284,7 @@ InitProcGlobal(void)
 		 */
 		pg_atomic_init_u32(&(procs[i].procArrayGroupNext), INVALID_PGPROCNO);
 		pg_atomic_init_u32(&(procs[i].clogGroupNext), INVALID_PGPROCNO);
+		pg_atomic_init_u64(&(procs[i].queryId), 0);
 	}
 
 	/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..43d4a852f9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -744,6 +744,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query);
 
+	pg_atomic_write_u64(&MyProc->queryId, query->queryId);
+
 	free_parsestate(pstate);
 
 	if (log_parser_stats)
@@ -4029,6 +4031,12 @@ PostgresMain(int argc, char *argv[],
 		 */
 		debug_query_string = NULL;
 
+		/*
+		 * Also reset the queryId, as any new error encountered before a
+		 * specific query is executed isn't linked to the last saved value
+		 */
+		pg_atomic_write_u64(&MyProc->queryId, 0);
+
 		/*
 		 * Abort the current transaction in order to recover.
 		 */
@@ -4108,6 +4116,12 @@ PostgresMain(int argc, char *argv[],
 		 */
 		doing_extended_query_message = false;
 
+		/*
+		 * Also reset the queryId, so any error encountered before a specific
+		 * query is executed won't display the last saved value
+		 */
+		pg_atomic_write_u64(&MyProc->queryId, 0);
+
 		/*
 		 * Release storage left over from prior query cycle, and create a new
 		 * query input buffer in the cleared MessageContext.
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..f6b0c58b4c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -546,7 +546,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	29
+#define PG_STAT_GET_ACTIVITY_COLS	30
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -875,6 +875,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			values[29] = UInt64GetDatum(pg_atomic_read_u64(&proc->queryId));
 		}
 		else
 		{
@@ -902,6 +903,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[26] = true;
 			nulls[27] = true;
 			nulls[28] = true;
+			nulls[29] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8b4720ef3a..8e611bd239 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2594,6 +2594,20 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (MyProc != NULL)
+				{
+					if (padding != 0)
+						appendStringInfo(buf, "%*lld", padding,
+								(long long) pg_atomic_read_u64(&MyProc->queryId));
+					else
+						appendStringInfo(buf, "%lld",
+								(long long) pg_atomic_read_u64(&MyProc->queryId));
+				}
+				else if (padding != 0)
+					appendStringInfoSpaces(buf,
+										   padding > 0 ? padding : -padding);
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 5ee5e09ddf..d6d31195f3 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -523,6 +523,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87335248a0..115b9c4ad0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5114,9 +5114,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 1cee7db89d..8e3a6ae9ca 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -173,6 +173,7 @@ struct PGPROC
 	 */
 	TransactionId procArrayGroupMemberXid;
 
+	pg_atomic_uint64	queryId;	/* current queryid if any */
 	uint32		wait_event_info;	/* proc's wait information */
 
 	/* Support for group transaction status update. */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd146..773ef0438b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1739,9 +1739,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1952,7 +1953,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_ssl| SELECT s.pid,
@@ -1964,7 +1965,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, queryid);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
     st.pid,
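
The spi.c hunks in the patch above follow a save-and-restore pattern: remember the query ID published for the caller, let the nested statement overwrite it while it runs, then put the caller's value back. The sketch below reproduces that pattern in isolation. It is not backend code: it uses C11 `<stdatomic.h>` instead of PostgreSQL's `pg_atomic_*` API, and `ToyProc`, `toy_my_proc`, `toy_executor_start`, and `toy_run_nested` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-backend PGPROC slot. */
typedef struct ToyProc
{
	_Atomic uint64_t queryId;	/* 0 means "no query ID" */
} ToyProc;

static ToyProc toy_my_proc;

/* Mirrors what the executor entry points do: publish the running ID. */
static void
toy_executor_start(uint64_t query_id)
{
	atomic_store(&toy_my_proc.queryId, query_id);
}

/*
 * Run a nested statement, as SPI would for a query inside a function.
 * Returns the ID an outside observer would have seen while it ran, and
 * restores the caller's ID before returning.
 */
static uint64_t
toy_run_nested(uint64_t nested_query_id)
{
	uint64_t	old_query_id = atomic_load(&toy_my_proc.queryId);
	uint64_t	seen_during;

	assert(nested_query_id != 0);	/* 0 is reserved for "no query" */

	toy_executor_start(nested_query_id);
	seen_during = atomic_load(&toy_my_proc.queryId);

	/* Put the top-level query's ID back for pg_stat_activity-style readers. */
	atomic_store(&toy_my_proc.queryId, old_query_id);
	return seen_during;
}
```

Without the restore step, a reader sampling the slot after the nested statement finished would keep seeing the nested statement's ID even though the top-level query is the one still running.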

rjuju123@gmail.com

over 6 years ago

In reply to: Julien Rouhaud (#21)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Jun 28, 2019 at 4:39 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Mar 19, 2019 at 3:51 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Mar 18, 2019 at 7:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote:

Let's take one step back. Since queryId is stored in core, as Julien pointed out, can we just add that global to pg_stat_get_activity and ultimately expose it in the pg_stat_activity view? Then, no matter whether PGSS is on or off, or however customer extensions update that field, we expose it in the view and let users leverage that ID to join with PGSS or their own extension. Does this sound like a good idea?

I'd greatly welcome queryid exposure in pg_stat_activity, and
also in log_line_prefix. I'm afraid that it's too late for pg12
inclusion, but I'll be happy to provide a patch for that for pg13.

Here's a prototype patch for queryid exposure in pg_stat_activity and
log_line_prefix.

Patch doesn't apply anymore, PFA rebased v2.

Sorry, I missed the new pg_stat_gssapi view.

Attachments:

queryid_exposure-v3.diff (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 84341a30e5..d68b492c25 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6353,6 +6353,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..d4e3d70933 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -824,6 +824,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      <entry><type>xid</type></entry>
      <entry>The current backend's <literal>xmin</literal> horizon.</entry>
     </row>
+    <row>
+     <entry><structfield>queryid</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed, unless
+      an error occurred, which will reset this field to 0.  By default, query
+      identifiers are not computed, so this field will always display 0, unless
+      an additional module that computes query identifiers, such as <xref
+      linkend="pgstatstatements"/>, is configured.
+     </entry>
+    </row>
     <row>
      <entry><structfield>query</structfield></entry>
      <entry><type>text</type></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..f30098c2cd 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -749,6 +749,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27f0345515..44c9525a59 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -143,6 +143,8 @@ static void EvalPlanQualStart(EPQState *epqstate, EState *parentestate,
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
@@ -303,6 +305,8 @@ ExecutorRun(QueryDesc *queryDesc,
 			ScanDirection direction, uint64 count,
 			bool execute_once)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorRun_hook)
 		(*ExecutorRun_hook) (queryDesc, direction, count, execute_once);
 	else
@@ -402,6 +406,8 @@ standard_ExecutorRun(QueryDesc *queryDesc,
 void
 ExecutorFinish(QueryDesc *queryDesc)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorFinish_hook)
 		(*ExecutorFinish_hook) (queryDesc);
 	else
@@ -462,6 +468,8 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
 void
 ExecutorEnd(QueryDesc *queryDesc)
 {
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	if (ExecutorEnd_hook)
 		(*ExecutorEnd_hook) (queryDesc);
 	else
@@ -541,6 +549,8 @@ ExecutorRewind(QueryDesc *queryDesc)
 	/* It's probably not sensible to rescan updating queries */
 	Assert(queryDesc->operation == CMD_SELECT);
 
+	pg_atomic_write_u64(&MyProc->queryId, queryDesc->plannedstmt->queryId);
+
 	/*
 	 * Switch into per-query memory context
 	 */
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 8eedb613a1..1d8c859a88 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -24,6 +24,7 @@
 #include "executor/executor.h"
 #include "executor/spi_priv.h"
 #include "miscadmin.h"
+#include "storage/proc.h"
 #include "tcop/pquery.h"
 #include "tcop/utility.h"
 #include "utils/builtins.h"
@@ -1940,6 +1941,7 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
 	List	   *plancache_list;
 	ListCell   *list_item;
 	ErrorContextCallback spierrcontext;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 
 	/*
 	 * Setup error traceback support for ereport()
@@ -1996,6 +1998,8 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
 											   _SPI_current->queryEnv);
 		}
 
+		pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 		/* Finish filling in the CachedPlanSource */
 		CompleteCachedPlan(plansource,
 						   stmt_list,
@@ -2107,6 +2111,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 	int			res = 0;
 	bool		pushed_active_snap = false;
 	ErrorContextCallback spierrcontext;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 	CachedPlan *cplan = NULL;
 	ListCell   *lc1;
 
@@ -2196,6 +2201,8 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 												   _SPI_current->queryEnv);
 			}
 
+			pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 			/* Finish filling in the CachedPlanSource */
 			CompleteCachedPlan(plansource,
 							   stmt_list,
@@ -2366,6 +2373,8 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 				}
 			}
 
+			pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 			/*
 			 * The last canSetTag query sets the status values returned to the
 			 * caller.  Be careful to free any tuptables not returned, to
@@ -2469,6 +2478,7 @@ static int
 _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
 {
 	int			operation = queryDesc->operation;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 	int			eflags;
 	int			res;
 
@@ -2533,6 +2543,8 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
 	ExecutorEnd(queryDesc);
 	/* FreeQueryDesc is done by the caller */
 
+	pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 #ifdef SPI_EXECUTOR_STATS
 	if (ShowExecutorStats)
 		ShowUsage("SPI EXECUTOR STATS");
@@ -2580,6 +2592,7 @@ _SPI_cursor_operation(Portal portal, FetchDirection direction, long count,
 					  DestReceiver *dest)
 {
 	uint64		nfetched;
+	uint64		old_queryId = pg_atomic_read_u64(&MyProc->queryId);
 
 	/* Check that the portal is valid */
 	if (!PortalIsValid(portal))
@@ -2614,6 +2627,8 @@ _SPI_cursor_operation(Portal portal, FetchDirection direction, long count,
 	if (dest->mydest == DestSPI && _SPI_checktuples())
 		elog(ERROR, "consistency check on SPI tuple count failed");
 
+	pg_atomic_write_u64(&MyProc->queryId, old_queryId);
+
 	/* Put the result into place for access by caller */
 	SPI_processed = _SPI_current->processed;
 	SPI_tuptable = _SPI_current->tuptable;
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index b13c246183..cca506674c 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "storage/proc.h"
 #include "utils/rel.h"
 
 
@@ -118,6 +119,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query);
 
+	pg_atomic_write_u64(&MyProc->queryId, query->queryId);
+
 	free_parsestate(pstate);
 
 	return query;
@@ -151,6 +154,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query);
 
+	pg_atomic_write_u64(&MyProc->queryId, query->queryId);
+
 	free_parsestate(pstate);
 
 	return query;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 498373fd0e..b080764165 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -284,6 +284,7 @@ InitProcGlobal(void)
 		 */
 		pg_atomic_init_u32(&(procs[i].procArrayGroupNext), INVALID_PGPROCNO);
 		pg_atomic_init_u32(&(procs[i].clogGroupNext), INVALID_PGPROCNO);
+		pg_atomic_init_u64(&(procs[i].queryId), 0);
 	}
 
 	/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..43d4a852f9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -744,6 +744,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query);
 
+	pg_atomic_write_u64(&MyProc->queryId, query->queryId);
+
 	free_parsestate(pstate);
 
 	if (log_parser_stats)
@@ -4029,6 +4031,12 @@ PostgresMain(int argc, char *argv[],
 		 */
 		debug_query_string = NULL;
 
+		/*
+		 * Also reset the queryId, as any new error encountered before a
+		 * specific query is executed isn't linked to the last saved value
+		 */
+		pg_atomic_write_u64(&MyProc->queryId, 0);
+
 		/*
 		 * Abort the current transaction in order to recover.
 		 */
@@ -4108,6 +4116,12 @@ PostgresMain(int argc, char *argv[],
 		 */
 		doing_extended_query_message = false;
 
+		/*
+		 * Also reset the queryId, so any error encountered before a specific
+		 * query is executed won't display the last saved value
+		 */
+		pg_atomic_write_u64(&MyProc->queryId, 0);
+
 		/*
 		 * Release storage left over from prior query cycle, and create a new
 		 * query input buffer in the cleared MessageContext.
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..f6b0c58b4c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -546,7 +546,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	29
+#define PG_STAT_GET_ACTIVITY_COLS	30
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -875,6 +875,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			values[29] = UInt64GetDatum(pg_atomic_read_u64(&proc->queryId));
 		}
 		else
 		{
@@ -902,6 +903,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[26] = true;
 			nulls[27] = true;
 			nulls[28] = true;
+			nulls[29] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8b4720ef3a..8e611bd239 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2594,6 +2594,20 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (MyProc != NULL)
+				{
+					if (padding != 0)
+						appendStringInfo(buf, "%*lld", padding,
+								(long long) pg_atomic_read_u64(&MyProc->queryId));
+					else
+						appendStringInfo(buf, "%lld",
+								(long long) pg_atomic_read_u64(&MyProc->queryId));
+				}
+				else if (padding != 0)
+					appendStringInfoSpaces(buf,
+										   padding > 0 ? padding : -padding);
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 5ee5e09ddf..d6d31195f3 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -523,6 +523,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87335248a0..115b9c4ad0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5114,9 +5114,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 1cee7db89d..8e3a6ae9ca 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -173,6 +173,7 @@ struct PGPROC
 	 */
 	TransactionId procArrayGroupMemberXid;
 
+	pg_atomic_uint64	queryId;	/* current queryid if any */
 	uint32		wait_event_info;	/* proc's wait information */
 
 	/* Support for group transaction status update. */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd146..0cbef52045 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1739,9 +1739,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1845,7 +1846,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, queryid);
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,
@@ -1952,7 +1953,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_ssl| SELECT s.pid,
@@ -1964,7 +1965,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, queryid);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
     st.pid,

Peter Geoghegan

pg@bowt.ie

over 6 years ago

In reply to: Robert Haas (#14)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Mar 19, 2019 at 12:00 PM Robert Haas <robertmhaas@gmail.com> wrote:

On the other hand, it also appears that a lot of people would be very,
very happy to just be able to see the query ID field that already
exists, both in pg_stat_statements and in pg_stat_activity, and we
shouldn't throw up unnecessary impediments in the way of making that
happen, at least IMHO.

+1.

pg_stat_statements will already lose all the statistics that it
aggregated in the event of a hard crash. The trade-off that the query
jumbling logic makes is not a bad one, all things considered.

--
Peter Geoghegan

Peter Geoghegan

pg@bowt.ie

over 6 years ago

In reply to: legrand legrand (#15)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Mar 19, 2019 at 12:38 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:

Would it make sense to add it in auto_explain?
I don't know about EXPLAIN itself, but maybe ...

I think that it should appear in EXPLAIN. pg_stat_statements already
cannot have a query hash of zero, so it might be okay to display it
only when its value is non-zero.

--
Peter Geoghegan

efimkin@yandex-team.ru

over 6 years ago

In reply to: Peter Geoghegan (#24)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

What is the reason for using pg_atomic_uint64?
In docs:
occured - > occurred

julien.rouhaud@free.fr

over 6 years ago

In reply to: Evgeny Efimkin (#25)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hello,

On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin <efimkin@yandex-team.ru> wrote:

What is the reason for using pg_atomic_uint64?

The queryid is read and written without holding any lock on the PGPROC
entry, so the pg_atomic_uint64 will guarantee that we get a consistent
value in pg_stat_get_activity(). Other reads shouldn't be a problem
as far as I remember.

In docs:
occured - > occurred

Thanks! I fixed it on my local branch.

andres@anarazel.de

over 6 years ago

In reply to: Julien Rouhaud (#26)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi,

On 2019-07-31 23:51:40 +0200, Julien Rouhaud wrote:

On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin <efimkin@yandex-team.ru> wrote:

What is the reason for using pg_atomic_uint64?

The queryid is read and written without holding any lock on the PGPROC
entry, so the pg_atomic_uint64 will guarantee that we get a consistent
value in pg_stat_get_activity(). Other reads shouldn't be a problem
as far as I remember.

Hm, I don't think that's necessary in this case. That's what the
st_changecount protocol is trying to ensure, no?

/*
* To avoid locking overhead, we use the following protocol: a backend
* increments st_changecount before modifying its entry, and again after
* finishing a modification. A would-be reader should note the value of
* st_changecount, copy the entry into private memory, then check
* st_changecount again. If the value hasn't changed, and if it's even,
* the copy is valid; otherwise start over. This makes updates cheap
* while reads are potentially expensive, but that's the tradeoff we want.
*
* The above protocol needs memory barriers to ensure that the apparent
* order of execution is as it desires. Otherwise, for example, the CPU
* might rearrange the code so that st_changecount is incremented twice
* before the modification on a machine with weak memory ordering. Hence,
* use the macros defined below for manipulating st_changecount, rather
* than touching it directly.
*/
int st_changecount;

And if it were necessary, why wouldn't any of the other fields in
PgBackendStatus need it? There's plenty of other fields written to
without a lock, and several of those are also 8 bytes (so it's not a
case of assuming that 8 byte reads might not be atomic, but four byte
reads are).
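
For readers unfamiliar with the pattern, the changecount protocol described in the comment above can be sketched in standalone C11 roughly as follows. This is only an illustration under stated assumptions: DemoStatus, demo_init, demo_write, demo_try_read and demo_read are hypothetical names, the payload fields are invented, and C11 atomics and fences stand in for the PostgreSQL macros the comment refers to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a status entry; not the real PgBackendStatus. */
typedef struct DemoStatus
{
	atomic_uint	changecount;		/* odd while a write is in progress */
	_Atomic uint64_t query_id;		/* payload field 1 */
	_Atomic uint64_t start_time;	/* payload field 2 */
} DemoStatus;

void
demo_init(DemoStatus *st)
{
	atomic_init(&st->changecount, 0);
	atomic_init(&st->query_id, 0);
	atomic_init(&st->start_time, 0);
}

/* Single writer: bump the counter before and after touching the payload. */
void
demo_write(DemoStatus *st, uint64_t query_id, uint64_t start_time)
{
	unsigned	before = atomic_load_explicit(&st->changecount,
											  memory_order_relaxed);

	assert((before & 1) == 0);	/* only the owner writes, never re-entrantly */

	/* Counter becomes odd: modification in progress. */
	atomic_store_explicit(&st->changecount, before + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&st->query_id, query_id, memory_order_relaxed);
	atomic_store_explicit(&st->start_time, start_time, memory_order_relaxed);

	/* Counter becomes even again: modification finished. */
	atomic_store_explicit(&st->changecount, before + 2, memory_order_release);
}

/* One read attempt; returns false if the copy may be inconsistent. */
bool
demo_try_read(DemoStatus *st, uint64_t *query_id, uint64_t *start_time)
{
	unsigned	before = atomic_load_explicit(&st->changecount,
											  memory_order_acquire);
	unsigned	after;

	*query_id = atomic_load_explicit(&st->query_id, memory_order_relaxed);
	*start_time = atomic_load_explicit(&st->start_time, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	after = atomic_load_explicit(&st->changecount, memory_order_relaxed);

	/* Valid only if the counter did not move and is even. */
	return before == after && (before & 1) == 0;
}

/* Reader loop: start over until a consistent copy is obtained. */
void
demo_read(DemoStatus *st, uint64_t *query_id, uint64_t *start_time)
{
	while (!demo_try_read(st, query_id, start_time))
		;
}
```

A reader that races with the writer simply retries, which is the "updates cheap, reads potentially expensive" trade-off the comment describes.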

Greetings,

Andres Freund

julien.rouhaud@free.fr

over 6 years ago

In reply to: Andres Freund (#27)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote:

On 2019-07-31 23:51:40 +0200, Julien Rouhaud wrote:

On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin <efimkin@yandex-team.ru> wrote:

What is the reason for using pg_atomic_uint64?

The queryid is read and written without holding any lock on the PGPROC
entry, so the pg_atomic_uint64 will guarantee that we get a consistent
value in pg_stat_get_activity(). Other reads shouldn't be a problem
as far as I remember.

Hm, I don't think that's necessary in this case. That's what the
st_changecount protocol is trying to ensure, no?

/*
* To avoid locking overhead, we use the following protocol: a backend
* increments st_changecount before modifying its entry, and again after
* finishing a modification. A would-be reader should note the value of
* st_changecount, copy the entry into private memory, then check
* st_changecount again. If the value hasn't changed, and if it's even,
* the copy is valid; otherwise start over. This makes updates cheap
* while reads are potentially expensive, but that's the tradeoff we want.
*
* The above protocol needs memory barriers to ensure that the apparent
* order of execution is as it desires. Otherwise, for example, the CPU
* might rearrange the code so that st_changecount is incremented twice
* before the modification on a machine with weak memory ordering. Hence,
* use the macros defined below for manipulating st_changecount, rather
* than touching it directly.
*/
int st_changecount;

And if it were necessary, why wouldn't any of the other fields in
PgBackendStatus need it? There's plenty of other fields written to
without a lock, and several of those are also 8 bytes (so it's not a
case of assuming that 8 byte reads might not be atomic, but four byte
reads are).

This patch is actually storing the queryid in PGPROC, not in
PgBackendStatus, thus the need for an atomic. I used PGPROC because
the value needs to be available in log_line_prefix() and spi.c, so
pgstat.c / PgBackendStatus didn't seem like the best interface in that
case. Is widening PGPROC too expensive for this purpose?

robertmhaas@gmail.com

over 6 years ago

In reply to: Julien Rouhaud (#28)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Aug 1, 2019 at 2:46 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote:

This patch is actually storing the queryid in PGPROC, not in
PgBackendStatus, thus the need for an atomic. I used PGPROC because
the value needs to be available in log_line_prefix() and spi.c, so
pgstat.c / PgBackendStatus didn't seem like the best interface in that
case. Is widening PGPROC too expensive for this purpose?

I doubt it.

However, I think that the fact that this patch adds 15 new calls to
pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good
sign. It seems like we ought to be able to centralize it better than
that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

andres@anarazel.de

over 6 years ago

In reply to: Julien Rouhaud (#28)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi,

On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote:

On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote:

And if it were necessary, why wouldn't any of the other fields in
PgBackendStatus need it? There's plenty of other fields written to
without a lock, and several of those are also 8 bytes (so it's not a
case of assuming that 8 byte reads might not be atomic, but four byte
reads are).

This patch is actually storing the queryid in PGPROC, not in
PgBackendStatus, thus the need for an atomic. I used PGPROC because
the value needs to be available in log_line_prefix() and spi.c, so
pgstat.c / PgBackendStatus didn't seem like the best interface in that
case.

Hm. I'm not convinced that really is the case? You can just access
MyBEentry, and read and update it? I mean, we do so at a frequency
roughly as high as the new queryid updates for things like
pgstat_report_activity(). Reading the value of your own backend you'd
not need to follow the changecount algorithm, I think, because it's only
updated from the current backend. If reading were a problem, you
trivially just could have a cache in a local variable, to avoid
accessing shared memory.

Is widening PGPROC too expensive for this purpose?

Well, I'm mostly not a fan of putting even more in there, because it's
pretty hard to understand already. To me, architecturally, status
information doesn't belong there (in fact, I'm somewhat unhappy that
wait_event_info etc. is in there, but that's at least commonly updated at
the same time as other fields in PGPROC).

Greetings,

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Robert Haas (#29)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2019-08-01 14:20:46 -0400, Robert Haas wrote:

However, I think that the fact that this patch adds 15 new calls to
pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good
sign. It seems like we ought to be able to centralize it better than
that.

+1

julien.rouhaud@free.fr

over 6 years ago

In reply to: Andres Freund (#30)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote:

On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote:

And if it were necessary, why wouldn't any of the other fields in
PgBackendStatus need it? There's plenty of other fields written to
without a lock, and several of those are also 8 bytes (so it's not a
case of assuming that 8 byte reads might not be atomic, but four byte
reads are).

This patch is actually storing the queryid in PGPROC, not in
PgBackendStatus, thus the need for an atomic. I used PGPROC because
the value needs to be available in log_line_prefix() and spi.c, so
pgstat.c / PgBackendStatus didn't seem like the best interface in that
case.

Hm. I'm not convinced that really is the case? You can just access
MyBEentry, and read and update it?

Sure, but it requires extra wrapper functions, and the st_changecount
dance when writing the new value.

I mean, we do so at a frequency
roughly as high as the new queryid updates for things like
pgstat_report_activity().

pgstat_report_activity() is only called for top-level statements. For
the queryid we need to track it down to all nested statements, which
could be way higher. But pgstat_progress_update_param() is called way
more than that.

Reading the value of your own backend you'd
not need to follow the changecount algorithm, I think, because it's only
updated from the current backend. If reading were a problem, you
trivially just could have a cache in a local variable, to avoid
accessing shared memory.

Yes, definitely: except for pg_stat_get_activity(), all reads are
backend local and should be totally safe to read as is.

julien.rouhaud@free.fr

over 6 years ago

In reply to: Andres Freund (#31)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-01 14:20:46 -0400, Robert Haas wrote:

However, I think that the fact that this patch adds 15 new calls to
pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good
sign. It seems like we ought to be able to centralize it better than
that.

+1

Unfortunately I didn't find a better way to do that. Since you can
have nested execution, I don't see how to avoid adding extra code in
every part of query execution.
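
To make the nesting issue concrete, the save-and-restore pattern that the patch applies in spi.c can be sketched in isolation as below. This is a simplified illustration, not the patch itself: demo_current_query_id, demo_run_nested, DemoTrace and the callbacks are hypothetical names, and a plain variable stands in for the PGPROC field. It does not model error recovery; in PostgreSQL an ERROR unwinds with a longjmp, so the restore step would be skipped, which is presumably why the patch also resets the value in PostgresMain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-backend value; 0 means "no query ID". */
static uint64_t demo_current_query_id = 0;
static int	demo_nesting_depth = 0;

typedef void (*demo_body_fn) (void *arg);

uint64_t
demo_get_current_query_id(void)
{
	return demo_current_query_id;
}

/*
 * Run one statement with its own query ID, then put back whatever the
 * caller had.  Every entry point that can start a nested statement needs
 * this bracket, which is where the many call sites come from.
 */
void
demo_run_nested(uint64_t query_id, demo_body_fn body, void *arg)
{
	uint64_t	saved = demo_current_query_id;

	demo_current_query_id = query_id;
	demo_nesting_depth++;

	if (body != NULL)
		body(arg);

	demo_nesting_depth--;
	assert(demo_nesting_depth >= 0);
	demo_current_query_id = saved;
}

/* Example body: record which ID is visible while the statement runs. */
void
demo_record_visible_id(void *arg)
{
	*(uint64_t *) arg = demo_current_query_id;
}

/* What each level observed, for the example below. */
typedef struct DemoTrace
{
	uint64_t	outer_seen;
	uint64_t	inner_seen;
	uint64_t	outer_after;
} DemoTrace;

/* Example outer statement that runs a nested one, as a function would. */
void
demo_outer_body(void *arg)
{
	DemoTrace  *trace = arg;

	trace->outer_seen = demo_current_query_id;
	demo_run_nested(222, demo_record_visible_id, &trace->inner_seen);
	trace->outer_after = demo_current_query_id;
}
```

Calling demo_run_nested(111, demo_outer_body, &trace) leaves the outer statement seeing 111 before and after the nested call, while the nested statement sees 222.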

andres@anarazel.de

over 6 years ago

In reply to: Julien Rouhaud (#32)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi,

On 2019-08-01 22:42:23 +0200, Julien Rouhaud wrote:

On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote:

On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote:

And if it were necessary, why wouldn't any of the other fields in
PgBackendStatus need it? There's plenty of other fields written to
without a lock, and several of those are also 8 bytes (so it's not a
case of assuming that 8 byte reads might not be atomic, but four byte
reads are).

This patch is actually storing the queryid in PGPROC, not in
PgBackendStatus, thus the need for an atomic. I used PGPROC because
the value needs to be available in log_line_prefix() and spi.c, so
pgstat.c / PgBackendStatus didn't seem like the best interface in that
case.

Hm. I'm not convinced that really is the case? You can just access
MyBEentry, and read and update it?

Sure, but it requires extra wrapper functions, and the st_changecount
dance when writing the new value.

So? You need a wrapper function anyway, there's no way we're going to
add all those separate pg_atomic_write* calls directly.

I mean, we do so at a frequency
roughly as high as the new queryid updates for things like
pgstat_report_activity().

pgstat_report_activity() is only called for top-level statements. For
the queryid we need to track it down to all nested statements, which
could be way higher.

Compared to the overhead of executing a separate query, the cost of a
single function call containing a MyBEentry update of an 8-byte value
seems almost guaranteed to be immeasurable. The executor startup alone
is several orders of magnitude more expensive.

I also think this proposed column should probably respect
the track_activities GUC.

Greetings,

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Julien Rouhaud (#33)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi,

On 2019-08-01 22:49:48 +0200, Julien Rouhaud wrote:

On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-01 14:20:46 -0400, Robert Haas wrote:

However, I think that the fact that this patch adds 15 new calls to
pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good
sign. It seems like we ought to be able to centralize it better than
that.

+1

Unfortunately I didn't find a better way to do that. Since you can
have nested execution, I don't see how to avoid adding extra code in
every part of query execution.

At least my +1 is not primarily about the number of sites that need to
handle queryid changes, but that they all need to know about the way the
queryid is stored. Including how atomicity etc is handled. That
knowledge should be in one or two places, not more. In a file where that
knowledge makes sense.
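
As a rough illustration of what keeping that knowledge in one place could look like, here is a minimal standalone sketch. All names (DemoSlot, demo_attach_slot, demo_report_query_id, demo_get_my_query_id, demo_read_slot_query_id, demo_track_activities) are hypothetical and are not taken from the patch or from PostgreSQL. A C11 atomic stands in for whatever storage and consistency scheme is finally chosen, a boolean stands in for a switch such as track_activities, and a backend-local copy serves the backend's own reads, as suggested earlier in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-backend slot that other processes would read. */
typedef struct DemoSlot
{
	_Atomic uint64_t query_id;
} DemoSlot;

static DemoSlot *demo_my_slot = NULL;		/* set once at backend start */
static uint64_t demo_cached_query_id = 0;	/* backend-local copy */

/* Stand-in for a switch like track_activities; the name is illustrative. */
bool		demo_track_activities = true;

void
demo_attach_slot(DemoSlot *slot)
{
	assert(slot != NULL);
	atomic_init(&slot->query_id, 0);
	demo_my_slot = slot;
}

/*
 * The only writer: callers in the parser or executor pass a value and
 * never learn how or where it is stored.
 */
void
demo_report_query_id(uint64_t query_id)
{
	demo_cached_query_id = query_id;

	/* Publish 0 rather than a stale value when tracking is disabled. */
	if (demo_my_slot != NULL)
		atomic_store(&demo_my_slot->query_id,
					 demo_track_activities ? query_id : 0);
}

/* Own-backend read, e.g. for a log prefix: no shared memory access. */
uint64_t
demo_get_my_query_id(void)
{
	return demo_cached_query_id;
}

/* Cross-backend read, e.g. for an activity view. */
uint64_t
demo_read_slot_query_id(DemoSlot *slot)
{
	return atomic_load(&slot->query_id);
}
```

With this shape, changing the storage location or the consistency protocol later would only touch these few functions rather than every call site.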

I'm *also* concerned about the number of places, as that makes it likely
that some have been missed/new ones will be introduced without the
queryid handling. But that wasn't what I was referring to above.

I'm actually quite unconvinced that it's sensible to update the global
value for nested queries. That'll mean e.g. the log_line_prefix and
pg_stat_activity values are most of the time going to be bogus while
nested, because the querystring that's associated with those will *not*
be the value that the queryid corresponds to. elog.c uses
debug_query_string to log the statement, which is only updated for
top-level queries (outside of some exceptions like parallel workers for
parallel queries in a function or stuff like that). And pg_stat_activity
is also only updated for top level queries.

Greetings,

Andres Freund

julien.rouhaud@free.fr

over 6 years ago

In reply to: Andres Freund (#34)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Aug 1, 2019 at 10:52 PM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-01 22:42:23 +0200, Julien Rouhaud wrote:

Sure, but it requires extra wrapper functions, and the st_changecount
dance when writing the new value.

So? You need a wrapper function anyway, there's no way we're going to
add all those separate pg_atomic_write* calls directly.

Ok

I also think this proposed column should probably respect
the track_activities GUC.

Oh indeed, I'll fix that once I'm sure of the semantics to implement.

julien.rouhaud@free.fr

over 6 years ago

In reply to: Andres Freund (#35)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Aug 1, 2019 at 11:05 PM Andres Freund <andres@anarazel.de> wrote:

I'm actually quite unconvinced that it's sensible to update the global
value for nested queries. That'll mean e.g. the log_line_prefix and
pg_stat_activity values are most of the time going to be bogus while
nested, because the querystring that's associated with those will *not*
be the value that the queryid corresponds to. elog.c uses
debug_query_string to log the statement, which is only updated for
top-level queries (outside of some exceptions like parallel workers for
parallel queries in a function or stuff like that). And pg_stat_activity
is also only updated for top level queries.

Having the nested queryid seems indeed quite broken for
log_line_prefix. However having the nested queryid in
pg_stat_activity would be convenient to track what a long-running stored
function is currently doing. Maybe we could expose something like
top_level_queryid and current_queryid instead?

andres@anarazel.de

over 6 years ago

In reply to: Julien Rouhaud (#37)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi,

On 2019-08-02 10:54:35 +0200, Julien Rouhaud wrote:

On Thu, Aug 1, 2019 at 11:05 PM Andres Freund <andres@anarazel.de> wrote:

I'm actually quite unconvinced that it's sensible to update the global
value for nested queries. That'll mean e.g. the log_line_prefix and
pg_stat_activity values are most of the time going to be bogus while
nested, because the querystring that's associated with those will *not*
be the value that the queryid corresponds to. elog.c uses
debug_query_string to log the statement, which is only updated for
top-level queries (outside of some exceptions like parallel workers for
parallel queries in a function or stuff like that). And pg_stat_activity
is also only updated for top level queries.

Having the nested queryid seems indeed quite broken for
log_line_prefix. However, having the nested queryid in
pg_stat_activity would be convenient to track what a long-running stored
function is currently doing. Maybe we could expose something like
top_level_queryid and current_queryid instead?

Given that the query string is the toplevel one, I think that'd just be
confusing. And given the fact that it adds *substantial* additional
complexity, I'd just rip the subcommand bits out.

Greetings,

Andres Freund

julien.rouhaud@free.fr

over 6 years ago

In reply to: Andres Freund (#38)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi,

On Sat, Aug 3, 2019 at 1:21 AM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-02 10:54:35 +0200, Julien Rouhaud wrote:

However, having the nested queryid in
pg_stat_activity would be convenient to track what a long-running stored
function is currently doing. Maybe we could expose something like
top_level_queryid and current_queryid instead?

Given that the query string is the toplevel one, I think that'd just be
confusing. And given the fact that it adds *substantial* additional
complexity, I'd just rip the subcommand bits out.

Ok, so here's a version that exposes only the top-level queryid.
There can still be discrepancies with the query field if a
multi-command string is provided. The queryid will be updated each
time a new top-level statement is executed.

As the queryid cannot be immediately known, and may never exist at all
if a query fails to parse, here are the heuristics I used to update the
stored queryid:

- it's reset to 0 each time pgstat_report_activity(STATE_RUNNING) is
called. This way, we're sure that we don't display last query's
queryid in the logs if the next query fails to parse
- it's also reset to 0 at the beginning of exec_simple_query() loop on
the parsetree_list (for multi-command string case)
- pg_analyze_and_rewrite() and pg_analyze_and_rewrite_params() will
report the new queryid after parse analysis.
- a non-zero queryid will only be updated if the stored one is zero
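
The heuristics above can be sketched outside the backend as a tiny state machine. This is only an illustration, not the patch itself: the `sim_*` names and the file-level `st_queryid` variable are hypothetical stand-ins for the per-backend entry, and the changecount protocol and the track_activities check are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the st_queryid field of the backend entry. */
static uint64_t st_queryid = 0;

/* Mirrors pgstat_report_activity(STATE_RUNNING): forget the previous id. */
static void
sim_report_activity_running(void)
{
    st_queryid = 0;
}

/*
 * Mirrors the check in pgstat_report_queryid(): zero always resets, and a
 * non-zero value is stored only if no identifier is stored yet.
 */
static void
sim_report_queryid(uint64_t queryid)
{
    if (queryid != 0 && st_queryid != 0)
        return;
    st_queryid = queryid;
}

static uint64_t
sim_get_queryid(void)
{
    return st_queryid;
}

/* A top-level statement followed by a nested one keeps the top-level id. */
static uint64_t
sim_nested_example(void)
{
    sim_report_activity_running();
    assert(sim_get_queryid() == 0);
    sim_report_queryid(42);     /* top-level statement after parse analysis */
    sim_report_queryid(7);      /* nested statement: ignored */
    return sim_get_queryid();
}
```

With these rules the nested identifier never overwrites the top-level one, and an explicit report of 0 is the only way to clear it before the next statement.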

This should also work as intended for background workers using SPI,
provided that they correctly call pgstat_report_activity. I also
modified ExecInitParallelPlan() to publish the queryId in the
serialized plannedStmt, so ParallelQueryMain() can report it to make
the queryid available in the parallel workers too.

Note that this patch makes it clear that a zero queryid means no
queryid computed (and NULL will be displayed in such case in
pg_stat_activity). pg_stat_statements already makes sure that it
cannot compute a zero queryid.

It also assumes that any extension computing a queryid will do that in
the post_parse_analyze_hook, which seems like a sane requirement. We
may want to have a dedicated hook for that instead, if more people
become interested in having only the queryid, possibly with different
implementations, once it becomes available outside pgss.

Attachments:

queryid_exposure_v4.diff (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c91e3e1550..c7ca1bf9a8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6353,6 +6353,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -6736,8 +6741,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..7f287c7a7e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -824,6 +824,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      <entry><type>xid</type></entry>
      <entry>The current backend's <literal>xmin</literal> horizon.</entry>
     </row>
+    <row>
+     <entry><structfield>queryid</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null, unless an additional module that computes query identifiers, such
+      as <xref linkend="pgstatstatements"/>, is configured.
+     </entry>
+    </row>
     <row>
      <entry><structfield>query</structfield></entry>
      <entry><type>text</type></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..f30098c2cd 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -749,6 +749,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 53cd2fc666..9ba6d3f2e6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -121,7 +121,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -140,7 +140,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -171,7 +171,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -560,7 +560,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -601,7 +602,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1355,8 +1356,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 69d5a1f239..b57b502022 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 6ef128e2ab..25b2494ab7 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b4f2b28b51..f6b0089730 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3148,6 +3148,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3178,6 +3179,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3188,6 +3197,47 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend call pgstat_report_activity(STATE_RUNNING), or with
+	 * an explicit call to this function.  If the saved query identifier is not
+	 * zero it means that it's not a top-level command, so ignore the one
+	 * provided unless it's an explicit call to reset the identifier.
+	 */
+	if (queryId != 0 && beentry->st_queryid != 0)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -4787,6 +4837,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a6505c7335..f88a8e74bd 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -695,6 +695,8 @@ pg_analyze_and_rewrite(RawStmt *parsetree, const char *query_string,
 	query = parse_analyze(parsetree, query_string, paramTypes, numParams,
 						  queryEnv);
 
+	pgstat_report_queryid(query->queryId);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -746,6 +748,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -1077,6 +1081,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..a4b09ba3f1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -546,7 +546,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	29
+#define PG_STAT_GET_ACTIVITY_COLS	30
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -875,6 +875,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[29] = true;
+			else
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -902,6 +906,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[26] = true;
 			nulls[27] = true;
 			nulls[28] = true;
+			nulls[29] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8b4720ef3a..762f58ae1a 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -69,10 +69,10 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2594,6 +2594,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*ld", padding,
+							pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%ld",
+							pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cfad86c02a..bcb8881e93 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -523,6 +523,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b88e886f7d..bf6971a5da 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5142,9 +5142,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 46fcf89992..e619aa467e 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -38,7 +38,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a188..6dd6e8441d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1097,6 +1097,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1276,6 +1279,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1285,6 +1289,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
 extern const char *pgstat_get_backend_desc(BackendType backendType);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd146..0cbef52045 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1739,9 +1739,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1845,7 +1846,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, queryid);
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,
@@ -1952,7 +1953,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_ssl| SELECT s.pid,
@@ -1964,7 +1965,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, queryid);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
     st.pid,

legrand legrand

legrand_legrand@hotmail.com

over 6 years ago

In reply to: Julien Rouhaud (#37)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

However, having the nested queryid in
pg_stat_activity would be convenient to track
what a long-running stored function is currently doing.

+1

And this would make it possible to get wait event sampling per queryid when
pg_stat_statements.track = all

Regards
PAscal


Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 6 years ago

In reply to: legrand legrand (#40)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand <legrand_legrand@hotmail.com> wrote in <1564902241482-0.post@n3.nabble.com>

However, having the nested queryid in
pg_stat_activity would be convenient to track
what a long-running stored function is currently doing.

+1

And this would make it possible to get wait event sampling per queryid when
pg_stat_statements.track = all

I'm strongly on this side emotionally, but I'm also on Tom and
Andres's side that exposing the queryid that way is not the right
thing.

Doing that means we don't need exact correspondence between
top-level query and queryId (in nested or multistatement queries)
in this patch. pg_stat_statements will allow us to do the same
thing by having an additional uint64[MaxBackends] array in
pgssSharedState, instead of expanding the PgBackendStatus array in
core by the same size.
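
As a rough illustration of that alternative, an extension could keep one slot per backend in its own shared state. The sketch below is standalone C rather than backend code: `SketchSharedState`, `SKETCH_MAX_BACKENDS` and the two accessors are hypothetical names, the fixed array size stands in for MaxBackends, and no locking or atomic access is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed size; in the backend this would be MaxBackends. */
#define SKETCH_MAX_BACKENDS 8

/* Hypothetical extension-private shared state with one slot per backend. */
typedef struct SketchSharedState
{
    uint64_t    queryids[SKETCH_MAX_BACKENDS];
} SketchSharedState;

/* Record the identifier of the query a given backend is running. */
static void
sketch_set_queryid(SketchSharedState *state, int backend_idx, uint64_t queryid)
{
    assert(backend_idx >= 0 && backend_idx < SKETCH_MAX_BACKENDS);
    state->queryids[backend_idx] = queryid;
}

/* Read back the identifier; 0 means none has been recorded. */
static uint64_t
sketch_get_queryid(const SketchSharedState *state, int backend_idx)
{
    assert(backend_idx >= 0 && backend_idx < SKETCH_MAX_BACKENDS);
    return state->queryids[backend_idx];
}
```

The memory cost is the same as adding a uint64 to every PgBackendStatus entry; the difference is only in who owns the array.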

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

rjuju123@gmail.com

over 6 years ago

In reply to: Kyotaro Horiguchi (#41)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Aug 5, 2019 at 9:28 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand <legrand_legrand@hotmail.com> wrote in <1564902241482-0.post@n3.nabble.com>

However, having the nested queryid in
pg_stat_activity would be convenient to track
what a long-running stored function is currently doing.

+1

And this would make it possible to get wait event sampling per queryid when
pg_stat_statements.track = all

I'm strongly on this side emotionally, but I'm also on Tom and
Andres's side that exposing the queryid that way is not the right
thing.

Doing that means we don't need exact correspondence between
top-level query and queryId (in nested or multistatement queries)
in this patch. pg_stat_statements will allow us to do the same
thing by having an additional uint64[MaxBackends] array in
pgssSharedState, instead of expanding the PgBackendStatus array in
core by the same size.

Sure, but the problem with this approach is that all extensions that
compute their own queryid would have to do the same. I hope that we
can come up with an approach friendlier for those extensions.

legrand legrand

legrand_legrand@hotmail.com

over 6 years ago

In reply to: Kyotaro Horiguchi (#41)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Kyotaro Horiguchi-4 wrote

At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand <legrand_legrand@hotmail.com> wrote in <1564902241482-0.post@n3.nabble.com>

However, having the nested queryid in
pg_stat_activity would be convenient to track
what a long-running stored function is currently doing.

+1

And this would make it possible to get wait event sampling per queryid when
pg_stat_statements.track = all

I'm strongly on this side emotionally, but I'm also on Tom and
Andres's side that exposing the queryid that way is not the right
thing.

Doing that means we don't need exact correspondence between
top-level query and queryId (in nested or multistatement queries)
in this patch. pg_stat_statements will allow us to do the same
thing by having an additional uint64[MaxBackends] array in
pgssSharedState, instead of expanding the PgBackendStatus array in
core by the same size.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Hi Kyotaro,
Thank you for this answer.
What you propose here is already available
in the pg_stat_sql_plans extension (a derivative of
pg_stat_statements and pg_store_plans),
and I’m used to this queryid behavior with top-level
queries...
My emotions were running high, but I will accept it!
Regards
PAscal


efimkin@yandex-team.ru

over 6 years ago

In reply to: legrand legrand (#43)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hi!
The patch looks good to me.

The new status of this patch is: Ready for Committer

Michael Paquier

michael@paquier.xyz

over 6 years ago

In reply to: Evgeny Efimkin (#44)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Aug 07, 2019 at 09:03:21AM +0000, Evgeny Efimkin wrote:

The new status of this patch is: Ready for Committer

I may be wrong of course, but it looks like this is wanted and the
current shape of the patch looks sensible:
- Register the query ID using a backend entry.
- Only consider the top-level query.

An invalid query ID is assumed to be 0 in the patch, per the way it is
defined in pg_stat_statements. However, this also maps to the case
where we have a utility statement.

+    * We only report the top-level query identifiers.  The stored queryid is
+    * reset when a backend call pgstat_report_activity(STATE_RUNNING), or with
s/call/calls/

+   /*
+    * We only report the top-level query identifiers.  The stored queryid is
+    * reset when a backend call pgstat_report_activity(STATE_RUNNING), or with
+    * an explicit call to this function.  If the saved query identifier is not
+    * zero it means that it's not a top-level command, so ignore the one
+    * provided unless it's an explicit call to reset the identifier.
+    */
+   if (queryId != 0 && beentry->st_queryid != 0)
+       return;
Hmm.  I am wondering if we shouldn't have an API dedicated to the
reset of the query ID.  That logic looks rather brittle..

Wouldn't it be better (and more consistent) to update the query ID in
parse_analyze_varparams() and parse_analyze() as well after going
through the post_parse_analyze hook instead of pg_analyze_and_rewrite?

+   /*
+    * If a new query is started, we reset the query identifier as it'll only
+    * be known after parse analysis, to avoid reporting last query's
+    * identifier.
+    */
+   if (state == STATE_RUNNING)
+       beentry->st_queryid = 0;
I don't quite get why you don't reset the counter in other cases as
well.  If the backend entry is idle in transaction or in an idle
state, it seems to me that we should not report the query ID of the
last query run in the transaction.  And that would make the reset in
exec_simple_query() unnecessary, no?
--
Michael

julien.rouhaud@free.fr

over 6 years ago

In reply to: Michael Paquier (#45)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Thanks for looking at it!

On Wed, Sep 11, 2019 at 6:45 AM Michael Paquier <michael@paquier.xyz> wrote:

An invalid query ID is assumed to be 0 in the patch, per the way it is
defined in pg_stat_statements. However this also maps with the case
where we have a utility statement.

Oh indeed. Which means that if a utility statement later calls
parse_analyze or friends, this patch would report an unexpected
queryid. That's at least possible for something like

COPY (SELECT * FROM tbl) TO ...

The thing is that pg_stat_statements assigns a 0 queryid in the
post_parse_analyze_hook to recognize utility statements and avoid
tracking instrumentation twice in case of utility statements, and then
computes a queryid based on a hash of the query text. Maybe we could
instead fully reserve queryid "2" for utility statements (so forcing
queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
of only 0), and use "2" as the identifier for utility statement
instead of "0"?
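
To make the proposal concrete, here is a minimal sketch of the remapping, assuming the jumble hash is already available as a 64-bit value. The names MOCK_QUERYID_* and mock_queryid_for_standard_query are hypothetical and exist only for this illustration; they are not part of pg_stat_statements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for this sketch only. */
#define MOCK_QUERYID_INVALID   UINT64_C(0)	/* no query ID available */
#define MOCK_QUERYID_FALLBACK  UINT64_C(1)	/* standard query whose hash collided */
#define MOCK_QUERYID_UTILITY   UINT64_C(2)	/* reserved for utility statements */

/*
 * Map a jumble hash to the query ID of a standard (non-utility) query.
 * Hashes that collide with a reserved value are forced to the fallback,
 * so a standard query can never be reported as invalid or as utility.
 */
static uint64_t
mock_queryid_for_standard_query(uint64_t jumble_hash)
{
	uint64_t	id = jumble_hash;

	if (id == MOCK_QUERYID_INVALID || id == MOCK_QUERYID_UTILITY)
		id = MOCK_QUERYID_FALLBACK;

	assert(id != MOCK_QUERYID_INVALID && id != MOCK_QUERYID_UTILITY);
	return id;
}
```

With this mapping every utility statement would share the single identifier 2, so the value only says "utility statement" and cannot tell two utility statements apart.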

+   /*
+    * We only report the top-level query identifiers.  The stored queryid is
+    * reset when a backend call pgstat_report_activity(STATE_RUNNING), or with
+    * an explicit call to this function.  If the saved query identifier is not
+    * zero it means that it's not a top-level command, so ignore the one
+    * provided unless it's an explicit call to reset the identifier.
+    */
+   if (queryId != 0 && beentry->st_queryid != 0)
+       return;
Hmm.  I am wondering if we shouldn't have an API dedicated to the
reset of the query ID.  That logic looks rather brittle..

How about adding a "bool force" parameter to allow resetting the queryid to 0?

Wouldn't it be better (and more consistent) to update the query ID in
parse_analyze_varparams() and parse_analyze() as well after going
through the post_parse_analyze hook instead of pg_analyze_and_rewrite?

I thought about it without knowing what would be best. I'll change to
report the queryid right after calling post_parse_analyze_hook then.

+   /*
+    * If a new query is started, we reset the query identifier as it'll only
+    * be known after parse analysis, to avoid reporting last query's
+    * identifier.
+    */
+   if (state == STATE_RUNNING)
+       beentry->st_queryid = 0;
I don't quite get why you don't reset the counter in other cases as
well.  If the backend entry is idle in transaction or in an idle
state, it seems to me that we should not report the query ID of the
last query run in the transaction.  And that would make the reset in
exec_simple_query() unnecessary, no?

I'm reproducing the same behavior as for the query text, ie. showing
the information about the last executed query text if state is idle:

+     <entry><structfield>queryid</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of last query that was executed.

I think that showing the last executed query's queryid is as useful as
the query text. Also, even if that avoided the reset in
exec_simple_query(), a reset would still be required in case of an error
during query execution, so that wouldn't make things much simpler.

Michael Paquier

michael@paquier.xyz

about 6 years ago

In reply to: Julien Rouhaud (#46)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote:

The thing is that pg_stat_statements assigns a 0 queryid in the
post_parse_analyze_hook to recognize utility statements and avoid
tracking instrumentation twice in case of utility statements, and then
computes a queryid based on a hash of the query text. Maybe we could
instead fully reserve queryid "2" for utility statements (so forcing
queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
of only 0), and use "2" as the identifier for utility statement
instead of "0"?

Hmm. Not sure. At this stage it would be nice to gather more input
on the matter, and FWIW, I don't much like the assumption that a query
ID of 0 is perhaps a utility statement, or perhaps nothing depending
on the state of a backend entry, or even perhaps something else
depending on how modules make use of and define such query IDs.
--
Michael

bruce@momjian.us

about 6 years ago

In reply to: Michael Paquier (#47)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote:

On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote:

The thing is that pg_stat_statements assigns a 0 queryid in the
post_parse_analyze_hook to recognize utility statements and avoid
tracking instrumentation twice in case of utility statements, and then
computes a queryid based on a hash of the query text. Maybe we could
instead fully reserve queryid "2" for utility statements (so forcing
queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
of only 0), and use "2" as the identifier for utility statement
instead of "0"?

Hmm. Not sure. At this stage it would be nice to gather more input
on the matter, and FWIW, I don't much like the assumption that a query
ID of 0 is perhaps a utility statement, or perhaps nothing depending
on the state of a backend entry, or even perhaps something else
depending on how modules make use of and define such query IDs.

I thought each extension would export a function to compute the query
id, and you would call that function with the pg_stat_activity.query
string.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

julien.rouhaud@free.fr

about 6 years ago

In reply to: Bruce Momjian (#48)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Nov 13, 2019 at 4:15 AM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote:

On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote:

The thing is that pg_stat_statements assigns a 0 queryid in the
post_parse_analyze_hook to recognize utility statements and avoid
tracking instrumentation twice in case of utility statements, and then
computes a queryid based on a hash of the query text. Maybe we could
instead fully reserve queryid "2" for utility statements (so forcing
queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
of only 0), and use "2" as the identifier for utility statement
instead of "0"?

Hmm. Not sure. At this stage it would be nice to gather more input
on the matter, and FWIW, I don't much like the assumption that a query
ID of 0 is perhaps a utility statement, or perhaps nothing depending
on the state of a backend entry, or even perhaps something else
depending on how modules make use of and define such query IDs.

I thought each extension would export a function to compute the query
id, and you would call that function with the pg_stat_activity.query
string.

I'd really like to have the queryid function available through SQL,
but I think that this specific case wouldn't work very well for
pg_stat_statements' approach as it's working with oid. The query
string in pg_stat_activity is the user provided one rather than a
fully-qualified version, so in order to get that query's queryid, you
need to know the exact search_path in use in that backend, and that's
not something available.

Michael Paquier

michael@paquier.xyz

about 6 years ago

In reply to: Julien Rouhaud (#49)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:

I'd really like to have the queryid function available through SQL,
but I think that this specific case wouldn't work very well for
pg_stat_statements' approach as it's working with oid. The query
string in pg_stat_activity is the user provided one rather than a
fully-qualified version, so in order to get that query's queryid, you
need to know the exact search_path in use in that backend, and that's
not something available.

Yeah.. So, we have a patch marked as ready for committer here, and it
seems to me that we have a couple of issues to discuss further first,
particularly this query ID of 0. Again, do others have any more
input to offer?
--
Michael

Michael Paquier

michael@paquier.xyz

about 6 years ago

In reply to: Michael Paquier (#50)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:

On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:

I'd really like to have the queryid function available through SQL,
but I think that this specific case wouldn't work very well for
pg_stat_statements' approach as it's working with oid. The query
string in pg_stat_activity is the user provided one rather than a
fully-qualified version, so in order to get that query's queryid, you
need to know the exact search_path in use in that backend, and that's
not something available.

Yeah.. So, we have a patch marked as ready for committer here, and it
seems to me that we have a couple of issues to discuss further first,
particularly this query ID of 0. Again, do others have any more
input to offer?

And while on it, the latest patch does not apply, so a rebase is
needed here.
--
Michael

julien.rouhaud@free.fr

about 6 years ago

In reply to: Michael Paquier (#51)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:

On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:

I'd really like to have the queryid function available through SQL,
but I think that this specific case wouldn't work very well for
pg_stat_statements' approach as it's working with oid. The query
string in pg_stat_activity is the user provided one rather than a
fully-qualified version, so in order to get that query's queryid, you
need to know the exact search_path in use in that backend, and that's
not something available.

Yeah.. So, we have a patch marked as ready for committer here, and it
seems to me that we have a couple of issues to discuss further first,
particularly this query ID of 0. Again, do others have any more
input to offer?

I just realized that with current infrastructure it's not possible to
display a utility queryid. We need to recognize utility to not
process the counters twice (once in processUtility, once in the
underlying executor), so we don't provide a queryid for utility
statements in parse analysis. Current magic value 0 has the side
effect of showing an invalid queryid for all utility statements, and
using a magic value different from 0 will just always display that
magic value. We could instead add another field in the Query and
PlannedStmt structs, say "int queryid_flags", that extensions could
use for their needs?

And while on it, the latest patch does not apply, so a rebase is
needed here.

Yep, I noticed that this morning. I already rebased the patch
locally, I'll send a new version with new modifications when we reach
an agreement on the utility issue.

tomas.vondra@2ndquadrant.com

almost 6 years ago

In reply to: Julien Rouhaud (#52)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote:

On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:

On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:

I'd really like to have the queryid function available through SQL,
but I think that this specific case wouldn't work very well for
pg_stat_statements' approach as it's working with oid. The query
string in pg_stat_activity is the user provided one rather than a
fully-qualified version, so in order to get that query's queryid, you
need to know the exact search_path in use in that backend, and that's
not something available.

Yeah.. So, we have a patch marked as ready for committer here, and it
seems to me that we have a couple of issues to discuss further first,
particularly this query ID of 0. Again, do others have any more
input to offer?

I just realized that with current infrastructure it's not possible to
display a utility queryid. We need to recognize utility to not
process the counters twice (once in processUtility, once in the
underlying executor), so we don't provide a queryid for utility
statements in parse analysis. Current magic value 0 has the side
effect of showing an invalid queryid for all utility statements, and
using a magic value different from 0 will just always display that
magic value. We could instead add another field in the Query and
PlannedStmt structs, say "int queryid_flags", that extensions could
use for their needs?

And while on it, the latest patch does not apply, so a rebase is
needed here.

Yep, I noticed that this morning. I already rebased the patch
locally, I'll send a new version with new modifications when we reach
an agreement on the utility issue.

Well, this patch was in WoA since November, but now that I look at it
that might have been wrong - we're clearly waiting for agreement on how
to handle queryid for utility commands. I suspect the WoA status might
have been driving people away from this thread :-(

I've switched the patch to "needs review" and moved it to the next CF.
What I think needs to happen is we get a patch implementing one of the
proposed solutions, and discuss that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

julien.rouhaud@free.fr

almost 6 years ago

In reply to: Tomas Vondra (#53)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Sat, Feb 1, 2020 at 12:30 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote:

On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:

On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:

I'd really like to have the queryid function available through SQL,
but I think that this specific case wouldn't work very well for
pg_stat_statements' approach as it's working with oid. The query
string in pg_stat_activity is the user provided one rather than a
fully-qualified version, so in order to get that query's queryid, you
need to know the exact search_path in use in that backend, and that's
not something available.

Yeah.. So, we have a patch marked as ready for committer here, and it
seems to me that we have a couple of issues to discuss further first,
particularly this query ID of 0. Again, do others have any more
input to offer?

I just realized that with current infrastructure it's not possible to
display a utility queryid. We need to recognize utility to not
process the counters twice (once in processUtility, once in the
underlying executor), so we don't provide a queryid for utility
statements in parse analysis. Current magic value 0 has the side
effect of showing an invalid queryid for all utility statements, and
using a magic value different from 0 will just always display that
magic value. We could instead add another field in the Query and
PlannedStmt structs, say "int queryid_flags", that extensions could
use for their needs?

And while on it, the latest patch does not apply, so a rebase is
needed here.

Yep, I noticed that this morning. I already rebased the patch
locally, I'll send a new version with new modifications when we reach
an agreement on the utility issue.

Well, this patch was in WoA since November, but now that I look at it
that might have been wrong - we're clearly waiting for agreement on how
to handle queryid for utility commands. I suspect the WoA status might
have been driving people away from this thread :-(

Oh, indeed.

I've switched the patch to "needs review" and moved it to the next CF.

Thanks

What I think needs to happen is we get a patch implementing one of the
proposed solutions, and discuss that.

There's also the possibility to reserve 1 bit of the hash to know if
this is a utility command or not, although I don't recall right now
all the possible issues with utility commands and some special
handling of them. I'll work on it before the next commitfest.
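
For reference, a minimal sketch of the "reserve 1 bit" idea, assuming a 64-bit hash and picking the top bit as the marker. The choice of bit and the names MOCK_UTILITY_BIT, mock_tag_queryid and mock_queryid_is_utility are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the top bit of the 64-bit ID marks a utility statement. */
#define MOCK_UTILITY_BIT	(UINT64_C(1) << 63)

/* Return true if the given query ID carries the utility marker. */
static bool
mock_queryid_is_utility(uint64_t queryId)
{
	return (queryId & MOCK_UTILITY_BIT) != 0;
}

/*
 * Build a query ID from a raw hash, keeping the low 63 bits of the hash
 * and using the reserved bit to record whether this is a utility command.
 */
static uint64_t
mock_tag_queryid(uint64_t hash, bool is_utility)
{
	uint64_t	id = hash & ~MOCK_UTILITY_BIT;

	if (is_utility)
		id |= MOCK_UTILITY_BIT;

	assert(mock_queryid_is_utility(id) == is_utility);
	return id;
}
```

This costs one bit of hash space, and a non-utility hash whose low 63 bits are all zero would still come out as 0, so the "invalid" value needs the same special-casing as today.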

robertmhaas@gmail.com

almost 6 years ago

In reply to: Julien Rouhaud (#54)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote:

There's also the possibility to reserve 1 bit of the hash to know if
this is a utility command or not, although I don't recall right now
all the possible issues with utility commands and some special
handling of them. I'll work on it before the next commitfest.

FWIW, I don't really see why it would be bad to have 0 mean that
"there's no query ID for some reason" without caring whether that's
because the current statement is a utility statement or because
there's no statement in progress at all or whatever else. The user
probably doesn't need our help to distinguish between "no statement"
and "utility statement", right?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

rjuju123@gmail.com

almost 6 years ago

In reply to: Robert Haas (#55)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Feb 06, 2020 at 02:59:09PM -0500, Robert Haas wrote:

On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote:

There's also the possibility to reserve 1 bit of the hash to know if
this is a utility command or not, although I don't recall right now
all the possible issues with utility commands and some special
handling of them. I'll work on it before the next commitfest.

FWIW, I don't really see why it would be bad to have 0 mean that
"there's no query ID for some reason" without caring whether that's
because the current statement is a utility statement or because
there's no statement in progress at all or whatever else. The user
probably doesn't need our help to distinguish between "no statement"
and "utility statement", right?

Sure, but if we don't fix that it means that we also won't expose any queryid
for utility statement, even if pg_stat_statements is configured to track those
(with a very poor queryid handling, but still).

While looking at this again, I realized that pg_stat_statements doesn't compute
a queryid during the post parse analysis hook just to make sure that no query
identifier will be set during executorStart and the rest of executor functions.

AFAICT, that can't happen anyway since pg_plan_queries() will discard any
computed queryid for utility statements. This seems to be an oversight due to
original pg_stat_statements implementation, so I fixed this.

Then, as processUtility is called between parse analysis and executor, I think
that we can simply work around this by computing utility statements query
identifier during parse analysis, removing it in pgss_ProcessUtility and
keeping a copy of it for the pgss_store calls in that function, as done in the
attached v5.

This fixes everything except EXECUTE statements, which have to get the
underlying query's queryid. The problem is that EXECUTE won't get through
parse analysis, so while it's correctly handled for execution and pgss_store,
it's not being exposed in pg_stat_activity and log_line_prefix. To fix it, I
added an extra call to pgstat_report_queryid in ExecutorStart. As this
function is a no-op if a queryid is already exposed, this shouldn't cause any
harm, and it should fix any other cases of query execution that don't go
through parse analysis.

Finally, DEALLOCATE is entirely ignored by pg_stat_statements, so those
statements will always be reported with a NULL/0 queryid, but this is
consistent as it's also not present in pg_stat_statements() SRF.

Attachments:

queryid_exposure-v5.diff (text/plain; charset=us-ascii)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6f82a671ee..2da810ade6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, char *completionTag);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   double total_time, uint64 rows,
@@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in our
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevent
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for prepared
+	 * statement related utility, as those will inherit from the underlying
+	 * statement's one (except DEALLOCATE which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+			&& pstate->p_sourcetext)
+		{
+			const char *querytext = pstate->p_sourcetext;
+			int query_location = query->stmt_location;
+			int query_len = query->stmt_len;
+
+			/*
+			 * Confine our attention to the relevant part of the string, if the
+			 * query is a portion of a multi-statement source string.
+			 */
+			querytext = pgss_clean_querytext(pstate->p_sourcetext,
+											 &query_location,
+											 &query_len);
+
+			query->queryId = pgss_compute_utility_queryid(querytext, query_len);
+		}
+		else
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -963,6 +990,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, char *completionTag)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Utility statements get queryId zero.  We do this even in cases where
+	 * the statement contains an optimizable statement for which a queryId
+	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
+	 * runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely possibility
+	 * that user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled() && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -979,9 +1023,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled() &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1047,7 +1089,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   INSTR_TIME_GET_MILLISEC(duration),
@@ -1069,22 +1111,76 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 }
 
 /*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+static const char *
+pgss_clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
  */
 static uint64
-pgss_hash_string(const char *str, int len)
+pgss_compute_utility_queryid(const char *str, int query_len)
 {
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero(invalid), use
+	 * queryID as 2 instead, queryID 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
 }
 
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1111,50 +1207,15 @@ pgss_store(const char *query, uint64 queryId,
 	/*
 	 * Confine our attention to the relevant part of the string, if the query
 	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	query = pgss_clean_querytext(query, &query_location, &query_len);
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * For not already handled utility statements, we just hash the query
+	 * string to get an ID.
 	 */
 	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+		queryId = pgss_compute_utility_queryid(query, query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c1128f89ec..52faea72ce 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6552,6 +6552,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -6960,8 +6965,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 08353cb343..b9ebf3539d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -845,6 +845,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      <entry><type>xid</type></entry>
      <entry>The current backend's <literal>xmin</literal> horizon.</entry>
     </row>
+    <row>
+     <entry><structfield>queryid</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null, unless an additional module that computes query identifiers, such
+      as <xref linkend="pgstatstatements"/>, is configured.
+     </entry>
+    </row>
     <row>
      <entry><structfield>query</structfield></entry>
      <entry><type>text</type></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f681aafcf9..b953932b03 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -757,6 +757,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ee5c3a60ff..fd7346919b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..c5c02a1d2f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -121,7 +121,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -140,7 +140,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -171,7 +171,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -562,7 +562,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -603,7 +604,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1357,8 +1358,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 6b8ed867d5..c57e197020 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 317ddb4ae2..b2040dca8e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 748bebffc1..712d48d5bb 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -43,6 +43,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/rel.h"
 
@@ -120,6 +121,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -153,6 +156,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7169509a79..bcd119f160 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3141,6 +3141,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3171,6 +3172,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3181,6 +3190,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -4754,6 +4805,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0a6f80963b..4ad39c5845 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -742,6 +742,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -958,6 +960,7 @@ pg_plan_queries(List *querytrees, int cursorOptions, ParamListInfo boundParams)
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1073,6 +1076,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 7e6a3c1774..9f7dd372ef 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -547,7 +547,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -888,6 +888,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -916,6 +920,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index f5b0211f66..678c43fd45 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,10 +72,10 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2702,6 +2702,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e1048c0047..63491299f9 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -532,6 +532,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 226c904c04..508998fac8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5175,9 +5175,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid, queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..63bb80c00c 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -38,7 +38,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index aecb6013f0..9a11aec4ee 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1098,6 +1098,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1277,6 +1280,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1286,6 +1290,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
 extern const char *pgstat_get_backend_desc(BackendType backendType);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 634f8256f7..0a92797777 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1746,9 +1746,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1852,7 +1853,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -1985,7 +1986,7 @@ pg_stat_replication| SELECT s.pid,
     w.spill_txns,
     w.spill_count,
     w.spill_bytes
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time, spill_txns, spill_count, spill_bytes) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_ssl| SELECT s.pid,
@@ -1997,7 +1998,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,

julien.rouhaud@free.fr

almost 6 years ago

In reply to: Julien Rouhaud (#56)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Feb 7, 2020 at 11:12 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Thu, Feb 06, 2020 at 02:59:09PM -0500, Robert Haas wrote:

On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote:

There's also the possibility to reserve 1 bit of the hash to know if
this is a utility command or not, although I don't recall right now
all the possible issues with utility commands and some special
handling of them. I'll work on it before the next commitfest.

FWIW, I don't really see why it would be bad to have 0 mean that
"there's no query ID for some reason" without caring whether that's
because the current statement is a utility statement or because
there's no statement in progress at all or whatever else. The user
probably doesn't need our help to distinguish between "no statement"
and "utility statement", right?

Sure, but if we don't fix that, it means that we also won't expose any queryid
for utility statements, even if pg_stat_statements is configured to track those
(with very poor queryid handling, but still).

While looking at this again, I realized that pg_stat_statements doesn't compute
a queryid for utility statements during the post-parse-analysis hook, just to
make sure that no query identifier will be set during ExecutorStart and the
rest of the executor functions.

AFAICT, that can't happen anyway, since pg_plan_queries() will discard any
computed queryid for utility statements. This seems to be an oversight dating
back to the original pg_stat_statements implementation, so I fixed this.

Then, as ProcessUtility is called between parse analysis and the executor, I
think that we can simply work around this by computing the utility statement's
query identifier during parse analysis, resetting it in pgss_ProcessUtility and
keeping a copy of it for the pgss_store calls in that function, as done in the
attached v5.
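
The save-and-reset step described above can be modelled in isolation. This is only a minimal sketch, not the patch itself: SketchStmt, sketch_store() and sketch_process_utility() are hypothetical stand-ins for PlannedStmt, pgss_store() and pgss_ProcessUtility(), and the track_utility argument stands in for the "pgss_enabled() && pgss_track_utility" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PlannedStmt; only the fields used here. */
typedef struct SketchStmt
{
	uint64_t	queryId;
	bool		isUtility;
} SketchStmt;

/* Hypothetical stand-in for the entry that pgss_store() would update. */
static uint64_t last_stored_queryid = 0;

static void
sketch_store(uint64_t queryId)
{
	last_stored_queryid = queryId;
}

/*
 * Model of the utility hook: remember the identifier computed during parse
 * analysis, zero it on the statement so that executor-level hooks ignore any
 * nested optimizable statement, then account for the utility statement with
 * the saved copy.
 */
static void
sketch_process_utility(SketchStmt *stmt, bool track_utility)
{
	uint64_t	saved_queryId = stmt->queryId;

	assert(stmt->isUtility);

	if (track_utility)
		stmt->queryId = 0;

	/* ... the wrapped utility command would run here ... */

	if (track_utility)
		sketch_store(saved_queryId);
}
```

With a statement whose queryId was set to 42 during parse analysis, calling sketch_process_utility(&stmt, true) leaves stmt.queryId at 0 for the nested executor run while 42 is what gets stored.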

This fixes everything except EXECUTE statements, which have to get the
underlying query's queryid. The problem is that EXECUTE won't go through
parse analysis, so while it's correctly handled for execution and pgss_store,
it's not exposed in pg_stat_activity and log_line_prefix. To fix it, I
added an extra call to pgstat_report_queryid in ExecutorStart. As this
function is a no-op if a queryid is already exposed, this shouldn't cause any
harm, and it should also fix any other case of query execution that doesn't go
through parse analysis.
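
To illustrate why the extra call is harmless, here is a minimal standalone model of the "first report wins unless forced" rule. It is a sketch under assumptions, not the backend code: sketch_queryid is a hypothetical stand-in for the st_queryid field, there is no changecount protocol or track_activities check, and sketch_execute_case() is an invented helper that replays the EXECUTE scenario.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the st_queryid field of the backend entry. */
static uint64_t sketch_queryid = 0;

/*
 * Only the first non-forced report after a reset is kept, so that nested
 * statements cannot overwrite the top-level identifier.
 */
static void
sketch_report_queryid(uint64_t queryId, bool force)
{
	if (sketch_queryid != 0 && !force)
		return;

	sketch_queryid = queryId;
}

/*
 * Replay the EXECUTE case: nothing is reported at parse analysis, so the
 * report made at executor start is the one that sticks.
 */
static uint64_t
sketch_execute_case(uint64_t prepared_queryid)
{
	assert(prepared_queryid != 0);

	sketch_report_queryid(0, true);						/* new statement: reset */
	sketch_report_queryid(prepared_queryid, false);		/* executor start */
	sketch_report_queryid(prepared_queryid + 1, false);	/* nested query: ignored */

	return sketch_queryid;
}
```

In this model, sketch_execute_case(1234) returns 1234: the nested report is dropped, and only a forced call such as sketch_report_queryid(0, true) clears the value again.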

Finally, DEALLOCATE is entirely ignored by pg_stat_statements, so those
statements will always be reported with a NULL/0 queryid, but this is
consistent, as they are also absent from the pg_stat_statements() SRF.

cfbot reports a failure since 2f9661311b (command completion tag
change), so here's a rebased v6, no change otherwise.

Attachments:

queryid_exposure-v6.diff (text/x-patch; charset=US-ASCII)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..2b3aa79cb6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * handle, i.e. everything except EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   double total_time, uint64 rows,
@@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in our
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevent
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for prepared
+	 * statement related utility commands, as those will inherit the underlying
+	 * statement's one (except DEALLOCATE which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+			&& pstate->p_sourcetext)
+		{
+			const char *querytext = pstate->p_sourcetext;
+			int query_location = query->stmt_location;
+			int query_len = query->stmt_len;
+
+			/*
+			 * Confine our attention to the relevant part of the string, if the
+			 * query is a portion of a multi-statement source string.
+			 */
+			querytext = pgss_clean_querytext(pstate->p_sourcetext,
+											 &query_location,
+											 &query_len);
+
+			query->queryId = pgss_compute_utility_queryid(querytext, query_len);
+		}
+		else
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -963,6 +990,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Utility statements get queryId zero.  We do this even in cases where
+	 * the statement contains an optimizable statement for which a queryId
+	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
+	 * runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely possibility
+	 * that the user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled() && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -979,9 +1023,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled() &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1042,7 +1084,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   INSTR_TIME_GET_MILLISEC(duration),
@@ -1064,22 +1106,76 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 }
 
 /*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+static const char *
+pgss_clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
  */
 static uint64
-pgss_hash_string(const char *str, int len)
+pgss_compute_utility_queryid(const char *str, int query_len)
 {
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero(invalid), use
+	 * queryID as 2 instead, queryID 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
 }
 
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1106,50 +1202,15 @@ pgss_store(const char *query, uint64 queryId,
 	/*
 	 * Confine our attention to the relevant part of the string, if the query
 	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	query = pgss_clean_querytext(query, &query_location, &query_len);
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * For utility statements not already handled, we just hash the query
+	 * string to get an ID.
 	 */
 	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+		queryId = pgss_compute_utility_queryid(query, query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c1128f89ec..52faea72ce 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6552,6 +6552,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -6960,8 +6965,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 987580d6df..bf7a81ed6e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -853,6 +853,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      <entry><type>xid</type></entry>
      <entry>The current backend's <literal>xmin</literal> horizon.</entry>
     </row>
+    <row>
+     <entry><structfield>queryid</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null, unless an additional module that computes query identifiers, such
+      as <xref linkend="pgstatstatements"/>, is configured.
+     </entry>
+    </row>
     <row>
      <entry><structfield>query</structfield></entry>
      <entry><type>text</type></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b8a3f46912..eb217fd713 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -757,6 +757,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 28130fbc2b..13d9947025 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..c5c02a1d2f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -121,7 +121,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -140,7 +140,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -171,7 +171,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -562,7 +562,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -603,7 +604,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1357,8 +1358,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 6b8ed867d5..c57e197020 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 317ddb4ae2..b2040dca8e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842..11fead8422 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -43,6 +43,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/rel.h"
 
@@ -120,6 +121,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -153,6 +156,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 462b4d7e06..130fd56f70 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3144,6 +3144,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3174,6 +3175,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3184,6 +3193,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -4757,6 +4808,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 9dba3b0566..32b2b0d1c0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -742,6 +742,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -958,6 +960,7 @@ pg_plan_queries(List *querytrees, int cursorOptions, ParamListInfo boundParams)
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1073,6 +1076,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 54d2673254..8dffac22bb 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -549,7 +549,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -890,6 +890,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -918,6 +922,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index f5b0211f66..678c43fd45 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,10 +72,10 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2702,6 +2702,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*ld", padding,
+							pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%ld",
+							pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e58e4788a8..8d139c2b28 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -532,6 +532,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 07a86c7b7b..9e6da3b1ec 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5178,9 +5178,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid, queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..63bb80c00c 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -38,7 +38,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 7bc36c6583..3df2967214 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1099,6 +1099,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1278,6 +1281,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1287,6 +1291,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
 extern const char *pgstat_get_backend_desc(BackendType backendType);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c7304611c3..17e369993e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1746,9 +1746,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1852,7 +1853,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2000,7 +2001,7 @@ pg_stat_replication| SELECT s.pid,
     w.spill_txns,
     w.spill_count,
     w.spill_bytes
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time, spill_txns, spill_count, spill_bytes) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_ssl| SELECT s.pid,
@@ -2012,7 +2013,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,

rjuju123@gmail.com

almost 6 years ago

In reply to: Peter Geoghegan (#24)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Jun 28, 2019 at 11:49:53AM -0700, Peter Geoghegan wrote:

On Tue, Mar 19, 2019 at 12:38 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:

Would it make sense to add it in auto explain ?
I don't know for explain itself, but maybe ...

I think that it should appear in EXPLAIN. pg_stat_statements already
cannot have a query hash of zero, so it might be okay to display it
only when its value is non-zero.

I had forgotten about this. After looking at it, I can see a few issues.

For now post_parse_analyze_hook isn't called for the underlying statement, so
we don't have the queryid. And we can't compute the queryid for the underlying
query in the initial post_parse_analyze_hook call, as we don't want the executor
to have a queryid set in that case, to avoid accumulating counters for both the
explain and the query.
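
To illustrate the double-counting concern, here is a toy model, not the
extension's real code. The struct and function names are hypothetical; the only
behavior borrowed from pg_stat_statements is that the executor-level hook skips
statements whose queryId is zero, while the utility-level hook counts the
EXPLAIN itself.

```c
#include <stdint.h>

/* Hypothetical call counters standing in for pg_stat_statements entries. */
typedef struct ToyCounters
{
	int			utility_calls;	/* recorded by the utility-level hook */
	int			executor_calls; /* recorded by the executor-level hook */
} ToyCounters;

/* Executor-level hook: only tracks statements carrying a nonzero queryId. */
static void
toy_executor_end(ToyCounters *c, uint64_t query_id)
{
	if (query_id != 0)
		c->executor_calls++;
}

/*
 * Utility-level hook for "EXPLAIN ANALYZE <query>": the underlying query runs
 * through the executor with whatever queryId it was given, then the utility
 * statement itself is counted once.
 */
static void
toy_process_explain(ToyCounters *c, uint64_t underlying_query_id)
{
	toy_executor_end(c, underlying_query_id);
	c->utility_calls++;
}
```

With a zero queryId on the underlying query only the utility counter moves;
with a nonzero one, both counters move for a single EXPLAIN.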

We could add an extra call in ExplainQuery, but this will be ignored by
pg_stat_statements unless you set pg_stat_statements.track to all. Also,
pgss_post_parse_analyze will try to record an entry with the normalized query
text if none exists yet and if any constants were removed. The problem is
that, as I already mentioned in [1], the underlying query doesn't have
query_location or query_len set, so the recorded query text will at least
contain the explain part of the input query.

[1]: /messages/by-id/CAOBaU_Y-y+VOhTZgDOuDk6-9V72-ZXdWccXo_kx0P4DDBEEh9A@mail.gmail.com
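
The query text problem can be sketched with a simplified, standalone
adaptation of the pgss_clean_querytext() helper from the patch. The function
name below is hypothetical, and libc's isspace() replaces scanner_isspace()
only to keep the sketch self-contained. When the location is unknown (-1), the
whole source string is kept, including the leading EXPLAIN.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical stand-in for pgss_clean_querytext(): narrow a source string
 * to one statement, given its offset and length.
 */
static const char *
clean_querytext(const char *query, int *location, int *len)
{
	int			query_location = *location;
	int			query_len = *len;

	if (query_location >= 0)
	{
		assert(query_location <= (int) strlen(query));
		query += query_location;
		/* A length of 0 (or -1) means "rest of string" */
		if (query_len <= 0)
			query_len = (int) strlen(query);
	}
	else
	{
		/* Unknown location: distrust the length too and keep everything */
		query_location = 0;
		query_len = (int) strlen(query);
	}

	/* Trim leading and trailing whitespace */
	while (query_len > 0 && isspace((unsigned char) query[0]))
		query++, query_location++, query_len--;
	while (query_len > 0 && isspace((unsigned char) query[query_len - 1]))
		query_len--;

	*location = query_location;
	*len = query_len;
	return query;
}
```

For the input "EXPLAIN SELECT 1", passing location 8 and length 8 yields the
"SELECT 1" portion, while passing -1 for both returns the full 16-character
string starting at "EXPLAIN".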

rjuju123@gmail.com

almost 6 years ago

In reply to: Julien Rouhaud (#57)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Mar 03, 2020 at 04:24:59PM +0100, Julien Rouhaud wrote:

cfbot reports a failure since 2f9661311b (command completion tag
change), so here's a rebased v6, no change otherwise.

Conflict with 8e8a0becb3 (Unify several ways to tracking backend type), thanks
again to cfbot, rebased v7 attached.

Attachments:

v7-0001-Expose-queryid-in-pg_stat_activity-and-log_line_p.patch (text/x-diff; charset=us-ascii)

From dda1ab659a44c9a6375ee051111d249baa2ec552 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v7 1/2] Expose queryid in pg_stat_activity and log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 179 ++++++++++++------
 doc/src/sgml/config.sgml                      |   9 +-
 doc/src/sgml/monitoring.sgml                  |  12 ++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 +
 src/backend/executor/execParallel.c           |  14 +-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 +++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/test/regress/expected/rules.out           |   9 +-
 18 files changed, 267 insertions(+), 79 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..2b3aa79cb6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * handle, i.e. everything except EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   double total_time, uint64 rows,
@@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in our
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevent
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for utility
+	 * statements related to prepared statements, as those will inherit the
+	 * underlying statement's one (except DEALLOCATE, which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+			&& pstate->p_sourcetext)
+		{
+			const char *querytext = pstate->p_sourcetext;
+			int query_location = query->stmt_location;
+			int query_len = query->stmt_len;
+
+			/*
+			 * Confine our attention to the relevant part of the string, if the
+			 * query is a portion of a multi-statement source string.
+			 */
+			querytext = pgss_clean_querytext(pstate->p_sourcetext,
+											 &query_location,
+											 &query_len);
+
+			query->queryId = pgss_compute_utility_queryid(querytext, query_len);
+		}
+		else
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -963,6 +990,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Utility statements get queryId zero.  We do this even in cases where
+	 * the statement contains an optimizable statement for which a queryId
+	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
+	 * runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, to cover the unlikely case
+	 * where the user configured another extension to handle utility
+	 * statements only.
+	 */
+	if (pgss_enabled() && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -979,9 +1023,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled() &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1042,7 +1084,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   INSTR_TIME_GET_MILLISEC(duration),
@@ -1064,22 +1106,76 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 }
 
 /*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+static const char *
+pgss_clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
  */
 static uint64
-pgss_hash_string(const char *str, int len)
+pgss_compute_utility_queryid(const char *str, int query_len)
 {
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero(invalid), use
+	 * queryID as 2 instead, queryID 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
 }
 
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1106,50 +1202,15 @@ pgss_store(const char *query, uint64 queryId,
 	/*
 	 * Confine our attention to the relevant part of the string, if the query
 	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	query = pgss_clean_querytext(query, &query_location, &query_len);
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * For not already handled utility statements, we just hash the query
+	 * string to get an ID.
 	 */
 	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+		queryId = pgss_compute_utility_queryid(query, query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 371d7838fb..f71ed56f57 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6539,6 +6539,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -6947,8 +6952,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 987580d6df..bf7a81ed6e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -853,6 +853,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      <entry><type>xid</type></entry>
      <entry>The current backend's <literal>xmin</literal> horizon.</entry>
     </row>
+    <row>
+     <entry><structfield>queryid</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null, unless an additional module that computes query identifiers, such
+      as <xref linkend="pgstatstatements"/>, is configured.
+     </entry>
+    </row>
     <row>
      <entry><structfield>query</structfield></entry>
      <entry><type>text</type></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b8a3f46912..eb217fd713 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -757,6 +757,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 28130fbc2b..13d9947025 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..c5c02a1d2f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -121,7 +121,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -140,7 +140,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -171,7 +171,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -562,7 +562,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -603,7 +604,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1357,8 +1358,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 6b8ed867d5..c57e197020 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 317ddb4ae2..b2040dca8e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842..11fead8422 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -43,6 +43,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/rel.h"
 
@@ -120,6 +121,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -153,6 +156,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f9287b7942..e0776ddf1a 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3089,6 +3089,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3119,6 +3120,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3129,6 +3138,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -4658,6 +4709,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 00c77b66c7..e4dd24cdc3 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -742,6 +742,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -958,6 +960,7 @@ pg_plan_queries(List *querytrees, int cursorOptions, ParamListInfo boundParams)
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1073,6 +1076,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cea01534a5..0a93d34f47 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -549,7 +549,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -890,6 +890,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -918,6 +922,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index f8ae94729c..14e601ef26 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,10 +72,10 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2702,6 +2702,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e58e4788a8..8d139c2b28 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -532,6 +532,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7fb574f9dc..d6c643cfc8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5178,9 +5178,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..63bb80c00c 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -38,7 +38,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1a19921f80..534affe80a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1081,6 +1081,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1260,6 +1263,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1268,6 +1272,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c7304611c3..17e369993e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1746,9 +1746,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1852,7 +1853,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2000,7 +2001,7 @@ pg_stat_replication| SELECT s.pid,
     w.spill_txns,
     w.spill_count,
     w.spill_bytes
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time, spill_txns, spill_count, spill_bytes) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_ssl| SELECT s.pid,
@@ -2012,7 +2013,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.25.1
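
The top-level-only rule that pgstat_report_queryid() applies in the patch above can be sketched in isolation. This is a minimal model, not the patch itself: the `sketch_*` names and the plain static variable standing in for `PgBackendStatus.st_queryid` are assumptions made for illustration, and the changecount protocol (PGSTAT_BEGIN_WRITE_ACTIVITY / PGSTAT_END_WRITE_ACTIVITY) and the track_activities check are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the st_queryid field of PgBackendStatus. */
static uint64_t sketch_st_queryid = 0;

/*
 * Models the STATE_RUNNING branch added to pgstat_report_activity():
 * a new top-level query clears the previously reported identifier.
 */
void
sketch_report_activity_running(void)
{
	sketch_st_queryid = 0;
}

/*
 * Models pgstat_report_queryid(): once a nonzero identifier is stored,
 * later reports (nested statements) are ignored unless force is set.
 */
void
sketch_report_queryid(uint64_t query_id, bool force)
{
	if (sketch_st_queryid != 0 && !force)
		return;

	sketch_st_queryid = query_id;
}

/* Models pgstat_get_my_queryid(). */
uint64_t
sketch_get_my_queryid(void)
{
	return sketch_st_queryid;
}
```

With this model, reporting 42 and then 99 without force leaves 42 stored, which is why calling the function again from ExecutorStart is harmless. A forced report of 0, as done at the top of the per-statement loop in exec_simple_query, lets the next statement of a multi-statement string install its own identifier.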

rjuju123@gmail.com

almost 6 years ago

In reply to: Julien Rouhaud (#59)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Sat, Mar 14, 2020 at 06:53:51PM +0100, Julien Rouhaud wrote:

On Tue, Mar 03, 2020 at 04:24:59PM +0100, Julien Rouhaud wrote:

cfbot reports a failure since 2f9661311b (command completion tag
change), so here's a rebased v6, no change otherwise.

Conflict with 8e8a0becb3 (Unify several ways to tracking backend type), thanks
again to cfbot, rebased v7 attached.

Bis repetita.

Attachments:

v8-0001-Expose-queryid-in-pg_stat_activity-and-log_line_p.patch (text/x-diff; charset=us-ascii)

From 87be2c545e32c0c08a410949d5c5d383a4162af3 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v8 1/2] Expose queryid in pg_stat_activity and log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of the top
level statement is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 179 ++++++++++++------
 doc/src/sgml/config.sgml                      |   9 +-
 doc/src/sgml/monitoring.sgml                  |  12 ++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 +
 src/backend/executor/execParallel.c           |  14 +-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 +++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/test/regress/expected/rules.out           |   9 +-
 18 files changed, 267 insertions(+), 79 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..2b3aa79cb6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * handle, that is everything except EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   double total_time, uint64 rows,
@@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in our
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevent
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for prepared
+	 * statement related utility, as those will inherit from the underlying
+	 * statement's one (except DEALLOCATE which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+			&& pstate->p_sourcetext)
+		{
+			const char *querytext = pstate->p_sourcetext;
+			int query_location = query->stmt_location;
+			int query_len = query->stmt_len;
+
+			/*
+			 * Confine our attention to the relevant part of the string, if the
+			 * query is a portion of a multi-statement source string.
+			 */
+			querytext = pgss_clean_querytext(pstate->p_sourcetext,
+											 &query_location,
+											 &query_len);
+
+			query->queryId = pgss_compute_utility_queryid(querytext, query_len);
+		}
+		else
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -963,6 +990,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Utility statements get queryId zero.  We do this even in cases where
+	 * the statement contains an optimizable statement for which a queryId
+	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
+	 * runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, to cover the unlikely case
+	 * where the user configured another extension to handle utility
+	 * statements only.
+	 */
+	if (pgss_enabled() && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -979,9 +1023,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled() &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1042,7 +1084,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   INSTR_TIME_GET_MILLISEC(duration),
@@ -1064,22 +1106,76 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 }
 
 /*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+static const char *
+pgss_clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
  */
 static uint64
-pgss_hash_string(const char *str, int len)
+pgss_compute_utility_queryid(const char *str, int query_len)
 {
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero(invalid), use
+	 * queryID as 2 instead, queryID 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
 }
 
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1106,50 +1202,15 @@ pgss_store(const char *query, uint64 queryId,
 	/*
 	 * Confine our attention to the relevant part of the string, if the query
 	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	query = pgss_clean_querytext(query, &query_location, &query_len);
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * For utility statements not already handled, we just hash the query
+	 * string to get an ID.
 	 */
 	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+		queryId = pgss_compute_utility_queryid(query, query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 672bf6f1ee..578a3270b5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6568,6 +6568,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -6985,8 +6990,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 987580d6df..bf7a81ed6e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -853,6 +853,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      <entry><type>xid</type></entry>
      <entry>The current backend's <literal>xmin</literal> horizon.</entry>
     </row>
+    <row>
+     <entry><structfield>queryid</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null, unless an additional module that computes query identifiers, such
+      as <xref linkend="pgstatstatements"/>, is configured.
+     </entry>
+    </row>
     <row>
      <entry><structfield>query</structfield></entry>
      <entry><type>text</type></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b8a3f46912..eb217fd713 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -757,6 +757,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 28130fbc2b..13d9947025 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..c5c02a1d2f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -121,7 +121,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -140,7 +140,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -171,7 +171,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -562,7 +562,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -603,7 +604,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1357,8 +1358,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 6b8ed867d5..c57e197020 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 317ddb4ae2..b2040dca8e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842..11fead8422 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -43,6 +43,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/rel.h"
 
@@ -120,6 +121,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -153,6 +156,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f9287b7942..e0776ddf1a 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3089,6 +3089,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3119,6 +3120,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3129,6 +3138,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -4658,6 +4709,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 00c77b66c7..e4dd24cdc3 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -742,6 +742,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -958,6 +960,7 @@ pg_plan_queries(List *querytrees, int cursorOptions, ParamListInfo boundParams)
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1073,6 +1076,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cea01534a5..0a93d34f47 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -549,7 +549,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -890,6 +890,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -918,6 +922,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 62eef7b71f..632340dd0a 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,11 +72,11 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2720,6 +2720,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index aa44f0c9bf..edc00650de 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -533,6 +533,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7fb574f9dc..d6c643cfc8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5178,9 +5178,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid, queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..63bb80c00c 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -38,7 +38,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1a19921f80..534affe80a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1081,6 +1081,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1260,6 +1263,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1268,6 +1272,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c7304611c3..17e369993e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1746,9 +1746,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1852,7 +1853,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2000,7 +2001,7 @@ pg_stat_replication| SELECT s.pid,
     w.spill_txns,
     w.spill_count,
     w.spill_bytes
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time, spill_txns, spill_count, spill_bytes) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_ssl| SELECT s.pid,
@@ -2012,7 +2013,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.25.1

rjuju123@gmail.com

almost 6 years ago

In reply to: Julien Rouhaud (#60)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

New conflict, rebased v9 attached.

Attachments:

v9-0001-Expose-queryid-in-pg_stat_activity-and-log_line_p.patch (text/x-diff; charset=us-ascii)

From 26b98194d8add282158c65f6ac46c721ba80f498 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v9 1/2] Expose queryid in pg_stat_activity and log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in log_line_prefix, which
likewise only exposes the queryid of top-level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 179 ++++++++++++------
 doc/src/sgml/config.sgml                      |   9 +-
 doc/src/sgml/monitoring.sgml                  |  12 ++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 +
 src/backend/executor/execParallel.c           |  14 +-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 +++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/test/regress/expected/rules.out           |   9 +-
 18 files changed, 267 insertions(+), 79 deletions(-)
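
For reviewers who want the top-level-only reporting rule in isolation, here is
a minimal standalone sketch of it. It is not part of the patch: the sketch_*
names and the plain static variable standing in for st_queryid are assumptions
made for illustration, and the real code additionally checks MyBEEntry and
track_activities and writes under the st_changecount protocol.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PgBackendStatus.st_queryid (0 means "none"). */
static uint64_t sketch_queryid = 0;

/*
 * Same guard as pgstat_report_queryid() in this patch: once a nonzero
 * identifier is stored, later reports (nested statements) are ignored
 * unless the caller forces the update.
 */
static void
sketch_report_queryid(uint64_t queryId, bool force)
{
	if (sketch_queryid != 0 && !force)
		return;

	sketch_queryid = queryId;
}

/* Same reset as pgstat_report_activity(STATE_RUNNING) does in this patch. */
static void
sketch_start_query(void)
{
	sketch_queryid = 0;
}

/* Counterpart of pgstat_get_my_queryid(). */
static uint64_t
sketch_get_queryid(void)
{
	return sketch_queryid;
}
```

Calling sketch_start_query(), then reporting 42 and later 7 without force
leaves 42 stored, which is how a nested statement fails to overwrite the
top-level identifier. A forced report of 0 clears it, as exec_simple_query()
does between the statements of a multi-statement string.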

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 942922b01f..4073361f4c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -115,6 +115,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+/*
+ * True for utility statements that pgss_ProcessUtility and
+ * pgss_post_parse_analyze handle (all but EXECUTE, PREPARE and DEALLOCATE).
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -342,7 +350,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -841,16 +850,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in our
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevent
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for prepared
+	 * statement related utility commands, as those will inherit the underlying
+	 * statement's one (except DEALLOCATE, which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+			&& pstate->p_sourcetext)
+		{
+			const char *querytext = pstate->p_sourcetext;
+			int query_location = query->stmt_location;
+			int query_len = query->stmt_len;
+
+			/*
+			 * Confine our attention to the relevant part of the string, if the
+			 * query is a portion of a multi-statement source string.
+			 */
+			querytext = pgss_clean_querytext(pstate->p_sourcetext,
+											 &query_location,
+											 &query_len);
+
+			query->queryId = pgss_compute_utility_queryid(querytext, query_len);
+		}
+		else
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1098,6 +1125,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Utility statements get queryId zero.  We do this even in cases where
+	 * the statement contains an optimizable statement for which a queryId
+	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
+	 * runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely case that the
+	 * user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1114,9 +1158,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1155,7 +1197,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1178,22 +1220,76 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 }
 
 /*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+static const char *
+pgss_clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
  */
 static uint64
-pgss_hash_string(const char *str, int len)
+pgss_compute_utility_queryid(const char *str, int query_len)
 {
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero(invalid), use
+	 * queryID as 2 instead, queryID 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
 }
 
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1224,50 +1320,15 @@ pgss_store(const char *query, uint64 queryId,
 	/*
 	 * Confine our attention to the relevant part of the string, if the query
 	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	query = pgss_clean_querytext(query, &query_location, &query_len);
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * For not already handled utility statements, we just hash the query
+	 * string to get an ID.
 	 */
 	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+		queryId = pgss_compute_utility_queryid(query, query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 243d019868..dd685faaa0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6607,6 +6607,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7024,8 +7029,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0ebadf0d26..55906a4ed8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -860,6 +860,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      <entry><type>xid</type></entry>
      <entry>The current backend's <literal>xmin</literal> horizon.</entry>
     </row>
+    <row>
+     <entry><structfield>queryid</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null, unless an additional module that computes query identifiers, such
+      as <xref linkend="pgstatstatements"/>, is configured.
+     </entry>
+    </row>
     <row>
      <entry><structfield>query</structfield></entry>
      <entry><type>text</type></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 813ea8bfc3..fa41831bb0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -758,6 +758,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad6f3..d4aa484ab4 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..c5c02a1d2f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -121,7 +121,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -140,7 +140,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -171,7 +171,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -562,7 +562,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -603,7 +604,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1357,8 +1358,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 6b8ed867d5..c57e197020 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 317ddb4ae2..b2040dca8e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842..11fead8422 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -43,6 +43,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/rel.h"
 
@@ -120,6 +121,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -153,6 +156,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 04274056ca..44d6a173fa 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3157,6 +3157,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3187,6 +3188,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3197,6 +3206,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * If track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -4784,6 +4835,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5b677863b9..68da45e0c3 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -742,6 +742,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -960,6 +962,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1076,6 +1079,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 175f4fd26b..c9d632f278 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -565,7 +565,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -906,6 +906,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -934,6 +938,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index b8858b132b..363530db8d 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,11 +72,11 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2608,6 +2608,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 7de4adc2ff..73eb56cc53 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -533,6 +533,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a649e44d08..e3ac657113 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5183,9 +5183,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..63bb80c00c 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -38,7 +38,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9d351e7714..0c0f776723 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1133,6 +1133,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1318,6 +1321,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1326,6 +1330,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6eec8ec568..c02a4b8a17 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1746,9 +1746,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1853,7 +1854,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2004,7 +2005,7 @@ pg_stat_replication| SELECT s.pid,
     w.spill_txns,
     w.spill_count,
     w.spill_bytes
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time, spill_txns, spill_count, spill_bytes) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_slru| SELECT s.name,
@@ -2026,7 +2027,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.25.1
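
A note for readers on the two functions declared in pgstat.h above, pgstat_report_queryid(uint64 queryId, bool force) and pgstat_get_my_queryid(). The rule the patch describes is that only the top-level statement's identifier is kept: a report is ignored while a nonzero identifier is already stored, unless the caller passes force. Below is a minimal standalone sketch of that rule, not the patch's code. The names model_queryid, model_report_queryid and model_demo are hypothetical, and a plain static variable stands in for the backend status entry in shared memory (no st_changecount protocol, no track_activities check).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the st_queryid field of the backend status entry. */
static uint64_t model_queryid = 0;

/*
 * Keep the first nonzero identifier reported for a statement; later reports
 * (e.g. from nested queries) are ignored unless force is set.
 */
static void
model_report_queryid(uint64_t query_id, bool force)
{
	if (model_queryid != 0 && !force)
		return;
	model_queryid = query_id;
}

/* Walk through one top-level statement that runs a nested query. */
static uint64_t
model_demo(void)
{
	model_report_queryid(0, true);		/* new statement: forced reset */
	assert(model_queryid == 0);
	model_report_queryid(42, false);	/* top-level parse analysis */
	model_report_queryid(99, false);	/* nested query: ignored */
	return model_queryid;
}
```

In this model, model_demo() returns the identifier of the top-level statement (42), which is the value a reader of the status entry would see until the next forced reset.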

tatsuro.yamada.tf@nttcom.co.jp

almost 6 years ago

In reply to: Julien Rouhaud (#61)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi Julien,

On 2020/04/02 22:25, Julien Rouhaud wrote:

New conflict, rebased v9 attached.

I tested the patch on the head (c7654f6a3) and
the result was fine. See below:

$ make installcheck-world
=====================
All 1 tests passed.
=====================

Regards,
Tatsuro Yamada

rjuju123@gmail.com

almost 6 years ago

In reply to: Tatsuro Yamada (#62)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Apr 7, 2020 at 8:40 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

Hi Julien,

On 2020/04/02 22:25, Julien Rouhaud wrote:

New conflict, rebased v9 attached.

I tested the patch on the head (c7654f6a3) and
the result was fine. See below:

$ make installcheck-world
=====================
All 1 tests passed.
=====================

Thanks Yamada-san! Unfortunately this patch still didn't attract any
committer, so I moved it to the next commitfest.

Atsushi Torikoshi

atorik@gmail.com

over 5 years ago

In reply to: Julien Rouhaud (#63)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi,

v9 patch fails to apply to HEAD, could you check and rebase it?

And here are minor typos.

79 + * utility statements. Note that we don't compute a queryId for prepared
80 + * statemets related utility, as those will inherit from the underlying
81 + * statements's one (except DEALLOCATE which is entirely untracked).

statemets -> statements
statements's -> statements' or statement's?

Regards,

--
Atsushi Torikoshi


rjuju123@gmail.com

over 5 years ago

In reply to: Atsushi Torikoshi (#64)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote:

Hi,

v9 patch fails to apply to HEAD, could you check and rebase it?

Thanks for the notice, v10 attached!

And here are minor typos.

79 + * utility statements. Note that we don't compute a queryId for prepared
80 + * statemets related utility, as those will inherit from the underlying
81 + * statements's one (except DEALLOCATE which is entirely untracked).

statemets -> statements
statements's -> statements' or statement's?

Thanks! I went with "statement's".
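
For readers following the comment discussed above: in the patch, a utility statement's queryId is derived from its query text, after trimming surrounding whitespace, and a hash of zero is replaced by 2 because zero means "no identifier". Below is a minimal standalone sketch of that idea under stated assumptions. All sketch_* names are hypothetical, sketch_isspace only approximates scanner_isspace(), and the hash is 64-bit FNV-1a swapped in for hash_any_extended(), so the resulting values will not match what pg_stat_statements computes. The statement offset handling for multi-statement strings is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Approximation of scanner_isspace(); the exact character set is an assumption. */
static int
sketch_isspace(char ch)
{
	return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r' || ch == '\f';
}

/* Trim leading and trailing whitespace; returns the new start, updates *len. */
static const char *
sketch_trim(const char *query, int *len)
{
	int			n = *len;

	while (n > 0 && sketch_isspace(query[0]))
		query++, n--;
	while (n > 0 && sketch_isspace(query[n - 1]))
		n--;
	*len = n;
	return query;
}

/* 64-bit FNV-1a, used here only as a stand-in for hash_any_extended(). */
static uint64_t
sketch_hash(const char *str, int len)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	for (int i = 0; i < len; i++)
	{
		h ^= (unsigned char) str[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/* Hash the trimmed text; never return 0, which is reserved for "no id". */
static uint64_t
sketch_utility_queryid(const char *source, int len)
{
	const char *query = sketch_trim(source, &len);
	uint64_t	id = sketch_hash(query, len);

	if (id == 0)
		id = 2;
	return id;
}
```

Because the text is trimmed before hashing, the same utility command with different surrounding whitespace gets the same identifier in this sketch. Whitespace inside the command is not normalized, so it still changes the result.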

Attachments:

v10-0001-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-diff; charset=us-ascii)

From 8c651ee05c8a5e55966ad1646f48e83941d3776a Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v10] Expose queryid in pg_stat_activity and log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of the top
level statement is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 179 ++++++++++++------
 doc/src/sgml/config.sgml                      |   9 +-
 doc/src/sgml/monitoring.sgml                  |  15 ++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 +
 src/backend/executor/execParallel.c           |  14 +-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 +++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/test/regress/expected/rules.out           |   9 +-
 18 files changed, 270 insertions(+), 79 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 14cad19afb..a51c207b49 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -115,6 +115,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+/*
+ * True for utility statements that pgss_ProcessUtility and
+ * pgss_post_parse_analyze handle (all but EXECUTE, PREPARE and DEALLOCATE).
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -345,7 +353,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -845,16 +854,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in our
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevent
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for prepared
+	 * statements related utility, as those will inherit from the underlying
+	 * statement's one (except DEALLOCATE which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+			&& pstate->p_sourcetext)
+		{
+			const char *querytext = pstate->p_sourcetext;
+			int query_location = query->stmt_location;
+			int query_len = query->stmt_len;
+
+			/*
+			 * Confine our attention to the relevant part of the string, if the
+			 * query is a portion of a multi-statement source string.
+			 */
+			querytext = pgss_clean_querytext(pstate->p_sourcetext,
+											 &query_location,
+											 &query_len);
+
+			query->queryId = pgss_compute_utility_queryid(querytext, query_len);
+		}
+		else
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1117,6 +1144,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Utility statements get queryId zero.  We do this even in cases where
+	 * the statement contains an optimizable statement for which a queryId
+	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
+	 * runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely event that
+	 * the user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1133,9 +1177,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1181,7 +1223,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1205,22 +1247,76 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 }
 
 /*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+static const char *
+pgss_clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
  */
 static uint64
-pgss_hash_string(const char *str, int len)
+pgss_compute_utility_queryid(const char *str, int query_len)
 {
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero(invalid), use
+	 * queryID as 2 instead, queryID 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
 }
 
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1253,50 +1349,15 @@ pgss_store(const char *query, uint64 queryId,
 	/*
 	 * Confine our attention to the relevant part of the string, if the query
 	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	query = pgss_clean_querytext(query, &query_location, &query_len);
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * For utility statements not already handled, we just hash the query
+	 * string to get an ID.
 	 */
 	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+		queryId = pgss_compute_utility_queryid(query, query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b353c61683..0801a1ed0a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6668,6 +6668,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7119,8 +7124,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 048ccc0988..88c4ee97c0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -885,6 +885,21 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null, unless an additional module that computes query identifiers, such
+      as <xref linkend="pgstatstatements"/>, is configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b6d35c2d11..2545a57651 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -758,6 +758,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad6f3..d4aa484ab4 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 382e78fb7f..1f8d4ea228 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -579,7 +579,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -621,7 +622,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1404,8 +1405,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 6b8ed867d5..c57e197020 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 317ddb4ae2..b2040dca8e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c159fb2957..e0a6099617 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -43,6 +43,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/rel.h"
 
@@ -120,6 +121,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -153,6 +156,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 88992c2da2..51cb8f72fb 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3159,6 +3159,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3189,6 +3190,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3199,6 +3208,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -4793,6 +4844,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c9424f167c..328daf1555 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -742,6 +742,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -960,6 +962,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1076,6 +1079,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2aff739466..f155e52006 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -567,7 +567,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -907,6 +907,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -935,6 +939,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e4b717c79a..fe8f3fad1c 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,11 +72,11 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2608,6 +2608,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e430e33c7b..66a7107546 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -536,6 +536,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 95604e988a..06546b01a9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5222,9 +5222,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid, queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..fb5d908433 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1387201382..ea4786b114 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1135,6 +1135,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionnally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1315,6 +1318,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1323,6 +1327,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index fa436f2caa..5cafde2609 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1748,9 +1748,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1855,7 +1856,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2003,7 +2004,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_slru| SELECT s.name,
@@ -2025,7 +2026,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.27.0

torikoshia@oss.nttdata.com

over 5 years ago

In reply to: Julien Rouhaud (#65)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2020-07-14 20:24, Julien Rouhaud wrote:

On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote:

Hi,

v9 patch fails to apply to HEAD, could you check and rebase it?

Thanks for the notice, v10 attached!

And here are minor typos.

79 + * utility statements. Note that we don't compute a queryId for prepared
80 + * statemets related utility, as those will inherit from the underlying
81 + * statements's one (except DEALLOCATE which is entirely untracked).

statemets -> statements
statements's -> statements' or statement's?

Thanks! I went with "statement's".

Thanks for updating!
I tested the patch setting log_statement = 'all', but %Q in log_line_prefix
was always 0 even when pg_stat_statements.queryid and
pg_stat_activity.queryid are not 0.

Is this an intentional behavior?

```
$ initdb --no-locale -D data

$ edit postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
logging_collector = on
log_line_prefix = '%m [%p] queryid:%Q '
log_statement = 'all'

$ pg_ctl start -D data

$ psql
=# CREATE EXTENSION pg_stat_statements;

=# CREATE TABLE t1 (i int);
=# INSERT INTO t1 VALUES (0),(1);
=# SELECT queryid, query FROM pg_stat_activity;

-- query ids are all 0 on the log
$ view log
2020-07-28 15:57:58.475 EDT [4480] queryid:0 LOG: statement: CREATE TABLE t1 (i int);
2020-07-28 15:58:13.730 EDT [4480] queryid:0 LOG: statement: INSERT INTO t1 VALUES (0),(1);
2020-07-28 15:59:28.389 EDT [4480] queryid:0 LOG: statement: SELECT * FROM t1;

-- on pg_stat_activity and pgss, query ids are not 0
$ psql
=# SELECT queryid, query FROM pg_stat_activity WHERE query LIKE '%t1%';
       queryid        |                                query
----------------------+----------------------------------------------------------------------
  1109063694563750779 | SELECT * FROM t1;
 -2582225123719476948 | SELECT queryid, query FROM pg_stat_activity WHERE query LIKE '%t1%';
(2 rows)

=# SELECT queryid, query FROM pg_stat_statements WHERE query LIKE '%t1%';
       queryid        |              query
----------------------+---------------------------------
 -5028988130796701553 | CREATE TABLE t1 (i int)
  1109063694563750779 | SELECT * FROM t1
  2726469050076420724 | INSERT INTO t1 VALUES ($1),($2)

```
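
A side note on the negative values in the output above: the patch stores the identifier as a uint64 (st_queryid) while the column is declared int8/bigint, so any id with the top bit set is displayed as a negative number. Below is a minimal C sketch of that reinterpretation. The helper name queryid_as_bigint is made up for illustration and is not part of the patch, and the cast relies on two's-complement wraparound, which is implementation-defined before C23 but holds on the usual platforms.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): show an unsigned 64-bit query
 * identifier the way a bigint column would display it.
 */
static int64_t
queryid_as_bigint(uint64_t queryid)
{
    /* Assumes two's-complement wraparound for out-of-range values. */
    int64_t shown = (int64_t) queryid;

    /* The bit pattern is unchanged, so converting back recovers the id. */
    assert((uint64_t) shown == queryid);
    return shown;
}
```

Under that assumption, an id of 15864518949990074668 would be displayed as -2582225123719476948, while ids below 2^63 are displayed unchanged.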

And here is a minor typo.
optionnally -> optionally

753 + /* query identifier, optionnally computed using post_parse_analyze_hook */

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

rjuju123@gmail.com

over 5 years ago

In reply to: torikoshia (#66)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Jul 28, 2020 at 10:07 AM torikoshia <torikoshia@oss.nttdata.com> wrote:

Thanks for updating!
I tested the patch setting log_statement = 'all', but %Q in log_line_prefix
was always 0 even when pg_stat_statements.queryid and
pg_stat_activity.queryid are not 0.

Is this an intentional behavior?

[...]

Thanks for the tests! That's indeed expected behavior (although I
wasn't aware of it), and it isn't documented in this patch (I'll fix
that). The reason is that log_statement logging is done right after
parsing the query:

    /*
     * Do basic parsing of the query or queries (this should be safe even if
     * we are in aborted transaction state!)
     */
    parsetree_list = pg_parse_query(query_string);

    /* Log immediately if dictated by log_statement */
    if (check_log_statement(parsetree_list))
    {
        ereport(LOG,
                (errmsg("statement: %s", query_string),
                 errhidestmt(true),
                 errdetail_execute(parsetree_list)));
        was_logged = true;
    }

As parse analysis is not yet done, no queryid can be computed at that
point, so we always print 0. That's a limitation that can't be
removed without changing the semantics of log_statement, so we'll
probably have to live with it.
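
To make the ordering concrete, here is a toy model in C. It is not PostgreSQL source: toy_queryid, ToyTrace and toy_exec_simple_query are names invented for this sketch. It assumes the reported id is reset to 0 when a new statement starts and is only set once parse analysis has run, so anything logged in between sees 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the id a backend would report for %Q. */
static uint64_t toy_queryid = 0;

/* What a %Q-style placeholder would have seen at two points in time. */
typedef struct ToyTrace
{
    uint64_t at_log_statement;  /* value when log_statement fires */
    uint64_t after_analysis;    /* value once parse analysis has run */
} ToyTrace;

/*
 * Toy model of the ordering described above: raw parsing, then the
 * log_statement message, and only then parse analysis, which is where
 * a module such as pg_stat_statements can compute an identifier.
 */
static ToyTrace
toy_exec_simple_query(uint64_t id_from_analysis)
{
    ToyTrace trace;

    assert(id_from_analysis != 0);  /* 0 means "no id" in this sketch */

    toy_queryid = 0;                        /* assumed reset for a new statement */
    trace.at_log_statement = toy_queryid;   /* logged before analysis */
    toy_queryid = id_from_analysis;         /* parse analysis sets the id */
    trace.after_analysis = toy_queryid;     /* later messages see the real id */

    return trace;
}
```

In this model the first field is always 0 whatever id is passed in, which matches the queryid:0 prefixes in the reported log lines, while anything emitted after analysis would carry the real id.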

And here is a minor typo.
optionnally -> optionally

753 + /* query identifier, optionnally computed using post_parse_analyze_hook */

Thanks, I fixed it locally!

rjuju123@gmail.com

over 5 years ago

In reply to: Julien Rouhaud (#67)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Jul 28, 2020 at 10:55:04AM +0200, Julien Rouhaud wrote:

On Tue, Jul 28, 2020 at 10:07 AM torikoshia <torikoshia@oss.nttdata.com> wrote:

Thanks for updating!
I tested the patch setting log_statement = 'all', but %Q in log_line_prefix
was always 0 even when pg_stat_statements.queryid and
pg_stat_activity.queryid are not 0.

Is this an intentional behavior?

[...]

Thanks for the tests! That's indeed expected behavior (although I
wasn't aware of it), and it isn't documented in this patch (I'll fix
that). The reason is that log_statement logging is done right after
parsing the query:

    /*
     * Do basic parsing of the query or queries (this should be safe even if
     * we are in aborted transaction state!)
     */
    parsetree_list = pg_parse_query(query_string);

    /* Log immediately if dictated by log_statement */
    if (check_log_statement(parsetree_list))
    {
        ereport(LOG,
                (errmsg("statement: %s", query_string),
                 errhidestmt(true),
                 errdetail_execute(parsetree_list)));
        was_logged = true;
    }

As parse analysis is not yet done, no queryid can be computed at that
point, so we always print 0. That's a limitation that can't be
removed without changing the semantics of log_statement, so we'll
probably have to live with it.

And here is a minor typo.
optionnally -> optionally

753 + /* query identifier, optionnally computed using post_parse_analyze_hook */

Thanks, I fixed it locally!

Recent conflict, rebased v11 attached.

Attachments:

v11-0001-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-diff; charset=us-ascii)

From 473d038a1b447d4569709c3a499fc7356af76452 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v11] Expose queryid in pg_stat_activity and log_line_prefix

As with other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active, the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in log_line_prefix, which
likewise only exposes top-level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 179 ++++++++++++------
 doc/src/sgml/config.sgml                      |   9 +-
 doc/src/sgml/monitoring.sgml                  |  15 ++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 +
 src/backend/executor/execParallel.c           |  14 +-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 +++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/test/regress/expected/rules.out           |   9 +-
 18 files changed, 270 insertions(+), 79 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6b91c62c31..486d07f9de 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -115,6 +115,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+/*
+ * True for the utility statements that pgss_ProcessUtility and
+ * pgss_post_parse_analyze handle, i.e. all but EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -345,7 +353,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -845,16 +854,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in our
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevent
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for prepared
+	 * statements related utility, as those will inherit from the underlying
+	 * statement's one (except DEALLOCATE which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+			&& pstate->p_sourcetext)
+		{
+			const char *querytext = pstate->p_sourcetext;
+			int query_location = query->stmt_location;
+			int query_len = query->stmt_len;
+
+			/*
+			 * Confine our attention to the relevant part of the string, if the
+			 * query is a portion of a multi-statement source string.
+			 */
+			querytext = pgss_clean_querytext(pstate->p_sourcetext,
+											 &query_location,
+											 &query_len);
+
+			query->queryId = pgss_compute_utility_queryid(querytext, query_len);
+		}
+		else
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1117,6 +1144,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Utility statements get queryId zero.  We do this even in cases where
+	 * the statement contains an optimizable statement for which a queryId
+	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
+	 * runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely possibility
+	 * that user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1133,9 +1177,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1189,7 +1231,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1213,22 +1255,76 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 }
 
 /*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+static const char *
+pgss_clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
  */
 static uint64
-pgss_hash_string(const char *str, int len)
+pgss_compute_utility_queryid(const char *str, int query_len)
 {
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero(invalid), use
+	 * queryID as 2 instead, queryID 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
 }
 
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1261,50 +1357,15 @@ pgss_store(const char *query, uint64 queryId,
 	/*
 	 * Confine our attention to the relevant part of the string, if the query
 	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	query = pgss_clean_querytext(query, &query_location, &query_len);
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * For not already handled utility statements, we just hash the query
+	 * string to get an ID.
 	 */
 	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+		queryId = pgss_compute_utility_queryid(query, query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7a7177c550..a522882176 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6757,6 +6757,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7209,8 +7214,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 304c49f07b..6ffdbf8105 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -882,6 +882,21 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null, unless an additional module that computes query identifiers, such
+      as <xref linkend="pgstatstatements"/>, is configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ba5a23ac25..1734e27666 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -761,6 +761,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad6f3..d4aa484ab4 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 382e78fb7f..1f8d4ea228 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -579,7 +579,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -621,7 +622,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1404,8 +1405,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..0fb003aaec 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..e6017675e7 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c159fb2957..e0a6099617 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -43,6 +43,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/rel.h"
 
@@ -120,6 +121,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -153,6 +156,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 73ce944fb1..ca4298f611 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3159,6 +3159,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3189,6 +3190,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3199,6 +3208,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -4797,6 +4848,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c9424f167c..328daf1555 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -742,6 +742,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -960,6 +962,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1076,6 +1079,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..b21d968d22 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -567,7 +567,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -913,6 +913,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index d0b368530e..f0e3d2ca80 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,11 +72,11 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2631,6 +2631,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..0b4daeffcb 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -541,6 +541,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 27989971db..a87d8ff9fe 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5222,9 +5222,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid, queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..fb5d908433 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1387201382..f394e862a4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1135,6 +1135,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1315,6 +1318,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1323,6 +1327,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..7f95f5df7b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1760,9 +1760,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1867,7 +1868,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2015,7 +2016,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_slru| SELECT s.name,
@@ -2037,7 +2038,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.28.0
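
As a standalone illustration of the top-level-only rule in pgstat_report_queryid() from the patch above, here is a minimal sketch. It deliberately leaves out shared memory, the st_changecount protocol and the track_activities check. The names reported_queryid, start_new_query(), report_queryid() and current_queryid() are made up for the example and do not exist in PostgreSQL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the st_queryid field of PgBackendStatus. */
static uint64_t reported_queryid = 0;

/*
 * Mirrors the effect of pgstat_report_activity(STATE_RUNNING, ...):
 * a new top-level query starts, so the previous identifier is cleared.
 */
static void
start_new_query(void)
{
    reported_queryid = 0;
}

/*
 * Mirrors the rule in pgstat_report_queryid(): the first non-zero
 * identifier reported for the current top-level query is kept, and later
 * reports are ignored unless the caller passes force = true.
 */
static void
report_queryid(uint64_t query_id, bool force)
{
    if (reported_queryid != 0 && !force)
        return;
    reported_queryid = query_id;
}

/* Counterpart of pgstat_get_my_queryid() for this sketch. */
static uint64_t
current_queryid(void)
{
    return reported_queryid;
}
```

With this rule, a nested statement analyzed while an identifier is already recorded does not overwrite it; only a new top-level query or an explicit forced reset does.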

bruce@momjian.us

over 5 years ago

In reply to: Julien Rouhaud (#68)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Aug 19, 2020 at 04:19:30PM +0200, Julien Rouhaud wrote:

Similarly to other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

I would like to apply this patch (I know it has been in the commitfest
since July 2019), but I have some questions about the user API. Does it
make sense to have a column in pg_stat_activity and an option in
log_line_prefix that will be empty unless pg_stat_statements is
installed? Is there no clean way to move the query hash computation out
of pg_stat_statements and into the main code so the query id is always
visible? (Also, did we decide _not_ to make the pg_stat_statements
queryid always a positive value?)

Also, in the doc patch:

By default, query identifiers are not computed, so this field will always
be null, unless an additional module that computes query identifiers, such
as <xref linkend="pgstatstatements"/>, is configured.

why are you saying "such as"? Isn't pg_stat_statements the only way to
see the queryid? This command allowed the queryid to be displayed in
pg_stat_activity:

ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Bruce Momjian (#69)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Bruce Momjian <bruce@momjian.us> writes:

I would like to apply this patch (I know it has been in the commitfest
since July 2019), but I have some questions about the user API. Does it
make sense to have a column in pg_stat_activity and an option in
log_line_prefix that will be empty unless pg_stat_statements is
installed? Is there no clean way to move the query hash computation out
of pg_stat_statements and into the main code so the query id is always
visible? (Also, did we decide _not_ to make the pg_stat_statements
queryid always a positive value?)

FWIW, I think this proposal is a mess. I was willing to hold my nose
and have a queryId field in the internal Query struct without any solid
consensus about what its semantics are and which extensions get to use it.
Exposing it to end users seems like a bridge too far, though. In
particular, I'm afraid that that will cause people to expect it to have
consistent values across PG versions, or even just across architectures
within one version.

The larger picture here is that there's lots of room to doubt whether
pg_stat_statements' decisions about what to ignore or include in the ID
will be satisfactory to everybody. If that were not so, we'd just move
the computation into core.

regards, tom lane

alvherre@alvh.no-ip.org

over 5 years ago

In reply to: Tom Lane (#70)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2020-Oct-05, Tom Lane wrote:

FWIW, I think this proposal is a mess. I was willing to hold my nose
and have a queryId field in the internal Query struct without any solid
consensus about what its semantics are and which extensions get to use it.
Exposing it to end users seems like a bridge too far, though. In
particular, I'm afraid that that will cause people to expect it to have
consistent values across PG versions, or even just across architectures
within one version.

I wonder if it would help to purposefully change the computation so that
it is not consistent -- for instance, hash the system_identifier as the
initial value. Then users would be forced to accept that it'll change as
soon as the database migrates to another server or is upgraded to a new
major version.
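
To make the idea concrete, here is a minimal sketch of seeding a query hash with a per-cluster value. It is only an illustration: FNV-1a stands in for whatever hash is really used (it is not what pg_stat_statements uses), the function name seeded_query_hash() is made up, and the query is represented as already-normalized text rather than a Query tree.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hash a normalized query string, starting from a state that already has
 * the 64-bit seed (e.g. the cluster's system_identifier) folded in.
 * FNV-1a is used purely as a stand-in hash for this sketch.
 */
static uint64_t
seeded_query_hash(const char *normalized_query, uint64_t seed)
{
    uint64_t    h = UINT64_C(0xcbf29ce484222325);   /* FNV-1a offset basis */
    size_t      i;

    /* Fold in the eight seed bytes, least significant first. */
    for (i = 0; i < sizeof(seed); i++)
    {
        h ^= (seed >> (8 * i)) & 0xFF;
        h *= UINT64_C(0x100000001b3);               /* FNV-1a prime */
    }

    /* Then fold in the query text itself. */
    for (i = 0; normalized_query[i] != '\0'; i++)
    {
        h ^= (unsigned char) normalized_query[i];
        h *= UINT64_C(0x100000001b3);
    }

    return h;
}
```

Two clusters with different seeds then report different identifiers for the same query text, which is exactly the property being proposed, and also the one that rules out cross-cluster comparisons.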

bruce@momjian.us

over 5 years ago

In reply to: Alvaro Herrera (#71)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Oct 5, 2020 at 07:58:42PM -0300, Álvaro Herrera wrote:

On 2020-Oct-05, Tom Lane wrote:

FWIW, I think this proposal is a mess. I was willing to hold my nose
and have a queryId field in the internal Query struct without any solid
consensus about what its semantics are and which extensions get to use it.
Exposing it to end users seems like a bridge too far, though. In
particular, I'm afraid that that will cause people to expect it to have
consistent values across PG versions, or even just across architectures
within one version.

I wonder if it would help to purposefully change the computation so that
it is not consistent -- for instance, hash the system_identifier as the
initial value. Then users would be forced to accept that it'll change as
soon as the database migrates to another server or is upgraded to a new
major version.

That seems like a good idea, but it would prevent cross-cluster
same-major-version comparisons, which seems like a negative. Perhaps we
should add the major version into the hash to handle this. Ideally,
let's just put a queryid-hash-version into the hash, so if we change
the computation, we just update the hash version and nothing matches
anymore.

I do think the queryid has to display independent of pg_stat_statements,
because I can see people using queryid for log file and pg_stat_activity
comparisons. I also think the ability to have queryid accessible is an
important feature outside of pg_stat_statements, so I do think we need a
way to move this idea forward.

rjuju123@gmail.com

over 5 years ago

In reply to: Bruce Momjian (#72)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Oct 6, 2020 at 10:18 AM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Oct 5, 2020 at 07:58:42PM -0300, Álvaro Herrera wrote:

On 2020-Oct-05, Tom Lane wrote:

FWIW, I think this proposal is a mess. I was willing to hold my nose
and have a queryId field in the internal Query struct without any solid
consensus about what its semantics are and which extensions get to use it.
Exposing it to end users seems like a bridge too far, though. In
particular, I'm afraid that that will cause people to expect it to have
consistent values across PG versions, or even just across architectures
within one version.

I wonder if it would help to purposefully change the computation so that
it is not consistent -- for instance, hash the system_identifier as the
initial value. Then users would be forced to accept that it'll change as
soon as the database migrates to another server or is upgraded to a new
major version.

That seems like a good idea, but it would prevent cross-cluster
same-major-version comparisons, which seems like a negative. Perhaps we
should add the major version into the hash to handle this. Ideally,
let's just put a queryid-hash-version into the hash, so if we change
the computation, we just update the hash version and nothing matches
anymore.

I do think the queryid has to display independent of pg_stat_statements,
because I can see people using queryid for log file and pg_stat_activity
comparisons. I also think the ability to have queryid accessible is an
important feature outside of pg_stat_statements, so I do think we need a
way to move this idea forward.

For the record, for now any extension can compute a queryid and there
are at least 2 other published extensions that already do that, one of
them having different semantics on how to compute the queryid. I'm
not sure that we'll ever get a consensus on those semantics due to
performance tradeoff, so removing the ability to let people put their
own code for that doesn't seem like the best way forward.

Maybe we could add a new hook for only queryid computation, and add a
GUC to let people choose between no queryid computed, core computation
(current pg_stat_statements) and 3rd party plugin?
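
A rough sketch of what such a setup could look like, to make the three choices concrete. Everything here is hypothetical: the enum, the queryid_source setting, the queryid_hook pointer and the placeholder hash are invented for illustration, and a real implementation would work on the analyzed Query tree rather than on the query text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical setting values; none of these names exist in PostgreSQL. */
typedef enum QueryIdSource
{
    QUERYID_SOURCE_NONE,        /* never compute an identifier */
    QUERYID_SOURCE_CORE,        /* use the built-in computation */
    QUERYID_SOURCE_PLUGIN       /* defer to an extension-provided hook */
} QueryIdSource;

/* Hypothetical hook type dedicated to query identifier computation. */
typedef uint64_t (*queryid_hook_type) (const char *query_text);

static QueryIdSource queryid_source = QUERYID_SOURCE_NONE;
static queryid_hook_type queryid_hook = NULL;

/* Placeholder for the in-core computation (a trivial djb2-style hash). */
static uint64_t
core_compute_queryid(const char *query_text)
{
    uint64_t    h = 5381;

    for (; *query_text != '\0'; query_text++)
        h = h * 33 + (unsigned char) *query_text;
    return h;
}

/* Example of what a third-party module could install as the hook. */
static uint64_t
example_plugin_queryid(const char *query_text)
{
    (void) query_text;          /* a real plugin would inspect the query */
    return 42;
}

/* Dispatch on the setting; 0 means "no identifier computed". */
static uint64_t
compute_queryid(const char *query_text)
{
    switch (queryid_source)
    {
        case QUERYID_SOURCE_CORE:
            return core_compute_queryid(query_text);
        case QUERYID_SOURCE_PLUGIN:
            return queryid_hook != NULL ? queryid_hook(query_text) : 0;
        case QUERYID_SOURCE_NONE:
            break;
    }
    return 0;
}
```

The point of the dedicated hook is that a module only replaces the identifier computation, without having to take over all of post-parse analysis.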

bruce@momjian.us

over 5 years ago

In reply to: Julien Rouhaud (#73)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Oct 6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote:

I do think the queryid has to display independent of pg_stat_statements,
because I can see people using queryid for log file and pg_stat_activity
comparisons. I also think the ability to have queryid accessible is an
important feature outside of pg_stat_statements, so I do think we need a
way to move this idea forward.

For the record, for now any extension can compute a queryid and there
are at least 2 other published extensions that already do that, one of
them having different semantics on how to compute the queryid. I'm
not sure that we'll ever get a consensus on those semantics due to
performance tradeoff, so removing the ability to let people put their
own code for that doesn't seem like the best way forward.

Maybe we could add a new hook for only queryid computation, and add a
GUC to let people choose between no queryid computed, core computation
(current pg_stat_statements) and 3rd party plugin?

That all seems very complicated. If we go in that direction, I suggest
we just give up getting any of this into core.

Michael Paquier

michael@paquier.xyz

over 5 years ago

In reply to: Bruce Momjian (#69)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Oct 05, 2020 at 05:24:06PM -0400, Bruce Momjian wrote:

(Also, did we decide _not_ to make the pg_stat_statements queryid
always a positive value?)

This specific point was discussed a couple of years ago; please
see cff440d and its related thread:
/messages/by-id/CA+TgmobG_Kp4cBKFmsznUAaM1GWW6hhRNiZC0KjRMOOeYnz5Yw@mail.gmail.com
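
As background to the sign question: in the patch earlier in this thread the C field st_queryid is uint64 while pg_proc.dat declares the queryid output column as int8, so an identifier with the high bit set is shown as a negative number. A small sketch of that reinterpretation, with optional padding in the style %Q would need; format_queryid() is a made-up helper, not a PostgreSQL function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a query identifier the way a signed 64-bit column would show it.
 * "padding" follows printf's "%*" convention: 0 means no padding and a
 * negative value left-justifies.  Returns what snprintf returns, i.e. the
 * number of characters needed, excluding the terminating NUL.
 */
static int
format_queryid(char *buf, size_t buflen, uint64_t query_id, int padding)
{
    /*
     * Converting an out-of-range uint64_t to int64_t is
     * implementation-defined in C; on the usual two's-complement platforms
     * it simply reinterprets the bits.
     */
    int64_t     as_signed = (int64_t) query_id;

    /* PRId64 avoids assuming that "long" is 64 bits wide. */
    return snprintf(buf, buflen, "%*" PRId64, padding, as_signed);
}
```

Using PRId64 (or a cast to long long with %lld) matters on platforms where long is only 32 bits wide.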
--
Michael

Michael Paquier

michael@paquier.xyz

over 5 years ago

In reply to: Bruce Momjian (#74)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Oct 05, 2020 at 11:23:50PM -0400, Bruce Momjian wrote:

On Tue, Oct 6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote:

Maybe we could add a new hook for only queryid computation, and add a
GUC to let people choose between no queryid computed, core computation
(current pg_stat_statements) and 3rd party plugin?

That all seems very complicated. If we go in that direction, I suggest
we just give up getting any of this into core.

A GUC would at least have the advantage of making the computation
consistent for any system willing to consume it, with the option of
not paying any potential performance cost. I have to admit, though, that
just moving the query ID computation of PGSS into core may not be the
best option, as a query ID of 0 means the same thing for a utility
statement, for an initialization, and for a backend running a query with
an unknown value. That could be worked out, though.

FWIW, I think that adding the system ID in the hash is too
restrictive, as it could be interesting for users to do stat
comparisons across multiple systems running the same major version.
It would be better to not give any strong guarantee that the query ID
computed will remain consistent across major versions, so that it remains
possible to keep improving it. Also, if nothing has been done that
changes the hashing computation, I see little benefit in forcing a
breakage by adding something like PG_MAJORVERSION_NUM or such in the
hash computation.
--
Michael

bruce@momjian.us

over 5 years ago

In reply to: Michael Paquier (#76)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Oct 6, 2020 at 02:34:58PM +0900, Michael Paquier wrote:

On Mon, Oct 05, 2020 at 11:23:50PM -0400, Bruce Momjian wrote:

On Tue, Oct 6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote:

Maybe we could add a new hook for only queryid computation, and add a
GUC to let people choose between no queryid computed, core computation
(current pg_stat_statements) and 3rd party plugin?

That all seems very complicated. If we go in that direction, I suggest
we just give up getting any of this into core.

A GUC would at least have the advantage of making the computation
consistent for any system willing to consume it, with the option of
not paying any potential performance cost. I have to admit, though, that
just moving the query ID computation of PGSS into core may not be the
best option, as a query ID of 0 means the same thing for a utility
statement, for an initialization, and for a backend running a query with
an unknown value. That could be worked out, though.

FWIW, I think that adding the system ID in the hash is too
restrictive, as it could be interesting for users to do stat
comparisons across multiple systems running the same major version.
It would be better to not give any strong guarantee that the query ID
computed will remain consistent across major versions, so that it remains
possible to keep improving it. Also, if nothing has been done that
changes the hashing computation, I see little benefit in forcing a
breakage by adding something like PG_MAJORVERSION_NUM or such in the
hash computation.

I thought some more about this. First, I think having the queryid hash
code in the server, without requiring pg_stat_statements, is a
requirement --- I think too many people will want to use this feature
independent of pg_stat_statements. Second, I understand the desire to
have different hash computation methods, depending on what level of
detail/matching you want.

I propose moving the pg_stat_statements queryid hash code into the
server (with a version number), and also adding a postgresql.conf
variable that lets you control how detailed the queryid hash is
computed. This addresses the problem of people wanting different hash
methods.

When computing a hash, the queryid detail level and version number will
be mixed into the hash, so only a hash that used a similar query and
identical queryid detail level would match.
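
A minimal sketch of what mixing the version and detail level into the hash could look like. All names are hypothetical, FNV-1a is only a stand-in hash, and the "normalized" level is reduced to collapsing runs of digits in the query text, which is far cruder than what pg_stat_statements does on the parse tree.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical detail levels for the proposed setting. */
typedef enum QueryIdDetail
{
    QUERYID_DETAIL_NORMALIZED = 1,  /* ignore literal numeric values */
    QUERYID_DETAIL_EXACT = 2        /* hash the text as written */
} QueryIdDetail;

/* One FNV-1a step: fold a single byte into the running hash. */
static uint64_t
fnv1a_step(uint64_t h, unsigned char byte)
{
    return (h ^ byte) * UINT64_C(0x100000001b3);
}

/*
 * Compute an identifier in which the hash version and the detail level are
 * hashed ahead of the query, so identifiers only match when both agree.
 */
static uint64_t
versioned_queryid(const char *query_text, QueryIdDetail detail,
                  uint8_t hash_version)
{
    uint64_t    h = UINT64_C(0xcbf29ce484222325);   /* FNV-1a offset basis */
    size_t      i;

    h = fnv1a_step(h, hash_version);
    h = fnv1a_step(h, (unsigned char) detail);

    for (i = 0; query_text[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char) query_text[i];

        if (detail == QUERYID_DETAIL_NORMALIZED && isdigit(c))
        {
            /* Collapse a whole run of digits into one placeholder. */
            while (isdigit((unsigned char) query_text[i + 1]))
                i++;
            c = '?';
        }
        h = fnv1a_step(h, c);
    }

    return h;
}
```

Bumping hash_version whenever the computation changes makes all old identifiers stop matching, without tying the value to a particular cluster or major version.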

Michael Paquier

michael@paquier.xyz

over 5 years ago

In reply to: Bruce Momjian (#77)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:

I propose moving the pg_stat_statements queryid hash code into the
server (with a version number), and also adding a postgresql.conf
variable that lets you control how detailed the queryid hash is
computed. This addresses the problem of people wanting different hash
methods.

In terms of making this part expandable in the future, there could be
a point in having an enum here, but are we sure that we will have a
need for that in the future? What I get from this discussion is that
we want a unique source of truth that users can consume, and that the
only source of truth proposed is the PGSS hashing. We may change the
way we compute the query ID in the future, for example if it gets
expanded to some utility statements, etc. But that would be
controlled by the version number in the hash, not the GUC itself.

When computing a hash, the queryid detail level and version number will
be mixed into the hash, so only a hash that used a similar query and
identical queryid detail level would match.

Yes, having a version number directly dependent on the hashing sounds
like a good compromise to me.
--
Michael

bruce@momjian.us

over 5 years ago

In reply to: Michael Paquier (#78)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Oct 7, 2020 at 10:42:49AM +0900, Michael Paquier wrote:

On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:

I propose moving the pg_stat_statements queryid hash code into the
server (with a version number), and also adding a postgresql.conf
variable that lets you control how detailed the queryid hash is
computed. This addresses the problem of people wanting different hash
methods.

In terms of making this part expandable in the future, there could be
a point in having an enum here, but are we sure that we will have a
need for that in the future? What I get from this discussion is that
we want a unique source of truth that users can consume, and that the
only source of truth proposed is the PGSS hashing. We may change the
way we compute the query ID in the future, for example if it gets
expanded to some utility statements, etc. But that would be
controlled by the version number in the hash, not the GUC itself.

Oh, if that is true, then I agree let's just go with the version number.

When computing a hash, the queryid detail level and version number will
be mixed into the hash, so only a hash that used a similar query and
identical queryid detail level would match.

Yes, having a version number directly dependent on the hashing sounds
like a good compromise to me.

Good, much simpler. I think there is enough demand for a queryid that I
would like to get this moving forward.


rjuju123@gmail.com

over 5 years ago

In reply to: Bruce Momjian (#79)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Oct 7, 2020 at 9:53 AM Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Oct 7, 2020 at 10:42:49AM +0900, Michael Paquier wrote:

On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:

I propose moving the pg_stat_statements queryid hash code into the
server (with a version number), and also adding a postgresql.conf
variable that lets you control how detailed the queryid hash is
computed. This addresses the problem of people wanting different hash
methods.

In terms of making this part expandable in the future, there could be
a point in having an enum here, but are we sure that we will have a
need for that in the future? What I get from this discussion is that
we want a unique source of truth that users can consume, and that the
only source of truth proposed is the PGSS hashing. We may change the
way we compute the query ID in the future, for example if it gets
expanded to some utility statements, etc. But that would be
controlled by the version number in the hash, not the GUC itself.

Oh, if that is true, then I agree let's just go with the version number.

But there are many people who aren't happy with the current hashing
approach. If we're going to move the computation into core, shouldn't
we listen to their complaints and let them pay some, probably quite
high, overhead to base the hash on the name and/or fully qualified name
rather than the OID?
For instance, people using logical replication to upgrade to a newer
version may want to easily compare query performance on the new
version, or people with multi-tenant databases may want to ignore the
schema name to keep the number of distinct queryids low.

It would probably still be possible to have a custom queryid hashing
by disabling the core one and computing a new one in a custom
extension, but that seems a bit hackish.
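
As a rough illustration of the trade-off, the sketch below hashes a relation reference in three ways: by OID, by schema-qualified name, and by bare relation name. None of this is pg_stat_statements code; RelRef, RelHashMode and sketch_hash_relation are hypothetical names, and FNV-1a stands in for the real hash function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ways to identify a relation when jumbling a query. */
typedef enum RelHashMode
{
    REL_HASH_OID,               /* per-database object identifier */
    REL_HASH_QUALIFIED_NAME,    /* schema.relname */
    REL_HASH_NAME_ONLY          /* relname, schema ignored */
} RelHashMode;

typedef struct RelRef
{
    uint32_t    oid;
    const char *schema;
    const char *relname;
} RelRef;

/* 64-bit FNV-1a, used here only as a stand-in for the real hash function. */
static uint64_t
fnv1a_64(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t      i;

    for (i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= UINT64_C(0x100000001b3);
    }
    return h;
}

uint64_t
sketch_hash_relation(const RelRef *rel, RelHashMode mode)
{
    uint64_t    h = UINT64_C(0xcbf29ce484222325);

    switch (mode)
    {
        case REL_HASH_OID:
            h = fnv1a_64(h, &rel->oid, sizeof(rel->oid));
            break;
        case REL_HASH_QUALIFIED_NAME:
            h = fnv1a_64(h, rel->schema, strlen(rel->schema));
            h = fnv1a_64(h, ".", 1);
            h = fnv1a_64(h, rel->relname, strlen(rel->relname));
            break;
        case REL_HASH_NAME_ONLY:
            h = fnv1a_64(h, rel->relname, strlen(rel->relname));
            break;
    }
    return h;
}
```

Two tenants' copies of an "orders" table would hash identically only in the name-only mode, which is the multi-tenant case above. The name-based modes survive a dump and restore or a logical-replication upgrade, where OIDs generally differ.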

Jumping back on Tom's point that there are judgment calls on what is
examined or not, after a quick look I see at least two possible
problems of ignored clauses:
- WITH TIES clause
- OVERRIDING clause

I personally think that they shouldn't be ignored, but I don't know if
they were only forgotten or ignored on purpose.

bruce@momjian.us

over 5 years ago

In reply to: Julien Rouhaud (#80)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Oct 12, 2020 at 04:20:05PM +0800, Julien Rouhaud wrote:

But there are many people that aren't happy with the current hashing
approach. If we're going to move the computation in core, shouldn't
we listen to their complaints and let them pay some probably quite
high overhead to base the hash on name and/or fully qualified name
rather than OID?
For instance people using logical replication to upgrade to a newer
version may want to easily compare query performance on the new
version, or people with multi-tenant databases may want to ignore the
schema name to keep a low number of different queryid.

Well, we have to consider how complex the user interface has to be to
allow more flexibility. We don't need to allow every option a user will
want.

With a version number, we have the ability to improve the algorithm or
add customization, but for the first use, we are probably better off
keeping it simple.


robertmhaas@gmail.com

over 5 years ago

In reply to: Bruce Momjian (#81)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Oct 12, 2020 at 10:14 AM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Oct 12, 2020 at 04:20:05PM +0800, Julien Rouhaud wrote:

But there are many people that aren't happy with the current hashing
approach. If we're going to move the computation in core, shouldn't
we listen to their complaints and let them pay some probably quite
high overhead to base the hash on name and/or fully qualified name
rather than OID?
For instance people using logical replication to upgrade to a newer
version may want to easily compare query performance on the new
version, or people with multi-tenant databases may want to ignore the
schema name to keep a low number of different queryid.

Well, we have to consider how complex the user interface has to be to
allow more flexibility. We don't need to allow every option a user will
want.

With a version number, we have the ability to improve the algorithm or
add customization, but for the first use, we are probably better off
keeping it simple.

I thought your earlier idea of allowing this to be controlled by a GUC
was good. There could be a default method built into core, matching
what pg_stat_statements does, so you could select no hashing or that
method no matter what. Then extensions could provide other methods
which could be selected via the GUC.
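
A minimal sketch of what such a selection could look like, purely for illustration: a setting string picks "none", a built-in "default" method, or a method registered by an extension. The registry, the function names and the toy by-length method are all hypothetical, and FNV-1a merely stands in for the pg_stat_statements algorithm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t (*queryid_method_fn) (const char *query_text);

typedef struct QueryIdMethod
{
    const char *name;
    queryid_method_fn fn;
} QueryIdMethod;

#define MAX_METHODS 8
static QueryIdMethod methods[MAX_METHODS];
static int  n_methods = 0;

/* Built-in method: FNV-1a over the text, a stand-in for the real algorithm. */
static uint64_t
default_method(const char *query_text)
{
    uint64_t    h = UINT64_C(0xcbf29ce484222325);
    const unsigned char *p = (const unsigned char *) query_text;

    for (; *p != '\0'; p++)
    {
        h ^= *p;
        h *= UINT64_C(0x100000001b3);
    }
    return h;
}

/* Toy method an extension might register; deliberately trivial. */
uint64_t
toy_length_method(const char *query_text)
{
    return (uint64_t) strlen(query_text);
}

/* Called by an extension at load time; returns 1 on success, 0 if full. */
int
sketch_register_method(const char *name, queryid_method_fn fn)
{
    if (n_methods >= MAX_METHODS)
        return 0;
    methods[n_methods].name = name;
    methods[n_methods].fn = fn;
    n_methods++;
    return 1;
}

/* Returns 0 (no query ID) for "none" and for unknown method names. */
uint64_t
sketch_compute_queryid(const char *setting, const char *query_text)
{
    int         i;

    if (strcmp(setting, "none") == 0)
        return 0;
    if (strcmp(setting, "default") == 0)
        return default_method(query_text);
    for (i = 0; i < n_methods; i++)
    {
        if (strcmp(methods[i].name, setting) == 0)
            return methods[i].fn(query_text);
    }
    return 0;
}
```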

I don't really understand how a version number helps. It's not like
there is going to be a v2 that is in all ways better than v1. If there
are different algorithms here, they are going to be customized for
different needs.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Robert Haas (#82)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Robert Haas <robertmhaas@gmail.com> writes:

I don't really understand how a version number helps. It's not like
there is going to be a v2 that is in all ways better than v1. If there
are different algorithms here, they are going to be customized for
different needs.

Yeah, I agree --- a version number is the wrong way to think about this.
It's gonna be more like algorithm foo versus algorithm bar versus
algorithm baz, where each one is better for a specific set of use-cases.
Julien already noted the point about hashing object OIDs versus object
names; one can easily imagine disagreeing with pg_stat_statements'
choices about ignoring values of constants; other properties of statements
might be irrelevant for some use-cases; and so on.

I'm okay with moving pg_stat_statements' existing algorithm into core as
long as there's a way for extensions to override it. With proper design,
that would allow extensions that do override it to coexist with
pg_stat_statements (thereby redefining the latter's idea of which
statements are "the same"), which is something that doesn't really work
nicely today.

regards, tom lane

bruce@momjian.us

over 5 years ago

In reply to: Tom Lane (#83)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I don't really understand how a version number helps. It's not like
there is going to be a v2 that is in all ways better than v1. If there
are different algorithms here, they are going to be customized for
different needs.

Yeah, I agree --- a version number is the wrong way to think about this.
It's gonna be more like algorithm foo versus algorithm bar versus
algorithm baz, where each one is better for a specific set of use-cases.
Julien already noted the point about hashing object OIDs versus object
names; one can easily imagine disagreeing with pg_stat_statements'
choices about ignoring values of constants; other properties of statements
might be irrelevant for some use-cases; and so on.

The version number was to invalidate _all_ query hashes if the
algorithm is slightly modified, rather than invalidating just some of
them, which could lead to confusion. The idea of selectable hash
algorithms is nice if people feel there is sufficient need for that.


tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Bruce Momjian (#84)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:

Yeah, I agree --- a version number is the wrong way to think about this.

The version number was to invalidate _all_ query hashes if the
algorithm is slightly modified, rather than invalidating just some of
them, which could lead to confusion.

Color me skeptical as to the use-case for that. From users' standpoints,
the hash is mainly going to change when we change the set of parse node
fields that get hashed. Which is going to happen at every major release
and no (or at least epsilon) minor releases. So I do not see a point in
tracking an algorithm version number as such. Seems like make-work.

regards, tom lane

bruce@momjian.us

over 5 years ago

In reply to: Tom Lane (#85)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:

Yeah, I agree --- a version number is the wrong way to think about this.

The version number was to invalidate _all_ query hashes if the
algorithm is slightly modified, rather than invalidating just some of
them, which could lead to confusion.

Color me skeptical as to the use-case for that. From users' standpoints,
the hash is mainly going to change when we change the set of parse node
fields that get hashed. Which is going to happen at every major release
and no (or at least epsilon) minor releases. So I do not see a point in
tracking an algorithm version number as such. Seems like make-work.

OK, I came up with the hash idea only to address one of your concerns
about mismatched hashes for algorithm improvements/changes. Seems we
might as well just document that cross-version hashes are different.


rjuju123@gmail.com

about 5 years ago

In reply to: Bruce Momjian (#86)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:

Yeah, I agree --- a version number is the wrong way to think about this.

The version number was to invalidate _all_ query hashes if the
algorithm is slightly modified, rather than invalidating just some of
them, which could lead to confusion.

Color me skeptical as to the use-case for that. From users' standpoints,
the hash is mainly going to change when we change the set of parse node
fields that get hashed. Which is going to happen at every major release
and no (or at least epsilon) minor releases. So I do not see a point in
tracking an algorithm version number as such. Seems like make-work.

OK, I came up with the hash idea only to address one of your concerns
about mismatched hashes for algorithm improvements/changes. Seems we
might as well just document that cross-version hashes are different.

Ok, so I tried to implement what seems to be the consensus. The first
attached patch moves the current pgss queryid computation into core,
with a new compute_queryid GUC (on/off). One thing I don't really
like about this patch is that the JumbleState that pgss needs in order
to normalize the query string (the constant locations and such) has to
be built by core while computing the queryid and passed to pgss
through post_parse_analyze_hook. That isn't ideal, as it looks very
specific to pgss's needs. On the other hand, it means that you can now
use pgss with custom queryid heuristics by disabling compute_queryid
and having your module do only that in post_parse_analyze_hook.
You'll however need to be careful to configure
shared_preload_libraries such that your custom module's
post_parse_analyze_hook is called first, so that pgss's one can be called
with the needed JumbleState. Note that if no JumbleState is provided,
pgss will store non-normalized queries, but will otherwise behave as
intended.
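
To illustrate the ordering constraint, here is a small self-contained simulation, not code from the patches. It assumes that both modules follow the usual convention of saving the previous hook at load time and calling it before doing their own work. All names (SketchQuery, custom_module_load, pgss_module_load, sketch_run) are hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct SketchQuery
{
    uint64_t    queryId;
} SketchQuery;

typedef void (*sketch_hook_type) (SketchQuery *query);

/* Stand-in for the global post_parse_analyze_hook. */
static sketch_hook_type sketch_post_parse_analyze_hook = NULL;

/* --- hypothetical custom queryid module --- */
static sketch_hook_type custom_prev_hook = NULL;

static void
custom_hook(SketchQuery *query)
{
    if (custom_prev_hook)
        custom_prev_hook(query);
    query->queryId = UINT64_C(42);  /* pretend this is a computed hash */
}

void
custom_module_load(void)
{
    custom_prev_hook = sketch_post_parse_analyze_hook;
    sketch_post_parse_analyze_hook = custom_hook;
}

/* --- stand-in for pg_stat_statements --- */
static sketch_hook_type pgss_prev_hook = NULL;
static uint64_t pgss_seen_queryid = 0;

static void
pgss_hook(SketchQuery *query)
{
    if (pgss_prev_hook)
        pgss_prev_hook(query);
    pgss_seen_queryid = query->queryId; /* what pgss would store */
}

void
pgss_module_load(void)
{
    pgss_prev_hook = sketch_post_parse_analyze_hook;
    sketch_post_parse_analyze_hook = pgss_hook;
}

/* Load both modules in the given order and return the queryid pgss saw. */
uint64_t
sketch_run(int custom_first)
{
    SketchQuery query = {0};

    sketch_post_parse_analyze_hook = NULL;
    custom_prev_hook = NULL;
    pgss_prev_hook = NULL;
    pgss_seen_queryid = 0;

    if (custom_first)
    {
        custom_module_load();
        pgss_module_load();
    }
    else
    {
        pgss_module_load();
        custom_module_load();
    }

    if (sketch_post_parse_analyze_hook)
        sketch_post_parse_analyze_hook(&query);
    return pgss_seen_queryid;
}
```

Under that convention, the module loaded earlier runs first, so the custom module has to come before pgss in the load order for pgss to see a non-zero queryid. A module that computes the queryid before calling the previous hook would behave differently.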

The 2nd patch is the rebased original queryid exposure patch. No big
changes, except that it now handles queryids for utility statements
generated during post-parse analysis, same as for regular queries. This
should simplify the work needed for third-party custom queryid
modules.

The 3rd patch changes explain (verbose) to display the queryid if one
has been generated, whether by core or a third-party module. For
instance:

rjuju=# set compute_queryid = on;
SET
rjuju=# explain (verbose) select relname from pg_class;
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class  (cost=0.00..16.90 rows=390 width=64)
   Output: relname
 Query Identifier: -5494854185674379299
(3 rows)
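
The identifier above is negative because the 64-bit unsigned queryId is displayed as a signed bigint. A minimal sketch of that conversion in plain C follows; the helper names are hypothetical, and snprintf is used here rather than the server's own formatting routine.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Reinterpret the unsigned 64-bit identifier as a signed value.  memcpy
 * avoids relying on implementation-defined integer conversion; int64_t
 * is always two's complement.
 */
int64_t
sketch_queryid_as_signed(uint64_t queryid)
{
    int64_t     result;

    memcpy(&result, &queryid, sizeof(result));
    return result;
}

/* Format the identifier the way a bigint column would display it. */
int
sketch_format_queryid(uint64_t queryid, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%" PRId64,
                    sketch_queryid_as_signed(queryid));
}
```

Any identifier with its top bit set shows up as a negative number this way, so the same query yields the same signed value wherever the identifier is displayed as a bigint.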

Attachments:

v12-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-patch; charset=US-ASCII)

From 4a81289f02e9bfb796317b32d492eb949c9ed4a1 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v12 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_queryid or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out |  9 +++++++++
 src/test/regress/sql/explain.sql      |  3 +++
 3 files changed, 30 insertions(+)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c8e292adfa..bb08c18a3a 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_queryid)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -582,6 +592,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index dc7ab2ce8b..966bfef865 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -472,3 +472,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: -N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index c79116c927..cec23dec73 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -105,3 +105,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.28.0

v12-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-patch; charset=US-ASCII)

From ee578a9128898d69ff50bf5db59bebf55ed13250 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v12 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of the top
level statement is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |   9 +-
 doc/src/sgml/monitoring.sgml                  |  15 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 ++
 src/backend/executor/execParallel.c           |  14 ++-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  29 +++--
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/include/utils/queryjumble.h               |   2 +-
 src/test/regress/expected/rules.out           |   9 +-
 20 files changed, 209 insertions(+), 104 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f352d0b615..2a69dbb88e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 
 PG_MODULE_MAGIC;
@@ -98,6 +99,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * handle, i.e. everything except EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -295,7 +304,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -783,16 +791,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for prepared statements related utility, as those will
+	 * inherit from the underlying statement's one (except DEALLOCATE which is
+	 * entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1034,6 +1040,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely possibility
+	 * that user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1050,9 +1073,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1106,7 +1127,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1129,23 +1150,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1176,52 +1186,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_queryid isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ee914740cc..a6e772c8b4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6759,6 +6759,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7213,8 +7218,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 66566765f0..1618ae00c8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -899,6 +899,21 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null, unless an additional module that computes query identifiers, such
+      as <xref linkend="pgstatstatements"/>, is configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c29390760f..1c81991fab 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -764,6 +764,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 783eecbc13..79a6f21e24 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..44976d2c68 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -578,7 +578,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -620,7 +621,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1403,8 +1404,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..0fb003aaec 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..e6017675e7 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c59336cd49..cd05c15a22 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -43,6 +43,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/guc.h"
 #include "utils/queryjumble.h"
@@ -126,6 +127,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -163,6 +166,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 822f0ebc62..105fadcad4 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3302,6 +3302,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3332,6 +3333,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3342,6 +3351,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5000,6 +5051,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0deb3c143f..5a66573f2f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -746,6 +746,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -964,6 +966,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1080,6 +1083,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0d0d2e6d2b..8dad50bc32 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -567,7 +567,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -913,6 +913,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1ba47c194b..23c1e0d590 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,11 +72,11 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2628,6 +2628,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 81bcb9d25c..eec94ac5a2 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -541,6 +541,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index ae84fcac6e..b0a5731ef7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -53,7 +53,7 @@ static void RecordConstLocation(JumbleState *jstate, int location);
  * relevant part of the string.
  */
 const char *
-clean_querytext(const char *query, int *location, int *len)
+CleanQuerytext(const char *query, int *location, int *len)
 {
 	int query_location = *location;
 	int query_len = *len;
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = clean_querytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 22340baf1c..872235e8c6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5228,9 +5228,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid, queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..fb5d908433 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a821ff4f15..310586d053 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1205,6 +1205,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1394,6 +1397,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1402,6 +1406,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 14087eea43..520cd4f43e 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,7 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
-const char *clean_querytext(const char *query, int *location, int *len);
+const char *CleanQuerytext(const char *query, int *location, int *len);
 JumbleState *JumbleQuery(Query *query, const char *querytext);
 
 #endif							/* QUERYJUMBLE_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index cf2a9b4408..488001411a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1760,9 +1760,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1867,7 +1868,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2015,7 +2016,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.name,
@@ -2043,7 +2044,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.28.0
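
The "top-level query identifier only" rule implemented by pgstat_report_queryid() in the patch above can be sketched outside the server. This is a minimal standalone model, not PostgreSQL code: the model_* functions and the file-scope st_queryid variable are hypothetical stand-ins for the backend status entry, and the changecount protocol and the track_activities check are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PgBackendStatus.st_queryid; 0 means "none". */
static uint64_t st_queryid = 0;

/* Models the reset done by pgstat_report_activity(STATE_RUNNING, ...). */
static void
model_report_activity_running(void)
{
	st_queryid = 0;
}

/* Models pgstat_report_queryid(): keep the first non-zero id unless forced. */
static void
model_report_queryid(uint64_t queryId, bool force)
{
	if (st_queryid != 0 && !force)
		return;				/* a top-level id is already recorded */
	st_queryid = queryId;
}

/* Models pgstat_get_my_queryid(). */
static uint64_t
model_get_my_queryid(void)
{
	return st_queryid;
}
```

In this model, after model_report_activity_running(), reporting 42 and then 7 without force leaves 42 recorded, which is how a nested statement's identifier gets ignored. A forced report of 0, as exec_simple_query() does between statements, clears the slot so that the next unforced report is accepted.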

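The %Q escape added to log_line_prefix() prints a 64-bit identifier with an optional padding width. As a rough illustration of that formatting step, assuming plain snprintf() in place of appendStringInfo(), the helper below (format_queryid is a hypothetical name, not part of the patch) casts to long long so the format string does not depend on the platform's width of long.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a query identifier into buf, honouring an
 * optional field width.  A negative width left-justifies, as printf's "*"
 * width does.  The value is printed as a signed 64-bit integer.
 */
static int
format_queryid(char *buf, size_t buflen, int padding, uint64_t queryid)
{
	if (padding != 0)
		return snprintf(buf, buflen, "%*lld", padding, (long long) queryid);
	return snprintf(buf, buflen, "%lld", (long long) queryid);
}
```

With a padding of 6 the value 42 is right-aligned in a six-character field, and with -6 it is left-aligned, which matches how the other padded log_line_prefix escapes behave.
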
v12-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-patch; charset=UTF-8)

From 5cf0ae90790c7f3772e9e8779d62bdc038b088ca Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 14 Oct 2020 02:11:37 +0800
Subject: [PATCH v12 1/3] Move pg_stat_statements query jumbling to core.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A new compute_queryid GUC is also added, to control whether the queryid should
be computed.  It's now possible to disable core queryid computation and use
pg_stat_statements with a different algorithm to compute the queryid by using
a third-party module.

Author: Julien Rouhaud²
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 11 files changed, 951 insertions(+), 784 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 1eac9edaee..f352d0b615 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * As of Postgres 9.2, this module normalizes query entries.  As of Postgres
+ * 14, the normalization is done by the core, if compute_queryid is enabled, or
+ * by third-party modules if enabled.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -113,8 +98,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -224,40 +207,6 @@ typedef struct pgssSharedState
 	int			gc_count;		/* query file garbage collection cycle count */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -330,7 +279,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -352,7 +302,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -368,16 +318,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -830,15 +773,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -858,35 +796,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling was able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -896,7 +813,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1245,7 +1162,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2541,678 +2458,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3233,7 +2478,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3340,10 +2585,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3357,7 +2602,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3437,13 +2682,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
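
Reviewer note on the AppendJumble() routine that the hunks above remove from pg_stat_statements.c (it reappears unchanged in queryjumble.c): the buffer handling is easier to follow in isolation. Below is a minimal standalone sketch of the same "hash the buffer when it fills, then keep going" technique. Everything prefixed demo_ or DEMO_ is hypothetical, the buffer is deliberately tiny (the patch uses JUMBLE_SIZE = 1024), and the 64-bit FNV-1a hash is only a stand-in for hash_any_extended().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_JUMBLE_SIZE 16		/* tiny on purpose; the patch uses 1024 */

typedef struct DemoJumble
{
	unsigned char buf[DEMO_JUMBLE_SIZE];
	size_t		len;
} DemoJumble;

/* Stand-in hash (64-bit FNV-1a); PostgreSQL uses hash_any_extended(). */
static uint64_t
demo_hash(const unsigned char *data, size_t len)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/* Append bytes; when the buffer is full, replace it with its own hash. */
static void
demo_append(DemoJumble *j, const void *item, size_t size)
{
	const unsigned char *p = item;

	while (size > 0)
	{
		size_t		room;
		size_t		part;

		if (j->len >= DEMO_JUMBLE_SIZE)
		{
			uint64_t	h = demo_hash(j->buf, DEMO_JUMBLE_SIZE);

			/* the hash now summarizes everything appended so far */
			memcpy(j->buf, &h, sizeof(h));
			j->len = sizeof(h);
		}
		room = DEMO_JUMBLE_SIZE - j->len;
		part = (size < room) ? size : room;
		memcpy(j->buf + j->len, p, part);
		j->len += part;
		p += part;
		size -= part;
	}
}

/* Final fingerprint: hash whatever is left in the buffer. */
static uint64_t
demo_finish(const DemoJumble *j)
{
	return demo_hash(j->buf, j->len);
}
```

Since the buffer is only compacted at the start of an append step, the result depends on the byte stream alone and not on how the caller splits it across calls. Memory stays bounded no matter how large the query tree is.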
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..d98411ea3f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_queryid = on
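
Reviewer note on why the test configuration above has to turn compute_queryid on: with this patch the core code computes the query ID, and the hook only receives the resulting JumbleState, which is NULL when the setting is off (or when the statement is a utility statement). The sketch below models that control flow with stand-in types. It is not the PostgreSQL API: the real hook also takes a ParseState, the real JumbleQuery() allocates its state with palloc(), and all demo_ names and the value 42 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real structs; only the fields used here. */
typedef struct JumbleState
{
	int			clocations_count;
} JumbleState;

typedef struct Query
{
	uint64_t	queryId;
} Query;

typedef void (*demo_hook_type) (Query *query, JumbleState *jstate);

static demo_hook_type demo_hook = NULL;
static int	demo_compute_queryid = 0;	/* models the compute_queryid GUC */
static int	demo_early_entries = 0;		/* counts pgss_store()-like calls */

/* Models JumbleQuery(): marks the query and returns the jumble state. */
static JumbleState *
demo_jumble(Query *query, JumbleState *storage, int nconsts)
{
	query->queryId = 42;		/* placeholder, not a real hash */
	storage->clocations_count = nconsts;
	return storage;
}

/* Models an extension hook: it must cope with jstate == NULL. */
static void
demo_extension_hook(Query *query, JumbleState *jstate)
{
	(void) query;
	if (jstate && jstate->clocations_count > 0)
		demo_early_entries++;
}

/* Models the call sites in parse_analyze() and friends. */
static void
demo_parse_analyze(Query *query, int nconsts)
{
	JumbleState storage;
	JumbleState *jstate = NULL;

	if (demo_compute_queryid)
		jstate = demo_jumble(query, &storage, nconsts);

	if (demo_hook)
		demo_hook(query, jstate);
}
```

With the setting off, the hook still runs but sees a NULL state and a zero queryId, so an extension that depends on the ID silently records nothing.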
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c159fb2957..c59336cd49 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,8 @@
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -103,6 +105,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -115,8 +118,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -136,6 +142,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -148,8 +155,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 411cfadbff..0deb3c143f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -719,6 +719,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -737,8 +738,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..46a56a4a59 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -510,6 +510,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_queryid = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1404,6 +1405,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_queryid", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_queryid,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..81bcb9d25c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -591,6 +591,7 @@
 
 # - Monitoring -
 
+#compute_queryid = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
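
Reviewer note on clean_querytext(), which the new queryjumble.c below exports: utility statements are fingerprinted from their text, so the function first narrows a multi-statement string to the one statement and trims surrounding whitespace. A standalone sketch of that logic follows. demo_isspace() is a hypothetical stand-in for scanner_isspace(), assumed here to accept space, tab, newline, carriage return and form feed, and plain assert() replaces Assert().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for scanner_isspace(); the accepted set is an assumption. */
static int
demo_isspace(char ch)
{
	return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r' || ch == '\f';
}

/*
 * Narrow "query" to one statement, given its offset and length, and trim
 * whitespace.  *location and *len are updated to describe the result.
 */
static const char *
demo_clean_querytext(const char *query, int *location, int *len)
{
	int			loc = *location;
	int			n = *len;

	if (loc >= 0)
	{
		assert((size_t) loc <= strlen(query));
		query += loc;
		/* a length of 0 (or -1) means "rest of string" */
		if (n <= 0)
			n = (int) strlen(query);
		else
			assert((size_t) n <= strlen(query));
	}
	else
	{
		/* unknown location: distrust the length as well */
		loc = 0;
		n = (int) strlen(query);
	}

	while (n > 0 && demo_isspace(query[0]))
	{
		query++;
		loc++;
		n--;
	}
	while (n > 0 && demo_isspace(query[n - 1]))
		n--;

	*location = loc;
	*len = n;
	return query;
}
```

For the source string "SELECT 1;  VACUUM t  ; SELECT 2" with location 9 and length 12, the sketch yields location 11 and length 8, that is, just "VACUUM t". The returned pointer is not NUL-terminated at that length, so callers have to carry the length along, as compute_utility_queryid() does.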
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..ae84fcac6e
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = clean_querytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, so
+		 * that zero keeps meaning "no query ID was computed".
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (which is treated as
+	 * invalid), use 2 instead; 1 is already used as the fallback for
+	 * normal statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 9d09a02141..e31c75d3a5 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 073c8f3e06..57b854ce6b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_queryid;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..14087eea43
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *clean_querytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.28.0

rjuju123@gmail.com

about 5 years ago

In reply to: Julien Rouhaud (#87)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Oct 14, 2020 at 5:43 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:

Yeah, I agree --- a version number is the wrong way to think about this.

The version number was to invalidate _all_ query hashes if the
algorithm is slightly modified, rather than invalidating just some of
them, which could lead to confusion.

Color me skeptical as to the use-case for that. From users' standpoints,
the hash is mainly going to change when we change the set of parse node
fields that get hashed. Which is going to happen at every major release
and no (or at least epsilon) minor releases. So I do not see a point in
tracking an algorithm version number as such. Seems like make-work.

OK, I came up with the hash idea only to address one of your concerns
about mismatched hashes for algorithm improvements/changes. Seems we
might as well just document that cross-version hashes are different.

Ok, so I tried to implement what seems to be the consensus. The first
attached patch moves the current pgss queryid computation into core,
with a new compute_queryid GUC (on/off). One thing I don't really
like about this patch is that the JumbleState that pgss needs in order
to normalize the query string (the constant locations and such) has to
be built by core while computing the queryid and then passed to pgss
in post_parse_analyze_hook. That isn't ideal, as it looks very
specific to pgss' needs. On the other hand, it means that you can now
use pgss with custom queryid heuristics by disabling compute_queryid
and having your module do only that in post_parse_analyze_hook.
You'll however need to be careful to configure
shared_preload_libraries such that your custom module's
post_parse_analyze_hook is called first, so pgss' one can be called
with the needed JumbleState. Note that if no JumbleState is provided,
pgss will store non-normalized queries, but will otherwise behave as
intended.
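
To illustrate the load-order point, here is a minimal standalone sketch in plain C (not PostgreSQL code). The Query and JumbleState structs are reduced stand-ins, the "custom" module and its constant identifiers are made up, and the sketch assumes that both modules call the previously installed hook before doing their own work. Under that assumption the last library loaded sits at the head of the chain, so the custom module has to be loaded before pgss:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-ins for the real structs (illustration only). */
typedef struct Query { uint64_t queryId; } Query;
typedef struct JumbleState { int clocations_count; } JumbleState;

typedef void (*post_parse_analyze_hook_type) (Query *query,
											  JumbleState *jstate);
static post_parse_analyze_hook_type post_parse_analyze_hook = NULL;

/* What the simulated pgss observed on its last call. */
static uint64_t pgss_seen_queryid = 0;
static bool pgss_normalized = false;

/* Hypothetical custom module: only assigns a query identifier. */
static post_parse_analyze_hook_type custom_prev_hook = NULL;

static void
custom_post_parse_analyze(Query *query, JumbleState *jstate)
{
	if (custom_prev_hook)
		custom_prev_hook(query, jstate);
	if (query->queryId == 0)
		query->queryId = UINT64_C(42);	/* made-up heuristic */
}

/* Simulated pgss: consumes whatever identifier is already set. */
static post_parse_analyze_hook_type pgss_prev_hook = NULL;

static void
pgss_post_parse_analyze(Query *query, JumbleState *jstate)
{
	if (pgss_prev_hook)
		pgss_prev_hook(query, jstate);
	pgss_seen_queryid = query->queryId;
	pgss_normalized = (jstate != NULL);
}

/* Each "library load" saves the current hook and installs its own. */
static void
load_custom(void)
{
	assert(post_parse_analyze_hook != custom_post_parse_analyze);
	custom_prev_hook = post_parse_analyze_hook;
	post_parse_analyze_hook = custom_post_parse_analyze;
}

static void
load_pgss(void)
{
	assert(post_parse_analyze_hook != pgss_post_parse_analyze);
	pgss_prev_hook = post_parse_analyze_hook;
	post_parse_analyze_hook = pgss_post_parse_analyze;
}

static void
reset_hooks(void)
{
	post_parse_analyze_hook = NULL;
	custom_prev_hook = NULL;
	pgss_prev_hook = NULL;
	pgss_seen_queryid = 0;
	pgss_normalized = false;
}

/* Simulate parse analysis of one statement; return the id pgss observed. */
static uint64_t
analyze_one(bool compute_queryid)
{
	Query		query = {0};
	JumbleState jstate = {0};
	JumbleState *jstate_ptr = NULL;

	if (compute_queryid)
	{
		query.queryId = UINT64_C(7);	/* stand-in for core's jumble hash */
		jstate_ptr = &jstate;
	}
	if (post_parse_analyze_hook)
		post_parse_analyze_hook(&query, jstate_ptr);
	return pgss_seen_queryid;
}
```

In this model, calling load_custom() and then load_pgss() lets pgss observe the custom identifier with a NULL JumbleState (so the query text would stay non-normalized). Loading them in the opposite order leaves pgss with a zero identifier, and pgss alone with compute_queryid enabled sees core's identifier together with a JumbleState.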

The 2nd patch is the rebased original queryid exposure patch. No big
changes, except that it now handles queryids generated for utility
statements during post-parse analysis, the same as for regular queries.
This should simplify the work needed for third-party modules that
compute a custom queryid.

The 3rd patch changes explain (verbose) to display the queryid if one
has been generated, whether by core or a third-party module. For
instance:

rjuju=# set compute_queryid = on;
SET
rjuju=# explain (verbose) select relname from pg_class;
QUERY PLAN
-----------------------------------------------------------------------
Seq Scan on pg_catalog.pg_class (cost=0.00..16.90 rows=390 width=64)
Output: relname
Query Identifier: -5494854185674379299
(3 rows)
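
The identifier in this output is negative because the queryid is a 64-bit value that is printed as a signed integer (the new pg_stat_activity column is likewise a bigint). A minimal sketch of that reinterpretation, in standalone C rather than backend code; the helper names are hypothetical:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret the unsigned 64-bit identifier as a signed 64-bit value. */
static int64_t
queryid_as_signed(uint64_t query_id)
{
	int64_t		result;

	/* memcpy keeps the bit pattern; int64_t is two's complement by definition */
	memcpy(&result, &query_id, sizeof(result));
	return result;
}

/* Write the identifier the way a bigint column would display it. */
static int
format_queryid(uint64_t query_id, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "%" PRId64, queryid_as_signed(query_id));
}
```

For instance, the unsigned value 12951889888035172317 has the same bit pattern as -5494854185674379299, the identifier shown above, so format_queryid() writes that negative number into the buffer.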

There was a possibly uninitialized var issue in the previous patches
(thanks cfbot), v13 fixes that.

Attachments:

v13-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-patch; charset=US-ASCII)

From ee578a9128898d69ff50bf5db59bebf55ed13250 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v13 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in log_line_prefix, which
will also only expose top-level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |   9 +-
 doc/src/sgml/monitoring.sgml                  |  15 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 ++
 src/backend/executor/execParallel.c           |  14 ++-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  29 +++--
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/include/utils/queryjumble.h               |   2 +-
 src/test/regress/expected/rules.out           |   9 +-
 20 files changed, 209 insertions(+), 104 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f352d0b615..2a69dbb88e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 
 PG_MODULE_MAGIC;
@@ -98,6 +99,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * handle, i.e. everything except EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -295,7 +304,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -783,16 +791,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for utility statements related to prepared statements, as
+	 * those will inherit the underlying statement's queryId (except DEALLOCATE,
+	 * which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1034,6 +1040,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, to cover the unlikely case where
+	 * the user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1050,9 +1073,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1106,7 +1127,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1129,23 +1150,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1176,52 +1186,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_queryid isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ee914740cc..a6e772c8b4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6759,6 +6759,11 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query, if any</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7213,8 +7218,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 66566765f0..1618ae00c8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -899,6 +899,21 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null, unless an additional module that computes query identifiers, such
+      as <xref linkend="pgstatstatements"/>, is configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c29390760f..1c81991fab 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -764,6 +764,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 783eecbc13..79a6f21e24 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..44976d2c68 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -578,7 +578,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -620,7 +621,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1403,8 +1404,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..0fb003aaec 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..e6017675e7 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c59336cd49..cd05c15a22 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -43,6 +43,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/guc.h"
 #include "utils/queryjumble.h"
@@ -126,6 +127,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -163,6 +166,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 822f0ebc62..105fadcad4 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3302,6 +3302,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3332,6 +3333,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3342,6 +3351,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5000,6 +5051,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0deb3c143f..5a66573f2f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -746,6 +746,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -964,6 +966,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1080,6 +1083,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0d0d2e6d2b..8dad50bc32 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -567,7 +567,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -913,6 +913,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1ba47c194b..23c1e0d590 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,11 +72,11 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2628,6 +2628,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 81bcb9d25c..eec94ac5a2 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -541,6 +541,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index ae84fcac6e..b0a5731ef7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -53,7 +53,7 @@ static void RecordConstLocation(JumbleState *jstate, int location);
  * relevant part of the string.
  */
 const char *
-clean_querytext(const char *query, int *location, int *len)
+CleanQuerytext(const char *query, int *location, int *len)
 {
 	int query_location = *location;
 	int query_len = *len;
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = clean_querytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 22340baf1c..872235e8c6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5228,9 +5228,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..fb5d908433 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a821ff4f15..310586d053 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1205,6 +1205,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1394,6 +1397,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1402,6 +1406,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 14087eea43..520cd4f43e 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,7 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
-const char *clean_querytext(const char *query, int *location, int *len);
+const char *CleanQuerytext(const char *query, int *location, int *len);
 JumbleState *JumbleQuery(Query *query, const char *querytext);
 
 #endif							/* QUERYJUMBLE_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index cf2a9b4408..488001411a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1760,9 +1760,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1867,7 +1868,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2015,7 +2016,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.name,
@@ -2043,7 +2044,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.28.0
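
For readers following the thread, the "top-level only" rule that pgstat_report_queryid() applies in the patch above can be sketched in isolation. This is a simplified model, not the patch itself: the names st_queryid_model, start_statement(), report_queryid(), get_my_queryid() and nested_example() are hypothetical stand-ins, and the sketch leaves out the track_activities check and the st_changecount write protocol that the real function uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the st_queryid field of PgBackendStatus. */
static uint64_t st_queryid_model = 0;

/*
 * Models pgstat_report_activity(STATE_RUNNING, ...): starting a new
 * statement clears the stored identifier, since the new one is only
 * known after parse analysis.
 */
static void
start_statement(void)
{
	st_queryid_model = 0;
}

/*
 * Models pgstat_report_queryid(): a non-zero stored identifier means a
 * top-level query has already been reported, so later (nested) reports
 * are ignored unless the caller forces the update.
 */
static void
report_queryid(uint64_t query_id, bool force)
{
	if (st_queryid_model != 0 && !force)
		return;

	st_queryid_model = query_id;
}

/* Models pgstat_get_my_queryid(). */
static uint64_t
get_my_queryid(void)
{
	return st_queryid_model;
}

/*
 * Walk through one statement: the top-level query reports 42, then a
 * nested query (for instance one run from inside a function) reports 7.
 * The nested report is dropped, so the top-level identifier survives.
 */
static uint64_t
nested_example(void)
{
	start_statement();
	report_queryid(42, false);	/* top-level query */
	report_queryid(7, false);	/* nested query, ignored */
	assert(get_my_queryid() == 42);

	return get_my_queryid();
}
```

The forced call report_queryid(0, true) corresponds to what exec_simple_query() does in the patch before each statement of a multi-statement string, so that the next statement's identifier is not mistaken for a nested one.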

Attachment: v13-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-patch; charset=UTF-8)

From 5cf0ae90790c7f3772e9e8779d62bdc038b088ca Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 14 Oct 2020 02:11:37 +0800
Subject: [PATCH v13 1/3] Move pg_stat_statements query jumbling to core.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A new compute_queryid GUC is also added, to control whether the queryid should
be computed.  It's now possible to disable core queryid computation and use
pg_stat_statements with a different algorithm to compute the queryid by using
a third-party module.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 11 files changed, 951 insertions(+), 784 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 1eac9edaee..f352d0b615 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * As of Postgres 9.2, this module normalizes query entries.  As of Postgres
+ * 14, the normalization is done by the core if compute_queryid is enabled, or
+ * optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -113,8 +98,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -224,40 +207,6 @@ typedef struct pgssSharedState
 	int			gc_count;		/* query file garbage collection cycle count */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -330,7 +279,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -352,7 +302,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -368,16 +318,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -830,15 +773,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -858,35 +796,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling was able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -896,7 +813,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1245,7 +1162,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2541,678 +2458,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3233,7 +2478,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3340,10 +2585,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3357,7 +2602,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3437,13 +2682,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..d98411ea3f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_queryid = on
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c159fb2957..c59336cd49 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,8 @@
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -103,6 +105,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -115,8 +118,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -136,6 +142,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -148,8 +155,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 411cfadbff..0deb3c143f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -719,6 +719,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -737,8 +738,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..46a56a4a59 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -510,6 +510,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_queryid = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1404,6 +1405,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_queryid", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_queryid,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..81bcb9d25c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -591,6 +591,7 @@
 
 # - Monitoring -
 
+#compute_queryid = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..ae84fcac6e
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = clean_querytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (which is invalid),
+	 * use 2 instead; a queryId of 1 is already used as the fallback for
+	 * normal statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 9d09a02141..e31c75d3a5 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 073c8f3e06..57b854ce6b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_queryid;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..14087eea43
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *clean_querytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.28.0
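
As a reading aid for the patch above, here is a minimal standalone sketch of two ideas it relies on: AppendJumble folding a full buffer into its own hash, and the T_Const case hashing only a constant's type, not its value. Everything named demo_* is hypothetical, the hash is a stand-in (FNV-1a) and not PostgreSQL's hash_any_extended(), and the buffer is 16 bytes instead of JUMBLE_SIZE so that folding actually happens on small inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_JUMBLE_SIZE 16		/* tiny on purpose; the patch uses 1024 */

typedef struct DemoJumble
{
	unsigned char buf[DEMO_JUMBLE_SIZE];
	size_t		len;			/* bytes currently used in buf[] */
} DemoJumble;

/* Stand-in 64-bit hash (FNV-1a), NOT PostgreSQL's hash_any_extended(). */
static uint64_t
demo_hash(const unsigned char *p, size_t n)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < n; i++)
	{
		h ^= p[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

void
demo_init(DemoJumble *j)
{
	assert(j != NULL);
	memset(j, 0, sizeof(*j));
}

/* Same shape as AppendJumble: fold the buffer into its hash when full. */
void
demo_append(DemoJumble *j, const void *item, size_t size)
{
	const unsigned char *p = item;

	assert(j != NULL && item != NULL);
	while (size > 0)
	{
		size_t		room;
		size_t		part;

		if (j->len >= DEMO_JUMBLE_SIZE)
		{
			uint64_t	h = demo_hash(j->buf, DEMO_JUMBLE_SIZE);

			memcpy(j->buf, &h, sizeof(h));
			j->len = sizeof(h);
		}
		room = DEMO_JUMBLE_SIZE - j->len;
		part = size < room ? size : room;
		memcpy(j->buf + j->len, p, part);
		j->len += part;
		p += part;
		size -= part;
	}
}

/* Final hash; zero is avoided, as in the patch. */
uint64_t
demo_finish(const DemoJumble *j)
{
	uint64_t	h = demo_hash(j->buf, j->len);

	return (h == 0) ? 1 : h;
}

/* Jumble a toy "query": one relation, one name, one constant. */
uint64_t
demo_query_id(uint32_t relid, const char *name,
			  uint32_t const_type, int64_t const_value)
{
	DemoJumble	j;

	(void) const_value;			/* deliberately not hashed, like T_Const */
	demo_init(&j);
	demo_append(&j, &relid, sizeof(relid));
	demo_append(&j, name, strlen(name) + 1);	/* like APP_JUMB_STRING */
	demo_append(&j, &const_type, sizeof(const_type));
	return demo_finish(&j);
}
```

With sample values, demo_query_id(1259, "recent_orders", 23, 1) and demo_query_id(1259, "recent_orders", 23, 42) return the same identifier, because the constant's value never reaches the buffer, while changing the constant's type changes the result. Feeding the same bytes to demo_append() in one call or one byte at a time also gives the same final hash, since folding depends only on how full the buffer is.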

Attachment: v13-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-patch; charset=US-ASCII)

From b927439f8dd4533bbaffbcfd2e0b01dc9de9acb0 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v13 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_queryid or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out |  9 +++++++++
 src/test/regress/sql/explain.sql      |  3 +++
 3 files changed, 30 insertions(+)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c8e292adfa..a25d99c3e1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_queryid)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -582,6 +592,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index dc7ab2ce8b..966bfef865 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -472,3 +472,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: -N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index c79116c927..cec23dec73 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -105,3 +105,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.28.0
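
A note on the "Query Identifier: -N" line in the expected output above: the patch stores the identifier as an unsigned 64-bit value but prints it with pg_lltoa(), which formats a signed 64-bit integer, so identifiers with the top bit set come out negative. The standalone sketch below mimics that display rule with snprintf() as a stand-in; demo_format_query_id is a hypothetical name, not something from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a query identifier the way the ExplainOnePlan hunk does: nothing
 * when the identifier is 0 (not computed), otherwise its signed decimal form.
 * Returns 1 if buf was filled, 0 if there was nothing to show.
 */
int
demo_format_query_id(uint64_t query_id, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);

	if (query_id == 0)
		return 0;

	/*
	 * The cast reinterprets the bits as signed; values above INT64_MAX come
	 * out negative on two's-complement platforms.
	 */
	snprintf(buf, buflen, "%" PRId64, (int64_t) query_id);
	return 1;
}
```

For example, an identifier of 42 is shown as "42", UINT64_MAX is shown as "-1", and an identifier of 0 produces no line at all, which matches the es->verbose && queryId != 0 test in the hunk.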

bruce@momjian.us

about 5 years ago

In reply to: Julien Rouhaud (#87)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Oct 14, 2020 at 05:43:33PM +0800, Julien Rouhaud wrote:

On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:

Yeah, I agree --- a version number is the wrong way to think about this.

The version number was to invalidate _all_ query hashes if the
algorithm is slightly modified, rather than invalidating just some of
them, which could lead to confusion.

Color me skeptical as to the use-case for that. From users' standpoints,
the hash is mainly going to change when we change the set of parse node
fields that get hashed. Which is going to happen at every major release
and no (or at least epsilon) minor releases. So I do not see a point in
tracking an algorithm version number as such. Seems like make-work.

OK, I came up with the hash idea only to address one of your concerns
about mismatched hashes for algorithm improvements/changes. Seems we
might as well just document that cross-version hashes are different.

Ok, so I tried to implement what seems to be the consensus. First
attached patch moves the current pgss queryid computation into core,
with a new compute_queryid GUC (on/off). One thing I don't really

Why would someone turn compute_queryid off? Overhead?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

rjuju123@gmail.com

about 5 years ago

In reply to: Bruce Momjian (#89)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Oct 14, 2020 at 10:09 PM Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Oct 14, 2020 at 05:43:33PM +0800, Julien Rouhaud wrote:

On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:

Yeah, I agree --- a version number is the wrong way to think about this.

The version number was to invalidate _all_ query hashes if the
algorithm is slightly modified, rather than invalidating just some of
them, which could lead to confusion.

Color me skeptical as to the use-case for that. From users' standpoints,
the hash is mainly going to change when we change the set of parse node
fields that get hashed. Which is going to happen at every major release
and no (or at least epsilon) minor releases. So I do not see a point in
tracking an algorithm version number as such. Seems like make-work.

OK, I came up with the hash idea only to address one of your concerns
about mismatched hashes for algorithm improvements/changes. Seems we
might as well just document that cross-version hashes are different.

Ok, so I tried to implement what seems to be the consensus. First
attached patch moves the current pgss queryid computation into core,
with a new compute_queryid GUC (on/off). One thing I don't really

Why would someone turn compute_queryid off? Overhead?

Yes, or possibly to use a different algorithm.

bruce@momjian.us

about 5 years ago

In reply to: Julien Rouhaud (#90)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Oct 14, 2020 at 10:21:24PM +0800, Julien Rouhaud wrote:

On Wed, Oct 14, 2020 at 10:09 PM Bruce Momjian <bruce@momjian.us> wrote:

OK, I came up with the hash idea only to address one of your concerns
about mismatched hashes for algorithm improvements/changes. Seems we
might as well just document that cross-version hashes are different.

Ok, so I tried to implement what seems to be the consensus. First
attached patch moves the current pgss queryid computation into core,
with a new compute_queryid GUC (on/off). One thing I don't really

Why would someone turn compute_queryid off? Overhead?

Yes, or possibly to use a different algorithm.

Is there a measurable overhead when this is turned on? It is off by
default, and maybe it should default to on.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Bruce Momjian (#91)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Bruce Momjian <bruce@momjian.us> writes:

Is there a measurable overhead when this is turned on? It is off by
default, and maybe it should default to on.

I don't believe that "default to on" can even be in the discussion.
There is no in-core feature that would use this by default.

regards, tom lane

rjuju123@gmail.com

about 5 years ago

In reply to: Tom Lane (#92)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Oct 14, 2020 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruce Momjian <bruce@momjian.us> writes:

Is there a measurable overhead when this is turned on? It is off by
default, and maybe it should default to on.

I don't believe that "default to on" can even be in the discussion.
There is no in-core feature that would use this by default.

If the 2nd patch is applied there would be a pg_stat_activity.queryid
column, but I doubt that's a strong enough argument.

bruce@momjian.us

about 5 years ago

In reply to: Julien Rouhaud (#93)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Oct 14, 2020 at 10:34:31PM +0800, Julien Rouhaud wrote:

On Wed, Oct 14, 2020 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruce Momjian <bruce@momjian.us> writes:

Is there a measurable overhead when this is turned on? It is off by
default, and maybe it should default to on.

I don't believe that "default to on" can even be in the discussion.
There is no in-core feature that would use this by default.

If the 2nd patch is applied there would be a pg_stat_activity.queryid
column, but I doubt that's a strong enough argument.

There is that, and log_line_prefix, which I can imagine being useful.
My point is that if the queryid is visible, there should be a reason it
defaults to show empty.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

rjuju123@gmail.com

about 5 years ago

In reply to: Bruce Momjian (#94)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Oct 14, 2020 at 10:34:31PM +0800, Julien Rouhaud wrote:

On Wed, Oct 14, 2020 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruce Momjian <bruce@momjian.us> writes:

Is there a measurable overhead when this is turned on? It is off by
default, and maybe it should default to on.

I don't believe that "default to on" can even be in the discussion.
There is no in-core feature that would use this by default.

If the 2nd patch is applied there would be a pg_stat_activity.queryid
column, but I doubt that's a strong enough argument.

There is that, and log_line_prefix, which I can imagine being useful.
My point is that if the queryid is visible, there should be a reason it
defaults to show empty.

I did some naive benchmarking. Using a custom pgbench script with this query:

SELECT *
FROM pg_class c
JOIN pg_attribute a ON a.attrelid = c.oid
ORDER BY 1 DESC
LIMIT 1;

I can see around 2% overhead (this query is reported with ~ 3ms
latency average). Adding a few joins, overhead goes down to 1%.
Adding some WHERE and GROUP BY conditions on top of the joins, overhead
goes down to 0.2% (at that point average latency is around 9ms on my
laptop). So having this enabled by default is probably only going to
hit people with OLTP-style workloads where most queries run
in a couple of milliseconds or less, which isn't that uncommon.
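
To put the percentages above in absolute terms, the small helper below converts an average latency and a relative overhead into microseconds per query. It is only arithmetic on the approximate figures quoted in this message, not a new measurement, and overhead_usec is a hypothetical name.

```c
#include <assert.h>

/*
 * Absolute per-query cost implied by a relative overhead figure.
 * avg_latency_ms is the average latency in milliseconds, overhead_pct the
 * overhead in percent; the result is in microseconds.
 */
double
overhead_usec(double avg_latency_ms, double overhead_pct)
{
	assert(avg_latency_ms >= 0.0 && overhead_pct >= 0.0);
	return avg_latency_ms * 1000.0 * overhead_pct / 100.0;
}
```

With the two data points that come with a latency figure, overhead_usec(3.0, 2.0) gives roughly 60 microseconds and overhead_usec(9.0, 0.2) gives roughly 18 microseconds. Both inputs are rounded, so these are order-of-magnitude figures only: a cost in the tens of microseconds, which only matters when the query itself takes a few milliseconds or less.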

bruce@momjian.us

about 5 years ago

In reply to: Julien Rouhaud (#95)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote:

On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian <bruce@momjian.us> wrote:

There is that, and log_line_prefix, which I can imagine being useful.
My point is that if the queryid is visible, there should be a reason it
defaults to show empty.

I did some naive benchmarking. Using a custom pgbench script with this query:

SELECT *
FROM pg_class c
JOIN pg_attribute a ON a.attrelid = c.oid
ORDER BY 1 DESC
LIMIT 1;

I can see around 2% overhead (this query is reported with ~ 3ms
latency average). Adding a few joins, overhead goes down to 1%.

That number is too high to enable this by default. I suggest we either
improve the performance of this, or clearly document that you have to
enable the hash computation to see the pg_stat_activity and
log_line_prefix fields.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

alvherre@alvh.no-ip.org

about 5 years ago

In reply to: Bruce Momjian (#96)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2020-Oct-16, Bruce Momjian wrote:

On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote:

I did some naive benchmarking. Using a custom pgbench script with this query:

I can see around 2% overhead (this query is reported with ~ 3ms
latency average). Adding a few joins, overhead goes down to 1%.

That number is too high to enable this by default. I suggest we either
improve the performance of this, or clearly document that you have to
enable the hash computation to see the pg_stat_activity and
log_line_prefix fields.

Agreed. This is similar to how we used to deal with query strings: an
optional feature, disabled by default (cf. commit b13c9686d084).

In this case, I suppose using pg_stat_statements would require query ID
computation to be enabled, and it'd just not collect anything if disabled.
Similarly, the field would show NULL in pg_stat_activity or an empty string
in log_line_prefix/CSV logs.

So users that want it can easily have it, and users that don't are not
paying the price.

For maximum user-friendliness, pg_stat_statements could be loaded and
shmem-initialized even when query ID computation is turned off, and
you'd be able to enable query ID computation with just SIGHUP; so you
don't have to restart the server in order to enable statement tracking.
(I suppose we would forbid users from disabling query ID with SET,
though.)

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Alvaro Herrera (#97)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

In this case, I suppose using pg_stat_statements would require query ID
computation to be enabled, and it'd just not collect anything if disabled.

Alternatively, pg_stat_statements might be able to force it on
(applying a non-overridable PGC_INTERNAL-level setting) on load?
Not sure if that'd be desirable or not.

If the behavior of pg_stat_statements is to do nothing when it
sees a query without the ID calculated (which I guess it'd have to)
then there's a potential security issue if the GUC is USERSET level:
a user could hide her queries from pg_stat_statements by turning the
GUC off. So this line of thought suggests the GUC needs to be at
least SUSET, and maybe higher ... doesn't pg_stat_statements need it
to have the same value cluster-wide?

regards, tom lane

bruce@momjian.us

about 5 years ago

In reply to: Alvaro Herrera (#97)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Oct 16, 2020 at 01:03:55PM -0300, Álvaro Herrera wrote:

On 2020-Oct-16, Bruce Momjian wrote:

On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote:

I did some naive benchmarking. Using a custom pgbench script with this query:

I can see around 2% overhead (this query is reported with ~ 3ms
latency average). Adding a few joins, overhead goes down to 1%.

That number is too high to enable this by default. I suggest we either
improve the performance of this, or clearly document that you have to
enable the hash computation to see the pg_stat_activity and
log_line_prefix fields.

Agreed. This is similar to how we used to deal with query strings: an
optional feature, disabled by default (cf. commit b13c9686d084).

In this case, I suppose using pg_stat_statements would require query ID
computation to be enabled, and it'd just not collect anything if disabled.
Similarly, the field would show NULL in pg_stat_activity or an empty string
in log_line_prefix/CSV logs.

Yes, and at each use point, e.g., pg_stat_activity, log_line_prefix, we
have to remind people how to turn hash computation on.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

rjuju123@gmail.com

about 5 years ago

In reply to: Tom Lane (#98)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Sat, Oct 17, 2020 at 12:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

In this case, I suppose using pg_stat_statements would require query ID
computation to be enabled, and it'd just not collect anything if disabled.

Yes, my idea was to be able to have pg_stat_statements enabled even if
no queryid is computed without that being a problem, and the patch I
sent should handle that properly, as pgss_store (and a few other
places) check for a non-zero queryid before doing any work.

Also, we can't have pg_stat_statements have any specific behavior
based on the new GUC, as there could alternatively be another module
that handles the queryid generation.

Alternatively, pg_stat_statements might be able to force it on
(applying a non-overridable PGC_INTERNAL-level setting) on load?
Not sure if that'd be desirable or not.

If the behavior of pg_stat_statements is to do nothing when it
sees a query without the ID calculated (which I guess it'd have to)

Yes that's what it does.

then there's a potential security issue if the GUC is USERSET level:
a user could hide her queries from pg_stat_statements by turning the
GUC off. So this line of thought suggests the GUC needs to be at
least SUSET, and maybe higher ... doesn't pg_stat_statements need it
to have the same value cluster-wide?

Well, I don't think that there's any guarantee that pg_stat_statements
will display all activity that has been run, since only a limited
number of (userid, dbid, queryid) entries can be stored, but I agree that
allowing arbitrary users to hide their activity isn't nice. Note that I
defined the GUC as SUSET, but maybe it should be SIGHUP?
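
To make the two properties above concrete (entries are skipped when no query ID was computed, and the table holds only a bounded number of keys), here is a minimal, self-contained sketch. It is not the pg_stat_statements code: ToyStats, ToyEntry, toy_store and TOY_MAX_ENTRIES are hypothetical names, and a full table simply refuses new entries, whereas a real implementation would evict old ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 4		/* deliberately tiny, for illustration only */

typedef struct
{
	uint32_t	userid;
	uint32_t	dbid;
	uint64_t	queryid;
	uint64_t	calls;
} ToyEntry;

typedef struct
{
	ToyEntry	entries[TOY_MAX_ENTRIES];
	int			nentries;
} ToyStats;

/*
 * Record one execution.  Returns false when nothing was recorded, either
 * because no query ID was computed (queryid == 0) or because the table is
 * full.
 */
bool
toy_store(ToyStats *stats, uint32_t userid, uint32_t dbid, uint64_t queryid)
{
	/* No query ID computed: nothing to aggregate on, so do no work. */
	if (queryid == 0)
		return false;

	/* Existing (userid, dbid, queryid) key: bump its counter. */
	for (int i = 0; i < stats->nentries; i++)
	{
		ToyEntry   *e = &stats->entries[i];

		if (e->userid == userid && e->dbid == dbid && e->queryid == queryid)
		{
			e->calls++;
			return true;
		}
	}

	/* New key, but no room left: the activity goes unrecorded. */
	if (stats->nentries >= TOY_MAX_ENTRIES)
		return false;

	stats->entries[stats->nentries] = (ToyEntry) {userid, dbid, queryid, 1};
	stats->nentries++;
	return true;
}
```

In this toy model, a session that runs with query ID computation turned off always passes queryid 0, so its statements never reach the table.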

rjuju123@gmail.com

about 5 years ago

In reply to: Bruce Momjian (#96)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Oct 16, 2020 at 11:04 PM Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote:

On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian <bruce@momjian.us> wrote:

There is that, and log_line_prefix, which I can imagine being useful.
My point is that if the queryid is visible, there should be a reason it
defaults to show empty.

I did some naive benchmarking. Using a custom pgbench script with this query:

SELECT *
FROM pg_class c
JOIN pg_attribute a ON a.attrelid = c.oid
ORDER BY 1 DESC
LIMIT 1;

I can see around 2% overhead (this query is reported with ~ 3ms
latency average). Adding a few joins, overhead goes down to 1%.

That number is too high to enable this by default. I suggest we either
improve the performance of this, or clearly document that you have to
enable the hash computation to see the pg_stat_activity and
log_line_prefix fields.

I realize that I didn't update the documentation part to reflect the
new GUC. I'll fix that and add more warnings about the requirements
to have values displayed in pg_stat_activity and log_line_prefix.

alvherre@alvh.no-ip.org

about 5 years ago

In reply to: Julien Rouhaud (#100)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2020-Oct-17, Julien Rouhaud wrote:

On Sat, Oct 17, 2020 at 12:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

then there's a potential security issue if the GUC is USERSET level:
a user could hide her queries from pg_stat_statements by turning the
GUC off. So this line of thought suggests the GUC needs to be at
least SUSET, and maybe higher ... doesn't pg_stat_statements need it
to have the same value cluster-wide?

Well, I don't think that there's any guarantee that pg_stat_statements
will display all activity that has been run, since only a limited
number of (userid, dbid, queryid) entries can be stored, but I agree that
allowing arbitrary users to hide their activity isn't nice. Note that I
defined the GUC as SUSET, but maybe it should be SIGHUP?

I don't think we should consider pg_stat_statements a bulletproof defense
for security problems. It is already lossy by design.

I do think it'd be preferable if we allowed it to be disabled at the
config file level only, not with SET (prevent users from hiding stuff);
but I think it is useful to allow users to enable it for specific
queries or for specific sessions only, while globally disabled. This
might mean we need to mark it PGC_SIGHUP and then have the check hook
disallow it from being changed under such-and-such conditions.
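
The asymmetric rule proposed here (anyone may turn query ID computation on, only the config file may turn it off) can be sketched as a small predicate. This is only an illustration of the policy, not PostgreSQL's GUC machinery: SettingSource, SRC_CONFIG_FILE, SRC_SESSION and queryid_change_allowed are hypothetical names, and a real check hook would have a different signature.

```c
#include <stdbool.h>

/* Hypothetical setting sources; these are not PostgreSQL's GucSource values. */
typedef enum
{
	SRC_CONFIG_FILE,			/* configuration file, applied on reload */
	SRC_SESSION					/* SET issued inside a session */
} SettingSource;

/*
 * Return true if changing the setting to new_value from the given source
 * should be accepted: enabling is always fine, disabling is accepted only
 * when it comes from the configuration file.
 */
bool
queryid_change_allowed(bool new_value, SettingSource source)
{
	if (new_value)
		return true;			/* enabling never hides anything */

	return source == SRC_CONFIG_FILE;	/* disabling needs the config file */
}
```

One consequence of this rule as written is that a session which enabled the setting with SET could not switch it back off with SET; whether that is acceptable is part of the "such-and-such conditions" left open above.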

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Alvaro Herrera (#102)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2020-Oct-17, Julien Rouhaud wrote:

On Sat, Oct 17, 2020 at 12:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

then there's a potential security issue if the GUC is USERSET level:
a user could hide her queries from pg_stat_statements by turning the
GUC off. So this line of thought suggests the GUC needs to be at
least SUSET, and maybe higher ... doesn't pg_stat_statements need it
to have the same value cluster-wide?

I don't think we should consider pg_stat_statements a bulletproof defense
for security problems. It is already lossy by design.

Fair point, but if we allow several different values to be set in
different sessions, what ends up happening in pg_stat_statements?

On the other hand, maybe that's just a matter for documentation.
"If the 'same' query is processed with two different queryID settings,
that will generally result in two separate table entries, because
the same ID hash is unlikely to be produced in both cases". There
is certainly a use-case for wanting to be able to do this, if for
example you'd like different query aggregation behavior for different
applications.

I do think it'd be preferable if we allowed it to be disabled at the
config file level only, not with SET (prevent users from hiding stuff);
but I think it is useful to allow users to enable it for specific
queries or for specific sessions only, while globally disabled.

Indeed. I'm kind of talking myself into the idea that USERSET, or
at most SUSET, is fine, so long as we document what happens when it
has different values in different sessions.

regards, tom lane

alvherre@alvh.no-ip.org

about 5 years ago

In reply to: Tom Lane (#103)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2020-Oct-17, Tom Lane wrote:

Fair point, but if we allow several different values to be set in
different sessions, what ends up happening in pg_stat_statements?

On the other hand, maybe that's just a matter for documentation.
"If the 'same' query is processed with two different queryID settings,
that will generally result in two separate table entries, because
the same ID hash is unlikely to be produced in both cases".

Wait ... what? I've been thinking that this GUC is just to enable or
disable the computation of query ID, not to change the algorithm to do
so. Do we really need to allow different algorithms in different
sessions?

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Alvaro Herrera (#104)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Wait ... what? I've been thinking that this GUC is just to enable or
disable the computation of query ID, not to change the algorithm to do
so. Do we really need to allow different algorithms in different
sessions?

We established that some time ago, no?

regards, tom lane

rjuju123@gmail.com

about 5 years ago

In reply to: Tom Lane (#105)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Sun, Oct 18, 2020 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Wait ... what? I've been thinking that this GUC is just to enable or
disable the computation of query ID, not to change the algorithm to do
so. Do we really need to allow different algorithms in different
sessions?

We established that some time ago, no?

I thought we established the need for allowing different algorithms,
but I assumed globally, not per session. Anyway, allowing compute_queryid
to be enabled or disabled per session would technically allow that,
assuming that you have another module loaded that computes a queryid
only if none was already computed. In that case pg_stat_statements
works as you would expect: you will get a new entry, with a duplicated
query text.

With a bit more thinking, there's at least one use case where it's
interesting to disable pg_stat_statements: queries using temporary
tables. In that case you're guaranteed to generate an unbounded number of
different queryids. That doesn't really help since you're not
aggregating anything anymore, and it also makes pg_stat_statements
virtually unusable: once you have a workload that needs frequent
eviction, the overhead is so bad that you basically have to disable
pg_stat_statements. We could alternatively add a GUC to disable
queryid computation when one of the tables is a temporary table, but
that's just one among many considerations that are probably best
answered with a custom implementation.
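
The temporary-table problem can be illustrated with a toy model, assuming (as with pg_stat_statements' jumbling) that the identifier depends on the OIDs of the referenced relations. The sketch below is not the in-core algorithm: toy_hash and toy_queryid are hypothetical names, and FNV-1a is used only as a stand-in hash.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a step over a byte buffer; a stand-in hash, not what PostgreSQL uses. */
uint64_t
toy_hash(const unsigned char *data, size_t len, uint64_t h)
{
	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/*
 * Toy query ID: hash a fixed "query shape" tag together with the OID of the
 * referenced table, mimicking an identifier that depends on relation OIDs.
 */
uint64_t
toy_queryid(uint32_t shape_tag, uint32_t rel_oid)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	h = toy_hash((const unsigned char *) &shape_tag, sizeof(shape_tag), h);
	h = toy_hash((const unsigned char *) &rel_oid, sizeof(rel_oid), h);
	return h;
}
```

In this model the same statement text run against two temporary tables with different OIDs (say 16384 and 16385) produces two different identifiers, so every freshly created temporary table adds a new entry instead of aggregating into an existing one.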

I'm also attaching an updated patch with some attempt to improve the
documentation. I mention that the in-core algorithm may not suit
everyone's needs, but we don't actually document what the heuristics are.
Should we give more details on them and on their most direct
consequences?

Attachments:

v14-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-patch; charset=US-ASCII)

From 4b1f4ed2bfc2917879a33cc1348157f2fffd0cb4 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v14 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid from the top
level statements are exposed, and if the backends status isn't active then the
queryid from the last executed statements is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  29 +++--
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 ++
 src/backend/executor/execParallel.c           |  14 ++-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  29 +++--
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/include/utils/queryjumble.h               |   2 +-
 src/test/regress/expected/rules.out           |   9 +-
 20 files changed, 224 insertions(+), 110 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f352d0b615..2a69dbb88e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 
 PG_MODULE_MAGIC;
@@ -98,6 +99,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -295,7 +304,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -783,16 +791,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for prepared statements related utility, as those will
+	 * inherit from the underlying statement's one (except DEALLOCATE which is
+	 * entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1034,6 +1040,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely possibility
+	 * that user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1050,9 +1073,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1106,7 +1127,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1129,23 +1150,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1176,52 +1186,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_queryid isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b40f7b5af3..2acca653d3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6843,6 +6843,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query.
+             By default, query identifiers are not computed, so this field will
+             always be zero, unless <xref linkend="guc-compute-queryid"/>
+             parameter is enabled or if a third-party module that computes query
+             identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7297,8 +7306,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7424,12 +7433,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables or disables in core query identifier computation.arameter.  The
-        <xref linkend="pgstatstatements"/> extension requires a query
-        identifier to be computed.  Note that an external module can
-        alternatively be used if the in core query identifier computation
-        specification doesn't suit your need.  In this case, in core
-        computation must be disabled.  The default is <literal>off</literal>.
+        Enables or disables in core query identifier computation.  A query
+        identifier can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query identifier
+        to be computed.  Note that an external module can alternatively be used
+        if the in core query identifier computation specification doesn't suit
+        your need.  In this case, in core computation must be disabled.  The
+        default is <literal>off</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 66566765f0..c1f57b9c1a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -899,6 +899,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null, unless <xref linkend="guc-compute-queryid"/> parameter is
+      enabled or if a third-party module that computes query identifiers is
+      configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c29390760f..1c81991fab 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -764,6 +764,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 293f53d07c..263ee57160 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple time, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..44976d2c68 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -578,7 +578,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -620,7 +621,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1403,8 +1404,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..0fb003aaec 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..e6017675e7 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c59336cd49..cd05c15a22 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -43,6 +43,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/guc.h"
 #include "utils/queryjumble.h"
@@ -126,6 +127,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -163,6 +166,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 822f0ebc62..105fadcad4 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3302,6 +3302,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3332,6 +3333,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3342,6 +3351,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5000,6 +5051,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0deb3c143f..5a66573f2f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -746,6 +746,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -964,6 +966,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1080,6 +1083,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0d0d2e6d2b..8dad50bc32 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -567,7 +567,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -913,6 +913,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1ba47c194b..23c1e0d590 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,11 +72,11 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2628,6 +2628,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 81bcb9d25c..eec94ac5a2 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -541,6 +541,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index ae84fcac6e..b0a5731ef7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -53,7 +53,7 @@ static void RecordConstLocation(JumbleState *jstate, int location);
  * relevant part of the string.
  */
 const char *
-clean_querytext(const char *query, int *location, int *len)
+CleanQuerytext(const char *query, int *location, int *len)
 {
 	int query_location = *location;
 	int query_len = *len;
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = clean_querytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 22340baf1c..872235e8c6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5228,9 +5228,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..fb5d908433 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a821ff4f15..310586d053 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1205,6 +1205,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1394,6 +1397,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1402,6 +1406,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 14087eea43..520cd4f43e 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,7 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
-const char *clean_querytext(const char *query, int *location, int *len);
+const char *CleanQuerytext(const char *query, int *location, int *len);
 JumbleState *JumbleQuery(Query *query, const char *querytext);
 
 #endif							/* QUERYJUMBLE_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index cf2a9b4408..488001411a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1760,9 +1760,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1867,7 +1868,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2015,7 +2016,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.name,
@@ -2043,7 +2044,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.28.0

Attachment: v14-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-patch; charset=US-ASCII)

From 4935c94a3ae05869edd3f485dd590369ce94261c Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v14 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_queryid or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 doc/src/sgml/config.sgml              | 14 +++++++-------
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out |  9 +++++++++
 src/test/regress/sql/explain.sql      |  3 +++
 5 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2acca653d3..54fb70a188 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7436,13 +7436,13 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables or disables in core query identifier computation.  A query
         identifier can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query identifier
-        to be computed.  Note that an external module can alternatively be used
-        if the in core query identifier computation specification doesn't suit
-        your need.  In this case, in core computation must be disabled.  The
-        default is <literal>off</literal>.
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in core query identifier computation
+        specification doesn't suit your need.  In this case, in core
+        computation must be disabled.  The default is <literal>off</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index b0ccdd26e7..2c68ed6220 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed; see <xref
+      linkend="guc-compute-queryid"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 41317f1837..eaa8f011ed 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_queryid)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -582,6 +592,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index dc7ab2ce8b..966bfef865 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -472,3 +472,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: -N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index c79116c927..cec23dec73 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -105,3 +105,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.28.0
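
The expected regression output in the patch above shows "Query Identifier: -N", and the new pg_stat_get_activity column is declared int8, so the unsigned 64-bit identifier is evidently displayed as a signed value. A minimal sketch of that effect, assuming the usual two's-complement conversion (the uint64 to int64 conversion is implementation-defined before C23); demo_format_queryid is a hypothetical helper and snprintf stands in for pg_lltoa:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format an unsigned 64-bit query identifier the way a signed 64-bit
 * consumer would display it.  Identifiers with the high bit set come out
 * negative.  Returns the number of characters snprintf would have written.
 */
static int
demo_format_queryid(uint64_t query_id, char *buf, size_t buflen)
{
	assert(buf != NULL);
	return snprintf(buf, buflen, "%" PRId64, (int64_t) query_id);
}
```

Under that assumption an identifier of UINT64_MAX is rendered as -1, so a tool that compares identifiers taken from EXPLAIN or pg_stat_activity with ones held as unsigned integers would need to apply the same conversion first.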

Attachment: v14-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-patch; charset=UTF-8)

From 45a35ffc17287ca19d12f14a9b58dd820207dcd9 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 14 Oct 2020 02:11:37 +0800
Subject: [PATCH v14 1/3] Move pg_stat_statements query jumbling to core.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A new compute_queryid GUC is also added, to control whether the queryid should
be computed.  It's now possible to disable core queryid computation and use
pg_stat_statements with a different algorithm to compute the queryid by using
a third-party module.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  18 +
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 12 files changed, 969 insertions(+), 784 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 1eac9edaee..f352d0b615 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * As of Postgres 9.2, this module normalizes query entries.  As of Postgres
+ * 14, the normalization is done by core if compute_queryid is enabled, or
+ * by a third-party module if one is enabled.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -113,8 +98,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -224,40 +207,6 @@ typedef struct pgssSharedState
 	int			gc_count;		/* query file garbage collection cycle count */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -330,7 +279,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -352,7 +302,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -368,16 +318,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -830,15 +773,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -858,35 +796,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling was able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -896,7 +813,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1245,7 +1162,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2541,678 +2458,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3233,7 +2478,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3340,10 +2585,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3357,7 +2602,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3437,13 +2682,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..d98411ea3f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_queryid = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1ef880cda5..b40f7b5af3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7416,6 +7416,24 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-queryid" xreflabel="compute_queryid">
+      <term><varname>compute_queryid</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_queryid</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables in-core query identifier computation.  The
+        <xref linkend="pgstatstatements"/> extension requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in-core query identifier computation
+        method doesn't suit your needs.  In this case, in-core
+        computation must be disabled.  The default is <literal>off</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c159fb2957..c59336cd49 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,8 @@
 #include "parser/parse_target.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -103,6 +105,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -115,8 +118,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -136,6 +142,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -148,8 +155,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 411cfadbff..0deb3c143f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -719,6 +719,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -737,8 +738,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..46a56a4a59 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -510,6 +510,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_queryid = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1404,6 +1405,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_queryid", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_queryid,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..81bcb9d25c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -591,6 +591,7 @@
 
 # - Monitoring -
 
+#compute_queryid = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..ae84fcac6e
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = clean_querytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead,
+		 * since zero is reserved to mean that no query ID was computed.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (which means invalid),
+	 * use 2 instead; queryId 1 is already used for that case by normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 9d09a02141..e31c75d3a5 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 073c8f3e06..57b854ce6b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_queryid;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..14087eea43
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *clean_querytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.28.0
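
For readers who want to try the buffer-folding step of AppendJumble outside the server, here is a minimal standalone sketch. It is not the patch's code: the 64-bit FNV-1a hash is only a stand-in for hash_any_extended(), the buffer is 16 bytes instead of JUMBLE_SIZE (1024) so that folding triggers quickly, and the names ToyJumble, toy_hash64, toy_append and toy_finish are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_JUMBLE_SIZE 16      /* the patch uses 1024; kept small on purpose */

typedef struct ToyJumble
{
    unsigned char buf[TOY_JUMBLE_SIZE];
    size_t        len;          /* number of bytes used in buf[] */
} ToyJumble;

/* Stand-in for hash_any_extended(): 64-bit FNV-1a (an assumption) */
static uint64_t
toy_hash64(const unsigned char *data, size_t len)
{
    uint64_t h = UINT64_C(14695981039346656037);

    for (size_t i = 0; i < len; i++)
    {
        h ^= data[i];
        h *= UINT64_C(1099511628211);
    }
    return h;
}

/* Same control flow as AppendJumble: when the buffer is full, replace its
 * contents with their hash, then keep appending after that hash. */
static void
toy_append(ToyJumble *js, const unsigned char *item, size_t size)
{
    assert(js != NULL);

    while (size > 0)
    {
        size_t part;

        if (js->len >= TOY_JUMBLE_SIZE)
        {
            uint64_t start_hash = toy_hash64(js->buf, TOY_JUMBLE_SIZE);

            memcpy(js->buf, &start_hash, sizeof(start_hash));
            js->len = sizeof(start_hash);
        }
        part = TOY_JUMBLE_SIZE - js->len;
        if (size < part)
            part = size;
        memcpy(js->buf + js->len, item, part);
        js->len += part;
        item += part;
        size -= part;
    }
}

/* Final identifier, with the "never return zero" rule from JumbleQuery */
static uint64_t
toy_finish(const ToyJumble *js)
{
    uint64_t id = toy_hash64(js->buf, js->len);

    return id == 0 ? UINT64_C(1) : id;
}
```

Because folding only depends on the byte stream, appending 40 bytes in one call or in several smaller calls should leave the buffer in the same state and give the same identifier.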

rjuju123@gmail.com

about 5 years ago

In reply to: Julien Rouhaud (#106)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Sun, Oct 18, 2020 at 4:12 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Sun, Oct 18, 2020 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Wait ... what? I've been thinking that this GUC is just to enable or
disable the computation of query ID, not to change the algorithm to do
so. Do we really need to allow different algorithms in different
sessions?

We established that some time ago, no?

I thought we established the need for allowing different algorithms,
but I assumed globally, not per session. Anyway, allowing
compute_queryid to be enabled or disabled per session would technically
allow that, assuming that you have another module loaded that computes
a queryid only if none was already computed. In that case
pg_stat_statements works as you would expect: you will get a new entry,
with a duplicated query text.
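
To make that interaction concrete, here is a minimal standalone sketch (not PostgreSQL code) of a module that fills in a queryid only when core did not compute one. Everything in it is made up for illustration: ToyQuery, toy_core_jumble, toy_module_hook, toy_analyze, and the seeded FNV-1a hash that stands in for two different algorithms. The real hook receives a ParseState, a Query and a JumbleState.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyQuery
{
    const char *text;
    uint64_t    queryId;        /* 0 means "not computed", as in the patch */
} ToyQuery;

/* Seeded FNV-1a; different seeds play the role of different algorithms */
static uint64_t
toy_fnv1a(const char *s, uint64_t seed)
{
    uint64_t h = UINT64_C(14695981039346656037) ^ seed;

    assert(s != NULL);
    for (; *s != '\0'; s++)
    {
        h ^= (unsigned char) *s;
        h *= UINT64_C(1099511628211);
    }
    return h == 0 ? UINT64_C(1) : h;
}

/* Core side: only computes an identifier when the setting is on */
static void
toy_core_jumble(ToyQuery *q, bool compute_queryid)
{
    if (compute_queryid)
        q->queryId = toy_fnv1a(q->text, 0);
}

/* Module side: steps in only if nobody computed an identifier yet */
static void
toy_module_hook(ToyQuery *q)
{
    if (q->queryId == 0)
        q->queryId = toy_fnv1a(q->text, 42);
}

/* One pass of "parse analysis" for a session with the given setting */
static uint64_t
toy_analyze(const char *text, bool compute_queryid)
{
    ToyQuery q = {text, 0};

    toy_core_jumble(&q, compute_queryid);
    toy_module_hook(&q);
    return q.queryId;
}
```

A session with the setting on and a session with it off end up with two different identifiers for the same text, which is where the second pg_stat_statements entry with a duplicated query text comes from.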

With a bit more thinking, there's at least one use case where it's
interesting to disable pg_stat_statements: queries using temporary
tables. In that case you're guaranteed to generate an unbounded number
of different queryids. That doesn't really help since you're not
aggregating anything anymore, and it also makes pg_stat_statements
virtually unusable: once you have a workload that needs frequent
eviction, the overhead is so bad that you basically have to disable
pg_stat_statements. We could alternatively add a GUC to disable
queryid computation when one of the tables is a temporary table, but
that's just one among many considerations that are probably best
answered with a custom implementation.
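
As an illustration of the temporary-table problem, here is a minimal standalone sketch of what such a custom policy could look like. It is not part of the patch: ToyRel, toy_hash_relids and toy_queryid_skip_temp are hypothetical names, the FNV-1a hash is a stand-in, and the OIDs are arbitrary example values. The only thing taken from the patch is that the relation OID goes into the jumble, so a temp table recreated in each session yields a new identifier each time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyRel
{
    uint32_t relid;             /* relation OID, part of the jumble */
    bool     is_temp;           /* hypothetical flag: temporary table? */
} ToyRel;

/* Plain behaviour: hash every relation OID, low byte first */
static uint64_t
toy_hash_relids(const ToyRel *rels, size_t n)
{
    uint64_t h = UINT64_C(14695981039346656037);

    assert(rels != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        for (int b = 0; b < 4; b++)
        {
            h ^= (rels[i].relid >> (8 * b)) & 0xff;
            h *= UINT64_C(1099511628211);
        }
    }
    return h == 0 ? UINT64_C(1) : h;
}

/* Custom policy: return 0 ("no queryid") as soon as a temp table is seen */
static uint64_t
toy_queryid_skip_temp(const ToyRel *rels, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (rels[i].is_temp)
            return 0;
    }
    return toy_hash_relids(rels, n);
}
```

With the plain behaviour, the "same" query against two sessions' temp tables gets two identifiers; with the skip policy both get 0 and would simply not be tracked.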

I'm also attaching an updated patch with an attempt to improve the
documentation. I mention that the in-core algorithm may not suit
everyone's needs, but we don't actually document what the heuristics
are. Should we give more details on them and on their most direct
consequences?
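
One heuristic that is easy to show in isolation is the treatment of constants: JumbleExpr appends the node tag and the constant's type, but not its value. The sketch below is a standalone approximation under assumptions: ToyConst, toy_mix, toy_jumble_const and TOY_T_CONST are hypothetical, and FNV-1a replaces the real hash. The type OIDs 23 (int4) and 25 (text) are the usual pg_type values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_T_CONST  1          /* hypothetical stand-in for the T_Const tag */
#define TOY_INT4OID  23         /* pg_type OID of int4 */
#define TOY_TEXTOID  25         /* pg_type OID of text */

typedef struct ToyConst
{
    uint32_t consttype;         /* type OID: goes into the jumble */
    long     value;             /* literal value: deliberately ignored */
} ToyConst;

/* Fold a 32-bit field into a running FNV-1a state, low byte first */
static uint64_t
toy_mix(uint64_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++)
    {
        h ^= (v >> (8 * i)) & 0xff;
        h *= UINT64_C(1099511628211);
    }
    return h;
}

/* Mirrors the T_Const case: node tag plus consttype, never the value */
static uint64_t
toy_jumble_const(const ToyConst *c)
{
    uint64_t h = UINT64_C(14695981039346656037);

    assert(c != NULL);
    h = toy_mix(h, TOY_T_CONST);
    h = toy_mix(h, c->consttype);
    return h;
}
```

The direct consequence is that "WHERE id = 1" and "WHERE id = 2" share an identifier, while the same comparison against a text literal does not.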

Attached is v15, which fixes recent conflicts.

Attachments:

v15-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-patch; charset=US-ASCII)

From 952796fa1c65000948ed2a267f76676e354c989c Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v15 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_queryid or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 doc/src/sgml/config.sgml              | 14 +++++++-------
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out |  9 +++++++++
 src/test/regress/sql/explain.sql      |  3 +++
 5 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d9c85a1f80..1d2f7ba393 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7487,13 +7487,13 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables or disables in core query identifier computation.  A query
         identifier can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query identifier
-        to be computed.  Note that an external module can alternatively be used
-        if the in core query identifier computation specification doesn't suit
-        your need.  In this case, in core computation must be disabled.  The
-        default is <literal>off</literal>.
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in core query identifier computation
+        specification doesn't suit your need.  In this case, in core
+        computation must be disabled.  The default is <literal>off</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..105b069b41 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed; see <xref
+      linkend="guc-compute-queryid"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..2e1b4bf0bf 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_queryid)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -598,6 +608,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index dc7ab2ce8b..966bfef865 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -472,3 +472,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: -N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index c79116c927..cec23dec73 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -105,3 +105,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.29.2
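
A note on the "Query Identifier: -N" line in the expected output: the identifier is a uint64, but pg_lltoa() formats a signed 64-bit value, so any identifier above INT64_MAX is shown as a negative number. The sketch below reproduces that effect with plain C; toy_format_queryid is a hypothetical helper, snprintf stands in for pg_lltoa(), and the unsigned-to-signed conversion is assumed to wrap as it does on two's-complement platforms.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit query identifier the way a signed formatter would.
 * buf must hold at least 21 bytes: sign, 19 digits and the terminator. */
static void
toy_format_queryid(uint64_t queryId, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen >= 21);

    /* Reinterpreting as int64_t is what makes large identifiers negative */
    snprintf(buf, buflen, "%" PRId64, (int64_t) queryId);
}
```

An identifier with the top bit set, such as UINT64_MAX, comes out as "-1" here, while small values print unchanged.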

v15-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-patch; charset=US-ASCII)

From 5b2a42102e1c4ca40b5e7b62f14a151eee7d6b9b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 14 Oct 2020 02:11:37 +0800
Subject: [PATCH v15 1/3] Move pg_stat_statements query jumbling to core.

A new compute_queryid GUC is also added, to control whether the queryid should
be computed.  It's now possible to disable core queryid computation and use
pg_stat_statements with a different algorithm to compute the queryid by using
a third-party module.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  18 +
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 12 files changed, 969 insertions(+), 784 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 72a117fc19..3db4fa2f7a 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * As of Postgres 9.2, this module normalizes query entries.  As of Postgres
+ * 14, the normalization is done by core if compute_queryid is enabled, or
+ * otherwise by a third-party module if one is loaded.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -342,7 +291,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -364,7 +314,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -380,16 +330,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -851,15 +794,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -879,35 +817,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling was able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -917,7 +834,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1267,7 +1184,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2622,678 +2539,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3314,7 +2559,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3421,10 +2666,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3438,7 +2683,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3518,13 +2763,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..d98411ea3f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_queryid = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7c0a673a8d..2af758b10e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7467,6 +7467,24 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-queryid" xreflabel="compute_queryid">
+      <term><varname>compute_queryid</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_queryid</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables in-core query identifier computation.  The
+        <xref linkend="pgstatstatements"/> extension requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in-core query identifier computation
+        method doesn't suit your needs.  In this case, in-core
+        computation must be disabled.  The default is <literal>off</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 28e192f51c..1bc0f66703 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -46,6 +46,8 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -107,6 +109,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -119,8 +122,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -140,6 +146,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -152,8 +159,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 28055680aa..45a45f4171 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -720,6 +720,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -738,8 +739,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..fc2e0e08b8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -512,6 +512,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_queryid = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1407,6 +1408,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_queryid", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_queryid,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 8930a94fff..c4421bcc1f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -592,6 +592,7 @@
 
 # - Monitoring -
 
+#compute_queryid = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..ae84fcac6e
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = clean_querytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (invalid), use 2 as the
+	 * queryId instead; queryId 1 is already in use for normal statements
+	 * whose hash is zero.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index fede4be820..3ba98daa74 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 5004ee4177..40c4a75bac 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_queryid;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..14087eea43
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *clean_querytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.29.2
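
The buffer handling in AppendJumble() above may be easier to follow in isolation. What follows is only a minimal sketch, not code from the patch: the sketch_* names are hypothetical, the buffer is shrunk to 16 bytes so that the fold happens quickly (the patch uses JUMBLE_SIZE = 1024), and FNV-1a stands in for PostgreSQL's hash_any_extended(), which is not available outside the server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_JUMBLE_SIZE 16	/* tiny on purpose; the patch uses 1024 */

/* Hypothetical stand-in for the jumble fields of JumbleState */
typedef struct SketchJumble
{
	unsigned char buf[SKETCH_JUMBLE_SIZE];
	size_t		len;			/* number of bytes used in buf[] */
} SketchJumble;

/* FNV-1a 64-bit: an assumption, used here instead of hash_any_extended() */
static uint64_t
sketch_hash(const unsigned char *data, size_t len)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/*
 * Append bytes to the buffer.  Whenever the buffer is full, replace its
 * contents with the hash of everything so far and keep appending after it.
 */
static void
sketch_append(SketchJumble *js, const unsigned char *item, size_t size)
{
	while (size > 0)
	{
		size_t		part;

		if (js->len >= SKETCH_JUMBLE_SIZE)
		{
			uint64_t	h = sketch_hash(js->buf, SKETCH_JUMBLE_SIZE);

			memcpy(js->buf, &h, sizeof(h));
			js->len = sizeof(h);
		}
		part = SKETCH_JUMBLE_SIZE - js->len;
		if (part > size)
			part = size;
		memcpy(js->buf + js->len, item, part);
		js->len += part;
		item += part;
		size -= part;
	}
}

/* Final identifier: hash of whatever is left in the buffer */
static uint64_t
sketch_finish(const SketchJumble *js)
{
	return sketch_hash(js->buf, js->len);
}
```

The buffer state depends only on the byte stream, not on how the caller splits it across calls. That is why JumbleExpr() can append one field at a time without changing the resulting hash.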

v15-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-patch; charset=US-ASCII)
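
For readers following the utility-statement path: the first patch trims a statement out of a multi-statement string and hashes the remaining text, and this patch makes pg_stat_statements rely on that result instead of hashing the string itself. The sketch below illustrates the idea under stated assumptions and is not the patch's code. The sketch_* names are hypothetical, sketch_isspace() only approximates scanner_isspace(), and FNV-1a replaces hash_any_extended().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximation of the lexer's whitespace test (assumption) */
static int
sketch_isspace(char ch)
{
	return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r' || ch == '\f';
}

/*
 * Narrow (query, *location, *len) to one statement with surrounding
 * whitespace removed.  A negative location means "unknown", in which case
 * the length is not trusted either; a length <= 0 means "rest of string".
 */
static const char *
sketch_clean_querytext(const char *query, int *location, int *len)
{
	int			loc = *location;
	int			n = *len;

	if (loc >= 0)
	{
		query += loc;
		if (n <= 0)
			n = (int) strlen(query);
	}
	else
	{
		loc = 0;
		n = (int) strlen(query);
	}

	while (n > 0 && sketch_isspace(query[0]))
	{
		query++;
		loc++;
		n--;
	}
	while (n > 0 && sketch_isspace(query[n - 1]))
		n--;

	*location = loc;
	*len = n;
	return query;
}

/* Hash the trimmed text; zero is reserved, so map it to 2 as the patch does */
static uint64_t
sketch_utility_queryid(const char *str, int len)
{
	uint64_t	h = UINT64_C(14695981039346656037);	/* FNV-1a stand-in */

	for (int i = 0; i < len; i++)
	{
		h ^= (unsigned char) str[i];
		h *= UINT64_C(1099511628211);
	}
	if (h == 0)
		h = 2;
	return h;
}
```

With this scheme a CHECKPOINT in the middle of a multi-statement string gets the same identifier as a bare CHECKPOINT, because only the trimmed text is hashed. Utility statements are not normalized further, so differences in constants or internal spacing still yield different identifiers.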

From fc71558b1c2065874be43822af4f829e5e51e199 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v15 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

As with other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in log_line_prefix, which
likewise only exposes top-level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  29 +++--
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 ++
 src/backend/executor/execParallel.c           |  14 ++-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  29 +++--
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/include/utils/queryjumble.h               |   2 +-
 src/test/regress/expected/rules.out           |   9 +-
 20 files changed, 224 insertions(+), 110 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 3db4fa2f7a..ce166f417e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * handle, i.e. everything except EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for utility statements related to prepared statements, as
+	 * those will inherit the underlying statement's queryId (except DEALLOCATE,
+	 * which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, to allow for the unlikely case
+	 * where the user configured another extension to handle utility
+	 * statements only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1071,9 +1094,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1128,7 +1149,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1151,23 +1172,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1198,52 +1208,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_queryid isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2af758b10e..d9c85a1f80 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6872,6 +6872,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>Query identifier of the session's current query.
+             Query identifiers are not computed by default, so this field is
+             always zero unless the <xref linkend="guc-compute-queryid"/>
+             parameter is enabled or a third-party module that computes query
+             identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7348,8 +7357,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7475,12 +7484,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables or disables in core query identifier computation.arameter.  The
-        <xref linkend="pgstatstatements"/> extension requires a query
-        identifier to be computed.  Note that an external module can
-        alternatively be used if the in core query identifier computation
-        specification doesn't suit your need.  In this case, in core
-        computation must be disabled.  The default is <literal>off</literal>.
+        Enables or disables in-core query identifier computation.  A query
+        identifier can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query identifier
+        to be computed.  Note that an external module can be used instead if
+        the in-core query identifier computation does not suit your needs;
+        in that case, in-core computation must be disabled.  The
+        default is <literal>off</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3cdb1aff3c..4cd698bf16 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -905,6 +905,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal>, this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.
+      Query identifiers are not computed by default, so this field is always
+      null unless the <xref linkend="guc-compute-queryid"/> parameter is
+      enabled or a third-party module that computes query identifiers is
+      configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5d89e77dbe..d934719a69 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -764,6 +764,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b4e25df601..f1c59d922f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..26f1994a31 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -578,7 +578,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -620,7 +621,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1403,8 +1404,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..04c860f678 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..32f74e8c23 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 1bc0f66703..6a241f9f4a 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parse_type.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -130,6 +131,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -167,6 +170,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f24a33ef1..9f0c2eec35 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3326,6 +3326,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3356,6 +3357,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3366,6 +3375,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5042,6 +5093,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 45a45f4171..6a836bfc1a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -747,6 +747,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -965,6 +967,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1081,6 +1084,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5c12a165a1..c5f267eafc 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -569,7 +569,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -915,6 +915,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -943,6 +947,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 7790f6ab25..0ae922c775 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,11 +72,11 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2709,6 +2709,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c4421bcc1f..799f3e692b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -540,6 +540,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index ae84fcac6e..b0a5731ef7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -53,7 +53,7 @@ static void RecordConstLocation(JumbleState *jstate, int location);
  * relevant part of the string.
  */
 const char *
-clean_querytext(const char *query, int *location, int *len)
+CleanQuerytext(const char *query, int *location, int *len)
 {
 	int query_location = *location;
 	int query_len = *len;
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = clean_querytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d7b55f57ea..a05cd9f868 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5236,9 +5236,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid, queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..e0e08e0b27 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c38b689710..fbb54de637 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1218,6 +1218,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1407,6 +1410,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1415,6 +1419,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 14087eea43..520cd4f43e 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,7 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
-const char *clean_querytext(const char *query, int *location, int *len);
+const char *CleanQuerytext(const char *query, int *location, int *len);
 JumbleState *JumbleQuery(Query *query, const char *querytext);
 
 #endif							/* QUERYJUMBLE_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a687e99d1e..a26af67450 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1760,9 +1760,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1867,7 +1868,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2024,7 +2025,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2055,7 +2056,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.29.2
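
The "top-level only" reporting rule that the patch above adds in pgstat_report_queryid() can be illustrated with a small standalone model. This is only a sketch under stated assumptions: sim_queryid and the sim_* functions are hypothetical stand-ins for st_queryid, pgstat_report_activity(STATE_RUNNING, ...), pgstat_report_queryid() and pgstat_get_my_queryid(). It omits the st_changecount protocol and the track_activities check, and it is not backend code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the st_queryid field of PgBackendStatus. */
static uint64_t sim_queryid = 0;

/*
 * Models the reset done when a backend reports STATE_RUNNING: a new
 * top-level command starts, so the previous identifier is cleared.
 */
static void
sim_start_command(void)
{
	sim_queryid = 0;
}

/*
 * Models the reporting rule: once a nonzero identifier is stored, later
 * reports (assumed to come from nested statements) are ignored unless
 * the caller forces the update.
 */
static void
sim_report_queryid(uint64_t query_id, bool force)
{
	if (sim_queryid != 0 && !force)
		return;

	sim_queryid = query_id;
}

/* Models the accessor used by Gather nodes and by the %Q log escape. */
static uint64_t
sim_get_my_queryid(void)
{
	return sim_queryid;
}
```

In this model, calling sim_start_command(), then sim_report_queryid(42, false), then sim_report_queryid(99, false) leaves 42 stored, because the second report is treated as a nested statement. A forced sim_report_queryid(0, true), as exec_simple_query() does between the statements of a multi-statement string, clears it so that the next statement's identifier can be recorded.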

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#107)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Jan 8, 2021 at 1:07 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

v15 that fixes recent conflicts.

Rebase only, thanks to the cfbot! V16 attached.

Attachments:

v16-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-patch; charset=US-ASCII)

From a0388c53d9755cfd706513f7f02a08b31a48aacb Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v16 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in log_line_prefix, which
likewise only exposes top-level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  29 +++--
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 ++
 src/backend/executor/execParallel.c           |  14 ++-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |   9 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  29 +++--
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/include/utils/queryjumble.h               |   2 +-
 src/test/regress/expected/rules.out           |   9 +-
 20 files changed, 223 insertions(+), 110 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 3db4fa2f7a..ce166f417e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * True for utility statements handled by pgss_ProcessUtility and
+ * pgss_post_parse_analyze, i.e. all but EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for utility statements related to prepared statements, as
+	 * those will inherit the underlying statement's queryId (except DEALLOCATE,
+	 * which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely case that the
+	 * user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1071,9 +1094,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1128,7 +1149,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1151,23 +1172,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then no query identifier was computed for this statement
+ * (compute_queryid is disabled and no other module computed one), and
+ * nothing is stored.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1198,52 +1208,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_queryid isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e4e8e7e997..ffaf46a8a3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6872,6 +6872,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>Query identifier of the session's current query.
+             Query identifiers are not computed by default, so this field
+             is zero unless the <xref linkend="guc-compute-queryid"/>
+             parameter is enabled or a third-party module that computes query
+             identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7348,8 +7357,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7475,12 +7484,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables or disables in core query identifier computation.arameter.  The
-        <xref linkend="pgstatstatements"/> extension requires a query
-        identifier to be computed.  Note that an external module can
-        alternatively be used if the in core query identifier computation
-        specification doesn't suit your need.  In this case, in core
-        computation must be disabled.  The default is <literal>off</literal>.
+        Enables or disables in-core query identifier computation.  A query
+        identifier can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query identifier
+        to be computed.  Note that an external module can alternatively be used
+        if the in-core query identifier computation method doesn't suit
+        your needs.  In this case, in-core computation must be disabled.  The
+        default is <literal>off</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index f05140dd42..5c3de36bfe 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -905,6 +905,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.
+      Query identifiers are not computed by default, so this field will
+      always be null unless the <xref linkend="guc-compute-queryid"/>
+      parameter is enabled or a third-party module that computes query
+      identifiers is configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..fdcc21c656 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -764,6 +764,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f4dd47acc7..f931370df2 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -142,6 +143,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..26f1994a31 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -578,7 +578,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -620,7 +621,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1403,8 +1404,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..04c860f678 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..32f74e8c23 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 1bc0f66703..6a241f9f4a 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parse_type.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -130,6 +131,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -167,6 +170,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..b68efa320b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3382,6 +3382,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3436,6 +3437,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting the previous query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3446,6 +3455,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5133,6 +5184,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 46379dd6db..cc0360b1bf 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -747,6 +747,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -965,6 +967,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1081,6 +1084,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 62bff52638..5e0ba55ac1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -569,7 +569,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -915,6 +915,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -943,6 +947,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 80c2672461..1ed2e146d5 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,7 +77,6 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2717,6 +2716,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c4421bcc1f..799f3e692b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -540,6 +540,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index ae84fcac6e..b0a5731ef7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -53,7 +53,7 @@ static void RecordConstLocation(JumbleState *jstate, int location);
  * relevant part of the string.
  */
 const char *
-clean_querytext(const char *query, int *location, int *len)
+CleanQuerytext(const char *query, int *location, int *len)
 {
 	int query_location = *location;
 	int query_len = *len;
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = clean_querytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b5f52d4e4a..80e6e660c0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5238,9 +5238,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..e0e08e0b27 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..2347070116 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1252,6 +1252,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1446,6 +1449,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1454,6 +1458,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 14087eea43..520cd4f43e 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,7 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
-const char *clean_querytext(const char *query, int *location, int *len);
+const char *CleanQuerytext(const char *query, int *location, int *len);
 JumbleState *JumbleQuery(Query *query, const char *querytext);
 
 #endif							/* QUERYJUMBLE_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6173473de9..b2aa7b6e77 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1760,9 +1760,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1874,7 +1875,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2031,7 +2032,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2062,7 +2063,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.29.2

Attachment: v16-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-patch; charset=US-ASCII)

From 8d46e58c90249f66582eccaabaa37286bcbda7bd Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 14 Oct 2020 02:11:37 +0800
Subject: [PATCH v16 1/3] Move pg_stat_statements query jumbling to core.

A new compute_queryid GUC is also added, to control whether the queryid should
be computed.  It's now possible to disable core queryid computation and use
pg_stat_statements with a different algorithm to compute the queryid by using
a third-party module.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
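
Reader's note (not part of the patch): the jumbling code that this patch moves
out of pg_stat_statements.c accumulates the significant fields of a query tree
in a fixed-size buffer. Whenever the buffer fills up, it hashes the contents
and restarts the buffer with just that hash. The final queryid is a 64-bit hash
of whatever is left, with 0 remapped to 1 so it cannot be confused with the
utility-statement case. Below is a minimal standalone sketch of that technique.
The names SketchJumble, sketch_hash, sketch_append and sketch_query_id are
hypothetical, and FNV-1a is only a stand-in for hash_any_extended(), so the
values it produces will not match real queryids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_JUMBLE_SIZE 1024	/* same size as JUMBLE_SIZE in the patch */

/* Hypothetical stand-in for the jumble part of the patch's JumbleState */
typedef struct SketchJumble
{
	unsigned char buf[SKETCH_JUMBLE_SIZE];
	size_t		len;			/* number of bytes of buf in use */
} SketchJumble;

/* Stand-in 64-bit hash (FNV-1a); the real code uses hash_any_extended() */
static uint64_t
sketch_hash(const unsigned char *data, size_t len)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/* Append bytes, folding the buffer into its own hash whenever it is full */
static void
sketch_append(SketchJumble *js, const void *item, size_t size)
{
	const unsigned char *p = item;

	assert(js != NULL && (item != NULL || size == 0));

	while (size > 0)
	{
		size_t		part;

		if (js->len >= SKETCH_JUMBLE_SIZE)
		{
			uint64_t	h = sketch_hash(js->buf, SKETCH_JUMBLE_SIZE);

			memcpy(js->buf, &h, sizeof(h));
			js->len = sizeof(h);
		}
		part = SKETCH_JUMBLE_SIZE - js->len;
		if (part > size)
			part = size;
		memcpy(js->buf + js->len, p, part);
		js->len += part;
		p += part;
		size -= part;
	}
}

/* Final identifier: hash of the remaining buffer, never zero */
static uint64_t
sketch_query_id(const SketchJumble *js)
{
	uint64_t	id = sketch_hash(js->buf, js->len);

	return (id == 0) ? UINT64_C(1) : id;
}
```

Because only the fields passed to the append function reach the hash, two
queries that differ only in skipped fields (constant values, for example) end
up with the same identifier.
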
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  18 +
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 12 files changed, 969 insertions(+), 784 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h
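
Reader's note (not part of the patch): pg_stat_statements keeps using the
constant locations recorded during jumbling (clocations, each a location/length
pair) to build the "representative" query string in which constants are
replaced by $n symbols. The following standalone sketch shows that substitution
step only. It assumes the locations are already sorted and that the lengths are
already filled in, which the real code does with qsort() and a scanner pass in
fill_in_constant_lengths(). SketchLocationLen and sketch_normalize are
hypothetical names, and malloc() stands in for palloc().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the patch's LocationLen struct */
typedef struct SketchLocationLen
{
	int			location;		/* start offset in query text */
	int			length;			/* length in bytes, or -1 to ignore */
} SketchLocationLen;

/*
 * Return a malloc'd copy of query with each located constant replaced by
 * $first_param, $first_param+1, ...  locs must be sorted by location.
 */
static char *
sketch_normalize(const char *query, const SketchLocationLen *locs,
				 int nlocs, int first_param)
{
	size_t		qlen = strlen(query);
	size_t		src = 0;		/* next byte of query to copy */
	size_t		dst = 0;		/* next free byte of out */
	char	   *out;

	assert(nlocs >= 0 && first_param >= 1);

	/* "$" plus at most 10 digits per constant, plus the terminator */
	out = malloc(qlen + (size_t) nlocs * 11 + 1);
	if (out == NULL)
		return NULL;

	for (int i = 0; i < nlocs; i++)
	{
		size_t		off;

		if (locs[i].length < 0)
			continue;			/* ignored entry */

		off = (size_t) locs[i].location;
		assert(off >= src && off + (size_t) locs[i].length <= qlen);

		/* copy the text before the constant, then emit the symbol */
		memcpy(out + dst, query + src, off - src);
		dst += off - src;
		dst += (size_t) sprintf(out + dst, "$%d", first_param++);
		src = off + (size_t) locs[i].length;
	}

	/* copy whatever follows the last constant */
	memcpy(out + dst, query + src, qlen - src);
	dst += qlen - src;
	out[dst] = '\0';
	return out;
}
```

In the real module the numbering starts after the highest external Param id
seen during jumbling, which is why JumbleState tracks highest_extern_param_id.
Here the caller passes the starting number directly.
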

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 72a117fc19..3db4fa2f7a 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * As of Postgres 9.2, this module normalizes query entries.  As of Postgres
+ * 14, the normalization is done by the core if compute_queryid is enabled, or
+ * optionally by a third-party module.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -342,7 +291,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -364,7 +314,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -380,16 +330,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -851,15 +794,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -879,35 +817,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling was able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -917,7 +834,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1267,7 +1184,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2622,678 +2539,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3314,7 +2559,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3421,10 +2666,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3438,7 +2683,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3518,13 +2763,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..d98411ea3f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_queryid = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 82864bbb24..e4e8e7e997 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7467,6 +7467,24 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-queryid" xreflabel="compute_queryid">
+      <term><varname>compute_queryid</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_queryid</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables in core query identifier computation.  The
+        <xref linkend="pgstatstatements"/> extension requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in core query identifier computation
+        specification doesn't suit your need.  In this case, in core
+        computation must be disabled.  The default is <literal>off</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 28e192f51c..1bc0f66703 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -46,6 +46,8 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -107,6 +109,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -119,8 +122,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -140,6 +146,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -152,8 +159,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8dab9fd578..46379dd6db 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -720,6 +720,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -738,8 +739,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..fc2e0e08b8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -512,6 +512,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_queryid = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1407,6 +1408,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_queryid", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_queryid,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 8930a94fff..c4421bcc1f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -592,6 +592,7 @@
 
 # - Monitoring -
 
+#compute_queryid = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..ae84fcac6e
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = clean_querytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (invalid), use a
+	 * queryId of 2 instead; queryId 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index fede4be820..3ba98daa74 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 5004ee4177..40c4a75bac 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_queryid;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..14087eea43
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *clean_querytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.29.2
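
The buffer handling in AppendJumble() above can be sketched outside the server as follows. This is only an illustrative, standalone approximation, not the patch's code: the names SketchJumble, sketch_hash, sketch_append, sketch_finish and sketch_query_id are hypothetical, the buffer is shrunk to 16 bytes so the fold path is easy to hit, and FNV-1a is an assumed stand-in for hash_any_extended(), which is what the patch actually calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, deliberately tiny buffer; the patch uses JUMBLE_SIZE = 1024. */
#define SKETCH_JUMBLE_SIZE 16

typedef struct SketchJumble
{
	unsigned char buf[SKETCH_JUMBLE_SIZE];
	size_t		len;			/* bytes currently used in buf */
} SketchJumble;

/* FNV-1a 64-bit: an assumed stand-in for hash_any_extended(). */
static uint64_t
sketch_hash(const unsigned char *p, size_t n)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < n; i++)
	{
		h ^= p[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

static void
sketch_init(SketchJumble *j)
{
	j->len = 0;
}

/*
 * Append bytes to the jumble.  When the buffer is full and more data
 * arrives, replace its contents with a hash of everything so far.
 */
static void
sketch_append(SketchJumble *j, const void *item, size_t size)
{
	const unsigned char *p = item;

	while (size > 0)
	{
		size_t		part;

		if (j->len >= SKETCH_JUMBLE_SIZE)
		{
			uint64_t	h = sketch_hash(j->buf, SKETCH_JUMBLE_SIZE);

			memcpy(j->buf, &h, sizeof(h));
			j->len = sizeof(h);
		}
		part = SKETCH_JUMBLE_SIZE - j->len;
		if (size < part)
			part = size;
		memcpy(j->buf + j->len, p, part);
		j->len += part;
		assert(j->len <= SKETCH_JUMBLE_SIZE);
		p += part;
		size -= part;
	}
}

/* Final identifier; zero is remapped to 1, as JumbleQuery() does. */
static uint64_t
sketch_finish(const SketchJumble *j)
{
	uint64_t	id = sketch_hash(j->buf, j->len);

	return (id == 0) ? 1 : id;
}

/*
 * Toy fingerprint of "SELECT ... WHERE col = <const>": the constant's
 * type is jumbled, its value is intentionally left out.
 */
static uint64_t
sketch_query_id(uint32_t relid, int16_t attno, uint32_t const_type,
				long const_value)
{
	SketchJumble j;

	(void) const_value;			/* ignored on purpose */
	sketch_init(&j);
	sketch_append(&j, &relid, sizeof(relid));
	sketch_append(&j, &attno, sizeof(attno));
	sketch_append(&j, &const_type, sizeof(const_type));
	return sketch_finish(&j);
}
```

Under these assumptions, two calls to sketch_query_id() that differ only in const_value return the same identifier, while changing relid changes it, which mirrors how the T_Const case of JumbleExpr() jumbles consttype but not the value. Because a fold happens only when the buffer is full and more bytes arrive, the result does not depend on how the input is split across sketch_append() calls.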

v16-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-patch; charset=US-ASCII)

From 60738155b9476abaddff2e5df5b0438ae679da62 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v16 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_queryid or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 doc/src/sgml/config.sgml              | 14 +++++++-------
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out |  9 +++++++++
 src/test/regress/sql/explain.sql      |  3 +++
 5 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ffaf46a8a3..71599857df 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7487,13 +7487,13 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables or disables in core query identifier computation.  A query
         identifier can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query identifier
-        to be computed.  Note that an external module can alternatively be used
-        if the in core query identifier computation specification doesn't suit
-        your need.  In this case, in core computation must be disabled.  The
-        default is <literal>off</literal>.
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in core query identifier computation
+        specification doesn't suit your need.  In this case, in core
+        computation must be disabled.  The default is <literal>off</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..105b069b41 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed, see <xref
+      linkend="guc-compute-queryid"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..2e1b4bf0bf 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_queryid)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -598,6 +608,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index dc7ab2ce8b..966bfef865 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -472,3 +472,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: -N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index c79116c927..cec23dec73 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -105,3 +105,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.29.2

tatsuro.yamada.tf@nttcom.co.jp

almost 5 years ago

In reply to: Julien Rouhaud (#108)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi Julien,

Rebase only, thanks to the cfbot! V16 attached.

I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and
my result is the following. Attached file is regression.diffs.

========================
1 of 202 tests failed.
========================

The differences that caused some tests to fail can be viewed in the
file "/home/postgres/PG140/src/test/regress/regression.diffs". A copy of the test summary that you see
above is saved in the file "/home/postgres/PG140/src/test/regress/regression.out".

src/test/regress/regression.diffs
---------------------------------
diff -U3 /home/postgres/PG140/src/test/regress/expected/rules.out /home/postgres/PG140/src/test/regress/results/rules.out
--- /home/postgres/PG140/src/test/regress/expected/rules.out    2021-01-20 08:41:16.383175559 +0900
+++ /home/postgres/PG140/src/test/regress/results/rules.out 2021-01-20 08:43:46.589171774 +0900
@@ -1760,10 +1760,9 @@
      s.state,
      s.backend_xid,
      s.backend_xmin,
-    s.queryid,
      s.query,
      s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
...

Thanks,
Tatsuro Yamada

Attachments:

regression.diffs (text/plain; charset=UTF-8)

diff -U3 /home/postgres/PG140/src/test/regress/expected/rules.out /home/postgres/PG140/src/test/regress/results/rules.out
--- /home/postgres/PG140/src/test/regress/expected/rules.out	2021-01-20 08:41:16.383175559 +0900
+++ /home/postgres/PG140/src/test/regress/results/rules.out	2021-01-20 08:52:10.891159065 +0900
@@ -1760,10 +1760,9 @@
     s.state,
     s.backend_xid,
     s.backend_xmin,
-    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1875,7 +1874,7 @@
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2032,7 +2031,7 @@
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2063,7 +2062,7 @@
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,

tatsuro.yamada.tf@nttcom.co.jp

almost 5 years ago

In reply to: Tatsuro Yamada (#109)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi Julien,

Rebase only, thanks to the cfbot! V16 attached.

I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and
my result is the following. Attached file is regression.diffs.

Sorry, my environment was not suitable for the test when I sent my previous email.
I fixed my environment and tested it again, and it was a success. See below:

=======================
All 202 tests passed.
=======================

Regards,
Tatsuro Yamada

rjuju123@gmail.com

almost 5 years ago

In reply to: Tatsuro Yamada (#110)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hello Yamada-san,

On Wed, Jan 20, 2021 at 10:06 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

Hi Julien,

Rebase only, thanks to the cfbot! V16 attached.

I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and
my result is the following. Attached file is regression.diffs.

Sorry, my environment was not suitable for the test when I sent my previous email.
I fixed my environment and tested it again, and it was a success. See below:

=======================
All 202 tests passed.
=======================

No worries, thanks a lot for testing!

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#108)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Jan 20, 2021 at 12:43 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Fri, Jan 8, 2021 at 1:07 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

v15 that fixes recent conflicts.

Rebase only, thanks to the cfbot! V16 attached.

A recent commit exposed that explain_filter() doesn't filter the
negative sign. This is now a problem for query identifiers in
explain output, as they use the whole bigint range. v17, attached,
fixes that and is also rebased against current HEAD.
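
To illustrate where the sign comes from: the patch passes
plannedstmt->queryId, an unsigned 64-bit value, to pg_lltoa(), which
formats a signed 64-bit integer, so any identifier with its top bit set
prints as a negative number. Below is a minimal standalone sketch of that
effect, assuming a two's-complement platform. format_queryid is a
hypothetical helper written only for this illustration; it is not a
PostgreSQL function.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the pg_lltoa() call in the patch: format an
 * unsigned 64-bit query identifier as a signed 64-bit decimal string.
 *
 * Assumption: converting an out-of-range uint64_t to int64_t wraps around
 * (two's complement), which holds on the usual platforms.
 *
 * buf needs room for a sign, 19 digits and the terminating NUL (21 bytes).
 * Returns the number of characters written, as snprintf() does.
 */
int
format_queryid(uint64_t queryid, char *buf, size_t buflen)
{
	return snprintf(buf, buflen, "%" PRId64, (int64_t) queryid);
}
```

Called with an identifier such as UINT64_MAX this produces "-1". With the
original '\m\d+\M' pattern only the digits were replaced, which is why the
v16 expected output contained "Query Identifier: -N".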

Attachments:

v17-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-patch; charset=US-ASCII)

From 8904b60ed3cc770f209ae5f97804b4d1ffe7d175 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v17 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_queryid or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 doc/src/sgml/config.sgml              | 14 +++++++-------
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out | 11 ++++++++++-
 src/test/regress/sql/explain.sql      |  5 ++++-
 5 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1763790473..2aeb146223 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7524,13 +7524,13 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables or disables in core query identifier computation.  A query
         identifier can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query identifier
-        to be computed.  Note that an external module can alternatively be used
-        if the in core query identifier computation specification doesn't suit
-        your need.  In this case, in core computation must be disabled.  The
-        default is <literal>off</literal>.
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in core query identifier computation
+        specification doesn't suit your need.  In this case, in core
+        computation must be disabled.  The default is <literal>off</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..105b069b41 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed, see <xref
+      linkend="guc-compute-queryid"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..ac5879c1cf 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_queryid)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -598,6 +608,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index dc7ab2ce8b..f45f069f30 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -17,7 +17,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -472,3 +472,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index c79116c927..99f7bb1bf5 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -19,7 +19,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -105,3 +105,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.30.1

v17-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-patch; charset=US-ASCII)

From 46a08791812ec798976e2b45bf45c9d7d46551e3 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v17 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top-level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Tatsuro Yamada, Torikoshi Atsushi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  29 +++--
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 ++
 src/backend/executor/execParallel.c           |  14 ++-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |   9 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  29 +++--
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/include/utils/queryjumble.h               |   2 +-
 src/test/regress/expected/rules.out           |   9 +-
 20 files changed, 223 insertions(+), 110 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 99bc7184cb..2fc57f1254 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * handle, i.e. all except EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for prepared statements related utility, as those will
+	 * inherit from the underlying statement's one (except DEALLOCATE which is
+	 * entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely possibility
+	 * that user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1071,9 +1094,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1128,7 +1149,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1151,23 +1172,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1198,52 +1208,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_queryid isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e5b6a68bae..1763790473 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6909,6 +6909,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of session's current query.
+             By default, query identifiers are not computed, so this field will
+             always be zero, unless <xref linkend="guc-compute-queryid"/>
+             parameter is enabled or if a third-party module that computes query
+             identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7385,8 +7394,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7512,12 +7521,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables or disables in core query identifier computation.arameter.  The
-        <xref linkend="pgstatstatements"/> extension requires a query
-        identifier to be computed.  Note that an external module can
-        alternatively be used if the in core query identifier computation
-        specification doesn't suit your need.  In this case, in core
-        computation must be disabled.  The default is <literal>off</literal>.
+        Enables or disables in-core query identifier computation.  A query
+        identifier can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query identifier
+        to be computed.  Note that an external module can alternatively be used
+        if the in-core query identifier computation method doesn't suit
+        your needs.  In this case, in-core computation must be disabled.  The
+        default is <literal>off</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3513e127b7..8f2a98b0bc 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -905,6 +905,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null unless the <xref linkend="guc-compute-queryid"/> parameter is
+      enabled or a third-party module that computes query identifiers is
+      configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..fdcc21c656 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -764,6 +764,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c74ce36ffb..389345341f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -128,6 +129,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call
+	 * will be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..26f1994a31 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -578,7 +578,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -620,7 +621,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1403,8 +1404,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..04c860f678 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..32f74e8c23 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index ddfb97b543..0dd7e95abd 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parse_type.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -130,6 +131,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -167,6 +170,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..b68efa320b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3382,6 +3382,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3436,6 +3437,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3446,6 +3455,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5133,6 +5184,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 99f460a301..7306c1e8b9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -747,6 +747,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -965,6 +967,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1081,6 +1084,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 62bff52638..5e0ba55ac1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -569,7 +569,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -915,6 +915,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[28] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[30] = true;
+			else
+				values[30] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -943,6 +947,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 80c2672461..1ed2e146d5 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,7 +77,6 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2717,6 +2716,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 344049aac7..c44bf1c700 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -541,6 +541,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index ae84fcac6e..b0a5731ef7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -53,7 +53,7 @@ static void RecordConstLocation(JumbleState *jstate, int location);
  * relevant part of the string.
  */
 const char *
-clean_querytext(const char *query, int *location, int *len)
+CleanQuerytext(const char *query, int *location, int *len)
 {
 	int query_location = *location;
 	int query_len = *len;
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = clean_querytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1487710d59..23dab59362 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5246,9 +5246,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,bool,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..e0e08e0b27 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..2347070116 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1252,6 +1252,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1446,6 +1449,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1454,6 +1458,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 14087eea43..520cd4f43e 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,7 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
-const char *clean_querytext(const char *query, int *location, int *len);
+const char *CleanQuerytext(const char *query, int *location, int *len);
 JumbleState *JumbleQuery(Query *query, const char *querytext);
 
 #endif							/* QUERYJUMBLE_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 10a1f34ebc..6260ae9ce7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1761,9 +1761,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1875,7 +1876,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2032,7 +2033,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2063,7 +2064,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.30.1
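
The top-level-only rule that pgstat_report_queryid() follows in the patch above can be modelled outside the server. This is only a minimal sketch, assuming a plain static variable stands in for the shared-memory st_queryid field; model_report_queryid and model_get_queryid are hypothetical names used for illustration and are not part of the patch. The track_activities check and the st_changecount protocol are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for beentry->st_queryid (hypothetical, not the real shared entry). */
static uint64_t model_queryid = 0;

/*
 * Mirror the patch's rule: once a non-zero identifier is stored, later
 * reports are ignored unless the caller forces the update.
 */
static void
model_report_queryid(uint64_t query_id, bool force)
{
	if (model_queryid != 0 && !force)
		return;					/* nested query: keep the top-level id */

	model_queryid = query_id;
}

/* Counterpart of pgstat_get_my_queryid() for this model. */
static uint64_t
model_get_queryid(void)
{
	return model_queryid;
}
```

Calling model_report_queryid(0, true), then model_report_queryid(42, false), then model_report_queryid(99, false) leaves 42 stored: the third call plays the role of a nested query and is ignored until the next forced reset.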

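The %Q escape in the patch above hands the log_line_prefix padding value to a "%*" field width. The following is a minimal sketch of that formatting, assuming snprintf in place of appendStringInfo; format_queryid is a hypothetical helper name, and the cast to long long is this sketch's own choice for portability.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a query identifier as a padded prefix field would.  With "%*lld",
 * a positive width right-justifies the number and a negative width
 * left-justifies it; zero padding means the number is printed as-is.
 */
static int
format_queryid(char *buf, size_t buflen, int padding, uint64_t query_id)
{
	if (padding != 0)
		return snprintf(buf, buflen, "%*lld", padding,
						(long long) query_id);

	return snprintf(buf, buflen, "%lld", (long long) query_id);
}
```

With a padding of 5 the identifier 42 is preceded by three spaces, and with a padding of -5 it is followed by three spaces.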
v17-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-patch; charset=US-ASCII)

From 7543a2ea88109970f9cc713d7b25ca6d81cdbbfb Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 14 Oct 2020 02:11:37 +0800
Subject: [PATCH v17 1/3] Move pg_stat_statements query jumbling to core.

A new compute_queryid GUC is also added, to control whether the queryid should
be computed.  It's now possible to disable core queryid computation and use
pg_stat_statements with a different algorithm to compute the queryid by using
a third-party module.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  18 +
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 12 files changed, 969 insertions(+), 784 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..99bc7184cb 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * As of Postgres 9.2, this module normalizes query entries.  As of Postgres
+ * 14, the normalization is done by the core, if compute_queryid is enabled, or
+ * by third-party modules if enabled.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -342,7 +291,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -364,7 +314,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -380,16 +330,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -851,15 +794,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -879,35 +817,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling was able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -917,7 +834,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1267,7 +1184,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2627,678 +2544,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3319,7 +2564,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3426,10 +2671,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3443,7 +2688,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3523,13 +2768,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..d98411ea3f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_queryid = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b5718fc136..e5b6a68bae 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7504,6 +7504,24 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-queryid" xreflabel="compute_queryid">
+      <term><varname>compute_queryid</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_queryid</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables in-core query identifier computation.  The
+        <xref linkend="pgstatstatements"/> extension requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in-core query identifier computation
+        method doesn't suit your needs.  In this case, in-core
+        computation must be disabled.  The default is <literal>off</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 0f3a70c49a..ddfb97b543 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -46,6 +46,8 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -107,6 +109,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -119,8 +122,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -140,6 +146,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -152,8 +159,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index bb5ccb4578..99f460a301 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -720,6 +720,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -738,8 +739,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
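The hook signature change in the hunks above can be modelled in isolation. What follows is only a minimal standalone sketch, assuming stand-in types: SketchQuery, SketchJumbleState and every sketch_* name are hypothetical and are not PostgreSQL definitions, and the ParseState argument of the real hook is left out. It shows the one behavioural point a hook author has to handle, namely that jstate can be NULL (compute_queryid is off, or the statement is a utility statement) and so must be checked before it is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; only the fields this sketch needs */
typedef struct SketchJumbleState
{
	int			clocations_count;	/* number of constants found */
} SketchJumbleState;

typedef struct SketchQuery
{
	uint64_t	queryId;
} SketchQuery;

typedef void (*sketch_hook_type) (SketchQuery *query,
								  SketchJumbleState *jstate);

/* Previously installed hook, if any, as in the usual chaining idiom */
static sketch_hook_type sketch_prev_hook = NULL;

/* Counts how often an "early entry" would have been stored */
static int	sketch_entries_created = 0;

static void
sketch_post_parse_analyze(SketchQuery *query, SketchJumbleState *jstate)
{
	assert(query != NULL);

	/* Pass the same jstate down the chain, as the patched pgss hook does */
	if (sketch_prev_hook)
		sketch_prev_hook(query, jstate);

	/*
	 * jstate is NULL when no jumbling happened; only a non-NULL state with
	 * at least one recorded constant justifies an early entry.
	 */
	if (jstate && jstate->clocations_count > 0)
		sketch_entries_created++;
}
```

In this model the counter only advances for a non-NULL state that recorded at least one constant, mirroring the `jstate && jstate->clocations_count > 0` test in the patched pgss_post_parse_analyze.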
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index d626731723..65dde1da39 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -512,6 +512,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_queryid = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1407,6 +1408,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_queryid", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_queryid,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ee06528bb0..344049aac7 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -593,6 +593,7 @@
 
 # - Monitoring -
 
+#compute_queryid = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
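The jumble buffer that moves from pg_stat_statements into core folds itself into a hash whenever it fills up, and the final identifier avoids the reserved value zero. The following is a rough standalone sketch of that idea, not the patch's code: sketch_hash is an FNV-1a stand-in chosen for brevity (the patch uses hash_any_extended), the buffer is 16 bytes instead of 1024, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_JUMBLE_SIZE 16	/* tiny on purpose; the patch uses 1024 */

typedef struct SketchJumble
{
	unsigned char buf[SKETCH_JUMBLE_SIZE];
	size_t		len;			/* bytes of buf currently in use */
} SketchJumble;

/* Stand-in hash (64-bit FNV-1a); NOT PostgreSQL's hash_any_extended() */
static uint64_t
sketch_hash(const unsigned char *data, size_t len)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/* Append bytes; a full buffer is replaced by the hash of its contents */
static void
sketch_append(SketchJumble *js, const unsigned char *item, size_t size)
{
	while (size > 0)
	{
		size_t		room;
		size_t		part;

		if (js->len >= SKETCH_JUMBLE_SIZE)
		{
			uint64_t	start_hash = sketch_hash(js->buf, SKETCH_JUMBLE_SIZE);

			memcpy(js->buf, &start_hash, sizeof(start_hash));
			js->len = sizeof(start_hash);
		}
		room = SKETCH_JUMBLE_SIZE - js->len;
		part = size < room ? size : room;
		memcpy(js->buf + js->len, item, part);
		js->len += part;
		item += part;
		size -= part;
		assert(js->len <= SKETCH_JUMBLE_SIZE);
	}
}

/* Final identifier; zero is reserved, so it is remapped by the caller's choice */
static uint64_t
sketch_finish(const SketchJumble *js, uint64_t zero_replacement)
{
	uint64_t	id = sketch_hash(js->buf, js->len);

	return id == 0 ? zero_replacement : id;
}
```

Because folding happens only when the buffer is full and more data is pending, the result does not depend on how the input is split across sketch_append calls. The zero_replacement parameter reflects the patch's convention of using 1 for ordinary statements and 2 for utility statements.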
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..ae84fcac6e
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = clean_querytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (invalid), use 2
+	 * instead; queryId 1 is already used as the fallback for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 4a3c9686f9..6716db6c13 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 5004ee4177..40c4a75bac 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_queryid;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..14087eea43
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *clean_querytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.30.1
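
For readers following along, the buffer-folding trick used by AppendJumble in the patch above can be tried outside the server. The following is only a minimal sketch under stated assumptions: `demo_hash` is an FNV-1a stand-in (not PostgreSQL's `hash_any_extended()`), the buffer is shrunk to 16 bytes so that folding triggers on short inputs, and every `demo_`/`Demo` name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_JUMBLE_SIZE 16		/* tiny on purpose; the patch uses 1024 */

typedef struct DemoJumble
{
	unsigned char buf[DEMO_JUMBLE_SIZE];
	size_t		len;			/* bytes currently used in buf */
} DemoJumble;

/* Stand-in hash (64-bit FNV-1a); an assumption, not hash_any_extended() */
static uint64_t
demo_hash(const unsigned char *p, size_t n)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < n; i++)
	{
		h ^= p[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/* Append bytes; when the buffer is full, replace it with its own hash */
static void
demo_append(DemoJumble *js, const unsigned char *item, size_t size)
{
	assert(js != NULL);

	while (size > 0)
	{
		size_t		room;
		size_t		part;

		if (js->len >= DEMO_JUMBLE_SIZE)
		{
			uint64_t	h = demo_hash(js->buf, DEMO_JUMBLE_SIZE);

			memcpy(js->buf, &h, sizeof(h));
			js->len = sizeof(h);
		}
		room = DEMO_JUMBLE_SIZE - js->len;
		part = size < room ? size : room;
		memcpy(js->buf + js->len, item, part);
		js->len += part;
		item += part;
		size -= part;
	}
}

/* Hash the final buffer; remap 0 to 1, as the patch does for normal queries */
static uint64_t
demo_query_id(const char *data)
{
	DemoJumble	js = {{0}, 0};
	uint64_t	id;

	demo_append(&js, (const unsigned char *) data, strlen(data));
	id = demo_hash(js.buf, js.len);
	return id == 0 ? 1 : id;
}
```

The point of the folding step is that memory use stays bounded no matter how large the query tree is: the buffer never holds more than DEMO_JUMBLE_SIZE bytes, and everything seen earlier is summarized by the hash written back at the front of the buffer.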

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#112)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

There was a recent conflict, caught by cfbot. v18 attached.

Attachments:

v18-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-diff; charset=us-ascii)

From fa94eba58ee0ca098cfde0d17de72dc230ee471c Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 14 Oct 2020 02:11:37 +0800
Subject: [PATCH v18 1/3] Move pg_stat_statements query jumbling to core.

A new compute_queryid GUC is also added, to control whether the queryid should
be computed.  It's now possible to disable core queryid computation and use
pg_stat_statements with a different queryid algorithm supplied by a
third-party module.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  18 +
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 12 files changed, 969 insertions(+), 784 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h
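
As an illustration of what the three-argument post_parse_analyze hook means for extension authors, here is a small self-contained mock. It is a sketch only: the struct definitions are stripped-down stand-ins rather than the real PostgreSQL types, and `demo_post_parse_analyze` and `demo_normalized_entries` are hypothetical names. The behavior it mirrors comes from this patch: JumbleQuery() returns NULL for utility statements, so a hook has to tolerate a NULL jstate.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in types: minimal mock-ups, NOT the real PostgreSQL definitions */
typedef struct ParseState
{
	const char *p_sourcetext;
} ParseState;

typedef struct Query
{
	uint64_t	queryId;
} Query;

typedef struct JumbleState
{
	int			clocations_count;	/* number of constants recorded */
} JumbleState;

/* Same shape as the hook typedef after this patch */
typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
											  Query *query,
											  JumbleState *jstate);

static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;

/* Hypothetical counter standing in for "store a normalized entry" */
static int	demo_normalized_entries = 0;

static void
demo_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
{
	/* Chain to any previously installed hook, passing jstate through */
	if (prev_post_parse_analyze_hook)
		prev_post_parse_analyze_hook(pstate, query, jstate);

	/*
	 * jstate is NULL when no jumble was built (for example a utility
	 * statement), so check it before looking at the recorded constants.
	 */
	if (jstate != NULL && jstate->clocations_count > 0)
		demo_normalized_entries++;
}
```

The hook no longer builds its own jumble state; it only consumes what core hands over, which is why the NULL check replaces the old local `pgssJumbleState` setup.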

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..99bc7184cb 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * As of Postgres 9.2, this module normalizes query entries.  As of Postgres
+ * 14, the normalization is done by core if compute_queryid is enabled, or
+ * otherwise by a third-party module, if one is installed.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -342,7 +291,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -364,7 +314,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -380,16 +330,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -851,15 +794,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -879,35 +817,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling was able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -917,7 +834,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1267,7 +1184,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2627,678 +2544,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3319,7 +2564,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3426,10 +2671,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3443,7 +2688,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3523,13 +2768,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
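For readers following the move of AppendJumble out of pg_stat_statements.c, here is a minimal standalone sketch of the buffer-folding idea it implements: bytes are appended to a fixed-size buffer, and whenever the buffer is full its contents are replaced by their own hash before more data is added. Everything named demo_* is hypothetical, the buffer is deliberately tiny, and the FNV-1a hash is only a stand-in for hash_any_extended(), so the resulting values are not real query IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_JUMBLE_SIZE 16		/* tiny on purpose, so folding triggers early */

typedef struct DemoJumble
{
	unsigned char buf[DEMO_JUMBLE_SIZE];
	size_t		len;
} DemoJumble;

/* Stand-in 64-bit hash (FNV-1a); NOT PostgreSQL's hash_any_extended(). */
static uint64_t
demo_hash(const unsigned char *data, size_t len)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/* Append bytes; a full buffer is replaced by its hash before more is added. */
static void
demo_append(DemoJumble *j, const void *item, size_t size)
{
	const unsigned char *p = item;

	while (size > 0)
	{
		size_t		room;
		size_t		part;

		if (j->len >= DEMO_JUMBLE_SIZE)
		{
			uint64_t	h = demo_hash(j->buf, DEMO_JUMBLE_SIZE);

			memcpy(j->buf, &h, sizeof(h));
			j->len = sizeof(h);
		}
		room = DEMO_JUMBLE_SIZE - j->len;
		part = size < room ? size : room;
		memcpy(j->buf + j->len, p, part);
		j->len += part;
		assert(j->len <= DEMO_JUMBLE_SIZE);
		p += part;
		size -= part;
	}
}

/* Final identifier: hash what is left in the buffer, remapping 0 to 1. */
static uint64_t
demo_finish(const DemoJumble *j)
{
	uint64_t	id = demo_hash(j->buf, j->len);

	return id == 0 ? 1 : id;
}
```

Because folding only happens lazily, just before the next byte is copied into a full buffer, the result depends on the byte stream alone and not on how the caller splits it into appends. That property is what lets the APP_JUMB macros feed fields in one at a time.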
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..d98411ea3f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_queryid = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a218d78bef..6834ea3735 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7536,6 +7536,24 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-queryid" xreflabel="compute_queryid">
+      <term><varname>compute_queryid</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_queryid</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables in-core query identifier computation.  The
+        <xref linkend="pgstatstatements"/> extension requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in-core query identifier computation
+        method doesn't suit your needs.  In this case, in-core
+        computation must be disabled.  The default is <literal>off</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 0f3a70c49a..ddfb97b543 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -46,6 +46,8 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -107,6 +109,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -119,8 +122,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -140,6 +146,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -152,8 +159,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8a0332dde9..c11af652de 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -653,6 +653,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -671,8 +672,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_queryid)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 855076b1fd..74a7d7f992 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -512,6 +512,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_queryid = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1407,6 +1408,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_queryid", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_queryid,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
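To make the interaction between the new GUC and the changed hook signature easier to see, here is a small self-contained model of the control flow in the analyze.c and postgres.c hunks above: the jumble is computed only when the setting is on, and the hook is called either way, so a hook must tolerate a NULL jstate. All demo_* names and types are hypothetical stand-ins, not the real Query, JumbleState or post_parse_analyze_hook, and the value 42 is just a placeholder for a computed hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Query and JumbleState. */
typedef struct DemoQuery
{
	uint64_t	queryId;
} DemoQuery;

typedef struct DemoJumbleState
{
	int			clocations_count;
} DemoJumbleState;

typedef void (*demo_hook_type) (DemoQuery *query, DemoJumbleState *jstate);

static bool demo_compute_queryid = false;	/* models the compute_queryid GUC */
static demo_hook_type demo_hook = NULL;		/* models post_parse_analyze_hook */
static int	demo_early_entries = 0;			/* counts simulated early stores */
static DemoJumbleState demo_state;

/* Pretend to jumble: record the constant count and set a placeholder ID. */
static DemoJumbleState *
demo_jumble(DemoQuery *query, int nconstants)
{
	demo_state.clocations_count = nconstants;
	query->queryId = 42;
	return &demo_state;
}

/* A hook in the style of the patched pgss_post_parse_analyze(). */
static void
demo_pgss_hook(DemoQuery *query, DemoJumbleState *jstate)
{
	assert(query != NULL);
	/* jstate is NULL when no jumble was computed, so test it first */
	if (jstate && jstate->clocations_count > 0)
		demo_early_entries++;
}

/* Mirrors the patched parse_analyze(): jumble if enabled, then call the hook. */
static void
demo_parse_analyze(DemoQuery *query, int nconstants)
{
	DemoJumbleState *jstate = NULL;

	query->queryId = 0;
	if (demo_compute_queryid)
		jstate = demo_jumble(query, nconstants);
	if (demo_hook)
		demo_hook(query, jstate);
}
```

In the patch itself, JumbleQuery() also returns NULL for utility statements even when the setting is on, which is another reason the `jstate &&` guard in the hook is needed.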
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index f46c2dd7a8..31230b5704 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -594,6 +594,7 @@
 
 # - Monitoring -
 
+#compute_queryid = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..ae84fcac6e
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = clean_querytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (which is invalid),
+	 * use a queryId of 2 instead; queryId 1 is already used as the
+	 * fallback for normal statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 4a3c9686f9..6716db6c13 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 5004ee4177..40c4a75bac 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_queryid;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..14087eea43
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *clean_querytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.30.1
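
For readers following along, the buffer handling in AppendJumble above can be tried outside the backend with a small standalone sketch. Everything named toy_* here is hypothetical and not part of the patch: the buffer is shrunk to 16 bytes so the folding is easy to observe, FNV-1a stands in for hash_any_extended(), and hashing the remaining buffer as a final step is an assumption, since that part of queryjumble.c is not shown in this hunk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_JUMBLE_SIZE 16		/* deliberately tiny; the patch uses 1024 */

typedef struct ToyJumble
{
	unsigned char buf[TOY_JUMBLE_SIZE];
	size_t		len;			/* number of bytes used in buf[] */
} ToyJumble;

/* FNV-1a, only a stand-in here; PostgreSQL uses hash_any_extended() */
static uint64_t
toy_hash(const unsigned char *data, size_t size)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < size; i++)
	{
		h ^= data[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/* Same control flow as AppendJumble: fold the buffer into its hash when full */
static void
toy_append(ToyJumble *j, const unsigned char *item, size_t size)
{
	while (size > 0)
	{
		size_t		room;
		size_t		part;

		if (j->len >= TOY_JUMBLE_SIZE)
		{
			uint64_t	h = toy_hash(j->buf, TOY_JUMBLE_SIZE);

			memcpy(j->buf, &h, sizeof(h));
			j->len = sizeof(h);
		}
		room = TOY_JUMBLE_SIZE - j->len;
		part = size < room ? size : room;
		memcpy(j->buf + j->len, item, part);
		j->len += part;
		item += part;
		size -= part;
	}
}

/* Assumed final step: hash whatever is left in the buffer */
static uint64_t
toy_fingerprint(const char *s)
{
	ToyJumble	j = {{0}, 0};

	toy_append(&j, (const unsigned char *) s, strlen(s));
	return toy_hash(j.buf, j.len);
}
```

The point of the design is that memory use stays bounded by the buffer size no matter how large the query tree is, at the cost of the final value depending on a chain of intermediate hashes rather than on a single pass over all the bytes.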

v18-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-diff; charset=us-ascii)

From 01aa3deae114c45c22d2f8024e4950d843b2d810 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v18 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of the top
level statement is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Tatsuro Yamada, Torikoshi Atsushi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  29 +++--
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 ++
 src/backend/executor/execParallel.c           |  14 ++-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |   9 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  29 +++--
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/include/utils/queryjumble.h               |   2 +-
 src/test/regress/expected/rules.out           |   9 +-
 20 files changed, 223 insertions(+), 110 deletions(-)
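
As a reading aid, the "top level only" rule from the commit message can be sketched in isolation. This is not the pgstat.c code from this patch; the toy_* names are hypothetical, and interpreting the boolean second argument of pgstat_report_queryid() as a "force" flag is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-backend status entry */
typedef struct ToyBackendStatus
{
	uint64_t	queryid;		/* 0 means "nothing reported yet" */
} ToyBackendStatus;

/*
 * Report a queryid.  A non-forced call is ignored once a top level queryid
 * has been recorded, so nested statements cannot overwrite it.  A forced
 * call always overwrites, which is how a new top level statement would
 * reset the field.
 */
static void
toy_report_queryid(ToyBackendStatus *st, uint64_t queryid, bool force)
{
	if (st->queryid != 0 && !force)
		return;
	st->queryid = queryid;
}
```

Under these assumptions, calling toy_report_queryid(&st, id, false) from several places (parse analysis, executor start) is harmless: only the first non-zero report after a reset sticks.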

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 99bc7184cb..2fc57f1254 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * True for utility statements other than EXECUTE, PREPARE and DEALLOCATE,
+ * which pgss_ProcessUtility and pgss_post_parse_analyze ignore.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for utility statements related to prepared statements, as
+	 * those will inherit the underlying statement's one (except DEALLOCATE,
+	 * which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely event that
+	 * the user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1071,9 +1094,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1128,7 +1149,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1151,23 +1172,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1198,52 +1208,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_queryid isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6834ea3735..bdec637cd5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6920,6 +6920,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of the session's current query.
+             By default, query identifiers are not computed, so this field will
+             always be zero unless the <xref linkend="guc-compute-queryid"/>
+             parameter is enabled or a third-party module that computes query
+             identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7396,8 +7405,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7544,12 +7553,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables or disables in core query identifier computation.arameter.  The
-        <xref linkend="pgstatstatements"/> extension requires a query
-        identifier to be computed.  Note that an external module can
-        alternatively be used if the in core query identifier computation
-        specification doesn't suit your need.  In this case, in core
-        computation must be disabled.  The default is <literal>off</literal>.
+        Enables or disables in core query identifier computation.  A query
+        identifier can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query identifier
+        to be computed.  Note that an external module can alternatively be used
+        if the in core query identifier computation specification doesn't suit
+        your need.  In this case, in core computation must be disabled.  The
+        default is <literal>off</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c35045faa1..1fae30b51e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -910,6 +910,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field will always
+      be null unless the <xref linkend="guc-compute-queryid"/> parameter is
+      enabled or a third-party module that computes query identifiers is
+      configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dca65dc7b..012d86217f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -764,6 +764,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0648dd82ba..e39cf20161 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -128,6 +129,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..26f1994a31 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -578,7 +578,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -620,7 +621,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1403,8 +1404,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..04c860f678 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..32f74e8c23 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index ddfb97b543..0dd7e95abd 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parse_type.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -130,6 +131,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -167,6 +170,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b1e2d94951..dab90243eb 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3381,6 +3381,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3435,6 +3436,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3445,6 +3454,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5178,6 +5229,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c11af652de..b2892db274 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -680,6 +680,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -898,6 +900,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1014,6 +1017,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5102227a60..8e81eef8cb 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -569,7 +569,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	29
+#define PG_STAT_GET_ACTIVITY_COLS	30
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -914,6 +914,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[27] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[29] = true;
+			else
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[26] = true;
 			nulls[27] = true;
 			nulls[28] = true;
+			nulls[29] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e729ebece7..7aa484c5ed 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,7 +77,6 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2685,6 +2684,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 31230b5704..1f1b93995f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -541,6 +541,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index ae84fcac6e..b0a5731ef7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -53,7 +53,7 @@ static void RecordConstLocation(JumbleState *jstate, int location);
  * relevant part of the string.
  */
 const char *
-clean_querytext(const char *query, int *location, int *len)
+CleanQuerytext(const char *query, int *location, int *len)
 {
 	int query_location = *location;
 	int query_len = *len;
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = clean_querytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 93393fcfd4..da76bf6ab4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5249,9 +5249,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..e0e08e0b27 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be43c04802..09d36a1e23 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1263,6 +1263,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1457,6 +1460,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1465,6 +1469,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 14087eea43..520cd4f43e 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,7 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
-const char *clean_querytext(const char *query, int *location, int *len);
+const char *CleanQuerytext(const char *query, int *location, int *len);
 JumbleState *JumbleQuery(Query *query, const char *querytext);
 
 #endif							/* QUERYJUMBLE_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b12cc122a..ff3506d5d7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1762,9 +1762,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1876,7 +1877,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2046,7 +2047,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2076,7 +2077,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.30.1
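
The "top-level only" rule in pgstat_report_queryid() above can be hard to follow from the diff alone, so here is a minimal standalone sketch of it. This is not the patch's code: DemoBackendStatus, demo_report_queryid() and demo_get_queryid() are hypothetical stand-ins, and the shared-memory write protocol (PGSTAT_BEGIN_WRITE_ACTIVITY and friends) and the track_activities check are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the backend's status entry (not PgBackendStatus). */
typedef struct
{
	uint64_t	st_queryid;
} DemoBackendStatus;

static DemoBackendStatus demo_entry = {0};

/*
 * Keep the first nonzero identifier reported since the last reset.  A later
 * report is taken to come from a nested query and is ignored, unless the
 * caller passes force = true (used to reset at the start of a statement).
 */
static void
demo_report_queryid(uint64_t query_id, bool force)
{
	if (demo_entry.st_queryid != 0 && !force)
		return;

	demo_entry.st_queryid = query_id;
}

/* Return the identifier currently stored for this (demo) backend. */
static uint64_t
demo_get_queryid(void)
{
	return demo_entry.st_queryid;
}
```

Calling demo_report_queryid(0, true), then demo_report_queryid(42, false), then demo_report_queryid(99, false) leaves 42 stored: the third call stands in for a query run inside a function, and it is dropped because an identifier is already present.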

v18-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-diff; charset=us-ascii)

From 9b4cbbebb793a9cb3ca8e4b06a367f0ade8d023e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v18 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_queryid or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 doc/src/sgml/config.sgml              | 14 +++++++-------
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out | 11 ++++++++++-
 src/test/regress/sql/explain.sql      |  5 ++++-
 5 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bdec637cd5..16617b43d3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7556,13 +7556,13 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables or disables in core query identifier computation.  A query
         identifier can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query identifier
-        to be computed.  Note that an external module can alternatively be used
-        if the in core query identifier computation specification doesn't suit
-        your need.  In this case, in core computation must be disabled.  The
-        default is <literal>off</literal>.
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in core query identifier computation
+        specification doesn't suit your need.  In this case, in core
+        computation must be disabled.  The default is <literal>off</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..105b069b41 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed; see <xref
+      linkend="guc-compute-queryid"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..ac5879c1cf 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_queryid)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -598,6 +608,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index dc7ab2ce8b..f45f069f30 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -17,7 +17,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -472,3 +472,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index c79116c927..99f7bb1bf5 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -19,7 +19,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -105,3 +105,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_queryid = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.30.1
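
The regression test filter above gains an optional leading minus sign ('-?'), presumably because the identifier is a uint64 internally but is printed through a signed 64-bit formatter (pg_lltoa in the patch), so values with the high bit set come out negative. A small standalone sketch of that effect follows; demo_format_queryid() is a hypothetical helper, not a PostgreSQL function, and it uses snprintf rather than pg_lltoa.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit query identifier as a signed decimal, the way a bigint
 * column would display it.  The uint64 -> int64 conversion of an
 * out-of-range value is implementation-defined in C; on the usual
 * two's-complement platforms it simply reinterprets the bits.
 */
static int
demo_format_queryid(uint64_t query_id, char *buf, size_t buflen)
{
	return snprintf(buf, buflen, "%" PRId64, (int64_t) query_id);
}
```

With this helper, an identifier of UINT64_C(0xFFFFFFFFFFFFFFFF) is rendered as "-1" on such platforms, while small values such as 42 are rendered unchanged.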

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#113)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Sun, Mar 14, 2021 at 04:06:45PM +0800, Julien Rouhaud wrote:

Recent conflict, thanks to cfbot. v18 attached.

We are reaching the two-year mark on this feature, that everyone seems
to agree is needed. Is any committer going to work on this to get it
into PG 14? Should I take it?

I just read the thread and I didn't see any open issues. Are there any?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

tgl@sss.pgh.pa.us

almost 5 years ago

In reply to: Bruce Momjian (#114)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Bruce Momjian <bruce@momjian.us> writes:

We are reaching the two-year mark on this feature, that everyone seems
to agree is needed. Is any committer going to work on this to get it
into PG 14? Should I take it?

I still say that it's a serious mistake to sanctify a query ID calculation
method that was designed only for pg_stat_statement's needs as the one
true way to do it. But that's what exposing it in a core view would do.

regards, tom lane

bruce@momjian.us

almost 5 years ago

In reply to: Tom Lane (#115)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

We are reaching the two-year mark on this feature, that everyone seems
to agree is needed. Is any committer going to work on this to get it
into PG 14? Should I take it?

I still say that it's a serious mistake to sanctify a query ID calculation
method that was designed only for pg_stat_statement's needs as the one
true way to do it. But that's what exposing it in a core view would do.

OK, I am fine with creating a new method, and maybe having
pg_stat_statements use it. Is that the direction we should be going in?
I do think we need _some_ method in core if we are going to be exposing
this value in pg_stat_activity and log_line_prefix.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

tgl@sss.pgh.pa.us

almost 5 years ago

In reply to: Bruce Momjian (#116)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Bruce Momjian <bruce@momjian.us> writes:

On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:

I still say that it's a serious mistake to sanctify a query ID calculation
method that was designed only for pg_stat_statement's needs as the one
true way to do it. But that's what exposing it in a core view would do.

OK, I am fine with creating a new method, and maybe having
pg_stat_statements use it. Is that the direction we should be going in?

The point is that we've understood Query.queryId as something that
different extensions might calculate differently for their own needs.
In particular it's easy to imagine extensions that want an ID that is
less fuzzy than what pg_stat_statements wants. We never had a plan for
how two such extensions could co-exist, but at least it was possible
to use one if you didn't use another. If this gets moved into core
then there will basically be only one way that anyone can do it.

Maybe what we need is a design for allowing more than one query ID.

I do think we need _some_ method in core if we are going to be exposing
this value in pg_stat_activity and log_line_prefix.

I'm basically objecting to the conclusion that we should do either
of those. There is no way around the fact that it will break every
user of Query.queryId other than pg_stat_statements, unless they
are okay with whatever definition pg_stat_statements is using (which
is a moving target BTW).

regards, tom lane

bruce@momjian.us

almost 5 years ago

In reply to: Tom Lane (#117)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 17, 2021 at 12:01:38PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:

I still say that it's a serious mistake to sanctify a query ID calculation
method that was designed only for pg_stat_statement's needs as the one
true way to do it. But that's what exposing it in a core view would do.

OK, I am fine with creating a new method, and maybe having
pg_stat_statements use it. Is that the direction we should be going in?

The point is that we've understood Query.queryId as something that
different extensions might calculate differently for their own needs.
In particular it's easy to imagine extensions that want an ID that is
less fuzzy than what pg_stat_statements wants. We never had a plan for
how two such extensions could co-exist, but at least it was possible
to use one if you didn't use another. If this gets moved into core
then there will basically be only one way that anyone can do it.

Well, the patch docs say:

Enables or disables in core query identifier computation. The
<xref linkend="pgstatstatements"/> extension requires a query
--> identifier to be computed. Note that an external module can
--> alternatively be used if the in core query identifier computation
specification doesn't suit your need. In this case, in core
computation must be disabled. The default is <literal>off</literal>.

Maybe what we need is a design for allowing more than one query ID.

I do think we need _some_ method in core if we are going to be exposing
this value in pg_stat_activity and log_line_prefix.

I'm basically objecting to the conclusion that we should do either
of those. There is no way around the fact that it will break every
user of Query.queryId other than pg_stat_statements, unless they
are okay with whatever definition pg_stat_statements is using (which
is a moving target BTW).

I thought the above doc patch feature avoided this problem because an
extension can override the built-in query id.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

pavel.stehule@gmail.com

almost 5 years ago

In reply to: Tom Lane (#117)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 17, 2021 at 17:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:

I still say that it's a serious mistake to sanctify a query ID calculation
method that was designed only for pg_stat_statement's needs as the one
true way to do it. But that's what exposing it in a core view would do.

OK, I am fine with creating a new method, and maybe having
pg_stat_statements use it. Is that the direction we should be going in?

The point is that we've understood Query.queryId as something that
different extensions might calculate differently for their own needs.
In particular it's easy to imagine extensions that want an ID that is
less fuzzy than what pg_stat_statements wants. We never had a plan for
how two such extensions could co-exist, but at least it was possible
to use one if you didn't use another. If this gets moved into core
then there will basically be only one way that anyone can do it.

Maybe what we need is a design for allowing more than one query ID.

Theoretically there could be a hook for calculating the queryid that an
extension can use. The default could be the method that is used by
pg_stat_statements.

I don't think pg_stat_statements can work with more than one query id
definition at a time, so this solution can be simple.

regards

Pavel

I do think we need _some_ method in core if we are going to be exposing
this value in pg_stat_activity and log_line_prefix.

I'm basically objecting to the conclusion that we should do either
of those. There is no way around the fact that it will break every
user of Query.queryId other than pg_stat_statements, unless they
are okay with whatever definition pg_stat_statements is using (which
is a moving target BTW).

regards, tom lane

bruce@momjian.us

almost 5 years ago

In reply to: Pavel Stehule (#119)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 17, 2021 at 05:16:50PM +0100, Pavel Stehule wrote:

On Wed, Mar 17, 2021 at 17:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:

I still say that it's a serious mistake to sanctify a query ID calculation
method that was designed only for pg_stat_statement's needs as the one
true way to do it. But that's what exposing it in a core view would do.

OK, I am fine with creating a new method, and maybe having
pg_stat_statements use it. Is that the direction we should be going in?

The point is that we've understood Query.queryId as something that
different extensions might calculate differently for their own needs.
In particular it's easy to imagine extensions that want an ID that is
less fuzzy than what pg_stat_statements wants. We never had a plan for
how two such extensions could co-exist, but at least it was possible
to use one if you didn't use another. If this gets moved into core
then there will basically be only one way that anyone can do it.

Maybe what we need is a design for allowing more than one query ID.

Theoretically there could be a hook for calculating the queryid that an
extension can use. The default could be the method that is used by
pg_stat_statements.

Yes, that is what the code patch says it does.

I don't think pg_stat_statements can work with more than one query id
definition at a time, so this solution can be simple.

Agreed.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#120)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 17, 2021 at 12:24:44PM -0400, Bruce Momjian wrote:

On Wed, Mar 17, 2021 at 05:16:50PM +0100, Pavel Stehule wrote:

On Wed, Mar 17, 2021 at 17:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:

I still say that it's a serious mistake to sanctify a query ID calculation
method that was designed only for pg_stat_statement's needs as the one
true way to do it. But that's what exposing it in a core view would do.

OK, I am fine with creating a new method, and maybe having
pg_stat_statements use it. Is that the direction we should be going in?

The point is that we've understood Query.queryId as something that
different extensions might calculate differently for their own needs.
In particular it's easy to imagine extensions that want an ID that is
less fuzzy than what pg_stat_statements wants. We never had a plan for
how two such extensions could co-exist, but at least it was possible
to use one if you didn't use another. If this gets moved into core
then there will basically be only one way that anyone can do it.

Maybe what we need is a design for allowing more than one query ID.

Theoretically there could be a hook for calculating the queryid that an
extension can use. The default could be the method that is used by
pg_stat_statements.

Yes, that is what the code patch says it does.

I don't think pg_stat_statements can work with more than one query id
definition at a time, so this solution can be simple.

Agreed.

Actually, putting the query identifier computation in core makes it far more
tunable, even if that's counterintuitive. It means that you can now choose
to use the usual pgss algorithm or a different one for log_line_prefix and
pg_stat_activity.queryid, but also that you can now use pgss with a different
query id algorithm. That's another thing that users have been asking for for a
long time.

I originally suggested making it clearer by having an enum GUC rather than a
boolean, say compute_queryid = [ none | core | external ], and if set to
external then a hook would be explicitly called. Right now, "none" and
"external" are both tied to compute_queryid = off, and which one you get
depends on whether an extension computes a queryid during post_parse_analyze_hook.

It could later be extended to suit other needs if we ever come to some
agreement (for instance "legacy", "logical_replication_stable" or whatever
better name we can find for something that doesn't depend on Oid).

robertmhaas@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#121)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

I originally suggested making it clearer by having an enum GUC rather than a
boolean, say compute_queryid = [ none | core | external ], and if set to
external then a hook would be explicitly called. Right now, "none" and
"external" are both tied to compute_queryid = off, and which one you get
depends on whether an extension computes a queryid during post_parse_analyze_hook.

I would just make it a Boolean and have a hook. The Boolean controls
whether it gets computed at all, and the hook lets an external module
override the way it gets computed.

--
Robert Haas
EDB: http://www.enterprisedb.com

bruce@momjian.us

almost 5 years ago

In reply to: Robert Haas (#122)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 17, 2021 at 04:04:44PM -0400, Robert Haas wrote:

On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

I originally suggested making it clearer by having an enum GUC rather than a
boolean, say compute_queryid = [ none | core | external ], and if set to
external then a hook would be explicitly called. Right now, "none" and
"external" are both tied to compute_queryid = off, and which one you get
depends on whether an extension computes a queryid during post_parse_analyze_hook.

I would just make it a Boolean and have a hook. The Boolean controls
whether it gets computed at all, and the hook lets an external module
override the way it gets computed.

OK, is that what everyone wants? I think that is what the patch already
does.

I think having multiple queryids used in a single cluster is much too
confusing to support. You would have to label and control which queryid
is displayed by pg_stat_activity and log_line_prefix, and that seems too
confusing and not useful.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#123)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote:

On Wed, Mar 17, 2021 at 04:04:44PM -0400, Robert Haas wrote:

On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

I originally suggested making it clearer by having an enum GUC rather than a
boolean, say compute_queryid = [ none | core | external ], and if set to
external then a hook would be explicitly called. Right now, "none" and
"external" are both tied to compute_queryid = off, and which one you get
depends on whether an extension computes a queryid during post_parse_analyze_hook.

I would just make it a Boolean and have a hook. The Boolean controls
whether it gets computed at all, and the hook lets an external module
override the way it gets computed.

OK, is that what everyone wants? I think that is what the patch already
does.

Not exactly. Right now a custom queryid can be computed even if
compute_queryid is off, if some extension does that in post_parse_analyze_hook.

I'm assuming that what Robert was thinking was more like:

    if (compute_queryid)
    {
        if (queryid_hook)
            queryId = queryid_hook(...);
        else
            queryId = JumbleQuery(...);
    }
    else
        queryId = 0;

And that should be done *after* post_parse_analyze_hook so that it's clear that
this hook is no longer the place to compute queryid.

Is that what should be done?

I think having multiple queryids used in a single cluster is much too
confusing to support. You would have to label and control which queryid
is displayed by pg_stat_activity and log_line_prefix, and that seems too
confusing and not useful.

I agree.
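
To make the boolean-plus-hook idea above concrete, here is a standalone sketch. Everything in it is a stand-in and not taken from the patch: the Query struct, the queryid_hook variable, example_external_queryid(), and core_queryid() (a plain FNV-1a hash of the query text, used only as a placeholder for the real jumbling of the parse tree) are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's real definitions. */
typedef struct Query
{
    const char *text;       /* query string, used here instead of a parse tree */
    uint64_t    queryId;    /* 0 means "no query id" */
} Query;

typedef uint64_t (*queryid_hook_type) (const Query *query);

bool              compute_queryid = false;  /* models the proposed boolean GUC */
queryid_hook_type queryid_hook = NULL;      /* models the proposed hook */

/* Placeholder for the core algorithm: FNV-1a over the query text. */
uint64_t
core_queryid(const Query *query)
{
    uint64_t    hash = UINT64_C(14695981039346656037);

    for (const char *p = query->text; *p != '\0'; p++)
    {
        hash ^= (unsigned char) *p;
        hash *= UINT64_C(1099511628211);
    }
    return hash;
}

/* Example external implementation that a module could install in the hook. */
uint64_t
example_external_queryid(const Query *query)
{
    (void) query;
    return 42;
}

/* The GUC decides whether an id exists; the hook decides how it is computed. */
void
assign_queryid(Query *query)
{
    assert(query != NULL && query->text != NULL);

    if (!compute_queryid)
        query->queryId = 0;
    else if (queryid_hook != NULL)
        query->queryId = queryid_hook(query);
    else
        query->queryId = core_queryid(query);
}
```

With this shape, a module that installs the hook still produces no id while compute_queryid is off, which is the property being discussed.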

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#124)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote:

On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote:

OK, is that what everyone wants? I think that is what the patch already
does.

Not exactly. Right now a custom queryid can be computed even if
compute_queryid is off, if some extension does that in post_parse_analyze_hook.

I'm assuming that what Robert was thinking was more like:

    if (compute_queryid)
    {
        if (queryid_hook)
            queryId = queryid_hook(...);
        else
            queryId = JumbleQuery(...);
    }
    else
        queryId = 0;

And that should be done *after* post_parse_analyze_hook so that it's clear that
this hook is no longer the place to compute queryid.

Is that what should be done?

No, I don't think so. I think having extensions change behavior
controlled by GUCs is a bad interface.

The docs are going to say that you have to enable compute_queryid to see
the query id in pg_stat_activity and log_line_prefix, but if you install
an extension, the query id will be visible even if you don't have
compute_queryid enabled. I think you need to only honor the hook if
compute_queryid is enabled, and update the pg_stat_statements docs to
say you have to enable compute_queryid for pg_stat_statements to work.

Also, should it be compute_queryid or compute_query_id?

Also, the overhead of computing the query id was reported as 2% --- that
seems quite high for what it does. Do we know why it is so high?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#125)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Mar 18, 2021 at 09:47:29AM -0400, Bruce Momjian wrote:

On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote:

On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote:

OK, is that what everyone wants? I think that is what the patch already
does.

Not exactly. Right now a custom queryid can be computed even if
compute_queryid is off, if some extension does that in post_parse_analyze_hook.

I'm assuming that what Robert was thinking was more like:

    if (compute_queryid)
    {
        if (queryid_hook)
            queryId = queryid_hook(...);
        else
            queryId = JumbleQuery(...);
    }
    else
        queryId = 0;

And that should be done *after* post_parse_analyze_hook so that it's clear that
this hook is no longer the place to compute queryid.

Is that what should be done?

No, I don't think so. I think having extensions change behavior
controlled by GUCs is a bad interface.

The docs are going to say that you have to enable compute_queryid to see
the query id in pg_stat_activity and log_line_prefix, but if you install
an extension, the query id will be visible even if you don't have
compute_queryid enabled. I think you need to only honor the hook if
compute_queryid is enabled, and update the pg_stat_statements docs to
say you have to enable compute_queryid for pg_stat_statements to work.

I'm confused, what you described really looks like what I described.

Let me try to clarify:

- if compute_queryid is off, a queryid should never be seen no matter how hard
an extension tries

- if compute_queryid is on, the calculation will be done by core
(using pgss's JumbleQuery) unless an extension has computed one already. The
only way to know which algorithm is used is to check the list of loaded extensions.

- if some extension calculates a queryid during post_parse_analyze_hook, we
will always reset it.

Is that the approach you want?

Note that the only way to not honor the hook when the new GUC is disabled is
to have a new queryid_hook, as we can't stop calling post_parse_analyze_hook if
the new GUC is off, and we don't want to pay the queryid calculation overhead
if the admin explicitly said it wasn't needed.

Also, should it be compute_queryid or compute_query_id?

Maybe compute_query_identifier?

Also, the overhead of computing the query id was reported as 2% --- that
seems quite high for what it does. Do we know why it is so high?

The 2% was a worst-case scenario, for a read-only query with a single join over
ridiculously small pg_class and pg_attribute tables. The whole workload
was in shared buffers, so planning and execution were quite fast. Adding some
complexity to the query really limited the overhead.

Note that this was done on an old laptop with a quite slow CPU. Maybe
someone with better hardware than a 5- or 6-year-old laptop could get some more
realistic results (I unfortunately don't have anything to try on).

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#126)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote:

On Thu, Mar 18, 2021 at 09:47:29AM -0400, Bruce Momjian wrote:

On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote:

Not exactly. Right now a custom queryid can be computed even if
compute_queryid is off, if some extension does that in post_parse_analyze_hook.

The above text is the part that made me think an extension could display
a query id even if disabled by the GUC.

The docs are going to say that you have to enable compute_queryid to see
the query id in pg_stat_activity and log_line_prefix, but if you install
an extension, the query id will be visible even if you don't have
compute_queryid enabled. I think you need to only honor the hook if
compute_queryid is enabled, and update the pg_stat_statements docs to
say you have to enable compute_queryid for pg_stat_statements to work.

I'm confused, what you described really looks like what I described.

Let me try to clarify:

- if compute_queryid is off, a queryid should never be seen no matter how hard
an extension tries

Oh, OK. I can see an extension setting the query id on its own --- we
can't prevent that from happening. It is probably enough to tell
extensions to honor the GUC, since they would want it enabled so it
displays in pg_stat_activity and log_line_prefix.

- if compute_queryid is on, the calculation will be done by core
(using pgss's JumbleQuery) unless an extension has computed one already. The
only way to know which algorithm is used is to check the list of loaded extensions.

OK.

- if some extension calculates a queryid during post_parse_analyze_hook, we
will always reset it.

OK, good.

Is that the approach you want?

Yes, I think so.

Note that the only way to not honor the hook when the new GUC is disabled is
to have a new queryid_hook, as we can't stop calling post_parse_analyze_hook if
the new GUC is off, and we don't want to pay the queryid calculation overhead
if the admin explicitly said it wasn't needed.

Right, let's just get the extensions to honor the GUC --- we don't need
to block them or anything.

Also, should it be compute_queryid or compute_query_id?

Maybe compute_query_identifier?

I think compute_query_id works, and is shorter.

Also, the overhead of computing the query id was reported as 2% --- that
seems quite high for what it does. Do we know why it is so high?

The 2% was a worst-case scenario, for a read-only query with a single join over
ridiculously small pg_class and pg_attribute tables. The whole workload
was in shared buffers, so planning and execution were quite fast. Adding some
complexity to the query really limited the overhead.

Note that this was done on an old laptop with a quite slow CPU. Maybe
someone with better hardware than a 5- or 6-year-old laptop could get some more
realistic results (I unfortunately don't have anything to try on).

OK, good to know. I can run some tests here if people would like me to.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#127)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Mar 18, 2021 at 03:23:49PM -0400, Bruce Momjian wrote:

On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote:

The above text is the part that made me think an extension could display
a query id even if disabled by the GUC.

With the last version of the patch I sent it was the case.

Oh, OK. I can see an extension setting the query id on its own --- we
can't prevent that from happening. It is probably enough to tell
extensions to honor the GUC, since they would want it enabled so it
displays in pg_stat_activity and log_line_prefix.

OK. So no new hook, and we keep using post_parse_analyze_hook as the official
way to provide a custom queryid implementation, with this new behavior:

- if some extension calculates a queryid during post_parse_analyze_hook, we
will always reset it.

OK, good.

Now that I'm back on the code I remember why I did it this way. It's
unfortunately not really possible to make things work this way.

pg_stat_statements' post_parse_analyze_hook relies on a queryid already being
computed, as it's where we know where the constants are recorded. It means:

- we have to call post_parse_analyze_hook *after* doing core queryid
calculation
- if users want to use a third party module to calculate a queryid, they'll
have to make sure that the module's post_parse_analyze_hook is called
*before* pg_stat_statements' one.
- even if they do so, they'll still have to pay the price of core queryid
calculation

So it would be very hard to configure and too expensive. I think that
we have to choose: either make compute_query_id only trigger the core
calculation (as in the previous patch version), or introduce a new hook.
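
The ordering problem in the three points above can be modeled with a small standalone sketch. It is a simplification with made-up names and id values: real modules chain post_parse_analyze_hook through saved previous-hook pointers and not through an array, and stats_hook() and custom_id_hook() here only imitate a pgss-like consumer and a third-party producer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; names and id values are illustrative only. */
typedef struct Query
{
    uint64_t    queryId;
} Query;

typedef void (*analyze_hook_type) (Query *query);

#define CORE_ID   UINT64_C(1111)
#define CUSTOM_ID UINT64_C(2222)

int         core_calls = 0;         /* times the core calculation was paid for */
uint64_t    id_seen_by_stats = 0;   /* what the pgss-like hook observed */

/* pgss-like consumer: needs a queryid to exist already when it runs. */
void
stats_hook(Query *query)
{
    id_seen_by_stats = query->queryId;
}

/* Third-party producer: overwrites whatever id is there. */
void
custom_id_hook(Query *query)
{
    query->queryId = CUSTOM_ID;
}

/* Core calculation (if enabled) runs first, then the hooks in array order. */
void
analyze(Query *query, bool core_enabled, analyze_hook_type *hooks, size_t nhooks)
{
    assert(query != NULL);

    query->queryId = 0;
    if (core_enabled)
    {
        query->queryId = CORE_ID;
        core_calls++;
    }
    for (size_t i = 0; i < nhooks; i++)
        hooks[i] (query);
}
```

Running the custom hook before the stats hook makes the stats hook see CUSTOM_ID, but core_calls still goes up. In the opposite order the stats hook records CORE_ID while the query ends up carrying CUSTOM_ID.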

I think compute_query_id works, and is shorter.

WFM.

OK, good to know. I can run some tests here if people would like me to.

+1. I think a read-only pgbench run would be a kind of worst-case scenario
that could be used.

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#128)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Mar 19, 2021 at 11:16:50AM +0800, Julien Rouhaud wrote:

On Thu, Mar 18, 2021 at 03:23:49PM -0400, Bruce Momjian wrote:

On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote:

The above text is the part that made me think an extension could display
a query id even if disabled by the GUC.

With the last version of the patch I sent it was the case.

Oh, OK. I can see an extension setting the query id on its own --- we
can't prevent that from happening. It is probably enough to tell
extensions to honor the GUC, since they would want it enabled so it
displays in pg_stat_activity and log_line_prefix.

OK. So no new hook, and we keep using post_parse_analyze_hook as the official
way to provide a custom queryid implementation, with this new behavior:

- if some extension calculates a queryid during post_parse_analyze_hook, we
will always reset it.

OK, good.

Now that I'm back on the code I remember why I did it this way. It's
unfortunately not really possible to make things work this way.

pg_stat_statements' post_parse_analyze_hook relies on a queryid already being
computed, as it's where we know where the constants are recorded. It means:

- we have to call post_parse_analyze_hook *after* doing core queryid
calculation
- if users want to use a third party module to calculate a queryid, they'll
have to make sure that the module's post_parse_analyze_hook is called
*before* pg_stat_statements' one.
- even if they do so, they'll still have to pay the price of core queryid
calculation

OK, that makes perfect sense. I think the best solution is to document
that compute_query_id just controls the built-in computation of the
query id, and that extensions can also compute it if this is off, and
pg_stat_activity and log_line_prefix will display built-in or extension
computed query ids.

It might be interesting someday to check if the hook changed a
pre-computed query id and warn the user in the logs, but that could
cause more log-spam problems than help. I am a little worried that
someone might have compute_query_id enabled and then install an
extension that overwrites it, but we will just have to document this
issue. Hopefully extensions will be clear that they are computing their
own query id.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

hannuk@google.com

almost 5 years ago

In reply to: Bruce Momjian (#129)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian <bruce@momjian.us> wrote:

OK, that makes perfect sense. I think the best solution is to document
that compute_query_id just controls the built-in computation of the
query id, and that extensions can also compute it if this is off, and
pg_stat_activity and log_line_prefix will display built-in or extension
computed query ids.

It might be interesting someday to check if the hook changed a
pre-computed query id and warn the user in the logs, but that could
cause more log-spam problems than help.

The log-spam could be mitigated by logging it just once per connection,
the first time it is overridden.

Also, we could ask the extensions to expose the "method name" in a read-only GUC

so one can do

SHOW compute_query_id_method;

and get the name of the method in use

compute_query_id_method
------------------------------------
builtin

And it may even dynamically change to indicate the overriding of builtin

compute_query_id_method
---------------------------------------------------
fancy_compute_query_id (overrides builtin)

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#129)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote:

On Fri, Mar 19, 2021 at 11:16:50AM +0800, Julien Rouhaud wrote:

Now that I'm back on the code I remember why I did it this way. It's
unfortunately not really possible to make things work this way.

pg_stat_statements' post_parse_analyze_hook relies on a queryid already being
computed, as it's where we know where the constants are recorded. It means:

- we have to call post_parse_analyze_hook *after* doing core queryid
calculation
- if users want to use a third party module to calculate a queryid, they'll
have to make sure that the module's post_parse_analyze_hook is called
*before* pg_stat_statements' one.
- even if they do so, they'll still have to pay the price of core queryid
calculation

OK, that makes perfect sense. I think the best solution is to document
that compute_query_id just controls the built-in computation of the
query id, and that extensions can also compute it if this is off, and
pg_stat_activity and log_line_prefix will display built-in or extension
computed query ids.

So the last version of the patch should implement that behavior, right? It's
just missing some explicit guidance that third-party extensions should only
calculate a queryid if compute_query_id is off.

It might be interesting someday to check if the hook changed a
pre-computed query id and warn the user in the logs, but that could
cause more log-spam problems than help. I am a little worried that
someone might have compute_query_id enabled and then install an
extension that overwrites it, but we will just have to document this
issue. Hopefully extensions will be clear that they are computing their
own query id.

I agree. And hopefully they will split the queryid calculation from the rest
of the extension so that users can use the combination they want.

rjuju123@gmail.com

almost 5 years ago

In reply to: Hannu Krosing (#130)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Mar 19, 2021 at 02:54:16PM +0100, Hannu Krosing wrote:

On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian <bruce@momjian.us> wrote:

OK, that makes perfect sense. I think the best solution is to document
that compute_query_id just controls the built-in computation of the
query id, and that extensions can also compute it if this is off, and
pg_stat_activity and log_line_prefix will display built-in or extension
computed query ids.

It might be interesting someday to check if the hook changed a
pre-computed query id and warn the user in the logs, but that could
cause more log-spam problems than help.

The log-spam could be mitigated by logging it just once per connection,
the first time it is overridden.

Yes, but it might still generate a significant amount of additional lines.

If extension authors follow the recommendations and only calculate a queryid
when compute_query_id is off, it should be easy to check that you have
everything set up properly.

Also, we could ask the extensions to expose the "method name" in a read-only GUC

so one can do

SHOW compute_query_id_method;

and get the name of the method in use

compute_query_id_method
------------------------------------
builtin

And it may even dynamically change to indicate the overriding of builtin

compute_query_id_method
---------------------------------------------------
fancy_compute_query_id (overrides builtin)

This could be nice, but I'm not sure that it would work well if someone
installs multiple extensions that calculate a queryid (which would be silly but
still), or loads another one at runtime.

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#132)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Mar 19, 2021 at 10:35:21PM +0800, Julien Rouhaud wrote:

On Fri, Mar 19, 2021 at 02:54:16PM +0100, Hannu Krosing wrote:

On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian <bruce@momjian.us> wrote:
The log-spam could be mitigated by logging it just once per connection
the first time it is overridden

Yes, but it might still generate a significant amount of additional lines.

If extension authors follow the recommendations and only calculate a queryid
when compute_query_id is off, it should be easy to check that you have
everything set up properly.

Seems extensions that want to generate their own query id should just
error out with a message to the log file if compute_query_id is set ---
that should fix the entire issue --- but see below.

Also, we could ask the extensions to expose the "method name" in a read-only GUC

so one can do

SHOW compute_query_id_method;

and get the name of the method in use

compute_query_id_method
------------------------------------
builtin

And it may even dynamically change to indicate the overriding of builtin

compute_query_id_method
---------------------------------------------------
fancy_compute_query_id (overrides builtin)

This could be nice, but I'm not sure that it would work well if someone
installs multiple extensions that calculate a queryid (which would be silly but
still), or loads another one at runtime.

Well, given we don't really want to support multiple query id types
being generated or displayed, the "error out" above should fix it.

Let's do this --- tell extensions to error out if the query id is
already set, either by compute_query_id or another extension. If an
extension wants to generate its own query id and store it internal to
the extension, that is fine, but the server-displayed query id should be
generated once and never overwritten by an extension.
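
The rule described above can be sketched as a small standalone C model. This
is only an illustration under stated assumptions, not PostgreSQL source:
ModelQuery, QidResult, fancy_hash and extension_set_query_id are hypothetical
names, the FNV-1a hash merely stands in for whatever algorithm an extension
might use, and a real extension would presumably report the failure through
the server's error machinery rather than a return code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a parsed query; NOT the real Query node. */
typedef struct ModelQuery
{
	uint64_t	queryId;		/* 0 means "no query id computed yet" */
} ModelQuery;

/* Outcome of an extension's attempt to set the query id (hypothetical). */
typedef enum QidResult
{
	QID_OK = 0,
	QID_ERR_CORE_ENABLED,		/* compute_query_id is on: core owns the id */
	QID_ERR_ALREADY_SET			/* some other source already set the id */
} QidResult;

/* Placeholder algorithm (64-bit FNV-1a); any extension-specific hash would do. */
static uint64_t
fancy_hash(const char *text)
{
	uint64_t	h = 14695981039346656037ULL;
	const unsigned char *p;

	for (p = (const unsigned char *) text; *p != '\0'; p++)
	{
		h ^= (uint64_t) *p;
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Set the query id only if nothing else is responsible for it.  The id is
 * generated once and never overwritten.
 */
QidResult
extension_set_query_id(ModelQuery *query, bool compute_query_id,
					   const char *query_text)
{
	uint64_t	id;

	assert(query != NULL && query_text != NULL);

	if (compute_query_id)
		return QID_ERR_CORE_ENABLED;
	if (query->queryId != 0)
		return QID_ERR_ALREADY_SET;

	id = fancy_hash(query_text);
	if (id == 0)
		id = 1;					/* keep 0 reserved for "not computed" */
	query->queryId = id;
	return QID_OK;
}
```

With compute_query_id modelled as true the call refuses to touch the id; with
it false the first caller sets the id and any later caller gets
QID_ERR_ALREADY_SET, so only one source of query ids is ever visible.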

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#131)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Mar 19, 2021 at 10:27:51PM +0800, Julien Rouhaud wrote:

On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote:

OK, that makes perfect sense. I think the best solution is to document
that compute_query_id just controls the built-in computation of the
query id, and that extensions can also compute it if this is off, and
pg_stat_activity and log_line_prefix will display built-in or extension
computed query ids.

So the last version of the patch should implement that behavior right? It's
just missing some explicit guidance that third-party extensions should only
calculate a queryid if compute_query_id is off

Yes, I think we are now down to just how the extensions should be told
to behave, and how we document this --- see the email I just sent.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

hannuk@google.com

almost 5 years ago

In reply to: Bruce Momjian (#134)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

It would be really convenient if user-visible serialisations of the query
id had something that identifies the computation method.

maybe prefix 'N' for internal, 'S' for pg_stat_statements etc.

This would immediately show in logs at what point the id calculator was
changed
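
A rough sketch of what such a tagged serialisation could look like, purely to
illustrate the idea and not something the patch implements. The function name
format_tagged_query_id is hypothetical; the tag characters 'N' and 'S' are the
ones suggested above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write "<tag><query id>" into buf, e.g. 'N' for the internal method or
 * 'S' for pg_stat_statements.  Returns what snprintf returns: the number
 * of characters needed, excluding the terminating NUL.
 */
int
format_tagged_query_id(char *buf, size_t buflen, char method_tag,
					   uint64_t query_id)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "%c%" PRIu64, method_tag, query_id);
}
```

The cost of this scheme is that the user-visible value is no longer a plain
integer, so anything that stores or compares the id numerically would have to
strip the tag first.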

On Fri, Mar 19, 2021 at 5:54 PM Bruce Momjian <bruce@momjian.us> wrote:

On Fri, Mar 19, 2021 at 10:27:51PM +0800, Julien Rouhaud wrote:

On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote:

OK, that makes perfect sense. I think the best solution is to document
that compute_query_id just controls the built-in computation of the
query id, and that extensions can also compute it if this is off, and
pg_stat_activity and log_line_prefix will display built-in or extension
computed query ids.

So the last version of the patch should implement that behavior right? It's
just missing some explicit guidance that third-party extensions should only
calculate a queryid if compute_query_id is off

Yes, I think we are now down to just how the extensions should be told
to behave, and how we document this --- see the email I just sent.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

bruce@momjian.us

almost 5 years ago

In reply to: Hannu Krosing (#135)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Sat, Mar 20, 2021 at 01:03:16AM +0100, Hannu Krosing wrote:

It would be really convenient if user-visible serialisations of the query id
had something that identifies the computation method.

maybe prefix 'N' for internal, 'S' for pg_stat_statements etc.

This would immediately show in logs at what point the id calculator was changed

Yeah, but it is an integer, and I don't think we want to change that.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#136)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Mar 19, 2021 at 08:10:54PM -0400, Bruce Momjian wrote:

On Sat, Mar 20, 2021 at 01:03:16AM +0100, Hannu Krosing wrote:

It would be really convenient if user-visible serialisations of the query id
had something that identifies the computation method.

maybe prefix 'N' for internal, 'S' for pg_stat_statements etc.

This would immediately show in logs at what point the id calculator was changed

Yeah, but it is an integer, and I don't think we want to change that.

Also, with Bruce's approach of asking extensions to error out if they would
overwrite a queryid, the only way to change the calculation method is a restart.
So only one source can exist in the system.

Hopefully that's a big enough hammer that administrators will know what method
they're using.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#133)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote:

Well, given we don't really want to support multiple query id types
being generated or displayed, the "error out" above should fix it.

Let's do this --- tell extensions to error out if the query id is
already set, either by compute_query_id or another extension. If an
extension wants to generate its own query id and store it internal to
the extension, that is fine, but the server-displayed query id should be
generated once and never overwritten by an extension.

Agreed, this will ensure that you won't dynamically change the queryid source.

We should also document that changing it requires a restart and calling
pg_stat_statements_reset() afterwards.

v19 adds some changes, plus extra documentation for pg_stat_statements about
the requirement for a queryid to be calculated, and a note that all documented
details only apply for in-core source. I'm not sure if this is still the best
place to document those details anymore though.

Attachments:

v19-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-diff; charset=us-ascii)

From bcc76fdcff0ac867b087706a14141ceadcb371bf Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 14 Oct 2020 02:11:37 +0800
Subject: [PATCH v19 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's therefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using a third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query identifier was already calculated.

Author: Julien Rouhaud
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  26 +
 doc/src/sgml/pgstatstatements.sgml            |  20 +-
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 13 files changed, 996 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..498f2aa376 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * As of Postgres 9.2, this module normalizes query entries.  As of Postgres
+ * 14, the normalization is done by the core if compute_query_id is enabled,
+ * or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -342,7 +291,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -364,7 +314,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -380,16 +330,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -851,15 +794,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -879,35 +817,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling were able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -917,7 +834,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1267,7 +1184,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2627,678 +2544,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3319,7 +2564,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3426,10 +2671,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3443,7 +2688,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3523,13 +2768,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..e47b26040f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_query_id = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ee4925d6d9..176d448798 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7561,6 +7561,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-query-id" xreflabel="compute_query_id">
+      <term><varname>compute_query_id</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_query_id</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables in-core query identifier computation.  The
+        <xref linkend="pgstatstatements"/> extension requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in-core query identifier computation
+        method doesn't suit your needs.  In this case, in-core
+        computation must be disabled.  The default is <literal>off</literal>.
+       </para>
+       <note>
+        <para>
+         To ensure that a single, well-defined source of query identifiers
+         is used, extensions that calculate a query identifier should
+         throw an error if this parameter is <literal>on</literal> or if a
+         query identifier was already calculated.
+        </para>
+       </note>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 464bf0e5ae..b242568041 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -20,6 +20,14 @@
   This means that a server restart is needed to add or remove the module.
  </para>
 
+ <para>
+  The module won't track any statistics unless query identifiers are
+  calculated.  This can be done by enabling <xref
+  linkend="guc-compute-query-id"/> or using a third-party module.  Note that
+  all statistics tracked by this module must be reset if the query identifier
+  calculation method is changed.
+ </para>
+
  <para>
    When <filename>pg_stat_statements</filename> is loaded, it tracks
    statistics across all databases of the server.  To access and manipulate
@@ -84,7 +92,7 @@
        <structfield>queryid</structfield> <type>bigint</type>
       </para>
       <para>
-       Internal hash code, computed from the statement's parse tree
+       Hash code to identify identical normalized queries.
       </para></entry>
      </row>
 
@@ -386,6 +394,16 @@
    are compared strictly on the basis of their textual query strings, however.
   </para>
 
+  <note>
+   <para>
+    All the following details about constant replacement and
+    <structfield>queryid</structfield> only apply when <xref
+    linkend="guc-compute-query-id"/> is enabled.  If you use an external module
+    instead, you should refer to its documentation for the implications of its
+    <structfield>queryid</structfield> heuristics.
+   </para>
+  </note>
+
   <para>
    When a constant's value has been ignored for purposes of matching the query
    to other queries, the constant is replaced by a parameter symbol, such
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 7149724953..c565c80365 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -46,6 +46,8 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -107,6 +109,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -119,8 +122,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -140,6 +146,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -152,8 +159,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2b1b68109f..7e034b72b1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -665,6 +665,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -683,8 +684,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2964efda96..72a8f6d9ff 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -521,6 +521,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1435,6 +1436,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_query_id,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 86425965d0..01493ed3d4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -595,6 +595,7 @@
 
 # - Monitoring -
 
+#compute_query_id = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..ae84fcac6e
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = clean_querytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (invalid), use a
+	 * queryId of 2 instead; queryId 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 4a3c9686f9..6716db6c13 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 5004ee4177..9b6552b25b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..14087eea43
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *clean_querytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.30.1
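
For readers following the RecordConstLocation() hunk in the patch above, here is a minimal standalone sketch of the same grow-by-doubling bookkeeping. It is only an illustration under stated assumptions: it uses malloc/realloc where the patch uses palloc/repalloc, and LocArray, loc_array_init and loc_array_record are hypothetical names that do not appear in the patch.

```c
#include <stdlib.h>

/* Mirrors the LocationLen struct declared in queryjumble.h by the patch. */
typedef struct LocationLen
{
	int			location;	/* start offset in query text */
	int			length;		/* length in bytes, or -1 to ignore */
} LocationLen;

/* Hypothetical stand-in for the clocations fields of JumbleState. */
typedef struct LocArray
{
	LocationLen *items;
	int			buf_size;	/* allocated number of entries */
	int			count;		/* number of valid entries */
} LocArray;

/* Allocate the array; initial_size must be > 0.  Returns 0, or -1 on OOM. */
static int
loc_array_init(LocArray *arr, int initial_size)
{
	arr->items = malloc(sizeof(LocationLen) * (size_t) initial_size);
	if (arr->items == NULL)
		return -1;
	arr->buf_size = initial_size;
	arr->count = 0;
	return 0;
}

/*
 * Record one constant location.  Negative locations (unknown) are skipped.
 * Returns 0 on success (including the skipped case), or -1 on OOM.
 */
static int
loc_array_record(LocArray *arr, int location)
{
	if (location < 0)
		return 0;

	/* enlarge array if needed, doubling as the patch does */
	if (arr->count >= arr->buf_size)
	{
		int			new_size = arr->buf_size * 2;
		LocationLen *grown = realloc(arr->items,
									 sizeof(LocationLen) * (size_t) new_size);

		if (grown == NULL)
			return -1;
		arr->items = grown;
		arr->buf_size = new_size;
	}

	arr->items[arr->count].location = location;
	/* length is filled in later; -1 marks it as not yet known */
	arr->items[arr->count].length = -1;
	arr->count++;
	return 0;
}
```

Unlike repalloc, realloc can return NULL, which is why this sketch reports failure through a return code instead of relying on the backend's error handling.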

v19-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-diff; charset=us-ascii)

From 977b7c40ea91fd3ec70eafeef02c143b46b46225 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v19 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in log_line_prefix, which
will also only expose top-level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Tatsuro Yamada, Torikoshi Atsushi, Bruce Momjian
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
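
Not part of the commit message: a small standalone sketch of the "top-level only" rule that pgstat_report_queryid() applies in this patch, for anyone reviewing the reporting logic. It is a simplified model under stated assumptions: BackendSlot, slot_start_query and slot_report_queryid are hypothetical names, and the track_activities check and the st_changecount write protocol are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the st_queryid field of PgBackendStatus. */
typedef struct BackendSlot
{
	uint64_t	queryid;	/* 0 means "no identifier reported" */
} BackendSlot;

/* Models the reset performed by pgstat_report_activity(STATE_RUNNING). */
static void
slot_start_query(BackendSlot *slot)
{
	slot->queryid = 0;
}

/*
 * Models pgstat_report_queryid(): once a nonzero identifier is stored,
 * later non-forced reports (nested statements) are ignored.  A forced
 * call always overwrites, which is how the identifier gets reset.
 */
static void
slot_report_queryid(BackendSlot *slot, uint64_t queryid, bool force)
{
	if (slot->queryid != 0 && !force)
		return;

	slot->queryid = queryid;
}
```

With this model, a nested statement reporting its own identifier after the top-level one leaves the stored value unchanged, while a forced report of 0 (as exec_simple_query does in the patch) clears it for the next statement.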
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  21 +++-
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   8 ++
 src/backend/executor/execParallel.c           |  14 ++-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |   9 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  29 +++--
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/include/utils/queryjumble.h               |   2 +-
 src/test/regress/expected/rules.out           |   9 +-
 20 files changed, 219 insertions(+), 106 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 498f2aa376..d9fdcf2241 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * True for utility statements other than EXECUTE, PREPARE and DEALLOCATE,
+ * which pgss_ProcessUtility and pgss_post_parse_analyze ignore.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for utility statements related to prepared statements, as
+	 * those will inherit the underlying statement's queryId (except
+	 * DEALLOCATE, which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely event that
+	 * the user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1071,9 +1094,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1128,7 +1149,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1151,23 +1172,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1198,52 +1208,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_query_id isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 176d448798..cfcd38b70b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6943,6 +6943,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>queryid: identifier of the session's current query.
+             By default, query identifiers are not computed, so this field is
+             always zero unless the <xref linkend="guc-compute-query-id"/>
+             parameter is enabled or a third-party module that computes query
+             identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7419,8 +7428,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7569,8 +7578,12 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables or disables in core query identifier computation.  The
-        <xref linkend="pgstatstatements"/> extension requires a query
+        Enables or disables in core query identifier computation.  A query
+        identifier can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query
         identifier to be computed.  Note that an external module can
         alternatively be used if the in core query identifier computation
         specification doesn't suit your need.  In this case, in core
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index db4b4e460c..1c80265ccf 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -910,6 +910,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal>, this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of the last query that was executed.  By
+      default, query identifiers are not computed, so this field is always
+      null unless the <xref linkend="guc-compute-query-id"/> parameter is
+      enabled or a third-party module that computes query identifiers is
+      configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dca65dc7b..012d86217f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -764,6 +764,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0648dd82ba..e39cf20161 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -128,6 +129,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/* In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..26f1994a31 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -578,7 +578,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -620,7 +621,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1403,8 +1404,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..04c860f678 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..32f74e8c23 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c565c80365..d125ef7f98 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parse_type.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -130,6 +131,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -167,6 +170,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 208a33692f..2419a2b003 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3381,6 +3381,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3435,6 +3436,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3445,6 +3454,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5178,6 +5229,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7e034b72b1..d66cee79f0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -692,6 +692,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -910,6 +912,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1026,6 +1029,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5102227a60..8e81eef8cb 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -569,7 +569,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	29
+#define PG_STAT_GET_ACTIVITY_COLS	30
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -914,6 +914,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[27] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[29] = true;
+			else
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[26] = true;
 			nulls[27] = true;
 			nulls[28] = true;
+			nulls[29] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e729ebece7..7aa484c5ed 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,7 +77,6 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2685,6 +2684,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 01493ed3d4..47d6e2019b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -542,6 +542,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index ae84fcac6e..b0a5731ef7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -53,7 +53,7 @@ static void RecordConstLocation(JumbleState *jstate, int location);
  * relevant part of the string.
  */
 const char *
-clean_querytext(const char *query, int *location, int *len)
+CleanQuerytext(const char *query, int *location, int *len)
 {
 	int query_location = *location;
 	int query_len = *len;
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = clean_querytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e259531f60..9550de0798 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5249,9 +5249,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..e0e08e0b27 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be43c04802..09d36a1e23 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1263,6 +1263,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1457,6 +1460,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1465,6 +1469,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 14087eea43..520cd4f43e 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,7 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
-const char *clean_querytext(const char *query, int *location, int *len);
+const char *CleanQuerytext(const char *query, int *location, int *len);
 JumbleState *JumbleQuery(Query *query, const char *querytext);
 
 #endif							/* QUERYJUMBLE_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b12cc122a..ff3506d5d7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1762,9 +1762,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1876,7 +1877,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2046,7 +2047,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2076,7 +2077,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.30.1
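
A note on the compute_utility_queryid() change in the patch above: the function now receives the whole source string plus the statement's location and length, and narrows to the relevant statement itself through CleanQuerytext(). The body of CleanQuerytext() is not shown in these hunks, so the following is only a simplified standalone sketch of that narrowing step. The name demo_clean_querytext, the use of libc isspace(), and the "length <= 0 means rest of string" convention are assumptions made for illustration.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for CleanQuerytext(): narrow
 * (*location, *len) to one statement inside a multi-statement source
 * string and return a pointer to its first non-blank byte.
 */
static const char *
demo_clean_querytext(const char *query, int *location, int *len)
{
    int query_location = *location;
    int query_len = *len;

    if (query_location >= 0)
    {
        query += query_location;
        /* Assumption: a length <= 0 means "rest of the string". */
        if (query_len <= 0)
            query_len = (int) strlen(query);
    }
    else
    {
        /* Unknown location: fall back to the whole string. */
        query_location = 0;
        query_len = (int) strlen(query);
    }

    /* Trim leading and trailing whitespace from the selected span. */
    while (query_len > 0 && isspace((unsigned char) query[0]))
    {
        query++;
        query_location++;
        query_len--;
    }
    while (query_len > 0 && isspace((unsigned char) query[query_len - 1]))
        query_len--;

    *location = query_location;
    *len = query_len;
    return query;
}
```

Hashing only the returned span (rather than the full source text) is what lets the same utility statement get the same identifier whether it is sent alone or as part of a multi-statement string.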

v19-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-diff; charset=us-ascii)

From 8430936be7427a628ca7e2bd1d3d15b3f17424be Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v19 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_query_id or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=D4LuFA@mail.gmail.com
---
 doc/src/sgml/config.sgml              |  6 +++---
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out | 11 ++++++++++-
 src/test/regress/sql/explain.sql      |  5 ++++-
 5 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cfcd38b70b..6239bf1d10 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7581,9 +7581,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables or disables in core query identifier computation.  A query
         identifier can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
         identifier to be computed.  Note that an external module can
         alternatively be used if the in core query identifier computation
         specification doesn't suit your need.  In this case, in core
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..135dff6d3d 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed, see <xref
+      linkend="guc-compute-query-id"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..9794c4e794 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_query_id)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -598,6 +608,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index 791eba8511..1f8a3ead52 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -17,7 +17,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -470,3 +470,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index f2eab030d6..468caf4037 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -19,7 +19,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -103,3 +103,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.30.1
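
A note on the explain_filter regex change in the patch above: the patch prints the uint64 queryId through pg_lltoa(), which takes a signed 64-bit value, so an identifier with its high bit set shows up as a negative number. That is presumably why the test's regex gained a leading "-?". A minimal standalone sketch of the effect, using snprintf in place of pg_lltoa (format_query_id is a hypothetical helper, not part of the patch):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: render a 64-bit query identifier the way a signed
 * formatter would, i.e. after reinterpreting it as int64_t.  The
 * unsigned-to-signed conversion wraps on two's-complement platforms,
 * which covers every platform in common use.
 */
static int
format_query_id(uint64_t query_id, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%" PRId64, (int64_t) query_id);
}
```

Roughly half of all 64-bit hash values have the high bit set, so a filter that only matched unsigned digit runs would leave a stray minus sign in the expected output about half the time.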

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#138)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Sat, Mar 20, 2021 at 02:12:34PM +0800, Julien Rouhaud wrote:

On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote:

Well, given we don't really want to support multiple query id types
being generated or displayed, the "error out" above should fix it.

Let's do this --- tell extensions to error out if the query id is
already set, either by compute_query_id or another extension. If an
extension wants to generate its own query id and store it internal to
the extension, that is fine, but the server-displayed query id should be
generated once and never overwritten by an extension.

Agreed, this will ensure that you won't dynamically change the queryid source.

We should also document that changing it requires a restart and calling
pg_stat_statements_reset() afterwards.

v19 adds some changes, plus extra documentation for pg_stat_statements about
the requirement for a queryid to be calculated, and a note that all documented
details only apply for in-core source. I'm not sure if this is still the best
place to document those details anymore though.

OK, after reading the entire thread, I don't think there are any
remaining open issues with this patch and I think this is ready for
committing. I have adjusted the doc section of the patches, attached.
I have marked myself as committer in the commitfest app and hope to
apply it in the next few days based on feedback.
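
To make the "generated once and never overwritten" rule quoted above concrete, here is a minimal standalone sketch. DemoQuery and demo_set_query_id are hypothetical names used only for illustration; a real extension would work on the Query node from its post_parse_analyze_hook and raise an error rather than return false. The zero-to-one remapping mirrors the convention in the pg_stat_statements code, where zero means "no identifier".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the queryId field of a Query node. */
typedef struct DemoQuery
{
    uint64_t queryId;   /* 0 means "not computed" */
} DemoQuery;

/*
 * Store new_id unless an identifier is already present.  Returns false
 * when one was already set; a real extension would error out at that
 * point instead of silently overwriting it.
 */
static bool
demo_set_query_id(DemoQuery *query, uint64_t new_id)
{
    if (query->queryId != 0)
        return false;

    /* Never store zero, since zero is reserved for "no identifier". */
    query->queryId = (new_id != 0) ? new_id : 1;
    return true;
}
```

With this shape, whichever source runs first (compute_query_id or an extension) wins, and any later attempt is detected instead of changing the identifier that the server displays.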

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

Attachments:

qid-01-jumble_over_master.diff (text/x-diff; charset=us-ascii)

From 5ab783123fc11e948963de7a7c3e6428051e3315 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH] qid-01-jumble_over_master squash commit

---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  25 +
 doc/src/sgml/pgstatstatements.sgml            |  20 +-
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c (new)    | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h (new)         |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..bd8c96728c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -342,7 +291,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -364,7 +314,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -380,16 +330,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -851,15 +794,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -879,35 +817,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling were able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -917,7 +834,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1267,7 +1184,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2627,678 +2544,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3319,7 +2564,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3426,10 +2671,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3443,7 +2688,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3523,13 +2768,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
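For readers following the pgssLocationLen to LocationLen rename above, here is a minimal standalone sketch of how a sorted list of constant locations can drive the "$n" replacement that the normalization code performs. It is not the patch's generate_normalized_query(): DemoLocationLen, demo_comp_location and demo_normalize are hypothetical names, the constant lengths are assumed to be already known (the real code fills them in with the lexer), and malloc is used in place of palloc so that it compiles outside the server.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the patch's LocationLen; field names follow the diff. */
typedef struct DemoLocationLen
{
	int			location;	/* byte offset of the constant in the query */
	int			length;		/* length of the constant in bytes */
} DemoLocationLen;

/* Same ordering rule as comp_location(): sort by start offset. */
static int
demo_comp_location(const void *a, const void *b)
{
	int			l = ((const DemoLocationLen *) a)->location;
	int			r = ((const DemoLocationLen *) b)->location;

	return (l > r) - (l < r);
}

/*
 * Hypothetical helper: return a malloc'd copy of "query" in which each
 * listed constant is replaced by $1, $2, ... in order of appearance.
 * Returns NULL if memory cannot be allocated.
 */
static char *
demo_normalize(const char *query, DemoLocationLen *locs, int nlocs)
{
	size_t		qlen = strlen(query);
	size_t		src = 0;
	size_t		dst = 0;
	/* each "$n" needs at most 1 + 10 bytes for a 32-bit n */
	char	   *out = malloc(qlen + (size_t) nlocs * 11 + 1);

	if (out == NULL)
		return NULL;

	if (nlocs > 1)
		qsort(locs, (size_t) nlocs, sizeof(DemoLocationLen),
			  demo_comp_location);

	for (int i = 0; i < nlocs; i++)
	{
		size_t		off;

		assert(locs[i].location >= 0 && locs[i].length >= 0);
		off = (size_t) locs[i].location;
		assert(off >= src && off + (size_t) locs[i].length <= qlen);

		/* copy the text between the previous constant and this one */
		memcpy(out + dst, query + src, off - src);
		dst += off - src;
		/* emit the parameter symbol instead of the constant */
		dst += (size_t) sprintf(out + dst, "$%d", i + 1);
		src = off + (size_t) locs[i].length;
	}

	/* copy whatever follows the last constant */
	memcpy(out + dst, query + src, qlen - src);
	dst += qlen - src;
	out[dst] = '\0';
	return out;
}
```

Passing the locations out of order is deliberate in a caller's test: the qsort step is what makes the numbering follow the order of appearance in the query text.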
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..e47b26040f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_query_id = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5679b40dd5..89f7daf11f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7561,6 +7561,31 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-query-id" xreflabel="compute_query_id">
+      <term><varname>compute_query_id</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_query_id</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables in-core computation of a query identifier.  The <xref
+        linkend="pgstatstatements"/> extension requires a query identifier
+        to be computed.  Note that an external module can alternatively
+        be used if the in-core query identifier computation method
+        isn't acceptable.  In this case, in-core computation should
+        remain disabled.  The default is <literal>off</literal>.
+       </para>
+       <note>
+        <para>
+         To ensure that only one query identifier is calculated and
+         displayed, extensions that calculate query identifiers should
+         throw an error if a query identifier has already been computed.
+        </para>
+       </note>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 464bf0e5ae..3ca292d71f 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -20,6 +20,14 @@
   This means that a server restart is needed to add or remove the module.
  </para>
 
+ <para>
+  The module will not track statistics unless query
+  identifiers are calculated.  This can be done by enabling <xref
+  linkend="guc-compute-query-id"/> or using a third-party module that
+  computes its own query identifiers.  Note that all statistics tracked
+  by this module must be reset if the query identifier method is changed.
+ </para>
+
  <para>
    When <filename>pg_stat_statements</filename> is loaded, it tracks
    statistics across all databases of the server.  To access and manipulate
@@ -84,7 +92,7 @@
        <structfield>queryid</structfield> <type>bigint</type>
       </para>
       <para>
-       Internal hash code, computed from the statement's parse tree
+       Hash code to identify identical normalized queries.
       </para></entry>
      </row>
 
@@ -386,6 +394,16 @@
    are compared strictly on the basis of their textual query strings, however.
   </para>
 
+  <note>
+   <para>
+    The following details about constant replacement and
+    <structfield>queryid</structfield> only apply when <xref
+    linkend="guc-compute-query-id"/> is enabled.  If you use an external
+    module instead to compute <structfield>queryid</structfield>, you
+    should refer to its documentation for details.
+   </para>
+  </note>
+
   <para>
    When a constant's value has been ignored for purposes of matching the query
    to other queries, the constant is replaced by a parameter symbol, such
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 7149724953..c565c80365 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -46,6 +46,8 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -107,6 +109,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -119,8 +122,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -140,6 +146,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -152,8 +159,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2b1b68109f..7e034b72b1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -665,6 +665,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -683,8 +684,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
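With the three call sites above now passing a JumbleState to post_parse_analyze_hook, a hook has to cope with jstate being NULL, which in this patch happens when compute_query_id is off and also for utility statements. The following is only a sketch of one way a hook could branch on that. DemoQuery, DemoJumbleState, DemoAction and demo_post_parse_analyze are hypothetical stand-ins so that it compiles without the server headers, and the policy it encodes is an assumption, not what pg_stat_statements does in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the server structs; only the fields used here. */
typedef struct DemoQuery
{
	uint64_t	queryId;		/* 0 means no identifier was computed */
} DemoQuery;

typedef struct DemoJumbleState
{
	int			clocations_count;	/* number of constants recorded */
} DemoJumbleState;

typedef enum DemoAction
{
	DEMO_SKIP,					/* nothing to track */
	DEMO_TRACK_AS_IS,			/* track with the original query text */
	DEMO_TRACK_NORMALIZED		/* track with constants replaced */
} DemoAction;

/*
 * Hypothetical hook body: decide what to do with a freshly analyzed query,
 * given the (possibly NULL) jumble state passed by the caller.
 */
static DemoAction
demo_post_parse_analyze(const DemoQuery *query, const DemoJumbleState *jstate)
{
	assert(query != NULL);

	/* No identifier at all: in-core computation off, no other module. */
	if (query->queryId == 0)
		return DEMO_SKIP;

	/* Identifier present but no jumble state, e.g. a utility statement. */
	if (jstate == NULL)
		return DEMO_TRACK_AS_IS;

	/* Only queries with recorded constants need a normalized text. */
	return jstate->clocations_count > 0 ?
		DEMO_TRACK_NORMALIZED : DEMO_TRACK_AS_IS;
}
```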
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3b36a31a47..8f5aa0ced7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -521,6 +521,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1435,6 +1436,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_query_id,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 86425965d0..01493ed3d4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -595,6 +595,7 @@
 
 # - Monitoring -
 
+#compute_query_id = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
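The new queryjumble.c in this patch keeps its serialization in a fixed buffer (JUMBLE_SIZE, 1024 bytes) and, whenever the buffer is full and more data arrives, replaces the contents with their 64-bit hash before continuing. A standalone sketch of that folding scheme may help when reviewing AppendJumble(). The assumptions are loud ones: the buffer is shrunk to 16 bytes to make the behaviour easy to observe, FNV-1a is swapped in for hash_any_extended(), and DemoJumble, demo_hash and demo_append are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_JUMBLE_SIZE 16		/* the patch uses 1024 */

typedef struct DemoJumble
{
	unsigned char buf[DEMO_JUMBLE_SIZE];
	size_t		len;			/* bytes of buf currently in use */
} DemoJumble;

/* 64-bit FNV-1a, a stand-in for the server's hash_any_extended(). */
static uint64_t
demo_hash(const unsigned char *p, size_t n)
{
	uint64_t	h = 14695981039346656037ULL;

	for (size_t i = 0; i < n; i++)
	{
		h ^= p[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Append "size" bytes to the jumble.  When the buffer is full and there is
 * still data to add, fold the buffer into its hash and keep going, so the
 * hash summarizes everything appended so far.
 */
static void
demo_append(DemoJumble *j, const unsigned char *item, size_t size)
{
	assert(j != NULL && j->len <= DEMO_JUMBLE_SIZE);

	while (size > 0)
	{
		size_t		room;
		size_t		part;

		if (j->len >= DEMO_JUMBLE_SIZE)
		{
			uint64_t	h = demo_hash(j->buf, DEMO_JUMBLE_SIZE);

			memcpy(j->buf, &h, sizeof(h));
			j->len = sizeof(h);
		}
		room = DEMO_JUMBLE_SIZE - j->len;
		part = size < room ? size : room;
		memcpy(j->buf + j->len, item, part);
		j->len += part;
		item += part;
		size -= part;
	}
}
```

Because folding only happens when the buffer is full and more bytes are pending, the final buffer depends on the byte stream alone and not on how the caller split it into calls.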
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..ae84fcac6e
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+clean_querytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = clean_querytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (invalid), use a
+	 * queryId of 2 instead; queryId 1 is already used as the fallback for
+	 * normal statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 4a3c9686f9..6716db6c13 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 5004ee4177..9b6552b25b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..14087eea43
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *clean_querytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.20.1
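
For readers skimming the patch above, here is a minimal standalone sketch of the jumbling idea: walk the tree, append each node's tag and significant fields to a buffer, skip literal values while remembering where they appeared, then hash the buffer. Everything named Toy* or toy_* is hypothetical and exists only for this illustration. The hash is FNV-1a, swapped in so the example is self-contained; it is not the hash function the patch uses. A fixed-size buffer guarded by assertions stands in for the real buffer handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for PostgreSQL's node types. */
typedef enum ToyTag
{
	TOY_VAR = 1,
	TOY_CONST = 2,
	TOY_OPEXPR = 3
} ToyTag;

typedef struct ToyNode
{
	ToyTag		tag;
	int			ident;		/* column number, type id, or operator id */
	long		value;		/* literal value; meaningful for TOY_CONST only */
	int			location;	/* offset of a literal in the query text, or -1 */
	struct ToyNode *left;
	struct ToyNode *right;
} ToyNode;

typedef struct ToyJumble
{
	unsigned char buf[256];	/* serialized significant fields */
	size_t		len;
	int			locations[16];	/* where constants were seen */
	int			nlocations;
} ToyJumble;

/* Append raw bytes to the jumble buffer (toy version: no overflow handling). */
static void
toy_append(ToyJumble *js, const void *item, size_t size)
{
	assert(js->len + size <= sizeof(js->buf));
	memcpy(js->buf + js->len, item, size);
	js->len += size;
}

/* Emit the node tag, then its significant fields, then recurse. */
static void
toy_jumble(ToyJumble *js, const ToyNode *node)
{
	if (node == NULL)
		return;

	toy_append(js, &node->tag, sizeof(node->tag));

	switch (node->tag)
	{
		case TOY_VAR:
			toy_append(js, &node->ident, sizeof(node->ident));
			break;
		case TOY_CONST:
			/* Jumble only the constant's type, never its value. */
			toy_append(js, &node->ident, sizeof(node->ident));
			if (node->location >= 0)
			{
				assert(js->nlocations < 16);
				js->locations[js->nlocations++] = node->location;
			}
			break;
		case TOY_OPEXPR:
			toy_append(js, &node->ident, sizeof(node->ident));
			toy_jumble(js, node->left);
			toy_jumble(js, node->right);
			break;
	}
}

/* Reset the state, jumble the tree, and hash the buffer with FNV-1a. */
static uint64_t
toy_fingerprint(const ToyNode *root, ToyJumble *js)
{
	uint64_t	hash = UINT64_C(0xcbf29ce484222325);
	size_t		i;

	memset(js, 0, sizeof(*js));
	toy_jumble(js, root);

	for (i = 0; i < js->len; i++)
	{
		hash ^= js->buf[i];
		hash *= UINT64_C(0x100000001b3);
	}
	return hash;
}
```

With this sketch, two trees that differ only in a literal's value (say "col = 1" and "col = 2") produce the same fingerprint, while changing the literal's type id changes it. The recorded locations are what a normalizer would later use to replace literals in the query text.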

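RecordConstLocation() in the patch above grows its array by doubling whenever it fills up, and skips locations that are negative. A rough standalone sketch of that pattern follows, using plain malloc/realloc instead of PostgreSQL's memory-context allocator. The ToyLocation* names and the abort-on-failure handling are assumptions made for this example only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counterpart of LocationLen, for illustration only. */
typedef struct ToyLocation
{
	int			location;	/* start offset in query text */
	int			length;		/* length in bytes, or -1 if not yet known */
} ToyLocation;

typedef struct ToyLocations
{
	ToyLocation *items;
	int			capacity;	/* allocated number of entries */
	int			count;		/* number of valid entries */
} ToyLocations;

/* Allocate the initial array; capacity must be positive. */
static void
toy_locations_init(ToyLocations *locs, int capacity)
{
	assert(capacity > 0);
	locs->items = malloc((size_t) capacity * sizeof(ToyLocation));
	if (locs->items == NULL)
		abort();
	locs->capacity = capacity;
	locs->count = 0;
}

/* Record one constant location, doubling the array when it is full. */
static void
toy_record_location(ToyLocations *locs, int location)
{
	/* A negative location means unknown or undefined; nothing to record. */
	if (location < 0)
		return;

	if (locs->count >= locs->capacity)
	{
		ToyLocation *grown;

		grown = realloc(locs->items,
						(size_t) locs->capacity * 2 * sizeof(ToyLocation));
		if (grown == NULL)
			abort();
		locs->items = grown;
		locs->capacity *= 2;
	}

	locs->items[locs->count].location = location;
	/* Lengths start at -1 and are filled in later by the normalizer. */
	locs->items[locs->count].length = -1;
	locs->count++;
}
```

Doubling keeps the amortized cost of each append constant, which matters little for typical queries but avoids repeated reallocation for statements with very long literal lists.
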
qid-02-display_over_qid-01-jumble.diff (text/x-diff; charset=us-ascii)

From 601b81a4bdce056e7533521072f9a14e7257e541 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:23 -0400
Subject: [PATCH] qid-02-display_over_qid-01-jumble squash commit

---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  29 +++--
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   9 ++
 src/backend/executor/execParallel.c           |  14 ++-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |   9 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  29 +++--
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/include/utils/queryjumble.h               |   2 +-
 src/test/regress/expected/rules.out           |   9 +-
 20 files changed, 224 insertions(+), 110 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index bd8c96728c..f62b9a2bfd 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignore.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for utility statements related to prepared statements, as
+	 * those will inherit the underlying statement's queryId (except DEALLOCATE,
+	 * which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely case that the
+	 * user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1071,9 +1094,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1128,7 +1149,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1151,23 +1172,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1198,52 +1208,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_query_id isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 89f7daf11f..a3034beddc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6943,6 +6943,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>query identifier of the current query.  Query
+             identifiers are not computed by default, so this field
+             will be zero unless the <xref linkend="guc-compute-query-id"/>
+             parameter is enabled or a third-party module that computes
+             query identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7419,8 +7428,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7569,12 +7578,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables in-core computation of a query identifier.  The <xref
-        linkend="pgstatstatements"/> extension requires a query identifier
-        to be computed.  Note that an external module can alternatively
-        be used if the in-core query identifier computation method
-        isn't acceptable.  In this case, in-core computation should
-        remain disabled.  The default is <literal>off</literal>.
+        Enables in-core computation of a query identifier.
+        Query identifiers can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in-core query identifier computation
+        specification isn't acceptable.  In this case, in-core computation
+        must be disabled.  The default is <literal>off</literal>.
        </para>
        <note>
         <para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index db4b4e460c..c2ef473cc5 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -910,6 +910,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this
+      field shows the identifier of the currently executing query. In
+      all other states, it shows the identifier of the last query that was
+      executed.  Query identifiers are not computed by default so this
+      field will be null unless the <xref linkend="guc-compute-query-id"/>
+      parameter is enabled or a third-party module that computes query
+      identifiers is configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dca65dc7b..012d86217f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -764,6 +764,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0648dd82ba..2d1c7690cb 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -128,6 +129,14 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/*
+	 * In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call
+	 * will be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..26f1994a31 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -578,7 +578,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -620,7 +621,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1403,8 +1404,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..04c860f678 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..32f74e8c23 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c565c80365..d125ef7f98 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parse_type.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -130,6 +131,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -167,6 +170,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 208a33692f..2419a2b003 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3381,6 +3381,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3435,6 +3436,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3445,6 +3454,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5178,6 +5229,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7e034b72b1..d66cee79f0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -692,6 +692,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -910,6 +912,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1026,6 +1029,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5102227a60..8e81eef8cb 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -569,7 +569,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	29
+#define PG_STAT_GET_ACTIVITY_COLS	30
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -914,6 +914,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[27] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[29] = true;
+			else
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[26] = true;
 			nulls[27] = true;
 			nulls[28] = true;
+			nulls[29] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e729ebece7..7aa484c5ed 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,7 +77,6 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2685,6 +2684,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 01493ed3d4..47d6e2019b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -542,6 +542,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index ae84fcac6e..b0a5731ef7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -53,7 +53,7 @@ static void RecordConstLocation(JumbleState *jstate, int location);
  * relevant part of the string.
  */
 const char *
-clean_querytext(const char *query, int *location, int *len)
+CleanQuerytext(const char *query, int *location, int *len)
 {
 	int query_location = *location;
 	int query_len = *len;
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = clean_querytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e259531f60..9550de0798 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5249,9 +5249,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..e0e08e0b27 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be43c04802..09d36a1e23 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1263,6 +1263,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1457,6 +1460,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1465,6 +1469,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 14087eea43..520cd4f43e 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,7 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
-const char *clean_querytext(const char *query, int *location, int *len);
+const char *CleanQuerytext(const char *query, int *location, int *len);
 JumbleState *JumbleQuery(Query *query, const char *querytext);
 
 #endif							/* QUERYJUMBLE_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b12cc122a..ff3506d5d7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1762,9 +1762,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1876,7 +1877,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2046,7 +2047,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2076,7 +2077,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.20.1
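
The "top-level only" rule described in the pgstat_report_queryid() comment in the patch above can be sketched in isolation. This is a minimal stand-alone model, not the patch's code: the model_* names are hypothetical, and the track_activities check and the st_changecount write protocol are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the st_queryid field of PgBackendStatus. */
static uint64_t model_st_queryid = 0;

/*
 * Model of the rule in pgstat_report_queryid(): a nonzero stored identifier
 * means a top-level query is already being reported, so a later non-forced
 * report (for example from a nested statement) is ignored.
 */
static void
model_report_queryid(uint64_t query_id, bool force)
{
	if (model_st_queryid != 0 && !force)
		return;
	model_st_queryid = query_id;
}

/* Model of pgstat_get_my_queryid(). */
static uint64_t
model_get_my_queryid(void)
{
	return model_st_queryid;
}

/* Walk through one top-level statement that runs a nested statement. */
static uint64_t
model_nested_scenario(void)
{
	model_report_queryid(0, true);		/* explicit reset, as in exec_simple_query() */
	assert(model_get_my_queryid() == 0);
	model_report_queryid(1111, false);	/* parse analysis of the top-level query */
	model_report_queryid(2222, false);	/* nested query: ignored */
	return model_get_my_queryid();
}
```

In this model, model_nested_scenario() returns the identifier of the outer statement, which matches the intent stated in the patch comment that only top-level identifiers are reported.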

Attachment: qid-03-explain_over_qid-02-display.diff (text/x-diff; charset=us-ascii)

From 8439ff46a39ae132762a138754567a4d720e34ed Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:24 -0400
Subject: [PATCH] qid-03-explain_over_qid-02-display squash commit

---
 doc/src/sgml/config.sgml              |  6 +++---
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out | 11 ++++++++++-
 src/test/regress/sql/explain.sql      |  5 ++++-
 5 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a3034beddc..518674c6f5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7581,9 +7581,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables in-core computation of a query identifier.
         Query identifiers can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
         identifier to be computed.  Note that an external module can
         alternatively be used if the in-core query identifier computation
         specification isn't acceptable.  In this case, in-core computation
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..135dff6d3d 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed; see <xref
+      linkend="guc-compute-query-id"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..9794c4e794 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_query_id)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -598,6 +608,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index 791eba8511..1f8a3ead52 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -17,7 +17,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -470,3 +470,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index f2eab030d6..468caf4037 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -19,7 +19,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -103,3 +103,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.20.1
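
The explain_filter change from '\m\d+\M' to '-?\m\d+\M' in the patch above is presumably needed because the identifier is an unsigned 64-bit value that gets printed through a signed 64-bit formatter, so it can show up with a leading minus sign. A minimal stand-alone sketch of that effect follows; format_queryid_signed is a hypothetical helper and uses snprintf rather than the pg_lltoa() call in the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit query identifier as a signed 64-bit integer.  Converting a
 * uint64_t above INT64_MAX to int64_t is implementation-defined in C; on
 * two's-complement platforms it wraps to a negative value.
 * Returns the snprintf() result.
 */
static int
format_queryid_signed(uint64_t query_id, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "%" PRId64, (int64_t) query_id);
}
```

Any identifier with its high bit set is rendered as a negative number here, which is why a filter that only replaces '\d+' would leave a stray '-' in the expected output.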

zyu@yugabyte.com

almost 5 years ago

In reply to: Bruce Momjian (#139)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi,
For queryjumble.c :

+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group

The year should be updated.
Same with queryjumble.h

Cheers

On Mon, Mar 22, 2021 at 2:56 PM Bruce Momjian <bruce@momjian.us> wrote:

On Sat, Mar 20, 2021 at 02:12:34PM +0800, Julien Rouhaud wrote:

On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote:

Well, given we don't really want to support multiple query id types
being generated or displayed, the "error out" above should fix it.

Let's do this --- tell extensions to error out if the query id is
already set, either by compute_query_id or another extension. If an
extension wants to generate its own query id and store it internal to
the extension, that is fine, but the server-displayed query id should
be generated once and never overwritten by an extension.

Agreed, this will ensure that you won't dynamically change the queryid
source.

We should also document that changing it requires a restart and calling
pg_stat_statements_reset() afterwards.

v19 adds some changes, plus extra documentation for pg_stat_statements
about the requirement for a queryid to be calculated, and a note that all
documented details only apply for in-core source. I'm not sure if this is
still the best place to document those details anymore though.

OK, after reading the entire thread, I don't think there are any
remaining open issues with this patch and I think this is ready for
committing. I have adjusted the doc section of the patches, attached.
I have marked myself as committer in the commitfest app and hope to
apply it in the next few days based on feedback.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

bruce@momjian.us

almost 5 years ago

In reply to: Zhihong Yu (#140)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote:

Hi,
For queryjumble.c :

+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group

The year should be updated.
Same with queryjumble.h

Thanks, fixed.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#139)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Mar 22, 2021 at 05:55:54PM -0400, Bruce Momjian wrote:

OK, after reading the entire thread, I don't think there are any
remaining open issues with this patch and I think this is ready for
committing. I have adjusted the doc section of the patches, attached.
I have marked myself as committer in the commitfest app and hope to
apply it in the next few days based on feedback.

Thanks a lot Bruce!

I looked at the changes in the attached patches and that's a clear
improvement, thanks a lot for that.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#141)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Mar 22, 2021 at 08:43:40PM -0400, Bruce Momjian wrote:

On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote:

Hi,
For queryjumble.c :

+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group

The year should be updated.
Same with queryjumble.h

Thanks, fixed.

Thanks also for taking care of that. While at it I see that current HEAD has a
lot of files with the same problem:

$ git grep "\-2020"
config/config.guess:# Copyright 1992-2020 Free Software Foundation, Inc.
config/config.guess:Copyright 1992-2020 Free Software Foundation, Inc.
config/config.sub:# Copyright 1992-2020 Free Software Foundation, Inc.
config/config.sub:Copyright 1992-2020 Free Software Foundation, Inc.
contrib/pageinspect/gistfuncs.c: * Copyright (c) 2014-2020, PostgreSQL Global Development Group
src/backend/rewrite/rewriteSearchCycle.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/backend/utils/adt/jsonbsubs.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/bin/pg_archivecleanup/po/de.po:# Copyright (C) 2019-2020 PostgreSQL Global Development Group
src/bin/pg_rewind/po/de.po:# Copyright (C) 2015-2020 PostgreSQL Global Development Group
src/bin/pg_rewind/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2015-2020.
src/common/hex.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/common/sha1.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/common/sha1_int.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/common/hex.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/common/sha1.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/port/pg_iovec.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/rewrite/rewriteSearchCycle.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/interfaces/ecpg/preproc/po/de.po:# Copyright (C) 2009-2020 PostgreSQL Global Development Group
src/interfaces/ecpg/preproc/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2009-2020.

Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for
non .po files?

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#143)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Mar 23, 2021 at 02:36:27PM +0800, Julien Rouhaud wrote:

On Mon, Mar 22, 2021 at 08:43:40PM -0400, Bruce Momjian wrote:

On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote:

Hi,
For queryjumble.c :

+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group

The year should be updated.
Same with queryjumble.h

Thanks, fixed.

Thanks also for taking care of that. While at it I see that current HEAD has a
lot of files with the same problem:

$ git grep "\-2020"
config/config.guess:# Copyright 1992-2020 Free Software Foundation, Inc.
config/config.guess:Copyright 1992-2020 Free Software Foundation, Inc.
config/config.sub:# Copyright 1992-2020 Free Software Foundation, Inc.
config/config.sub:Copyright 1992-2020 Free Software Foundation, Inc.
contrib/pageinspect/gistfuncs.c: * Copyright (c) 2014-2020, PostgreSQL Global Development Group
src/backend/rewrite/rewriteSearchCycle.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/backend/utils/adt/jsonbsubs.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/bin/pg_archivecleanup/po/de.po:# Copyright (C) 2019-2020 PostgreSQL Global Development Group
src/bin/pg_rewind/po/de.po:# Copyright (C) 2015-2020 PostgreSQL Global Development Group
src/bin/pg_rewind/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2015-2020.
src/common/hex.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/common/sha1.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/common/sha1_int.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/common/hex.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/common/sha1.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/port/pg_iovec.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/rewrite/rewriteSearchCycle.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/interfaces/ecpg/preproc/po/de.po:# Copyright (C) 2009-2020 PostgreSQL Global Development Group
src/interfaces/ecpg/preproc/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2009-2020.

Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for
non .po files?

No, I don't think so. We don't change the Free Software Foundation
copyrights, and the .po files get loaded from another repository
occasionally. The hex/sha copyrights came from patches developed in
2020 but committed in 2021. These will mostly be corrected in 2022.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

alvherre@alvh.no-ip.org

almost 5 years ago

In reply to: Bruce Momjian (#139)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021-Mar-22, Bruce Momjian wrote:

--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
the output column list for each node in the plan tree, schema-qualify
table and function names, always label variables in expressions with
their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been compute, see <xref
+      linkend="guc-compute-query-id"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.

Typo here, "has been computed".

Is the intention to commit each of these patches separately?

--
Álvaro Herrera Valdivia, Chile

bruce@momjian.us

almost 5 years ago

In reply to: Alvaro Herrera (#145)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Mar 23, 2021 at 12:12:03PM -0300, Álvaro Herrera wrote:

On 2021-Mar-22, Bruce Momjian wrote:

--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
the output column list for each node in the plan tree, schema-qualify
table and function names, always label variables in expressions with
their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been compute, see <xref
+      linkend="guc-compute-query-id"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.

Typo here, "has been computed".

Good catch, fixed.

Is the intention to commit each of these patches separately?

No, I was thinking of just doing a single commit. Should I do three
commits? I posted it as three patches since that is how it was posted
by the author, and reviewing is easier. It also will need a catversion
bump.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#144)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Mar 23, 2021 at 10:34:38AM -0400, Bruce Momjian wrote:

On Tue, Mar 23, 2021 at 02:36:27PM +0800, Julien Rouhaud wrote:

Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for
non .po files?

No, I don't think so. We don't change the Free Software Foundation
copyrights, and the .po files get loaded from another repository
occasionally. The hex/sha copyrights came from patches developed in
2020 but committed in 2021. These will mostly be corrected in 2022.

Ok, thanks for the clarification!

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#146)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Mar 23, 2021 at 12:27:10PM -0400, Bruce Momjian wrote:

No, I was thinking of just doing a single commit. Should I do three
commits? I posted it as three patches since that is how it was posted
by the author, and reviewing is easier. It also will need a catversion
bump.

Yes, I originally split the commit because it was easier to write this way and
it seemed better to send different patches too to ease review.

I think that it would make sense to commit the first patch separately, but I'm
fine with a single commit if you prefer.

alvherre@alvh.no-ip.org

almost 5 years ago

In reply to: Bruce Momjian (#139)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021-Mar-22, Bruce Momjian wrote:

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e259531f60..9550de0798 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5249,9 +5249,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',

BTW why do you put the queryid column at the end of the column list
here? It seems awkward. Can we put it perhaps between state and query?

-const char *clean_querytext(const char *query, int *location, int *len);
+const char *CleanQuerytext(const char *query, int *location, int *len);
JumbleState *JumbleQuery(Query *query, const char *querytext);

I think pushing in more than one commit is a reasonable approach if they
are well-contained, but if you do that it'd be better to avoid
introducing a function with one name and renaming it in your next
commit.

--
Álvaro Herrera Valdivia, Chile
"Just treat us the way you want to be treated + some extra allowance
for ignorance." (Michael Brusser)

rjuju123@gmail.com

almost 5 years ago

In reply to: Alvaro Herrera (#149)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 24, 2021 at 05:12:35AM -0300, Alvaro Herrera wrote:

On 2021-Mar-22, Bruce Momjian wrote:

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e259531f60..9550de0798 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5249,9 +5249,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',

BTW why do you put the queryid column at the end of the column list
here? It seems awkward. Can we put it perhaps between state and query?

I thought that it would be better to have it at the end as it can always be
NULL (and will be by default), which I guess was also the reason to have
leader_pid there. I'm all in favor of having queryid near the query, and
while at it leader_pid near the pid.

-const char *clean_querytext(const char *query, int *location, int *len);
+const char *CleanQuerytext(const char *query, int *location, int *len);
JumbleState *JumbleQuery(Query *query, const char *querytext);

I think pushing in more than one commit is a reasonable approach if they
are well-contained

They should, as I incrementally built on top of the first one. I also just
double checked the patchset and each new commit compiles and passes the
regression tests.

but if you do that it'd be better to avoid
introducing a function with one name and renaming it in your next
commit.

Oops, I apparently messed up a fixup when working on it. Bruce, should I take
care of that or do you want to? I think you have some local modifications
already, and I'd rather not miss some changes.

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#150)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 24, 2021 at 04:51:40PM +0800, Julien Rouhaud wrote:

but if you do that it'd be better to avoid
introducing a function with one name and renaming it in your next
commit.

Oops, I apparently messed up a fixup when working on it. Bruce, should I take
care of that or do you want to? I think you have some local modifications
already, and I'd rather not miss some changes.

I have no local modifications. Please modify the patch I posted and
repost your version, thanks.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#151)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 24, 2021 at 08:13:40AM -0400, Bruce Momjian wrote:

On Wed, Mar 24, 2021 at 04:51:40PM +0800, Julien Rouhaud wrote:

but if you do that it'd be better to avoid
introducing a function with one name and renaming it in your next
commit.

Oops, I apparently messed up a fixup when working on it. Bruce, should I take
care of that or do you want to? I think you have some local modifications
already, and I'd rather not miss some changes.

I have no local modifications. Please modify the patch I posted and
repost your version, thanks.

Ok! I used the last version of the patch you sent and addressed the following
comments from earlier messages in attached v20:

- copyright year to 2021
- s/has been compute/has been computed/
- use the name CleanQuerytext in the first commit

I didn't change the position of queryid in pg_stat_get_activity(), as the
"real" order is actually defined in system_views.sql when creating the
pg_stat_activity view. Adding the new fields at the end of
pg_stat_get_activity() helps to keep the C code simpler and less bug-prone, so
I think it's best to continue this way.
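
As an illustration only (not code from the patch), the point about ordering
can be sketched in C with a much smaller, hypothetical column set. The
function fills fixed output slots, and a separate table, standing in for the
column list of the view, picks the presentation order. All names below
(FuncCol, view_order, project_row) are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Output slots of a hypothetical set-returning function.  F_QUERYID is
 * appended last, so every pre-existing slot keeps its index.
 */
enum FuncCol { F_PID, F_STATE, F_QUERY, F_LEADER_PID, F_QUERYID, F_NCOLS };

/*
 * Presentation order chosen by the "view".  It is independent of the slot
 * order above, so queryid can sit next to query and leader_pid next to pid.
 */
static const enum FuncCol view_order[] = {
    F_PID, F_LEADER_PID, F_STATE, F_QUERYID, F_QUERY
};
#define VIEW_NCOLS (sizeof(view_order) / sizeof(view_order[0]))

/* Copy one function row into view order. */
static void
project_row(const long func_row[F_NCOLS], long view_row[VIEW_NCOLS])
{
    for (size_t i = 0; i < VIEW_NCOLS; i++)
        view_row[i] = func_row[view_order[i]];
}
```

With this split, adding a column means appending one slot and editing the
order table, without renumbering the slots that existing code already fills.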

I also used the previous commit message if that helps.
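
For readers who do not want to open the first attachment: the jumbling code
it moves to core accumulates bytes in a fixed-size buffer and, whenever the
buffer is full, replaces its contents with a hash of them. The following is a
minimal standalone sketch of that idea, not the patch code itself. It assumes
a stand-in hash (64-bit FNV-1a) instead of hash_any_extended(), and the
Sketch* and sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JUMBLE_SIZE 1024        /* same buffer size as in the patch */

typedef struct SketchJumble
{
    unsigned char jumble[JUMBLE_SIZE];  /* bytes collected so far */
    size_t        jumble_len;           /* number of bytes used */
} SketchJumble;

/* Stand-in hash (64-bit FNV-1a); PostgreSQL uses hash_any_extended(). */
static uint64_t
sketch_hash(const unsigned char *data, size_t len)
{
    uint64_t    h = UINT64_C(0xcbf29ce484222325);

    for (size_t i = 0; i < len; i++)
    {
        h ^= data[i];
        h *= UINT64_C(0x100000001b3);
    }
    return h;
}

/* Append bytes; when the buffer is full, fold it down to its hash. */
static void
sketch_append(SketchJumble *js, const unsigned char *item, size_t size)
{
    while (size > 0)
    {
        size_t      room;
        size_t      part;

        if (js->jumble_len >= JUMBLE_SIZE)
        {
            uint64_t    start_hash = sketch_hash(js->jumble, JUMBLE_SIZE);

            memcpy(js->jumble, &start_hash, sizeof(start_hash));
            js->jumble_len = sizeof(start_hash);
        }
        room = JUMBLE_SIZE - js->jumble_len;
        part = size < room ? size : room;
        memcpy(js->jumble + js->jumble_len, item, part);
        js->jumble_len += part;
        item += part;
        size -= part;
    }
}

/* Final identifier: hash of the buffer, with 0 remapped to 1. */
static uint64_t
sketch_query_id(const SketchJumble *js)
{
    uint64_t    id = sketch_hash(js->jumble, js->jumble_len);

    return id == 0 ? 1 : id;
}
```

Two inputs that append the same bytes in the same order get the same
identifier, which is why the jumble records a constant's type but not its
value.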

Attachments:

v20-0001-Move-pg_stat_statements-query-jumbling-to-core.patchtext/x-diff; charset=us-asciiDownload

From 5df95cb13c3505d654bd480c8978fe6f5eba00bb Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v20 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's therefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using a third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query identifier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  25 +
 doc/src/sgml/pgstatstatements.sgml            |  20 +-
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..bd8c96728c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -342,7 +291,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -364,7 +314,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -380,16 +330,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -851,15 +794,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -879,35 +817,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling were able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -917,7 +834,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1267,7 +1184,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2627,678 +2544,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3319,7 +2564,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3426,10 +2671,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3443,7 +2688,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3523,13 +2768,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..e47b26040f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_query_id = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1f0e0fc1fb..d1aa746224 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7538,6 +7538,31 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-query-id" xreflabel="compute_query_id">
+      <term><varname>compute_query_id</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_query_id</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables in-core computation of a query identifier.  The <xref
+        linkend="pgstatstatements"/> extension requires a query identifier
+        to be computed.  Note that an external module can alternatively
+        be used if the in-core query identifier computation method
+        isn't acceptable.  In this case, in-core computation should
+        remain disabled.  The default is <literal>off</literal>.
+       </para>
+       <note>
+        <para>
+         To ensure that only one query identifier is calculated and
+         displayed, extensions that calculate query identifiers should
+         throw an error if a query identifier has already been computed.
+        </para>
+       </note>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 464bf0e5ae..3ca292d71f 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -20,6 +20,14 @@
   This means that a server restart is needed to add or remove the module.
  </para>
 
+ <para>
+  The module will not track statistics unless query
+  identifiers are calculated.  This can be done by enabling <xref
+  linkend="guc-compute-query-id"/> or using a third-party module that
+  computes its own query identifiers.  Note that all statistics tracked
+  by this module must be reset if the query identifier method is changed.
+ </para>
+
  <para>
    When <filename>pg_stat_statements</filename> is loaded, it tracks
    statistics across all databases of the server.  To access and manipulate
@@ -84,7 +92,7 @@
        <structfield>queryid</structfield> <type>bigint</type>
       </para>
       <para>
-       Internal hash code, computed from the statement's parse tree
+       Hash code to identify identical normalized queries.
       </para></entry>
      </row>
 
@@ -386,6 +394,16 @@
    are compared strictly on the basis of their textual query strings, however.
   </para>
 
+  <note>
+   <para>
+    The following details about constant replacement and
+    <structfield>queryid</structfield> only apply when <xref
+    linkend="guc-compute-query-id"/> is enabled.  If you use an external
+    module instead to compute <structfield>queryid</structfield>, you
+    should refer to its documentation for details.
+   </para>
+  </note>
+
   <para>
    When a constant's value has been ignored for purposes of matching the query
    to other queries, the constant is replaced by a parameter symbol, such
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 7149724953..c565c80365 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -46,6 +46,8 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -107,6 +109,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -119,8 +122,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -140,6 +146,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -152,8 +159,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2b1b68109f..7e034b72b1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -665,6 +665,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -683,8 +684,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
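A note for extension authors reading the parse_analyze and pg_analyze_and_rewrite_params hunks above: the JumbleState pointer handed to post_parse_analyze_hook starts out as NULL and is only set when compute_query_id is on, and JumbleQuery (in the new queryjumble.c) returns NULL for utility statements. A hook therefore has to tolerate a NULL jstate. The following is only a minimal stand-alone sketch of that calling pattern. The Mock* types, mock_parse_analyze, example_hook, and the placeholder id values are all hypothetical stand-ins rather than the real PostgreSQL definitions, and the real hook also receives a ParseState pointer that is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real PostgreSQL structs. */
typedef struct MockQuery
{
	uint64_t	queryId;
	bool		is_utility;
} MockQuery;

typedef struct MockJumbleState
{
	int			clocations_count;
} MockJumbleState;

typedef void (*mock_hook_type) (MockQuery *query, MockJumbleState *jstate);

static int	calls_with_state = 0;
static int	calls_without_state = 0;

/* Hypothetical extension hook: it must cope with jstate == NULL. */
static void
example_hook(MockQuery *query, MockJumbleState *jstate)
{
	(void) query;

	if (jstate == NULL)
		calls_without_state++;	/* no constant locations available */
	else
	{
		assert(jstate->clocations_count >= 0);
		calls_with_state++;
	}
}

/* Mirrors the shape of the patched call sites: compute first, then call. */
static void
mock_parse_analyze(MockQuery *query, bool compute_query_id,
				   mock_hook_type hook)
{
	MockJumbleState storage = {0};
	MockJumbleState *jstate = NULL;

	if (compute_query_id)
	{
		if (query->is_utility)
			query->queryId = 2;	/* placeholder id; no jumble state */
		else
		{
			query->queryId = 1;	/* placeholder id */
			jstate = &storage;
		}
	}

	if (hook)
		hook(query, jstate);
}
```

In this sketch the hook sees a non-NULL state only for a non-utility query with the setting enabled. The other two cases (setting off, or utility statement) both arrive with NULL.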
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8bfaa53541..87287ac13e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -521,6 +521,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1425,6 +1426,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_query_id,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 74b416b74a..aa34c99f0c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -594,6 +594,7 @@
 
 # - Monitoring -
 
+#compute_query_id = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
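For readers skimming the new queryjumble.c below: AppendJumble keeps a fixed-size buffer and, whenever it fills up, replaces the contents with a hash of everything so far before continuing. A rough stand-alone sketch of that folding idea follows. It is not the patch's code: the Sketch* names are made up, the buffer is 16 bytes instead of JUMBLE_SIZE (1024), and FNV-1a is swapped in as the hash because hash_any_extended() is not available outside the server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_JUMBLE_SIZE 16	/* tiny on purpose; the patch uses 1024 */

typedef struct SketchJumble
{
	unsigned char buf[SKETCH_JUMBLE_SIZE];
	size_t		len;
} SketchJumble;

/* Stand-in 64-bit hash (FNV-1a); the patch calls hash_any_extended(). */
static uint64_t
sketch_hash(const unsigned char *data, size_t len)
{
	uint64_t	h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/* Append bytes, folding the buffer into its own hash whenever it is full. */
static void
sketch_append(SketchJumble *j, const void *item, size_t size)
{
	const unsigned char *p = item;

	while (size > 0)
	{
		size_t		room;
		size_t		part;

		if (j->len >= SKETCH_JUMBLE_SIZE)
		{
			uint64_t	h = sketch_hash(j->buf, SKETCH_JUMBLE_SIZE);

			memcpy(j->buf, &h, sizeof(h));
			j->len = sizeof(h);
		}
		room = SKETCH_JUMBLE_SIZE - j->len;
		part = size < room ? size : room;
		memcpy(j->buf + j->len, p, part);
		j->len += part;
		p += part;
		size -= part;
	}
}

/* Final id: hash what is left, remapping 0 to 1 as the patch does. */
static uint64_t
sketch_finish(const SketchJumble *j)
{
	uint64_t	h = sketch_hash(j->buf, j->len);

	return h == 0 ? 1 : h;
}
```

Because folding is triggered only by buffer position, the result depends on the byte stream alone and not on how it was split across append calls. That property is what lets the jumbling code append fields one at a time.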
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..2a47688fd6
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+CleanQuerytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = CleanQuerytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (invalid), use 2
+	 * instead, since 1 is already used as the fallback for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 4a3c9686f9..6716db6c13 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 5004ee4177..9b6552b25b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..83ba7339fa
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *CleanQuerytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.30.1
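
The constant-location bookkeeping done by RecordConstLocation() in the patch above can be illustrated outside the backend. The following is only a minimal sketch, assuming plain malloc/realloc in place of palloc/repalloc; the names LocLen, LocArray, loc_array_create, loc_array_record and loc_array_free are hypothetical stand-ins and are not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's LocationLen. */
typedef struct LocLen
{
	int			location;	/* start offset in query text */
	int			length;		/* length in bytes, or -1 if not yet known */
} LocLen;

/* Hypothetical stand-in for the clocations fields of JumbleState. */
typedef struct LocArray
{
	LocLen	   *items;
	int			capacity;	/* allocated entries */
	int			count;		/* valid entries */
} LocArray;

/* Allocate an empty array; initial_capacity must be at least 1. */
LocArray *
loc_array_create(int initial_capacity)
{
	LocArray   *arr = malloc(sizeof(LocArray));

	if (arr == NULL)
		return NULL;
	arr->items = malloc(sizeof(LocLen) * (size_t) initial_capacity);
	if (arr->items == NULL)
	{
		free(arr);
		return NULL;
	}
	arr->capacity = initial_capacity;
	arr->count = 0;
	return arr;
}

/* Record one location; returns 1 if stored, 0 if skipped or out of memory. */
int
loc_array_record(LocArray *arr, int location)
{
	/* -1 indicates an unknown location, so there is nothing to record */
	if (location < 0)
		return 0;

	/* double the buffer when it is full, as the patch does with repalloc */
	if (arr->count >= arr->capacity)
	{
		int			new_capacity = arr->capacity * 2;
		LocLen	   *grown = realloc(arr->items,
									sizeof(LocLen) * (size_t) new_capacity);

		if (grown == NULL)
			return 0;
		arr->items = grown;
		arr->capacity = new_capacity;
	}

	arr->items[arr->count].location = location;
	/* length stays -1 until a later pass fills it in */
	arr->items[arr->count].length = -1;
	arr->count++;
	return 1;
}

void
loc_array_free(LocArray *arr)
{
	free(arr->items);
	free(arr);
}
```

Starting from a capacity of 2 and recording three valid locations grows the buffer once, to 4 entries. Unlike repalloc, realloc can return NULL, which is why the sketch reports failure to the caller instead of assuming success.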

v20-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-diff; charset=us-ascii)

From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:23 -0400
Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top-level statements.
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  29 +++--
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   9 ++
 src/backend/executor/execParallel.c           |  14 ++-
 src/backend/executor/nodeGather.c             |   3 +-
 src/backend/executor/nodeGatherMerge.c        |   4 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |   9 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  27 ++---
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/executor/execParallel.h           |   3 +-
 src/include/pgstat.h                          |   5 +
 src/test/regress/expected/rules.out           |   9 +-
 19 files changed, 222 insertions(+), 108 deletions(-)
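
The "top-level only" rule described in the commit message can be sketched in isolation as follows. This is a simplified model, not the patch's code: BackendEntry, start_new_query and report_queryid are hypothetical stand-ins for PgBackendStatus, pgstat_report_activity(STATE_RUNNING) and pgstat_report_queryid(), and the sketch leaves out the track_activities check and the st_changecount write protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the backend status entry (PgBackendStatus). */
typedef struct BackendEntry
{
	uint64_t	queryid;	/* 0 means "no identifier reported yet" */
} BackendEntry;

/*
 * Models pgstat_report_activity(STATE_RUNNING): a new top-level query
 * starts, so the previous identifier is cleared until parse analysis
 * reports a new one.
 */
void
start_new_query(BackendEntry *entry)
{
	entry->queryid = 0;
}

/*
 * Models pgstat_report_queryid(): the first non-zero identifier reported
 * after a reset is kept.  Later reports come from nested statements and
 * are ignored unless the caller passes force.
 */
void
report_queryid(BackendEntry *entry, uint64_t queryid, bool force)
{
	if (entry->queryid != 0 && !force)
		return;

	entry->queryid = queryid;
}
```

With this model, reporting 42 and then 99 without force leaves 42 in place, which is why the repeated calls from parse analysis and ExecutorStart are harmless once the top-level identifier is set.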

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index bd8c96728c..f62b9a2bfd 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements other than EXECUTE, PREPARE and DEALLOCATE, which
+ * pgss_ProcessUtility and pgss_post_parse_analyze ignore.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for utility statements related to prepared statements, as
+	 * those will inherit the underlying statement's queryId (except DEALLOCATE,
+	 * which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely case that
+	 * the user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1071,9 +1094,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1128,7 +1149,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1151,23 +1172,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1198,52 +1208,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_query_id isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1aa746224..65ad8ca29e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6920,6 +6920,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>Query identifier of the current query.  Query
+             identifiers are not computed by default, so this field
+             will be zero unless the <xref linkend="guc-compute-query-id"/>
+             parameter is enabled or a third-party module that computes
+             query identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7396,8 +7405,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7546,12 +7555,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables in-core computation of a query identifier.  The <xref
-        linkend="pgstatstatements"/> extension requires a query identifier
-        to be computed.  Note that an external module can alternatively
-        be used if the in-core query identifier computation method
-        isn't acceptable.  In this case, in-core computation should
-        remain disabled.  The default is <literal>off</literal>.
+        Enables in-core computation of a query identifier.
+        Query identifiers can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in-core query identifier computation
+        specification isn't acceptable.  In this case, in-core computation
+        must be disabled.  The default is <literal>off</literal>.
        </para>
        <note>
         <para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 43c07da20e..4995ccfedb 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -910,6 +910,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this
+      field shows the identifier of the currently executing query. In
+      all other states, it shows the identifier of the last query that was
+      executed.  Query identifiers are not computed by default, so this
+      field will be null unless the <xref linkend="guc-compute-query-id"/>
+      parameter is enabled or a third-party module that computes query
+      identifiers is configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dca65dc7b..012d86217f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -764,6 +764,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c74ce36ffb..8c6c644211 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -128,6 +129,14 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/*
+	 * In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call
+	 * will be ignored if the top-level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..26f1994a31 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -124,7 +124,7 @@ typedef struct ExecParallelInitializeDSMContext
 } ExecParallelInitializeDSMContext;
 
 /* Helper functions that run in the parallel leader. */
-static char *ExecSerializePlan(Plan *plan, EState *estate);
+static char *ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId);
 static bool ExecParallelEstimate(PlanState *node,
 								 ExecParallelEstimateContext *e);
 static bool ExecParallelInitializeDSM(PlanState *node,
@@ -143,7 +143,7 @@ static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
  * Create a serialized representation of the plan to be sent to each worker.
  */
 static char *
-ExecSerializePlan(Plan *plan, EState *estate)
+ExecSerializePlan(Plan *plan, EState *estate, uint64 queryId)
 {
 	PlannedStmt *pstmt;
 	ListCell   *lc;
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = queryId;
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -578,7 +578,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 ParallelExecutorInfo *
 ExecInitParallelPlan(PlanState *planstate, EState *estate,
 					 Bitmapset *sendParams, int nworkers,
-					 int64 tuples_needed)
+					 int64 tuples_needed,
+					 uint64 queryId)
 {
 	ParallelExecutorInfo *pei;
 	ParallelContext *pcxt;
@@ -620,7 +621,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pei->planstate = planstate;
 
 	/* Fix up and serialize plan to be sent to workers. */
-	pstmt_data = ExecSerializePlan(planstate->plan, estate);
+	pstmt_data = ExecSerializePlan(planstate->plan, estate, queryId);
 
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -1403,8 +1404,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..04c860f678 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -172,7 +172,8 @@ ExecGather(PlanState *pstate)
 												 estate,
 												 gather->initParam,
 												 gather->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..32f74e8c23 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -216,7 +217,8 @@ ExecGatherMerge(PlanState *pstate)
 												 estate,
 												 gm->initParam,
 												 gm->num_workers,
-												 node->tuples_needed);
+												 node->tuples_needed,
+												 pgstat_get_my_queryid());
 			else
 				ExecParallelReinitialize(node->ps.lefttree,
 										 node->pei,
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c565c80365..d125ef7f98 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parse_type.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -130,6 +131,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -167,6 +170,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 60f45ccc4e..b57deb42d7 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3381,6 +3381,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3435,6 +3436,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting the last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3445,6 +3454,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5178,6 +5229,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7e034b72b1..d66cee79f0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -692,6 +692,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -910,6 +912,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1026,6 +1029,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5102227a60..8e81eef8cb 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -569,7 +569,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	29
+#define PG_STAT_GET_ACTIVITY_COLS	30
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -914,6 +914,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[27] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[29] = true;
+			else
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[26] = true;
 			nulls[27] = true;
 			nulls[28] = true;
+			nulls[29] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e729ebece7..7aa484c5ed 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,7 +77,6 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2685,6 +2684,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index aa34c99f0c..d702ea24b6 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -541,6 +541,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index 2a47688fd6..53286bb333 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = CleanQuerytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 604ac564b3..5cebdd4379 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5255,9 +5255,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..e0e08e0b27 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -39,7 +39,8 @@ typedef struct ParallelExecutorInfo
 
 extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
 												  EState *estate, Bitmapset *sendParam, int nworkers,
-												  int64 tuples_needed);
+												  int64 tuples_needed,
+												  uint64 queryId);
 extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
 extern void ExecParallelFinish(ParallelExecutorInfo *pei);
 extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 87672e6f30..6966696191 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1263,6 +1263,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1457,6 +1460,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1465,6 +1469,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b12cc122a..ff3506d5d7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1762,9 +1762,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1876,7 +1877,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2046,7 +2047,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2076,7 +2077,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.30.1

v20-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-diff; charset=us-ascii)

From 730eac44d5ea4dc539f734e2bc672316cb75ae80 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:24 -0400
Subject: [PATCH v20 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_query_id or
using a third-party module, verbose explain will display it.
---
 doc/src/sgml/config.sgml              |  6 +++---
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out | 11 ++++++++++-
 src/test/regress/sql/explain.sql      |  5 ++++-
 5 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 65ad8ca29e..e6d9046042 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7558,9 +7558,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables in-core computation of a query identifier.
         Query identifiers can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
         identifier to be computed.  Note that an external module can
         alternatively be used if the in-core query identifier computation
         specification isn't acceptable.  In this case, in-core computation
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..4d758fb237 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed, see <xref
+      linkend="guc-compute-query-id"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..9794c4e794 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_query_id)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -598,6 +608,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index 791eba8511..1f8a3ead52 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -17,7 +17,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -470,3 +470,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index f2eab030d6..468caf4037 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -19,7 +19,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -103,3 +103,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.30.1
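
A note on the `-?` added to the explain_filter regex in the patch above: the identifier is a uint64 internally, but it is exposed through signed 64-bit paths (the int8 queryid column, pg_lltoa in EXPLAIN), so a value with the high bit set can show up as a negative number. Below is a minimal C sketch of that reinterpretation. format_queryid is a hypothetical helper for illustration only, not a PostgreSQL function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: print a 64-bit query identifier as a signed
 * decimal, the way a bigint column would display it.  The cast from
 * uint64_t to int64_t is implementation-defined for out-of-range
 * values in C; on two's-complement platforms it simply reinterprets
 * the bits.
 */
int
format_queryid(uint64_t query_id, char *buf, size_t buflen)
{
	return snprintf(buf, buflen, "%" PRId64, (int64_t) query_id);
}
```

Any identifier at or above 2^63 comes out with a leading minus sign here, which is presumably why the regression filter has to accept an optional `-` before the digits.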

alvherre@alvh.no-ip.org

almost 5 years ago

In reply to: Julien Rouhaud (#152)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021-Mar-24, Julien Rouhaud wrote:

From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:23 -0400
Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and

src/backend/executor/execMain.c | 9 ++
src/backend/executor/execParallel.c | 14 ++-
src/backend/executor/nodeGather.c | 3 +-
src/backend/executor/nodeGatherMerge.c | 4 +-

Hmm...

I find it odd that there's executor code that acquires the current query
ID from pgstat, after having been put there by planner or ExecutorStart
itself. Seems like a modularity violation. I wonder if it would make
more sense to have the value maybe in struct EState (or perhaps there's
a better place -- but I don't think they have a way to reach the
QueryDesc anyhow), put there by ExecutorStart, so that places such as
execParallel, nodeGather etc don't have to fetch it from pgstat but from
EState.

--
Álvaro Herrera Valdivia, Chile
"We're here to devour each other alive" (Hobbes)

rjuju123@gmail.com

almost 5 years ago

In reply to: Alvaro Herrera (#153)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote:

On 2021-Mar-24, Julien Rouhaud wrote:

From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:23 -0400
Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and

src/backend/executor/execMain.c | 9 ++
src/backend/executor/execParallel.c | 14 ++-
src/backend/executor/nodeGather.c | 3 +-
src/backend/executor/nodeGatherMerge.c | 4 +-

Hmm...

I find it odd that there's executor code that acquires the current query
ID from pgstat, after having been put there by planner or ExecutorStart
itself. Seems like a modularity violation. I wonder if it would make
more sense to have the value maybe in struct EState (or perhaps there's
a better place -- but I don't think they have a way to reach the
QueryDesc anyhow), put there by ExecutorStart, so that places such as
execParallel, nodeGather etc don't have to fetch it from pgstat but from
EState.

The current queryid is already available in the EState, as the underlying
PlannedStmt contains it. The problem is that we want to display the top level
queryid, not the current query one, and the top level queryid is held in
pgstat.
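
To illustrate the distinction, here is a simplified model of the "first report wins until reset" rule that pgstat_report_queryid() applies in the patch. This is only a sketch: the global variable stands in for beentry->st_queryid, the function names are shortened stand-ins, and the changecount protocol and the track_activities check are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the st_queryid field of this backend's status entry. */
uint64_t	top_level_queryid = 0;

/*
 * Simplified model of pgstat_report_queryid(): once a nonzero
 * identifier is stored, later reports (nested statements) are ignored
 * unless the caller forces the update, e.g. to reset it to zero.
 */
void
report_queryid(uint64_t query_id, bool force)
{
	if (top_level_queryid != 0 && !force)
		return;

	top_level_queryid = query_id;
}

/* Simplified model of pgstat_get_my_queryid(). */
uint64_t
get_my_queryid(void)
{
	return top_level_queryid;
}
```

With this rule, a statement run from inside a function reports its own identifier after the outer statement has already stored one, so the stored value stays that of the top-level statement, while each nested PlannedStmt still carries its own queryId.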

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#152)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 24, 2021 at 11:20:49PM +0800, Julien Rouhaud wrote:

On Wed, Mar 24, 2021 at 08:13:40AM -0400, Bruce Momjian wrote:

I have no local modifications. Please modify the patch I posted and
repost your version, thanks.

Ok! I used the last version of the patch you sent and addressed the following
comments from earlier messages in attached v20:

- copyright year to 2021
- s/has has been compute/has been compute/
- use the name CleanQuerytext in the first commit

My apologies --- yes, I made those two changes after I posted my version
of the patch. I should have reposted my version with those changes.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#154)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Mar 25, 2021 at 10:36:38AM +0800, Julien Rouhaud wrote:

On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote:

On 2021-Mar-24, Julien Rouhaud wrote:

From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:23 -0400
Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and

src/backend/executor/execMain.c | 9 ++
src/backend/executor/execParallel.c | 14 ++-
src/backend/executor/nodeGather.c | 3 +-
src/backend/executor/nodeGatherMerge.c | 4 +-

Hmm...

I find it odd that there's executor code that acquires the current query
ID from pgstat, after having been put there by planner or ExecutorStart
itself. Seems like a modularity violation. I wonder if it would make
more sense to have the value maybe in struct EState (or perhaps there's
a better place -- but I don't think they have a way to reach the
QueryDesc anyhow), put there by ExecutorStart, so that places such as
execParallel, nodeGather etc don't have to fetch it from pgstat but from
EState.

The current queryid is already available in the EState, as the underlying
PlannedStmt contains it. The problem is that we want to display the top level
queryid, not the current query one, and the top level queryid is held in
pgstat.

So is the current approach ok? If not I'm afraid that detecting and caching
the top level queryid in the executor parts would lead to some code
duplication.

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#156)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 31, 2021 at 11:25:32AM +0800, Julien Rouhaud wrote:

On Thu, Mar 25, 2021 at 10:36:38AM +0800, Julien Rouhaud wrote:

On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote:

On 2021-Mar-24, Julien Rouhaud wrote:

From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:23 -0400
Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and

src/backend/executor/execMain.c | 9 ++
src/backend/executor/execParallel.c | 14 ++-
src/backend/executor/nodeGather.c | 3 +-
src/backend/executor/nodeGatherMerge.c | 4 +-

Hmm...

I find it odd that there's executor code that acquires the current query
ID from pgstat, after having been put there by planner or ExecutorStart
itself. Seems like a modularity violation. I wonder if it would make
more sense to have the value maybe in struct EState (or perhaps there's
a better place -- but I don't think they have a way to reach the
QueryDesc anyhow), put there by ExecutorStart, so that places such as
execParallel, nodeGather etc don't have to fetch it from pgstat but from
EState.

The current queryid is already available in the EState, as the underlying
PlannedStmt contains it. The problem is that we want to display the top level
queryid, not the current query one, and the top level queryid is held in
pgstat.

So is the current approach ok? If not I'm afraid that detecting and caching
the top level queryid in the executor parts would lead to some code
duplication.

I assume it is since Alvaro didn't reply. I am planning to apply this
soon.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

alvherre@alvh.no-ip.org

almost 5 years ago

In reply to: Bruce Momjian (#157)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021-Mar-31, Bruce Momjian wrote:

On Wed, Mar 31, 2021 at 11:25:32AM +0800, Julien Rouhaud wrote:

On Thu, Mar 25, 2021 at 10:36:38AM +0800, Julien Rouhaud wrote:

On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote:

I find it odd that there's executor code that acquires the current query
ID from pgstat, after having been put there by planner or ExecutorStart
itself. Seems like a modularity violation. I wonder if it would make
more sense to have the value maybe in struct EState (or perhaps there's
a better place -- but I don't think they have a way to reach the
QueryDesc anyhow), put there by ExecutorStart, so that places such as
execParallel, nodeGather etc don't have to fetch it from pgstat but from
EState.

The current queryid is already available in the EState, as the underlying
PlannedStmt contains it. The problem is that we want to display the top level
queryid, not the current query one, and the top level queryid is held in
pgstat.

So is the current approach ok? If not I'm afraid that detecting and caching
the top level queryid in the executor parts would lead to some code
duplication.

I assume it is since Alvaro didn't reply. I am planning to apply this
soon.

I'm afraid I don't know enough about how parallel query works to make a
good assessment on this being a good approach or not -- and no time at
present to figure it all out.

--
Álvaro Herrera 39°49'30"S 73°17'W
"I think my standards have lowered enough that now I think 'good design'
is when the page doesn't irritate the living f*ck out of me." (JWZ)

rjuju123@gmail.com

almost 5 years ago

In reply to: Alvaro Herrera (#158)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote:

On 2021-Mar-31, Bruce Momjian wrote:

I assume it is since Alvaro didn't reply. I am planning to apply this
soon.

I'm afraid I don't know enough about how parallel query works to make a
good assessment on this being a good approach or not -- and no time at
present to figure it all out.

I'm far from being an expert either, but at the time I wrote it, judging from
the surrounding code, it probably seemed sensible. We could directly call
pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from the
various callers, though; at least there would be a single source for it.
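
Roughly, the difference between the two shapes looks like the sketch below. Everything here is a simplified stand-in: SerializedPlan, serialize_plan_with_arg() and serialize_plan() are hypothetical names and not the actual executor code, and it assumes the serialized plan is what carries the identifier to the workers.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the plan shipped to workers. */
typedef struct SerializedPlan
{
	uint64_t	query_id;
} SerializedPlan;

/* Stand-in for the top-level identifier kept in the backend status. */
uint64_t	backend_queryid = 0;

uint64_t
get_my_queryid(void)
{
	return backend_queryid;
}

/* Current shape: every caller fetches the identifier and passes it in. */
SerializedPlan
serialize_plan_with_arg(uint64_t query_id)
{
	SerializedPlan plan;

	plan.query_id = query_id;
	return plan;
}

/* Proposed shape: the serialization step reads it itself. */
SerializedPlan
serialize_plan(void)
{
	SerializedPlan plan;

	plan.query_id = get_my_queryid();
	return plan;
}
```

Both produce the same result; the second just keeps the lookup in one place, so the Gather and Gather Merge callers would not need an extra argument.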

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#159)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 01, 2021 at 11:05:24PM +0800, Julien Rouhaud wrote:

On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote:

On 2021-Mar-31, Bruce Momjian wrote:

I assume it is since Alvaro didn't reply. I am planning to apply this
soon.

I'm afraid I don't know enough about how parallel query works to make a
good assessment on this being a good approach or not -- and no time at
present to figure it all out.

I'm far from being an expert either, but at the time I wrote it, judging from
the surrounding code, it probably seemed sensible. We could directly call
pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from the
various callers, though; at least there would be a single source for it.

Here's a v21 that includes the mentioned change.
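
For readers skimming the patch, here is a minimal standalone sketch of the idea, not the patch's actual code. The demo_* names and the DemoBackendStatus / DemoPlannedStmt structs are hypothetical stand-ins for PgBackendStatus, PlannedStmt, pgstat_report_queryid() and pgstat_get_my_queryid(); the changecount write protocol and the track_activities check are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the backend's status entry (PgBackendStatus). */
typedef struct DemoBackendStatus
{
	uint64_t	st_queryid;		/* 0 means "no query identifier known" */
} DemoBackendStatus;

/* Hypothetical stand-in for the PlannedStmt handed to parallel workers. */
typedef struct DemoPlannedStmt
{
	uint64_t	queryId;
} DemoPlannedStmt;

static DemoBackendStatus demo_entry = {0};

/*
 * Models the reporting rule: once a nonzero identifier is stored, later
 * reports are ignored unless the caller forces the update (used to reset
 * the identifier when a new top-level statement starts).
 */
void
demo_report_queryid(uint64_t query_id, bool force)
{
	if (demo_entry.st_queryid != 0 && !force)
		return;
	demo_entry.st_queryid = query_id;
}

/* Models the accessor: the single source for the current identifier. */
uint64_t
demo_get_my_queryid(void)
{
	return demo_entry.st_queryid;
}

/*
 * Models the v21 change: the serialized plan reads the identifier from the
 * accessor instead of receiving it from each caller.
 */
DemoPlannedStmt
demo_serialize_plan(void)
{
	DemoPlannedStmt pstmt;

	pstmt.queryId = demo_get_my_queryid();
	return pstmt;
}
```

Once a top-level identifier has been reported, later non-forced reports (for instance from nested statements) are ignored, so demo_serialize_plan() always hands the top-level value to the workers.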

Attachments:

v21-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-diff; charset=us-ascii)

From b2f654803c2e8a6e64e3fa29d1acafb5b3199489 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:23 -0400
Subject: [PATCH v21 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  29 +++--
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   9 ++
 src/backend/executor/execParallel.c           |   5 +-
 src/backend/executor/nodeGatherMerge.c        |   1 +
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |   9 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  27 ++---
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/pgstat.h                          |   5 +
 src/test/regress/expected/rules.out           |   9 +-
 17 files changed, 211 insertions(+), 101 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index bd8c96728c..f62b9a2bfd 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * handle: everything except EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for prepared statements related utility, as those will
+	 * inherit from the underlying statement's one (except DEALLOCATE which is
+	 * entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely possibility
+	 * that user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1071,9 +1094,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1128,7 +1149,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1151,23 +1172,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1198,52 +1208,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_query_id isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8639914fac..d53d0e234f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6942,6 +6942,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>query identifier of the current query.  Query
+             identifiers are not computed by default, so this field
+             will be zero unless <xref linkend="guc-compute-query-id"/>
+             parameter is enabled or a third-party module that computes
+             query identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7418,8 +7427,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7568,12 +7577,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables in-core computation of a query identifier.  The <xref
-        linkend="pgstatstatements"/> extension requires a query identifier
-        to be computed.  Note that an external module can alternatively
-        be used if the in-core query identifier computation method
-        isn't acceptable.  In this case, in-core computation should
-        remain disabled.  The default is <literal>off</literal>.
+        Enables in-core computation of a query identifier.
+        Query identifiers can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in-core query identifier computation
+        specification isn't acceptable.  In this case, in-core computation
+        must be disabled.  The default is <literal>off</literal>.
        </para>
        <note>
         <para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index af540fb02f..b4b18fa547 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -910,6 +910,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this
+      field shows the identifier of the currently executing query. In
+      all other states, it shows the identifier of the last query that was
+      executed.  Query identifiers are not computed by default so this
+      field will be null unless <xref linkend="guc-compute-query-id"/>
+      parameter is enabled or a third-party module that computes query
+      identifiers is configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5f2541d316..4d6b232787 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -833,6 +833,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 163242f54e..82fbfd2259 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -128,6 +129,14 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/*
+	 * In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..e3cfa96519 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = pgstat_get_my_queryid();
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -1403,8 +1403,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..91e2c10eab 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -24,6 +24,7 @@
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 35cb9ebfd7..73976cf4f6 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parse_type.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -130,6 +131,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -167,6 +170,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4b9bcd2b41..e216bd591c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3381,6 +3381,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3435,6 +3436,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3445,6 +3454,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5181,6 +5232,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7e034b72b1..d66cee79f0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -692,6 +692,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -910,6 +912,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1026,6 +1029,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5102227a60..8e81eef8cb 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -569,7 +569,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	29
+#define PG_STAT_GET_ACTIVITY_COLS	30
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -914,6 +914,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[27] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[29] = true;
+			else
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[26] = true;
 			nulls[27] = true;
 			nulls[28] = true;
+			nulls[29] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 423df2f300..bbdef3bf95 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,7 +77,6 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2710,6 +2709,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*ld", padding,
+							pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%ld",
+							pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 14000cb67d..08b040e9a9 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -542,6 +542,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index 2a47688fd6..53286bb333 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = CleanQuerytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 69ffd0c3f4..ab30558e3f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5263,9 +5263,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d699502cd9..3731c43e6d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1264,6 +1264,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1458,6 +1461,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1466,6 +1470,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b59a7b4a5..264deda7af 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1762,9 +1762,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1876,7 +1877,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2046,7 +2047,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2076,7 +2077,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.30.1

v21-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-diff; charset=us-ascii)

From 05efd3ef9a5282f9ceaab83ba50c2648593a9d9e Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:24 -0400
Subject: [PATCH v21 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_query_id or
using a third-party module, verbose explain will display it.
---
 doc/src/sgml/config.sgml              |  6 +++---
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out | 11 ++++++++++-
 src/test/regress/sql/explain.sql      |  5 ++++-
 5 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d53d0e234f..9520771bf6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7580,9 +7580,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables in-core computation of a query identifier.
         Query identifiers can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
         identifier to be computed.  Note that an external module can
         alternatively be used if the in-core query identifier computation
         specification isn't acceptable.  In this case, in-core computation
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..4d758fb237 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed, see <xref
+      linkend="guc-compute-query-id"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 872aaa7aed..04f4822513 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_query_id)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -598,6 +608,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index b89b99fb02..4c578d4f5e 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -17,7 +17,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -477,3 +477,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index f2eab030d6..468caf4037 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -19,7 +19,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -103,3 +103,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.30.1
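
A note on the 0003 patch above: ExplainOnePlan passes plannedstmt->queryId, an unsigned 64-bit value, to pg_lltoa(), which formats a signed 64-bit integer. Identifiers with the high bit set would therefore print as negative numbers, which is presumably why the explain_filter regexp gains a leading '-?'. The sketch below reproduces that effect in standalone C. It is only an illustration: format_query_id_signed is a hypothetical helper, not a PostgreSQL function, and it uses snprintf rather than pg_lltoa.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): print a 64-bit query
 * identifier the way a signed formatter would see it.  buf should hold
 * at least 21 bytes (sign, 19 digits, terminating NUL).
 */
static void
format_query_id_signed(uint64_t query_id, char *buf, size_t buflen)
{
	int64_t		as_signed;

	assert(buflen >= 21);

	/* Reinterpret the bit pattern; int64_t is two's complement by definition */
	memcpy(&as_signed, &query_id, sizeof(as_signed));
	snprintf(buf, buflen, "%" PRId64, as_signed);
}
```

Any identifier at or above 2^63 comes out with a minus sign, so a test filter that only matches bare digit runs would leave the sign behind in the expected output.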

v21-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-diff; charset=us-ascii)

From 819a45faf520dfd60b4fe3e9aea111171e3a2b69 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v21 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's therefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using a third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query identifier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  25 +
 doc/src/sgml/pgstatstatements.sgml            |  20 +-
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h
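
For readers skimming the large removal in pg_stat_statements.c: the AppendJumble routine that moves to core keeps a fixed 1024-byte buffer, and whenever the buffer fills up it replaces the contents with a 64-bit hash of the buffer and keeps appending. The query ID is then a hash of what remains, with 0 remapped to 1 so it cannot be confused with the utility-statement case. A minimal standalone sketch of that idea follows. It assumes a stand-in FNV-1a hash instead of hash_any_extended(), so its values will not match real query IDs, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JUMBLE_SIZE 1024		/* same buffer size as the removed #define */

/* Hypothetical, simplified counterpart of the jumble state */
typedef struct DemoJumbleState
{
	unsigned char jumble[JUMBLE_SIZE];
	size_t		jumble_len;		/* number of bytes used in jumble[] */
} DemoJumbleState;

/* Stand-in hash (64-bit FNV-1a); PostgreSQL uses hash_any_extended() */
static uint64_t
demo_hash64(const unsigned char *data, size_t len)
{
	uint64_t	h = UINT64_C(0xcbf29ce484222325);

	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= UINT64_C(0x00000100000001b3);
	}
	return h;
}

/* Append bytes, folding the buffer into its own hash whenever it is full */
static void
demo_append_jumble(DemoJumbleState *js, const unsigned char *item, size_t size)
{
	while (size > 0)
	{
		size_t		room;
		size_t		part_size;

		if (js->jumble_len >= JUMBLE_SIZE)
		{
			uint64_t	start_hash = demo_hash64(js->jumble, JUMBLE_SIZE);

			/* The hash now summarizes everything appended so far */
			memcpy(js->jumble, &start_hash, sizeof(start_hash));
			js->jumble_len = sizeof(start_hash);
		}
		room = JUMBLE_SIZE - js->jumble_len;
		part_size = (size < room) ? size : room;
		memcpy(js->jumble + js->jumble_len, item, part_size);
		js->jumble_len += part_size;
		item += part_size;
		size -= part_size;
	}
}

/* Final identifier: hash of the remaining buffer, never zero */
static uint64_t
demo_query_id(const DemoJumbleState *js)
{
	uint64_t	id = demo_hash64(js->jumble, js->jumble_len);

	return (id == 0) ? UINT64_C(1) : id;
}
```

Because folding depends only on the byte stream, feeding the same bytes in one call or in many small calls leaves the buffer in the same state and yields the same identifier.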

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..bd8c96728c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -342,7 +291,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -364,7 +314,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -380,16 +330,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -851,15 +794,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -879,35 +817,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling were able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -917,7 +834,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1267,7 +1184,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2627,678 +2544,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3319,7 +2564,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3426,10 +2671,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3443,7 +2688,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3523,13 +2768,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..e47b26040f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_query_id = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e2e8c4c3..8639914fac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7560,6 +7560,31 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-query-id" xreflabel="compute_query_id">
+      <term><varname>compute_query_id</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_query_id</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables in-core computation of a query identifier.  The <xref
+        linkend="pgstatstatements"/> extension requires a query identifier
+        to be computed.  Note that an external module can alternatively
+        be used if the in-core query identifier computation method
+        isn't acceptable.  In this case, in-core computation should
+        remain disabled.  The default is <literal>off</literal>.
+       </para>
+       <note>
+        <para>
+         To ensure that only one query identifier is calculated and
+         displayed, extensions that calculate query identifiers should
+         throw an error if a query identifier has already been computed.
+        </para>
+       </note>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 464bf0e5ae..3ca292d71f 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -20,6 +20,14 @@
   This means that a server restart is needed to add or remove the module.
  </para>
 
+ <para>
+  The module will not track statistics unless query
+  identifiers are calculated.  This can be done by enabling <xref
+  linkend="guc-compute-query-id"/> or using a third-party module that
+  computes its own query identifiers.  Note that all statistics tracked
+  by this module must be reset if the query identifier method is changed.
+ </para>
+
  <para>
    When <filename>pg_stat_statements</filename> is loaded, it tracks
    statistics across all databases of the server.  To access and manipulate
@@ -84,7 +92,7 @@
        <structfield>queryid</structfield> <type>bigint</type>
       </para>
       <para>
-       Internal hash code, computed from the statement's parse tree
+       Hash code to identify identical normalized queries.
       </para></entry>
      </row>
 
@@ -386,6 +394,16 @@
    are compared strictly on the basis of their textual query strings, however.
   </para>
 
+  <note>
+   <para>
+    The following details about constant replacement and
+    <structfield>queryid</structfield> apply only when <xref
+    linkend="guc-compute-query-id"/> is enabled.  If you use an external
+    module instead to compute <structfield>queryid</structfield>, you
+    should refer to its documentation for details.
+   </para>
+  </note>
+
   <para>
    When a constant's value has been ignored for purposes of matching the query
    to other queries, the constant is replaced by a parameter symbol, such
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 5de1307570..35cb9ebfd7 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -46,6 +46,8 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -107,6 +109,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -119,8 +122,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -140,6 +146,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -152,8 +159,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2b1b68109f..7e034b72b1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -665,6 +665,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -683,8 +684,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 03daec9a08..a680c70b2e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -529,6 +529,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1443,6 +1444,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_query_id,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 791d39cf07..14000cb67d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -595,6 +595,7 @@
 
 # - Monitoring -
 
+#compute_query_id = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..2a47688fd6
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+CleanQuerytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = CleanQuerytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (invalid), use 2
+	 * instead, since queryId 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 4a3c9686f9..6716db6c13 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 5004ee4177..9b6552b25b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..83ba7339fa
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *CleanQuerytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.30.1
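
For readers following along: RecordConstLocation() at the end of queryjumble.c in the patch above grows the clocations array by doubling it whenever it is full, and stores each constant's location with a length of -1. A minimal standalone sketch of that pattern follows. It assumes plain malloc()/realloc() in place of palloc()/repalloc(), and the Demo* and demo_* names are made up for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for LocationLen (hypothetical name). */
typedef struct DemoLocationLen
{
	int			location;	/* start offset in query text */
	int			length;		/* length in bytes, or -1 if not yet known */
} DemoLocationLen;

/* Simplified stand-in for the constant-tracking part of JumbleState. */
typedef struct DemoJumbleState
{
	DemoLocationLen *clocations;
	int			clocations_buf_size;	/* allocated entries */
	int			clocations_count;		/* valid entries */
} DemoJumbleState;

static void
demo_init(DemoJumbleState *js, int initial_size)
{
	assert(initial_size > 0);
	js->clocations = malloc(initial_size * sizeof(DemoLocationLen));
	assert(js->clocations != NULL);
	js->clocations_buf_size = initial_size;
	js->clocations_count = 0;
}

static void
demo_record_const_location(DemoJumbleState *js, int location)
{
	/* a negative value means unknown or undefined location: skip it */
	if (location < 0)
		return;

	/* double the array when it is full */
	if (js->clocations_count >= js->clocations_buf_size)
	{
		DemoLocationLen *grown;

		js->clocations_buf_size *= 2;
		grown = realloc(js->clocations,
						js->clocations_buf_size * sizeof(DemoLocationLen));
		assert(grown != NULL);
		js->clocations = grown;
	}

	js->clocations[js->clocations_count].location = location;
	/* the length is filled in later, when the query text is rescanned */
	js->clocations[js->clocations_count].length = -1;
	js->clocations_count++;
}
```

Starting with a two-entry buffer and recording three valid locations triggers exactly one doubling, which is enough to exercise the growth path.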

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#160)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 1, 2021 at 11:30:15PM +0800, Julien Rouhaud wrote:

On Thu, Apr 01, 2021 at 11:05:24PM +0800, Julien Rouhaud wrote:

On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote:

On 2021-Mar-31, Bruce Momjian wrote:

I assume it is since Alvaro didn't reply. I am planning to apply this
soon.

I'm afraid I don't know enough about how parallel query works to make a
good assessment on this being a good approach or not -- and no time at
present to figure it all out.

I'm far from being an expert either, but at the time I wrote it, looking at
the code around it, it probably seemed sensible. We could directly call
pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from the
various callers though; at least there would be a single source for it.

Here's a v21 that includes the mentioned change.

You are using:

/* ----------
 * pgstat_get_my_queryid() -
 *
 * Return current backend's query identifier.
 */
uint64
pgstat_get_my_queryid(void)
{
	if (!MyBEEntry)
		return 0;

	return MyBEEntry->st_queryid;
}

Looking at log_statement:

	/* Log immediately if dictated by log_statement */
	if (check_log_statement(parsetree_list))
	{
		ereport(LOG,
				(errmsg("statement: %s", query_string),
				 errhidestmt(true),
				 errdetail_execute(parsetree_list)));
		was_logged = true;
	}

it uses the global variable query_string. I wonder if the query hash
should be a global variable too --- this would more clearly match how we
handle top-level info like query_string. Digging into the stats system
to get top-level info does seem odd.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

bruce@momjian.us

almost 5 years ago

In reply to: Bruce Momjian (#161)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 1, 2021 at 01:56:42PM -0400, Bruce Momjian wrote:

You are using:

/* ----------
 * pgstat_get_my_queryid() -
 *
 * Return current backend's query identifier.
 */
uint64
pgstat_get_my_queryid(void)
{
	if (!MyBEEntry)
		return 0;

	return MyBEEntry->st_queryid;
}

Looking at log_statement:

	/* Log immediately if dictated by log_statement */
	if (check_log_statement(parsetree_list))
	{
		ereport(LOG,
				(errmsg("statement: %s", query_string),
				 errhidestmt(true),
				 errdetail_execute(parsetree_list)));
		was_logged = true;
	}

it uses the global variable query_string. I wonder if the query hash
should be a global variable too --- this would more clearly match how we
handle top-level info like query_string. Digging into the stats system
to get top-level info does seem odd.

Also, if you go in that direction, make sure the hash is set in the same
places the query string is set, though I am unclear how extensions would
handle that.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#162)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 01, 2021 at 01:59:15PM -0400, Bruce Momjian wrote:

On Thu, Apr 1, 2021 at 01:56:42PM -0400, Bruce Momjian wrote:

You are using:

/* ----------
 * pgstat_get_my_queryid() -
 *
 * Return current backend's query identifier.
 */
uint64
pgstat_get_my_queryid(void)
{
	if (!MyBEEntry)
		return 0;

	return MyBEEntry->st_queryid;
}

Looking at log_statement:

	/* Log immediately if dictated by log_statement */
	if (check_log_statement(parsetree_list))
	{
		ereport(LOG,
				(errmsg("statement: %s", query_string),
				 errhidestmt(true),
				 errdetail_execute(parsetree_list)));
		was_logged = true;
	}

it uses the global variable query_string.

Unless I'm missing something, query_string isn't a global variable; it's a
parameter passed to exec_simple_query() from PostgresMain().

It's then passed to the stats collector so that it can be displayed in
pg_stat_activity through pgstat_report_activity(), a bit like what I do for
the queryid.

There's a global variable debug_query_string, but it's only for debugging
purposes.

I wonder if the query hash
should be a global variable too --- this would more clearly match how we
handle top-level info like query_string. Digging into the stats system
to get top-level info does seem odd.

The main difference is that there's a single top-level query_string,
even if it contains multiple statements. But multiple queryids would be
calculated in that case, and we don't want to change it during a top-level
multi-statement execution, so we can't use the same approach.

Also, the query_string is logged directly from this code path, while the
queryid is logged through log_line_prefix, and almost all the code there also
retrieves information from some shared structure.

And since it also has to be available in pg_stat_activity, having a single
source of truth looked like a better approach.

Also, if you go in that direction, make sure the hash is set in the same
places the query string is set, though I am unclear how extensions would
handle that.

It should be transparent for applications: it extracts the first queryid
seen for each top-level statement and exports it. The rest of the code still
sees the queryid that corresponds to the single statement actually being
executed.
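
The behaviour described above (export the first queryid seen for a top-level statement, and leave it alone for the rest of that execution) can be sketched roughly as follows. This is a standalone mock, not the patch's code: DemoBackendEntry stands in for the backend status entry, and every demo_* function is a hypothetical name introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-backend status entry. */
typedef struct DemoBackendEntry
{
	uint64_t	st_queryid;		/* 0 means "no query identifier" */
} DemoBackendEntry;

static DemoBackendEntry demo_entry_storage;
static DemoBackendEntry *demo_my_entry = &demo_entry_storage;

/* Called when a new top-level query string starts executing. */
static void
demo_reset_queryid(void)
{
	if (demo_my_entry != NULL)
		demo_my_entry->st_queryid = 0;
}

/* Export a queryid, but only if none was exported yet for this execution. */
static void
demo_report_queryid(uint64_t queryid)
{
	if (demo_my_entry == NULL)
		return;
	if (demo_my_entry->st_queryid != 0)
		return;					/* first one wins */
	demo_my_entry->st_queryid = queryid;
}

/* Same shape as the accessor quoted in this thread: 0 if no entry. */
static uint64_t
demo_get_my_queryid(void)
{
	if (demo_my_entry == NULL)
		return 0;
	return demo_my_entry->st_queryid;
}

/* Simulate one top-level query string that expands to n statements. */
static uint64_t
demo_run_top_level(const uint64_t *ids, size_t n)
{
	assert(ids != NULL || n == 0);
	demo_reset_queryid();
	for (size_t i = 0; i < n; i++)
		demo_report_queryid(ids[i]);
	return demo_get_my_queryid();
}
```

With this shape the exported value has a single source, which is what both pg_stat_activity and the log line would read, while each individual statement can still carry its own queryid elsewhere.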

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#163)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Apr 2, 2021 at 02:28:02AM +0800, Julien Rouhaud wrote:

Unless I'm missing something, query_string isn't a global variable; it's a
parameter passed to exec_simple_query() from PostgresMain().

It's then passed to the stats collector so that it can be displayed in
pg_stat_activity through pgstat_report_activity(), a bit like what I do for
the queryid.

There's a global variable debug_query_string, but it's only for debugging
purposes.

I wonder if the query hash
should be a global variable too --- this would more clearly match how we
handle top-level info like query_string. Digging into the stats system
to get top-level info does seem odd.

The main difference is that there's a single top-level query_string,
even if it contains multiple statements. But multiple queryids would be
calculated in that case, and we don't want to change it during a top-level
multi-statement execution, so we can't use the same approach.

Also, the query_string is logged directly from this code path, while the
queryid is logged through log_line_prefix, and almost all the code there also
retrieves information from some shared structure.

And since it also has to be available in pg_stat_activity, having a single
source of truth looked like a better approach.

Also, if you go in that direction, make sure the hash is set in the same
places the query string is set, though I am unclear how extensions would
handle that.

It should be transparent for applications: it extracts the first queryid
seen for each top-level statement and exports it. The rest of the code still
sees the queryid that corresponds to the single statement actually being
executed.

OK, I am happy with your design decisions, thanks.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#164)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:

OK, I am happy with your design decisions, thanks.

Thanks! While double checking I noticed that I failed to remove a (now)
useless include of pgstat.h in nodeGatherMerge.c in the last version. I'm
attaching v22 to fix that, no other change.

Attachments:

v22-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-diff; charset=us-ascii)

From 819a45faf520dfd60b4fe3e9aea111171e3a2b69 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v22 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's therefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using a third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query identifier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  25 +
 doc/src/sgml/pgstatstatements.sgml            |  20 +-
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h
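
As a rough illustration of the rule in the commit message above, a third-party module that computes its own query identifier would refuse to do so when core computation is enabled or when the query already carries an identifier. The sketch below is a standalone mock under stated assumptions: DemoQuery and demo_compute_query_id are stand-ins for the Query node and the compute_query_id GUC, demo_module_set_queryid() is a hypothetical name, and a real module would raise an error where this code returns an error value. The substitution of 1 for a zero hash mirrors the logic the patch removes from pg_stat_statements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of a Query node that matters here. */
typedef struct DemoQuery
{
	uint64_t	queryId;		/* 0 means "not computed" */
} DemoQuery;

/* Stand-in for the compute_query_id setting. */
static bool demo_compute_query_id = false;

typedef enum DemoResult
{
	DEMO_OK,
	DEMO_ERR_CORE_ENABLED,		/* core already computes the identifier */
	DEMO_ERR_ALREADY_SET		/* some other source already set it */
} DemoResult;

/*
 * Store a module-computed identifier, enforcing a single source.
 * Returning an error value stands in for throwing an error.
 */
static DemoResult
demo_module_set_queryid(DemoQuery *query, uint64_t candidate)
{
	assert(query != NULL);

	if (demo_compute_query_id)
		return DEMO_ERR_CORE_ENABLED;
	if (query->queryId != 0)
		return DEMO_ERR_ALREADY_SET;

	/* avoid 0, which is reserved for "no identifier" */
	query->queryId = (candidate != 0) ? candidate : 1;
	return DEMO_OK;
}
```

Checking both conditions before writing keeps the identifier well defined: whichever component computes it, there is only ever one writer per query.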

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..bd8c96728c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -342,7 +291,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -364,7 +314,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -380,16 +330,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -851,15 +794,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -879,35 +817,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling was able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -917,7 +834,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1267,7 +1184,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2627,678 +2544,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3319,7 +2564,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3426,10 +2671,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3443,7 +2688,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3523,13 +2768,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..e47b26040f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_query_id = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e2e8c4c3..8639914fac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7560,6 +7560,31 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-query-id" xreflabel="compute_query_id">
+      <term><varname>compute_query_id</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_query_id</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables in-core computation of a query identifier.  The <xref
+        linkend="pgstatstatements"/> extension requires a query identifier
+        to be computed.  Note that an external module can alternatively
+        be used if the in-core query identifier computation method
+        isn't acceptable.  In this case, in-core computation should
+        remain disabled.  The default is <literal>off</literal>.
+       </para>
+       <note>
+        <para>
+         To ensure that only one query identifier is calculated and
+         displayed, extensions that calculate query identifiers should
+         throw an error if a query identifier has already been computed.
+        </para>
+       </note>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 464bf0e5ae..3ca292d71f 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -20,6 +20,14 @@
   This means that a server restart is needed to add or remove the module.
  </para>
 
+ <para>
+  The module will not track statistics unless query
+  identifiers are calculated.  This can be done by enabling <xref
+  linkend="guc-compute-query-id"/> or using a third-party module that
+  computes its own query identifiers.  Note that all statistics tracked
+  by this module must be reset if the query identifier method is changed.
+ </para>
+
  <para>
    When <filename>pg_stat_statements</filename> is loaded, it tracks
    statistics across all databases of the server.  To access and manipulate
@@ -84,7 +92,7 @@
        <structfield>queryid</structfield> <type>bigint</type>
       </para>
       <para>
-       Internal hash code, computed from the statement's parse tree
+       Hash code to identify identical normalized queries.
       </para></entry>
      </row>
 
@@ -386,6 +394,16 @@
    are compared strictly on the basis of their textual query strings, however.
   </para>
 
+  <note>
+   <para>
+    The following details about constant replacement and
+    <structfield>queryid</structfield> only apply when <xref
+    linkend="guc-compute-query-id"/> is enabled.  If you use an external
+    module instead to compute <structfield>queryid</structfield>, you
+    should refer to its documentation for details.
+   </para>
+  </note>
+
   <para>
    When a constant's value has been ignored for purposes of matching the query
    to other queries, the constant is replaced by a parameter symbol, such
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 5de1307570..35cb9ebfd7 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -46,6 +46,8 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -107,6 +109,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -119,8 +122,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -140,6 +146,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -152,8 +159,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2b1b68109f..7e034b72b1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -665,6 +665,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -683,8 +684,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 03daec9a08..a680c70b2e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -529,6 +529,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1443,6 +1444,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_query_id,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 791d39cf07..14000cb67d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -595,6 +595,7 @@
 
 # - Monitoring -
 
+#compute_query_id = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..2a47688fd6
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+CleanQuerytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = CleanQuerytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (invalid), use 2
+	 * instead, because queryId 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 4a3c9686f9..6716db6c13 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 5004ee4177..9b6552b25b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..83ba7339fa
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *CleanQuerytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.30.1
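
For readers following along, the array growth in RecordConstLocation() above can be tried outside the backend. What follows is only a minimal standalone sketch, not the patch's code: it assumes plain malloc()/realloc() in place of palloc()/repalloc(), and ConstLocations, init_locations() and record_const_location() are hypothetical names standing in for the relevant part of JumbleState.

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as LocationLen in queryjumble.h */
typedef struct LocationLen
{
	int			location;		/* start offset in query text */
	int			length;			/* length in bytes, or -1 to ignore */
} LocationLen;

/* Hypothetical cut-down stand-in for JumbleState: location array only */
typedef struct ConstLocations
{
	LocationLen *clocations;
	int			clocations_buf_size;
	int			clocations_count;
} ConstLocations;

/* Allocate the initial array; returns 1 on success, 0 on allocation failure */
int
init_locations(ConstLocations *st, int initial_size)
{
	assert(initial_size > 0);
	st->clocations = malloc(initial_size * sizeof(LocationLen));
	st->clocations_buf_size = initial_size;
	st->clocations_count = 0;
	return st->clocations != NULL;
}

/*
 * Record one constant location, doubling the array when it is full.
 * Returns 1 if recorded, 0 if the location is unknown (negative) or the
 * array could not be enlarged.
 */
int
record_const_location(ConstLocations *st, int location)
{
	/* a negative value indicates an unknown or undefined location */
	if (location < 0)
		return 0;

	/* enlarge array if needed (realloc here, repalloc in the backend) */
	if (st->clocations_count >= st->clocations_buf_size)
	{
		int			new_size = st->clocations_buf_size * 2;
		LocationLen *tmp = realloc(st->clocations,
								   new_size * sizeof(LocationLen));

		if (tmp == NULL)
			return 0;
		st->clocations = tmp;
		st->clocations_buf_size = new_size;
	}

	st->clocations[st->clocations_count].location = location;
	/* lengths start at -1 and are filled in later by the normalizer */
	st->clocations[st->clocations_count].length = -1;
	st->clocations_count++;
	return 1;
}
```

Unlike repalloc(), realloc() can return NULL, so the sketch needs an explicit failure path that the backend code does not.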

v22-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-diff; charset=us-ascii)

From c022ef25cbf75747f2a12166977f85f4d273160a Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:23 -0400
Subject: [PATCH v22 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's status isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in log_line_prefix; it
likewise only exposes top-level statements.
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  29 +++--
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   9 ++
 src/backend/executor/execParallel.c           |   5 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/postmaster/pgstat.c               |  65 ++++++++++
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |   9 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  27 ++---
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/pgstat.h                          |   5 +
 src/test/regress/expected/rules.out           |   9 +-
 16 files changed, 210 insertions(+), 101 deletions(-)
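
The "top level only" rule described in the commit message is implemented in pgstat_report_queryid() further down. The sketch below is a simplified standalone model of that rule, not the patch's code: BackendEntry, report_queryid() and start_new_query() are hypothetical names, and the shared-memory changecount protocol and the track_activities check are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the st_queryid field of PgBackendStatus */
typedef struct BackendEntry
{
	uint64_t	st_queryid;
} BackendEntry;

/*
 * Model of the reporting rule: a nonzero stored identifier means a
 * top-level statement has already reported its queryid, so later
 * (nested) reports are ignored unless the caller forces the update.
 */
void
report_queryid(BackendEntry *be, uint64_t query_id, bool force)
{
	if (be == NULL)
		return;

	if (be->st_queryid != 0 && !force)
		return;

	be->st_queryid = query_id;
}

/* Model of the reset done when a backend starts running a new query */
void
start_new_query(BackendEntry *be)
{
	be->st_queryid = 0;
}
```

With this model, a nested statement reporting a second identifier leaves the first one in place, and only a forced call or the start of a new query clears it.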

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index bd8c96728c..f62b9a2bfd 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * handle, i.e. everything except EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for utility statements related to prepared statements,
+	 * as those will inherit the underlying statement's queryId (except
+	 * DEALLOCATE, which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely event that
+	 * the user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1071,9 +1094,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1128,7 +1149,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1151,23 +1172,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then this is a utility statement for which we couldn't
+ * compute a queryId during parse analysis, and we should compute a suitable
+ * queryId internally.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1198,52 +1208,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_query_id isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8639914fac..d53d0e234f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6942,6 +6942,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>Query identifier of the current query.  Query
+             identifiers are not computed by default, so this field
+             will be zero unless the <xref linkend="guc-compute-query-id"/>
+             parameter is enabled or a third-party module that computes
+             query identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7418,8 +7427,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7568,12 +7577,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables in-core computation of a query identifier.  The <xref
-        linkend="pgstatstatements"/> extension requires a query identifier
-        to be computed.  Note that an external module can alternatively
-        be used if the in-core query identifier computation method
-        isn't acceptable.  In this case, in-core computation should
-        remain disabled.  The default is <literal>off</literal>.
+        Enables in-core computation of a query identifier.
+        Query identifiers can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in-core query identifier computation
+        specification isn't acceptable.  In this case, in-core computation
+        must be disabled.  The default is <literal>off</literal>.
        </para>
        <note>
         <para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index af540fb02f..b4b18fa547 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -910,6 +910,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this
+      field shows the identifier of the currently executing query. In
+      all other states, it shows the identifier of the last query that was
+      executed.  Query identifiers are not computed by default, so this
+      field will be null unless the <xref linkend="guc-compute-query-id"/>
+      parameter is enabled or a third-party module that computes query
+      identifiers is configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5f2541d316..4d6b232787 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -833,6 +833,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 163242f54e..82fbfd2259 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
@@ -128,6 +129,14 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/*
+	 * In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..e3cfa96519 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -174,7 +174,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = pgstat_get_my_queryid();
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -1403,8 +1403,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 35cb9ebfd7..73976cf4f6 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -44,6 +44,7 @@
 #include "parser/parse_target.h"
 #include "parser/parse_type.h"
 #include "parser/parsetree.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -130,6 +131,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -167,6 +170,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4b9bcd2b41..e216bd591c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3381,6 +3381,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = 0;
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -3435,6 +3436,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting the last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = 0;
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -3445,6 +3454,48 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ *	Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_start_command() -
  *
@@ -5181,6 +5232,20 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	return result;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ *	Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /*
  * Lookup the hash table entry for the specified table. If no hash
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7e034b72b1..d66cee79f0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -692,6 +692,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -910,6 +912,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1026,6 +1029,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5102227a60..8e81eef8cb 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -569,7 +569,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	29
+#define PG_STAT_GET_ACTIVITY_COLS	30
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -914,6 +914,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[27] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[29] = true;
+			else
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[26] = true;
 			nulls[27] = true;
 			nulls[28] = true;
+			nulls[29] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 423df2f300..bbdef3bf95 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,7 +77,6 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
-#include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2710,6 +2709,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 14000cb67d..08b040e9a9 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -542,6 +542,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index 2a47688fd6..53286bb333 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = CleanQuerytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 69ffd0c3f4..ab30558e3f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5263,9 +5263,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d699502cd9..3731c43e6d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1264,6 +1264,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 /*
@@ -1458,6 +1461,7 @@ extern void pgstat_initialize(void);
 extern void pgstat_bestart(void);
 
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -1466,6 +1470,7 @@ extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b59a7b4a5..264deda7af 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1762,9 +1762,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1876,7 +1877,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2046,7 +2047,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2076,7 +2077,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.30.1

v22-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-diff; charset=us-ascii)

From b96339ecaf93929880b4bd370982d7581485551c Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:24 -0400
Subject: [PATCH v22 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_query_id or
using a third-party module, verbose explain will display it.
---
 doc/src/sgml/config.sgml              |  6 +++---
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out | 11 ++++++++++-
 src/test/regress/sql/explain.sql      |  5 ++++-
 5 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d53d0e234f..9520771bf6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7580,9 +7580,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables in-core computation of a query identifier.
         Query identifiers can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
         identifier to be computed.  Note that an external module can
         alternatively be used if the in-core query identifier computation
         specification isn't acceptable.  In this case, in-core computation
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..4d758fb237 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed, see <xref
+      linkend="guc-compute-query-id"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 872aaa7aed..04f4822513 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_query_id)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -598,6 +608,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index b89b99fb02..4c578d4f5e 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -17,7 +17,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -477,3 +477,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index f2eab030d6..468caf4037 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -19,7 +19,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -103,3 +103,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.30.1

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#165)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote:

On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:

OK, I am happy with your design decisions, thanks.

Thanks! While double-checking I noticed that I failed to remove a (now)
useless include of pgstat.h in nodeGatherMerge.c in the last version. I'm
attaching v22 to fix that, no other change.

There was a conflict since e1025044c (Split backend status and progress related
functionality out of pgstat.c).

Attached v23 is a rebase against current HEAD, and I also added a few
UINT64CONST() macro usages for consistency.
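
For readers who haven't met the macro, here is a minimal sketch of the kind of
comparison it is used for. It is not code from the patch. It assumes a stand-in
definition of UINT64CONST (in the PostgreSQL tree the macro comes from c.h),
and normalize_query_id is a hypothetical helper name. It mirrors the zero-hash
fallback visible in the pg_stat_statements code that 0001 moves to core.

```c
#include <stdint.h>

/* Stand-in for PostgreSQL's UINT64CONST, which c.h normally provides. */
#ifndef UINT64CONST
#define UINT64CONST(x) UINT64_C(x)
#endif

/*
 * Hypothetical helper, for illustration only: zero means "no query
 * identifier", so a hash that happens to be zero is replaced with 1.
 */
static uint64_t
normalize_query_id(uint64_t hash)
{
	if (hash == UINT64CONST(0))
		return UINT64CONST(1);
	return hash;
}
```

Writing the literal through the macro keeps the constant 64 bits wide on every
platform, so the comparison with a uint64 field never depends on the width of
int or long.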

Attachments:

v23-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-diff; charset=us-ascii)

From 29eda2d08f3ed38bbf443898dfad645f5d279d96 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v23 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's therefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using a third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query identifier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  25 +
 doc/src/sgml/pgstatstatements.sgml            |  20 +-
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 1141d2b067..0f8bac0cca 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -116,8 +101,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -237,40 +220,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -344,7 +293,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -366,7 +316,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -382,16 +332,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -853,15 +796,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -881,35 +819,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling were able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -919,7 +836,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1269,7 +1186,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2629,678 +2546,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3321,7 +2566,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3428,10 +2673,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3445,7 +2690,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3525,13 +2770,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..e47b26040f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_query_id = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0c9128a55d..b28f7000c1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7617,6 +7617,31 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-query-id" xreflabel="compute_query_id">
+      <term><varname>compute_query_id</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_query_id</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables in-core computation of a query identifier.  The <xref
+        linkend="pgstatstatements"/> extension requires a query identifier
+        to be computed.  Note that an external module can alternatively
+        be used if the in-core query identifier computation method
+        isn't acceptable.  In this case, in-core computation should
+        remain disabled.  The default is <literal>off</literal>.
+       </para>
+       <note>
+        <para>
+         To ensure that only one query identifier is calculated and
+         displayed, extensions that calculate query identifiers should
+         throw an error if a query identifier has already been computed.
+        </para>
+       </note>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 464bf0e5ae..3ca292d71f 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -20,6 +20,14 @@
   This means that a server restart is needed to add or remove the module.
  </para>
 
+ <para>
+  The module will not track statistics unless query
+  identifiers are calculated.  This can be done by enabling <xref
+  linkend="guc-compute-query-id"/> or using a third-party module that
+  computes its own query identifiers.  Note that all statistics tracked
+  by this module must be reset if the query identifier method is changed.
+ </para>
+
  <para>
    When <filename>pg_stat_statements</filename> is loaded, it tracks
    statistics across all databases of the server.  To access and manipulate
@@ -84,7 +92,7 @@
        <structfield>queryid</structfield> <type>bigint</type>
       </para>
       <para>
-       Internal hash code, computed from the statement's parse tree
+       Hash code to identify identical normalized queries.
       </para></entry>
      </row>
 
@@ -386,6 +394,16 @@
    are compared strictly on the basis of their textual query strings, however.
   </para>
 
+  <note>
+   <para>
+    The following details about constant replacement and
+    <structfield>queryid</structfield> only apply when <xref
+    linkend="guc-compute-query-id"/> is enabled.  If you use an external
+    module instead to compute <structfield>queryid</structfield>, you
+    should refer to its documentation for details.
+   </para>
+  </note>
+
   <para>
    When a constant's value has been ignored for purposes of matching the query
    to other queries, the constant is replaced by a parameter symbol, such
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 5de1307570..35cb9ebfd7 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -46,6 +46,8 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -107,6 +109,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -119,8 +122,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -140,6 +146,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -152,8 +159,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ad351e2fd1..3a62e45bef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -668,6 +668,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -686,8 +687,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c9c9da85f3..20b677543a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -534,6 +534,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1458,6 +1459,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_query_id,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 39da7cc942..192577a02e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -596,6 +596,7 @@
 
 # - Monitoring -
 
+#compute_query_id = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..2a47688fd6
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+CleanQuerytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = CleanQuerytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (invalid), use a
+	 * queryId of 2 instead; queryId 1 is already in use for normal
+	 * statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 4a3c9686f9..6716db6c13 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 5004ee4177..9b6552b25b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..83ba7339fa
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *CleanQuerytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.30.1
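
For readers skimming the 0001 patch: the key property of the jumble is that a
Const contributes only its type (plus a recorded text location), never its
value, so queries differing only in literals get the same identifier.  Below is
a minimal standalone sketch of that idea.  Everything in it is hypothetical
(ToyNode, ToyJumble, the toy_* functions, the FNV-1a hash); it is not the code
from queryjumble.c, which appends to a buffer and hashes it differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, flattened stand-in for a parse tree node. */
typedef enum ToyKind { TOY_COLUMN, TOY_CONST } ToyKind;

typedef struct ToyNode
{
	ToyKind		kind;
	uint32_t	type_oid;	/* data type of a constant */
	int			value;		/* constant value: deliberately never hashed */
	int			attno;		/* column number for TOY_COLUMN */
	int			location;	/* offset in the query text, -1 if unknown */
} ToyNode;

/* Hypothetical working state: running hash plus constant locations. */
typedef struct ToyJumble
{
	uint64_t	hash;
	int		   *locations;
	int			nlocations;
	int			capacity;
} ToyJumble;

void
toy_init(ToyJumble *js)
{
	js->hash = UINT64_C(0xcbf29ce484222325);	/* FNV-1a offset basis */
	js->nlocations = 0;
	js->capacity = 2;
	js->locations = malloc(js->capacity * sizeof(int));
	if (js->locations == NULL)
		abort();
}

/* Mix the raw bytes of one field into the running hash (FNV-1a). */
void
toy_append(ToyJumble *js, const void *item, size_t size)
{
	const unsigned char *p = item;

	for (size_t i = 0; i < size; i++)
	{
		js->hash ^= p[i];
		js->hash *= UINT64_C(0x100000001b3);	/* FNV-1a prime */
	}
}

/* Remember where a constant sits, doubling the array when it is full. */
void
toy_record_location(ToyJumble *js, int location)
{
	if (location < 0)
		return;					/* unknown location: nothing to record */

	assert(js->capacity > 0);
	if (js->nlocations >= js->capacity)
	{
		int		   *bigger;

		js->capacity *= 2;
		bigger = realloc(js->locations, js->capacity * sizeof(int));
		if (bigger == NULL)
			abort();
		js->locations = bigger;
	}
	js->locations[js->nlocations++] = location;
}

/* Jumble a flat list of nodes: kind first, then the significant fields. */
void
toy_jumble(ToyJumble *js, const ToyNode *nodes, int n)
{
	for (int i = 0; i < n; i++)
	{
		const ToyNode *node = &nodes[i];

		toy_append(js, &node->kind, sizeof(node->kind));
		if (node->kind == TOY_CONST)
		{
			/* only the type is significant, not the value */
			toy_append(js, &node->type_oid, sizeof(node->type_oid));
			toy_record_location(js, node->location);
		}
		else
			toy_append(js, &node->attno, sizeof(node->attno));
	}
}
```

With this sketch, jumbling "column 1, constant 42" and "column 1, constant 7"
gives the same hash, while switching to column 2 changes it.  The recorded
locations are what a normalizer would later use to replace literals in the
query text.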

v23-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-diff; charset=us-ascii)

From d1659e0793f1859b7ed945db88542282d1719da8 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:23 -0400
Subject: [PATCH v23 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's state isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in log_line_prefix, which
likewise only exposes top-level statements.
---
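
Note on the reporting rule used throughout this patch: callers pass a queryid
and a "force" flag, exec_simple_query resets the value with (0, true), and
later reports are ignored once a top-level queryid is set.  The sketch below
illustrates that rule in isolation.  It is an assumption inferred from the
call sites and comments in this diff, not the body of pgstat_report_queryid;
ToyBackendEntry, toy_entry and toy_report_queryid are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-backend status entry. */
typedef struct ToyBackendEntry
{
	uint64_t	st_queryid;		/* 0 means "no identifier reported" */
} ToyBackendEntry;

ToyBackendEntry toy_entry = {0};

/*
 * Report a query identifier.  Unless "force" is set, an identifier that is
 * already present is kept, so nested statements cannot overwrite the
 * top-level one.
 */
void
toy_report_queryid(uint64_t queryid, bool force)
{
	if (toy_entry.st_queryid != 0 && !force)
		return;					/* top-level queryid already reported */

	toy_entry.st_queryid = queryid;
}
```

Under this rule, a reset with (0, true) at the start of each top-level
statement followed by unforced reports from parse analysis and ExecutorStart
leaves the first non-zero identifier in place until the next reset.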
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  29 +++--
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   9 ++
 src/backend/executor/execParallel.c           |   5 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/activity/backend_status.c   |  66 +++++++++++
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |   8 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  27 ++---
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/utils/backend_status.h            |   5 +
 src/test/regress/expected/rules.out           |   9 +-
 16 files changed, 211 insertions(+), 100 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 0f8bac0cca..52cba86196 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -67,6 +67,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -101,6 +102,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * True for utility statements that pgss_ProcessUtility and
+ * pgss_post_parse_analyze handle (all but EXECUTE, PREPARE and DEALLOCATE).
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -309,7 +318,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -806,16 +814,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for utility statements related to prepared statements,
+	 * as those will inherit the underlying statement's queryId (except
+	 * DEALLOCATE, which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1057,6 +1063,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely case that
+	 * the user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1073,9 +1096,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1130,7 +1151,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1153,23 +1174,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then no query identifier was computed for this statement
+ * (compute_query_id is disabled and no other module provided one), and
+ * nothing is stored.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1200,52 +1210,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_query_id isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b28f7000c1..5f9eddb197 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6999,6 +6999,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>Query identifier of the current query.  Query
+             identifiers are not computed by default, so this field
+             will be zero unless the <xref linkend="guc-compute-query-id"/>
+             parameter is enabled or a third-party module that computes
+             query identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7475,8 +7484,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7625,12 +7634,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables in-core computation of a query identifier.  The <xref
-        linkend="pgstatstatements"/> extension requires a query identifier
-        to be computed.  Note that an external module can alternatively
-        be used if the in-core query identifier computation method
-        isn't acceptable.  In this case, in-core computation should
-        remain disabled.  The default is <literal>off</literal>.
+        Enables in-core computation of a query identifier.
+        Query identifiers can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in-core query identifier computation
+        specification isn't acceptable.  In this case, in-core computation
+        must be disabled.  The default is <literal>off</literal>.
        </para>
        <note>
         <para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 56018745c8..52958b4fd9 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -910,6 +910,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this
+      field shows the identifier of the currently executing query. In
+      all other states, it shows the identifier of the last query that was
+      executed.  Query identifiers are not computed by default, so this
+      field will be null unless the <xref linkend="guc-compute-query-id"/>
+      parameter is enabled or a third-party module that computes query
+      identifiers is configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5f2541d316..4d6b232787 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -833,6 +833,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 163242f54e..db49d657f6 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -58,6 +58,7 @@
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
 #include "utils/acl.h"
+#include "utils/backend_status.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
@@ -128,6 +129,14 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/*
+	 * In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 366d0b20b9..c7a2f31473 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -175,7 +175,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = pgstat_get_my_queryid();
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 35cb9ebfd7..b082096b90 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -45,6 +45,7 @@
 #include "parser/parse_type.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/backend_status.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/queryjumble.h"
@@ -130,6 +131,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -167,6 +170,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3a62e45bef..d0c1dc9ef2 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -695,6 +695,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -913,6 +915,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1029,6 +1032,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index a25ec0ee3c..1505988763 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -544,6 +544,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = UINT64CONST(0);
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -598,6 +599,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = UINT64CONST(0);
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -608,6 +617,49 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ * Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
+
 /* ----------
  * pgstat_report_appname() -
  *
@@ -972,6 +1024,20 @@ pgstat_get_crashed_backend_activity(int pid, char *buffer, int buflen)
 	return NULL;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ * Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	return MyBEEntry->st_queryid;
+}
+
 
 /* ----------
  * pgstat_fetch_stat_beentry() -
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9ffbca685c..9fa4a93162 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -569,7 +569,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	29
+#define PG_STAT_GET_ACTIVITY_COLS	30
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -914,6 +914,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[27] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[29] = true;
+			else
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[26] = true;
 			nulls[27] = true;
 			nulls[28] = true;
+			nulls[29] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 12de4b38cb..1cf71a649b 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2714,6 +2714,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 192577a02e..65f6186966 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -543,6 +543,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index 2a47688fd6..53286bb333 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = CleanQuerytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 69ffd0c3f4..ab30558e3f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5263,9 +5263,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 3fd7370d41..8e149b56ca 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -165,6 +165,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 
@@ -294,12 +297,14 @@ extern void pgstat_clear_backend_activity_snapshot(void);
 
 /* Activity reporting functions */
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 
 /* ----------
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b59a7b4a5..264deda7af 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1762,9 +1762,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1876,7 +1877,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2046,7 +2047,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2076,7 +2077,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.30.1
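
As a reading aid for the patch above, the "top-level only" rule in pgstat_report_queryid() can be modelled in a few lines of plain C. This is only a sketch: BackendEntry, model_start_query(), model_report_queryid() and demo_nested_statement() are hypothetical stand-ins, not PostgreSQL symbols, and the track_activities check and the changecount protocol are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the st_queryid field of PgBackendStatus. */
typedef struct BackendEntry
{
	uint64_t	st_queryid;
} BackendEntry;

/*
 * Models pgstat_report_activity(STATE_RUNNING, ...): a new top-level query
 * starts, so the previous identifier is forgotten.
 */
void
model_start_query(BackendEntry *entry)
{
	entry->st_queryid = 0;
}

/*
 * Models pgstat_report_queryid(): the first non-forced report after a reset
 * wins; later reports are ignored unless force is set.
 */
void
model_report_queryid(BackendEntry *entry, uint64_t query_id, bool force)
{
	if (entry == NULL)
		return;

	/* A non-zero stored id means a top-level id was already reported. */
	if (entry->st_queryid != 0 && !force)
		return;

	entry->st_queryid = query_id;
}

/* One top-level statement that internally runs a nested statement. */
uint64_t
demo_nested_statement(void)
{
	BackendEntry entry = {0};

	model_start_query(&entry);
	model_report_queryid(&entry, 42, false);	/* top-level identifier */
	model_report_queryid(&entry, 99, false);	/* nested statement: ignored */
	assert(entry.st_queryid == 42);

	return entry.st_queryid;
}
```

In this model, the forced call with a zero identifier corresponds to the pgstat_report_queryid(0, true) call that the patch adds to exec_simple_query(), which clears the identifier between the statements of a multi-statement string.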

Attachment: v23-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-diff; charset=us-ascii)

From 684cb243e9895fe9f658a78f9f8376cd8a895264 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:24 -0400
Subject: [PATCH v23 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_query_id or
using a third-party module, verbose explain will display it.
---
 doc/src/sgml/config.sgml              |  6 +++---
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out | 11 ++++++++++-
 src/test/regress/sql/explain.sql      |  5 ++++-
 5 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5f9eddb197..71b47729b8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7637,9 +7637,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables in-core computation of a query identifier.
         Query identifiers can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
         identifier to be computed.  Note that an external module can
         alternatively be used if the in-core query identifier computation
         specification isn't acceptable.  In this case, in-core computation
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..4d758fb237 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed, see <xref
+      linkend="guc-compute-query-id"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ede8cec947..b62a76e7e5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -165,6 +166,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -241,6 +244,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_query_id)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -600,6 +610,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index b89b99fb02..4c578d4f5e 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -17,7 +17,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -477,3 +477,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index f2eab030d6..468caf4037 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -19,7 +19,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -103,3 +103,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.30.1
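
A note on the `-?` added to the regression-test regexp: the identifier is stored as a uint64 but printed through pg_lltoa(), which formats a signed 64-bit value, so identifiers with the high bit set presumably show up as negative numbers. The sketch below illustrates that effect in standalone C; format_queryid_signed() is a hypothetical helper, not part of the patch, and it assumes the usual two's-complement conversion from uint64_t to int64_t.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit query identifier as a signed decimal, the way a bigint
 * column or a signed formatter would display it.  Returns the number of
 * characters written (as snprintf does).
 */
int
format_queryid_signed(uint64_t query_id, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);

	/* Values above INT64_MAX wrap to negative on two's-complement systems. */
	return snprintf(buf, buflen, "%" PRId64, (int64_t) query_id);
}
```

Under that assumption, an identifier of UINT64_MAX would be displayed as -1, which is why a filter that only replaces `\d+` would leave a stray minus sign in the expected output.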

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#166)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Sun, Apr 4, 2021 at 10:18:50PM +0800, Julien Rouhaud wrote:

On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote:

On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:

OK, I am happy with your design decisions, thanks.

Thanks! While double checking I noticed that I failed to remove a (now)
useless include of pgstat.h in nodeGatherMerge.c in last version. I'm
attaching v22 to fix that, no other change.

There was a conflict since e1025044c (Split backend status and progress related
functionality out of pgstat.c).

Attached v23 is a rebase against current HEAD, and I also added a few
UINT64CONST() macro usage for consistency.

Thanks. I struggled with merging the statistics collection changes into
my cluster file encryption branches because my patch made changes to
code that moved to another C file.

I plan to apply this tomorrow.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

nitinjadhavpostgres@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#167)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

I have reviewed the code. Here are a few minor comments.

1.
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+
+ /*
+ * if track_activities is disabled, st_queryid should already have been
+ * reset
+ */
+ if (!pgstat_track_activities)
+ return;

The above two conditions can be clubbed together in a single condition.

2.
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ * Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+ if (!MyBEEntry)
+ return 0;
+
+ return MyBEEntry->st_queryid;
+}

Is it safe to read the data directly from MyBEEntry without calling
pgstat_begin_read_activity() and pgstat_end_read_activity()? Kindly refer to
pgstat_get_backend_current_activity() for more information, and let me know
if I am wrong.

Thanks and Regards,
Nitin Jadhav

On Mon, Apr 5, 2021 at 10:46 PM Bruce Momjian <bruce@momjian.us> wrote:

On Sun, Apr 4, 2021 at 10:18:50PM +0800, Julien Rouhaud wrote:

On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote:

On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:

OK, I am happy with your design decisions, thanks.

Thanks! While double checking I noticed that I failed to remove a (now)
useless include of pgstat.h in nodeGatherMerge.c in last version. I'm
attaching v22 to fix that, no other change.

There was a conflict since e1025044c (Split backend status and progress related
functionality out of pgstat.c).

Attached v23 is a rebase against current HEAD, and I also added a few
UINT64CONST() macro usage for consistency.

Thanks. I struggled with merging the statistics collection changes into
my cluster file encryption branches because my patch made changes to
code that moved to another C file.

I plan to apply this tomorrow.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Nitin Jadhav (#168)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Apr 06, 2021 at 08:05:19PM +0530, Nitin Jadhav wrote:

1.
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+
+ /*
+ * if track_activities is disabled, st_queryid should already have been
+ * reset
+ */
+ if (!pgstat_track_activities)
+ return;

The above two conditions can be clubbed together in a single condition.

Right, I just kept it separate as the comment is only relevant for the 2nd
test. I'm fine with merging both if needed.

2.
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ * Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+ if (!MyBEEntry)
+ return 0;
+
+ return MyBEEntry->st_queryid;
+}
Is it safe to directly read the data from MyBEEntry without
calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
ref pgstat_get_backend_current_activity() for more information. Kindly let
me know if I am wrong.

This field is only written by a backend for its own entry.
pg_stat_get_activity already has the required protection, so the remaining
reads of that field carry no risk of torn values from a concurrent write, even
on platforms where this read isn't an atomic operation, because they come
from the same backend that originally wrote it. Skipping the loop avoids some
overhead when retrieving the queryid, but if people think it's worth having
the loop (or a comment explaining why there's no loop), I'm also fine with it.
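
To make the argument concrete, here is a simplified, single-threaded model of the changecount idea in plain C. It is only a sketch: StatusEntry and the three functions are hypothetical names, and the memory barriers that the real PGSTAT_BEGIN_WRITE_ACTIVITY / PGSTAT_END_WRITE_ACTIVITY and pgstat_begin_read_activity() / pgstat_end_read_activity() macros rely on are deliberately omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a backend status entry. */
typedef struct StatusEntry
{
	volatile int changecount;	/* odd while a write is in progress */
	volatile uint64_t queryid;
} StatusEntry;

/* Writer side: bump the counter before and after touching the field. */
void
entry_write_queryid(StatusEntry *entry, uint64_t query_id)
{
	entry->changecount++;		/* now odd: readers must retry */
	entry->queryid = query_id;
	entry->changecount++;		/* even again: entry is stable */
	assert((entry->changecount & 1) == 0);
}

/*
 * Reader in another process: retry until the counter is even and unchanged
 * across the read, so a half-written value is never returned.
 */
uint64_t
entry_read_queryid_other(const StatusEntry *entry)
{
	for (;;)
	{
		int			before = entry->changecount;
		uint64_t	value = entry->queryid;
		int			after = entry->changecount;

		if (before == after && (before & 1) == 0)
			return value;
	}
}

/*
 * Reader in the owning process: it is the only writer, so it cannot observe
 * its own write halfway through and needs no retry loop.
 */
uint64_t
entry_read_queryid_self(const StatusEntry *entry)
{
	return entry->queryid;
}
```

In this model, pgstat_get_my_queryid() corresponds to entry_read_queryid_self(), while pg_stat_get_activity, which reads other backends' entries, corresponds to entry_read_queryid_other().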

alvherre@alvh.no-ip.org

almost 5 years ago

In reply to: Nitin Jadhav (#168)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021-Apr-06, Nitin Jadhav wrote:

I have reviewed the code. Here are a few minor comments.
1.
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+
+ /*
+ * if track_activities is disabled, st_queryid should already have been
+ * reset
+ */
+ if (!pgstat_track_activities)
+ return;
The above two conditions can be clubbed together in a single condition.

I wonder if it wouldn't make more sense to put the assignment *after* we
have checked the second condition.

--
Álvaro Herrera Valdivia, Chile

rjuju123@gmail.com

almost 5 years ago

In reply to: Alvaro Herrera (#170)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:

On 2021-Apr-06, Nitin Jadhav wrote:
I have reviewed the code. Here are a few minor comments.
1.
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+
+ /*
+ * if track_activities is disabled, st_queryid should already have been
+ * reset
+ */
+ if (!pgstat_track_activities)
+ return;
The above two conditions can be clubbed together in a single condition.
I wonder if it wouldn't make more sense to put the assignment *after* we
have checked the second condition.

All other pgstat_report_* functions do the assignment before doing any test on
beentry and/or pgstat_track_activities, so I think we should keep this code
consistent.

nitinjadhavpostgres@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#169)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

1.
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+
+ /*
+ * if track_activities is disabled, st_queryid should already have been
+ * reset
+ */
+ if (!pgstat_track_activities)
+ return;
The above two conditions can be clubbed together in a single condition.
Right, I just kept it separate as the comment is only relevant for the 2nd
test. I'm fine with merging both if needed.

I feel we should merge both of the conditions as it is done in
pgstat_report_xact_timestamp(). Probably we can write a common comment to
explain both the conditions.

2.
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ * Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+ if (!MyBEEntry)
+ return 0;
+
+ return MyBEEntry->st_queryid;
+}
Is it safe to directly read the data from MyBEEntry without
calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
ref pgstat_get_backend_current_activity() for more information. Kindly let
me know if I am wrong.

This field is only written by a backend for its own entry.
pg_stat_get_activity already has required protection, so the rest of the calls
to read that field shouldn't have any risk of reading torn values on platform
where this isn't an atomic operation due to concurrent write, as it will be
from the same backend that originally wrote it. It avoids some overhead to
retrieve the queryid, but if people think it's worth having the loop (or a
comment explaining why there's no loop) I'm also fine with it.

Thanks for the explanation. Please add a comment explaining why there is no
loop.

Thanks and Regards,
Nitin Jadhav

On Tue, Apr 6, 2021 at 8:40 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Apr 06, 2021 at 08:05:19PM +0530, Nitin Jadhav wrote:
1.
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+
+ /*
+ * if track_activities is disabled, st_queryid should already have been
+ * reset
+ */
+ if (!pgstat_track_activities)
+ return;
The above two conditions can be clubbed together in a single condition.
Right, I just kept it separate as the comment is only relevant for the 2nd
test. I'm fine with merging both if needed.
2.
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ * Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+ if (!MyBEEntry)
+ return 0;
+
+ return MyBEEntry->st_queryid;
+}
Is it safe to directly read the data from MyBEEntry without
calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
ref pgstat_get_backend_current_activity() for more information. Kindly let
me know if I am wrong.

This field is only written by a backend for its own entry.
pg_stat_get_activity already has required protection, so the rest of the calls
to read that field shouldn't have any risk of reading torn values on platform
where this isn't an atomic operation due to concurrent write, as it will be
from the same backend that originally wrote it. It avoids some overhead to
retrieve the queryid, but if people think it's worth having the loop (or a
comment explaining why there's no loop) I'm also fine with it.

nitinjadhavpostgres@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#171)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:
On 2021-Apr-06, Nitin Jadhav wrote:
I have reviewed the code. Here are a few minor comments.
1.
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+
+ /*
+ * if track_activities is disabled, st_queryid should already have been
+ * reset
+ */
+ if (!pgstat_track_activities)
+ return;
The above two conditions can be clubbed together in a single condition.
I wonder if it wouldn't make more sense to put the assignment *after* we
have checked the second condition.
All other pgstat_report_* functions do the assignment before doing any test
on beentry and/or pgstat_track_activities, I think we should keep this code
consistent.

I agree about this.
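
For readers following along, the shape being agreed on here (assignment
first, then one merged test) can be sketched as below. This is only an
illustration: DemoBackendStatus, demo_my_entry, demo_track_activities and
demo_report_queryid are hypothetical stand-ins, the "force" argument of the
real function is left out, and the store is a plain assignment rather than
whatever write protocol the patch uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the backend entry and the tracking setting. */
typedef struct DemoBackendStatus
{
	uint64_t	st_queryid;
} DemoBackendStatus;

static DemoBackendStatus *demo_my_entry = NULL;
static bool demo_track_activities = true;

static void
demo_report_queryid(uint64_t queryid)
{
	/* Assignment comes first, as in the other report functions. */
	volatile DemoBackendStatus *beentry = demo_my_entry;

	/*
	 * Merged condition: nothing to do if there is no entry for this
	 * backend, or if activity tracking is disabled (in which case the
	 * stored identifier is assumed to have been reset already).
	 */
	if (!beentry || !demo_track_activities)
		return;

	beentry->st_queryid = queryid;
}
```

With a single merged test, one comment has to cover both conditions, which is
the trade-off raised earlier in the thread.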

Thanks and Regards,
Nitin Jadhav

On Tue, Apr 6, 2021 at 9:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:
On 2021-Apr-06, Nitin Jadhav wrote:
I have reviewed the code. Here are a few minor comments.
1.
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+
+ /*
+ * if track_activities is disabled, st_queryid should already have been
+ * reset
+ */
+ if (!pgstat_track_activities)
+ return;
The above two conditions can be clubbed together in a single condition.
I wonder if it wouldn't make more sense to put the assignment *after* we
have checked the second condition.
All other pgstat_report_* functions do the assignment before doing any test
on beentry and/or pgstat_track_activities, I think we should keep this code
consistent.

rjuju123@gmail.com

almost 5 years ago

In reply to: Nitin Jadhav (#172)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 07, 2021 at 06:15:27PM +0530, Nitin Jadhav wrote:

I feel we should merge both of the conditions as it is done in
pgstat_report_xact_timestamp(). Probably we can write a common comment to
explain both the conditions.

[...]

Thanks for the explanation. Please add a comment explaining why there is no
loop.

PFA v24.

Attachments:

v24-0001-Move-pg_stat_statements-query-jumbling-to-core.patch (text/x-diff; charset=us-ascii)

From 29eda2d08f3ed38bbf443898dfad645f5d279d96 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v24 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's therefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using a third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query identifier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +----------------
 .../pg_stat_statements.conf                   |   1 +
 doc/src/sgml/config.sgml                      |  25 +
 doc/src/sgml/pgstatstatements.sgml            |  20 +-
 src/backend/parser/analyze.c                  |  14 +-
 src/backend/tcop/postgres.c                   |   6 +-
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          | 834 ++++++++++++++++++
 src/include/parser/analyze.h                  |   4 +-
 src/include/utils/guc.h                       |   1 +
 src/include/utils/queryjumble.h               |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 1141d2b067..0f8bac0cca 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -116,8 +101,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -237,40 +220,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /*---- Local variables ----*/
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -344,7 +293,8 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
-static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
+static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
+									JumbleState *jstate);
 static PlannedStmt *pgss_planner(Query *parse,
 								 const char *query_string,
 								 int cursorOptions,
@@ -366,7 +316,7 @@ static void pgss_store(const char *query, uint64 queryId,
 					   double total_time, uint64 rows,
 					   const BufferUsage *bufusage,
 					   const WalUsage *walusage,
-					   pgssJumbleState *jstate);
+					   JumbleState *jstate);
 static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
 										pgssVersion api_version,
 										bool showtext);
@@ -382,16 +332,9 @@ static char *qtext_fetch(Size query_offset, int query_len,
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
 static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
-static void AppendJumble(pgssJumbleState *jstate,
-						 const unsigned char *item, Size size);
-static void JumbleQuery(pgssJumbleState *jstate, Query *query);
-static void JumbleRangeTable(pgssJumbleState *jstate, List *rtable);
-static void JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks);
-static void JumbleExpr(pgssJumbleState *jstate, Node *node);
-static void RecordConstLocation(pgssJumbleState *jstate, int location);
-static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+static char *generate_normalized_query(JumbleState *jstate, const char *query,
 									   int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
 
@@ -853,15 +796,10 @@ error:
  * Post-parse-analysis hook: mark query with a queryId
  */
 static void
-pgss_post_parse_analyze(ParseState *pstate, Query *query)
+pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	pgssJumbleState jstate;
-
 	if (prev_post_parse_analyze_hook)
-		prev_post_parse_analyze_hook(pstate, query);
-
-	/* Assert we didn't do this already */
-	Assert(query->queryId == UINT64CONST(0));
+		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
@@ -881,35 +819,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 	}
 
-	/* Set up workspace for query jumbling */
-	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate.jumble_len = 0;
-	jstate.clocations_buf_size = 32;
-	jstate.clocations = (pgssLocationLen *)
-		palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
-	jstate.clocations_count = 0;
-	jstate.highest_extern_param_id = 0;
-
-	/* Compute query ID and mark the Query node with it */
-	JumbleQuery(&jstate, query);
-	query->queryId =
-		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
-
 	/*
-	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
-	 * prevent confusion with the utility-statement case.
+	 * If query jumbling were able to identify any ignorable constants, we
+	 * immediately create a hash table entry for the query, so that we can
+	 * record the normalized form of the query string.  If there were no such
+	 * constants, the normalized string would be the same as the query text
+	 * anyway, so there's no need for an early entry.
 	 */
-	if (query->queryId == UINT64CONST(0))
-		query->queryId = UINT64CONST(1);
-
-	/*
-	 * If we were able to identify any ignorable constants, we immediately
-	 * create a hash table entry for the query, so that we can record the
-	 * normalized form of the query string.  If there were no such constants,
-	 * the normalized string would be the same as the query text anyway, so
-	 * there's no need for an early entry.
-	 */
-	if (jstate.clocations_count > 0)
+	if (jstate && jstate->clocations_count > 0)
 		pgss_store(pstate->p_sourcetext,
 				   query->queryId,
 				   query->stmt_location,
@@ -919,7 +836,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 				   0,
 				   NULL,
 				   NULL,
-				   &jstate);
+				   jstate);
 }
 
 /*
@@ -1269,7 +1186,7 @@ pgss_store(const char *query, uint64 queryId,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
 		   const WalUsage *walusage,
-		   pgssJumbleState *jstate)
+		   JumbleState *jstate)
 {
 	pgssHashKey key;
 	pgssEntry  *entry;
@@ -2629,678 +2546,6 @@ release_lock:
 	LWLockRelease(pgss->lock);
 }
 
-/*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
- */
-static void
-AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
-{
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
-	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
-	}
-	jstate->jumble_len = jumble_len;
-}
-
-/*
- * Wrappers around AppendJumble to encapsulate details of serialization
- * of individual local variable elements.
- */
-#define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
-
-/*
- * JumbleQuery: Selectively serialize the query tree, appending significant
- * data to the "query jumble" while ignoring nonsignificant data.
- *
- * Rule of thumb for what to include is that we should ignore anything not
- * semantically significant (such as alias names) as well as anything that can
- * be deduced from child nodes (else we'd just be double-hashing that piece
- * of information).
- */
-static void
-JumbleQuery(pgssJumbleState *jstate, Query *query)
-{
-	Assert(IsA(query, Query));
-	Assert(query->utilityStmt == NULL);
-
-	APP_JUMB(query->commandType);
-	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
-}
-
-/*
- * Jumble a range table
- */
-static void
-JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
-{
-	ListCell   *lc;
-
-	foreach(lc, rtable)
-	{
-		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-		APP_JUMB(rte->rtekind);
-		switch (rte->rtekind)
-		{
-			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
-				break;
-			case RTE_SUBQUERY:
-				JumbleQuery(jstate, rte->subquery);
-				break;
-			case RTE_JOIN:
-				APP_JUMB(rte->jointype);
-				break;
-			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
-				break;
-			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
-				break;
-			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
-				break;
-			case RTE_CTE:
-
-				/*
-				 * Depending on the CTE name here isn't ideal, but it's the
-				 * only info we have to identify the referenced WITH item.
-				 */
-				APP_JUMB_STRING(rte->ctename);
-				APP_JUMB(rte->ctelevelsup);
-				break;
-			case RTE_NAMEDTUPLESTORE:
-				APP_JUMB_STRING(rte->enrname);
-				break;
-			case RTE_RESULT:
-				break;
-			default:
-				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
-				break;
-		}
-	}
-}
-
-/*
- * Jumble a rowMarks list
- */
-static void
-JumbleRowMarks(pgssJumbleState *jstate, List *rowMarks)
-{
-	ListCell   *lc;
-
-	foreach(lc, rowMarks)
-	{
-		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
-
-		if (!rowmark->pushedDown)
-		{
-			APP_JUMB(rowmark->rti);
-			APP_JUMB(rowmark->strength);
-			APP_JUMB(rowmark->waitPolicy);
-		}
-	}
-}
-
-/*
- * Jumble an expression tree
- *
- * In general this function should handle all the same node types that
- * expression_tree_walker() does, and therefore it's coded to be as parallel
- * to that function as possible.  However, since we are only invoked on
- * queries immediately post-parse-analysis, we need not handle node types
- * that only appear in planning.
- *
- * Note: the reason we don't simply use expression_tree_walker() is that the
- * point of that function is to support tree walkers that don't care about
- * most tree node types, but here we care about all types.  We should complain
- * about any unrecognized node type.
- */
-static void
-JumbleExpr(pgssJumbleState *jstate, Node *node)
-{
-	ListCell   *temp;
-
-	if (node == NULL)
-		return;
-
-	/* Guard against stack overflow due to overly complex expressions */
-	check_stack_depth();
-
-	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
-	 */
-	APP_JUMB(node->type);
-
-	switch (nodeTag(node))
-	{
-		case T_Var:
-			{
-				Var		   *var = (Var *) node;
-
-				APP_JUMB(var->varno);
-				APP_JUMB(var->varattno);
-				APP_JUMB(var->varlevelsup);
-			}
-			break;
-		case T_Const:
-			{
-				Const	   *c = (Const *) node;
-
-				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
-				/* Also, record its parse location for query normalization */
-				RecordConstLocation(jstate, c->location);
-			}
-			break;
-		case T_Param:
-			{
-				Param	   *p = (Param *) node;
-
-				APP_JUMB(p->paramkind);
-				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
-				/* Also, track the highest external Param id */
-				if (p->paramkind == PARAM_EXTERN &&
-					p->paramid > jstate->highest_extern_param_id)
-					jstate->highest_extern_param_id = p->paramid;
-			}
-			break;
-		case T_Aggref:
-			{
-				Aggref	   *expr = (Aggref *) node;
-
-				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_GroupingFunc:
-			{
-				GroupingFunc *grpnode = (GroupingFunc *) node;
-
-				JumbleExpr(jstate, (Node *) grpnode->refs);
-			}
-			break;
-		case T_WindowFunc:
-			{
-				WindowFunc *expr = (WindowFunc *) node;
-
-				APP_JUMB(expr->winfnoid);
-				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
-			}
-			break;
-		case T_SubscriptingRef:
-			{
-				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
-			}
-			break;
-		case T_FuncExpr:
-			{
-				FuncExpr   *expr = (FuncExpr *) node;
-
-				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_NamedArgExpr:
-			{
-				NamedArgExpr *nae = (NamedArgExpr *) node;
-
-				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
-			}
-			break;
-		case T_OpExpr:
-		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
-		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
-			{
-				OpExpr	   *expr = (OpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_ScalarArrayOpExpr:
-			{
-				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
-
-				APP_JUMB(expr->opno);
-				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_BoolExpr:
-			{
-				BoolExpr   *expr = (BoolExpr *) node;
-
-				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
-			}
-			break;
-		case T_SubLink:
-			{
-				SubLink    *sublink = (SubLink *) node;
-
-				APP_JUMB(sublink->subLinkType);
-				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQuery(jstate, castNode(Query, sublink->subselect));
-			}
-			break;
-		case T_FieldSelect:
-			{
-				FieldSelect *fs = (FieldSelect *) node;
-
-				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
-			}
-			break;
-		case T_FieldStore:
-			{
-				FieldStore *fstore = (FieldStore *) node;
-
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
-			}
-			break;
-		case T_RelabelType:
-			{
-				RelabelType *rt = (RelabelType *) node;
-
-				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
-			}
-			break;
-		case T_CoerceViaIO:
-			{
-				CoerceViaIO *cio = (CoerceViaIO *) node;
-
-				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
-			}
-			break;
-		case T_ArrayCoerceExpr:
-			{
-				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
-
-				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
-			}
-			break;
-		case T_ConvertRowtypeExpr:
-			{
-				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
-
-				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
-			}
-			break;
-		case T_CollateExpr:
-			{
-				CollateExpr *ce = (CollateExpr *) node;
-
-				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
-			}
-			break;
-		case T_CaseExpr:
-			{
-				CaseExpr   *caseexpr = (CaseExpr *) node;
-
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
-				foreach(temp, caseexpr->args)
-				{
-					CaseWhen   *when = lfirst_node(CaseWhen, temp);
-
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
-				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
-			}
-			break;
-		case T_CaseTestExpr:
-			{
-				CaseTestExpr *ct = (CaseTestExpr *) node;
-
-				APP_JUMB(ct->typeId);
-			}
-			break;
-		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
-			break;
-		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
-			break;
-		case T_RowCompareExpr:
-			{
-				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
-
-				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
-			}
-			break;
-		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
-			break;
-		case T_MinMaxExpr:
-			{
-				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
-
-				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
-			}
-			break;
-		case T_SQLValueFunction:
-			{
-				SQLValueFunction *svf = (SQLValueFunction *) node;
-
-				APP_JUMB(svf->op);
-				/* type is fully determined by op */
-				APP_JUMB(svf->typmod);
-			}
-			break;
-		case T_XmlExpr:
-			{
-				XmlExpr    *xexpr = (XmlExpr *) node;
-
-				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
-			}
-			break;
-		case T_NullTest:
-			{
-				NullTest   *nt = (NullTest *) node;
-
-				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
-			}
-			break;
-		case T_BooleanTest:
-			{
-				BooleanTest *bt = (BooleanTest *) node;
-
-				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
-			}
-			break;
-		case T_CoerceToDomain:
-			{
-				CoerceToDomain *cd = (CoerceToDomain *) node;
-
-				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
-			}
-			break;
-		case T_CoerceToDomainValue:
-			{
-				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
-
-				APP_JUMB(cdv->typeId);
-			}
-			break;
-		case T_SetToDefault:
-			{
-				SetToDefault *sd = (SetToDefault *) node;
-
-				APP_JUMB(sd->typeId);
-			}
-			break;
-		case T_CurrentOfExpr:
-			{
-				CurrentOfExpr *ce = (CurrentOfExpr *) node;
-
-				APP_JUMB(ce->cvarno);
-				if (ce->cursor_name)
-					APP_JUMB_STRING(ce->cursor_name);
-				APP_JUMB(ce->cursor_param);
-			}
-			break;
-		case T_NextValueExpr:
-			{
-				NextValueExpr *nve = (NextValueExpr *) node;
-
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
-			}
-			break;
-		case T_InferenceElem:
-			{
-				InferenceElem *ie = (InferenceElem *) node;
-
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
-			}
-			break;
-		case T_TargetEntry:
-			{
-				TargetEntry *tle = (TargetEntry *) node;
-
-				APP_JUMB(tle->resno);
-				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
-			}
-			break;
-		case T_RangeTblRef:
-			{
-				RangeTblRef *rtr = (RangeTblRef *) node;
-
-				APP_JUMB(rtr->rtindex);
-			}
-			break;
-		case T_JoinExpr:
-			{
-				JoinExpr   *join = (JoinExpr *) node;
-
-				APP_JUMB(join->jointype);
-				APP_JUMB(join->isNatural);
-				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
-			}
-			break;
-		case T_FromExpr:
-			{
-				FromExpr   *from = (FromExpr *) node;
-
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
-			}
-			break;
-		case T_OnConflictExpr:
-			{
-				OnConflictExpr *conf = (OnConflictExpr *) node;
-
-				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
-				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
-			}
-			break;
-		case T_List:
-			foreach(temp, (List *) node)
-			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
-			}
-			break;
-		case T_IntList:
-			foreach(temp, (List *) node)
-			{
-				APP_JUMB(lfirst_int(temp));
-			}
-			break;
-		case T_SortGroupClause:
-			{
-				SortGroupClause *sgc = (SortGroupClause *) node;
-
-				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
-				APP_JUMB(sgc->nulls_first);
-			}
-			break;
-		case T_GroupingSet:
-			{
-				GroupingSet *gsnode = (GroupingSet *) node;
-
-				JumbleExpr(jstate, (Node *) gsnode->content);
-			}
-			break;
-		case T_WindowClause:
-			{
-				WindowClause *wc = (WindowClause *) node;
-
-				APP_JUMB(wc->winref);
-				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
-			}
-			break;
-		case T_CommonTableExpr:
-			{
-				CommonTableExpr *cte = (CommonTableExpr *) node;
-
-				/* we store the string name because RTE_CTE RTEs need it */
-				APP_JUMB_STRING(cte->ctename);
-				APP_JUMB(cte->ctematerialized);
-				JumbleQuery(jstate, castNode(Query, cte->ctequery));
-			}
-			break;
-		case T_SetOperationStmt:
-			{
-				SetOperationStmt *setop = (SetOperationStmt *) node;
-
-				APP_JUMB(setop->op);
-				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
-			}
-			break;
-		case T_RangeTblFunction:
-			{
-				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
-
-				JumbleExpr(jstate, rtfunc->funcexpr);
-			}
-			break;
-		case T_TableFunc:
-			{
-				TableFunc  *tablefunc = (TableFunc *) node;
-
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
-			}
-			break;
-		case T_TableSampleClause:
-			{
-				TableSampleClause *tsc = (TableSampleClause *) node;
-
-				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
-			}
-			break;
-		default:
-			/* Only a warning, since we can stumble along anyway */
-			elog(WARNING, "unrecognized node type: %d",
-				 (int) nodeTag(node));
-			break;
-	}
-}
-
-/*
- * Record location of constant within query string of query tree
- * that is currently being walked.
- */
-static void
-RecordConstLocation(pgssJumbleState *jstate, int location)
-{
-	/* -1 indicates unknown or undefined location */
-	if (location >= 0)
-	{
-		/* enlarge array if needed */
-		if (jstate->clocations_count >= jstate->clocations_buf_size)
-		{
-			jstate->clocations_buf_size *= 2;
-			jstate->clocations = (pgssLocationLen *)
-				repalloc(jstate->clocations,
-						 jstate->clocations_buf_size *
-						 sizeof(pgssLocationLen));
-		}
-		jstate->clocations[jstate->clocations_count].location = location;
-		/* initialize lengths to -1 to simplify fill_in_constant_lengths */
-		jstate->clocations[jstate->clocations_count].length = -1;
-		jstate->clocations_count++;
-	}
-}
-
 /*
  * Generate a normalized version of the query string that will be used to
  * represent all similar queries.
@@ -3321,7 +2566,7 @@ RecordConstLocation(pgssJumbleState *jstate, int location)
  * Returns a palloc'd string.
  */
 static char *
-generate_normalized_query(pgssJumbleState *jstate, const char *query,
+generate_normalized_query(JumbleState *jstate, const char *query,
 						  int query_loc, int *query_len_p)
 {
 	char	   *norm_query;
@@ -3428,10 +2673,10 @@ generate_normalized_query(pgssJumbleState *jstate, const char *query,
  * reason for a constant to start with a '-'.
  */
 static void
-fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
 						 int query_loc)
 {
-	pgssLocationLen *locs;
+	LocationLen *locs;
 	core_yyscan_t yyscanner;
 	core_yy_extra_type yyextra;
 	core_YYSTYPE yylval;
@@ -3445,7 +2690,7 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 */
 	if (jstate->clocations_count > 1)
 		qsort(jstate->clocations, jstate->clocations_count,
-			  sizeof(pgssLocationLen), comp_location);
+			  sizeof(LocationLen), comp_location);
 	locs = jstate->clocations;
 
 	/* initialize the flex scanner --- should match raw_parser() */
@@ -3525,13 +2770,13 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 }
 
 /*
- * comp_location: comparator for qsorting pgssLocationLen structs by location
+ * comp_location: comparator for qsorting LocationLen structs by location
  */
 static int
 comp_location(const void *a, const void *b)
 {
-	int			l = ((const pgssLocationLen *) a)->location;
-	int			r = ((const pgssLocationLen *) b)->location;
+	int			l = ((const LocationLen *) a)->location;
+	int			r = ((const LocationLen *) b)->location;
 
 	if (l < r)
 		return -1;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index 13346e2807..e47b26040f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1 +1,2 @@
 shared_preload_libraries = 'pg_stat_statements'
+compute_query_id = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0c9128a55d..b28f7000c1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7617,6 +7617,31 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <title>Statistics Monitoring</title>
      <variablelist>
 
+     <varlistentry id="guc-compute-query-id" xreflabel="compute_query_id">
+      <term><varname>compute_query_id</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>compute_query_id</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables in-core computation of a query identifier.  The <xref
+        linkend="pgstatstatements"/> extension requires a query identifier
+        to be computed.  Note that an external module can alternatively
+        be used if the in-core query identifier computation method
+        isn't acceptable.  In this case, in-core computation should
+        remain disabled.  The default is <literal>off</literal>.
+       </para>
+       <note>
+        <para>
+         To ensure that only one query identifier is calculated and
+         displayed, extensions that calculate query identifiers should
+         throw an error if a query identifier has already been computed.
+        </para>
+       </note>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><varname>log_statement_stats</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 464bf0e5ae..3ca292d71f 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -20,6 +20,14 @@
   This means that a server restart is needed to add or remove the module.
  </para>
 
+ <para>
+  The module will not track statistics unless query
+  identifiers are calculated.  This can be done by enabling <xref
+  linkend="guc-compute-query-id"/> or using a third-party module that
+  computes its own query identifiers.  Note that all statistics tracked
+  by this module must be reset if the query identifier method is changed.
+ </para>
+
  <para>
    When <filename>pg_stat_statements</filename> is loaded, it tracks
    statistics across all databases of the server.  To access and manipulate
@@ -84,7 +92,7 @@
        <structfield>queryid</structfield> <type>bigint</type>
       </para>
       <para>
-       Internal hash code, computed from the statement's parse tree
+       Hash code to identify identical normalized queries.
       </para></entry>
      </row>
 
@@ -386,6 +394,16 @@
    are compared strictly on the basis of their textual query strings, however.
   </para>
 
+  <note>
+   <para>
+    The following details about constant replacement and
+    <structfield>queryid</structfield> only apply when <xref
+    linkend="guc-compute-query-id"/> is enabled.  If you use an external
+    module instead to compute <structfield>queryid</structfield>, you
+    should refer to its documentation for details.
+   </para>
+  </note>
+
   <para>
    When a constant's value has been ignored for purposes of matching the query
    to other queries, the constant is replaced by a parameter symbol, such
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 5de1307570..35cb9ebfd7 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -46,6 +46,8 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/queryjumble.h"
 #include "utils/rel.h"
 
 
@@ -107,6 +109,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -119,8 +122,11 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
@@ -140,6 +146,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
+	JumbleState *jstate = NULL;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -152,8 +159,11 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, sourceText);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ad351e2fd1..3a62e45bef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -668,6 +668,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
+	JumbleState *jstate = NULL;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -686,8 +687,11 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
+	if (compute_query_id)
+		jstate = JumbleQuery(query, query_string);
+
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query);
+		(*post_parse_analyze_hook) (pstate, query, jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 2397fc2453..1d5327cf64 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pg_rusage.o \
 	ps_status.o \
 	queryenvironment.o \
+	queryjumble.o \
 	rls.o \
 	sampling.o \
 	superuser.o \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c9c9da85f3..20b677543a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -534,6 +534,7 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
+bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1458,6 +1459,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_query_id,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 39da7cc942..192577a02e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -596,6 +596,7 @@
 
 # - Monitoring -
 
+#compute_query_id = off
 #log_parser_stats = off
 #log_planner_stats = off
 #log_executor_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
new file mode 100644
index 0000000000..2a47688fd6
--- /dev/null
+++ b/src/backend/utils/misc/queryjumble.c
@@ -0,0 +1,834 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.c
+ *	 Query normalization and fingerprinting.
+ *
+ * Normalization is a process whereby similar queries, typically differing only
+ * in their constants (though the exact rules are somewhat more subtle than
+ * that) are recognized as equivalent, and are tracked as a single entry.  This
+ * is particularly useful for non-prepared queries.
+ *
+ * Normalization is implemented by fingerprinting queries, selectively
+ * serializing those fields of each query tree's nodes that are judged to be
+ * essential to the query.  This is referred to as a query jumble.  This is
+ * distinct from a regular serialization in that various extraneous
+ * information is ignored as irrelevant or not essential to the query, such
+ * as the collations of Vars and, most notably, the values of constants.
+ *
+ * This jumble is acquired at the end of parse analysis of each query, and
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
+ * The server then copies this value around, making it available in plan
+ * tree(s) generated from the query.  The executor can then use this value
+ * to blame query costs on the proper queryId.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/queryjumble.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "parser/scansup.h"
+#include "utils/queryjumble.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+static uint64 compute_utility_queryid(const char *str, int query_len);
+static void AppendJumble(JumbleState *jstate,
+						 const unsigned char *item, Size size);
+static void JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void JumbleRangeTable(JumbleState *jstate, List *rtable);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
+static void JumbleExpr(JumbleState *jstate, Node *node);
+static void RecordConstLocation(JumbleState *jstate, int location);
+
+/*
+ * Given a possibly multi-statement source string, confine our attention to the
+ * relevant part of the string.
+ */
+const char *
+CleanQuerytext(const char *query, int *location, int *len)
+{
+	int query_location = *location;
+	int query_len = *len;
+
+	/* First apply starting offset, unless it's -1 (unknown). */
+	if (query_location >= 0)
+	{
+		Assert(query_location <= strlen(query));
+		query += query_location;
+		/* Length of 0 (or -1) means "rest of string" */
+		if (query_len <= 0)
+			query_len = strlen(query);
+		else
+			Assert(query_len <= strlen(query));
+	}
+	else
+	{
+		/* If query location is unknown, distrust query_len as well */
+		query_location = 0;
+		query_len = strlen(query);
+	}
+
+	/*
+	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
+	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 */
+	while (query_len > 0 && scanner_isspace(query[0]))
+		query++, query_location++, query_len--;
+	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
+		query_len--;
+
+	*location = query_location;
+	*len = query_len;
+
+	return query;
+}
+
+JumbleState *
+JumbleQuery(Query *query, const char *querytext)
+{
+	JumbleState *jstate = NULL;
+	if (query->utilityStmt)
+	{
+		const char *sql;
+		int query_location = query->stmt_location;
+		int query_len = query->stmt_len;
+
+		/*
+		 * Confine our attention to the relevant part of the string, if the
+		 * query is a portion of a multi-statement source string.
+		 */
+		sql = CleanQuerytext(querytext, &query_location, &query_len);
+
+		query->queryId = compute_utility_queryid(sql, query_len);
+	}
+	else
+	{
+		jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+		/* Set up workspace for query jumbling */
+		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+		jstate->jumble_len = 0;
+		jstate->clocations_buf_size = 32;
+		jstate->clocations = (LocationLen *)
+			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+		jstate->clocations_count = 0;
+		jstate->highest_extern_param_id = 0;
+
+		/* Compute query ID and mark the Query node with it */
+		JumbleQueryInternal(jstate, query);
+		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+														  jstate->jumble_len,
+														  0));
+
+		/*
+		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+		 * prevent confusion with the utility-statement case.
+		 */
+		if (query->queryId == UINT64CONST(0))
+			query->queryId = UINT64CONST(1);
+	}
+
+	return jstate;
+}
+
+/*
+ * Compute a query identifier for the given utility query string.
+ */
+static uint64
+compute_utility_queryid(const char *str, int query_len)
+{
+	uint64 queryId;
+
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											   query_len, 0));
+
+	/*
+	 * If we are unlucky enough to get a hash of zero (invalid), use 2
+	 * instead; queryId 1 is already used as the zero-hash substitute
+	 * for normal statements.
+	 */
+	if (queryId == UINT64CONST(0))
+		queryId = UINT64CONST(2);
+
+	return queryId;
+}
+
+/*
+ * AppendJumble: Append a value that is substantive in a given query to
+ * the current jumble.
+ */
+static void
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+{
+	unsigned char *jumble = jstate->jumble;
+	Size		jumble_len = jstate->jumble_len;
+
+	/*
+	 * Whenever the jumble buffer is full, we hash the current contents and
+	 * reset the buffer to contain just that hash value, thus relying on the
+	 * hash to summarize everything so far.
+	 */
+	while (size > 0)
+	{
+		Size		part_size;
+
+		if (jumble_len >= JUMBLE_SIZE)
+		{
+			uint64		start_hash;
+
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
+			memcpy(jumble, &start_hash, sizeof(start_hash));
+			jumble_len = sizeof(start_hash);
+		}
+		part_size = Min(size, JUMBLE_SIZE - jumble_len);
+		memcpy(jumble + jumble_len, item, part_size);
+		jumble_len += part_size;
+		item += part_size;
+		size -= part_size;
+	}
+	jstate->jumble_len = jumble_len;
+}
+
+/*
+ * Wrappers around AppendJumble to encapsulate details of serialization
+ * of individual local variable elements.
+ */
+#define APP_JUMB(item) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+#define APP_JUMB_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+
+/*
+ * JumbleQueryInternal: Selectively serialize the query tree, appending
+ * significant data to the "query jumble" while ignoring nonsignificant data.
+ *
+ * Rule of thumb for what to include is that we should ignore anything not
+ * semantically significant (such as alias names) as well as anything that can
+ * be deduced from child nodes (else we'd just be double-hashing that piece
+ * of information).
+ */
+static void
+JumbleQueryInternal(JumbleState *jstate, Query *query)
+{
+	Assert(IsA(query, Query));
+	Assert(query->utilityStmt == NULL);
+
+	APP_JUMB(query->commandType);
+	/* resultRelation is usually predictable from commandType */
+	JumbleExpr(jstate, (Node *) query->cteList);
+	JumbleRangeTable(jstate, query->rtable);
+	JumbleExpr(jstate, (Node *) query->jointree);
+	JumbleExpr(jstate, (Node *) query->targetList);
+	JumbleExpr(jstate, (Node *) query->onConflict);
+	JumbleExpr(jstate, (Node *) query->returningList);
+	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->groupingSets);
+	JumbleExpr(jstate, query->havingQual);
+	JumbleExpr(jstate, (Node *) query->windowClause);
+	JumbleExpr(jstate, (Node *) query->distinctClause);
+	JumbleExpr(jstate, (Node *) query->sortClause);
+	JumbleExpr(jstate, query->limitOffset);
+	JumbleExpr(jstate, query->limitCount);
+	JumbleRowMarks(jstate, query->rowMarks);
+	JumbleExpr(jstate, query->setOperations);
+}
+
+/*
+ * Jumble a range table
+ */
+static void
+JumbleRangeTable(JumbleState *jstate, List *rtable)
+{
+	ListCell   *lc;
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+		APP_JUMB(rte->rtekind);
+		switch (rte->rtekind)
+		{
+			case RTE_RELATION:
+				APP_JUMB(rte->relid);
+				JumbleExpr(jstate, (Node *) rte->tablesample);
+				break;
+			case RTE_SUBQUERY:
+				JumbleQueryInternal(jstate, rte->subquery);
+				break;
+			case RTE_JOIN:
+				APP_JUMB(rte->jointype);
+				break;
+			case RTE_FUNCTION:
+				JumbleExpr(jstate, (Node *) rte->functions);
+				break;
+			case RTE_TABLEFUNC:
+				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				break;
+			case RTE_VALUES:
+				JumbleExpr(jstate, (Node *) rte->values_lists);
+				break;
+			case RTE_CTE:
+
+				/*
+				 * Depending on the CTE name here isn't ideal, but it's the
+				 * only info we have to identify the referenced WITH item.
+				 */
+				APP_JUMB_STRING(rte->ctename);
+				APP_JUMB(rte->ctelevelsup);
+				break;
+			case RTE_NAMEDTUPLESTORE:
+				APP_JUMB_STRING(rte->enrname);
+				break;
+			case RTE_RESULT:
+				break;
+			default:
+				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
+				break;
+		}
+	}
+}
+
+/*
+ * Jumble a rowMarks list
+ */
+static void
+JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+{
+	ListCell   *lc;
+
+	foreach(lc, rowMarks)
+	{
+		RowMarkClause *rowmark = lfirst_node(RowMarkClause, lc);
+
+		if (!rowmark->pushedDown)
+		{
+			APP_JUMB(rowmark->rti);
+			APP_JUMB(rowmark->strength);
+			APP_JUMB(rowmark->waitPolicy);
+		}
+	}
+}
+
+/*
+ * Jumble an expression tree
+ *
+ * In general this function should handle all the same node types that
+ * expression_tree_walker() does, and therefore it's coded to be as parallel
+ * to that function as possible.  However, since we are only invoked on
+ * queries immediately post-parse-analysis, we need not handle node types
+ * that only appear in planning.
+ *
+ * Note: the reason we don't simply use expression_tree_walker() is that the
+ * point of that function is to support tree walkers that don't care about
+ * most tree node types, but here we care about all types.  We should complain
+ * about any unrecognized node type.
+ */
+static void
+JumbleExpr(JumbleState *jstate, Node *node)
+{
+	ListCell   *temp;
+
+	if (node == NULL)
+		return;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	/*
+	 * We always emit the node's NodeTag, then any additional fields that are
+	 * considered significant, and then we recurse to any child nodes.
+	 */
+	APP_JUMB(node->type);
+
+	switch (nodeTag(node))
+	{
+		case T_Var:
+			{
+				Var		   *var = (Var *) node;
+
+				APP_JUMB(var->varno);
+				APP_JUMB(var->varattno);
+				APP_JUMB(var->varlevelsup);
+			}
+			break;
+		case T_Const:
+			{
+				Const	   *c = (Const *) node;
+
+				/* We jumble only the constant's type, not its value */
+				APP_JUMB(c->consttype);
+				/* Also, record its parse location for query normalization */
+				RecordConstLocation(jstate, c->location);
+			}
+			break;
+		case T_Param:
+			{
+				Param	   *p = (Param *) node;
+
+				APP_JUMB(p->paramkind);
+				APP_JUMB(p->paramid);
+				APP_JUMB(p->paramtype);
+				/* Also, track the highest external Param id */
+				if (p->paramkind == PARAM_EXTERN &&
+					p->paramid > jstate->highest_extern_param_id)
+					jstate->highest_extern_param_id = p->paramid;
+			}
+			break;
+		case T_Aggref:
+			{
+				Aggref	   *expr = (Aggref *) node;
+
+				APP_JUMB(expr->aggfnoid);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggorder);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_GroupingFunc:
+			{
+				GroupingFunc *grpnode = (GroupingFunc *) node;
+
+				JumbleExpr(jstate, (Node *) grpnode->refs);
+			}
+			break;
+		case T_WindowFunc:
+			{
+				WindowFunc *expr = (WindowFunc *) node;
+
+				APP_JUMB(expr->winfnoid);
+				APP_JUMB(expr->winref);
+				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->aggfilter);
+			}
+			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+			}
+			break;
+		case T_FuncExpr:
+			{
+				FuncExpr   *expr = (FuncExpr *) node;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_NamedArgExpr:
+			{
+				NamedArgExpr *nae = (NamedArgExpr *) node;
+
+				APP_JUMB(nae->argnumber);
+				JumbleExpr(jstate, (Node *) nae->arg);
+			}
+			break;
+		case T_OpExpr:
+		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
+		case T_NullIfExpr:		/* struct-equivalent to OpExpr */
+			{
+				OpExpr	   *expr = (OpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_ScalarArrayOpExpr:
+			{
+				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
+
+				APP_JUMB(expr->opno);
+				APP_JUMB(expr->useOr);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_BoolExpr:
+			{
+				BoolExpr   *expr = (BoolExpr *) node;
+
+				APP_JUMB(expr->boolop);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				APP_JUMB(sublink->subLinkType);
+				APP_JUMB(sublink->subLinkId);
+				JumbleExpr(jstate, (Node *) sublink->testexpr);
+				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+			}
+			break;
+		case T_FieldSelect:
+			{
+				FieldSelect *fs = (FieldSelect *) node;
+
+				APP_JUMB(fs->fieldnum);
+				JumbleExpr(jstate, (Node *) fs->arg);
+			}
+			break;
+		case T_FieldStore:
+			{
+				FieldStore *fstore = (FieldStore *) node;
+
+				JumbleExpr(jstate, (Node *) fstore->arg);
+				JumbleExpr(jstate, (Node *) fstore->newvals);
+			}
+			break;
+		case T_RelabelType:
+			{
+				RelabelType *rt = (RelabelType *) node;
+
+				APP_JUMB(rt->resulttype);
+				JumbleExpr(jstate, (Node *) rt->arg);
+			}
+			break;
+		case T_CoerceViaIO:
+			{
+				CoerceViaIO *cio = (CoerceViaIO *) node;
+
+				APP_JUMB(cio->resulttype);
+				JumbleExpr(jstate, (Node *) cio->arg);
+			}
+			break;
+		case T_ArrayCoerceExpr:
+			{
+				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
+
+				APP_JUMB(acexpr->resulttype);
+				JumbleExpr(jstate, (Node *) acexpr->arg);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+			}
+			break;
+		case T_ConvertRowtypeExpr:
+			{
+				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
+
+				APP_JUMB(crexpr->resulttype);
+				JumbleExpr(jstate, (Node *) crexpr->arg);
+			}
+			break;
+		case T_CollateExpr:
+			{
+				CollateExpr *ce = (CollateExpr *) node;
+
+				APP_JUMB(ce->collOid);
+				JumbleExpr(jstate, (Node *) ce->arg);
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = lfirst_node(CaseWhen, temp);
+
+					JumbleExpr(jstate, (Node *) when->expr);
+					JumbleExpr(jstate, (Node *) when->result);
+				}
+				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				CaseTestExpr *ct = (CaseTestExpr *) node;
+
+				APP_JUMB(ct->typeId);
+			}
+			break;
+		case T_ArrayExpr:
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			break;
+		case T_RowExpr:
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			break;
+		case T_RowCompareExpr:
+			{
+				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
+
+				APP_JUMB(rcexpr->rctype);
+				JumbleExpr(jstate, (Node *) rcexpr->largs);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+			}
+			break;
+		case T_CoalesceExpr:
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			break;
+		case T_MinMaxExpr:
+			{
+				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+				APP_JUMB(mmexpr->op);
+				JumbleExpr(jstate, (Node *) mmexpr->args);
+			}
+			break;
+		case T_SQLValueFunction:
+			{
+				SQLValueFunction *svf = (SQLValueFunction *) node;
+
+				APP_JUMB(svf->op);
+				/* type is fully determined by op */
+				APP_JUMB(svf->typmod);
+			}
+			break;
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				APP_JUMB(xexpr->op);
+				JumbleExpr(jstate, (Node *) xexpr->named_args);
+				JumbleExpr(jstate, (Node *) xexpr->args);
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest   *nt = (NullTest *) node;
+
+				APP_JUMB(nt->nulltesttype);
+				JumbleExpr(jstate, (Node *) nt->arg);
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *bt = (BooleanTest *) node;
+
+				APP_JUMB(bt->booltesttype);
+				JumbleExpr(jstate, (Node *) bt->arg);
+			}
+			break;
+		case T_CoerceToDomain:
+			{
+				CoerceToDomain *cd = (CoerceToDomain *) node;
+
+				APP_JUMB(cd->resulttype);
+				JumbleExpr(jstate, (Node *) cd->arg);
+			}
+			break;
+		case T_CoerceToDomainValue:
+			{
+				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
+
+				APP_JUMB(cdv->typeId);
+			}
+			break;
+		case T_SetToDefault:
+			{
+				SetToDefault *sd = (SetToDefault *) node;
+
+				APP_JUMB(sd->typeId);
+			}
+			break;
+		case T_CurrentOfExpr:
+			{
+				CurrentOfExpr *ce = (CurrentOfExpr *) node;
+
+				APP_JUMB(ce->cvarno);
+				if (ce->cursor_name)
+					APP_JUMB_STRING(ce->cursor_name);
+				APP_JUMB(ce->cursor_param);
+			}
+			break;
+		case T_NextValueExpr:
+			{
+				NextValueExpr *nve = (NextValueExpr *) node;
+
+				APP_JUMB(nve->seqid);
+				APP_JUMB(nve->typeId);
+			}
+			break;
+		case T_InferenceElem:
+			{
+				InferenceElem *ie = (InferenceElem *) node;
+
+				APP_JUMB(ie->infercollid);
+				APP_JUMB(ie->inferopclass);
+				JumbleExpr(jstate, ie->expr);
+			}
+			break;
+		case T_TargetEntry:
+			{
+				TargetEntry *tle = (TargetEntry *) node;
+
+				APP_JUMB(tle->resno);
+				APP_JUMB(tle->ressortgroupref);
+				JumbleExpr(jstate, (Node *) tle->expr);
+			}
+			break;
+		case T_RangeTblRef:
+			{
+				RangeTblRef *rtr = (RangeTblRef *) node;
+
+				APP_JUMB(rtr->rtindex);
+			}
+			break;
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				APP_JUMB(join->jointype);
+				APP_JUMB(join->isNatural);
+				APP_JUMB(join->rtindex);
+				JumbleExpr(jstate, join->larg);
+				JumbleExpr(jstate, join->rarg);
+				JumbleExpr(jstate, join->quals);
+			}
+			break;
+		case T_FromExpr:
+			{
+				FromExpr   *from = (FromExpr *) node;
+
+				JumbleExpr(jstate, (Node *) from->fromlist);
+				JumbleExpr(jstate, from->quals);
+			}
+			break;
+		case T_OnConflictExpr:
+			{
+				OnConflictExpr *conf = (OnConflictExpr *) node;
+
+				APP_JUMB(conf->action);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems);
+				JumbleExpr(jstate, conf->arbiterWhere);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet);
+				JumbleExpr(jstate, conf->onConflictWhere);
+				APP_JUMB(conf->constraint);
+				APP_JUMB(conf->exclRelIndex);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+			}
+			break;
+		case T_List:
+			foreach(temp, (List *) node)
+			{
+				JumbleExpr(jstate, (Node *) lfirst(temp));
+			}
+			break;
+		case T_IntList:
+			foreach(temp, (List *) node)
+			{
+				APP_JUMB(lfirst_int(temp));
+			}
+			break;
+		case T_SortGroupClause:
+			{
+				SortGroupClause *sgc = (SortGroupClause *) node;
+
+				APP_JUMB(sgc->tleSortGroupRef);
+				APP_JUMB(sgc->eqop);
+				APP_JUMB(sgc->sortop);
+				APP_JUMB(sgc->nulls_first);
+			}
+			break;
+		case T_GroupingSet:
+			{
+				GroupingSet *gsnode = (GroupingSet *) node;
+
+				JumbleExpr(jstate, (Node *) gsnode->content);
+			}
+			break;
+		case T_WindowClause:
+			{
+				WindowClause *wc = (WindowClause *) node;
+
+				APP_JUMB(wc->winref);
+				APP_JUMB(wc->frameOptions);
+				JumbleExpr(jstate, (Node *) wc->partitionClause);
+				JumbleExpr(jstate, (Node *) wc->orderClause);
+				JumbleExpr(jstate, wc->startOffset);
+				JumbleExpr(jstate, wc->endOffset);
+			}
+			break;
+		case T_CommonTableExpr:
+			{
+				CommonTableExpr *cte = (CommonTableExpr *) node;
+
+				/* we store the string name because RTE_CTE RTEs need it */
+				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
+				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+			}
+			break;
+		case T_SetOperationStmt:
+			{
+				SetOperationStmt *setop = (SetOperationStmt *) node;
+
+				APP_JUMB(setop->op);
+				APP_JUMB(setop->all);
+				JumbleExpr(jstate, setop->larg);
+				JumbleExpr(jstate, setop->rarg);
+			}
+			break;
+		case T_RangeTblFunction:
+			{
+				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
+
+				JumbleExpr(jstate, rtfunc->funcexpr);
+			}
+			break;
+		case T_TableFunc:
+			{
+				TableFunc  *tablefunc = (TableFunc *) node;
+
+				JumbleExpr(jstate, tablefunc->docexpr);
+				JumbleExpr(jstate, tablefunc->rowexpr);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+			}
+			break;
+		case T_TableSampleClause:
+			{
+				TableSampleClause *tsc = (TableSampleClause *) node;
+
+				APP_JUMB(tsc->tsmhandler);
+				JumbleExpr(jstate, (Node *) tsc->args);
+				JumbleExpr(jstate, (Node *) tsc->repeatable);
+			}
+			break;
+		default:
+			/* Only a warning, since we can stumble along anyway */
+			elog(WARNING, "unrecognized node type: %d",
+				 (int) nodeTag(node));
+			break;
+	}
+}
+
+/*
+ * Record location of constant within query string of query tree
+ * that is currently being walked.
+ */
+static void
+RecordConstLocation(JumbleState *jstate, int location)
+{
+	/* -1 indicates unknown or undefined location */
+	if (location >= 0)
+	{
+		/* enlarge array if needed */
+		if (jstate->clocations_count >= jstate->clocations_buf_size)
+		{
+			jstate->clocations_buf_size *= 2;
+			jstate->clocations = (LocationLen *)
+				repalloc(jstate->clocations,
+						 jstate->clocations_buf_size *
+						 sizeof(LocationLen));
+		}
+		jstate->clocations[jstate->clocations_count].location = location;
+		/* initialize lengths to -1 to simplify third-party module usage */
+		jstate->clocations[jstate->clocations_count].length = -1;
+		jstate->clocations_count++;
+	}
+}
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 4a3c9686f9..6716db6c13 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -15,10 +15,12 @@
 #define ANALYZE_H
 
 #include "parser/parse_node.h"
+#include "utils/queryjumble.h"
 
 /* Hook for plugins to get control at end of parse analysis */
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
-											  Query *query);
+											  Query *query,
+											  JumbleState *jstate);
 extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
 
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 5004ee4177..9b6552b25b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -248,6 +248,7 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
+extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
new file mode 100644
index 0000000000..83ba7339fa
--- /dev/null
+++ b/src/include/utils/queryjumble.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * queryjumble.h
+ *	  Query normalization and fingerprinting.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/queryjumble.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERYJUMBLE_H
+#define QUERYJUMBLE_H
+
+#include "nodes/parsenodes.h"
+
+#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+
+/*
+ * Struct for tracking locations/lengths of constants during normalization
+ */
+typedef struct LocationLen
+{
+	int			location;		/* start offset in query text */
+	int			length;			/* length in bytes, or -1 to ignore */
+} LocationLen;
+
+/*
+ * Working state for computing a query jumble and producing a normalized
+ * query string
+ */
+typedef struct JumbleState
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of locations of constants that should be removed */
+	LocationLen *clocations;
+
+	/* Allocated length of clocations array */
+	int			clocations_buf_size;
+
+	/* Current number of valid entries in clocations array */
+	int			clocations_count;
+
+	/* highest Param id we've seen, in order to start normalization correctly */
+	int			highest_extern_param_id;
+} JumbleState;
+
+const char *CleanQuerytext(const char *query, int *location, int *len);
+JumbleState *JumbleQuery(Query *query, const char *querytext);
+
+#endif							/* QUERYJUMBLE_H */
-- 
2.30.1
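
The LocationLen array and highest_extern_param_id exposed by queryjumble.h above are the pieces a consumer such as pg_stat_statements needs in order to replace constants with $n placeholders. The following is only a minimal standalone sketch of that idea, not the code from the patch: normalize_sketch() and its parameters are hypothetical names, the entries are assumed to be sorted by location, and the first placeholder number is assumed to be highest_extern_param_id + 1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as LocationLen in queryjumble.h */
typedef struct LocationLen
{
	int			location;		/* start offset in query text */
	int			length;			/* length in bytes, or -1 to ignore */
} LocationLen;

/*
 * Hypothetical helper: copy "query" into "out", replacing each recorded
 * constant with $N, starting at first_param.  Entries must be sorted by
 * location; entries with a negative length are skipped.  The output is
 * always NUL-terminated and is cut short if "out" is too small.
 */
static void
normalize_sketch(const char *query, const LocationLen *locs, int nlocs,
				 int first_param, char *out, size_t outsz)
{
	size_t		used = 0;
	int			pos = 0;		/* next unread offset in query */
	int			param = first_param;
	int			qlen = (int) strlen(query);

	assert(outsz > 0);
	out[0] = '\0';

	for (int i = 0; i < nlocs; i++)
	{
		int			n;

		if (locs[i].length < 0)
			continue;			/* marked as "ignore" */
		assert(locs[i].location >= pos);

		/* text preceding the constant, then the placeholder */
		n = snprintf(out + used, outsz - used, "%.*s$%d",
					 locs[i].location - pos, query + pos, param++);
		if (n < 0 || (size_t) n >= outsz - used)
			return;				/* output buffer exhausted */
		used += (size_t) n;
		pos = locs[i].location + locs[i].length;
	}

	/* trailing text after the last constant */
	snprintf(out + used, outsz - used, "%.*s", qlen - pos, query + pos);
}
```

Called with the query text SELECT * FROM t WHERE a = 42 AND b = 'x', entries {26, 2} and {37, 3}, and first_param = 1, the sketch yields SELECT * FROM t WHERE a = $1 AND b = $2.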

Attachment: v24-0002-Expose-queryid-in-pg_stat_activity-and-log_line_.patch (text/x-diff; charset=us-ascii)

From 6d1949524feb11febe70a3c36d94dc2f71653450 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:23 -0400
Subject: [PATCH v24 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

As with other fields in pg_stat_activity, only the queryid of top-level
statements is exposed, and if the backend's state isn't active then the
queryid of the last executed statement is displayed.

Also add a %Q placeholder to include the queryid in log_line_prefix, which
likewise only exposes top-level statements.
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++++++-----------
 doc/src/sgml/config.sgml                      |  29 +++--
 doc/src/sgml/monitoring.sgml                  |  16 +++
 src/backend/catalog/system_views.sql          |   1 +
 src/backend/executor/execMain.c               |   9 ++
 src/backend/executor/execParallel.c           |   5 +-
 src/backend/parser/analyze.c                  |   5 +
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/activity/backend_status.c   |  68 +++++++++++
 src/backend/utils/adt/pgstatfuncs.c           |   7 +-
 src/backend/utils/error/elog.c                |   8 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c          |  27 ++---
 src/include/catalog/pg_proc.dat               |   6 +-
 src/include/utils/backend_status.h            |   5 +
 src/test/regress/expected/rules.out           |   9 +-
 16 files changed, 213 insertions(+), 100 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 0f8bac0cca..52cba86196 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -67,6 +67,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -101,6 +102,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * handle, i.e. everything except EXECUTE, PREPARE and DEALLOCATE.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+									!IsA(n, PrepareStmt) && \
+									!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -309,7 +318,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 								ProcessUtilityContext context, ParamListInfo params,
 								QueryEnvironment *queryEnv,
 								DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					   int query_location, int query_len,
 					   pgssStoreKind kind,
@@ -806,16 +814,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for utility statements related to prepared statements,
+	 * as those inherit the queryId of the underlying statement (except
+	 * DEALLOCATE, which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1057,6 +1063,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then the
+	 * executor, and we don't want the executor hooks to do anything, since we
+	 * are already measuring the statement's costs at the utility level.
+	 *
+	 * Note that this is only done if pg_stat_statements is enabled and
+	 * configured to track utility statements, in the unlikely case that
+	 * the user configured another extension to handle utility statements
+	 * only.
+	 */
+	if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+		pstmt->queryId = UINT64CONST(0);
 
 	/*
 	 * If it's an EXECUTE statement, we don't track it and don't increment the
@@ -1073,9 +1096,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	 * Likewise, we don't track execution of DEALLOCATE.
 	 */
 	if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-		!IsA(parsetree, ExecuteStmt) &&
-		!IsA(parsetree, PrepareStmt) &&
-		!IsA(parsetree, DeallocateStmt))
+		PGSS_HANDLED_UTILITY(parsetree))
 	{
 		instr_time	start;
 		instr_time	duration;
@@ -1130,7 +1151,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
 		pgss_store(queryString,
-				   0,			/* signal that it's a utility stmt */
+				   saved_queryId,
 				   pstmt->stmt_location,
 				   pstmt->stmt_len,
 				   PGSS_EXEC,
@@ -1153,23 +1174,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	}
 }
 
-/*
- * Given an arbitrarily long query string, produce a hash for the purposes of
- * identifying the query, without normalizing constants.  Used when hashing
- * utility statements.
- */
-static uint64
-pgss_hash_string(const char *str, int len)
-{
-	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
-											len, 0));
-}
-
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement and we should compute
- * a suitable queryId internally.
+ * If queryId is 0 then no query identifier was computed during parse
+ * analysis (neither in core nor by another module); in that case there
+ * is nothing to store and we return early.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
@@ -1200,52 +1210,18 @@ pgss_store(const char *query, uint64 queryId,
 		return;
 
 	/*
-	 * Confine our attention to the relevant part of the string, if the query
-	 * is a portion of a multi-statement source string.
-	 *
-	 * First apply starting offset, unless it's -1 (unknown).
-	 */
-	if (query_location >= 0)
-	{
-		Assert(query_location <= strlen(query));
-		query += query_location;
-		/* Length of 0 (or -1) means "rest of string" */
-		if (query_len <= 0)
-			query_len = strlen(query);
-		else
-			Assert(query_len <= strlen(query));
-	}
-	else
-	{
-		/* If query location is unknown, distrust query_len as well */
-		query_location = 0;
-		query_len = strlen(query);
-	}
-
-	/*
-	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
-	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 * Nothing to do if compute_query_id isn't enabled and no other module
+	 * computed a query identifier.
 	 */
-	while (query_len > 0 && scanner_isspace(query[0]))
-		query++, query_location++, query_len--;
-	while (query_len > 0 && scanner_isspace(query[query_len - 1]))
-		query_len--;
+	if (queryId == UINT64CONST(0))
+		return;
 
 	/*
-	 * For utility statements, we just hash the query string to get an ID.
+	 * Confine our attention to the relevant part of the string, if the query
+	 * is a portion of a multi-statement source string, and update query
+	 * location and length if needed.
 	 */
-	if (queryId == UINT64CONST(0))
-	{
-		queryId = pgss_hash_string(query, query_len);
-
-		/*
-		 * If we are unlucky enough to get a hash of zero(invalid), use
-		 * queryID as 2 instead, queryID 1 is already in use for normal
-		 * statements.
-		 */
-		if (queryId == UINT64CONST(0))
-			queryId = UINT64CONST(2);
-	}
+	query = CleanQuerytext(query, &query_location, &query_len);
 
 	/* Set up key for hashtable search */
 	key.userid = GetUserId();
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b28f7000c1..5f9eddb197 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6999,6 +6999,15 @@ local0.*    /var/log/postgresql
              session processes</entry>
              <entry>no</entry>
             </row>
+            <row>
+             <entry><literal>%Q</literal></entry>
+             <entry>query identifier of the current query.  Query
+             identifiers are not computed by default, so this field
+             will be zero unless the <xref linkend="guc-compute-query-id"/>
+             parameter is enabled or a third-party module that computes
+             query identifiers is configured.</entry>
+             <entry>yes</entry>
+            </row>
             <row>
              <entry><literal>%%</literal></entry>
              <entry>Literal <literal>%</literal></entry>
@@ -7475,8 +7484,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       <listitem>
        <para>
         Enables the collection of information on the currently
-        executing command of each session, along with the time when
-        that command began execution. This parameter is on by
+        executing command of each session, along with its identifier and the
+        time when that command began execution. This parameter is on by
         default. Note that even when enabled, this information is not
         visible to all users, only to superusers and the user owning
         the session being reported on, so it should not represent a
@@ -7625,12 +7634,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </term>
       <listitem>
        <para>
-        Enables in-core computation of a query identifier.  The <xref
-        linkend="pgstatstatements"/> extension requires a query identifier
-        to be computed.  Note that an external module can alternatively
-        be used if the in-core query identifier computation method
-        isn't acceptable.  In this case, in-core computation should
-        remain disabled.  The default is <literal>off</literal>.
+        Enables in-core computation of a query identifier.
+        Query identifiers can be displayed in the <link
+        linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+        view, or emitted in the log if configured via the <xref
+        linkend="guc-log-line-prefix"/> parameter.  The <xref
+        linkend="pgstatstatements"/> extension also requires a query
+        identifier to be computed.  Note that an external module can
+        alternatively be used if the in-core query identifier computation
+        specification isn't acceptable.  In this case, in-core computation
+        must be disabled.  The default is <literal>off</literal>.
        </para>
        <note>
         <para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 56018745c8..52958b4fd9 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -910,6 +910,22 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </para></entry>
      </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>queryid</structfield> <type>bigint</type>
+     </para>
+     <para>
+      Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this
+      field shows the identifier of the currently executing query. In
+      all other states, it shows the identifier of the last query that was
+      executed.  Query identifiers are not computed by default so this
+      field will be null unless the <xref linkend="guc-compute-query-id"/>
+      parameter is enabled or a third-party module that computes query
+      identifiers is configured.
+     </para></entry>
+    </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5f2541d316..4d6b232787 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -833,6 +833,7 @@ CREATE VIEW pg_stat_activity AS
             S.state,
             S.backend_xid,
             s.backend_xmin,
+            S.queryid,
             S.query,
             S.backend_type
     FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 163242f54e..db49d657f6 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -58,6 +58,7 @@
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
 #include "utils/acl.h"
+#include "utils/backend_status.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
@@ -128,6 +129,14 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 void
 ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/*
+	 * In some cases (e.g. an EXECUTE statement) a query execution will skip
+	 * parse analysis, which means that the queryid won't be reported.  Note
+	 * that it's harmless to report the queryid multiple times, as the call will
+	 * be ignored if the top level queryid has already been reported.
+	 */
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+
 	if (ExecutorStart_hook)
 		(*ExecutorStart_hook) (queryDesc, eflags);
 	else
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 366d0b20b9..c7a2f31473 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -175,7 +175,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = UINT64CONST(0);
+	pstmt->queryId = pgstat_get_my_queryid();
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
@@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;
 
-	/* Report workers' query for monitoring purposes */
+	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 35cb9ebfd7..b082096b90 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -45,6 +45,7 @@
 #include "parser/parse_type.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/backend_status.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/queryjumble.h"
@@ -130,6 +131,8 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
@@ -167,6 +170,8 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	return query;
 }
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3a62e45bef..d0c1dc9ef2 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -695,6 +695,8 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	free_parsestate(pstate);
 
+	pgstat_report_queryid(query->queryId, false);
+
 	if (log_parser_stats)
 		ShowUsage("PARSE ANALYSIS STATISTICS");
 
@@ -913,6 +915,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
 			stmt->utilityStmt = query->utilityStmt;
 			stmt->stmt_location = query->stmt_location;
 			stmt->stmt_len = query->stmt_len;
+			stmt->queryId = query->queryId;
 		}
 		else
 		{
@@ -1029,6 +1032,8 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		pgstat_report_queryid(0, true);
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index a25ec0ee3c..6110113e56 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -544,6 +544,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
+			beentry->st_queryid = UINT64CONST(0);
 			proc->wait_event_info = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
@@ -598,6 +599,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	beentry->st_state = state;
 	beentry->st_state_start_timestamp = current_timestamp;
 
+	/*
+	 * If a new query is started, we reset the query identifier as it'll only
+	 * be known after parse analysis, to avoid reporting last query's
+	 * identifier.
+	 */
+	if (state == STATE_RUNNING)
+		beentry->st_queryid = UINT64CONST(0);
+
 	if (cmd_str != NULL)
 	{
 		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
@@ -608,6 +617,46 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --------
+ * pgstat_report_queryid() -
+ *
+ * Called to update top-level query identifier.
+ * --------
+ */
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	/*
+	 * if track_activities is disabled, st_queryid should already have been
+	 * reset
+	 */
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	/*
+	 * We only report the top-level query identifiers.  The stored queryid is
+	 * reset when a backend calls pgstat_report_activity(STATE_RUNNING), or
+	 * with an explicit call to this function using the force flag.  If the
+	 * saved query identifier is not zero it means that it's not a top-level
+	 * command, so ignore the one provided unless it's an explicit call to
+	 * reset the identifier.
+	 */
+	if (beentry->st_queryid != 0 && !force)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	beentry->st_queryid = queryId;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
+
 /* ----------
  * pgstat_report_appname() -
  *
@@ -972,6 +1021,25 @@ pgstat_get_crashed_backend_activity(int pid, char *buffer, int buflen)
 	return NULL;
 }
 
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ * Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+	if (!MyBEEntry)
+		return 0;
+
+	/* There's no need for a lock around pgstat_begin_read_activity /
+	 * pgstat_end_read_activity here as it's only called from
+	 * pg_stat_get_activity which is already protected, or from the same
+	 * backend which means that there won't be concurrent writes.
+	 */
+	return MyBEEntry->st_queryid;
+}
+
 
 /* ----------
  * pgstat_fetch_stat_beentry() -
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9ffbca685c..9fa4a93162 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -569,7 +569,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	29
+#define PG_STAT_GET_ACTIVITY_COLS	30
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -914,6 +914,10 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				values[27] = BoolGetDatum(false);	/* GSS Encryption not in
 													 * use */
 			}
+			if (beentry->st_queryid == 0)
+				nulls[29] = true;
+			else
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
@@ -941,6 +945,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[26] = true;
 			nulls[27] = true;
 			nulls[28] = true;
+			nulls[29] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 12de4b38cb..1cf71a649b 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2714,6 +2714,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfoString(buf, unpack_sql_state(edata->sqlerrcode));
 				break;
+			case 'Q':
+				if (padding != 0)
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
+				else
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
+				break;
 			default:
 				/* format error - ignore it */
 				break;
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 192577a02e..65f6186966 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -543,6 +543,7 @@
 					#   %t = timestamp without milliseconds
 					#   %m = timestamp with milliseconds
 					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %Q = query ID (0 if none or not computed)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index 2a47688fd6..53286bb333 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,7 +39,7 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
-static uint64 compute_utility_queryid(const char *str, int query_len);
+static uint64 compute_utility_queryid(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
@@ -97,17 +97,9 @@ JumbleQuery(Query *query, const char *querytext)
 	JumbleState *jstate = NULL;
 	if (query->utilityStmt)
 	{
-		const char *sql;
-		int query_location = query->stmt_location;
-		int query_len = query->stmt_len;
-
-		/*
-		 * Confine our attention to the relevant part of the string, if the
-		 * query is a portion of a multi-statement source string.
-		 */
-		sql = CleanQuerytext(querytext, &query_location, &query_len);
-
-		query->queryId = compute_utility_queryid(sql, query_len);
+		query->queryId = compute_utility_queryid(querytext,
+												 query->stmt_location,
+												 query->stmt_len);
 	}
 	else
 	{
@@ -143,11 +135,18 @@ JumbleQuery(Query *query, const char *querytext)
  * Compute a query identifier for the given utility query string.
  */
 static uint64
-compute_utility_queryid(const char *str, int query_len)
+compute_utility_queryid(const char *query_text, int query_location, int query_len)
 {
 	uint64 queryId;
+	const char *sql;
+
+	/*
+	 * Confine our attention to the relevant part of the string, if the
+	 * query is a portion of a multi-statement source string.
+	 */
+	sql = CleanQuerytext(query_text, &query_location, &query_len);
 
-	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+	queryId = DatumGetUInt64(hash_any_extended((const unsigned char *) sql,
 											   query_len, 0));
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 69ffd0c3f4..ab30558e3f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5263,9 +5263,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 3fd7370d41..8e149b56ca 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -165,6 +165,9 @@ typedef struct PgBackendStatus
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+
+	/* query identifier, optionally computed using post_parse_analyze_hook */
+	uint64		st_queryid;
 } PgBackendStatus;
 
 
@@ -294,12 +297,14 @@ extern void pgstat_clear_backend_activity_snapshot(void);
 
 /* Activity reporting functions */
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_queryid(uint64 queryId, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
+extern uint64 pgstat_get_my_queryid(void);
 
 
 /* ----------
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b59a7b4a5..264deda7af 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1762,9 +1762,10 @@ pg_stat_activity| SELECT s.datid,
     s.state,
     s.backend_xid,
     s.backend_xmin,
+    s.queryid,
     s.query,
     s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1876,7 +1877,7 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
@@ -2046,7 +2047,7 @@ pg_stat_replication| SELECT s.pid,
     w.sync_priority,
     w.sync_state,
     w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
@@ -2076,7 +2077,7 @@ pg_stat_ssl| SELECT s.pid,
     s.ssl_client_dn AS client_dn,
     s.ssl_client_serial AS client_serial,
     s.ssl_issuer_dn AS issuer_dn
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
     su.subname,
-- 
2.30.1

v24-0003-Expose-query-identifier-in-verbose-explain.patch (text/x-diff; charset=us-ascii)

From 6e2fe1c4ef044e3cee8e0f7bbc2501accdadb1ee Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Mon, 22 Mar 2021 17:43:24 -0400
Subject: [PATCH v24 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_query_id or
using a third-party module, verbose explain will display it.
---
 doc/src/sgml/config.sgml              |  6 +++---
 doc/src/sgml/ref/explain.sgml         |  6 ++++--
 src/backend/commands/explain.c        | 18 ++++++++++++++++++
 src/test/regress/expected/explain.out | 11 ++++++++++-
 src/test/regress/sql/explain.sql      |  5 ++++-
 5 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5f9eddb197..71b47729b8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7637,9 +7637,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Enables in-core computation of a query identifier.
         Query identifiers can be displayed in the <link
         linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
-        view, or emitted in the log if configured via the <xref
-        linkend="guc-log-line-prefix"/> parameter.  The <xref
-        linkend="pgstatstatements"/> extension also requires a query
+        view, using <command>EXPLAIN</command>, or emitted in the log if
+        configured via the <xref linkend="guc-log-line-prefix"/> parameter.
+        The <xref linkend="pgstatstatements"/> extension also requires a query
         identifier to be computed.  Note that an external module can
         alternatively be used if the in-core query identifier computation
         specification isn't acceptable.  In this case, in-core computation
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..4d758fb237 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
       the output column list for each node in the plan tree, schema-qualify
       table and function names, always label variables in expressions with
       their range table alias, and always print the name of each trigger for
-      which statistics are displayed.  This parameter defaults to
-      <literal>FALSE</literal>.
+      which statistics are displayed.  The query identifier will also be
+      displayed if one has been computed, see <xref
+      linkend="guc-compute-query-id"/> for more details.  This parameter
+      defaults to <literal>FALSE</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ede8cec947..b62a76e7e5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -165,6 +166,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -241,6 +244,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_query_id)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -600,6 +610,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index b89b99fb02..4c578d4f5e 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -17,7 +17,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -477,3 +477,12 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: N
+(3 rows)
+
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index f2eab030d6..468caf4037 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -19,7 +19,7 @@ begin
     for ln in execute $1
     loop
         -- Replace any numeric word with just 'N'
-        ln := regexp_replace(ln, '\m\d+\M', 'N', 'g');
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
         -- In sort output, the above won't match units-suffixed numbers
         ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
         -- Ignore text-mode buffers output because it varies depending
@@ -103,3 +103,6 @@ select jsonb_pretty(
 );
 
 rollback;
+
+set compute_query_id = on;
+select explain_filter('explain (verbose) select 1');
-- 
2.30.1

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#174)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 7, 2021 at 08:57:26PM +0800, Julien Rouhaud wrote:

On Wed, Apr 07, 2021 at 06:15:27PM +0530, Nitin Jadhav wrote:

I feel we should merge both of the conditions as it is done in
pgstat_report_xact_timestamp(). Probably we can write a common comment to
explain both the conditions.

[...]

Thanks for the explanation. Please add a comment explaining why there is no
loop.

PFA v24.

Patch applied. I am ready to adjust this with any improvements people
might have. Thank you for all the good feedback we got on this, and I
know many users have waited a long time for this feature.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

pryzby@telsasoft.com

almost 5 years ago

In reply to: Bruce Momjian (#175)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 07, 2021 at 02:12:11PM -0400, Bruce Momjian wrote:

Patch applied. I am ready to adjust this with any improvements people
might have. Thank you for all the good feedback we got on this, and I
know many users have waited a long time for this feature.

If you support log_line_prefix 'Q', then you should also add the query ID to write_csvlog().

--
Justin

tgl@sss.pgh.pa.us

almost 5 years ago

In reply to: Bruce Momjian (#175)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Bruce Momjian <bruce@momjian.us> writes:

Patch applied. I am ready to adjust this with any improvements people
might have. Thank you for all the good feedback we got on this, and I
know many users have waited a long time for this feature.

For starters, you could try to make the buildfarm green again.

regards, tom lane

bruce@momjian.us

almost 5 years ago

In reply to: Tom Lane (#177)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 7, 2021 at 04:15:50PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Patch applied. I am ready to adjust this with any improvements people
might have. Thank you for all the good feedback we got on this, and I
know many users have waited a long time for this feature.

For starters, you could try to make the buildfarm green again.

Wow, that's odd. The cfbot was green, so I never even looked at the
buildfarm. I will look at that now, and the CSV log issue.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#178)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 07, 2021 at 04:22:55PM -0400, Bruce Momjian wrote:

On Wed, Apr 7, 2021 at 04:15:50PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Patch applied. I am ready to adjust this with any improvements people
might have. Thank you for all the good feedback we got on this, and I
know many users have waited a long time for this feature.

For starters, you could try to make the buildfarm green again.

Wow, that's odd. The cfbot was green, so I never even looked at the
buildfarm. I will look at that now, and the CSV log issue.

Sorry about that. The issue came from animals with jit_above_cost = 0
outputting more lines than expected. I fixed that by using the same query as
before in explain.sql, as they don't generate any JIT output.

I also added the queryid to the csvlog output and fixed the documentation that
mentions how to create a table to access the data.

Attachments:

v1-fix_pgss.diff (text/plain; charset=us-ascii)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 963824d050..a9d0d63a57 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7341,6 +7341,7 @@ CREATE TABLE postgres_log
   application_name text,
   backend_type text,
   leader_pid integer,
+  queryid bigint,
   PRIMARY KEY (session_id, session_line_num)
 );
 </programlisting>
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index 2e21806f48..27827146f1 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -266,7 +266,8 @@ CREATE FOREIGN TABLE pglog (
   location text,
   application_name text,
   backend_type text,
-  leader_pid integer
+  leader_pid integer,
+  queryid bigint
 ) SERVER pglog
 OPTIONS ( filename 'log/pglog.csv', format 'csv' );
 </programlisting>
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1cf71a649b..d27fb999d9 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2965,6 +2965,9 @@ write_csvlog(ErrorData *edata)
 			appendStringInfo(&buf, "%d", leader->pid);
 	}
 
+	appendStringInfoChar(&buf, ',');
+	appendStringInfo(&buf, "%zd", pgstat_get_my_queryid());
+
 	appendStringInfoChar(&buf, '\n');
 
 	/* If in the syslogger process, try to write messages direct to file */
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index 4c578d4f5e..cda28098ba 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -478,11 +478,11 @@ select jsonb_pretty(
 
 rollback;
 set compute_query_id = on;
-select explain_filter('explain (verbose) select 1');
-             explain_filter             
-----------------------------------------
- Result  (cost=N.N..N.N rows=N width=N)
-   Output: N
+select explain_filter('explain (verbose) select * from int8_tbl i8');
+                         explain_filter                         
+----------------------------------------------------------------
+ Seq Scan on public.int8_tbl i8  (cost=N.N..N.N rows=N width=N)
+   Output: q1, q2
  Query Identifier: N
 (3 rows)
 
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index 468caf4037..3f9ae9843a 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -105,4 +105,4 @@ select jsonb_pretty(
 rollback;
 
 set compute_query_id = on;
-select explain_filter('explain (verbose) select 1');
+select explain_filter('explain (verbose) select * from int8_tbl i8');

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#179)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 08, 2021 at 05:56:25AM +0800, Julien Rouhaud wrote:

I also added the queryid to the csvlog output and fixed the documentation that
mentions how to create a table to access the data.

Note that I chose to output a 0 queryid if none has been computed rather than
outputting nothing. Let me know if that's not the wanted behavior.
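
A reader of the CSV log then has to treat 0 as "no identifier computed". The sketch below shows one way a consumer might do that. parse_query_id_field is a hypothetical helper written for illustration, not part of PostgreSQL, and it assumes the field has already been split out of the CSV line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical reader-side helper (not a PostgreSQL function).
 * Parses the query id field of a CSV log line as a signed 64-bit value.
 * Returns true only if the field is a well-formed number other than 0;
 * 0 is taken to mean "no query identifier was computed".
 * Overflow detection via errno is left out to keep the sketch short.
 */
bool
parse_query_id_field(const char *field, long long *query_id)
{
    char *end;

    assert(field != NULL && query_id != NULL);

    *query_id = strtoll(field, &end, 10);

    /* Reject an empty field or trailing garbage. */
    if (end == field || *end != '\0')
        return false;

    return *query_id != 0;
}
```

Accepting negative values matters here, because the identifier is printed as a signed number.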

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#179)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 8, 2021 at 05:56:25AM +0800, Julien Rouhaud wrote:

On Wed, Apr 07, 2021 at 04:22:55PM -0400, Bruce Momjian wrote:

On Wed, Apr 7, 2021 at 04:15:50PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Patch applied. I am ready to adjust this with any improvements people
might have. Thank you for all the good feedback we got on this, and I
know many users have waited a long time for this feature.

For starters, you could try to make the buildfarm green again.

Wow, that's odd. The cfbot was green, so I never even looked at the
buildfarm. I will look at that now, and the CSV log issue.

Sorry about that. The issue came from animals with jit_above_cost = 0
outputting more lines than expected. I fixed that by using the same query as
before in explain.sql, as they don't generate any JIT output.

Yes, I just came to the same conclusion, that 'SELECT 1' didn't generate
the proper output lines to allow explain_filter() to strip out the JIT
lines. I have applied your patch for this, which should fix the build
farm. (I see my first green report now.)

I also added the queryid to the csvlog output and fixed the documentation that
mentions how to create a table to access the data.

Uh, I think your patch missed a few things. First, you use "%zd"
(size_t) for the printf string, but calls to pgstat_get_my_queryid() in
src/backend/utils/error/elog.c used "%ld". Which is correct? I see
pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
fits in a BIGINT SQL column.

Also, you missed the SGML paragraph doc change, but you correctly
changed the SQL table definition.

I am attaching my version of the patch.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

Attachments:

csv.diff (text/x-diff; charset=us-ascii)

From d6b0d010678bab519cc8a54893ff6c4affd34422 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Wed, 7 Apr 2021 18:37:25 -0400
Subject: [PATCH] csv squash commit

---
 doc/src/sgml/config.sgml       | 4 +++-
 doc/src/sgml/file-fdw.sgml     | 3 ++-
 src/backend/utils/error/elog.c | 4 ++++
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 963824d050..ea5cf3a2dc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7310,7 +7310,8 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
         character count of the error position therein,
         location of the error in the PostgreSQL source code
         (if <varname>log_error_verbosity</varname> is set to <literal>verbose</literal>),
-        application name, backend type, and process ID of parallel group leader.
+        application name, backend type, process ID of parallel group leader,
+        and query id.
         Here is a sample table definition for storing CSV-format log output:
 
 <programlisting>
@@ -7341,6 +7342,7 @@ CREATE TABLE postgres_log
   application_name text,
   backend_type text,
   leader_pid integer,
+  query_id bigint,
   PRIMARY KEY (session_id, session_line_num)
 );
 </programlisting>
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index 2e21806f48..2b531277de 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -266,7 +266,8 @@ CREATE FOREIGN TABLE pglog (
   location text,
   application_name text,
   backend_type text,
-  leader_pid integer
+  leader_pid integer,
+  query_id bigint,
 ) SERVER pglog
 OPTIONS ( filename 'log/pglog.csv', format 'csv' );
 </programlisting>
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1cf71a649b..c3c2045580 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata)
 
 	appendStringInfoChar(&buf, '\n');
 
+	/* query id */
+	appendStringInfo(&buf, "%ld", pgstat_get_my_queryid());
+	appendStringInfoChar(&buf, ',');
+
 	/* If in the syslogger process, try to write messages direct to file */
 	if (MyBackendType == B_LOGGER)
 		write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);
-- 
2.20.1

tgl@sss.pgh.pa.us

almost 5 years ago

In reply to: Bruce Momjian (#181)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Bruce Momjian <bruce@momjian.us> writes:

Uh, I think your patch missed a few things. First, you use "%zd"
(size_t) for the printf string, but calls to pgstat_get_my_queryid() in
src/backend/utils/error/elog.c used "%ld". Which is correct? I see
pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
fits in a BIGINT SQL column.

Neither is correct. Project standard these days for printing [u]int64
is to write "%lld" or "%llu", with an explicit (long long) cast on
the printf argument.

regards, tom lane
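
As an illustration of the convention described above, here is a minimal standalone sketch. format_query_id is a hypothetical helper, not a PostgreSQL function, and it uses plain snprintf where the backend would use appendStringInfo.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a PostgreSQL function).
 * Formats a 64-bit query identifier with "%lld" and an explicit
 * (long long) cast, so the format string matches the argument type
 * regardless of how uint64_t is defined on the platform.
 * Returns the number of characters snprintf would have written.
 */
int
format_query_id(char *buf, size_t buflen, uint64_t query_id)
{
    assert(buf != NULL && buflen > 0);

    return snprintf(buf, buflen, "%lld", (long long) query_id);
}
```

By contrast, "%ld" is only right where long is 64 bits wide, and "%zd" is meant for size-like types, which is why neither is portable for a uint64 value.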

bruce@momjian.us

almost 5 years ago

In reply to: Tom Lane (#182)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 7, 2021 at 07:01:25PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Uh, I think your patch missed a few things. First, you use "%zd"
(size_t) for the printf string, but calls to pgstat_get_my_queryid() in
src/backend/utils/error/elog.c used "%ld". Which is correct? I see
pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
fits in a BIGINT SQL column.

Neither is correct. Project standard these days for printing [u]int64
is to write "%lld" or "%llu", with an explicit (long long) cast on
the printf argument.

Yep, got it. The attached patch fixes all the calls to use %lld, and
adds casts. In implementing csvlog, I noticed that internally we pass
the hash as uint64, but output as int64, which I think is a requirement
for how pg_stat_statements has output it, and the use of bigint. Is
that OK?

I am also confused about the inconsistency of calling the GUC
compute_query_id (with underscore), but pg_stat_activity.queryid. If we
make it pg_stat_activity.query_id, it doesn't match most of the other
*id columns in the table, leader_pid, usesysid, backend_xid. Is that
OK? I know I suggested pg_stat_activity.query_id, but maybe I was wrong.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

Attachments:

csv.diff (text/x-diff; charset=us-ascii)

From 2d2bafb3c4c205ef5899074447f316d157345faf Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Wed, 7 Apr 2021 19:29:09 -0400
Subject: [PATCH] csv squash commit

---
 doc/src/sgml/config.sgml       |  4 +++-
 doc/src/sgml/file-fdw.sgml     |  3 ++-
 src/backend/utils/error/elog.c | 12 ++++++++----
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 963824d050..ea5cf3a2dc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7310,7 +7310,8 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
         character count of the error position therein,
         location of the error in the PostgreSQL source code
         (if <varname>log_error_verbosity</varname> is set to <literal>verbose</literal>),
-        application name, backend type, and process ID of parallel group leader.
+        application name, backend type, process ID of parallel group leader,
+        and query id.
         Here is a sample table definition for storing CSV-format log output:
 
 <programlisting>
@@ -7341,6 +7342,7 @@ CREATE TABLE postgres_log
   application_name text,
   backend_type text,
   leader_pid integer,
+  query_id bigint,
   PRIMARY KEY (session_id, session_line_num)
 );
 </programlisting>
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index 2e21806f48..5b98782064 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -266,7 +266,8 @@ CREATE FOREIGN TABLE pglog (
   location text,
   application_name text,
   backend_type text,
-  leader_pid integer
+  leader_pid integer,
+  query_id bigint
 ) SERVER pglog
 OPTIONS ( filename 'log/pglog.csv', format 'csv' );
 </programlisting>
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1cf71a649b..1ac18d9a55 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2716,11 +2716,11 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				break;
 			case 'Q':
 				if (padding != 0)
-					appendStringInfo(buf, "%*ld", padding,
-							pgstat_get_my_queryid());
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
 				else
-					appendStringInfo(buf, "%ld",
-							pgstat_get_my_queryid());
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
 				break;
 			default:
 				/* format error - ignore it */
@@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata)
 
 	appendStringInfoChar(&buf, '\n');
 
+	/* query id */
+	appendStringInfo(&buf, "%lld", (long long) pgstat_get_my_queryid());
+	appendStringInfoChar(&buf, ',');
+
 	/* If in the syslogger process, try to write messages direct to file */
 	if (MyBackendType == B_LOGGER)
 		write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);
-- 
2.20.1

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#183)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 07, 2021 at 07:38:35PM -0400, Bruce Momjian wrote:

On Wed, Apr 7, 2021 at 07:01:25PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Uh, I think your patch missed a few things. First, you use "%zd"
(size_t) for the printf string, but calls to pgstat_get_my_queryid() in
src/backend/utils/error/elog.c used "%ld". Which is correct? I see
pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
fits in a BIGINT SQL column.

Neither is correct. Project standard these days for printing [u]int64
is to write "%lld" or "%llu", with an explicit (long long) cast on
the printf argument.

Yep, got it. The attached patch fixes all the calls to use %lld, and
adds casts. In implementing csvlog, I noticed that internally we pass
the hash as uint64, but output as int64, which I think is a requirement
for how pg_stat_statements has output it, and the use of bigint. Is
that OK?

Indeed, this is due to how we expose the value in SQL. The original discussion
is at
/messages/by-id/CAH2-WzkueMfAmY3onoXLi+g67SJoKY65Cg9Z1QOhSyhCEU8w3g@mail.gmail.com.
As far as I know this is OK, as we want to show consistent values everywhere.
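
To make the uint64/int64 point concrete, here is a small sketch of the reinterpretation. Both function names are hypothetical and exist only for this example. It assumes the usual two's-complement conversion, which mainstream compilers provide.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not PostgreSQL functions).
 * The hash is carried internally as an unsigned 64-bit value, but SQL
 * has no unsigned 64-bit type, so it is shown as a signed bigint.
 * Hashes with the top bit set therefore appear as negative numbers.
 */
int64_t
query_id_as_bigint(uint64_t query_id)
{
    return (int64_t) query_id;
}

/* The reverse cast recovers the original bit pattern. */
uint64_t
bigint_as_query_id(int64_t value)
{
    return (uint64_t) value;
}
```

Printing the same signed form in the view, in the log, and in EXPLAIN keeps the values comparable across all three.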

I am also confused about the inconsistency of calling the GUC
compute_query_id (with underscore), but pg_stat_activity.queryid. If we
make it pg_stat_activity.query_id, it doesn't match most of the other
*id columns in the table, leader_pid, usesysid, backend_xid. Is that
OK? I know I suggested pg_stat_activity.query_id, but maybe I was wrong.

Mmm, most of the columns in pg_stat_activity do have a "_", so using query_id
would make more sense.

@@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata)

appendStringInfoChar(&buf, '\n');

+	/* query id */
+	appendStringInfo(&buf, "%lld", (long long) pgstat_get_my_queryid());
+	appendStringInfoChar(&buf, ',');
+

/* If in the syslogger process, try to write messages direct to file */
if (MyBackendType == B_LOGGER)
write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);

Unless I'm missing something this will output the query id in the next log
line? The new code should be added before the newline is output, and the comma
should also be output before the queryid.
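
The ordering described above can be sketched outside the backend as follows. finish_csv_line is a hypothetical helper for illustration only. It assumes the earlier fields are already formatted in prefix, and it uses snprintf in place of appendStringInfo.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a PostgreSQL function).
 * Completes one CSV log line: the separating comma comes first, then
 * the query id as the new last field, and only then the newline that
 * terminates the record.
 * Returns the number of characters snprintf would have written.
 */
int
finish_csv_line(char *buf, size_t buflen, const char *prefix,
                long long query_id)
{
    assert(buf != NULL && prefix != NULL && buflen > 0);

    return snprintf(buf, buflen, "%s,%lld\n", prefix, query_id);
}
```

Writing the newline before the new field, as in the quoted hunk, would instead start the next record with the query id.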

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#184)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 8, 2021 at 08:47:48AM +0800, Julien Rouhaud wrote:

On Wed, Apr 07, 2021 at 07:38:35PM -0400, Bruce Momjian wrote:

On Wed, Apr 7, 2021 at 07:01:25PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Uh, I think your patch missed a few things. First, you use "%zd"
(size_t) for the printf string, but calls to pgstat_get_my_queryid() in
src/backend/utils/error/elog.c used "%ld". Which is correct? I see
pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
fits in a BIGINT SQL column.

Neither is correct. Project standard these days for printing [u]int64
is to write "%lld" or "%llu", with an explicit (long long) cast on
the printf argument.

Yep, got it. The attached patch fixes all the calls to use %lld, and
adds casts. In implementing csvlog, I noticed that internally we pass
the hash as uint64, but output as int64, which I think is a requirement
for how pg_stat_statements has output it, and the use of bigint. Is
that OK?

Indeed, this is due to how we expose the value in SQL. The original discussion
is at
/messages/by-id/CAH2-WzkueMfAmY3onoXLi+g67SJoKY65Cg9Z1QOhSyhCEU8w3g@mail.gmail.com.
As far as I know this is OK, as we want to show consistent values everywhere.

OK, yes, I do remember the discussion. I was wondering if there should
be a C comment about this anywhere.

I am also confused about the inconsistency of calling the GUC
compute_query_id (with underscore), but pg_stat_activity.queryid. If we
make it pg_stat_activity.query_id, it doesn't match most of the other
*id columns in the table, leader_pid, usesysid, backend_xid. Is that
OK? I know I suggested pg_stat_activity.query_id, but maybe I was wrong.

Mmm, most of the columns in pg_stat_activity do have a "_", so using query_id
would make more sense.

OK, let me work on a patch to change that part.

@@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata)

appendStringInfoChar(&buf, '\n');
+	/* query id */
+	appendStringInfo(&buf, "%lld", (long long) pgstat_get_my_queryid());
+	appendStringInfoChar(&buf, ',');
+
/* If in the syslogger process, try to write messages direct to file */
if (MyBackendType == B_LOGGER)
write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);

Unless I'm missing something this will output the query id in the next log
line? The new code should be added before the newline is output, and the comma
should also be output before the queryid.

Yes, correct, updated patch attached.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

Attachments:

csv.diff (text/x-diff; charset=us-ascii)

From 2e4beaef3690f7b61b91264a95a270e5410dad53 Mon Sep 17 00:00:00 2001
From: Bruce Momjian <bruce@momjian.us>
Date: Wed, 7 Apr 2021 20:51:10 -0400
Subject: [PATCH] csv squash commit

---
 doc/src/sgml/config.sgml       |  4 +++-
 doc/src/sgml/file-fdw.sgml     |  3 ++-
 src/backend/utils/error/elog.c | 12 ++++++++----
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 963824d050..ea5cf3a2dc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7310,7 +7310,8 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
         character count of the error position therein,
         location of the error in the PostgreSQL source code
         (if <varname>log_error_verbosity</varname> is set to <literal>verbose</literal>),
-        application name, backend type, and process ID of parallel group leader.
+        application name, backend type, process ID of parallel group leader,
+        and query id.
         Here is a sample table definition for storing CSV-format log output:
 
 <programlisting>
@@ -7341,6 +7342,7 @@ CREATE TABLE postgres_log
   application_name text,
   backend_type text,
   leader_pid integer,
+  query_id bigint,
   PRIMARY KEY (session_id, session_line_num)
 );
 </programlisting>
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index 2e21806f48..5b98782064 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -266,7 +266,8 @@ CREATE FOREIGN TABLE pglog (
   location text,
   application_name text,
   backend_type text,
-  leader_pid integer
+  leader_pid integer,
+  query_id bigint
 ) SERVER pglog
 OPTIONS ( filename 'log/pglog.csv', format 'csv' );
 </programlisting>
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1cf71a649b..a1ebe06d5b 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2716,11 +2716,11 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				break;
 			case 'Q':
 				if (padding != 0)
-					appendStringInfo(buf, "%*ld", padding,
-							pgstat_get_my_queryid());
+					appendStringInfo(buf, "%*lld", padding,
+							(long long) pgstat_get_my_queryid());
 				else
-					appendStringInfo(buf, "%ld",
-							pgstat_get_my_queryid());
+					appendStringInfo(buf, "%lld",
+							(long long) pgstat_get_my_queryid());
 				break;
 			default:
 				/* format error - ignore it */
@@ -2964,6 +2964,10 @@ write_csvlog(ErrorData *edata)
 		if (leader && leader->pid != MyProcPid)
 			appendStringInfo(&buf, "%d", leader->pid);
 	}
+	appendStringInfoChar(&buf, ',');
+
+	/* query id */
+	appendStringInfo(&buf, "%lld", (long long) pgstat_get_my_queryid());
 
 	appendStringInfoChar(&buf, '\n');
 
-- 
2.20.1

bruce@momjian.us

almost 5 years ago

In reply to: Bruce Momjian (#185)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:

I am also confused about the inconsistency of calling the GUC
compute_query_id (with underscore), but pg_stat_activity.queryid. If we
make it pg_stat_activity.query_id, it doesn't match most of the other
*id columns in the table, leader_pid, usesysid, backend_xid. Is that
OK? I know I suggested pg_stat_activity.query_id, but maybe I was wrong.

Mmm, most of the columns in pg_stat_activity do have a "_", so using query_id
would make more sense.

OK, let me work on a patch to change that part.

Uh, it is 'queryid' in pg_stat_statements:

https://www.postgresql.org/docs/13/pgstatstatements.html

queryid bigint
Internal hash code, computed from the statement's parse tree

I am not sure if we should have pg_stat_activity use underscore, or the
GUC use underscore. The problem is that queryid can easily look like
quer-yid.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

bruce@momjian.us

almost 5 years ago

In reply to: Bruce Momjian (#185)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:

Unless I'm missing something this will output the query id in the next log
line? The new code should be added before the newline is output, and the comma
should also be output before the queryid.

Yes, correct, updated patch attached.

Patch applied.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.
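
The ordering issue raised in the review above (separator first, then the query id, then the newline) can be sketched in plain C. The append_query_id_field() function below is a hypothetical stand-in for the tail of write_csvlog(), not the real code, and it uses a fixed buffer instead of a StringInfo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the tail of write_csvlog(): append the last
 * CSV field (the query id) to a line that already holds earlier fields.
 * The separator goes first, then the value, and the record terminator
 * is written last so the id stays on the same log line.
 */
static void
append_query_id_field(char *line, size_t linelen, long long query_id)
{
	size_t		used = strlen(line);

	assert(used < linelen);
	snprintf(line + used, linelen - used, ",%lld\n", query_id);
}
```

If the newline were emitted before the id, the id would become the first field of the following record, which is the symptom described in the review.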

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#187)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 07, 2021 at 10:31:01PM -0400, Bruce Momjian wrote:

On Wed, Apr 7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:

Unless I'm missing something this will output the query id in the next log
line? The new code should be added before the newline is output, and the comma
should also be output before the queryid.

Yes, correct, updated patch attached.

Patch applied.

Thanks! And I agree with using query_id in the new field names while keeping
queryid for pg_stat_statements to avoid unnecessary query breakage.

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#188)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 8, 2021 at 10:38:08AM +0800, Julien Rouhaud wrote:

On Wed, Apr 07, 2021 at 10:31:01PM -0400, Bruce Momjian wrote:

On Wed, Apr 7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:

Unless I'm missing something this will output the query id in the next log
line? The new code should be added before the newline is output, and the comma
should also be output before the queryid.

Yes, correct, updated patch attached.

Patch applied.

Thanks! And I agree with using query_id in the new field names while keeping
queryid for pg_stat_statements to avoid unnecessary query breakage.

I think we need more feedback from the group. Do people agree with the
idea above? The question is what to call:

GUC compute_queryid
pg_stat_activity.queryid
pg_stat_statements.queryid

using "queryid" or "query_id", and do they have to match?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

alvherre@alvh.no-ip.org

almost 5 years ago

In reply to: Bruce Momjian (#189)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021-Apr-07, Bruce Momjian wrote:

On Thu, Apr 8, 2021 at 10:38:08AM +0800, Julien Rouhaud wrote:

Thanks! And I agree with using query_id in the new field names while keeping
queryid for pg_stat_statements to avoid unnecessary query breakage.

I think we need more feedback from the group. Do people agree with the
idea above? The question is what to call:

GUC compute_queryid
pg_stat_activity.queryid
pg_stat_statements.queryid

using "queryid" or "query_id", and do they have to match?

Seems a matter of personal preference. Mine is to have the underscore
everywhere in backend code (where this is new), and leave it without the
underscore in pg_stat_statements to avoid breaking existing code. Seems
to match what Julien is saying.

--
Álvaro Herrera Valdivia, Chile
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#175)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 07, 2021 at 02:12:11PM -0400, Bruce Momjian wrote:

Patch applied. I am ready to adjust this with any improvements people
might have. Thank you for all the good feedback we got on this, and I
know many users have waited a long time for this feature.

Thanks a lot Bruce and everyone! I hope that the users who waited a long time
for this will find everything they need.

Just to validate that this patchset also allows users to use pg_stat_statements,
any additional third-party module and the newly added infrastructure with the
queryid algorithm of their choice, I created a POC extension ([1]) which works
as expected.

Basically:

SHOW shared_preload_libraries;
shared_preload_libraries
--------------------------
pg_stat_statements, pg_queryid
(1 row)

SET pg_queryid.use_object_names TO on;
SET pg_queryid.ignore_schema TO on;

CREATE SCHEMA ns1; CREATE TABLE ns1.tbl1(id integer);
CREATE SCHEMA ns2; CREATE TABLE ns2.tbl1(id integer);

SET search_path TO ns1;
SELECT COUNT(*) FROM tbl1;
SET search_path TO ns2;
SELECT COUNT(*) FROM tbl1;

SELECT queryid, query, calls
FROM public.pg_stat_statements
WHERE query LIKE '%tbl%';
       queryid       |           query           | calls
---------------------+---------------------------+-------
 4629593225724429059 | SELECT count(*) from tbl1 |     2
(1 row)

So whether that's a good idea to do that or not, users now have a choice.

[1]: https://github.com/rjuju/pg_queryid
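
The effect shown above, where the same query text against ns1.tbl1 and ns2.tbl1 ends up under one queryid, can be illustrated with a toy fingerprint. This is only a sketch under stated assumptions: the id is hashed from the relation name rather than from its OID, FNV-1a is a stand-in hash, and toy_query_id() is a hypothetical name. It is not the algorithm used by pg_queryid or by core.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a over a byte string, used here only as a stand-in hash. */
static uint64_t
fnv1a64(uint64_t hash, const char *s)
{
	for (; *s != '\0'; s++)
	{
		hash ^= (unsigned char) *s;
		hash *= UINT64_C(0x100000001b3);
	}
	return hash;
}

/*
 * Toy fingerprint of "SELECT count(*) FROM <relation>".  When
 * ignore_schema is true, everything up to the last '.' in the qualified
 * relation name is skipped, so ns1.tbl1 and ns2.tbl1 hash identically.
 */
static uint64_t
toy_query_id(const char *qualified_relname, int ignore_schema)
{
	const char *name;
	const char *dot;
	uint64_t	hash = UINT64_C(0xcbf29ce484222325);

	assert(qualified_relname != NULL);

	name = qualified_relname;
	dot = strrchr(qualified_relname, '.');
	if (ignore_schema && dot != NULL)
		name = dot + 1;

	hash = fnv1a64(hash, "select count(*) from ");
	return fnv1a64(hash, name);
}
```

With ignore_schema set, "ns1.tbl1" and "ns2.tbl1" produce the same value, so a statistics collector keyed on it would count two calls for one entry; with it unset the two values differ.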

thomas.munro@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#191)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

Hi Julien, Bruce,

A warning appears on 32-bit systems:

In file included from pgstatfuncs.c:15:
pgstatfuncs.c: In function 'pg_stat_get_activity':
../../../../src/include/postgres.h:593:29: warning: cast to pointer
from integer of different size [-Wint-to-pointer-cast]
  593 | #define DatumGetPointer(X) ((Pointer) (X))
      |                             ^
../../../../src/include/postgres.h:678:42: note: in expansion of macro
'DatumGetPointer'
  678 | #define DatumGetUInt64(X) (* ((uint64 *) DatumGetPointer(X)))
      |                                          ^~~~~~~~~~~~~~~
pgstatfuncs.c:920:18: note: in expansion of macro 'DatumGetUInt64'
  920 |     values[29] = DatumGetUInt64(beentry->st_queryid);
      |                  ^~~~~~~~~~~~~~

Hmm, maybe this should be UInt64GetDatum()?
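
The warning comes from using the conversion in the wrong direction. As the macro expansion in the compiler output shows, on such a build DatumGetUInt64() dereferences its argument as a pointer, so a 64-bit value has to be turned into a Datum first. The sketch below models that with a pointer-sized integer; ModelDatum and both functions are hypothetical names, and malloc() stands in for whatever allocator the real conversion uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified model of a pointer-sized Datum; not PostgreSQL's definition. */
typedef uintptr_t ModelDatum;

/*
 * Value -> Datum for a pass-by-reference 64-bit type: copy the value to
 * heap storage and return the address as the Datum.
 */
static ModelDatum
model_uint64_get_datum(uint64_t value)
{
	uint64_t   *copy = malloc(sizeof(uint64_t));

	assert(copy != NULL);
	*copy = value;
	return (ModelDatum) copy;
}

/* Datum -> value: treat the Datum as an address and read through it. */
static uint64_t
model_datum_get_uint64(ModelDatum datum)
{
	return *(uint64_t *) datum;
}
```

Filling a result column needs the value-to-Datum direction. Applying the Datum-to-value function to a raw query id would treat the id itself as an address, which is what the cast warning is pointing at.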

rjuju123@gmail.com

almost 5 years ago

In reply to: Thomas Munro (#192)

2 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 08, 2021 at 11:36:48PM +1200, Thomas Munro wrote:

Hi Julien, Bruce,

A warning appears on 32 bit systems:

In file included from pgstatfuncs.c:15:
pgstatfuncs.c: In function 'pg_stat_get_activity':
../../../../src/include/postgres.h:593:29: warning: cast to pointer
from integer of different size [-Wint-to-pointer-cast]
  593 | #define DatumGetPointer(X) ((Pointer) (X))
      |                             ^
../../../../src/include/postgres.h:678:42: note: in expansion of macro
'DatumGetPointer'
  678 | #define DatumGetUInt64(X) (* ((uint64 *) DatumGetPointer(X)))
      |                                          ^~~~~~~~~~~~~~~
pgstatfuncs.c:920:18: note: in expansion of macro 'DatumGetUInt64'
  920 |     values[29] = DatumGetUInt64(beentry->st_queryid);
      |                  ^~~~~~~~~~~~~~

Wow, that's really embarrassing :(

Hmm, maybe this should be UInt64GetDatum()?

Yes definitely. I'm attaching the previous patch for force_parallel_mode to
not forget it + a new one for this issue.

Attachments:

v2-0001-Ignore-parallel-workers-in-pg_stat_statements.patchtext/x-diff; charset=us-asciiDownload

From d74523dfb76e7583c27166ec10d72670654c3b7b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 8 Apr 2021 13:59:43 +0800
Subject: [PATCH v2 1/2] Ignore parallel workers in pg_stat_statements.

Oversight in 4f0b0966c8 which exposed queryid in parallel workers.  Counters
are aggregated by the main backend process so parallel workers would report
duplicated activity, and could also report activity for the wrong entry as they
are only aware of the top level queryid.

Author: Julien Rouhaud
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210408051735.lfbdzun5zdlax5gd@alap3.anarazel.de
---
 contrib/pg_stat_statements/pg_stat_statements.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index fc2677643b..dbd0d41d88 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -47,6 +47,7 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
+#include "access/parallel.h"
 #include "catalog/pg_authid.h"
 #include "common/hashfn.h"
 #include "executor/instrument.h"
@@ -278,8 +279,9 @@ static bool pgss_save;			/* whether to save stats across shutdown */
 
 
 #define pgss_enabled(level) \
+	(!IsParallelWorker() && \
 	(pgss_track == PGSS_TRACK_ALL || \
-	(pgss_track == PGSS_TRACK_TOP && (level) == 0))
+	(pgss_track == PGSS_TRACK_TOP && (level) == 0)))
 
 #define record_gc_qtexts() \
 	do { \
-- 
2.30.1
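
The macro change in the patch above is easy to misread because of the added parentheses. A small sketch of the same boolean shape as a function may help; the enum and function names are hypothetical stand-ins, and the parallel-worker state is passed in as a parameter instead of being read from IsParallelWorker().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pg_stat_statements' tracking levels. */
typedef enum
{
	MODEL_TRACK_NONE,
	MODEL_TRACK_TOP,
	MODEL_TRACK_ALL
} ModelTrackLevel;

/*
 * Same boolean shape as the patched pgss_enabled() macro: a parallel
 * worker never records, otherwise the usual track-level rule applies.
 */
static bool
model_pgss_enabled(bool is_parallel_worker, ModelTrackLevel track, int level)
{
	assert(level >= 0);

	return !is_parallel_worker &&
		(track == MODEL_TRACK_ALL ||
		 (track == MODEL_TRACK_TOP && level == 0));
}
```

The outer parentheses matter: without them the worker check would bind only to the first alternative, and a worker at nesting level 0 would still be counted under the "top" setting.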

v2-0002-Fix-thinko-in-pg_stat_get_activity-when-retrievin.patchtext/x-diff; charset=us-asciiDownload

From 61ff6d226761fcc8f2a28fe8e313382d1d46f098 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 8 Apr 2021 20:05:14 +0800
Subject: [PATCH v2 2/2] Fix thinko in pg_stat_get_activity when retrieving the
 queryid.

---
 src/backend/utils/adt/pgstatfuncs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9fa4a93162..182b15e3f2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -917,7 +917,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			if (beentry->st_queryid == 0)
 				nulls[29] = true;
 			else
-				values[29] = DatumGetUInt64(beentry->st_queryid);
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
-- 
2.30.1

amit.kapila16@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#191)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 8, 2021 at 9:47 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Wed, Apr 07, 2021 at 02:12:11PM -0400, Bruce Momjian wrote:

Patch applied. I am ready to adjust this with any improvements people
might have. Thank you for all the good feedback we got on this, and I
know many users have waited a long time for this feature.

Thanks a lot Bruce and everyone! I hope that the users who waited a long time
for this will find everything they need.

@@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Setting debug_query_string for individual workers */
debug_query_string = queryDesc->sourceText;

-   /* Report workers' query for monitoring purposes */
+   /* Report workers' query and queryId for monitoring purposes */
    pgstat_report_activity(STATE_RUNNING, debug_query_string);
+   pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);

A few lines further down in ParallelQueryMain, we call ExecutorStart, which
will report the queryid, so do we need it here?

--
With Regards,
Amit Kapila.
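
Why a second report of the same id adds nothing can be shown with a small model. It rests on an assumption that is not spelled out in this part of the thread: that a report made with force = false is ignored once an id is already stored, and that force = true always overwrites. The variable and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-backend slot standing in for the shared status entry. */
static uint64_t model_st_queryid = 0;

/*
 * Simplified model of a query id report.  Assumption: a non-forced
 * report is ignored once an id is already stored; a forced report
 * always overwrites (and can reset the slot with 0).
 */
static void
model_report_queryid(uint64_t query_id, bool force)
{
	if (model_st_queryid != 0 && !force)
		return;
	model_st_queryid = query_id;
}
```

Under that assumption, reporting the id once early and again from executor startup leaves the slot in the same state as a single report, so dropping the earlier call changes nothing observable.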

rjuju123@gmail.com

almost 5 years ago

In reply to: Amit Kapila (#194)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 08, 2021 at 05:46:07PM +0530, Amit Kapila wrote:

@@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Setting debug_query_string for individual workers */
debug_query_string = queryDesc->sourceText;
-   /* Report workers' query for monitoring purposes */
+   /* Report workers' query and queryId for monitoring purposes */
pgstat_report_activity(STATE_RUNNING, debug_query_string);
+   pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
Below lines down in ParallelQueryMain, we call ExecutorStart which
will report queryid, so do we need it here?

Correct, it's not actually needed. The overhead should be negligible but let's
get rid of it. Updated fix patchset attached.

Attachments:

v3-0001-Ignore-parallel-workers-in-pg_stat_statements.patchtext/x-diff; charset=us-asciiDownload

From d74523dfb76e7583c27166ec10d72670654c3b7b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 8 Apr 2021 13:59:43 +0800
Subject: [PATCH v3 1/3] Ignore parallel workers in pg_stat_statements.

Oversight in 4f0b0966c8 which exposed queryid in parallel workers.  Counters
are aggregated by the main backend process so parallel workers would report
duplicated activity, and could also report activity for the wrong entry as they
are only aware of the top level queryid.

Author: Julien Rouhaud
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210408051735.lfbdzun5zdlax5gd@alap3.anarazel.de
---
 contrib/pg_stat_statements/pg_stat_statements.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index fc2677643b..dbd0d41d88 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -47,6 +47,7 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
+#include "access/parallel.h"
 #include "catalog/pg_authid.h"
 #include "common/hashfn.h"
 #include "executor/instrument.h"
@@ -278,8 +279,9 @@ static bool pgss_save;			/* whether to save stats across shutdown */
 
 
 #define pgss_enabled(level) \
+	(!IsParallelWorker() && \
 	(pgss_track == PGSS_TRACK_ALL || \
-	(pgss_track == PGSS_TRACK_TOP && (level) == 0))
+	(pgss_track == PGSS_TRACK_TOP && (level) == 0)))
 
 #define record_gc_qtexts() \
 	do { \
-- 
2.30.1

v3-0002-Fix-thinko-in-pg_stat_get_activity-when-retrievin.patchtext/x-diff; charset=us-asciiDownload

From 61ff6d226761fcc8f2a28fe8e313382d1d46f098 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 8 Apr 2021 20:05:14 +0800
Subject: [PATCH v3 2/3] Fix thinko in pg_stat_get_activity when retrieving the
 queryid.

---
 src/backend/utils/adt/pgstatfuncs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9fa4a93162..182b15e3f2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -917,7 +917,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			if (beentry->st_queryid == 0)
 				nulls[29] = true;
 			else
-				values[29] = DatumGetUInt64(beentry->st_queryid);
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
-- 
2.30.1

v3-0003-Remove-unnecessary-call-to-pgstat_report_queryid.patchtext/x-diff; charset=us-asciiDownload

From bc3b5470d12aab282e1b6f6b0b2f4afcc65b7dcf Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 8 Apr 2021 20:25:19 +0800
Subject: [PATCH v3 3/3] Remove unnecessary call to pgstat_report_queryid().

Reported-by: Amit Kapila
---
 src/backend/executor/execParallel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index d104a19767..bfd6155509 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1426,7 +1426,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 
 	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
-	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+	//pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
-- 
2.30.1

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#195)

3 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 08, 2021 at 08:27:20PM +0800, Julien Rouhaud wrote:

On Thu, Apr 08, 2021 at 05:46:07PM +0530, Amit Kapila wrote:
@@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Setting debug_query_string for individual workers */
debug_query_string = queryDesc->sourceText;
-   /* Report workers' query for monitoring purposes */
+   /* Report workers' query and queryId for monitoring purposes */
pgstat_report_activity(STATE_RUNNING, debug_query_string);
+   pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
Below lines down in ParallelQueryMain, we call ExecutorStart which
will report queryid, so do we need it here?
Correct, it's not actually needed. The overhead should be negligible but let's
get rid of it. Updated fix patchset attached.

Sorry I messed up the last commit, v4 is ok.

Attachments:

v4-0001-Ignore-parallel-workers-in-pg_stat_statements.patchtext/x-diff; charset=us-asciiDownload

From d74523dfb76e7583c27166ec10d72670654c3b7b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 8 Apr 2021 13:59:43 +0800
Subject: [PATCH v4 1/3] Ignore parallel workers in pg_stat_statements.

Oversight in 4f0b0966c8 which exposed queryid in parallel workers.  Counters
are aggregated by the main backend process so parallel workers would report
duplicated activity, and could also report activity for the wrong entry as they
are only aware of the top level queryid.

Author: Julien Rouhaud
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210408051735.lfbdzun5zdlax5gd@alap3.anarazel.de
---
 contrib/pg_stat_statements/pg_stat_statements.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index fc2677643b..dbd0d41d88 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -47,6 +47,7 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
+#include "access/parallel.h"
 #include "catalog/pg_authid.h"
 #include "common/hashfn.h"
 #include "executor/instrument.h"
@@ -278,8 +279,9 @@ static bool pgss_save;			/* whether to save stats across shutdown */
 
 
 #define pgss_enabled(level) \
+	(!IsParallelWorker() && \
 	(pgss_track == PGSS_TRACK_ALL || \
-	(pgss_track == PGSS_TRACK_TOP && (level) == 0))
+	(pgss_track == PGSS_TRACK_TOP && (level) == 0)))
 
 #define record_gc_qtexts() \
 	do { \
-- 
2.30.1

v4-0002-Fix-thinko-in-pg_stat_get_activity-when-retrievin.patchtext/x-diff; charset=us-asciiDownload

From 61ff6d226761fcc8f2a28fe8e313382d1d46f098 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 8 Apr 2021 20:05:14 +0800
Subject: [PATCH v4 2/3] Fix thinko in pg_stat_get_activity when retrieving the
 queryid.

---
 src/backend/utils/adt/pgstatfuncs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9fa4a93162..182b15e3f2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -917,7 +917,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			if (beentry->st_queryid == 0)
 				nulls[29] = true;
 			else
-				values[29] = DatumGetUInt64(beentry->st_queryid);
+				values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
-- 
2.30.1

v4-0003-Remove-unnecessary-call-to-pgstat_report_queryid.patchtext/x-diff; charset=us-asciiDownload

From c3ab7472f1f75c9a26f1b77d748aa267e3483d1e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 8 Apr 2021 20:25:19 +0800
Subject: [PATCH v4 3/3] Remove unnecessary call to pgstat_report_queryid().

Reported-by: Amit Kapila
---
 src/backend/executor/execParallel.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index d104a19767..4fca8782b2 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1426,7 +1426,6 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 
 	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
-	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
-- 
2.30.1

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#196)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 8, 2021 at 09:31:27PM +0800, Julien Rouhaud wrote:

On Thu, Apr 08, 2021 at 08:27:20PM +0800, Julien Rouhaud wrote:
On Thu, Apr 08, 2021 at 05:46:07PM +0530, Amit Kapila wrote:
@@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Setting debug_query_string for individual workers */
debug_query_string = queryDesc->sourceText;
-   /* Report workers' query for monitoring purposes */
+   /* Report workers' query and queryId for monitoring purposes */
pgstat_report_activity(STATE_RUNNING, debug_query_string);
+   pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
Below lines down in ParallelQueryMain, we call ExecutorStart which
will report queryid, so do we need it here?
Correct, it's not actually needed. The overhead should be negligible but let's
get rid of it. Updated fix patchset attached.
Sorry I messed up the last commit, v4 is ok.

Patch applied, thanks.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

bruce@momjian.us

almost 5 years ago

In reply to: Alvaro Herrera (#190)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 7, 2021 at 11:27:04PM -0400, Álvaro Herrera wrote:

On 2021-Apr-07, Bruce Momjian wrote:

On Thu, Apr 8, 2021 at 10:38:08AM +0800, Julien Rouhaud wrote:

Thanks! And I agree with using query_id in the new field names while keeping
queryid for pg_stat_statements to avoid unnecessary query breakage.

I think we need more feedback from the group. Do people agree with the
idea above? The question is what to call:

GUC compute_queryid
pg_stat_activity.queryid
pg_stat_statements.queryid

using "queryid" or "query_id", and do they have to match?

Seems a matter of personal preference. Mine is to have the underscore
everywhere in backend code (where this is new), and leave it without the
underscore in pg_stat_statements to avoid breaking existing code. Seems
to match what Julien is saying.

OK, let's get some details. First, pg_stat_statements.queryid already
exists (no underscore), and I don't think anyone wants to change that.

pg_stat_activity.queryid is new, but I can imagine cases where you would
join pg_stat_activity to pg_stat_statements to get an estimate of how
long the query will take --- having one using an underscore and another
one not seems odd. Also, looking at the existing pg_stat_activity
columns, those don't use underscores before the "id" unless there is a
modifier before the "id", e.g. "pid", "xid":

SELECT attname
FROM pg_namespace JOIN pg_class ON (pg_namespace.oid = relnamespace)
JOIN pg_attribute ON (pg_class.oid = pg_attribute.attrelid)
WHERE nspname = 'pg_catalog' AND
relname = 'pg_stat_activity' AND
attname ~ 'id$';
attname
-------------
backend_xid
datid
leader_pid
pid
queryid
usesysid

We don't have a modifier before queryid.

If people like query_id, and I do too, I am thinking we just keep
query_id as the GUC (compute_query_id), and just accept that the GUC and
SQL levels will not match. This is exactly what we have now. I brought
it up to be sure this is what we want.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#198)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 08, 2021 at 11:34:25AM -0400, Bruce Momjian wrote:

OK, let's get some details. First, pg_stat_statements.queryid already
exists (no underscore), and I don't think anyone wants to change that.

pg_stat_activity.queryid is new, but I can imagine cases where you would
join pg_stat_activity to pg_stat_statements to get an estimate of how
long the query will take --- having one using an underscore and another
one not seems odd.

Indeed, and being able to join with a USING clause rather than an ON clause
could also save some keystrokes. But we already have (userid, dbid) on the
pg_stat_statements side vs (usesysid, datid) on the pg_stat_activity side, so
this unfortunately won't fix all the oddities.

bruce@momjian.us

almost 5 years ago

In reply to: Julien Rouhaud (#199)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Apr 9, 2021 at 12:38:29AM +0800, Julien Rouhaud wrote:

On Thu, Apr 08, 2021 at 11:34:25AM -0400, Bruce Momjian wrote:

OK, let's get some details. First, pg_stat_statements.queryid already
exists (no underscore), and I don't think anyone wants to change that.

pg_stat_activity.queryid is new, but I can imagine cases where you would
join pg_stat_activity to pg_stat_statements to get an estimate of how
long the query will take --- having one using an underscore and another
one not seems odd.

Indeed, and also being able to join with a USING clause rather than an ON could
also save some keystrokes. But unfortunately, we already have (userid, dbid)
on pg_stat_statements side vs (usesysid, datid) on pg_stat_activity side, so
this unfortunately won't fix all the oddities.

Wow, good point. Shame they don't match.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

alvherre@alvh.no-ip.org

almost 5 years ago

In reply to: Bruce Momjian (#198)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021-Apr-08, Bruce Momjian wrote:

pg_stat_activity.queryid is new, but I can imagine cases where you would
join pg_stat_activity to pg_stat_statements to get an estimate of how
long the query will take --- having one using an underscore and another
one not seems odd.

OK. So far, you have one vote for queryid (your own) and two votes for
query_id (mine and Julien's). And even you yourself were hesitating about
it earlier in the thread.

--
Álvaro Herrera Valdivia, Chile

bruce@momjian.us

almost 5 years ago

In reply to: Alvaro Herrera (#201)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote:

On 2021-Apr-08, Bruce Momjian wrote:

pg_stat_activity.queryid is new, but I can imagine cases where you would
join pg_stat_activity to pg_stat_statements to get an estimate of how
long the query will take --- having one using an underscore and another
one not seems odd.

OK. So far, you have one vote for queryid (your own) and two votes for
query_id (mine and Julien's). And even yourself were hesitating about
it earlier in the thread.

OK, if people are fine with pg_stat_activity.query_id not matching
pg_stat_statements.queryid, I am fine with that. I just don't want
someone to say it was a big mistake later. ;-)

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

bruce@momjian.us

almost 5 years ago

In reply to: Bruce Momjian (#202)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 8, 2021 at 01:01:42PM -0400, Bruce Momjian wrote:

On Thu, Apr 8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote:

On 2021-Apr-08, Bruce Momjian wrote:

pg_stat_activity.queryid is new, but I can imagine cases where you would
join pg_stat_activity to pg_stat_statements to get an estimate of how
long the query will take --- having one using an underscore and another
one not seems odd.

OK. So far, you have one vote for queryid (your own) and two votes for
query_id (mine and Julien's). And even yourself were hesitating about
it earlier in the thread.

OK, if people are fine with pg_stat_activity.query_id not matching
pg_stat_statements.queryid, I am fine with that. I just don't want
someone to say it was a big mistake later. ;-)

OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
have not changed any of the APIs which existed before this feature was
added, and are called "queryid" or "queryId" --- it is kind of a mess.
I assume I should leave those unchanged. It will also need a catversion
bump.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

almost 5 years ago

In reply to: Bruce Momjian (#203)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Apr 12, 2021 at 10:12:46PM -0400, Bruce Momjian wrote:

On Thu, Apr 8, 2021 at 01:01:42PM -0400, Bruce Momjian wrote:

On Thu, Apr 8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote:

On 2021-Apr-08, Bruce Momjian wrote:

pg_stat_activity.queryid is new, but I can imagine cases where you would
join pg_stat_activity to pg_stat_statements to get an estimate of how
long the query will take --- having one using an underscore and another
one not seems odd.

OK. So far, you have one vote for queryid (your own) and two votes for
query_id (mine and Julien's). And even yourself were hesitating about
it earlier in the thread.

OK, if people are fine with pg_stat_activity.query_id not matching
pg_stat_statements.queryid, I am fine with that. I just don't want
someone to say it was a big mistake later. ;-)

OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
have not changed any of the APIs which existed before this feature was
added, and are called "queryid" or "queryId" --- it is kind of a mess.
I assume I should leave those unchanged. It will also need a catversion
bump.

-	uint64		st_queryid;
+	uint64		st_query_id;

I thought we would internally keep queryid/queryId, at least for the variable
names as this is the name of the saved field in PlannedStmt.

-extern void pgstat_report_queryid(uint64 queryId, bool force);
+extern void pgstat_report_query_id(uint64 queryId, bool force);

But if we don't then it should be "uint64 query_id".

alvherre@alvh.no-ip.org

almost 5 years ago

In reply to: Bruce Momjian (#203)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021-Apr-12, Bruce Momjian wrote:

OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
have not changed any of the APIs which existed before this feature was
added, and are called "queryid" or "queryId" --- it is kind of a mess.
I assume I should leave those unchanged. It will also need a catversion
bump.

I think it is fine actually. These names appear in structs Query and
PlannedStmt, and every single member of those already uses camelCase
naming. Changing those to use "query_id" would look out of place.
You did change the one in PgBackendStatus to st_query_id, which also
matches the naming style in that struct, so that looks fine also.

So I'm -1 on Julien's first proposed change, and +1 on his second
proposed change (the name of the first argument of
pgstat_report_query_id should be query_id).

--
Álvaro Herrera Valdivia, Chile

bruce@momjian.us

over 4 years ago

In reply to: Alvaro Herrera (#205)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:

On 2021-Apr-12, Bruce Momjian wrote:

OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
have not changed any of the APIs which existed before this feature was
added, and are called "queryid" or "queryId" --- it is kind of a mess.
I assume I should leave those unchanged. It will also need a catversion
bump.

I think it is fine actually. These names appear in structs Query and
PlannedStmt, and every single member of those already uses camelCase
naming. Changing those to use "query_id" would look out of place.
You did change the one in PgBackendStatus to st_query_id, which also
matches the naming style in that struct, so that looks fine also.

So I'm -1 on Julien's first proposed change, and +1 on his second
proposed change (the name of the first argument of
pgstat_report_query_id should be query_id).

Thanks for your analysis. Updated patch attached with the change
suggested above.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#206)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote:

On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:

So I'm -1 on Julien's first proposed change, and +1 on his second
proposed change (the name of the first argument of
pgstat_report_query_id should be query_id).

Thanks for your analysis. Updated patch attached with the change
suggested above.

Thanks Bruce. It looks good to me.

bruce@momjian.us

over 4 years ago

In reply to: Bruce Momjian (#206)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote:

On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:

On 2021-Apr-12, Bruce Momjian wrote:

OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
have not changed any of the APIs which existed before this feature was
added, and are called "queryid" or "queryId" --- it is kind of a mess.
I assume I should leave those unchanged. It will also need a catversion
bump.

I think it is fine actually. These names appear in structs Query and
PlannedStmt, and every single member of those already uses camelCase
naming. Changing those to use "query_id" would look out of place.
You did change the one in PgBackendStatus to st_query_id, which also
matches the naming style in that struct, so that looks fine also.

So I'm -1 on Julien's first proposed change, and +1 on his second
proposed change (the name of the first argument of
pgstat_report_query_id should be query_id).

Thanks for your analysis. Updated patch attached with the change
suggested above.

Patch applied.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

masao.fujii@oss.nttdata.com

over 4 years ago

In reply to: Bruce Momjian (#208)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021/04/21 1:22, Bruce Momjian wrote:

On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote:

On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:

On 2021-Apr-12, Bruce Momjian wrote:

OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
have not changed any of the APIs which existed before this feature was
added, and are called "queryid" or "queryId" --- it is kind of a mess.
I assume I should leave those unchanged. It will also need a catversion
bump.

I think it is fine actually. These names appear in structs Query and
PlannedStmt, and every single member of those already uses camelCase
naming. Changing those to use "query_id" would look out of place.
You did change the one in PgBackendStatus to st_query_id, which also
matches the naming style in that struct, so that looks fine also.

So I'm -1 on Julien's first proposed change, and +1 on his second
proposed change (the name of the first argument of
pgstat_report_query_id should be query_id).

Thanks for your analysis. Updated patch attached with the change
suggested above.

Patch applied.

I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?

This view contains one row for each distinct database ID, user ID and query ID

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

rjuju123@gmail.com

over 4 years ago

In reply to: Fujii Masao (#209)

1 attachment(s)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:

I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?

This view contains one row for each distinct database ID, user ID and query ID

Indeed! I'm adding Magnus in Cc.

PFA a patch to fix it, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.

Attachments:

v1-0001-doc-Mention-that-toplevel-is-part-of-pg_stat_stat.patch (text/x-diff; charset=us-ascii)

From a1fe3e32ca3c59b059fd87b3070201ce4e7d0e70 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 22 Apr 2021 17:14:45 +0800
Subject: [PATCH v1] doc: Mention that toplevel is part of pg_stat_statements
 key.

While at it, also document that toplevel can only be true if
pg_stat_statements.top is set to top.

Author: Julien Rouhaud
Reported-by: Fujii Masao
Discussion: https://postgr.es/m/a878d5ea-64a7-485e-5d2f-177618ebc52d@oss.nttdata.com
---
 doc/src/sgml/pgstatstatements.sgml | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index a0c315aa97..561452c624 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -46,8 +46,8 @@
   <para>
    The statistics gathered by the module are made available via a
    view named <structname>pg_stat_statements</structname>.  This view
-   contains one row for each distinct database ID, user ID and query
-   ID (up to the maximum number of distinct statements that the module
+   contains one row for each distinct database ID, user ID, query ID and
+   toplevel (up to the maximum number of distinct statements that the module
    can track).  The columns of the view are shown in
    <xref linkend="pgstatstatements-columns"/>.
   </para>
@@ -93,6 +93,8 @@
       </para>
       <para>
        True if the query was executed as a top level statement
+       (requires <varname>pg_stat_statements.toplevel</varname> to be set to
+       <literal>all</literal> to see false values)
       </para></entry>
      </row>
 
-- 
2.30.1

masao.fujii@oss.nttdata.com

over 4 years ago

In reply to: Julien Rouhaud (#210)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021/04/22 18:23, Julien Rouhaud wrote:

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:

I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?

This view contains one row for each distinct database ID, user ID and query ID

Indeed! I'm adding Magnus in Cc.

PFA a patch to fix it, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.

Thanks for the patch! LGTM.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Fujii Masao (#211)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/04/22 18:23, Julien Rouhaud wrote:

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:

I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?

This view contains one row for each distinct database ID, user ID and query ID

Indeed! I'm adding Magnus in Cc.

PFA a patch to fix it, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.

Thanks for the patch! LGTM.

Agreed, in general. But going by the example a few lines down, I
changed the second part to:
        True if the query was executed as a top level statement
+       (if <varname>pg_stat_statements.track</varname> is set to
+       <literal>all</literal>, otherwise always false)

(changes the wording, but also the name of the parameter is
pg_stat_statements.track, not pg_stat_statements.toplevel (that's the
column, not the parameter). Same error in the commit msg except there
you called it pg_stat_statements.top - but that one needed some more
fix as well)

With those changes, applied. Thanks!

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

masao.fujii@oss.nttdata.com

over 4 years ago

In reply to: Magnus Hagander (#212)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021/04/23 18:46, Magnus Hagander wrote:

On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/04/22 18:23, Julien Rouhaud wrote:

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:

I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?

This view contains one row for each distinct database ID, user ID and query ID

Indeed! I'm adding Magnus in Cc.

PFA a patch to fix it, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.

Thanks for the patch! LGTM.

Agreed, in general. But going by the example a few lines down, I
changed the second part to:
True if the query was executed as a top level statement
+       (if <varname>pg_stat_statements.track</varname> is set to
+       <literal>all</literal>, otherwise always false)

Isn't this confusing? Users may mistakenly read this as saying that the toplevel
column is always false if pg_stat_statements.track is not "all".

(changes the wording, but also the name of the parameter is
pg_stat_statements.track, not pg_stat_statements.toplevel (that's the
column, not the parameter). Same error in the commit msg except there
you called it pg_stat_statements.top - but that one needed some more
fix as well)

With those changes, applied. Thanks!

Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Fujii Masao (#213)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2021/04/23 18:46, Magnus Hagander wrote:
On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/04/22 18:23, Julien Rouhaud wrote:

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:

I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?

This view contains one row for each distinct database ID, user ID and query ID

Indeed! I'm adding Magnus in Cc.

PFA a patch to fix it, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.

Thanks for the patch! LGTM.
Agreed, in general. But going by the example a few lines down, I
changed the second part to:
True if the query was executed as a top level statement
+       (if <varname>pg_stat_statements.track</varname> is set to
+       <literal>all</literal>, otherwise always false)
Isn't this confusing? Users may mistakenly read this as that the toplevel
column always indicates false if pg_stat_statements.track is not "all".

Hmm. I think you're right. It should say "always true", shouldn't it?
So not just confusing, but completely wrong? :)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

masao.fujii@oss.nttdata.com

over 4 years ago

In reply to: Magnus Hagander (#214)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 2021/04/23 19:11, Magnus Hagander wrote:

On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
On 2021/04/23 18:46, Magnus Hagander wrote:
On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/04/22 18:23, Julien Rouhaud wrote:

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:

I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?

This view contains one row for each distinct database ID, user ID and query ID

Indeed! I'm adding Magnus in Cc.

PFA a patch to fix it, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.

Thanks for the patch! LGTM.
Agreed, in general. But going by the example a few lines down, I
changed the second part to:
True if the query was executed as a top level statement
+       (if <varname>pg_stat_statements.track</varname> is set to
+       <literal>all</literal>, otherwise always false)
Isn't this confusing? Users may mistakenly read this as that the toplevel
column always indicates false if pg_stat_statements.track is not "all".
Hmm. I think you're right. It should say "always true", shouldn't it?

You're thinking something like the following?

True if the query was executed as a top level statement
(if <varname>pg_stat_statements.track</varname> is set to
<literal>top</literal>, always true)

So not just confusing, but completely wrong? :)

Yeah :)

I'm fine with the original wording by Julien.
Of course, the parameter name should be corrected as you did, though.

Or what about the following?

True if the query was executed as a top level statement
(this can be <literal>false</literal> only if
<varname>pg_stat_statements.track</varname> is set to
<literal>all</literal> and nested statements are also tracked)

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Fujii Masao (#215)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Fri, Apr 23, 2021 at 12:40 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2021/04/23 19:11, Magnus Hagander wrote:
On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
On 2021/04/23 18:46, Magnus Hagander wrote:
On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/04/22 18:23, Julien Rouhaud wrote:

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:

I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?

This view contains one row for each distinct database ID, user ID and query ID

Indeed! I'm adding Magnus in Cc.

PFA a patch to fix it, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.

Thanks for the patch! LGTM.
Agreed, in general. But going by the example a few lines down, I
changed the second part to:
True if the query was executed as a top level statement
+       (if <varname>pg_stat_statements.track</varname> is set to
+       <literal>all</literal>, otherwise always false)
Isn't this confusing? Users may mistakenly read this as that the toplevel
column always indicates false if pg_stat_statements.track is not "all".
Hmm. I think you're right. It should say "always true", shouldn't it?
You're thinking something like the following?

True if the query was executed as a top level statement
(if <varname>pg_stat_statements.track</varname> is set to
<literal>top</literal>, always true)

So not just confusing, but completely wrong? :)

Yeah :)

Ugh. I completely lost track of this email.

I've applied the change suggested above with another slight reordering
of the words:

+       (always true if <varname>pg_stat_statements.track</varname> is set to
+       <literal>top</literal>)

I'm fine with the original wording by Julien.
Of course, the parameter name should be corrected as you did, though.

Or what about the following?

True if the query was executed as a top level statement
(this can be <literal>false</literal> only if
<varname>pg_stat_statements.track</varname> is set to
<literal>all</literal> and nested statements are also tracked)

I found my suggestion, once the final reordering of words was done,
easier to parse.
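
The behaviour these wordings try to describe can be sketched in C as follows. This is an illustration under stated assumptions, not the extension's source: TrackSetting, statement_is_tracked and toplevel_value are hypothetical names, and a nesting_level of 0 is assumed to mean a statement issued directly by the client.

```c
#include <stdbool.h>

/* Mirrors the documented values of pg_stat_statements.track. */
typedef enum TrackSetting
{
    TRACK_NONE,     /* nothing is recorded */
    TRACK_TOP,      /* only statements issued directly by clients */
    TRACK_ALL       /* nested statements (e.g. inside functions) as well */
} TrackSetting;

/* Is a statement at this nesting depth recorded at all? */
bool
statement_is_tracked(TrackSetting track, int nesting_level)
{
    return track == TRACK_ALL ||
           (track == TRACK_TOP && nesting_level == 0);
}

/* Value the toplevel column would carry for a recorded statement. */
bool
toplevel_value(int nesting_level)
{
    return nesting_level == 0;
}
```

With TRACK_TOP, only depth 0 ever passes statement_is_tracked, so every recorded row has toplevel true. A false value needs TRACK_ALL plus a statement that actually ran nested, which is what both proposed wordings express.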

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

In reply to: Julien Rouhaud (#210)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 22.04.21 11:23, Julien Rouhaud wrote:

The statistics gathered by the module are made available via a
view named <structname>pg_stat_statements</structname>.  This view
-   contains one row for each distinct database ID, user ID and query
-   ID (up to the maximum number of distinct statements that the module
+   contains one row for each distinct database ID, user ID, query ID and
+   toplevel (up to the maximum number of distinct statements that the module
can track).  The columns of the view are shown in
<xref linkend="pgstatstatements-columns"/>.

I'm having trouble parsing this new sentence. It now says essentially

"This view contains one row for each distinct database ID, each distinct
user ID, each distinct query ID, and each distinct toplevel."

That last part doesn't make sense.

rjuju123@gmail.com

over 4 years ago

In reply to: Peter Eisentraut (#217)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Mon, Jul 12, 2021 at 10:02:59PM +0200, Peter Eisentraut wrote:

On 22.04.21 11:23, Julien Rouhaud wrote:
The statistics gathered by the module are made available via a
view named <structname>pg_stat_statements</structname>.  This view
-   contains one row for each distinct database ID, user ID and query
-   ID (up to the maximum number of distinct statements that the module
+   contains one row for each distinct database ID, user ID, query ID and
+   toplevel (up to the maximum number of distinct statements that the module
can track).  The columns of the view are shown in
<xref linkend="pgstatstatements-columns"/>.
I'm having trouble parsing this new sentence. It now says essentially

"This view contains one row for each distinct database ID, each distinct
user ID, each distinct query ID, and each distinct toplevel."

Isn't it each distinct permutation of all those fields?

That last part doesn't make sense.

I'm not sure what you mean by that. Maybe it's not really self explanatory
without referring to what toplevel is, which is a bool flag stating whether the
statement was executed as a top level statement or not.

So every distinct permutation of (dbid, userid, queryid) can indeed be stored
twice, if pg_stat_statements.track is set to all. However in practice most
statements are not executed both as top level and nested statements.
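
To illustrate the "stored twice" point, here is a small C sketch that treats the row identity as a four-part key. EntryKey, entry_key_equal and count_distinct_entries are hypothetical names for this example only, the field types are simplified (uint32_t stands in for an OID), and the quadratic scan is chosen for brevity rather than taken from the extension.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed row identity: one row per distinct combination of these fields. */
typedef struct EntryKey
{
    uint32_t userid;
    uint32_t dbid;
    uint64_t queryid;
    bool     toplevel;
} EntryKey;

/* Two keys match only if all four fields match. */
bool
entry_key_equal(const EntryKey *a, const EntryKey *b)
{
    return a->userid == b->userid &&
           a->dbid == b->dbid &&
           a->queryid == b->queryid &&
           a->toplevel == b->toplevel;
}

/* Count distinct keys in keys[0..n-1] with a simple O(n^2) scan. */
size_t
count_distinct_entries(const EntryKey *keys, size_t n)
{
    size_t distinct = 0;

    for (size_t i = 0; i < n; i++)
    {
        bool seen = false;

        for (size_t j = 0; j < i; j++)
        {
            if (entry_key_equal(&keys[i], &keys[j]))
            {
                seen = true;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```

Under this model, the same (userid, dbid, queryid) triple yields two rows only when the statement was seen both as a top level statement and as a nested one.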

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Julien Rouhaud (#218)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Jul 13, 2021 at 8:10 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Jul 12, 2021 at 10:02:59PM +0200, Peter Eisentraut wrote:
On 22.04.21 11:23, Julien Rouhaud wrote:
The statistics gathered by the module are made available via a
view named <structname>pg_stat_statements</structname>.  This view
-   contains one row for each distinct database ID, user ID and query
-   ID (up to the maximum number of distinct statements that the module
+   contains one row for each distinct database ID, user ID, query ID and
+   toplevel (up to the maximum number of distinct statements that the module
can track).  The columns of the view are shown in
<xref linkend="pgstatstatements-columns"/>.
I'm having trouble parsing this new sentence. It now says essentially

"This view contains one row for each distinct database ID, each distinct
user ID, each distinct query ID, and each distinct toplevel."
Isn't it each distinct permutation of all those fields?

Maybe a problem for the readability of it is that the three other
fields are listed by their description and not by their fieldname, and
toplevel is fieldname?

Maybe "each distinct database id, each distinct user id, each distinct
query id, and whether it is a top level statement or not"?

Or maybe "each distinct combination of database id, user id, query id
and whether it's a top level statement or not"?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

rjuju123@gmail.com

over 4 years ago

In reply to: Magnus Hagander (#219)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Tue, Jul 13, 2021 at 10:58:12AM +0200, Magnus Hagander wrote:

Maybe a problem for the readability of it is that the three other
fields are listed by their description and not by their fieldname, and
toplevel is fieldname?

I think so too.

Maybe "each distinct database id, each distinct user id, each distinct
query id, and whether it is a top level statement or not"?

Or maybe "each distinct combination of database id, user id, query id
and whether it's a top level statement or not"?

I like the 2nd one better. What about "and its top level status"? It would
keep the sentence short and the full description is right after if needed.

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

In reply to: Magnus Hagander (#219)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On 13.07.21 10:58, Magnus Hagander wrote:

Maybe "each distinct database id, each distinct user id, each distinct
query id, and whether it is a top level statement or not"?

Or maybe "each distinct combination of database id, user id, query id
and whether it's a top level statement or not"?

Okay, now I understand what is meant here. The second one sounds good
to me.

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Peter Eisentraut (#221)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

On Wed, Jul 14, 2021 at 6:36 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 13.07.21 10:58, Magnus Hagander wrote:

Maybe "each distinct database id, each distinct user id, each distinct
query id, and whether it is a top level statement or not"?

Or maybe "each distinct combination of database id, user id, query id
and whether it's a top level statement or not"?

Okay, now I understand what is meant here. The second one sounds good
to me.

Thanks, will push a fix like that.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/