Make query ID more portable

Started by Andrey V. Lepikhovover 4 years ago9 messages
#1Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
1 attachment(s)

Hi,

QueryID is good tool for query analysis. I want to improve core jumbling
machinery in two ways:
1. QueryID value should survive dump/restore of a database (use fully
qualified name of table instead of relid).
2. QueryID could represent more general class of queries: for example,
it can be independent from permutation of tables in a FROM clause.

See the patch in attachment as an POC. Main idea here is to break
JumbleState down to a 'clocations' part that can be really interested in
a post parse hook and a 'context data', that needed to build query or
subquery signature (hash) and, I guess, isn't really needed in any
extensions.

I think, it adds not much complexity and overhead. It still not
guaranteed equality of queryid on two instances with an equal schema,
but survives across an instance upgrade and allows to do some query
analysis on a replica node.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

0001-Make-queryid-more-portable.patchtext/x-patch; charset=UTF-8; name=0001-Make-queryid-more-portable.patchDownload
From 714111f82569ba827d6387ca3e01e5f364a2c8dd Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Tue, 12 Oct 2021 11:53:50 +0500
Subject: [PATCH] Make queryid more portable. 1. Extract local context from a
 JumbleState. 2. Make a hash value for each range table entry. 3. Make a hash
 signature for each subquery. 4. Use hash instead of rti. 5. Sort hashes of
 range table entries before adding to a context.

TODO:
- Use attnames instead of varattno.
- Use sort of argument hashes at each level of expression jumbling.
---
 contrib/pg_stat_statements/Makefile           |   1 +
 .../t/001_queryid_portability.pl              |  61 ++++
 src/backend/utils/adt/regproc.c               |  25 +-
 src/backend/utils/misc/queryjumble.c          | 319 +++++++++++-------
 src/include/utils/queryjumble.h               |  20 +-
 src/include/utils/regproc.h                   |   1 +
 6 files changed, 299 insertions(+), 128 deletions(-)
 create mode 100644 contrib/pg_stat_statements/t/001_queryid_portability.pl

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 7fabd96f38..bef304e7d4 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -17,6 +17,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = pg_stat_statements oldextversions
+TAP_TESTS = 1
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/t/001_queryid_portability.pl b/contrib/pg_stat_statements/t/001_queryid_portability.pl
new file mode 100644
index 0000000000..80f8bb4e93
--- /dev/null
+++ b/contrib/pg_stat_statements/t/001_queryid_portability.pl
@@ -0,0 +1,61 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 3;
+
+my ($node1, $node2, $result1, $result2);
+
+$node1 = PostgresNode->new('node1');
+$node1->init;
+$node1->append_conf('postgresql.conf', qq{
+	shared_preload_libraries = 'pg_stat_statements'
+	pg_stat_statements.track = 'all'
+});
+$node1->start;
+
+$node2 = PostgresNode->new('node2');
+$node2->init;
+$node2->append_conf('postgresql.conf', qq{
+	shared_preload_libraries = 'pg_stat_statements'
+	pg_stat_statements.track = 'all'
+});
+$node2->start;
+$node2->safe_psql('postgres', qq{CREATE TABLE a(); DROP TABLE a;});
+
+$node1->safe_psql('postgres', q(CREATE EXTENSION pg_stat_statements));
+$node2->safe_psql('postgres', q(CREATE EXTENSION pg_stat_statements));
+
+$node1->safe_psql('postgres', "
+	SELECT pg_stat_statements_reset();
+	CREATE TABLE a (x int, y varchar);
+	CREATE TABLE b (x int);
+	SELECT * FROM a;"
+);
+$node2->safe_psql('postgres', "
+	SELECT pg_stat_statements_reset();
+	CREATE TABLE a (y varchar, x int);
+	CREATE TABLE b (x int);
+	SELECT * FROM a;
+");
+
+$result1 = $node1->safe_psql('postgres', "SELECT queryid FROM pg_stat_statements WHERE query LIKE 'SELECT * FROM a';");
+$result2 = $node2->safe_psql('postgres', "SELECT queryid FROM pg_stat_statements WHERE query LIKE 'SELECT * FROM a';");
+is($result1, $result2);
+
+$node1->safe_psql('postgres', "SELECT x FROM a");
+$node2->safe_psql('postgres', "SELECT x FROM a");
+$result1 = $node1->safe_psql('postgres', "SELECT queryid FROM pg_stat_statements WHERE query LIKE 'SELECT x FROM a';");
+$result2 = $node2->safe_psql('postgres', "SELECT queryid FROM pg_stat_statements WHERE query LIKE 'SELECT x FROM a';");
+is(($result1 != $result2), 1); # TODO
+
+$node1->safe_psql('postgres', "SELECT * FROM a,b WHERE a.x = b.x;");
+$node2->safe_psql('postgres', "SELECT * FROM b,a WHERE a.x = b.x;");
+$result1 = $node1->safe_psql('postgres', "SELECT queryid FROM pg_stat_statements WHERE query LIKE 'SELECT * FROM a,b WHERE a.x = b.x;'");
+$result2 = $node2->safe_psql('postgres', "SELECT queryid FROM pg_stat_statements WHERE query LIKE 'SELECT * FROM b,a WHERE a.x = b.x;'");
+diag("$result1, \n $result2");
+is($result1, $result2);
+
+$node1->stop();
+$node2->stop();
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index e4fb9d31d9..c0e4f7cdd3 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -1003,20 +1003,16 @@ to_regclass(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 }
 
-/*
- * regclassout		- converts class OID to "class_name"
- */
-Datum
-regclassout(PG_FUNCTION_ARGS)
+char *
+regclassout_ext(Oid classid, bool forceQualify)
 {
-	Oid			classid = PG_GETARG_OID(0);
 	char	   *result;
 	HeapTuple	classtup;
 
 	if (classid == InvalidOid)
 	{
 		result = pstrdup("-");
-		PG_RETURN_CSTRING(result);
+		return result;
 	}
 
 	classtup = SearchSysCache1(RELOID, ObjectIdGetDatum(classid));
@@ -1040,7 +1036,7 @@ regclassout(PG_FUNCTION_ARGS)
 			/*
 			 * Would this class be found by regclassin? If not, qualify it.
 			 */
-			if (RelationIsVisible(classid))
+			if (!forceQualify && RelationIsVisible(classid))
 				nspname = NULL;
 			else
 				nspname = get_namespace_name(classform->relnamespace);
@@ -1057,6 +1053,19 @@ regclassout(PG_FUNCTION_ARGS)
 		snprintf(result, NAMEDATALEN, "%u", classid);
 	}
 
+	return result;
+}
+
+/*
+ * regclassout		- converts class OID to "class_name"
+ */
+Datum
+regclassout(PG_FUNCTION_ARGS)
+{
+	Oid			classid = PG_GETARG_OID(0);
+	char	   *result;
+
+	result = regclassout_ext(classid, false);
 	PG_RETURN_CSTRING(result);
 }
 
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index 9f2cd1f127..38cae1737c 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -35,6 +35,7 @@
 #include "common/hashfn.h"
 #include "miscadmin.h"
 #include "parser/scansup.h"
+#include "utils/regproc.h"
 #include "utils/queryjumble.h"
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
@@ -46,12 +47,14 @@ int			compute_query_id = COMPUTE_QUERY_ID_AUTO;
 bool		query_id_enabled = false;
 
 static uint64 compute_utility_query_id(const char *str, int query_location, int query_len);
-static void AppendJumble(JumbleState *jstate,
+static void AppendJumble(JumbleContext *ctx,
 						 const unsigned char *item, Size size);
-static void JumbleQueryInternal(JumbleState *jstate, Query *query);
-static void JumbleRangeTable(JumbleState *jstate, List *rtable);
-static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
-static void JumbleExpr(JumbleState *jstate, Node *node);
+static uint64 JumbleQueryInternal(JumbleState *jstate, Query *query);
+static void BuildHashesByRangeTable(JumbleState *jstate, List *rtable,
+							 JumbleContext *ctx);
+static void JumbleRowMarks(JumbleState *jstate, List *rowMarks,
+						   JumbleContext *ctx);
+static void JumbleExpr(JumbleState *jstate, Node *node, JumbleContext *ctx);
 static void RecordConstLocation(JumbleState *jstate, int location);
 
 /*
@@ -97,6 +100,32 @@ CleanQuerytext(const char *query, int *location, int *len)
 	return query;
 }
 
+static void
+free_JumbleContext(JumbleContext * ctx)
+{
+	if (!ctx)
+		return;
+
+	pfree(ctx->jumble);
+	if (ctx->rte_hashes)
+		pfree(ctx->rte_hashes);
+	pfree(ctx);
+}
+
+static JumbleContext *
+make_JumbleContext(void)
+{
+	JumbleContext *ctx = (JumbleContext *) palloc(sizeof(JumbleContext));
+
+	/* Set up workspace for query jumbling */
+	ctx->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+	ctx->jumble_len = 0;
+	ctx->rte_hashes = NULL;
+	ctx->nhashes = -1;
+
+	return ctx;
+}
+
 JumbleState *
 JumbleQuery(Query *query, const char *querytext)
 {
@@ -115,8 +144,6 @@ JumbleQuery(Query *query, const char *querytext)
 		jstate = (JumbleState *) palloc(sizeof(JumbleState));
 
 		/* Set up workspace for query jumbling */
-		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-		jstate->jumble_len = 0;
 		jstate->clocations_buf_size = 32;
 		jstate->clocations = (LocationLen *)
 			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
@@ -124,10 +151,7 @@ JumbleQuery(Query *query, const char *querytext)
 		jstate->highest_extern_param_id = 0;
 
 		/* Compute query ID and mark the Query node with it */
-		JumbleQueryInternal(jstate, query);
-		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
-														  jstate->jumble_len,
-														  0));
+		query->queryId = JumbleQueryInternal(jstate, query);
 
 		/*
 		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
@@ -186,10 +210,10 @@ compute_utility_query_id(const char *query_text, int query_location, int query_l
  * the current jumble.
  */
 static void
-AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+AppendJumble(JumbleContext *ctx, const unsigned char *item, Size size)
 {
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
+	unsigned char *jumble = ctx->jumble;
+	Size		jumble_len = ctx->jumble_len;
 
 	/*
 	 * Whenever the jumble buffer is full, we hash the current contents and
@@ -215,7 +239,7 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 		item += part_size;
 		size -= part_size;
 	}
-	jstate->jumble_len = jumble_len;
+	ctx->jumble_len = jumble_len;
 }
 
 /*
@@ -223,9 +247,25 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
  * of individual local variable elements.
  */
 #define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+	AppendJumble(ctx, (const unsigned char *) &(item), sizeof(item))
 #define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+	AppendJumble(ctx, (const unsigned char *) (str), strlen(str) + 1)
+#define APP_JUMB_QUERY(item) \
+	{ \
+		uint64 hash = JumbleQueryInternal(jstate, item); \
+		AppendJumble(ctx, (const unsigned char *) &(hash), sizeof(uint64)); \
+	}
+
+static int
+uint64_cmp(const void *a, const void *b)
+{
+	if (*(uint64 *) a < *(uint64 *) b)
+		return -1;
+	else if (*(uint64 *) a > *(uint64 *) b)
+		return 1;
+	else
+		return 0;
+}
 
 /*
  * JumbleQueryInternal: Selectively serialize the query tree, appending
@@ -236,41 +276,66 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
  * be deduced from child nodes (else we'd just be double-hashing that piece
  * of information).
  */
-static void
+static uint64
 JumbleQueryInternal(JumbleState *jstate, Query *query)
 {
+	JumbleContext *ctx = make_JumbleContext();
+
 	Assert(IsA(query, Query));
 	Assert(query->utilityStmt == NULL);
 
+	ctx->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+	ctx->jumble_len = 0;
+
+	/* Create an array of source hashes. */
+	BuildHashesByRangeTable(jstate, query->rtable, ctx);
+
 	APP_JUMB(query->commandType);
 	/* resultRelation is usually predictable from commandType */
-	JumbleExpr(jstate, (Node *) query->cteList);
-	JumbleRangeTable(jstate, query->rtable);
-	JumbleExpr(jstate, (Node *) query->jointree);
-	JumbleExpr(jstate, (Node *) query->targetList);
-	JumbleExpr(jstate, (Node *) query->onConflict);
-	JumbleExpr(jstate, (Node *) query->returningList);
-	JumbleExpr(jstate, (Node *) query->groupClause);
+	JumbleExpr(jstate, (Node *) query->cteList, ctx);
+	JumbleExpr(jstate, (Node *) query->jointree, ctx);
+	JumbleExpr(jstate, (Node *) query->targetList, ctx);
+	JumbleExpr(jstate, (Node *) query->onConflict, ctx);
+	JumbleExpr(jstate, (Node *) query->returningList, ctx);
+	JumbleExpr(jstate, (Node *) query->groupClause, ctx);
 	APP_JUMB(query->groupDistinct);
-	JumbleExpr(jstate, (Node *) query->groupingSets);
-	JumbleExpr(jstate, query->havingQual);
-	JumbleExpr(jstate, (Node *) query->windowClause);
-	JumbleExpr(jstate, (Node *) query->distinctClause);
-	JumbleExpr(jstate, (Node *) query->sortClause);
-	JumbleExpr(jstate, query->limitOffset);
-	JumbleExpr(jstate, query->limitCount);
+	JumbleExpr(jstate, (Node *) query->groupingSets, ctx);
+	JumbleExpr(jstate, query->havingQual, ctx);
+	JumbleExpr(jstate, (Node *) query->windowClause, ctx);
+	JumbleExpr(jstate, (Node *) query->distinctClause, ctx);
+	JumbleExpr(jstate, (Node *) query->sortClause, ctx);
+	JumbleExpr(jstate, query->limitOffset, ctx);
+	JumbleExpr(jstate, query->limitCount, ctx);
 	APP_JUMB(query->limitOption);
-	JumbleRowMarks(jstate, query->rowMarks);
-	JumbleExpr(jstate, query->setOperations);
+	JumbleRowMarks(jstate, query->rowMarks, ctx);
+	JumbleExpr(jstate, query->setOperations, ctx);
+
+	if (ctx->rte_hashes != NULL)
+	{
+		Assert(ctx->nhashes > 0);
+		qsort(ctx->rte_hashes, ctx->nhashes, sizeof(uint64), uint64_cmp);
+		AppendJumble(ctx,
+					 (const unsigned char *) ctx->rte_hashes,
+					 ctx->nhashes * sizeof(uint64));
+	}
+
+	query->queryId = DatumGetUInt64(hash_any_extended(ctx->jumble,
+									ctx->jumble_len,
+									0));
+	free_JumbleContext(ctx);
+	return query->queryId;
 }
 
 /*
  * Jumble a range table
  */
 static void
-JumbleRangeTable(JumbleState *jstate, List *rtable)
+BuildHashesByRangeTable(JumbleState *jstate, List *rtable, JumbleContext *origin_ctx)
 {
 	ListCell   *lc;
+	uint64	   *rte_hashes = palloc(list_length(rtable) * sizeof(uint64));
+	int			rti = 0;
+	JumbleContext *ctx = make_JumbleContext();
 
 	foreach(lc, rtable)
 	{
@@ -280,24 +345,28 @@ JumbleRangeTable(JumbleState *jstate, List *rtable)
 		switch (rte->rtekind)
 		{
 			case RTE_RELATION:
-				APP_JUMB(rte->relid);
-				JumbleExpr(jstate, (Node *) rte->tablesample);
+			{
+				char *relname = regclassout_ext(rte->relid, true);
+
+				APP_JUMB_STRING(relname);
+				JumbleExpr(jstate, (Node *) rte->tablesample, ctx);
 				APP_JUMB(rte->inh);
 				break;
+			}
 			case RTE_SUBQUERY:
-				JumbleQueryInternal(jstate, rte->subquery);
+				APP_JUMB_QUERY(rte->subquery);
 				break;
 			case RTE_JOIN:
 				APP_JUMB(rte->jointype);
 				break;
 			case RTE_FUNCTION:
-				JumbleExpr(jstate, (Node *) rte->functions);
+				JumbleExpr(jstate, (Node *) rte->functions, ctx);
 				break;
 			case RTE_TABLEFUNC:
-				JumbleExpr(jstate, (Node *) rte->tablefunc);
+				JumbleExpr(jstate, (Node *) rte->tablefunc, ctx);
 				break;
 			case RTE_VALUES:
-				JumbleExpr(jstate, (Node *) rte->values_lists);
+				JumbleExpr(jstate, (Node *) rte->values_lists, ctx);
 				break;
 			case RTE_CTE:
 
@@ -317,14 +386,34 @@ JumbleRangeTable(JumbleState *jstate, List *rtable)
 				elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
 				break;
 		}
+
+		/* Add signature of an entry. */
+		rte_hashes[rti++] = DatumGetUInt64(hash_any_extended(ctx->jumble,
+										   ctx->jumble_len,
+										   0));
+
+		/* Quick context reset */
+		ctx->jumble_len = 0;
+	}
+
+	/*
+	 * Will use it during the query jambling and add to the signature at the end
+	 * of the process after sorting.
+	 */
+	if (rti > 0)
+	{
+		Assert(origin_ctx->rte_hashes == NULL && origin_ctx->nhashes == -1);
+		origin_ctx->rte_hashes = rte_hashes;
+		origin_ctx->nhashes = rti;
 	}
+	free_JumbleContext(ctx);
 }
 
 /*
  * Jumble a rowMarks list
  */
 static void
-JumbleRowMarks(JumbleState *jstate, List *rowMarks)
+JumbleRowMarks(JumbleState *jstate, List *rowMarks, JumbleContext *ctx)
 {
 	ListCell   *lc;
 
@@ -334,7 +423,8 @@ JumbleRowMarks(JumbleState *jstate, List *rowMarks)
 
 		if (!rowmark->pushedDown)
 		{
-			APP_JUMB(rowmark->rti);
+			Assert(rowmark->rti < ctx->nhashes);
+			APP_JUMB(ctx->rte_hashes[rowmark->rti]);
 			APP_JUMB(rowmark->strength);
 			APP_JUMB(rowmark->waitPolicy);
 		}
@@ -356,7 +446,7 @@ JumbleRowMarks(JumbleState *jstate, List *rowMarks)
  * about any unrecognized node type.
  */
 static void
-JumbleExpr(JumbleState *jstate, Node *node)
+JumbleExpr(JumbleState *jstate, Node *node, JumbleContext *ctx)
 {
 	ListCell   *temp;
 
@@ -377,6 +467,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 		case T_Var:
 			{
 				Var		   *var = (Var *) node;
+//				get_attname(Oid relid, AttrNumber attnum, bool missing_ok)
 
 				APP_JUMB(var->varno);
 				APP_JUMB(var->varattno);
@@ -411,18 +502,18 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				Aggref	   *expr = (Aggref *) node;
 
 				APP_JUMB(expr->aggfnoid);
-				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggorder);
-				JumbleExpr(jstate, (Node *) expr->aggdistinct);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
+				JumbleExpr(jstate, (Node *) expr->aggdirectargs, ctx);
+				JumbleExpr(jstate, (Node *) expr->args, ctx);
+				JumbleExpr(jstate, (Node *) expr->aggorder, ctx);
+				JumbleExpr(jstate, (Node *) expr->aggdistinct, ctx);
+				JumbleExpr(jstate, (Node *) expr->aggfilter, ctx);
 			}
 			break;
 		case T_GroupingFunc:
 			{
 				GroupingFunc *grpnode = (GroupingFunc *) node;
 
-				JumbleExpr(jstate, (Node *) grpnode->refs);
+				JumbleExpr(jstate, (Node *) grpnode->refs, ctx);
 				APP_JUMB(grpnode->agglevelsup);
 			}
 			break;
@@ -432,18 +523,18 @@ JumbleExpr(JumbleState *jstate, Node *node)
 
 				APP_JUMB(expr->winfnoid);
 				APP_JUMB(expr->winref);
-				JumbleExpr(jstate, (Node *) expr->args);
-				JumbleExpr(jstate, (Node *) expr->aggfilter);
+				JumbleExpr(jstate, (Node *) expr->args, ctx);
+				JumbleExpr(jstate, (Node *) expr->aggfilter, ctx);
 			}
 			break;
 		case T_SubscriptingRef:
 			{
 				SubscriptingRef *sbsref = (SubscriptingRef *) node;
 
-				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refexpr);
-				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+				JumbleExpr(jstate, (Node *) sbsref->refupperindexpr, ctx);
+				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr, ctx);
+				JumbleExpr(jstate, (Node *) sbsref->refexpr, ctx);
+				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr, ctx);
 			}
 			break;
 		case T_FuncExpr:
@@ -451,7 +542,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				FuncExpr   *expr = (FuncExpr *) node;
 
 				APP_JUMB(expr->funcid);
-				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->args, ctx);
 			}
 			break;
 		case T_NamedArgExpr:
@@ -459,7 +550,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				NamedArgExpr *nae = (NamedArgExpr *) node;
 
 				APP_JUMB(nae->argnumber);
-				JumbleExpr(jstate, (Node *) nae->arg);
+				JumbleExpr(jstate, (Node *) nae->arg, ctx);
 			}
 			break;
 		case T_OpExpr:
@@ -469,7 +560,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				OpExpr	   *expr = (OpExpr *) node;
 
 				APP_JUMB(expr->opno);
-				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->args, ctx);
 			}
 			break;
 		case T_ScalarArrayOpExpr:
@@ -478,7 +569,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 
 				APP_JUMB(expr->opno);
 				APP_JUMB(expr->useOr);
-				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->args, ctx);
 			}
 			break;
 		case T_BoolExpr:
@@ -486,7 +577,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				BoolExpr   *expr = (BoolExpr *) node;
 
 				APP_JUMB(expr->boolop);
-				JumbleExpr(jstate, (Node *) expr->args);
+				JumbleExpr(jstate, (Node *) expr->args, ctx);
 			}
 			break;
 		case T_SubLink:
@@ -495,8 +586,8 @@ JumbleExpr(JumbleState *jstate, Node *node)
 
 				APP_JUMB(sublink->subLinkType);
 				APP_JUMB(sublink->subLinkId);
-				JumbleExpr(jstate, (Node *) sublink->testexpr);
-				JumbleQueryInternal(jstate, castNode(Query, sublink->subselect));
+				JumbleExpr(jstate, (Node *) sublink->testexpr, ctx);
+				APP_JUMB_QUERY(castNode(Query, sublink->subselect));
 			}
 			break;
 		case T_FieldSelect:
@@ -504,15 +595,15 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				FieldSelect *fs = (FieldSelect *) node;
 
 				APP_JUMB(fs->fieldnum);
-				JumbleExpr(jstate, (Node *) fs->arg);
+				JumbleExpr(jstate, (Node *) fs->arg, ctx);
 			}
 			break;
 		case T_FieldStore:
 			{
 				FieldStore *fstore = (FieldStore *) node;
 
-				JumbleExpr(jstate, (Node *) fstore->arg);
-				JumbleExpr(jstate, (Node *) fstore->newvals);
+				JumbleExpr(jstate, (Node *) fstore->arg, ctx);
+				JumbleExpr(jstate, (Node *) fstore->newvals, ctx);
 			}
 			break;
 		case T_RelabelType:
@@ -520,7 +611,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				RelabelType *rt = (RelabelType *) node;
 
 				APP_JUMB(rt->resulttype);
-				JumbleExpr(jstate, (Node *) rt->arg);
+				JumbleExpr(jstate, (Node *) rt->arg, ctx);
 			}
 			break;
 		case T_CoerceViaIO:
@@ -528,7 +619,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				CoerceViaIO *cio = (CoerceViaIO *) node;
 
 				APP_JUMB(cio->resulttype);
-				JumbleExpr(jstate, (Node *) cio->arg);
+				JumbleExpr(jstate, (Node *) cio->arg, ctx);
 			}
 			break;
 		case T_ArrayCoerceExpr:
@@ -536,8 +627,8 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
 
 				APP_JUMB(acexpr->resulttype);
-				JumbleExpr(jstate, (Node *) acexpr->arg);
-				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
+				JumbleExpr(jstate, (Node *) acexpr->arg, ctx);
+				JumbleExpr(jstate, (Node *) acexpr->elemexpr, ctx);
 			}
 			break;
 		case T_ConvertRowtypeExpr:
@@ -545,7 +636,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
 
 				APP_JUMB(crexpr->resulttype);
-				JumbleExpr(jstate, (Node *) crexpr->arg);
+				JumbleExpr(jstate, (Node *) crexpr->arg, ctx);
 			}
 			break;
 		case T_CollateExpr:
@@ -553,22 +644,22 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				CollateExpr *ce = (CollateExpr *) node;
 
 				APP_JUMB(ce->collOid);
-				JumbleExpr(jstate, (Node *) ce->arg);
+				JumbleExpr(jstate, (Node *) ce->arg, ctx);
 			}
 			break;
 		case T_CaseExpr:
 			{
 				CaseExpr   *caseexpr = (CaseExpr *) node;
 
-				JumbleExpr(jstate, (Node *) caseexpr->arg);
+				JumbleExpr(jstate, (Node *) caseexpr->arg, ctx);
 				foreach(temp, caseexpr->args)
 				{
 					CaseWhen   *when = lfirst_node(CaseWhen, temp);
 
-					JumbleExpr(jstate, (Node *) when->expr);
-					JumbleExpr(jstate, (Node *) when->result);
+					JumbleExpr(jstate, (Node *) when->expr, ctx);
+					JumbleExpr(jstate, (Node *) when->result, ctx);
 				}
-				JumbleExpr(jstate, (Node *) caseexpr->defresult);
+				JumbleExpr(jstate, (Node *) caseexpr->defresult, ctx);
 			}
 			break;
 		case T_CaseTestExpr:
@@ -579,29 +670,29 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			}
 			break;
 		case T_ArrayExpr:
-			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements);
+			JumbleExpr(jstate, (Node *) ((ArrayExpr *) node)->elements, ctx);
 			break;
 		case T_RowExpr:
-			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args);
+			JumbleExpr(jstate, (Node *) ((RowExpr *) node)->args, ctx);
 			break;
 		case T_RowCompareExpr:
 			{
 				RowCompareExpr *rcexpr = (RowCompareExpr *) node;
 
 				APP_JUMB(rcexpr->rctype);
-				JumbleExpr(jstate, (Node *) rcexpr->largs);
-				JumbleExpr(jstate, (Node *) rcexpr->rargs);
+				JumbleExpr(jstate, (Node *) rcexpr->largs, ctx);
+				JumbleExpr(jstate, (Node *) rcexpr->rargs, ctx);
 			}
 			break;
 		case T_CoalesceExpr:
-			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args);
+			JumbleExpr(jstate, (Node *) ((CoalesceExpr *) node)->args, ctx);
 			break;
 		case T_MinMaxExpr:
 			{
 				MinMaxExpr *mmexpr = (MinMaxExpr *) node;
 
 				APP_JUMB(mmexpr->op);
-				JumbleExpr(jstate, (Node *) mmexpr->args);
+				JumbleExpr(jstate, (Node *) mmexpr->args, ctx);
 			}
 			break;
 		case T_SQLValueFunction:
@@ -618,8 +709,8 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				XmlExpr    *xexpr = (XmlExpr *) node;
 
 				APP_JUMB(xexpr->op);
-				JumbleExpr(jstate, (Node *) xexpr->named_args);
-				JumbleExpr(jstate, (Node *) xexpr->args);
+				JumbleExpr(jstate, (Node *) xexpr->named_args, ctx);
+				JumbleExpr(jstate, (Node *) xexpr->args, ctx);
 			}
 			break;
 		case T_NullTest:
@@ -627,7 +718,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				NullTest   *nt = (NullTest *) node;
 
 				APP_JUMB(nt->nulltesttype);
-				JumbleExpr(jstate, (Node *) nt->arg);
+				JumbleExpr(jstate, (Node *) nt->arg, ctx);
 			}
 			break;
 		case T_BooleanTest:
@@ -635,7 +726,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				BooleanTest *bt = (BooleanTest *) node;
 
 				APP_JUMB(bt->booltesttype);
-				JumbleExpr(jstate, (Node *) bt->arg);
+				JumbleExpr(jstate, (Node *) bt->arg, ctx);
 			}
 			break;
 		case T_CoerceToDomain:
@@ -643,7 +734,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				CoerceToDomain *cd = (CoerceToDomain *) node;
 
 				APP_JUMB(cd->resulttype);
-				JumbleExpr(jstate, (Node *) cd->arg);
+				JumbleExpr(jstate, (Node *) cd->arg, ctx);
 			}
 			break;
 		case T_CoerceToDomainValue:
@@ -684,7 +775,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 
 				APP_JUMB(ie->infercollid);
 				APP_JUMB(ie->inferopclass);
-				JumbleExpr(jstate, ie->expr);
+				JumbleExpr(jstate, ie->expr, ctx);
 			}
 			break;
 		case T_TargetEntry:
@@ -693,7 +784,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 
 				APP_JUMB(tle->resno);
 				APP_JUMB(tle->ressortgroupref);
-				JumbleExpr(jstate, (Node *) tle->expr);
+				JumbleExpr(jstate, (Node *) tle->expr, ctx);
 			}
 			break;
 		case T_RangeTblRef:
@@ -710,17 +801,17 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				APP_JUMB(join->jointype);
 				APP_JUMB(join->isNatural);
 				APP_JUMB(join->rtindex);
-				JumbleExpr(jstate, join->larg);
-				JumbleExpr(jstate, join->rarg);
-				JumbleExpr(jstate, join->quals);
+				JumbleExpr(jstate, join->larg, ctx);
+				JumbleExpr(jstate, join->rarg, ctx);
+				JumbleExpr(jstate, join->quals, ctx);
 			}
 			break;
 		case T_FromExpr:
 			{
 				FromExpr   *from = (FromExpr *) node;
 
-				JumbleExpr(jstate, (Node *) from->fromlist);
-				JumbleExpr(jstate, from->quals);
+				JumbleExpr(jstate, (Node *) from->fromlist, ctx);
+				JumbleExpr(jstate, from->quals, ctx);
 			}
 			break;
 		case T_OnConflictExpr:
@@ -728,19 +819,19 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				OnConflictExpr *conf = (OnConflictExpr *) node;
 
 				APP_JUMB(conf->action);
-				JumbleExpr(jstate, (Node *) conf->arbiterElems);
-				JumbleExpr(jstate, conf->arbiterWhere);
-				JumbleExpr(jstate, (Node *) conf->onConflictSet);
-				JumbleExpr(jstate, conf->onConflictWhere);
+				JumbleExpr(jstate, (Node *) conf->arbiterElems, ctx);
+				JumbleExpr(jstate, conf->arbiterWhere, ctx);
+				JumbleExpr(jstate, (Node *) conf->onConflictSet, ctx);
+				JumbleExpr(jstate, conf->onConflictWhere, ctx);
 				APP_JUMB(conf->constraint);
 				APP_JUMB(conf->exclRelIndex);
-				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
+				JumbleExpr(jstate, (Node *) conf->exclRelTlist, ctx);
 			}
 			break;
 		case T_List:
 			foreach(temp, (List *) node)
 			{
-				JumbleExpr(jstate, (Node *) lfirst(temp));
+				JumbleExpr(jstate, (Node *) lfirst(temp), ctx);
 			}
 			break;
 		case T_IntList:
@@ -763,7 +854,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				GroupingSet *gsnode = (GroupingSet *) node;
 
-				JumbleExpr(jstate, (Node *) gsnode->content);
+				JumbleExpr(jstate, (Node *) gsnode->content, ctx);
 			}
 			break;
 		case T_WindowClause:
@@ -772,10 +863,10 @@ JumbleExpr(JumbleState *jstate, Node *node)
 
 				APP_JUMB(wc->winref);
 				APP_JUMB(wc->frameOptions);
-				JumbleExpr(jstate, (Node *) wc->partitionClause);
-				JumbleExpr(jstate, (Node *) wc->orderClause);
-				JumbleExpr(jstate, wc->startOffset);
-				JumbleExpr(jstate, wc->endOffset);
+				JumbleExpr(jstate, (Node *) wc->partitionClause, ctx);
+				JumbleExpr(jstate, (Node *) wc->orderClause, ctx);
+				JumbleExpr(jstate, wc->startOffset, ctx);
+				JumbleExpr(jstate, wc->endOffset, ctx);
 			}
 			break;
 		case T_CommonTableExpr:
@@ -785,7 +876,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				/* we store the string name because RTE_CTE RTEs need it */
 				APP_JUMB_STRING(cte->ctename);
 				APP_JUMB(cte->ctematerialized);
-				JumbleQueryInternal(jstate, castNode(Query, cte->ctequery));
+				APP_JUMB_QUERY(castNode(Query, cte->ctequery));
 			}
 			break;
 		case T_SetOperationStmt:
@@ -794,24 +885,24 @@ JumbleExpr(JumbleState *jstate, Node *node)
 
 				APP_JUMB(setop->op);
 				APP_JUMB(setop->all);
-				JumbleExpr(jstate, setop->larg);
-				JumbleExpr(jstate, setop->rarg);
+				JumbleExpr(jstate, setop->larg, ctx);
+				JumbleExpr(jstate, setop->rarg, ctx);
 			}
 			break;
 		case T_RangeTblFunction:
 			{
 				RangeTblFunction *rtfunc = (RangeTblFunction *) node;
 
-				JumbleExpr(jstate, rtfunc->funcexpr);
+				JumbleExpr(jstate, rtfunc->funcexpr, ctx);
 			}
 			break;
 		case T_TableFunc:
 			{
 				TableFunc  *tablefunc = (TableFunc *) node;
 
-				JumbleExpr(jstate, tablefunc->docexpr);
-				JumbleExpr(jstate, tablefunc->rowexpr);
-				JumbleExpr(jstate, (Node *) tablefunc->colexprs);
+				JumbleExpr(jstate, tablefunc->docexpr, ctx);
+				JumbleExpr(jstate, tablefunc->rowexpr, ctx);
+				JumbleExpr(jstate, (Node *) tablefunc->colexprs, ctx);
 			}
 			break;
 		case T_TableSampleClause:
@@ -819,8 +910,8 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				TableSampleClause *tsc = (TableSampleClause *) node;
 
 				APP_JUMB(tsc->tsmhandler);
-				JumbleExpr(jstate, (Node *) tsc->args);
-				JumbleExpr(jstate, (Node *) tsc->repeatable);
+				JumbleExpr(jstate, (Node *) tsc->args, ctx);
+				JumbleExpr(jstate, (Node *) tsc->repeatable, ctx);
 			}
 			break;
 		default:
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 7af6652f3e..0a05fccdba 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -33,12 +33,6 @@ typedef struct LocationLen
  */
 typedef struct JumbleState
 {
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
 	/* Array of locations of constants that should be removed */
 	LocationLen *clocations;
 
@@ -50,8 +44,22 @@ typedef struct JumbleState
 
 	/* highest Param id we've seen, in order to start normalization correctly */
 	int			highest_extern_param_id;
+
 } JumbleState;
 
+typedef struct JumbleContext
+{
+	/* Jumble of current query tree */
+	unsigned char *jumble;
+
+	/* Number of bytes used in jumble[] */
+	Size		jumble_len;
+
+	/* Array of a range table entry hashes */
+	uint64	   *rte_hashes;
+	int			nhashes;
+} JumbleContext;
+
 /* Values for the compute_query_id GUC */
 enum ComputeQueryIdType
 {
diff --git a/src/include/utils/regproc.h b/src/include/utils/regproc.h
index 308a7faaa4..70f83c1961 100644
--- a/src/include/utils/regproc.h
+++ b/src/include/utils/regproc.h
@@ -35,5 +35,6 @@ extern char *format_operator(Oid operator_oid);
 extern char *format_operator_qualified(Oid operator_oid);
 extern void format_operator_parts(Oid operator_oid, List **objnames,
 								  List **objargs, bool missing_ok);
+extern char *regclassout_ext(Oid classid, bool forceQualify);
 
 #endif
-- 
2.25.1

#2Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrey V. Lepikhov (#1)
Re: Make query ID more portable

Hi,

On Tue, Oct 12, 2021 at 4:12 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

QueryID is good tool for query analysis. I want to improve core jumbling
machinery in two ways:
1. QueryID value should survive dump/restore of a database (use fully
qualified name of table instead of relid).
2. QueryID could represent more general class of queries: for example,
it can be independent from permutation of tables in a FROM clause.

See the patch in attachment as an POC. Main idea here is to break
JumbleState down to a 'clocations' part that can be really interested in
a post parse hook and a 'context data', that needed to build query or
subquery signature (hash) and, I guess, isn't really needed in any
extensions.

There have been quite a lot of threads about that in the past, and
almost every time people wanted to change how the hash was computed.
So it seems to me that extensions would actually be quite interested
in that. This is even more the case now that an extension can be used
to replace the queryid calculation only and keep the rest of the
extension relying on it as is.

I think, it adds not much complexity and overhead.

I think the biggest change in your patch is:

  case RTE_RELATION:
- APP_JUMB(rte->relid);
- JumbleExpr(jstate, (Node *) rte->tablesample);
+ {
+ char *relname = regclassout_ext(rte->relid, true);
+
+ APP_JUMB_STRING(relname);
+ JumbleExpr(jstate, (Node *) rte->tablesample, ctx);
  APP_JUMB(rte->inh);
  break;

Have you done any benchmark on OLTP workload? Adding catalog access
there is likely to add significant overhead.

Also, why only using the fully qualified relation name for stable
hashes? At least operators and functions should also be treated the
same way. If you do that you will probably have way too much overhead
to be usable in a busy production environment. Why not using the new
possibility of 3rd party extension for the queryid calculation that
exactly suits your need?

#3Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Julien Rouhaud (#2)
Re: Make query ID more portable

On 12/10/21 13:35, Julien Rouhaud wrote:

Hi,

On Tue, Oct 12, 2021 at 4:12 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

See the patch in attachment as an POC. Main idea here is to break
JumbleState down to a 'clocations' part that can be really interested in
a post parse hook and a 'context data', that needed to build query or
subquery signature (hash) and, I guess, isn't really needed in any
extensions.

There have been quite a lot of threads about that in the past, and
almost every time people wanted to change how the hash was computed.
So it seems to me that extensions would actually be quite interested
in that. This is even more the case now that an extension can be used
to replace the queryid calculation only and keep the rest of the
extension relying on it as is.

Yes, I know. I have been using such self-made queryID for four years.
And I will use it further.
But core jumbling code is good, fast and much easier in support. The
purpose of this work is extending of jumbling to use in more flexible
way to avoid meaningless copying of this code to an extension.

I think, it adds not much complexity and overhead.

I think the biggest change in your patch is:

case RTE_RELATION:
- APP_JUMB(rte->relid);
- JumbleExpr(jstate, (Node *) rte->tablesample);
+ {
+ char *relname = regclassout_ext(rte->relid, true);
+
+ APP_JUMB_STRING(relname);
+ JumbleExpr(jstate, (Node *) rte->tablesample, ctx);
APP_JUMB(rte->inh);
break;

Have you done any benchmark on OLTP workload? Adding catalog access
there is likely to add significant overhead.

Yes, I should do benchmarking. But I guess, main goal of Query ID is
monitoring, that can be switched off, if necessary.
This part made for a demo. It can be replaced by a hook, for example.

Also, why only using the fully qualified relation name for stable
hashes? At least operators and functions should also be treated the
same way. If you do that you will probably have way too much overhead
to be usable in a busy production environment. Why not using the new
possibility of 3rd party extension for the queryid calculation that
exactly suits your need?

I fully agree with these arguments. This code is POC. Main part here is
breaking down JumbleState, using a local context for subqueries and
sorting of a range table entries hashes.
I think, we can call one routine (APP_JUMB_OBJECT(), as an example) for
all oids in this code. It would allow an extension to intercept this
call and replace oid with an arbitrary value.

--
regards,
Andrey Lepikhov
Postgres Professional

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrey Lepikhov (#3)
Re: Make query ID more portable

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

But core jumbling code is good, fast and much easier in support.

It won't be fast once you stick a bunch of catalog lookups into it.
I think this is fine as an extension, but it has no chance of being
accepted in core, just on performance grounds.

(I'm also not sure that the query ID calculation code is always/only
invoked in contexts where it's safe to do catalog accesses.)

A bigger issue is that query ID stability isn't something we are going
to promise on a large scale --- for example, what if a new release adds
some new fields to struct Query? So I'm not sure that "query IDs should
survive dump/reload" is a useful goal to consider. It's certainly not
something that could be reached by anything even remotely like the
existing code.

regards, tom lane

#5Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#4)
Re: Make query ID more portable

On Tue, Oct 12, 2021 at 09:40:47AM -0400, Tom Lane wrote:

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

But core jumbling code is good, fast and much easier in support.

It won't be fast once you stick a bunch of catalog lookups into it.
I think this is fine as an extension, but it has no chance of being
accepted in core, just on performance grounds.

(I'm also not sure that the query ID calculation code is always/only
invoked in contexts where it's safe to do catalog accesses.)

A bigger issue is that query ID stability isn't something we are going
to promise on a large scale --- for example, what if a new release adds
some new fields to struct Query? So I'm not sure that "query IDs should
survive dump/reload" is a useful goal to consider. It's certainly not
something that could be reached by anything even remotely like the
existing code.

Also, the current code handles renames of schemas and objects, but this
would not.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#6Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Bruce Momjian (#5)
Re: Make query ID more portable

On 12/10/21 18:45, Bruce Momjian wrote:

On Tue, Oct 12, 2021 at 09:40:47AM -0400, Tom Lane wrote:

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

But core jumbling code is good, fast and much easier in support.

Also, the current code handles renames of schemas and objects, but this
would not.

Yes, It is good option if an extension works only in the context of one
node. But my efforts are directed to the cross-instance usage of a
monitoring data. As an example, it may be useful for sharding.
Also, I guess for an user is essential to think that if he changed a
name of any object he also would change queries and reset monitoring
data, related on this object.

--
regards,
Andrey Lepikhov
Postgres Professional

#7Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrey Lepikhov (#6)
Re: Make query ID more portable

On Thu, Oct 14, 2021 at 12:37 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 12/10/21 18:45, Bruce Momjian wrote:

On Tue, Oct 12, 2021 at 09:40:47AM -0400, Tom Lane wrote:

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

But core jumbling code is good, fast and much easier in support.

Also, the current code handles renames of schemas and objects, but this
would not.

Yes, It is good option if an extension works only in the context of one
node. But my efforts are directed to the cross-instance usage of a
monitoring data. As an example, it may be useful for sharding.
Also, I guess for an user is essential to think that if he changed a
name of any object he also would change queries and reset monitoring
data, related on this object.

What if someone wants to allow any form of partitioning without
changing to the ID, or ignore the schema because it's a multi tenant
db with dedicated roles?

I think that there are just too many arbitrary decisions that could be
made on what exactly should be a query identifier to have a single
in-core implementation. If you do sharding, you already have to
properly configure each node, so configuring your custom query id
extension shouldn't be a big problem.

#8Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Tom Lane (#4)
1 attachment(s)
Re: Make query ID more portable

On 12/10/21 18:40, Tom Lane wrote:

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

But core jumbling code is good, fast and much easier in support.

A bigger issue is that query ID stability isn't something we are going
to promise on a large scale --- for example, what if a new release adds
some new fields to struct Query? So I'm not sure that "query IDs should
survive dump/reload" is a useful goal to consider. It's certainly not
something that could be reached by anything even remotely like the
existing code.

Thank you for the explanation.
I think the problem of queryId is that is encapsulates two different
meanings:
1. It allows an extension to match an query on post parse and execution
stages. In this sense, queryId should be as unique as possible for each
query.
2. For pg_stat_statements purposes (and my project too) it represents an
query class and should be stable against permutations of range table
entries, clauses, e.t.c. For example:
"SELECT * FROM a,b;" and "SELECT * FROM b,a;" should have the same queryId.

This issue may be solved in an extension with next approach:
1. Force as unique value for queryId as extension wants in a post parse hook
2. Generalize the JumbleQuery routine code to generate a kind of query
class signature.

The attached patch is a first sketch for such change.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

0001-Add-callback-for-jumbling-of-an-OID-value.patchtext/plain; charset=UTF-8; name=0001-Add-callback-for-jumbling-of-an-OID-value.patchDownload
From ef4033935b25c90fd8c5b6f10a0646a7ede385e3 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Wed, 13 Oct 2021 10:22:53 +0500
Subject: [PATCH] Add callback for jumbling of an OID value.

Query jumbling machinery depends on OIDs of objects. It is fine for
specific instance, because allows to survive an object rename. But it
is bad for a cross instance case. Even if you have the same code
version, the same schema, you can't guarantee stability of oids.

So, an extension can't rely on this technique

This patch changes an interface of the JumbleQuery routine to allow
an extension to use it for generation of an query signature and
use specific algorithm for oids jumbling.
---
 src/backend/commands/explain.c       |  6 +--
 src/backend/parser/analyze.c         | 12 ++---
 src/backend/tcop/postgres.c          |  6 +--
 src/backend/utils/misc/queryjumble.c | 81 ++++++++++++++--------------
 src/include/utils/queryjumble.h      | 14 ++++-
 5 files changed, 67 insertions(+), 52 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 10644dfac4..b2f10e828e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -166,7 +166,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
-	JumbleState *jstate = NULL;
+	JumbleState jstate;
 	Query	   *query;
 	List	   *rewritten;
 	ListCell   *lc;
@@ -246,10 +246,10 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 
 	query = castNode(Query, stmt->query);
 	if (IsQueryIdEnabled())
-		jstate = JumbleQuery(query, pstate->p_sourcetext);
+		query->queryId = JumbleQuery(query, pstate->p_sourcetext, NULL, &jstate);
 
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query, jstate);
+		(*post_parse_analyze_hook) (pstate, query, &jstate);
 
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 146ee8dd1e..f9c3991a94 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -113,7 +113,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
-	JumbleState *jstate = NULL;
+	JumbleState jstate;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -127,10 +127,10 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 	query = transformTopLevelStmt(pstate, parseTree);
 
 	if (IsQueryIdEnabled())
-		jstate = JumbleQuery(query, sourceText);
+		query->queryId = JumbleQuery(query, sourceText, NULL, &jstate);
 
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query, jstate);
+		(*post_parse_analyze_hook) (pstate, query, &jstate);
 
 	free_parsestate(pstate);
 
@@ -152,7 +152,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
 	ParseState *pstate = make_parsestate(NULL);
 	Query	   *query;
-	JumbleState *jstate = NULL;
+	JumbleState jstate;
 
 	Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -166,10 +166,10 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	check_variable_parameters(pstate, query);
 
 	if (IsQueryIdEnabled())
-		jstate = JumbleQuery(query, sourceText);
+		query->queryId = JumbleQuery(query, sourceText, NULL, &jstate);
 
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query, jstate);
+		(*post_parse_analyze_hook) (pstate, query, &jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0775abe35d..7dbffa409e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -685,7 +685,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	ParseState *pstate;
 	Query	   *query;
 	List	   *querytree_list;
-	JumbleState *jstate = NULL;
+	JumbleState jstate;
 
 	Assert(query_string != NULL);	/* required as of 8.4 */
 
@@ -705,10 +705,10 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 	query = transformTopLevelStmt(pstate, parsetree);
 
 	if (IsQueryIdEnabled())
-		jstate = JumbleQuery(query, query_string);
+		query->queryId = JumbleQuery(query, query_string, NULL, &jstate);
 
 	if (post_parse_analyze_hook)
-		(*post_parse_analyze_hook) (pstate, query, jstate);
+		(*post_parse_analyze_hook) (pstate, query, &jstate);
 
 	free_parsestate(pstate);
 
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index 9f2cd1f127..4b6ef897aa 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -36,6 +36,7 @@
 #include "miscadmin.h"
 #include "parser/scansup.h"
 #include "utils/queryjumble.h"
+#include "utils/syscache.h"
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
@@ -46,8 +47,6 @@ int			compute_query_id = COMPUTE_QUERY_ID_AUTO;
 bool		query_id_enabled = false;
 
 static uint64 compute_utility_query_id(const char *str, int query_location, int query_len);
-static void AppendJumble(JumbleState *jstate,
-						 const unsigned char *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
 static void JumbleRangeTable(JumbleState *jstate, List *rtable);
 static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
@@ -97,23 +96,21 @@ CleanQuerytext(const char *query, int *location, int *len)
 	return query;
 }
 
-JumbleState *
-JumbleQuery(Query *query, const char *querytext)
+uint64
+JumbleQuery(Query *query, const char *querytext,
+			ObjectJumbleCallbackFn callback, JumbleState *jstate)
 {
-	JumbleState *jstate = NULL;
-
+	uint64 query_hash;
 	Assert(IsQueryIdEnabled());
 
 	if (query->utilityStmt)
 	{
-		query->queryId = compute_utility_query_id(querytext,
+		query_hash = compute_utility_query_id(querytext,
 												  query->stmt_location,
 												  query->stmt_len);
 	}
 	else
 	{
-		jstate = (JumbleState *) palloc(sizeof(JumbleState));
-
 		/* Set up workspace for query jumbling */
 		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
 		jstate->jumble_len = 0;
@@ -122,10 +119,11 @@ JumbleQuery(Query *query, const char *querytext)
 			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
 		jstate->clocations_count = 0;
 		jstate->highest_extern_param_id = 0;
+		jstate->oid_jumble_callback = callback;
 
 		/* Compute query ID and mark the Query node with it */
 		JumbleQueryInternal(jstate, query);
-		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+		query_hash = DatumGetUInt64(hash_any_extended(jstate->jumble,
 														  jstate->jumble_len,
 														  0));
 
@@ -133,11 +131,11 @@ JumbleQuery(Query *query, const char *querytext)
 		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
 		 * prevent confusion with the utility-statement case.
 		 */
-		if (query->queryId == UINT64CONST(0))
-			query->queryId = UINT64CONST(1);
+		if (query_hash == UINT64CONST(0))
+			query_hash = UINT64CONST(1);
 	}
 
-	return jstate;
+	return query_hash;
 }
 
 /*
@@ -185,7 +183,7 @@ compute_utility_query_id(const char *query_text, int query_location, int query_l
  * AppendJumble: Append a value that is substantive in a given query to
  * the current jumble.
  */
-static void
+void
 AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 {
 	unsigned char *jumble = jstate->jumble;
@@ -226,6 +224,11 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
 #define APP_JUMB_STRING(str) \
 	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+#define APP_JUMB_OBJECT(cacheId, oid) \
+	if (jstate->oid_jumble_callback) \
+		(jstate->oid_jumble_callback)(jstate, cacheId, oid); \
+	else \
+		AppendJumble(jstate, (const unsigned char *) &(oid), sizeof(Oid));
 
 /*
  * JumbleQueryInternal: Selectively serialize the query tree, appending
@@ -280,7 +283,7 @@ JumbleRangeTable(JumbleState *jstate, List *rtable)
 		switch (rte->rtekind)
 		{
 			case RTE_RELATION:
-				APP_JUMB(rte->relid);
+				APP_JUMB_OBJECT(RELOID, rte->relid);
 				JumbleExpr(jstate, (Node *) rte->tablesample);
 				APP_JUMB(rte->inh);
 				break;
@@ -388,7 +391,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				Const	   *c = (Const *) node;
 
 				/* We jumble only the constant's type, not its value */
-				APP_JUMB(c->consttype);
+				APP_JUMB_OBJECT(TYPEOID, c->consttype);
 				/* Also, record its parse location for query normalization */
 				RecordConstLocation(jstate, c->location);
 			}
@@ -399,7 +402,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 
 				APP_JUMB(p->paramkind);
 				APP_JUMB(p->paramid);
-				APP_JUMB(p->paramtype);
+				APP_JUMB_OBJECT(TYPEOID, p->paramtype);
 				/* Also, track the highest external Param id */
 				if (p->paramkind == PARAM_EXTERN &&
 					p->paramid > jstate->highest_extern_param_id)
@@ -410,7 +413,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				Aggref	   *expr = (Aggref *) node;
 
-				APP_JUMB(expr->aggfnoid);
+				APP_JUMB_OBJECT(AGGFNOID, expr->aggfnoid);
 				JumbleExpr(jstate, (Node *) expr->aggdirectargs);
 				JumbleExpr(jstate, (Node *) expr->args);
 				JumbleExpr(jstate, (Node *) expr->aggorder);
@@ -430,7 +433,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				WindowFunc *expr = (WindowFunc *) node;
 
-				APP_JUMB(expr->winfnoid);
+				APP_JUMB_OBJECT(PROCOID, expr->winfnoid);
 				APP_JUMB(expr->winref);
 				JumbleExpr(jstate, (Node *) expr->args);
 				JumbleExpr(jstate, (Node *) expr->aggfilter);
@@ -450,7 +453,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				FuncExpr   *expr = (FuncExpr *) node;
 
-				APP_JUMB(expr->funcid);
+				APP_JUMB_OBJECT(PROCOID, expr->funcid);
 				JumbleExpr(jstate, (Node *) expr->args);
 			}
 			break;
@@ -468,7 +471,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				OpExpr	   *expr = (OpExpr *) node;
 
-				APP_JUMB(expr->opno);
+				APP_JUMB_OBJECT(OPEROID, expr->opno);
 				JumbleExpr(jstate, (Node *) expr->args);
 			}
 			break;
@@ -476,7 +479,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node;
 
-				APP_JUMB(expr->opno);
+				APP_JUMB_OBJECT(OPEROID, expr->opno);
 				APP_JUMB(expr->useOr);
 				JumbleExpr(jstate, (Node *) expr->args);
 			}
@@ -519,7 +522,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				RelabelType *rt = (RelabelType *) node;
 
-				APP_JUMB(rt->resulttype);
+				APP_JUMB_OBJECT(TYPEOID, rt->resulttype);
 				JumbleExpr(jstate, (Node *) rt->arg);
 			}
 			break;
@@ -527,7 +530,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				CoerceViaIO *cio = (CoerceViaIO *) node;
 
-				APP_JUMB(cio->resulttype);
+				APP_JUMB_OBJECT(TYPEOID, cio->resulttype);
 				JumbleExpr(jstate, (Node *) cio->arg);
 			}
 			break;
@@ -535,7 +538,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				ArrayCoerceExpr *acexpr = (ArrayCoerceExpr *) node;
 
-				APP_JUMB(acexpr->resulttype);
+				APP_JUMB_OBJECT(TYPEOID, acexpr->resulttype);
 				JumbleExpr(jstate, (Node *) acexpr->arg);
 				JumbleExpr(jstate, (Node *) acexpr->elemexpr);
 			}
@@ -544,7 +547,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				ConvertRowtypeExpr *crexpr = (ConvertRowtypeExpr *) node;
 
-				APP_JUMB(crexpr->resulttype);
+				APP_JUMB_OBJECT(TYPEOID, crexpr->resulttype);
 				JumbleExpr(jstate, (Node *) crexpr->arg);
 			}
 			break;
@@ -552,7 +555,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				CollateExpr *ce = (CollateExpr *) node;
 
-				APP_JUMB(ce->collOid);
+				APP_JUMB_OBJECT(COLLOID, ce->collOid);
 				JumbleExpr(jstate, (Node *) ce->arg);
 			}
 			break;
@@ -575,7 +578,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				CaseTestExpr *ct = (CaseTestExpr *) node;
 
-				APP_JUMB(ct->typeId);
+				APP_JUMB_OBJECT(TYPEOID, ct->typeId);
 			}
 			break;
 		case T_ArrayExpr:
@@ -642,7 +645,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				CoerceToDomain *cd = (CoerceToDomain *) node;
 
-				APP_JUMB(cd->resulttype);
+				APP_JUMB_OBJECT(TYPEOID, cd->resulttype);
 				JumbleExpr(jstate, (Node *) cd->arg);
 			}
 			break;
@@ -650,14 +653,14 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				CoerceToDomainValue *cdv = (CoerceToDomainValue *) node;
 
-				APP_JUMB(cdv->typeId);
+				APP_JUMB_OBJECT(TYPEOID, cdv->typeId);
 			}
 			break;
 		case T_SetToDefault:
 			{
 				SetToDefault *sd = (SetToDefault *) node;
 
-				APP_JUMB(sd->typeId);
+				APP_JUMB_OBJECT(TYPEOID, sd->typeId);
 			}
 			break;
 		case T_CurrentOfExpr:
@@ -674,16 +677,16 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				NextValueExpr *nve = (NextValueExpr *) node;
 
-				APP_JUMB(nve->seqid);
-				APP_JUMB(nve->typeId);
+				APP_JUMB_OBJECT(SEQRELID, nve->seqid);
+				APP_JUMB_OBJECT(TYPEOID, nve->typeId);
 			}
 			break;
 		case T_InferenceElem:
 			{
 				InferenceElem *ie = (InferenceElem *) node;
 
-				APP_JUMB(ie->infercollid);
-				APP_JUMB(ie->inferopclass);
+				APP_JUMB_OBJECT(COLLOID, ie->infercollid);
+				APP_JUMB_OBJECT(CLAOID, ie->inferopclass);
 				JumbleExpr(jstate, ie->expr);
 			}
 			break;
@@ -732,7 +735,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				JumbleExpr(jstate, conf->arbiterWhere);
 				JumbleExpr(jstate, (Node *) conf->onConflictSet);
 				JumbleExpr(jstate, conf->onConflictWhere);
-				APP_JUMB(conf->constraint);
+				APP_JUMB_OBJECT(CONSTROID, conf->constraint);
 				APP_JUMB(conf->exclRelIndex);
 				JumbleExpr(jstate, (Node *) conf->exclRelTlist);
 			}
@@ -754,8 +757,8 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				SortGroupClause *sgc = (SortGroupClause *) node;
 
 				APP_JUMB(sgc->tleSortGroupRef);
-				APP_JUMB(sgc->eqop);
-				APP_JUMB(sgc->sortop);
+				APP_JUMB_OBJECT(OPEROID, sgc->eqop);
+				APP_JUMB_OBJECT(OPEROID, sgc->sortop);
 				APP_JUMB(sgc->nulls_first);
 			}
 			break;
@@ -818,7 +821,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 			{
 				TableSampleClause *tsc = (TableSampleClause *) node;
 
-				APP_JUMB(tsc->tsmhandler);
+				APP_JUMB_OBJECT(PROCOID, tsc->tsmhandler);
 				JumbleExpr(jstate, (Node *) tsc->args);
 				JumbleExpr(jstate, (Node *) tsc->repeatable);
 			}
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 7af6652f3e..3f70f5d2fb 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -27,6 +27,12 @@ typedef struct LocationLen
 	int			length;			/* length in bytes, or -1 to ignore */
 } LocationLen;
 
+struct JumbleState;
+
+typedef void (*ObjectJumbleCallbackFn) (struct JumbleState *jstate,
+										int cacheId,
+										Oid oid);
+
 /*
  * Working state for computing a query jumble and producing a normalized
  * query string
@@ -50,6 +56,8 @@ typedef struct JumbleState
 
 	/* highest Param id we've seen, in order to start normalization correctly */
 	int			highest_extern_param_id;
+
+	ObjectJumbleCallbackFn oid_jumble_callback;
 } JumbleState;
 
 /* Values for the compute_query_id GUC */
@@ -65,7 +73,11 @@ extern int	compute_query_id;
 
 
 extern const char *CleanQuerytext(const char *query, int *location, int *len);
-extern JumbleState *JumbleQuery(Query *query, const char *querytext);
+extern uint64 JumbleQuery(Query *query, const char *querytext,
+						  ObjectJumbleCallbackFn callback,
+						  JumbleState *jstate);
+extern void AppendJumble(JumbleState *jstate, const unsigned char *item,
+						 Size size);
 extern void EnableQueryId(void);
 
 extern bool query_id_enabled;
-- 
2.33.0

#9Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Julien Rouhaud (#7)
Re: Make query ID more portable

On 14/10/21 10:40, Julien Rouhaud wrote:

On Thu, Oct 14, 2021 at 12:37 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 12/10/21 18:45, Bruce Momjian wrote:

On Tue, Oct 12, 2021 at 09:40:47AM -0400, Tom Lane wrote:

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

I think that there are just too many arbitrary decisions that could be
made on what exactly should be a query identifier to have a single
in-core implementation.

Yes, and I use such custom decision too. But core jumbling code
implements good idea and can be generalized for reuse. Patch from
previous letter and breaking down of JumbleState can allow coders to
implement their codes based on queryjumble.c module with smaller changes.

If you do sharding, you already have to
properly configure each node, so configuring your custom query id
extension shouldn't be a big problem.

My project is about adaptive query optimization techniques. It is not
obvious how to match (without a field in Query struct) a post parse and
an execution phases because of nested queries.
Also, if we use queryId in an extension, we interfere with
pg_stat_statements.

--
regards,
Andrey Lepikhov
Postgres Professional