RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

Started by Bykov Ivan · 10 months ago · 49 messages
#1 Bykov Ivan
i.bykov@modernsys.ru
2 attachment(s)

Hello!

Last time, I forgot to attach the patches.
The problem still persists in the 17.3 release.

Solution One
============
The simplest way to fix the problem is to place the scalar field used in the query ID calculation
between similar subnodes.
A patch for this solution is attached below (0001-Query-ID-Calculation-Fix-Variant-A.patch).

Solution Two
============
Alternatively, we can change the hash sum when we encounter an empty node.
This approach may impact performance but will protect us from such errors in the future.
A patch for this solution is attached below (0001-Query-ID-Calculation-Fix-Variant-B.patch).

======
SELECT version();
version
-------------------------------------------------------------------------------------------------
PostgreSQL 17.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

SET compute_query_id = on;

/* LIMIT / OFFSET */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class LIMIT 1;

QUERY PLAN
----------------------------------------------------------------------------
Limit (cost=0.00..0.04 rows=1 width=4)
Output: oid
-> Seq Scan on pg_catalog.pg_class (cost=0.00..18.15 rows=415 width=4)
Output: oid
Query Identifier: 5185884322440896420

EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class OFFSET 1;
QUERY PLAN
----------------------------------------------------------------------------
Limit (cost=0.04..18.15 rows=414 width=4)
Output: oid
-> Seq Scan on pg_catalog.pg_class (cost=0.00..18.15 rows=415 width=4)
Output: oid
Query Identifier: 5185884322440896420

/* DISTINCT / ORDER BY */
EXPLAIN (VERBOSE) SELECT DISTINCT "oid" FROM pg_class;

QUERY PLAN
------------------------------------------------------------------------------------------------------------
Unique (cost=0.27..23.54 rows=415 width=4)
Output: oid
-> Index Only Scan using pg_class_oid_index on pg_catalog.pg_class (cost=0.27..22.50 rows=415 width=4)
Output: oid
Query Identifier: 751948508603549510

EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class ORDER BY "oid";

QUERY PLAN
------------------------------------------------------------------------------------------------------
Index Only Scan using pg_class_oid_index on pg_catalog.pg_class (cost=0.27..22.50 rows=415 width=4)
Output: oid
Query Identifier: 751948508603549510

Attachments:

0001-Query-ID-Calculation-Fix-Variant-A.patch (application/octet-stream)
From 9310c12547689ddeaaf2d47061eabf88b0951a73 Mon Sep 17 00:00:00 2001
From: Bykov Ivan <i.bykov@modernsys.ru>
Date: Fri, 22 Nov 2024 17:19:59 +0500
Subject: [PATCH] Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT /
 OFFSET

In some cases, we may encounter different query trees with the same IDs.
For two structurally similar query subnodes, the query trees may
look like this:

QueryA->subNodeOne = Value X;
QueryA->subNodeTwo = NULL;
QueryB->subNodeOne = NULL;
QueryB->subNodeTwo = Value X;

When the query jumble walker calculates the query ID, it traverses
the Query members from top to bottom and generates the same IDs
for these two queries because it does not change the hash value
when visiting an empty node (= NULL).

There are two pairs of subnodes that can trigger this error:
- distinctClause and sortClause (both contain a list of SortGroupClauses);
- limitOffset and limitCount (both contain an int8 expression).

To fix this problem, the definition of the Query node has been
changed so that a scalar field is placed between similar subnodes.
---
 src/include/nodes/parsenodes.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 67c90a2bd3..4710ffb6d9 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -208,11 +208,11 @@ typedef struct Query
 
 	List	   *distinctClause; /* a list of SortGroupClause's */
 
-	List	   *sortClause;		/* a list of SortGroupClause's */
-
 	Node	   *limitOffset;	/* # of result tuples to skip (int8 expr) */
-	Node	   *limitCount;		/* # of result tuples to return (int8 expr) */
 	LimitOption limitOption;	/* limit type */
+	Node	   *limitCount;		/* # of result tuples to return (int8 expr) */
+
+	List	   *sortClause;		/* a list of SortGroupClause's */
 
 	List	   *rowMarks;		/* a list of RowMarkClause's */
 
-- 
2.39.2

0001-Query-ID-Calculation-Fix-Variant-B.patch (application/octet-stream)
From d3f9b11e8bcf5983eeccfb18b1df43def270440a Mon Sep 17 00:00:00 2001
From: Bykov Ivan <i.bykov@modernsys.ru>
Date: Fri, 22 Nov 2024 17:35:10 +0500
Subject: [PATCH] Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT /
 OFFSET

In some cases, we may encounter different query trees with the same IDs.
For two structurally similar query subnodes, the query trees may
look like this:

QueryA->subNodeOne = Value X;
QueryA->subNodeTwo = NULL;
QueryB->subNodeOne = NULL;
QueryB->subNodeTwo = Value X;

When the query jumble walker calculates the query ID, it traverses
the Query members from top to bottom and generates the same IDs
for these two queries because it does not change the hash value
when visiting an empty node (= NULL).

There are two pairs of subnodes that can trigger this error:
- distinctClause and sortClause (both contain a list of SortGroupClauses);
- limitOffset and limitCount (both contain an int8 expression).

To fix this problem, for every empty node in the Query tree, a '0' character
is added to the jumble buffer, which is used for ID calculation.
---
 src/backend/nodes/queryjumblefuncs.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 129fb44709..225a49ebc3 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -238,7 +238,12 @@ _jumbleNode(JumbleState *jstate, Node *node)
 	Node	   *expr = node;
 
 	if (expr == NULL)
+	{
+		const unsigned char null_symbol = '\0';
+
+		AppendJumble(jstate, &null_symbol, sizeof(unsigned char));
 		return;
+	}
 
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();
-- 
2.39.2

#2 Bykov Ivan
i.bykov@modernsys.ru
In reply to: Bykov Ivan (#1)

Here is the bug description from
/messages/by-id/ca447b72d15745b9a877fad7e258407a@localhost.localdomain:

Problem
=======
In some cases, we may end up with the same ID for non-identical query
trees. For two structurally similar query subnodes, we may have query
trees that look like this:
----
QueryA->subNodeOne = Value X;
QueryA->subNodeTwo = NULL;
QueryB->subNodeOne = NULL;
QueryB->subNodeTwo = Value X;
----
When the query jumble walker calculates the query ID, it traverses the
Query members from top to bottom and generates the same IDs for these
two queries because it does not change the hash value when visiting
an empty node (= NULL).
Here is an example (the collection of jstate->jumble is omitted):
----
/* QueryA_ID = AAAA */
QueryA->subNodeOne = &(Value X);
/* QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) = BBBB */
QueryA->subNodeTwo = NULL;
/* QueryA_ID = BBBB; */
/* QueryB_ID = AAAA */
QueryB->subNodeOne = NULL;
/* QueryB_ID = AAAA; */
QueryB->subNodeTwo = &(Value X);
/* QueryB_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryB_ID) = BBBB */
----
There are two pairs of subnodes that can trigger this error:
- distinctClause and sortClause (both contain a list of SortGroupClauses);
- limitOffset and limitCount (both contain an int8 expression).
Here is an example of such errors (all queries run on REL_17_0):
----
SET compute_query_id = on;
/* DISTINCT / ORDER BY *************************************/
EXPLAIN (VERBOSE) SELECT DISTINCT "oid" FROM pg_class;
/* Query Identifier: 751948508603549510 */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class ORDER BY "oid";
/* Query Identifier: 751948508603549510 */
/* LIMIT / OFFSET ******************************************/
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class LIMIT 1;
/* Query Identifier: 5185884322440896420 */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class OFFSET 1;
/* Query Identifier: 5185884322440896420 */
----
Solution One
============
The simplest way to fix the problem is to place the scalar field used in
the query ID calculation between similar subnodes. A patch for this
solution is attached below (0001-Query-ID-Calculation-Fix-Variant-A.patch).
Here is an example:
----
/* QueryA_ID = AAAA */
QueryA->subNodeOne = &(Value X);
/* QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) = BBBB */
QueryA->scalar = Value Y;
/* QueryA_ID = hash_any_extended(&(Value Y), sizeof(Value Y), QueryA_ID) = CCCC */
QueryA->subNodeTwo = NULL;
/* QueryA_ID = CCCC; */
/* QueryB_ID = AAAA */
QueryB->subNodeOne = NULL;
/* QueryB_ID = AAAA; */
QueryB->scalar = Value Y;
/* QueryB_ID = hash_any_extended(&(Value Y), sizeof(Value Y), QueryB_ID) = DDDD */
QueryB->subNodeTwo = &(Value X);
/* QueryB_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryB_ID) = EEEE */
----
Solution Two
============
Alternatively, we can change the hash sum when we encounter an empty node.
This approach may impact performance but will protect us from such errors
in the future. A patch for this solution is attached below
(0001-Query-ID-Calculation-Fix-Variant-B.patch).
Here is an example:
----
/* QueryA_ID = AAAA */
QueryA->subNodeOne = &(Value X);
/* QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) = BBBB */
QueryA->subNodeTwo = NULL;
/* QueryA_ID = hash_any_extended(&('\0'), sizeof(char), QueryA_ID) = CCCC */
/* QueryB_ID = AAAA */
QueryB->subNodeOne = NULL;
/* QueryB_ID = hash_any_extended(&('\0'), sizeof(char), QueryB_ID) = DDDD */
QueryB->subNodeTwo = &(Value X);
/* QueryB_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryB_ID) = EEEE */
----


#3 Bykov Ivan
i.bykov@modernsys.ru
In reply to: Bykov Ivan (#2)
2 attachment(s)

Here is
0001-Query-ID-Calculation-Fix-Variant-A.patch
and
0001-Query-ID-Calculation-Fix-Variant-B.patch

Attachments:

0001-Query-ID-Calculation-Fix-Variant-B.patch (application/octet-stream)
0001-Query-ID-Calculation-Fix-Variant-A.patch (application/octet-stream)

#4 Sami Imseih
samimseih@gmail.com
In reply to: Bykov Ivan (#1)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

Hi,

It seems like there are multiple threads on this topic. This is the
original [0], but I suggest continuing the discussion in this thread
since it includes the examples and patches.

Regarding the issue itself, query jumbling behavior is often subjective,
making it difficult to classify as a bug. I'm not entirely sure this
qualifies as a bug either, but I do believe it should be addressed.

As I understand it, two nodes defined one after the other, both of which
could end up holding the same expression when traversed, cannot be
differentiated by query jumbling when one of them is NULL. In this case,
query jumbling can't differentiate between limitOffset and limitCount when
they both resolve to a similar FuncExpr.

By rearranging them as done in variant A, the position where the expression
is appended in the jumble changes, leading to a different hash. I just don't
like this solution because it requires one to carefully construct a struct,
but it may be the best of the available options.

Variant B is not acceptable IMO as it adds a whole bunch of null-terminators
unnecessarily. For example, in a simple "select 1", (expr == NULL) is true
19 times, so that is an extra 19 bytes.

I think a third option is to add a new pg_node_attr called "query_jumble_null"
and be very selective about which fields append a null-terminator to the
jumble when the expression is NULL.

The queryjumblefuncs.c could look like the below. JUMBLE_NULL will
be responsible for appending the null terminator.

"""
static void
_jumbleQuery(JumbleState *jstate, Node *node)
{
Query *expr = (Query *) node;
...
......
.......
if (!expr->limitOffset)
{
JUMBLE_NULL();
}
else
{
JUMBLE_NODE(limitOffset);
}
if (!expr->limitCount)
{
JUMBLE_NULL();
}
else
{
JUMBLE_NODE(limitCount);
}
"""

What do you think? Maybe someone can suggest a better solution?

Regards,

Sami Imseih
Amazon Web Services (AWS)

[0]: /messages/by-id/ca447b72d15745b9a877fad7e258407a@localhost.localdomain

#5 Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#4)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Thu, Mar 06, 2025 at 06:44:27PM -0600, Sami Imseih wrote:

Regarding the issue itself, query jumbling behavior is often subjective,
making it difficult to classify as a bug. I'm not entirely sure this
qualifies as a bug either, but I do believe it should be addressed.

I would call that a bug and something that we should fix, but not
something that we can really backpatch as this has unfortunately an
impact on monitoring tools. Stability takes priority in this area in
already released branches.

By rearranging them as done in variant A, the position where the expression
is appended in the jumble changes, leading to a different hash. I just don't
like this solution because it requires one to carefully construct a struct,
but it may be the best of the available options.

I prefer something like variant A. It would not be the first time we
are manipulating the structure of the parse nodes for the sake of
making the query jumbling more transparent. An advantage of depending
on the structure definition is that we can document the expectation in
the header, and not hide it in the middle of the jumble code.

Variant B is not acceptable IMO as it adds a whole bunch of null-terminators
unnecessarily. For example, in a simple "select 1", (expr == NULL) is true
19 times, so that is an extra 19 bytes.

Variant B is not acceptable here.

I think a third option is to add a new pg_node_attr called "query_jumble_null"
and be very selective on which fields should append a null-terminator to the
jumble when the expression is null

The queryjumblefuncs.c could look like the below. JUMBLE_NULL will
be responsible for appending the null terminator.

I'm not much of a fan of that. For the sake of this thread, I'd still go
with the simplicity of A. And please, let's add a couple of queries in pgss
to track that these lead to two different entries.

Another option that I was thinking about, which could be slightly cleaner,
is the addition of an extra field in Query that is set when we go through
an offset_clause in the parser. It could just be a boolean, false by
default. We have been using this practice in DeallocateStmt, for example,
and there are only a few places where limitOffset is set. However, I'd
rather discard this idea as transformSelectStmt() also transforms a LIMIT
clause into an OFFSET clause, changing a Query's representation. And note
that we calculate the query jumble after applying the query transformation.
For these reasons, variant A, where we put the LimitOption between the two
int8 expression nodes, feels like the "okay" approach here. But we must
document this expectation in the structure, and check for more grammar
variants of LIMIT and OFFSET clauses in pgss.

Another concept would be to add into the jumble the offset of a Node
in the upper structure we are jumbling, but this requires keeping a
track of all the nested things, which would make the query jumbling
code much more complicated and more expensive, for sure.

I'd also recommend being careful and double-checking all these
transformation assumptions for LIMIT and OFFSET. We should shape things so
that an OFFSET never maps to a LIMIT variant when we differentiate all
these flavors at the query string level.
--
Michael

#6 David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#5)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Mon, 10 Mar 2025 at 12:39, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Mar 06, 2025 at 06:44:27PM -0600, Sami Imseih wrote:

Regarding the issue itself, query jumbling behavior is often subjective,
making it difficult to classify as a bug. I'm not entirely sure this
qualifies as a bug either, but I do believe it should be addressed.

I would call that a bug and something that we should fix, but not
something that we can really backpatch as this has unfortunately an
impact on monitoring tools. Stability takes priority in this area in
already released branches.

I know you reviewed this, Michael, so you're aware; 2d3389c28 did
recently document that we'd only break this in minor versions as a
last resort. So, I agree that it sounds like a master-only fix is in
order.

For the record, the docs [1] read:

"Generally, it can be assumed that queryid values are stable between
minor version releases of PostgreSQL, providing that instances are
running on the same machine architecture and the catalog metadata
details match. Compatibility will only be broken between minor
versions as a last resort."

David

[1]: https://www.postgresql.org/docs/current/pgstatstatements.html

#7 Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#6)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Mon, Mar 10, 2025 at 02:14:01PM +1300, David Rowley wrote:

I know you reviewed this, Michael, so you're aware; 2d3389c28 did
recently document that we'd only break this in minor versions as a
last resort. So, I agree that it sounds like a master-only fix is in
order.

Yes, this thread's problem does not pass the compatibility bar.
Thanks for the reminder about the docs.
--
Michael

#8 Bykov Ivan
i.bykov@modernsys.ru
In reply to: Michael Paquier (#7)
2 attachment(s)

Hello!

Variant B is not acceptable IMO as it adds a whole bunch of
null-terminators unnecessarily. For example, in a simple "select 1",
(expr == NULL) is true 19 times, so that is an extra 19 bytes.

Variant B is not acceptable here.

Could we improve Variant B?

I was thinking about adding a count of NULL pointers encountered by the query
jumble walker since the last scalar or non-null field visited.
This way, we can traverse the query as follows:
====
QueryA->subNodeOne = &(Value X);
/* JumbleState->Count = 0;
QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) */
QueryA->subNodeTwo = NULL;
/* JumbleState->Count = 1; */
QueryA->subNodeThee = NULL;
/* JumbleState->Count = 2; */
QueryA->subNodeFour = NULL;
/* JumbleState->Count = 3; */
QueryA->scalar = Value Z;
/* QueryA_ID = hash_any_extended(&(JumbleState->Count),
sizeof(JumbleState->Count), QueryA_ID)
JumbleState->Count = 0;
QueryA_ID = hash_any_extended(&(Value Z), sizeof(Value Z), QueryA_ID) */
====

Variant A improvement
---------------------

I have attached the updated Variant A patch with the following changes:
- Implemented a pg_stat_statements regression test (see similar test results
on REL_17_3 in Query-ID-collision-at_pg_stat_statements-1ea6e890b22.txt).
- Added extra description about the Query ID collision
in src/include/nodes/parsenodes.h.

Attachments:

Query-ID-collision-at_pg_stat_statements-1ea6e890b22.txt (text/plain)
v2-0001-Query-ID-Calculation-Fix-Variant-A.patch (application/octet-stream)
From 39532fd798c15dca9e3cd0474b32934b01739449 Mon Sep 17 00:00:00 2001
From: Bykov Ivan <i.bykov@modernsys.ru>
Date: Mon, 10 Mar 2025 15:11:06 +0500
Subject: [PATCH] Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT /
 OFFSET

In some cases, we may encounter different query trees with the same IDs.
For two structurally similar query subnodes, the query trees may
look like this:

QueryA->subNodeOne = Value X;
QueryA->subNodeTwo = NULL;
QueryB->subNodeOne = NULL;
QueryB->subNodeTwo = Value X;

When the query jumble walker calculates the query ID, it traverses
the Query members from top to bottom and generates the same IDs
for these two queries because it does not change the hash value
when visiting an empty node (= NULL).

There are two pairs of subnodes that can trigger this error:
- distinctClause and sortClause (both contain a list of SortGroupClauses);
- limitOffset and limitCount (both contain an int8 expression).

To fix this problem, the definition of the Query node has been
changed so that a scalar field is placed between similar subnodes.
---
 .../pg_stat_statements/expected/select.out    | 49 +++++++++++++++++++
 contrib/pg_stat_statements/sql/select.sql     | 18 +++++++
 src/include/nodes/parsenodes.h                | 16 ++++--
 3 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index dd6c756f67d..f1b71f56b7b 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -406,9 +406,58 @@ SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
      2
 (1 row)
 
+CREATE TABLE pgss_c AS
+  SELECT id FROM generate_series(1,2) AS id;
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t 
 ---
  t
 (1 row)
 
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT DISTINCT id FROM pgss_c;
+ id 
+----
+  2
+  1
+(2 rows)
+
+SELECT id FROM pgss_c ORDER BY id;
+ id 
+----
+  1
+  2
+(2 rows)
+
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT id FROM pgss_c LIMIT 1;
+ id 
+----
+  1
+(1 row)
+
+SELECT id FROM pgss_c OFFSET 1;
+ id 
+----
+  2
+(1 row)
+
+-- Expected four rows (Query ID collision has been fixed).
+SELECT query FROM pg_stat_statements
+  WHERE query LIKE '%pgss_c%'
+  ORDER BY query COLLATE "C";
+               query               
+-----------------------------------
+ SELECT DISTINCT id FROM pgss_c
+ SELECT id FROM pgss_c LIMIT $1
+ SELECT id FROM pgss_c OFFSET $1
+ SELECT id FROM pgss_c ORDER BY id
+(4 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+DROP TABLE pgss_c;
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index eb45cb81ad2..bbc37fa0005 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -146,4 +146,22 @@ SELECT (
 ) FROM (VALUES(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
 
 SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
+
+CREATE TABLE pgss_c AS
+  SELECT id FROM generate_series(1,2) AS id;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT DISTINCT id FROM pgss_c;
+SELECT id FROM pgss_c ORDER BY id;
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT id FROM pgss_c LIMIT 1;
+SELECT id FROM pgss_c OFFSET 1;
+
+-- Expected four rows (Query ID collision has been fixed).
+SELECT query FROM pg_stat_statements
+  WHERE query LIKE '%pgss_c%'
+  ORDER BY query COLLATE "C";
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
+DROP TABLE pgss_c;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 67c90a2bd32..1a8ea81c690 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -113,6 +113,16 @@ typedef uint64 AclMode;			/* a bitmask of privilege bits */
  *	  significant (such as alias names), as is ignored anything that can
  *	  be deduced from child nodes (else we'd just be double-hashing that
  *	  piece of information).
+ *
+ *	  The query jumbling process may trigger a Query ID collision when
+ *	  two Query fields located sequentially contain the same type of nodes
+ *	  (a list of nodes), and both of them may be NULL.
+ *	  List of such groups of fields:
+ *	    - limitOffset and limitCount (which may be an int8 expression or NULL);
+ *	    - distinctClause, and sortClause (which may be a list of
+ *	      SortGroupClause entries or NULL);
+ *	  To prevent this collision, we should insert at least one scalar field
+ *	  without the attribute query_jumble_ignore between such fields.
  */
 typedef struct Query
 {
@@ -208,11 +218,11 @@ typedef struct Query
 
 	List	   *distinctClause; /* a list of SortGroupClause's */
 
-	List	   *sortClause;		/* a list of SortGroupClause's */
-
 	Node	   *limitOffset;	/* # of result tuples to skip (int8 expr) */
-	Node	   *limitCount;		/* # of result tuples to return (int8 expr) */
 	LimitOption limitOption;	/* limit type */
+	Node	   *limitCount;		/* # of result tuples to return (int8 expr) */
+
+	List	   *sortClause;		/* a list of SortGroupClause's */
 
 	List	   *rowMarks;		/* a list of RowMarkClause's */
 
-- 
2.39.5

#9 David Rowley
dgrowleyml@gmail.com
In reply to: Bykov Ivan (#8)
1 attachment(s)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Mon, 10 Mar 2025 at 23:17, Bykov Ivan <i.bykov@modernsys.ru> wrote:

I have attached the updated Variant A patch with the following changes:
- Implemented a pg_stat_statements regression test (see similar test results
on REL_17_3 in Query-ID-collision-at_pg_stat_statements-1ea6e890b22.txt).
- Added extra description about the Query ID collision
in src/include/nodes/parsenodes.h.

I've not paid much attention to the jumble code before, but I did
glance at this patch:

+ *
+ *   The query jumbling process may produce a Query ID collision when
+ *   two sequential Query fields hold the same type of node (or a list
+ *   of nodes) and both of them may be NULL.
+ *   Known groups of such fields:
+ *     - limitOffset and limitCount (each may be an int8 expression or NULL);
+ *     - distinctClause and sortClause (each may be a list of
+ *       SortGroupClause entries or NULL).
+ *   To prevent this collision, at least one scalar field without the
+ *   query_jumble_ignore attribute must be placed between such fields.
  */
 typedef struct Query
 {
@@ -208,11 +218,11 @@ typedef struct Query

List *distinctClause; /* a list of SortGroupClause's */

- List    *sortClause; /* a list of SortGroupClause's */
-
  Node    *limitOffset; /* # of result tuples to skip (int8 expr) */
- Node    *limitCount; /* # of result tuples to return (int8 expr) */
  LimitOption limitOption; /* limit type */
+ Node    *limitCount; /* # of result tuples to return (int8 expr) */
+
+ List    *sortClause; /* a list of SortGroupClause's */

List *rowMarks; /* a list of RowMarkClause's */

It seems to me that if this fixes the issue, the next similar one
is already lurking in the shadows waiting to jump out on us.

Couldn't we adjust all this code so that we pass a seed to
AppendJumble() that's the offsetof(Struct, field) that we're jumbling
and incorporate that seed into the hash? I don't think we could
possibly change the offset in a minor version, so I don't think
there's a risk that could cause jumble value instability. Possibly
different CPU architectures would come up with different offsets
through having different struct alignment requirements, but we don't
make any guarantees about the jumble values matching across different
CPU architectures anyway.

I admit to only having spent a few minutes looking at this, so there
may well be a good reason for not doing this. Maybe Michael can tell
me what that is...?

A very quickly put together patch is attached. I intend this only to
demonstrate what I mean, not because I want to work on it myself.

David

Attachments:

v1-0001-A-very-rough-proof-of-concept-to-fix-the-problem-.patch (application/octet-stream)
From de5638e8c8656d7bcb8e7df25e203788fc1b3555 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Tue, 11 Mar 2025 00:32:15 +1300
Subject: [PATCH v1] A very rough proof of concept to fix the problem in the
 link below

Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9@localhost.localdomain
---
 src/backend/nodes/gen_node_support.pl |  6 +--
 src/backend/nodes/queryjumblefuncs.c  | 60 +++++++++++++++------------
 2 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 1a657f7e0ae..bb4168aeac5 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -1301,7 +1301,7 @@ _jumble${n}(JumbleState *jstate, Node *node)
 		if (($t =~ /^(\w+)\*$/ or $t =~ /^struct\s+(\w+)\*$/)
 			and elem $1, @node_types)
 		{
-			print $jff "\tJUMBLE_NODE($f);\n"
+			print $jff "\tJUMBLE_NODE($f, offsetof($n, $f));\n"
 			  unless $query_jumble_ignore;
 		}
 		elsif ($t eq 'ParseLoc')
@@ -1315,12 +1315,12 @@ _jumble${n}(JumbleState *jstate, Node *node)
 		}
 		elsif ($t eq 'char*')
 		{
-			print $jff "\tJUMBLE_STRING($f);\n"
+			print $jff "\tJUMBLE_STRING($f, offsetof($n, $f));\n"
 			  unless $query_jumble_ignore;
 		}
 		else
 		{
-			print $jff "\tJUMBLE_FIELD($f);\n"
+			print $jff "\tJUMBLE_FIELD($f, offsetof($n, $f));\n"
 			  unless $query_jumble_ignore;
 		}
 	}
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index b103a281936..5e669e0970f 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -52,9 +52,9 @@ int			compute_query_id = COMPUTE_QUERY_ID_AUTO;
 bool		query_id_enabled = false;
 
 static void AppendJumble(JumbleState *jstate,
-						 const unsigned char *item, Size size);
+						 const unsigned char *item, Size size, int seed);
 static void RecordConstLocation(JumbleState *jstate, int location);
-static void _jumbleNode(JumbleState *jstate, Node *node);
+static void _jumbleNode(JumbleState *jstate, Node *node, int seed);
 static void _jumbleA_Const(JumbleState *jstate, Node *node);
 static void _jumbleList(JumbleState *jstate, Node *node);
 static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -127,7 +127,7 @@ JumbleQuery(Query *query)
 	jstate->highest_extern_param_id = 0;
 
 	/* Compute query ID and mark the Query node with it */
-	_jumbleNode(jstate, (Node *) query);
+	_jumbleNode(jstate, (Node *) query, 0); /* TODO seed? */
 	query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
 													  jstate->jumble_len,
 													  0));
@@ -165,7 +165,7 @@ EnableQueryId(void)
  * the current jumble.
  */
 static void
-AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size, int seed)
 {
 	unsigned char *jumble = jstate->jumble;
 	Size		jumble_len = jstate->jumble_len;
@@ -185,6 +185,11 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 
 			start_hash = DatumGetUInt64(hash_any_extended(jumble,
 														  JUMBLE_SIZE, 0));
+
+			/* TODO, what's the best way to combine this? */
+			if (seed)
+				start_hash = hash_combine64(start_hash, seed);
+
 			memcpy(jumble, &start_hash, sizeof(start_hash));
 			jumble_len = sizeof(start_hash);
 		}
@@ -223,24 +228,24 @@ RecordConstLocation(JumbleState *jstate, int location)
 	}
 }
 
-#define JUMBLE_NODE(item) \
-	_jumbleNode(jstate, (Node *) expr->item)
+#define JUMBLE_NODE(item, seed) \
+	_jumbleNode(jstate, (Node *) expr->item, seed)
 #define JUMBLE_LOCATION(location) \
 	RecordConstLocation(jstate, expr->location)
-#define JUMBLE_FIELD(item) \
-	AppendJumble(jstate, (const unsigned char *) &(expr->item), sizeof(expr->item))
+#define JUMBLE_FIELD(item, seed) \
+	AppendJumble(jstate, (const unsigned char *) &(expr->item), sizeof(expr->item), seed)
 #define JUMBLE_FIELD_SINGLE(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define JUMBLE_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item), 0)
+#define JUMBLE_STRING(str, seed) \
 do { \
 	if (expr->str) \
-		AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1); \
+		AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1, seed); \
 } while(0)
 
 #include "queryjumblefuncs.funcs.c"
 
 static void
-_jumbleNode(JumbleState *jstate, Node *node)
+_jumbleNode(JumbleState *jstate, Node *node, int seed)
 {
 	Node	   *expr = node;
 
@@ -250,11 +255,14 @@ _jumbleNode(JumbleState *jstate, Node *node)
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();
 
+	/* TODO how to jumble the seed? Make this proper. */
+	JUMBLE_FIELD_SINGLE(seed);
+
 	/*
 	 * We always emit the node's NodeTag, then any additional fields that are
 	 * considered significant, and then we recurse to any child nodes.
 	 */
-	JUMBLE_FIELD(type);
+	JUMBLE_FIELD(type, offsetof(Node, type));
 
 	switch (nodeTag(expr))
 	{
@@ -305,7 +313,7 @@ _jumbleList(JumbleState *jstate, Node *node)
 	{
 		case T_List:
 			foreach(l, expr)
-				_jumbleNode(jstate, lfirst(l));
+				_jumbleNode(jstate, lfirst(l), 0); /* TODO what to set as the seed? */
 			break;
 		case T_IntList:
 			foreach(l, expr)
@@ -331,26 +339,26 @@ _jumbleA_Const(JumbleState *jstate, Node *node)
 {
 	A_Const    *expr = (A_Const *) node;
 
-	JUMBLE_FIELD(isnull);
+	JUMBLE_FIELD(isnull, offsetof(A_Const, isnull));
 	if (!expr->isnull)
-	{
-		JUMBLE_FIELD(val.node.type);
+	{ /* TODO fix seeds */
+		JUMBLE_FIELD(val.node.type, 0);
 		switch (nodeTag(&expr->val))
 		{
 			case T_Integer:
-				JUMBLE_FIELD(val.ival.ival);
+				JUMBLE_FIELD(val.ival.ival, 0);
 				break;
 			case T_Float:
-				JUMBLE_STRING(val.fval.fval);
+				JUMBLE_STRING(val.fval.fval, 0);
 				break;
 			case T_Boolean:
-				JUMBLE_FIELD(val.boolval.boolval);
+				JUMBLE_FIELD(val.boolval.boolval, 0);
 				break;
 			case T_String:
-				JUMBLE_STRING(val.sval.sval);
+				JUMBLE_STRING(val.sval.sval, 0);
 				break;
 			case T_BitString:
-				JUMBLE_STRING(val.bsval.bsval);
+				JUMBLE_STRING(val.bsval.bsval, 0);
 				break;
 			default:
 				elog(ERROR, "unrecognized node type: %d",
@@ -365,15 +373,15 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node)
 {
 	VariableSetStmt *expr = (VariableSetStmt *) node;
 
-	JUMBLE_FIELD(kind);
-	JUMBLE_STRING(name);
+	JUMBLE_FIELD(kind, offsetof(VariableSetStmt, kind));
+	JUMBLE_STRING(name, offsetof(VariableSetStmt, name));
 
 	/*
 	 * Account for the list of arguments in query jumbling only if told by the
 	 * parser.
 	 */
 	if (expr->jumble_args)
-		JUMBLE_NODE(args);
-	JUMBLE_FIELD(is_local);
+		JUMBLE_NODE(args, 0); /* TODO seed? */
+	JUMBLE_FIELD(is_local, offsetof(VariableSetStmt, is_local));
 	JUMBLE_LOCATION(location);
 }
-- 
2.40.1.windows.1

#10Sami Imseih
samimseih@gmail.com
In reply to: David Rowley (#9)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

> It seems to me that if this fixes the issue, the next similar one
> is already lurking in the shadows waiting to jump out on us.

Yes, it is true that there may be other cases, but I am not sure it's
worth carrying all the extra bytes in the jumble to deal with a few
cases like this. This is why I don't think Variant B or tracking the
offset is a thrilling idea. -1 for me.

--
Sami

#11Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#10)
1 attachment(s)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

transformation. For these reasons, variant A where we put the
LimitOption between the two int8 expression nodes feels like the
"okay" approach here. But we must document this expectation in the
structure, and check for more grammar variants of LIMIT and OFFSET
clauses in pgss.

Please see the attached. Variant A with comments and some additional
test cases.

It should be noted that we currently have "WITH TIES/ROWS ONLY" tests in pg_s_s,
so I added another case to show that "FETCH FIRST 2 ROW ONLY" and "LIMIT 2" get
the same queryId, and also added a query that uses both a LIMIT and an OFFSET.

I could not think of other cases we need to cover.

--

Sami

Attachments:

v1-0001-Fix-QueryId-collision-for-LIMIT-and-OFFSET.patch (application/octet-stream)
From 46361b13a73d13d95149560c97eca29dcc1c2786 Mon Sep 17 00:00:00 2001
From: "Sami Imseih (AWS)"
 <simseih@dev-dsk-simseih-1d-3940b79e.us-east-1.amazon.com>
Date: Mon, 10 Mar 2025 17:06:02 +0000
Subject: [PATCH v1 1/1] Fix QueryId collision for LIMIT and OFFSET

This patch fixes the inability of query jumbling to
differentiate between LIMIT and OFFSET, since both
clauses are int8 expressions that are jumbled one
after the other.

Changing the order of limitCount and limitOffset and
placing limitOption between them naturally changes
the ordering of the jumble appends, producing a
different queryId for each clause.

Added additional test cases in pg_stat_statements.

Author: Bykov Ivan
---
 .../pg_stat_statements/expected/select.out    | 25 ++++++++++++++++++-
 contrib/pg_stat_statements/sql/select.sql     | 13 ++++++++++
 src/include/nodes/parsenodes.h                |  7 +++++-
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 37a30af034a..8742e8200f3 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -309,10 +309,33 @@ FETCH FIRST 2 ROW ONLY;
    0
 (2 rows)
 
+SELECT *
+FROM limitoption
+WHERE val < 2
+ORDER BY val
+LIMIT 2;
+ val 
+-----
+   0
+   0
+(2 rows)
+
+SELECT *
+FROM limitoption
+WHERE val < 2
+ORDER BY val
+OFFSET 2
+FETCH FIRST 2 ROW ONLY;
+ val 
+-----
+   0
+   0
+(2 rows)
+
 SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%FETCH FIRST%';
  count 
 -------
-     2
+     3
 (1 row)
 
 -- GROUP BY [DISTINCT]
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index e0be58d5e24..88943bd151c 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -120,6 +120,19 @@ WHERE val < 2
 ORDER BY val
 FETCH FIRST 2 ROW ONLY;
 
+SELECT *
+FROM limitoption
+WHERE val < 2
+ORDER BY val
+LIMIT 2;
+
+SELECT *
+FROM limitoption
+WHERE val < 2
+ORDER BY val
+OFFSET 2
+FETCH FIRST 2 ROW ONLY;
+
 SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%FETCH FIRST%';
 
 -- GROUP BY [DISTINCT]
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf..d2f4f36e4c9 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -221,9 +221,14 @@ typedef struct Query
 
 	List	   *sortClause;		/* a list of SortGroupClause's */
 
+	/*
+	 * It is necessary to define limitOption between limitOffset
+	 * and limitCount to be able to differentiate between OFFSET
+	 * and LIMIT for query jumbling purposes.
+	 */
 	Node	   *limitOffset;	/* # of result tuples to skip (int8 expr) */
-	Node	   *limitCount;		/* # of result tuples to return (int8 expr) */
 	LimitOption limitOption;	/* limit type */
+	Node	   *limitCount;		/* # of result tuples to return (int8 expr) */
 
 	List	   *rowMarks;		/* a list of RowMarkClause's */
 
-- 
2.47.1

#12Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#9)
1 attachment(s)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, Mar 11, 2025 at 12:45:35AM +1300, David Rowley wrote:

> It seems to me that if this fixes the issue, the next similar one
> is already lurking in the shadows waiting to jump out on us.

For how many years will we have to wait for this threat hidden in the
shadows? :)

This issue has existed since query jumbling was introduced in pgss, so
it goes a long way back. I've spent a bit more than a few minutes on
that.

> Couldn't we adjust all this code so that we pass a seed to
> AppendJumble() that's the offsetof(Struct, field) that we're jumbling
> and incorporate that seed into the hash? I don't think we could
> possibly change the offset in a minor version, so I don't think
> there's a risk that could cause jumble value instability. Possibly
> different CPU architectures would come up with different offsets
> through having different struct alignment requirements, but we don't
> make any guarantees about the jumble values matching across different
> CPU architectures anyway.

Yeah, something like that has the potential to get rid of such problems
forever, particularly because we automate the generation of this code
now, so it is mostly cost-free. It has a cost for the custom jumbling
functions, where one needs some imagination, but with existing models
to follow, and as long as we keep the number of custom functions low,
that should be neither complicated nor costly, at least in my experience.

When I was thinking about the addition of the offset to the jumbling
yesterday, I was wondering about two things:
- if there are some node structures where it could not work.
- if we'd need to pass down some information of the upper node when
doing the jumbling of a node attached to it, which would make the code
generation much more annoying.

But after sleeping on it I think that these two points are kind of
moot: having only the offset passed down to a single _jumbleNode()
with the offset compiled at its beginning should be sufficient. Using
"seed" as a term is perhaps a bit confusing, because this is an offset
in the upper node, but perhaps that's OK as-is. It's just more
entropy addition. If somebody has a better idea for this term, feel
free.

_jumbleList() is an interesting one. If we want an equivalent of the
offset, this could use the item number in the list which would be a
rather close equivalent to the offset used elsewhere. For the int,
oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for
each member, and apply the offset once at the beginning of
_jumbleList(), I guess.

_jumbleVariableSetStmt() should be OK with the offset of "arg" in
VariableSetStmt. The addition of hash_combine64() to count for the
offset should be OK.

With that in mind, the cases with DISTINCT/ORDER BY and OFFSET/LIMIT
seem to work fine, without causing noise for the other cases tracked
in the regression tests of PGSS.

What do you think?
--
Michael

Attachments:

0001-Add-node-offsets-in-query-jumbling-computations.patch (text/x-diff; charset=us-ascii)
From fac0cc4973ff5a2350b7e26c53291b11a7fbb1be Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 11 Mar 2025 15:47:52 +0900
Subject: [PATCH] Add node offsets in query jumbling computations

---
 src/backend/nodes/gen_node_support.pl         |  6 +-
 src/backend/nodes/queryjumblefuncs.c          | 73 ++++++++++------
 .../pg_stat_statements/expected/select.out    | 87 ++++++++++++++++++-
 contrib/pg_stat_statements/sql/select.sql     | 20 +++++
 4 files changed, 157 insertions(+), 29 deletions(-)

diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 1a657f7e0aea..bb4168aeac55 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -1301,7 +1301,7 @@ _jumble${n}(JumbleState *jstate, Node *node)
 		if (($t =~ /^(\w+)\*$/ or $t =~ /^struct\s+(\w+)\*$/)
 			and elem $1, @node_types)
 		{
-			print $jff "\tJUMBLE_NODE($f);\n"
+			print $jff "\tJUMBLE_NODE($f, offsetof($n, $f));\n"
 			  unless $query_jumble_ignore;
 		}
 		elsif ($t eq 'ParseLoc')
@@ -1315,12 +1315,12 @@ _jumble${n}(JumbleState *jstate, Node *node)
 		}
 		elsif ($t eq 'char*')
 		{
-			print $jff "\tJUMBLE_STRING($f);\n"
+			print $jff "\tJUMBLE_STRING($f, offsetof($n, $f));\n"
 			  unless $query_jumble_ignore;
 		}
 		else
 		{
-			print $jff "\tJUMBLE_FIELD($f);\n"
+			print $jff "\tJUMBLE_FIELD($f, offsetof($n, $f));\n"
 			  unless $query_jumble_ignore;
 		}
 	}
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index b103a2819366..11e43872753a 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -52,9 +52,9 @@ int			compute_query_id = COMPUTE_QUERY_ID_AUTO;
 bool		query_id_enabled = false;
 
 static void AppendJumble(JumbleState *jstate,
-						 const unsigned char *item, Size size);
+						 const unsigned char *item, Size size, int seed);
 static void RecordConstLocation(JumbleState *jstate, int location);
-static void _jumbleNode(JumbleState *jstate, Node *node);
+static void _jumbleNode(JumbleState *jstate, Node *node, int seed);
 static void _jumbleA_Const(JumbleState *jstate, Node *node);
 static void _jumbleList(JumbleState *jstate, Node *node);
 static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -127,7 +127,7 @@ JumbleQuery(Query *query)
 	jstate->highest_extern_param_id = 0;
 
 	/* Compute query ID and mark the Query node with it */
-	_jumbleNode(jstate, (Node *) query);
+	_jumbleNode(jstate, (Node *) query, 0);
 	query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
 													  jstate->jumble_len,
 													  0));
@@ -163,9 +163,13 @@ EnableQueryId(void)
 /*
  * AppendJumble: Append a value that is substantive in a given query to
  * the current jumble.
+ *
+ * The "seed" value would normally be the offset of a node member to force
+ * more entropy in the query ID generated.  There are exceptions in some of
+ * the custom query jumbling functions, like the List one.
  */
 static void
-AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+AppendJumble(JumbleState *jstate, const unsigned char *item, Size size, int seed)
 {
 	unsigned char *jumble = jstate->jumble;
 	Size		jumble_len = jstate->jumble_len;
@@ -185,6 +189,8 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 
 			start_hash = DatumGetUInt64(hash_any_extended(jumble,
 														  JUMBLE_SIZE, 0));
+			start_hash = hash_combine64(start_hash, seed);
+
 			memcpy(jumble, &start_hash, sizeof(start_hash));
 			jumble_len = sizeof(start_hash);
 		}
@@ -223,24 +229,24 @@ RecordConstLocation(JumbleState *jstate, int location)
 	}
 }
 
-#define JUMBLE_NODE(item) \
-	_jumbleNode(jstate, (Node *) expr->item)
+#define JUMBLE_NODE(item, seed) \
+	_jumbleNode(jstate, (Node *) expr->item, seed)
 #define JUMBLE_LOCATION(location) \
 	RecordConstLocation(jstate, expr->location)
-#define JUMBLE_FIELD(item) \
-	AppendJumble(jstate, (const unsigned char *) &(expr->item), sizeof(expr->item))
+#define JUMBLE_FIELD(item, seed) \
+	AppendJumble(jstate, (const unsigned char *) &(expr->item), sizeof(expr->item), seed)
 #define JUMBLE_FIELD_SINGLE(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
-#define JUMBLE_STRING(str) \
+	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item), 0)
+#define JUMBLE_STRING(str, seed) \
 do { \
 	if (expr->str) \
-		AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1); \
+		AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1, seed); \
 } while(0)
 
 #include "queryjumblefuncs.funcs.c"
 
 static void
-_jumbleNode(JumbleState *jstate, Node *node)
+_jumbleNode(JumbleState *jstate, Node *node, int seed)
 {
 	Node	   *expr = node;
 
@@ -250,11 +256,13 @@ _jumbleNode(JumbleState *jstate, Node *node)
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();
 
+	JUMBLE_FIELD_SINGLE(seed);
+
 	/*
 	 * We always emit the node's NodeTag, then any additional fields that are
 	 * considered significant, and then we recurse to any child nodes.
 	 */
-	JUMBLE_FIELD(type);
+	JUMBLE_FIELD(type, offsetof(Node, type));
 
 	switch (nodeTag(expr))
 	{
@@ -295,29 +303,46 @@ _jumbleNode(JumbleState *jstate, Node *node)
 	}
 }
 
+/*
+ * Custom query jumbling function for a List node.  Note that the
+ * "seed" is defined as the item number in a list.
+ */
 static void
 _jumbleList(JumbleState *jstate, Node *node)
 {
 	List	   *expr = (List *) node;
 	ListCell   *l;
+	int			num = 0;
 
 	switch (expr->type)
 	{
 		case T_List:
 			foreach(l, expr)
-				_jumbleNode(jstate, lfirst(l));
+				_jumbleNode(jstate, lfirst(l), num++);
 			break;
 		case T_IntList:
 			foreach(l, expr)
+			{
 				JUMBLE_FIELD_SINGLE(lfirst_int(l));
+				JUMBLE_FIELD_SINGLE(num);
+				num++;
+			}
 			break;
 		case T_OidList:
 			foreach(l, expr)
+			{
 				JUMBLE_FIELD_SINGLE(lfirst_oid(l));
+				JUMBLE_FIELD_SINGLE(num);
+				num++;
+			}
 			break;
 		case T_XidList:
 			foreach(l, expr)
+			{
 				JUMBLE_FIELD_SINGLE(lfirst_xid(l));
+				JUMBLE_FIELD_SINGLE(num);
+				num++;
+			}
 			break;
 		default:
 			elog(ERROR, "unrecognized list node type: %d",
@@ -331,26 +353,27 @@ _jumbleA_Const(JumbleState *jstate, Node *node)
 {
 	A_Const    *expr = (A_Const *) node;
 
-	JUMBLE_FIELD(isnull);
+	JUMBLE_FIELD(isnull, offsetof(A_Const, isnull));
+
 	if (!expr->isnull)
 	{
-		JUMBLE_FIELD(val.node.type);
+		JUMBLE_FIELD(val.node.type, offsetof(A_Const, val));
 		switch (nodeTag(&expr->val))
 		{
 			case T_Integer:
-				JUMBLE_FIELD(val.ival.ival);
+				JUMBLE_FIELD(val.ival.ival, T_Integer);
 				break;
 			case T_Float:
-				JUMBLE_STRING(val.fval.fval);
+				JUMBLE_STRING(val.fval.fval, T_Float);
 				break;
 			case T_Boolean:
-				JUMBLE_FIELD(val.boolval.boolval);
+				JUMBLE_FIELD(val.boolval.boolval, T_Boolean);
 				break;
 			case T_String:
-				JUMBLE_STRING(val.sval.sval);
+				JUMBLE_STRING(val.sval.sval, T_String);
 				break;
 			case T_BitString:
-				JUMBLE_STRING(val.bsval.bsval);
+				JUMBLE_STRING(val.bsval.bsval, T_BitString);
 				break;
 			default:
 				elog(ERROR, "unrecognized node type: %d",
@@ -365,15 +388,15 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node)
 {
 	VariableSetStmt *expr = (VariableSetStmt *) node;
 
-	JUMBLE_FIELD(kind);
-	JUMBLE_STRING(name);
+	JUMBLE_FIELD(kind, offsetof(VariableSetStmt, kind));
+	JUMBLE_STRING(name, offsetof(VariableSetStmt, name));
 
 	/*
 	 * Account for the list of arguments in query jumbling only if told by the
 	 * parser.
 	 */
 	if (expr->jumble_args)
-		JUMBLE_NODE(args);
-	JUMBLE_FIELD(is_local);
+		JUMBLE_NODE(args, offsetof(VariableSetStmt, args));
+	JUMBLE_FIELD(is_local, offsetof(VariableSetStmt, is_local));
 	JUMBLE_LOCATION(location);
 }
diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 37a30af034a6..1587d2cafb3a 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -19,6 +19,86 @@ SELECT 1 AS "int";
    1
 (1 row)
 
+-- LIMIT and OFFSET patterns
+-- These require more entropy with parsing node offsets.
+SELECT 1 AS "int" LIMIT 1;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 1 AS "int" LIMIT 2;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 1 AS "int" OFFSET 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 1 LIMIT 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2 LIMIT 2;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" LIMIT 1 OFFSET 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" LIMIT 3 OFFSET 3;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 1 FETCH FIRST 2 ROW ONLY;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2 FETCH FIRST 3 ROW ONLY;
+ int 
+-----
+(0 rows)
+
+-- DISTINCT and ORDER BY patterns
+-- These require more entropy with parsing node offsets.
+SELECT DISTINCT 1 AS "int";
+ int 
+-----
+   1
+(1 row)
+
+SELECT DISTINCT 2 AS "int";
+ int 
+-----
+   2
+(1 row)
+
+SELECT 1 AS "int" ORDER BY 1;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 2 AS "int" ORDER BY 1;
+ int 
+-----
+   2
+(1 row)
+
 /* this comment should not appear in the output */
 SELECT 'hello'
   -- but this one will appear
@@ -135,9 +215,14 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
      3 |    3 | SELECT $1 + $2 + $3 AS "add"
      1 |    1 | SELECT $1 AS "float"
      2 |    2 | SELECT $1 AS "int"
+     2 |    2 | SELECT $1 AS "int" LIMIT $2
+     2 |    0 | SELECT $1 AS "int" OFFSET $2
+     6 |    0 | SELECT $1 AS "int" OFFSET $2 LIMIT $3
+     2 |    2 | SELECT $1 AS "int" ORDER BY 1
      1 |    2 | SELECT $1 AS i UNION SELECT $2 ORDER BY i
      1 |    1 | SELECT $1 || $2
      1 |    1 | SELECT $1, $2 LIMIT $3
+     2 |    2 | SELECT DISTINCT $1 AS "int"
      0 |    0 | SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"
      1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
      1 |    2 | WITH t(f) AS (                                                              +
@@ -145,7 +230,7 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
        |      | )                                                                           +
        |      |   SELECT f FROM t ORDER BY f
      1 |    1 | select $1::jsonb ? $2
-(12 rows)
+(17 rows)
 
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t 
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index e0be58d5e24b..4dcfa8ef74dc 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -12,6 +12,26 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 --
 SELECT 1 AS "int";
 
+-- LIMIT and OFFSET patterns
+-- These require more entropy with parsing node offsets.
+SELECT 1 AS "int" LIMIT 1;
+SELECT 1 AS "int" LIMIT 2;
+SELECT 1 AS "int" OFFSET 1;
+SELECT 1 AS "int" OFFSET 2;
+SELECT 1 AS "int" OFFSET 1 LIMIT 1;
+SELECT 1 AS "int" OFFSET 2 LIMIT 2;
+SELECT 1 AS "int" LIMIT 1 OFFSET 1;
+SELECT 1 AS "int" LIMIT 3 OFFSET 3;
+SELECT 1 AS "int" OFFSET 1 FETCH FIRST 2 ROW ONLY;
+SELECT 1 AS "int" OFFSET 2 FETCH FIRST 3 ROW ONLY;
+
+-- DISTINCT and ORDER BY patterns
+-- These require more entropy with parsing node offsets.
+SELECT DISTINCT 1 AS "int";
+SELECT DISTINCT 2 AS "int";
+SELECT 1 AS "int" ORDER BY 1;
+SELECT 2 AS "int" ORDER BY 1;
+
 /* this comment should not appear in the output */
 SELECT 'hello'
   -- but this one will appear
-- 
2.47.2

#13David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#12)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, 11 Mar 2025 at 19:50, Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Mar 11, 2025 at 12:45:35AM +1300, David Rowley wrote:
>
>> It seems to me that if this fixes the issue, the next similar one
>> is already lurking in the shadows waiting to jump out on us.
>
> For how many years will we have to wait for this threat hidden in the
> shadows? :)
>
> This issue has existed since query jumbling was introduced in pgss, so
> it goes a long way back. I've spent a bit more than a few minutes on
> that.

I didn't mean to cause offence here. I just wanted to make it clear
that I don't think fixing these types of issues by swapping the order
of the fields is going to be a sustainable practice, and it would be
good to come up with a solution that eliminates the chance of this
class of bug ever appearing again. Even if we were to trawl the
entire Query struct and everything that could ever be attached to it
and we deem that today it's fine and no more such bugs exist, the
person adding some new SQL feature in the future that adds new data to
store in the Query probably isn't going to give query jumbling much
thought, especially now that the code generation is automatic.

>> Couldn't we adjust all this code so that we pass a seed to
>> AppendJumble() that's the offsetof(Struct, field) that we're jumbling
>> and incorporate that seed into the hash? I don't think we could
>> possibly change the offset in a minor version, so I don't think
>> there's a risk that could cause jumble value instability. Possibly
>> different CPU architectures would come up with different offsets
>> through having different struct alignment requirements, but we don't
>> make any guarantees about the jumble values matching across different
>> CPU architectures anyway.

> Yeah, something like that has the potential to get rid of such problems
> forever, particularly because we automate the generation of this code
> now, so it is mostly cost-free. It has a cost for the custom jumbling
> functions, where one needs some imagination, but with existing models
> to follow, and as long as we keep the number of custom functions low,
> that should be neither complicated nor costly, at least in my experience.

I hadn't really processed this thread fully when I responded yesterday
and my response was mostly aimed at the latest patch on the thread at
the time, which I had looked at quickly and wanted to put a stop to
changing field orders as a fix for this. Since then, I see that Ivan
has already submitted a patch that accounts for NULL nodes and adds a
byte to the jumble buffer to account for NULLs. This seems quite clean
and simple. However, Sami seems to have concerns about the overhead of
doing this. Is that warranted at all? Potentially, there could be a
decent number of NULL fields. It'll probably be much cheaper than the
offsetof idea I came up with.

> When I was thinking about the addition of the offset to the jumbling
> yesterday, I was wondering about two things:
> - if there are some node structures where it could not work.
> - if we'd need to pass down some information of the upper node when
> doing the jumbling of a node attached to it, which would make the code
> generation much more annoying.
>
> But after sleeping on it I think that these two points are kind of
> moot: having only the offset passed down to a single _jumbleNode()
> with the offset compiled at its beginning should be sufficient. Using
> "seed" as a term is perhaps a bit confusing, because this is an offset
> in the upper node, but perhaps that's OK as-is. It's just more
> entropy addition. If somebody has a better idea for this term, feel
> free.

Can you share your thoughts on Ivan's NULL jumble idea?

_jumbleList() is an interesting one. If we want an equivalent of the
offset, this could use the item number in the list which would be a
rather close equivalent to the offset used elsewhere. For the int,
oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for
each member, apply the offset at the beginning of _jumbleList(), once,
I guess.

Is this affected by the same class of bug that we're aiming to fix
here? (This class being not always jumbling all fields because of
NULLs or some other value, resulting in the next jumble field applying
the same bytes to the buffer as the previous field would have, had it
not been ignored)

_jumbleVariableSetStmt() should be OK with the offset of "arg" in
VariableSetStmt. The addition of hash_combine64() to count for the
offset should be OK.

With that in mind, the cases with DISTINCT/ORDER BY and OFFSET/LIMIT
seem to work fine, without causing noise for the other cases tracked
in the regression tests of PGSS.

What do you think?

I mostly now think the class of bug can be fixed by ensuring we never
ignore any jumble field for any reason and always put at least 1 byte
onto the buffer with some sort of entropy. I'm keen to understand if
I'm missing some case that you've thought about that I've not.

David

#14Bykov Ivan
i.bykov@modernsys.ru
In reply to: David Rowley (#13)
2 attachment(s)

Hello!

Since then, I see that Ivan
has already submitted a patch that accounts for NULL nodes and adds a
byte to the jumble buffer to account for NULLs. This seems quite clean
and simple. However, Sami seems to have concerns about the overhead of
doing this. Is that warranted at all? Potentially, there could be a
decent number of NULL fields. It'll probably be much cheaper than the
offsetof idea I came up with.

To address the concern that NULL marks could consume a lot of bytes in the
jumble buffer, I have added a new patch version for Variant B
(v2-0001-Query-ID-Calculation-Fix-Variant-B.patch).

This patch adds to JumbleState a count of the NULL nodes visited since the
last non-NULL one. The least significant byte of this count is appended to
the jumble buffer whenever the count is nonzero (indicating that we have
visited NULL nodes since the last append), together with the next chunk of
data appended to the buffer. I intentionally did not add a final append of
the NULL count at the end of JumbleQuery processing, as trailing NULL nodes
of the query do not improve query ID collision resistance.

I believe that my first version, Variant B of the patch, is simple and easy to
understand. I would prefer to use the first version of the Variant B patch,
so I have attached an updated version along with tests
(v3-0001-Query-ID-Calculation-Fix-Variant-B.patch).

As you can see, v2-0001-Query-ID-Calculation-Fix-Variant-B.patch is more
complex and invasive than v3-0001-Query-ID-Calculation-Fix-Variant-B.patch.
I don't think that saving a few bytes in a 1024-byte buffer is worth the
extra complexity, even for this simple optimization (v2).

Can you share your thoughts on Ivan's NULL jumble idea?

I would appreciate any feedback. Thank you.

-----Original Message-----
From: David Rowley <dgrowleyml@gmail.com>
Sent: Tuesday, March 11, 2025 12:49 PM
To: Michael Paquier <michael@paquier.xyz>
Cc: Быков Иван Александрович <i.bykov@modernsys.ru>; Sami Imseih <samimseih@gmail.com>; pgsql-hackers@postgresql.org
Subject: Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, 11 Mar 2025 at 19:50, Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Mar 11, 2025 at 12:45:35AM +1300, David Rowley wrote:

It seems to me that if this fixes the issue, that the next similar
one is already lurking in the shadows waiting to jump out on us.

For how many years will we have to wait for this threat hidden in the
shadows? :)

This issue exists since the query jumbling exists in pgss, so it goes
really down the road. I've spent a bit more than a few minutes on
that.

I didn't mean to cause offence here. I just wanted to make it clear that I don't think fixing these types of issues by swapping the order of the fields is going to be a sustainable practice, and it would be good to come up with a solution that eliminates the chances of this class of bug from ever appearing again. Even if we were to trawl the entire Query struct and everything that could ever be attached to it and we deem that today it's fine and no more such bugs exist, the person adding some new SQL feature in the future that adds new data to store in the Query probably isn't going to give query jumbling much thought, especially now that the code generation is automatic.

Couldn't we adjust all this code so that we pass a seed to
AppendJumble() that's the offsetof(Struct, field) that we're
jumbling and incorporate that seed into the hash? I don't think we
could possibly change the offset in a minor version, so I don't
think there's a risk that could cause jumble value instability.
Possibly different CPU architectures would come up with different
offsets through having different struct alignment requirements, but
we don't make any guarantees about the jumble values matching across
different CPU architectures anyway.

Yeah, something like that has potential to get rid of such problems
forever, particularly thanks to the fact that we automate the
generation of this code now so it is mostly cost-free. This has a
cost for the custom jumbling functions where one needs some
imagination, but with models being around while we keep the number of
custom functions low, that should be neither complicated nor costly,
at least in my experience.

I hadn't really processed this thread fully when I responded yesterday and my response was mostly aimed at the latest patch on the thread at the time, which I had looked at quickly and wanted to put a stop to changing field orders as a fix for this. Since then, I see that Ivan has already submitted a patch that accounts for NULL nodes and adds a byte to the jumble buffer to account for NULLs. This seems quite clean and simple. However, Sami seems to have concerns about the overhead of doing this. Is that warranted at all? Potentially, there could be a decent number of NULL fields. It'll probably be much cheaper than the offsetof idea I came up with.

When I was thinking about the addition of the offset to the jumbling
yesterday, I was wondering about two things:
- if there are some node structures where it could not work.
- if we'd need to pass down some information of the upper node when
doing the jumbling of a node attached to it, which would make the code
generation much more annoying.

But after sleeping on it I think that these two points are kind of
moot: having only the offset passed down to a single _jumbleNode()
with the offset compiled at its beginning should be sufficient. Using
"seed" as a term is perhaps a bit confusing, because this is an offset
in the upper node, but perhaps that's OK as-is. It's just more
entropy addition. If somebody has a better idea for this term, feel
free.

Can you share your thoughts on Ivan's NULL jumble idea?

_jumbleList() is an interesting one. If we want an equivalent of the
offset, this could use the item number in the list which would be a
rather close equivalent to the offset used elsewhere. For the int,
oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for
each member, apply the offset at the beginning of _jumbleList(), once,
I guess.

Is this affected by the same class of bug that we're aiming to fix here? (This class being not always jumbling all fields because of NULLs or some other value, resulting in the next jumble field applying the same bytes to the buffer as the previous field would have, had it not been ignored)

_jumbleVariableSetStmt() should be OK with the offset of "arg" in
VariableSetStmt. The addition of hash_combine64() to count for the
offset should be OK.

With that in mind, the cases with DISTINCT/ORDER BY and OFFSET/LIMIT
seem to work fine, without causing noise for the other cases tracked
in the regression tests of PGSS.

What do you think?

I mostly now think the class of bug can be fixed by ensuring we never ignore any jumble field for any reason and always put at least 1 byte onto the buffer with some sort of entropy. I'm keen to understand if I'm missing some case that you've thought about that I've not.

David

Attachments:

v2-0001-Query-ID-Calculation-Fix-Variant-B.patchapplication/octet-stream; name=v2-0001-Query-ID-Calculation-Fix-Variant-B.patchDownload
From e2c1f7d8876e11c28bfc4a9478080712a0868027 Mon Sep 17 00:00:00 2001
From: Bykov Ivan <i.bykov@modernsys.ru>
Date: Tue, 11 Mar 2025 14:34:21 +0500
Subject: [PATCH] Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / 
 OFFSET

In some cases, we may encounter different query trees with the same IDs.
For two structurally similar query subnodes, the query trees may look like this:

QueryA->subNodeOne = Value X;
QueryA->subNodeTwo = NULL;
QueryB->subNodeOne = NULL;
QueryB->subNodeTwo = Value X;

When the query jumble walker calculates the query ID, it traverses
the query members from top to bottom and generates the same IDs
for these two queries because it does not change the hash value
when visiting an empty node (i.e., NULL).

There are two pairs of subnodes that can trigger this error:
- distinctClause and sortClause (both contain a list of SortGroupClauses);
- limitOffset and limitCount (both contain an int8 expression).

To fix this problem, we count NULL nodes and add that count to the query
jumble buffer every time we visit a non-NULL field.
---
 .../pg_stat_statements/expected/select.out    | 49 ++++++++++++++++++
 contrib/pg_stat_statements/sql/select.sql     | 18 +++++++
 src/backend/nodes/queryjumblefuncs.c          | 50 +++++++++++++++----
 src/include/nodes/queryjumble.h               |  3 ++
 4 files changed, 111 insertions(+), 9 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index dd6c756f67d..f1b71f56b7b 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -406,9 +406,58 @@ SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
      2
 (1 row)
 
+CREATE TABLE pgss_c AS
+  SELECT id FROM generate_series(1,2) AS id;
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t 
 ---
  t
 (1 row)
 
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT DISTINCT id FROM pgss_c;
+ id 
+----
+  2
+  1
+(2 rows)
+
+SELECT id FROM pgss_c ORDER BY id;
+ id 
+----
+  1
+  2
+(2 rows)
+
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT id FROM pgss_c LIMIT 1;
+ id 
+----
+  1
+(1 row)
+
+SELECT id FROM pgss_c OFFSET 1;
+ id 
+----
+  2
+(1 row)
+
+-- Expected four rows (Query ID collision has been fixed).
+SELECT query FROM pg_stat_statements
+  WHERE query LIKE '%pgss_c%'
+  ORDER BY query COLLATE "C";
+               query               
+-----------------------------------
+ SELECT DISTINCT id FROM pgss_c
+ SELECT id FROM pgss_c LIMIT $1
+ SELECT id FROM pgss_c OFFSET $1
+ SELECT id FROM pgss_c ORDER BY id
+(4 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+DROP TABLE pgss_c;
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index eb45cb81ad2..bbc37fa0005 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -146,4 +146,22 @@ SELECT (
 ) FROM (VALUES(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
 
 SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
+
+CREATE TABLE pgss_c AS
+  SELECT id FROM generate_series(1,2) AS id;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT DISTINCT id FROM pgss_c;
+SELECT id FROM pgss_c ORDER BY id;
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT id FROM pgss_c LIMIT 1;
+SELECT id FROM pgss_c OFFSET 1;
+
+-- Expected four rows (Query ID collision has been fixed).
+SELECT query FROM pg_stat_statements
+  WHERE query LIKE '%pgss_c%'
+  ORDER BY query COLLATE "C";
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
+DROP TABLE pgss_c;
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 129fb447099..cf6d43259c0 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -51,6 +51,7 @@ int			compute_query_id = COMPUTE_QUERY_ID_AUTO;
  */
 bool		query_id_enabled = false;
 
+static inline Size CompressJumble(unsigned char *jumble, Size jumble_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void RecordConstLocation(JumbleState *jstate, int location);
@@ -118,6 +119,7 @@ JumbleQuery(Query *query)
 		palloc(jstate->clocations_buf_size * sizeof(LocationLen));
 	jstate->clocations_count = 0;
 	jstate->highest_extern_param_id = 0;
+	jstate->null_sequence_number = 0;
 
 	/* Compute query ID and mark the Query node with it */
 	_jumbleNode(jstate, (Node *) query);
@@ -153,6 +155,26 @@ EnableQueryId(void)
 		query_id_enabled = true;
 }
 
+
+/*
+ * CompressJumble: Check whether the jumble buffer has overflowed.
+ * If it has, calculate the hash and store it in buffer.
+ */
+static inline Size
+CompressJumble(unsigned char *jumble, Size jumble_len)
+{
+	if (unlikely(jumble_len >= JUMBLE_SIZE))
+	{
+		uint64		start_hash;
+
+		start_hash = DatumGetUInt64(hash_any_extended(jumble,
+													  JUMBLE_SIZE, 0));
+		memcpy(jumble, &start_hash, sizeof(start_hash));
+		return sizeof(start_hash);
+	}
+	return jumble_len;
+}
+
 /*
  * AppendJumble: Append a value that is substantive in a given query to
  * the current jumble.
@@ -172,21 +194,28 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 	{
 		Size		part_size;
 
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
+		jumble_len = CompressJumble(jumble, jumble_len);
 		part_size = Min(size, JUMBLE_SIZE - jumble_len);
 		memcpy(jumble + jumble_len, item, part_size);
 		jumble_len += part_size;
 		item += part_size;
 		size -= part_size;
 	}
+
+	/*
+	 * To prevent Query ID collisions, add to the query jumble information the
+	 * count of visited NULL nodes since the last non-NULL node was hashed.
+	 * This approach consumes less jumble memory than simply adding a NULL
+	 * marker for every NULL node field.
+	 */
+	if (jstate->null_sequence_number)
+	{
+		jumble_len = CompressJumble(jumble, jumble_len);
+		jumble[jumble_len] = jstate->null_sequence_number & 0xFF;
+		jstate->null_sequence_number = 0;
+		++jumble_len;
+	}
+
 	jstate->jumble_len = jumble_len;
 }
 
@@ -238,7 +267,10 @@ _jumbleNode(JumbleState *jstate, Node *node)
 	Node	   *expr = node;
 
 	if (expr == NULL)
+	{
+		++jstate->null_sequence_number;
 		return;
+	}
 
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index f1c55c8067f..81dba02f08c 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -48,6 +48,9 @@ typedef struct JumbleState
 
 	/* highest Param id we've seen, in order to start normalization correctly */
 	int			highest_extern_param_id;
+
+	/* count of visited NULL nodes since the last non-NULL node was hashed */
+	int			null_sequence_number;
 } JumbleState;
 
 /* Values for the compute_query_id GUC */
-- 
2.39.5

v3-0001-Query-ID-Calculation-Fix-Variant-B.patchapplication/octet-stream; name=v3-0001-Query-ID-Calculation-Fix-Variant-B.patchDownload
From 098d136c72b948ab7517192c46b2e9d569c0f15c Mon Sep 17 00:00:00 2001
From: Bykov Ivan <i.bykov@modernsys.ru>
Date: Tue, 11 Mar 2025 14:24:24 +0500
Subject: [PATCH] Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / 
 OFFSET

In some cases, we may encounter different query trees with the same IDs.
For two structurally similar query subnodes, the query trees may
look like this:

QueryA->subNodeOne = Value X;
QueryA->subNodeTwo = NULL;
QueryB->subNodeOne = NULL;
QueryB->subNodeTwo = Value X;

When the query jumble walker calculates the query ID, it traverses
the Query members from top to bottom and generates the same IDs
for these two queries because it does not change the hash value
when visiting an empty node (= NULL).

There are two pairs of subnodes that can trigger this error:
- distinctClause and sortClause (both contain a list of SortGroupClauses);
- limitOffset and limitCount (both contain an int8 expression).

To fix this problem, for every empty node in the Query tree, a '0' character
is added to the jumble buffer, which is used for ID calculation.
---
 .../pg_stat_statements/expected/select.out    | 49 +++++++++++++++++++
 contrib/pg_stat_statements/sql/select.sql     | 18 +++++++
 src/backend/nodes/queryjumblefuncs.c          |  5 ++
 3 files changed, 72 insertions(+)

diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index dd6c756f67d..f1b71f56b7b 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -406,9 +406,58 @@ SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
      2
 (1 row)
 
+CREATE TABLE pgss_c AS
+  SELECT id FROM generate_series(1,2) AS id;
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t 
 ---
  t
 (1 row)
 
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT DISTINCT id FROM pgss_c;
+ id 
+----
+  2
+  1
+(2 rows)
+
+SELECT id FROM pgss_c ORDER BY id;
+ id 
+----
+  1
+  2
+(2 rows)
+
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT id FROM pgss_c LIMIT 1;
+ id 
+----
+  1
+(1 row)
+
+SELECT id FROM pgss_c OFFSET 1;
+ id 
+----
+  2
+(1 row)
+
+-- Expected four rows (Query ID collision has been fixed).
+SELECT query FROM pg_stat_statements
+  WHERE query LIKE '%pgss_c%'
+  ORDER BY query COLLATE "C";
+               query               
+-----------------------------------
+ SELECT DISTINCT id FROM pgss_c
+ SELECT id FROM pgss_c LIMIT $1
+ SELECT id FROM pgss_c OFFSET $1
+ SELECT id FROM pgss_c ORDER BY id
+(4 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+DROP TABLE pgss_c;
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index eb45cb81ad2..bbc37fa0005 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -146,4 +146,22 @@ SELECT (
 ) FROM (VALUES(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
 
 SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
+
+CREATE TABLE pgss_c AS
+  SELECT id FROM generate_series(1,2) AS id;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT DISTINCT id FROM pgss_c;
+SELECT id FROM pgss_c ORDER BY id;
+-- These two queries trigger a Query ID collision in PostgreSQL versions prior to 18
+SELECT id FROM pgss_c LIMIT 1;
+SELECT id FROM pgss_c OFFSET 1;
+
+-- Expected four rows (Query ID collision has been fixed).
+SELECT query FROM pg_stat_statements
+  WHERE query LIKE '%pgss_c%'
+  ORDER BY query COLLATE "C";
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
+DROP TABLE pgss_c;
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 129fb447099..225a49ebc3c 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -238,7 +238,12 @@ _jumbleNode(JumbleState *jstate, Node *node)
 	Node	   *expr = node;
 
 	if (expr == NULL)
+	{
+		const unsigned char null_symbol = '\0';
+
+		AppendJumble(jstate, &null_symbol, sizeof(unsigned char));
 		return;
+	}
 
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();
-- 
2.39.5

#15Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#13)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, Mar 11, 2025 at 08:48:33PM +1300, David Rowley wrote:

On Tue, 11 Mar 2025 at 19:50, Michael Paquier <michael@paquier.xyz> wrote:

This issue exists since the query jumbling exists in pgss, so it goes
really down the road. I've spent a bit more than a few minutes on
that.

I didn't mean to cause offence here. I just wanted to make it clear
that I don't think fixing these types of issues by swapping the order
of the fields is going to be a sustainable practice, and it would be
good to come up with a solution that eliminates the chances of this
class of bug from ever appearing again. Even if we were to trawl the
entire Query struct and everything that could ever be attached to it
and we deem that today it's fine and no more such bugs exist, the
person adding some new SQL feature in the future that adds new data to
store in the Query probably isn't going to give query jumbling much
thought, especially now that the code generation is automatic.

FWIW, I've mentioned the use of the offset in a Node upthread, in the
next-to-last paragraph of this email:
/messages/by-id/Z84mhjm5RtWtLG4X@paquier.xyz

One thing I did not notice yesterday was that it is possible to rely
on _jumbleNode() to pass an offset from the parent. So it looks like
we had roughly the same idea independently, but I was not able to look
more closely at that yesterday.

But after sleeping on it I think that these two points are kind of
moot: having only the offset passed down to a single _jumbleNode()
with the offset compiled at its beginning should be sufficient. Using
"seed" as a term is perhaps a bit confusing, because this is an offset
in the upper node, but perhaps that's OK as-is. It's just more
entropy addition. If somebody has a better idea for this term, feel
free.

Can you share your thoughts on Ivan's NULL jumble idea?

This is an incomplete solution, I am afraid. The origin of the
problem is that a Node (here, Query) stores two structurally
equivalent Nodes in succession that are used for separate data, with
NULL being the only difference between them. The problem is not
limited to NULL: we could also, for example, end up with a Node that
has a custom jumbling function and hit the same issue depending on how
it is used in a parent Node. Adding the offset from the parent into
the jumbling is a much stronger insurance policy than the proposed
solution specific to NULL, because each offset is unique within a
single Node.

_jumbleList() is an interesting one. If we want an equivalent of the
offset, this could use the item number in the list which would be a
rather close equivalent to the offset used elsewhere. For the int,
oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for
each member, apply the offset at the beginning of _jumbleList(), once,
I guess.

Is this affected by the same class of bug that we're aiming to fix
here? (This class being not always jumbling all fields because of
NULLs or some other value, resulting in the next jumble field applying
the same bytes to the buffer as the previous field would have, had it
not been ignored)

Yep, that could be possible. I am not sure if it can be reached
currently with the set of parse nodes, but in theory it could happen
with a list of Nodes, at least.
--
Michael

#16Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#15)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

and simple. However, Sami seems to have concerns about the overhead of
doing this. Is that warranted at all? Potentially, there could be a
decent number of NULL fields. It'll probably be much cheaper than the
offsetof idea I came up with.

I have not benchmarked the overhead, so maybe there is not much to be
concerned about. However, tracking extra data in all cases just to
deal with a few corner cases does not seem like the correct approach
to me. This is what makes variant A the most attractive approach.

--

Sami

#17Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#16)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, Mar 11, 2025 at 05:35:10PM -0500, Sami Imseih wrote:

I have not benchmarked the overhead, so maybe there is not much to be
concerned about. However, tracking extra data in all cases just to
deal with a few corner cases does not seem like the correct approach
to me. This is what makes variant A the most attractive approach.

I suspect that the overhead will be minimal for all the approaches I'm
seeing on this thread, but it would not hurt to double-check all that.
As the overhead of a single query jumbling is weightless compared to
the overall query processing, the fastest method I've used in this
area is a micro-benchmark with a hardcoded loop in JumbleQuery() with
some rusage calls to get a few more metrics. This exaggerates the
query jumbling cost, but it's good enough to see a difference once you
take an average of the time taken for each loop.
--
Michael

#18David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#15)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, 11 Mar 2025 at 23:28, Michael Paquier <michael@paquier.xyz> wrote:

Can you share your thoughts on Ivan's NULL jumble idea?

This is an incomplete solution, I am afraid. The origin of the
problem is that a Node (here, Query) stores two structurally
equivalent Nodes in succession that are used for separate data, with
NULL being the only difference between them. The problem is not
limited to NULL: we could also, for example, end up with a Node that
has a custom jumbling function and hit the same issue depending on how
it is used in a parent Node. Adding the offset from the parent into
the jumbling is a much stronger insurance policy than the proposed
solution specific to NULL, because each offset is unique within a
single Node.

One of us is not understanding the problem correctly. I'm very open
for that to be me, but I'll need a bit more explanation for me to know
that for sure.

As far as I see it, providing we ensure we always add something with a
bit of entropy to the jumble buffer for all possible values that a
jumble field can be, this fixes the issue. The problem only occurs at
the moment because we entirely skip over NULL node fields and we end
up with the same jumble bytes if we effectively move the same List of
nodes between the Query's distinctClause and sortClause fields.

If we add something for NULLs, we'll always add distinctClause before
the sortClause. If the distinctClause is NULL we won't end up with the
same jumble bytes as if the sortClause were NULL, as the order the
NULL byte(s) are added to the buffer changes.

The fragment of the Query struct looks like:

List *windowClause; /* a list of WindowClause's */

List *distinctClause; /* a list of SortGroupClause's */

List *sortClause; /* a list of SortGroupClause's */

Let's assume we have a non-NIL windowClause, a non-NIL distinctClause
and a NIL sortClause. The jumble bytes currently look like:
"<windowClause bytes><distinctClause bytes>", and if we have an ORDER
BY instead of a DISTINCT, we get: "<windowClause bytes><sortClause
bytes>", and this is problematic when the distinctClause bytes and the
sortClause bytes are the same, as that results in the same hash.

If we account for the NULLs, those two cases become: "<windowClause
bytes><distinctClause bytes><byte for NULL sortClause>" and
"<windowClause bytes><byte for NULL distinct clause><sortClause
bytes>", which is going to be highly unlikely to hash to the same
value due to the buffer not being the same.

Can you explain where my understanding is wrong?

David

#19Sami Imseih
samimseih@gmail.com
In reply to: David Rowley (#18)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

If we add something for NULLs, we'll always add distinctClause before
the sortClause. If the distinctClause is NULL we won't end up with the
same jumble bytes as if the sortClause were NULL, as the order the
NULL byte(s) are added to the buffer changes.

That is how I understand it as well. By appending a single null-terminator
character to the jumble for every NULL expression, we change the composition
of the final jumble. Passing offsets would accomplish the same thing, but I
can't see how it buys us any additional safeguards.

I suspect that the overhead will be minimal for all the approaches I'm
seeing on this thread, but it would not hurt to double-check all that.

Perhaps my concern is overblown. Also, it seems the consensus is for a more
comprehensive solution than that of variant A.

Here is an idea I was playing around with: why don't we just append a null
terminator for every expression (a variant of variant B)? I describe this
as jumbling the expression position. And if we do that, it no longer seems
necessary to jumble the expression type either, right?

+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -236,6 +236,13 @@ do { \
 	if (expr->str) \
 		AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1); \
 } while(0)
+#define JUMBLE_POSITION() \
+do { \
+	char nullterm = '\0'; \
+	AppendJumble(jstate, (const unsigned char *) &(nullterm), sizeof(nullterm)); \
+	if (expr == NULL) \
+		return; \
+} while(0)
 
 #include "queryjumblefuncs.funcs.c"
 
@@ -244,17 +251,15 @@ _jumbleNode(JumbleState *jstate, Node *node)
 {
 	Node	   *expr = node;
 
-	if (expr == NULL)
-		return;
-
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();
 
 	/*
-	 * We always emit the node's NodeTag, then any additional fields that are
-	 * considered significant, and then we recurse to any child nodes.
+	 * We represent a position of a field in the jumble with a null-term.
+	 * Doing so ensures that even NULL expressions are accounted for in
+	 * the jumble.
 	 */
-	JUMBLE_FIELD(type);
+	JUMBLE_POSITION();
 
 	switch (nodeTag(expr))
 	{

As the overhead of a single query jumbling is weightless compared to
the overall query processing

Yeah, that is a good point. At least, benchmarking the above with a
SELECT-only pgbench run did not reveal any regression.

--
Sami Imseih
Amazon Web Services (AWS)

#20David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#13)
2 attachment(s)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, 11 Mar 2025 at 20:48, David Rowley <dgrowleyml@gmail.com> wrote:

Since then, I see that Ivan
has already submitted a patch that accounts for NULL nodes and adds a
byte to the jumble buffer to account for NULLs. This seems quite clean
and simple. However, Sami seems to have concerns about the overhead of
doing this. Is that warranted at all? Potentially, there could be a
decent number of NULL fields. It'll probably be much cheaper than the
offsetof idea I came up with.

To try and get some timing information about the overhead added for
jumbling the NULLs, I compared master and the patch at [1] using the
attached jumbletime.patch.txt. The patch just adds an additional call
to JumbleQuery and times how long it takes and outputs the result in
nanoseconds. Running make check has the advantage of trying out many
different queries.

On an AMD 3990x machine, the results for jumbling all make check queries are:

master: 73.66 milliseconds
0001-Query-ID-Calculation-Fix-Variant-B.patch: 80.36 milliseconds

So that adds about 9.1% overhead to jumbling, on average.

See the attached jumble_results.txt for full results and the code to
extract the results.

David

[1]: /messages/by-id/attachment/173439/0001-Query-ID-Calculation-Fix-Variant-B.patch

Attachments:

jumbletime.patch.txt (text/plain)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 55ab2da299b..cc03458fcba 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -873,6 +873,19 @@ pg_rewrite_query(Query *query)
 	return querytree_list;
 }
 
+#define NANOSEC_PER_SEC 1000000000
+
+#include <time.h>
+
+/* Returns difference between t1 and t2 in nanoseconds */
+static int64
+get_clock_diff(struct timespec *t1, struct timespec *t2)
+{
+	int64 nanosec = (t1->tv_sec - t2->tv_sec) * NANOSEC_PER_SEC;
+	nanosec += (t1->tv_nsec - t2->tv_nsec);
+
+	return nanosec;
+}
 
 /*
  * Generate a plan for a single already-rewritten query.
@@ -883,6 +896,8 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
 			  ParamListInfo boundParams)
 {
 	PlannedStmt *plan;
+	struct timespec start, end;
+	int64 jumbletime;
 
 	/* Utility commands have no plans. */
 	if (querytree->commandType == CMD_UTILITY)
@@ -896,6 +911,12 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
 	if (log_planner_stats)
 		ResetUsage();
 
+	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
+	JumbleQuery(querytree);
+	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
+	jumbletime = get_clock_diff(&end, &start);
+	elog(NOTICE, "QueryJumble in %ld nanoseconds", jumbletime);
+
 	/* call the optimizer */
 	plan = planner(querytree, query_string, cursorOptions, boundParams);
 
jumble_results.txt (text/plain)
#21Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#18)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Wed, Mar 12, 2025 at 03:28:27PM +1300, David Rowley wrote:

If we account for the NULLs, those two cases become: "<windowClause
bytes><distinctClause bytes><byte for NULL sortClause>" and
"<windowClause bytes><byte for NULL distinct clause><sortClause
bytes>", which is going to be highly unlikely to hash to the same
value due to the buffer not being the same.

Can you explain where my understanding is wrong?

(I am a bit busy this week, sorry for the delay)

I have looked at the patches sent at
/messages/by-id/5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain,
and they don't really address what I'm trying to point at here.

Custom jumbling functions could still be problematic, making a NULL
check at the beginning of _jumbleNode() ineffective.

Here is an example, possible by design, that would still be a problem:
static void
_jumbleNodeFoo(JumbleState *jstate, Node *node)
{
Foo *expr = (Foo *) node;

if (expr->field1 == 0)
return;

JUMBLE_FIELD(field2);
[..]
}

typedef struct Foo
{
pg_node_attr(custom_query_jumble)

NodeTag type;
int field1;
char *field2;
[..]
}

Then we have the same problem with another Node using this Foo node
two times in a row, depending on how it's used by the query parsing
and transform:
typedef struct FooBar
{
pg_node_attr(custom_query_jumble)

NodeTag type;
Foo *foo1;
Foo *foo2;
[..]
}

Adding a depth level incremented once we go through a Node would add
more entropy, but it may not be completely bullet-proof either, even
if we add an offset. I am pretty sure I could find a Node structure
that acts as a counterexample even if you add a depth level and an
offset in the jumbling..
--
Michael

#22Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#21)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

Then we have the same problem with another Node using this Foo node
two times in a row, depending on how it's used by the query parsing
and transform:

hmm, it's hard to imagine such a case being real-world and useful for
query jumbling purposes. Also, I think if we introduce something like
JUMBLE_NULL() or JUMBLE_POSITION(), the author of a custom jumbling
function should use it. This could be added in code comments somewhere, IMO.

--
Sami

#23Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#22)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Thu, Mar 13, 2025 at 07:17:20PM -0500, Sami Imseih wrote:

Then we have the same problem with another Node using this Foo node
two times in a row, depending on how it's used by the query parsing
and transform:

hmm, it's hard to imagine such a case being real-world and useful for
query jumbling purposes.

Perhaps, but the code allows that by design, and people like forking
Postgres.

Also, I think if we introduce something like
JUMBLE_NULL() or JUMBLE_POSITION(), the author of a custom jumbling
function should use it. This could be added in code comments somewhere, IMO.

FWIW, another idea I have on top of my mind is the addition of a
counter in JumbleState that we increment each time we enter
_jumbleNode(), then simply call JUMBLE_FIELD_SINGLE() after the
incrementation. And we can rely on this counter to be unique each
time we jumble a node..
--
Michael

#24Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#23)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

FWIW, another idea I have on top of my mind is the addition of a
counter in JumbleState that we increment each time we enter
_jumbleNode(), then simply call JUMBLE_FIELD_SINGLE() after the
incrementation. And we can rely on this counter to be unique each
time we jumble a node..

With this approach, the author of the custom function will need
to remember to increment the counter, right?

Also, I am wondering what the difference is between appending an
incrementing counter in JumbleState vs just appending a constant for
every node (NULL or NOT NULL)?

appending some constant or null-term to the hash:
<expression 1> <0><expression 2> <0> <0>

vs

appending an incremented value:
expression_1 <1> expression_2 <2> <3>

Both will do the job of adding the required entropy

--
Sami

#25Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#24)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Mar 14, 2025, at 8:15, Sami Imseih <samimseih@gmail.com> wrote:

FWIW, another idea I have on top of my mind is the addition of a
counter in JumbleState that we increment each time we enter
_jumbleNode(), then simply call JUMBLE_FIELD_SINGLE() after the
incrementation. And we can rely on this counter to be unique each
time we jumble a node..

With this approach, the author of the custom function will need
to remember to increment the counter, right?

Actually, no. _jumbleNode() is the unique entry point we use when jumbling a sub-Node in a bigger Node structure, so custom functions don’t need to touch it.

Also, I am wondering what is the difference between appending an incrementing
counter in JState vs just appending a constant for every node ( NULL
or NOT NULL ) ?

Well, the problem is not NULL vs non-NULL, so..

Michael

#26David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#25)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Fri, 14 Mar 2025 at 14:51, Michael Paquier <michael@paquier.xyz> wrote:

On Mar 14, 2025, at 8:15, Sami Imseih <samimseih@gmail.com> wrote:

FWIW, another idea I have on top of my mind is the addition of a
counter in JumbleState that we increment each time we enter
_jumbleNode(), then simply call JUMBLE_FIELD_SINGLE() after the
incrementation. And we can rely on this counter to be unique each
time we jumble a node..

With this approach, the author of the custom function will need
to remember to increment the counter, right?

Actually, no. _jumbleNode() is the unique entry point we use when jumbling a sub-Node in a bigger Node structure, so custom functions don’t need to touch it.

Err, what about non-node types? Those'll go to AppendJumble().

I think basically everyone here apart from you is having a hard time
understanding what additional benefits your counter solution brings
over just ensuring we always AppendJumble with something, regardless
of the field's value. I do want to understand what you're concerned
about but you've demonstrated nothing to us about the "always jumble
something" idea that breaks. Your example custom function broke that
rule as it skipped doing anything when "expr->field1 == 0".

Anyway, let's see your patch so we can judge actual code rather than
waste our time arguing over assumptions about what the hypothetical
code is and does.

David

#27Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#26)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Mar 14, 2025, at 9:27, David Rowley <dgrowleyml@gmail.com> wrote:

Err, what about non-node types? Those'll go to AppendJumble().

I think basically everyone here apart from you is having a hard time
understanding what additional benefits your counter solution brings
over just ensuring we always AppendJumble with something, regardless
of the field's value. I do want to understand what you're concerned
about but you've demonstrated nothing to us about the "always jumble
something" idea that breaks. Your example custom function broke that
rule as it skipped doing anything when "expr->field1 == 0".

Because what I’ve mentioned is the kind of case I wanted to support when designing this facility, keeping in mind that we may use this stuff for more than just Querys. If you’d rather discard the argument I’ve made upthread, well, I guess that you’re free to do so and bypass any comments I have. Now I do think that what’s been sent does not check all the boxes if we want a solution other than shifting the fields of a Node.

Anyway, let's see your patch so we can judge actual code rather than
waste our time arguing over assumptions about what the hypothetical
code is and does.

Sure. I’m going to need a few days for that, though :)

Michael

#28David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#27)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Fri, 14 Mar 2025 at 15:46, Michael Paquier <michael@paquier.xyz> wrote:

On Mar 14, 2025, at 9:27, David Rowley <dgrowleyml@gmail.com> wrote:
I think basically everyone here apart from you is having a hard time
understanding what additional benefits your counter solution brings
over just ensuring we always AppendJumble with something, regardless
of the field's value. I do want to understand what you're concerned
about but you've demonstrated nothing to us about the "always jumble
something" idea that breaks. Your example custom function broke that
rule as it skipped doing anything when "expr->field1 == 0".

Because what I’ve mentioned is the kind of case I wanted to support when designing this facility, keeping in mind that we may use this stuff for more than just Querys. If you’d rather discard the argument I’ve made upthread, well, I guess that you’re free to do so and bypass any comments I have. Now I do think that what’s been sent does not check all the boxes if we want a solution other than shifting the fields of a Node.

I don't think I'm discarding them... As far as I'm aware, your
remaining concern is with custom jumble functions and you showed an
example in [1] of a hypothetical jumble function that could cause the
same bug as this thread is discussing. My response to that was that
the custom jumble function is broken and should be fixed, which seems
to me to be analogous to Ivan's proposal to fix _jumbleNode() to do
something with NULL pointers.

I now can't tell which of the following is true: 1) I've missed one of
your concerns, or; 2) you want to find a way to make badly coded
custom jumble functions not suffer from this bug, or; 3) you think
that adding jumble bytes unconditionally regardless of the field's
value still does not fix the bug in question, or; 4) something else.
It would be good to know which of these it is.

David

[1]: /messages/by-id/Z9NrnVk7MtMe8uNF@paquier.xyz

#29Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#28)
1 attachment(s)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Fri, Mar 14, 2025 at 04:14:25PM +1300, David Rowley wrote:

I don't think I'm discarding them... As far as I'm aware, your
remaining concern is with custom jumble functions and you showed an
example in [1] of a hypothetical jumble function that could cause the
same bug as this thread is discussing. My response to that was that
the custom jumble function is broken and should be fixed, which seems
to me to be analogous to Ivan's proposal to fix _jumbleNode() to do
something with NULL pointers.

Okay, that's where our opinions diverge. So, well, please let me
reformulate that with some different words and terms. This is not a
problem of having a NULL node vs a non-NULL node in the jumbling,
because by design it is possible for one to decide whether a Node can
be included in the jumbling depending on its internal values,
particularly if it gets used across more than one type of parent node.
You are claiming that such functions are broken, but what I am saying
is that such a function can be OK depending on how one wants this code
to behave for their node structure and what they expect from it. So
I'm making a claim about keeping this stuff more flexible, while also
addressing the problem of the same node defined successively more than
once in a parent structure.

I now can't tell which of the following is true: 1) I've missed one of
your concerns, or; 2) you want to find a way to make badly coded
custom jumble functions not suffer from this bug, or; 3) you think
that adding jumble bytes unconditionally regardless of the field's
value still does not fix the bug in question, or; 4) something else.
It would be good to know which of these it is.

I hope that my opinion counts, as I've worked on this whole design
in 3db72ebcbe20, 8eba3e3f0208 and other related commits. FWIW, I
don't really see any disadvantage in supporting what I'm claiming is
OK, giving more flexibility to folks to do what they want with this
facility, if it also tackles this thread's problem with the Node
handling.

So, here is attached a counter-proposal, where we simply add a
counter tracking the node count in _jumbleNode() to add more entropy
to the mix, incrementing it as well for NULL nodes.

By the way, I was looking at the set of tests proposed upthread at
this message, which is as far as I know the latest version proposed:
/messages/by-id/5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain

The tests require neither a relation nor a stats reset, so let's
make them cheaper as I've proposed upthread and in the attached,
including checks with multiple queries and different constants to make
sure that these are correctly grouped in the pgss reports. The
null_sequence_number reset each time we run through a non-NULL node
in variant B reduces the entropy a bit, btw.. AppendJumble() would
also be the wrong level at which to add such a counter.

Anyway, perhaps we should have your extra byte '\0' or something
equivalent added to JUMBLE_STRING() when dealing with a NULL string
rather than ignoring it. There's an argument for simplicity, IMO,
still it is true that ignoring a NULL value reduces the entropy of the
query ID. So I'd be OK with that if you feel strongly about this
point, as it sounds like you do? This could be better than a
hardcoded '\0' byte as we could use a counter and
JUMBLE_FIELD_SINGLE() in the NULL case of JUMBLE_STRING(). It's not
proved to be a problem yet, and there's value in having a simpler
solution, as well. Anyway, one line I'd like to draw is that
appendJumble() remains as it is written currently. One extra thing
we could do is to update it so that the size of a given item is
always more than zero, with an Assert(), to enforce a stricter policy.
--
Michael

Attachments:

v4-0001-Add-more-entropy-to-query-jumbling.patch (text/x-diff)
From c78150fea9a49aa2ab06f4b03b1e5beea4457366 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 17 Mar 2025 09:38:22 +0900
Subject: [PATCH v4] Add more entropy to query jumbling

A counter tracking the number of nodes computed is added, and all nodes
use it in their computation.  This matters also for NULL nodes, that
were ignored from the computation up to now.
---
 src/include/nodes/queryjumble.h               |  7 ++
 src/backend/nodes/queryjumblefuncs.c          | 10 +++
 .../pg_stat_statements/expected/select.out    | 87 ++++++++++++++++++-
 contrib/pg_stat_statements/sql/select.sql     | 20 +++++
 4 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index 50eb95665872..aa1126f50162 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -37,6 +37,13 @@ typedef struct JumbleState
 	/* Number of bytes used in jumble[] */
 	Size		jumble_len;
 
+	/*
+	 * Number of Nodes included in the computation.  This counter is
+	 * incremented each time a node is added to the jumbling computation,
+	 * and is added to the jumbling to increase its entropy.
+	 */
+	int			node_count;
+
 	/* Array of locations of constants that should be removed */
 	LocationLen *clocations;
 
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index b103a2819366..ec74b92d5c3c 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -120,6 +120,7 @@ JumbleQuery(Query *query)
 	/* Set up workspace for query jumbling */
 	jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
 	jstate->jumble_len = 0;
+	jstate->node_count = 0;
 	jstate->clocations_buf_size = 32;
 	jstate->clocations = (LocationLen *)
 		palloc(jstate->clocations_buf_size * sizeof(LocationLen));
@@ -244,6 +245,15 @@ _jumbleNode(JumbleState *jstate, Node *node)
 {
 	Node	   *expr = node;
 
+	/*
+	 * Increment the node count, and add it to the jumbling.  This operation
+	 * is done before checking if a Node is NULL, so as even a NULL node is
+	 * counted in the computation, without depending on its data, with some
+	 * data that we know to be unique for each computation.
+	 */
+	jstate->node_count++;
+	JUMBLE_FIELD_SINGLE(jstate->node_count);
+
 	if (expr == NULL)
 		return;
 
diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 37a30af034a6..1587d2cafb3a 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -19,6 +19,86 @@ SELECT 1 AS "int";
    1
 (1 row)
 
+-- LIMIT and OFFSET patterns
+-- These require more entropy with parsing node offsets.
+SELECT 1 AS "int" LIMIT 1;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 1 AS "int" LIMIT 2;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 1 AS "int" OFFSET 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 1 LIMIT 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2 LIMIT 2;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" LIMIT 1 OFFSET 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" LIMIT 3 OFFSET 3;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 1 FETCH FIRST 2 ROW ONLY;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2 FETCH FIRST 3 ROW ONLY;
+ int 
+-----
+(0 rows)
+
+-- DISTINCT and ORDER BY patterns
+-- These require more entropy with parsing node offsets.
+SELECT DISTINCT 1 AS "int";
+ int 
+-----
+   1
+(1 row)
+
+SELECT DISTINCT 2 AS "int";
+ int 
+-----
+   2
+(1 row)
+
+SELECT 1 AS "int" ORDER BY 1;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 2 AS "int" ORDER BY 1;
+ int 
+-----
+   2
+(1 row)
+
 /* this comment should not appear in the output */
 SELECT 'hello'
   -- but this one will appear
@@ -135,9 +215,14 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
      3 |    3 | SELECT $1 + $2 + $3 AS "add"
      1 |    1 | SELECT $1 AS "float"
      2 |    2 | SELECT $1 AS "int"
+     2 |    2 | SELECT $1 AS "int" LIMIT $2
+     2 |    0 | SELECT $1 AS "int" OFFSET $2
+     6 |    0 | SELECT $1 AS "int" OFFSET $2 LIMIT $3
+     2 |    2 | SELECT $1 AS "int" ORDER BY 1
      1 |    2 | SELECT $1 AS i UNION SELECT $2 ORDER BY i
      1 |    1 | SELECT $1 || $2
      1 |    1 | SELECT $1, $2 LIMIT $3
+     2 |    2 | SELECT DISTINCT $1 AS "int"
      0 |    0 | SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"
      1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
      1 |    2 | WITH t(f) AS (                                                              +
@@ -145,7 +230,7 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
        |      | )                                                                           +
        |      |   SELECT f FROM t ORDER BY f
      1 |    1 | select $1::jsonb ? $2
-(12 rows)
+(17 rows)
 
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t 
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index e0be58d5e24b..4dcfa8ef74dc 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -12,6 +12,26 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 --
 SELECT 1 AS "int";
 
+-- LIMIT and OFFSET patterns
+-- These require more entropy with parsing node offsets.
+SELECT 1 AS "int" LIMIT 1;
+SELECT 1 AS "int" LIMIT 2;
+SELECT 1 AS "int" OFFSET 1;
+SELECT 1 AS "int" OFFSET 2;
+SELECT 1 AS "int" OFFSET 1 LIMIT 1;
+SELECT 1 AS "int" OFFSET 2 LIMIT 2;
+SELECT 1 AS "int" LIMIT 1 OFFSET 1;
+SELECT 1 AS "int" LIMIT 3 OFFSET 3;
+SELECT 1 AS "int" OFFSET 1 FETCH FIRST 2 ROW ONLY;
+SELECT 1 AS "int" OFFSET 2 FETCH FIRST 3 ROW ONLY;
+
+-- DISTINCT and ORDER BY patterns
+-- These require more entropy with parsing node offsets.
+SELECT DISTINCT 1 AS "int";
+SELECT DISTINCT 2 AS "int";
+SELECT 1 AS "int" ORDER BY 1;
+SELECT 2 AS "int" ORDER BY 1;
+
 /* this comment should not appear in the output */
 SELECT 'hello'
   -- but this one will appear
-- 
2.47.2

#30Bykov Ivan
i.bykov@modernsys.ru
In reply to: Michael Paquier (#29)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

Hello, Michael!

So, here is attached a counter-proposal, where we simply add a
counter tracking the node count in _jumbleNode() to add more entropy
to the mix, incrementing it as well for NULL nodes.

It definitely looks like a more reliable solution than my variant, which only
counts NULL nodes.

However, we already knew about the overhead of adding `\0` bytes for
every NULL field.

So that adds about 9.1% overhead to jumbling, on average.

See:
/messages/by-id/5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain

Is it really worth spending extra execution time to increase entropy
when we have non-NULL nodes?

Maybe we should choose to add node_count to the hash only when we
visit NULL nodes, or only when we visit non-NULL ones.
We could also add entropy when we see a change in the node->type value
for the non-NULL variants.

Your Variant
------------

< node_count = 1 > < node 1 >
< node_count = 2 > /* node 2 = NULL */
< node_count = 3 > < node 3 >

Alternative 1 (mark only NULL Nodes)
------------------------------------

/* node_count = 1 */ < node 1 >
< node_count = 2 > /* node 2 = NULL */
/* node_count = 3 */ < node 3 >

Alternative 2 (mark only non-NULL Nodes)
----------------------------------------
This could address concerns about problems related to visiting nodes with the
same content placed in different query tree branches.

< node_count = 1 > < node 1 >
/* node_count = 2 */ /* node 2 = NULL */
< node_count = 3 > < node 3 >

#31Michael Paquier
michael@paquier.xyz
In reply to: Bykov Ivan (#30)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Mon, Mar 17, 2025 at 07:33:42AM +0000, Bykov Ivan wrote:

See:
/messages/by-id/5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain

Is it really worth spending extra execution time to increase entropy
when we have non-NULL nodes?

The computation is already quite cheap, so I'm not much worried
about that, FWIW.

We could also add entropy if we see a change in the node->type value for
non-NULL variants.

I am not sure I get this one. The issue shows up if we have the same
Node computed successively as reported on this thread. It would still
be a problem for different node types, though less likely, when two
nodes with similar fields are used.
--
Michael

#32David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#29)
1 attachment(s)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Mon, 17 Mar 2025 at 13:53, Michael Paquier <michael@paquier.xyz> wrote:

So, here is attached a counter-proposal, where we simply add a
counter tracking the node count in _jumbleNode() to add more entropy
to the mix, incrementing it as well for NULL nodes.

I had a look at this and it seems the main difference will be that
this patch will protect against the case that a given node is non-null
but has a custom jumble function which selects to not jumble anything
in some cases. Since Ivan's patch only adds jumbling for non-null,
that could lead to the same bug if a custom jumble function decided to
not jumble anything at all.

FWIW, I did the same test to jumble all make check queries with your
patch, same as [1]:

master: 73.66 milliseconds
0001-Query-ID-Calculation-Fix-Variant-B.patch: 80.36 milliseconds
v4-0001-Add-more-entropy-to-query-jumbling.patch: 88.84 milliseconds

I know you've said in [2] that you're not worried about performance,
but I wanted to see if that's true... I can measure the cost of
compute_query_id=on with pgbench -S workload on my AMD 3990x machine.
I tried with that on "auto", then with "on" with master and again with
your v4 patch when set to "on":

(the -c 156 -j 156 is just there to ensure this machine properly
clocks up, which it still seems hesitant to do sometimes, even with
the performance power settings)

drowley@amd3990x:~$ echo master compute_query_id = auto && for i in
{1..10}; do pg_ctl stop -D pgdata > /dev/null && pg_ctl start -D
pgdata -l pg.log > /dev/null && pgbench -n -M simple -S -T 60 -c 156
-j 156 postgres | grep tps; done
master compute_query_id = auto
tps = 1202451.945998
tps = 1197645.856236
tps = 1182345.403291
tps = 1182440.562773
tps = 1180731.003390
tps = 1185277.478449
tps = 1174983.732094
tps = 1176310.828235
tps = 1179040.622103
tps = 1177068.520272
drowley@amd3990x:~$ echo master compute_query_id = on && for i in
{1..10}; do pg_ctl stop -D pgdata > /dev/null && pg_ctl start -D
pgdata -l pg.log > /dev/null && pgbench -n -M simple -S -T 60 -c 156
-j 156 postgres | grep tps; done
master compute_query_id = on
tps = 1158684.006821
tps = 1165808.752641
tps = 1158269.999683
tps = 1146730.269628
tps = 1154200.484194
tps = 1152750.152235
tps = 1150438.486321
tps = 1150649.669337
tps = 1144745.761593
tps = 1157743.530383
drowley@amd3990x:~$ echo
v4-0001-Add-more-entropy-to-query-jumbling.patch compute_query_id = on
&& for i in {1..10}; do pg_ctl stop -D pgdata > /dev/null && pg_ctl
start -D pgdata -l pg.log > /dev/null && pgbench -n -M simple -S -T 60
-c 156 -j 156 postgres | grep tps; done
v4-0001-Add-more-entropy-to-query-jumbling.patch compute_query_id = on
tps = 1156782.516710
tps = 1162780.019911
tps = 1142104.075069
tps = 1145651.299640
tps = 1143157.945891
tps = 1150275.033626
tps = 1145817.267485
tps = 1135915.694987
tps = 1138703.153873
tps = 1138436.802882

It's certainly hard to see much slowdown between master and v4, but it
is visible if I sort the results and put them in a line graph and
scale the vertical axis a bit. A 0.7% slowdown. See attached.

I didn't run the same test on the
0001-Query-ID-Calculation-Fix-Variant-B.patch patch, but with the
first set of results I posted above, you could assume it's less than
0.7% overhead. Locally, I did play around with internalising the
AppendJumble function so my compiler would compile a dedicated
JumbleNull function, which removes a bit of branching and code for the
memcpy. I managed to get it going a bit faster that way.

I do agree that your v4 fixes the issue at hand and I'm not going to
stand in your way if you want to proceed with it. However, from my
point of view, it seems that we could achieve the same goal with much
less overhead by just insisting that custom jumble functions always
jumble something. We could provide a jumble function to do that if the
custom function conditionally opts to jumble nothing. That would save
having to add this additional jumbling of the node count.

Also, am I right in thinking that there's no way for an extension to
add a custom jumble function? If so, then we have full control over
any custom jumble function. That would make it easier to ensure these
always jumble something.

David

[1]: /messages/by-id/CAApHDvqaKySJdBf2v2_ybNuT=KObaUVBU4_5kgZfFTMSJr-Vmg@mail.gmail.com
[2]: /messages/by-id/Z9flafNlH4-1YSJW@paquier.xyz

Attachments:

jumble_bench.png (image/png)
#33Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#32)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, Mar 18, 2025 at 05:24:42PM +1300, David Rowley wrote:

I had a look at this and it seems the main difference will be that
this patch will protect against the case that a given node is non-null
but has a custom jumble function which selects to not jumble anything
in some cases. Since Ivan's patch only adds jumbling for non-null,
that could lead to the same bug if a custom jumble function decided to
not jumble anything at all.

Yeah.

It's certainly hard to see much slowdown between master and v4, but it
is visible if I sort the results and put them in a line graph and
scale the vertical axis a bit. A 0.7% slowdown. See attached.

Thanks for the numbers. I'd like to say that this is within noise,
but it clearly isn't if it's repeatable, as you are pointing out. Ugh.

I do agree that your v4 fixes the issue at hand and I'm not going to
stand in your way if you want to proceed with it. However, from my
point of view, it seems that we could achieve the same goal with much
less overhead by just insisting that custom jumble functions always
jumble something. We could provide a jumble function to do that if the
custom function conditionally opts to jumble nothing. That would save
having to add this additional jumbling of the node count.

If we make the whole thing cheaper, with extra entropy added only for
NULLs in nodes and strings, I'd rather have an insurance policy for
the custom functions. Something like that:
- Forbid a size of 0 in AppendJumble().
- When dealing with a non-NULL case in _jumbleNode(), save the initial
jumble_len and the jumble contents when starting, then complain if the
jumble_len matches with the initial length at the end and if the
contents are the same in an assert. A check on the length may be
enough, but we'd depend on JUMBLE_SIZE and nodes can get pretty big.

What do you think?

Also, am I right in thinking that there's no way for an extension to
add a custom jumble function? If so, then we have full control over
any custom jumble function. That would make it easier to ensure these
always jumble something.

Currently, no, that would not be possible through this code path. The
only way would be to fork the code as we rely entirely on the code
generated from the in-core headers.
--
Michael

#34David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#33)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, 18 Mar 2025 at 21:00, Michael Paquier <michael@paquier.xyz> wrote:

If we make the whole thing cheaper with only extra entropy added for NULLs
in nodes and strings, I'd rather have an insurance policy for the
custom functions. Something like that:
- Forbid a size of 0 in AppendJumble().
- When dealing with a non-NULL case in _jumbleNode(), save the initial
jumble_len and the jumble contents when starting, then complain if the
jumble_len matches with the initial length at the end and if the
contents are the same in an assert. A check on the length may be
enough, but we'd depend on JUMBLE_SIZE and nodes can get pretty big.

What do you think?

If it's for Assert enabled builds only, then to save from having to
look at the buffer, you could have an extra field similar to
jumble_len, but does not get reset when the jumble buffer fills. Just
assert that the total_jumbled_bytes has grown after jumbling a node,
maybe?
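For what it's worth, the assert-only idea could look something like the
following stand-alone model. This is purely illustrative; the struct and
function names here are hypothetical and not from any posted patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down jumble state, for illustration only. */
typedef struct SketchJumbleState
{
	size_t		jumble_len;				/* bytes used, reset when buffer fills */
	size_t		total_jumbled_bytes;	/* never reset, only ever grows */
} SketchJumbleState;

static void
sketch_append(SketchJumbleState *jstate, size_t size)
{
	assert(size > 0);			/* forbid zero-size appends */
	/* ... copying into the buffer and wrapping at JUMBLE_SIZE elided ... */
	jstate->total_jumbled_bytes += size;
}

/* Assert-only check that jumbling a node contributed at least one byte. */
static void
sketch_jumble_node(SketchJumbleState *jstate)
{
	size_t		before = jstate->total_jumbled_bytes;

	sketch_append(jstate, 4);	/* stand-in for the per-node jumbling */
	assert(jstate->total_jumbled_bytes > before);
}
```

Because the total counter never gets reset when the buffer wraps, the length
comparison alone suffices and no buffer comparison is needed.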

David

#35Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#34)
2 attachment(s)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, Mar 18, 2025 at 09:24:06PM +1300, David Rowley wrote:

If it's for Assert enabled builds only, then to save from having to
look at the buffer, you could have an extra field similar to
jumble_len, but does not get reset when the jumble buffer fills. Just
assert that the total_jumbled_bytes has grown after jumbling a node,
maybe?

Indeed, this would work as well, and there's little point in going
full-memcmp() mode for the sake of such a check.

I have been doing some measurements to see how the NULL addition
affects the computation, and with the attached (based on what you have
sent upthread), I can see that a "SELECT 1", which would be one of the
worst cases possible based on its simple Query structure with 19 NULLs
and not much else (?), jumps from 80ms to 107~109ms for 100k loops
calling JumbleQuery(), meaning that a single computation moves from
800ns to 1070~1090ns on my local machine. So this stuff costs, which
is not surprising, but it does not seem *that* bad compared to a full
query run.
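To make the per-call arithmetic above easy to check, here is the conversion
from a 100k-loop total to a per-call cost (only the quoted numbers, nothing
measured here):

```c
/*
 * Average cost in nanoseconds of one JumbleQuery() call, given a total
 * wall time in milliseconds over a number of loops.
 */
static double
per_call_ns(double total_ms, long loops)
{
	return total_ms * 1000000.0 / loops;
}
```

With the figures quoted above, per_call_ns(80.0, 100000) gives 800 ns on
HEAD and per_call_ns(107.0, 100000) gives 1070 ns with the patch.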

Potential ideas about optimizing the branches seem worth
investigating, but I am not sure that I would be able to dig into that
until the feature freeze gong, unfortunately, and it would be nice to
fix this issue for this release. I'll see about doing some pgbench
runs on my side, as well, with TPC-B queries.

Thoughts?
--
Michael

Attachments:

0001-Add-more-entropy-to-query-jumbling.patch (text/x-diff; charset=us-ascii)
From ea023f4cd892d65e36ac24450fa57003047bd592 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 21 Mar 2025 14:09:39 +0900
Subject: [PATCH 1/2] Add more entropy to query jumbling

NULL nodes now have some arbitrary data added to the jumbling, with
a rule enforcing that all nodes need to participate in the query jumble.
This last part also applies to custom jumble functions.
---
 src/include/nodes/queryjumble.h               | 15 +++-
 src/backend/nodes/queryjumblefuncs.c          | 27 ++++++
 .../pg_stat_statements/expected/select.out    | 87 ++++++++++++++++++-
 contrib/pg_stat_statements/sql/select.sql     | 20 +++++
 4 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index 905f66bc0bd4..e13fd49942ce 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -40,9 +40,22 @@ typedef struct JumbleState
 	/* Jumble of current query tree */
 	unsigned char *jumble;
 
-	/* Number of bytes used in jumble[] */
+	/* Number of bytes used in jumble[], capped at JUMBLE_SIZE */
 	Size		jumble_len;
 
+	/*
+	 * Total number of bytes used in a query jumble, used for
+	 * sanity checks and debugging.
+	 */
+	Size		jumble_total_len;
+
+	/*
+	 * Extra counter provided for the case of NULL entries.  This counter is
+	 * incremented each time a NULL node or string is found, providing some
+	 * data that is then appended to the query jumble.
+	 */
+	int			null_count;
+
 	/* Array of locations of constants that should be removed */
 	LocationLen *clocations;
 
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 189bfda610aa..cc3c73117f6c 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -129,6 +129,8 @@ JumbleQuery(Query *query)
 	/* Set up workspace for query jumbling */
 	jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
 	jstate->jumble_len = 0;
+	jstate->jumble_total_len = 0;
+	jstate->null_count = 0;
 	jstate->clocations_buf_size = 32;
 	jstate->clocations = (LocationLen *)
 		palloc(jstate->clocations_buf_size * sizeof(LocationLen));
@@ -179,6 +181,8 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 	unsigned char *jumble = jstate->jumble;
 	Size		jumble_len = jstate->jumble_len;
 
+	Assert(size > 0);
+
 	/*
 	 * Whenever the jumble buffer is full, we hash the current contents and
 	 * reset the buffer to contain just that hash value, thus relying on the
@@ -202,6 +206,7 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 		jumble_len += part_size;
 		item += part_size;
 		size -= part_size;
+		jstate->jumble_total_len += part_size;
 	}
 	jstate->jumble_len = jumble_len;
 }
@@ -328,10 +333,17 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
 	AppendJumble(jstate, (const unsigned char *) &(expr->item), sizeof(expr->item))
 #define JUMBLE_FIELD_SINGLE(item) \
 	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+
+/* Append string to the jumble, including NULL values */
 #define JUMBLE_STRING(str) \
 do { \
 	if (expr->str) \
 		AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1); \
+	else \
+	{ \
+		jstate->null_count++; \
+		JUMBLE_FIELD_SINGLE(jstate->null_count); \
+	} \
 } while(0)
 
 #include "queryjumblefuncs.funcs.c"
@@ -379,9 +391,18 @@ static void
 _jumbleNode(JumbleState *jstate, Node *node)
 {
 	Node	   *expr = node;
+	Size		initial_total_len = jstate->jumble_total_len;
 
 	if (expr == NULL)
+	{
+		/*
+		 * Increment the NULL counter, and add it to the jumble to force
+		 * more entropy to the computation.
+		 */
+		jstate->null_count++;
+		JUMBLE_FIELD_SINGLE(jstate->null_count);
 		return;
+	}
 
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();
@@ -429,6 +450,12 @@ _jumbleNode(JumbleState *jstate, Node *node)
 		default:
 			break;
 	}
+
+	/*
+	 * All nodes must participate in the computation, even nodes with
+	 * custom functions.
+	 */
+	Assert(initial_total_len < jstate->jumble_total_len);
 }
 
 static void
diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 37a30af034a6..1587d2cafb3a 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -19,6 +19,86 @@ SELECT 1 AS "int";
    1
 (1 row)
 
+-- LIMIT and OFFSET patterns
+-- These require more entropy with parsing node offsets.
+SELECT 1 AS "int" LIMIT 1;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 1 AS "int" LIMIT 2;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 1 AS "int" OFFSET 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 1 LIMIT 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2 LIMIT 2;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" LIMIT 1 OFFSET 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" LIMIT 3 OFFSET 3;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 1 FETCH FIRST 2 ROW ONLY;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2 FETCH FIRST 3 ROW ONLY;
+ int 
+-----
+(0 rows)
+
+-- DISTINCT and ORDER BY patterns
+-- These require more entropy with parsing node offsets.
+SELECT DISTINCT 1 AS "int";
+ int 
+-----
+   1
+(1 row)
+
+SELECT DISTINCT 2 AS "int";
+ int 
+-----
+   2
+(1 row)
+
+SELECT 1 AS "int" ORDER BY 1;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 2 AS "int" ORDER BY 1;
+ int 
+-----
+   2
+(1 row)
+
 /* this comment should not appear in the output */
 SELECT 'hello'
   -- but this one will appear
@@ -135,9 +215,14 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
      3 |    3 | SELECT $1 + $2 + $3 AS "add"
      1 |    1 | SELECT $1 AS "float"
      2 |    2 | SELECT $1 AS "int"
+     2 |    2 | SELECT $1 AS "int" LIMIT $2
+     2 |    0 | SELECT $1 AS "int" OFFSET $2
+     6 |    0 | SELECT $1 AS "int" OFFSET $2 LIMIT $3
+     2 |    2 | SELECT $1 AS "int" ORDER BY 1
      1 |    2 | SELECT $1 AS i UNION SELECT $2 ORDER BY i
      1 |    1 | SELECT $1 || $2
      1 |    1 | SELECT $1, $2 LIMIT $3
+     2 |    2 | SELECT DISTINCT $1 AS "int"
      0 |    0 | SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"
      1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
      1 |    2 | WITH t(f) AS (                                                              +
@@ -145,7 +230,7 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
        |      | )                                                                           +
        |      |   SELECT f FROM t ORDER BY f
      1 |    1 | select $1::jsonb ? $2
-(12 rows)
+(17 rows)
 
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t 
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index e0be58d5e24b..4dcfa8ef74dc 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -12,6 +12,26 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 --
 SELECT 1 AS "int";
 
+-- LIMIT and OFFSET patterns
+-- These require more entropy with parsing node offsets.
+SELECT 1 AS "int" LIMIT 1;
+SELECT 1 AS "int" LIMIT 2;
+SELECT 1 AS "int" OFFSET 1;
+SELECT 1 AS "int" OFFSET 2;
+SELECT 1 AS "int" OFFSET 1 LIMIT 1;
+SELECT 1 AS "int" OFFSET 2 LIMIT 2;
+SELECT 1 AS "int" LIMIT 1 OFFSET 1;
+SELECT 1 AS "int" LIMIT 3 OFFSET 3;
+SELECT 1 AS "int" OFFSET 1 FETCH FIRST 2 ROW ONLY;
+SELECT 1 AS "int" OFFSET 2 FETCH FIRST 3 ROW ONLY;
+
+-- DISTINCT and ORDER BY patterns
+-- These require more entropy with parsing node offsets.
+SELECT DISTINCT 1 AS "int";
+SELECT DISTINCT 2 AS "int";
+SELECT 1 AS "int" ORDER BY 1;
+SELECT 2 AS "int" ORDER BY 1;
+
 /* this comment should not appear in the output */
 SELECT 'hello'
   -- but this one will appear
-- 
2.49.0

0002-Trick-to-measure-jumbling.patch.txt (text/plain; charset=us-ascii)
From 2211e60a47f8ff926e22c21433f9630bc4580bb6 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 21 Mar 2025 14:21:56 +0900
Subject: [PATCH 2/2] Trick to measure jumbling

---
 src/backend/tcop/postgres.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0554a4ae3c7b..e2365efe676b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -873,6 +873,19 @@ pg_rewrite_query(Query *query)
 	return querytree_list;
 }
 
+#define NANOSEC_PER_SEC 1000000000
+
+#include <time.h>
+
+/* Returns difference between t1 and t2 in nanoseconds */
+static int64
+get_clock_diff(struct timespec *t1, struct timespec *t2)
+{
+	int64 nanosec = (t1->tv_sec - t2->tv_sec) * NANOSEC_PER_SEC;
+	nanosec += (t1->tv_nsec - t2->tv_nsec);
+
+	return nanosec;
+}
 
 /*
  * Generate a plan for a single already-rewritten query.
@@ -883,6 +896,8 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
 			  ParamListInfo boundParams)
 {
 	PlannedStmt *plan;
+	struct timespec start, end;
+	int64 jumbletime;
 
 	/* Utility commands have no plans. */
 	if (querytree->commandType == CMD_UTILITY)
@@ -896,6 +911,19 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
 	if (log_planner_stats)
 		ResetUsage();
 
+	if (IsQueryIdEnabled())
+	{
+		#define JUMBLE_LOOPS 100000
+		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
+		for (int i = 0; i < JUMBLE_LOOPS; i++)
+		{
+			JumbleQuery(querytree);
+		}
+		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
+		jumbletime = get_clock_diff(&end, &start);
+		elog(NOTICE, "QueryJumble in %f ms", jumbletime / 1000000.0);
+	}
+
 	/* call the optimizer */
 	plan = planner(querytree, query_string, cursorOptions, boundParams);
 
-- 
2.49.0

#36Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#35)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Fri, Mar 21, 2025 at 02:45:09PM +0900, Michael Paquier wrote:

Potential ideas about optimizing the branches seem worth
investigating, but I am not sure that I would be able to dig into that
until the feature freeze gong, unfortunately, and it would be nice to
fix this issue for this release. I'll see about doing some pgbench
runs on my side, as well, with TPC-B queries.

So, here are some numbers with a "SELECT 1;" using pgbench, which
would be a worst case. First, at 128 clients on a machine I have
around:
- HEAD, compute_query_id = on:
tps = 282081.839715
tps = 281379.854539
tps = 283708.205096
tps = 280197.225782 #worst
tps = 291955.914664 #best
tps = 284172.351609
tps = 287234.624720
tps = 287915.683291 #best
tps = 277004.123742 #worst
tps = 281326.071323
- Patch, compute_query_id = on:
tps = 289746.641246 #best
tps = 279689.654196
tps = 284874.774620 #best
tps = 282312.246610
tps = 275807.057758 #worst
tps = 276058.858384 #worst
tps = 282705.638685
tps = 282651.367641
tps = 276777.418569
tps = 283137.265298

Removing the two best and two worst numbers and averaging the rest,
then taking the relative difference as:
(head_tps - patch_tps) * 2 / (head_tps + patch_tps)
That gives 283317.15 (HEAD) vs 281212.26 (patch), so a 0.7% decrease.
Still, that could be claimed as noise.
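For reference, the relative-difference formula applied to the trimmed
averages quoted here can be spelled out as:

```c
/* Relative difference between two TPS averages, as used above. */
static double
rel_diff(double head_tps, double patch_tps)
{
	return (head_tps - patch_tps) * 2.0 / (head_tps + patch_tps);
}
```

rel_diff(283317.15, 281212.26) comes out at roughly 0.0075, i.e. the ~0.7%
decrease mentioned.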

Second set, at 1 client, and a better pattern becomes available:
- HEAD, compute_query_id = on:
tps = 19170.473320 #best
tps = 18722.828797
tps = 19138.765391
tps = 18243.611593
tps = 17878.605004 #worst
tps = 19557.753327 #best
tps = 18395.734885
tps = 18433.188326
tps = 18227.914138 #worst
tps = 18450.236841
- Patch, compute_query_id = on:
tps = 16611.766411
tps = 15992.734615 #worst
tps = 16465.077482
tps = 16726.097318
tps = 17284.874897 #best
tps = 16076.378905
tps = 17971.534072 #best
tps = 15733.718159 #worst
tps = 16911.014660
tps = 17217.202679

And here we get 18564.06 (HEAD) vs 16667.92 (patch), so a 10.7%
difference in this run. Hence the automatic addition of NULLs to the
jumbling is disappointing here, even if I'd see this as a worst-case
scenario, unlike what one would see for real if they care about
monitoring.

So, on the other hand, we still have the trick to switch the fields,
which means that we would not suffer any performance penalty, while
fixing the original issue. And with the feature freeze in sight, time
is running short if we want to do something here.

NB: In case, a second run with 1 client showing the same trend:
- HEAD:
tps = 19697.406181
tps = 19474.495729
tps = 18614.615630
tps = 18868.245926
tps = 18949.775164
tps = 18943.311921
tps = 18824.192383
tps = 19104.542301
tps = 18950.870646
tps = 19331.829745
- Patch:
tps = 17887.864764
tps = 17317.628065
tps = 17090.278323
tps = 15876.258607
tps = 16818.644938
tps = 16816.334313
tps = 18134.148809
tps = 16891.809962
tps = 18193.104125
tps = 16866.726990
--
Michael

#37David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#36)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Mon, 24 Mar 2025 at 15:23, Michael Paquier <michael@paquier.xyz> wrote:

And here we get 18564.06 (HEAD) vs 16667.92 (patch), so a 10.7%
difference in this run. Hence the automatic addition of NULLs to the
jumbling is disappointing here, even if I'd see this as a worst-case
scenario, unlike what one would see for real if they care about
monitoring.

Can you share which patch you tested here? There are many patches on
this thread and I don't want to make assumptions about which one these
are the results for.

It sounds like it wasn't your latest patch as you mentioned "automatic
addition of NULL to the jumbling". That doesn't sound like the
correct description for your latest patch.

David

#38Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#37)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Mon, Mar 24, 2025 at 07:42:21PM +1300, David Rowley wrote:

Can you share which patch you tested here? There are many patches on
this thread and I don't want to make assumptions about which one these
are the results for.

The patch tested is the latest one that has been posted on this
thread, for a comparison of HEAD at 8a3e4011f02d and this 0001 (no
0002 which was forcing a loop to make the jumbling heavier):
/messages/by-id/Z9z85Ui5lPkPn2hq@paquier.xyz

The message holding the patch tested is the one I've replied to when
posting the results shared here:
/messages/by-id/Z-DB83o5AZmgnvnA@paquier.xyz
--
Michael

#39David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#35)
2 attachment(s)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Fri, 21 Mar 2025 at 18:45, Michael Paquier <michael@paquier.xyz> wrote:

Potential ideas about optimizing the branches seem worth
investigating, but I am not sure that I would be able to dig into that
until the feature freeze gong, unfortunately, and it would be nice to
fix this issue for this release. I'll see about doing some pgbench
runs on my side, as well, with TPC-B queries.

I've come up with a version of this that fixes the issue and improves
performance rather than regressing it. See attached.

This patch uses the same "ensure we jumble at least something when we
get a NULL" method, but works differently to anything previously
proposed here. Instead of modifying the jumble buffer each time we see
a NULL, the patch just increments a counter. It's only when something
else actually needs to append bytes to the buffer that we must flush
the pending NULLs to the buffer beforehand. I've opted to just write
the consecutive NULL count to the buffer. This allows the NULLs to be
appended more efficiently. This has the advantage of having to do very
little when we see NULL values, and would probably do well on your
"SELECT 1" test, which has lots of NULLs.
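The shape of that scheme, reduced to a stand-alone model (names are
illustrative and the buffer-overflow handling is elided, so this is not the
patch itself):

```c
#include <string.h>

#define SKETCH_BUF_SIZE 1024

typedef struct SketchState
{
	unsigned char buf[SKETCH_BUF_SIZE];
	size_t		len;
	unsigned int pending_nulls;	/* NULLs seen since the last real append */
} SketchState;

/* Seeing a NULL node just bumps a counter; the buffer is untouched. */
static void
sketch_null(SketchState *st)
{
	st->pending_nulls++;
}

/*
 * Real bytes flush the accumulated NULL count first, then get appended.
 * The wrap-at-buffer-full hashing step is elided for brevity.
 */
static void
sketch_append(SketchState *st, const void *item, size_t size)
{
	if (st->pending_nulls > 0)
	{
		memcpy(st->buf + st->len, &st->pending_nulls,
			   sizeof(st->pending_nulls));
		st->len += sizeof(st->pending_nulls);
		st->pending_nulls = 0;
	}
	memcpy(st->buf + st->len, item, size);
	st->len += size;
}
```

A run of consecutive NULLs therefore costs a single counter write when the
next real value arrives, rather than one buffer append per NULL.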

I've also performed a round of optimisations to the code:

1) The jumble size in AppendJumble should never be zero, so the while
(size > 0) loop can become a do while loop. This halves the size
checks for the normal non-overflow case.
2) Added inlined functions to jumble common field sizes. This saves
having to move the size into the appropriate register before calling
the AppendJumble function and reduces the size of the compiled code as
a result. It also allows the fast-path I added to AppendJumbleInternal
for the non-overflow case to work more efficiently, as the memcpy can
be transformed into fewer instructions due to the size being both known
and small enough to perform the copy in a single instruction. This
previously didn't work since part_size is a variable in the inlined
function. With the fast-path, "size" is used, which is a compile-time
constant to the inlined function.

I tested the performance of this using the patch from [1] and running
"make check" 50 times. This gave me the total time spent jumbling all
of the make check queries. I sorted the results by time and graphed
them in the attached graph. The patched version is on average 1.73%
*faster* than master on my AMD 3990x machine:

master: 76.564 milliseconds
v7 patch 75.236 milliseconds

It should be possible to still add the total_jumble_len stuff to this
in assert builds with zero extra overhead in non-assert builds, which
I think should address your concern with custom jumble functions not
jumbling anything. I can adjust accordingly if you're happy to use the
method I'm proposing.

The tests done were in a version of master prior to 5ac462e2b. I've
not yet studied if anything needs to be adjusted to account for that
commit.

David

[1]: /messages/by-id/attachment/174042/jumbletime.patch.txt

Attachments:

v7-0001-Fix-query-jumbling-to-account-for-NULL-nodes.patch (application/octet-stream)
From 4423d726583a9adcfacce477cb98265f900e72fd Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Tue, 25 Mar 2025 18:49:15 +1300
Subject: [PATCH v7] Fix query jumbling to account for NULL nodes

Previously NULL nodes were ignored.  This could cause issues where the
computed query ID could match for queries where, of two fields next to
each other in their Node struct, one field was NULL and the other
non-NULL.  For example, the Query struct has distinctClause and sortClause
next to each other.  If someone wrote;

SELECT DISTINCT c1 FROM t;

and then;

SELECT c1 FROM t ORDER BY c1;

these would produce the same query ID since in the first query we ignore
the NULL sortClause and append the jumble bytes for the distinctClause.
In the latter query, since we do nothing for the NULL distinctClause
then jumble the non-NULL sortClause, and since the node representation
stored is the same in both cases, the query IDs are identical.

Here we fix this by always jumbling NULL nodes.  This produces different
jumble bytes in the above queries since the order that we jumble the
NULL changes between each of the above two queries.

Fixing this without regressing the performance of jumbling was quite
tricky, so this commit also contains a few optimizations to improve the
jumble performance.

Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9%40localhost.localdomain
---
 src/backend/nodes/queryjumblefuncs.c | 159 +++++++++++++++++++++++++--
 src/include/nodes/queryjumble.h      |   6 +
 2 files changed, 153 insertions(+), 12 deletions(-)

diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index f8b0f91704b..8d3b1348634 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -58,6 +58,8 @@ bool		query_id_squash_values = false;
  */
 bool		query_id_enabled = false;
 
+static void FlushPendingNulls(JumbleState *jstate);
+
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void RecordConstLocation(JumbleState *jstate,
@@ -134,9 +136,13 @@ JumbleQuery(Query *query)
 		palloc(jstate->clocations_buf_size * sizeof(LocationLen));
 	jstate->clocations_count = 0;
 	jstate->highest_extern_param_id = 0;
+	jstate->pending_nulls = 0;
 
 	/* Compute query ID and mark the Query node with it */
 	_jumbleNode(jstate, (Node *) query);
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
 	query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
 													  jstate->jumble_len,
 													  0));
@@ -170,25 +176,40 @@ EnableQueryId(void)
 }
 
 /*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
+ * AppendJumbleInternal: Append a value that is substantive in a given query
+ * to the current jumble.
+ *
+ * Note: Callers must ensure that jstate->pending_nulls is zero first by
+ * calling FlushPendingNulls() when required.  Callers must also ensure that
+ * size > 0.
  */
-static void
-AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+static pg_attribute_always_inline void
+AppendJumbleInternal(JumbleState *jstate, const unsigned char *item,
+					 Size size)
 {
 	unsigned char *jumble = jstate->jumble;
 	Size		jumble_len = jstate->jumble_len;
 
+	/* Ensure the caller didn't mess up */
+	Assert(size > 0);
+
+	if (likely(size <= JUMBLE_SIZE - jumble_len))
+	{
+		memcpy(jumble + jumble_len, item, size);
+		jstate->jumble_len += size;
+		return;
+	}
+
 	/*
 	 * Whenever the jumble buffer is full, we hash the current contents and
 	 * reset the buffer to contain just that hash value, thus relying on the
 	 * hash to summarize everything so far.
 	 */
-	while (size > 0)
+	do
 	{
 		Size		part_size;
 
-		if (jumble_len >= JUMBLE_SIZE)
+		if (unlikely(jumble_len >= JUMBLE_SIZE))
 		{
 			uint64		start_hash;
 
@@ -202,10 +223,106 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 		jumble_len += part_size;
 		item += part_size;
 		size -= part_size;
-	}
+	} while (size > 0);
+
 	jstate->jumble_len = jumble_len;
 }
 
+/*
+ * AppendJumbleNull
+ *		For jumbling NULL pointers
+ */
+static pg_attribute_always_inline void
+AppendJumbleNull(JumbleState *jstate)
+{
+	jstate->pending_nulls++;
+}
+
+/*
+ * AppendJumble
+ *		Add 'size' bytes of the given jumble 'value' to the jumble state
+ */
+static pg_noinline void
+AppendJumble(JumbleState *jstate, const unsigned char *value, Size size)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, size);
+}
+
+/*
+ * AppendJumble8
+ *		Add the first byte from the given 'value' pointer to the jumble state
+ */
+static pg_noinline void
+AppendJumble8(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 1);
+}
+
+/*
+ * AppendJumble16
+ *		Add the first 2 bytes from the given 'value' pointer to the jumble
+ *		state.
+ */
+static pg_noinline void
+AppendJumble16(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 2);
+}
+
+/*
+ * AppendJumble32
+ *		Add the first 4 bytes from the given 'value' pointer to the jumble
+ *		state.
+ */
+static pg_noinline void
+AppendJumble32(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 4);
+}
+
+/*
+ * AppendJumble64
+ *		Add the first 8 bytes from the given 'value' pointer to the jumble
+ *		state.
+ */
+static pg_noinline void
+AppendJumble64(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 8);
+}
+
+/*
+ * FlushPendingNulls
+ *		Incorporate the pending_null values into the jumble buffer.
+ *
+ * Note: Callers must ensure that there's at least 1 pending NULL.
+ */
+static pg_attribute_always_inline void
+FlushPendingNulls(JumbleState *jstate)
+{
+	Assert(jstate->pending_nulls > 0);
+
+	AppendJumbleInternal(jstate,
+						 (const unsigned char *) &jstate->pending_nulls, 4);
+	jstate->pending_nulls = 0;
+}
+
+
 /*
  * Record location of constant within query string of query tree that is
  * currently being walked.
@@ -319,19 +436,34 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
 }
 
 #define JUMBLE_NODE(item) \
-	_jumbleNode(jstate, (Node *) expr->item)
+	_jumbleNode(jstate, (Node *) expr->item);
 #define JUMBLE_ELEMENTS(list) \
 	_jumbleElements(jstate, (List *) expr->list)
 #define JUMBLE_LOCATION(location) \
 	RecordConstLocation(jstate, expr->location, false)
 #define JUMBLE_FIELD(item) \
-	AppendJumble(jstate, (const unsigned char *) &(expr->item), sizeof(expr->item))
+do { \
+	if (sizeof(expr->item) == 8) \
+		AppendJumble64(jstate, (const unsigned char *) &(expr->item)); \
+	else if (sizeof(expr->item) == 4) \
+		AppendJumble32(jstate, (const unsigned char *) &(expr->item)); \
+	else if (sizeof(expr->item) == 2) \
+		AppendJumble16(jstate, (const unsigned char *) &(expr->item)); \
+	else if (sizeof(expr->item) == 1) \
+		AppendJumble8(jstate, (const unsigned char *) &(expr->item)); \
+	else \
+		AppendJumble(jstate, (const unsigned char *) &(expr->item), sizeof(expr->item)); \
+} while (0)
+
+/* XXX what's this used for? */
 #define JUMBLE_FIELD_SINGLE(item) \
 	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
 #define JUMBLE_STRING(str) \
 do { \
 	if (expr->str) \
 		AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1); \
+	else \
+		AppendJumbleNull(jstate); \
 } while(0)
 /* Function name used for the node field attribute custom_query_jumble. */
 #define JUMBLE_CUSTOM(nodetype, item) \
@@ -384,7 +516,10 @@ _jumbleNode(JumbleState *jstate, Node *node)
 	Node	   *expr = node;
 
 	if (expr == NULL)
+	{
+		AppendJumbleNull(jstate);
 		return;
+	}
 
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();
@@ -448,15 +583,15 @@ _jumbleList(JumbleState *jstate, Node *node)
 			break;
 		case T_IntList:
 			foreach(l, expr)
-				JUMBLE_FIELD_SINGLE(lfirst_int(l));
+				AppendJumble32(jstate, (const unsigned char *) &lfirst_int(l));
 			break;
 		case T_OidList:
 			foreach(l, expr)
-				JUMBLE_FIELD_SINGLE(lfirst_oid(l));
+				AppendJumble32(jstate, (const unsigned char *) &lfirst_oid(l));
 			break;
 		case T_XidList:
 			foreach(l, expr)
-				JUMBLE_FIELD_SINGLE(lfirst_xid(l));
+				AppendJumble32(jstate, (const unsigned char *) &lfirst_xid(l));
 			break;
 		default:
 			elog(ERROR, "unrecognized list node type: %d",
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index 905f66bc0bd..ab49b2b8b93 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -54,6 +54,12 @@ typedef struct JumbleState
 
 	/* highest Param id we've seen, in order to start normalization correctly */
 	int			highest_extern_param_id;
+
+	/*
+	 * Count of the number of NULL nodes seen since last appending a value.
+	 * These get flushed out to the jumble buffer before subsequent appends
+	 */
+	unsigned int pending_nulls;
 } JumbleState;
 
 /* Values for the compute_query_id GUC */
-- 
2.43.0

v7_bench_results.png (image/png)
#40Bykov Ivan
i.bykov@modernsys.ru
In reply to: David Rowley (#39)

Hello, David!

As I can see, your patch has the same idea as my v2-0001-Query-ID-Calculation-Fix-Variant-B.patch from [1].
I think it would be better to extract the jumble buffer update with hash calculation into a function
(CompressJumble in my patch). This will result in fewer changes to the source code.
We can also simply dump the null count to the buffer when we encounter a non-null field
(which will be processed in AppendJumble, indeed).

What do you think about my patch (v2-0001-Query-ID-Calculation-Fix-Variant-B.patch)?

[1]: /messages/by-id/5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain

#41David Rowley
dgrowleyml@gmail.com
In reply to: Bykov Ivan (#40)
1 attachment(s)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, 25 Mar 2025 at 21:14, Bykov Ivan <i.bykov@modernsys.ru> wrote:

As I can see, your patch has the same idea as my v2-0001-Query-ID-Calculation-Fix-Variant-B.patch from [1].
I think it would be better to extract the jumble buffer update with hash calculation into a function
(CompressJumble in my patch). This will result in fewer changes to the source code.
We can also simply dump the null count to the buffer when we encounter a non-null field
(which will be processed in AppendJumble, indeed).

What do your thing about my patch (v2-0001-Query-ID-Calculation-Fix-Variant-B.patch)?

Seemingly, I missed that one. I agree with the general method being
suitable, which might not be surprising since I ended up at the same
conclusion independently.

I quite like your CompressJumble() idea, but I don't know that it
amounts to any compiled code changes.

I think it's a bit strange that you're flushing the pending nulls to
the jumble buffer after jumbling the first non-null. I think it makes
much more sense to do it before as I did, otherwise, you're
effectively jumbling fields out of order.

I see you opted to only jumble the low-order byte of the pending null
counter. I used the full 4 bytes as I don't think the extra 3 bytes are
a problem. With v7, the memcpy to copy the pending_nulls into the
buffer is inlined into a single instruction. It's semi-conceivable
that you could have more than 255 NULLs given that a List can contain
Nodes. I don't know if we ever have any NULL items in Lists at the
moment, but I don't think that that's disallowed. I'd feel much safer
taking the full 4 bytes of the counter to help prevent future
surprises.
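A trivial standalone check (illustrative only; the helper below is hypothetical) shows why truncating the counter to its low-order byte loses information:

```c
#include <assert.h>

/* Hypothetical helper: the single byte a 1-byte jumble of the
 * pending-NULL counter would record. */
static unsigned char
pending_nulls_low_byte(unsigned int pending_nulls)
{
	return (unsigned char) pending_nulls;
}
```

Counts that differ by a multiple of 256 (say 1 and 257) would be jumbled identically with one byte, while the full 4-byte counter keeps them distinct.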

Aside from that, v2b still has the performance regression issue that
we're trying to solve. I did a fresh run of master as of 19c6eb06c and
tested v7 and v2b. Here are the average times it took to jumble all
the make check queries over 50 runs:

master: 76.052 ms
v7: 75.422 ms
v2b: 81.391 ms

I also attached a graph.

I'm happy to proceed with the v7 version. I'm also happy to credit you
as the primary author of it, given that you came up with the v2b
patch. It might be best to extract the performance stuff that's in v7
and apply that separately from the bug fix. If you're still interested
and have time in the next 7 hours to do it, would you be able to split
the v7 as described, making v8-0001 as the bug fix and v8-0002 as the
performance stuff? If so, could you also add the total_jumble_len to
v8-0001 like Michael's last patch had, but adjust so it's only
maintained for Assert builds. Michael is quite keen on having more
assurance that custom jumble functions don't decide to forego any
jumbling.

If you don't have time, I'll do it in my morning (UTC+13).

Thanks

David


[1] /messages/by-id/5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain

Attachments:

master_v2b_v7.png (image/png)
#42David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#41)
2 attachment(s)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Wed, 26 Mar 2025 at 01:11, David Rowley <dgrowleyml@gmail.com> wrote:

I'm happy to proceed with the v7 version. I'm also happy to credit you
as the primary author of it, given that you came up with the v2b
patch. It might be best to extract the performance stuff that's in v7
and apply that separately from the bug fix. If you're still interested
and have time in the next 7 hours to do it, would you be able to split
the v7 as described, making v8-0001 as the bug fix and v8-0002 as the
performance stuff? If so, could you also add the total_jumble_len to
v8-0001 like Michael's last patch had, but adjust so it's only
maintained for Assert builds. Michael is quite keen on having more
assurance that custom jumble functions don't decide to forego any
jumbling.

If you don't have time, I'll do it in my morning (UTC+13).

Here is the v8 version with the bug fix and performance stuff
separated out. I also added the Assert code to ensure we always add
something to the jumble buffer when jumbling a node.

I didn't include the regression tests I saw in the v2b patch [1]. I
felt these were just marking the fact that there used to be a bug
here.

David

[1]: /messages/by-id/5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain

Attachments:

v8-0001-Fix-query-jumbling-to-account-for-NULL-nodes.patch (application/octet-stream)
From 449ad2f747b90fdbe14b15274089ccae50dd8e80 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Tue, 25 Mar 2025 18:49:15 +1300
Subject: [PATCH v8 1/2] Fix query jumbling to account for NULL nodes

Previously NULL nodes were ignored.  This could cause issues where the
computed query ID could match for queries where fields that are next to
each other in their Node struct where one field was NULL and the other
non-NULL.  For example, the Query struct has distinctClause and sortClause
next to each other.  If someone wrote;

SELECT DISTINCT c1 FROM t;

and then;

SELECT c1 FROM t ORDER BY c1;

these would produce the same query ID since in the first query we ignore
the NULL sortClause and append the jumble bytes for the distinctClause.
In the latter query, since we do nothing for the NULL distinctClause
then jumble the non-NULL sortClause, and since the node representation
stored is the same in both cases, the query IDs are identical.

Here we fix this by always jumbling NULL nodes.  This produces different
jumble bytes in the above queries since the order that we jumble the
NULL changes between each of the above two queries.

Author: Bykov Ivan <i.bykov@modernsys.ru>
Author: Michael Paquier <michael@paquier.xyz>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9%40localhost.localdomain
---
 src/backend/nodes/queryjumblefuncs.c | 51 ++++++++++++++++++++++++++++
 src/include/nodes/queryjumble.h      | 10 ++++++
 2 files changed, 61 insertions(+)

diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index f8b0f91704b..91f8fc49ee8 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -58,6 +58,7 @@ bool		query_id_squash_values = false;
  */
 bool		query_id_enabled = false;
 
+static void FlushPendingNulls(JumbleState *jstate);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void RecordConstLocation(JumbleState *jstate,
@@ -134,9 +135,16 @@ JumbleQuery(Query *query)
 		palloc(jstate->clocations_buf_size * sizeof(LocationLen));
 	jstate->clocations_count = 0;
 	jstate->highest_extern_param_id = 0;
+	jstate->pending_nulls = 0;
+	jstate->total_jumble_len = 0;
 
 	/* Compute query ID and mark the Query node with it */
 	_jumbleNode(jstate, (Node *) query);
+
+	/* Flush any pending NULLs before doing the final hash */
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
 	query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
 													  jstate->jumble_len,
 													  0));
@@ -202,10 +210,42 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 		jumble_len += part_size;
 		item += part_size;
 		size -= part_size;
+
+#ifdef USE_ASSERT_CHECKING
+		jstate->total_jumble_len += part_size;
+#endif
 	}
+
 	jstate->jumble_len = jumble_len;
 }
 
+/*
+ * AppendJumbleNull
+ *		For jumbling NULL pointers
+ */
+static pg_attribute_always_inline void
+AppendJumbleNull(JumbleState *jstate)
+{
+	jstate->pending_nulls++;
+}
+
+/*
+ * FlushPendingNulls
+ *		Incorporate the pending_null value into the jumble buffer.
+ *
+ * Note: Callers must ensure that there's at least 1 pending NULL.
+ */
+static pg_attribute_always_inline void
+FlushPendingNulls(JumbleState *jstate)
+{
+	Assert(jstate->pending_nulls > 0);
+
+	AppendJumble(jstate,
+				 (const unsigned char *) &jstate->pending_nulls, 4);
+	jstate->pending_nulls = 0;
+}
+
+
 /*
  * Record location of constant within query string of query tree that is
  * currently being walked.
@@ -332,6 +372,8 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
 do { \
 	if (expr->str) \
 		AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1); \
+	else \
+		AppendJumbleNull(jstate); \
 } while(0)
 /* Function name used for the node field attribute custom_query_jumble. */
 #define JUMBLE_CUSTOM(nodetype, item) \
@@ -382,9 +424,15 @@ static void
 _jumbleNode(JumbleState *jstate, Node *node)
 {
 	Node	   *expr = node;
+#ifdef USE_ASSERT_CHECKING
+	Size		prev_jumble_len = jstate->total_jumble_len;
+#endif
 
 	if (expr == NULL)
+	{
+		AppendJumbleNull(jstate);
 		return;
+	}
 
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();
@@ -432,6 +480,9 @@ _jumbleNode(JumbleState *jstate, Node *node)
 		default:
 			break;
 	}
+
+	/* Ensure we added something to the jumble buffer */
+	Assert(jstate->total_jumble_len > prev_jumble_len);
 }
 
 static void
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index 905f66bc0bd..9414fb151ee 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -54,6 +54,16 @@ typedef struct JumbleState
 
 	/* highest Param id we've seen, in order to start normalization correctly */
 	int			highest_extern_param_id;
+
+	/*
+	 * Count of the number of NULL nodes seen since last appending a value.
+	 * These are flushed out to the jumble buffer before subsequent appends
+	 * and before performing the final jumble hash.
+	 */
+	unsigned int pending_nulls;
+
+	/* The total number of bytes added to the jumble buffer */
+	Size		total_jumble_len;
 } JumbleState;
 
 /* Values for the compute_query_id GUC */
-- 
2.43.0

v8-0002-Optimize-Query-jumble.patch (application/octet-stream)
From b1510c546df601952d1792fb576a6ffb2e1751c9 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Wed, 26 Mar 2025 10:35:42 +1300
Subject: [PATCH v8 2/2] Optimize Query jumble

A recent change adjusted query jumbling so it no longer ignores NULL
nodes during the jumble.  This added some overhead.  Here we tune a few
things to make jumbling faster again.  This makes jumbling perform
similar or slightly faster than prior to that change.

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvreP04nhTKuYsPw0F-YN+4nr4f=L72SPeFb81jfv+2c7w@mail.gmail.com
---
 src/backend/nodes/queryjumblefuncs.c | 130 ++++++++++++++++++++++++---
 1 file changed, 117 insertions(+), 13 deletions(-)

diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 91f8fc49ee8..a7077778310 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -178,25 +178,50 @@ EnableQueryId(void)
 }
 
 /*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
+ * AppendJumbleInternal: Append a value that is substantive in a given query
+ * to the current jumble.
+ *
+ * Note: Callers must ensure that jstate->pending_nulls is zero first by
+ * calling FlushPendingNulls() when required.  Callers must also ensure that
+ * size > 0.
  */
-static void
-AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+static pg_attribute_always_inline void
+AppendJumbleInternal(JumbleState *jstate, const unsigned char *item,
+					 Size size)
 {
 	unsigned char *jumble = jstate->jumble;
 	Size		jumble_len = jstate->jumble_len;
 
+	/* Ensure the caller didn't mess up */
+	Assert(size > 0);
+
+	/*
+	 * Fast path for when there's enough space left in the buffer.  This is
+	 * worthwhile as it means the memcpy can be inlined into very efficient code
+	 * when 'size' is a compile-time constant.
+	 */
+	if (likely(size <= JUMBLE_SIZE - jumble_len))
+	{
+		memcpy(jumble + jumble_len, item, size);
+		jstate->jumble_len += size;
+
+#ifdef USE_ASSERT_CHECKING
+		jstate->total_jumble_len += size;
+#endif
+
+		return;
+	}
+
 	/*
 	 * Whenever the jumble buffer is full, we hash the current contents and
 	 * reset the buffer to contain just that hash value, thus relying on the
 	 * hash to summarize everything so far.
 	 */
-	while (size > 0)
+	do
 	{
 		Size		part_size;
 
-		if (jumble_len >= JUMBLE_SIZE)
+		if (unlikely(jumble_len >= JUMBLE_SIZE))
 		{
 			uint64		start_hash;
 
@@ -214,7 +239,7 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 #ifdef USE_ASSERT_CHECKING
 		jstate->total_jumble_len += part_size;
 #endif
-	}
+	} while (size > 0);
 
 	jstate->jumble_len = jumble_len;
 }
@@ -229,6 +254,74 @@ AppendJumbleNull(JumbleState *jstate)
 	jstate->pending_nulls++;
 }
 
+/*
+ * AppendJumble
+ *		Add 'size' bytes of the given jumble 'value' to the jumble state
+ */
+static pg_noinline void
+AppendJumble(JumbleState *jstate, const unsigned char *value, Size size)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, size);
+}
+
+/*
+ * AppendJumble8
+ *		Add the first byte from the given 'value' pointer to the jumble state
+ */
+static pg_noinline void
+AppendJumble8(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 1);
+}
+
+/*
+ * AppendJumble16
+ *		Add the first 2 bytes from the given 'value' pointer to the jumble
+ *		state.
+ */
+static pg_noinline void
+AppendJumble16(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 2);
+}
+
+/*
+ * AppendJumble32
+ *		Add the first 4 bytes from the given 'value' pointer to the jumble
+ *		state.
+ */
+static pg_noinline void
+AppendJumble32(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 4);
+}
+
+/*
+ * AppendJumble64
+ *		Add the first 8 bytes from the given 'value' pointer to the jumble
+ *		state.
+ */
+static pg_noinline void
+AppendJumble64(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 8);
+}
+
 /*
  * FlushPendingNulls
  *		Incorporate the pending_null value into the jumble buffer.
@@ -240,8 +333,8 @@ FlushPendingNulls(JumbleState *jstate)
 {
 	Assert(jstate->pending_nulls > 0);
 
-	AppendJumble(jstate,
-				 (const unsigned char *) &jstate->pending_nulls, 4);
+	AppendJumbleInternal(jstate,
+						 (const unsigned char *) &jstate->pending_nulls, 4);
 	jstate->pending_nulls = 0;
 }
 
@@ -365,7 +458,18 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
 #define JUMBLE_LOCATION(location) \
 	RecordConstLocation(jstate, expr->location, false)
 #define JUMBLE_FIELD(item) \
-	AppendJumble(jstate, (const unsigned char *) &(expr->item), sizeof(expr->item))
+do { \
+	if (sizeof(expr->item) == 8) \
+		AppendJumble64(jstate, (const unsigned char *) &(expr->item)); \
+	else if (sizeof(expr->item) == 4) \
+		AppendJumble32(jstate, (const unsigned char *) &(expr->item)); \
+	else if (sizeof(expr->item) == 2) \
+		AppendJumble16(jstate, (const unsigned char *) &(expr->item)); \
+	else if (sizeof(expr->item) == 1) \
+		AppendJumble8(jstate, (const unsigned char *) &(expr->item)); \
+	else \
+		AppendJumble(jstate, (const unsigned char *) &(expr->item), sizeof(expr->item)); \
+} while (0)
 #define JUMBLE_FIELD_SINGLE(item) \
 	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
 #define JUMBLE_STRING(str) \
@@ -499,15 +603,15 @@ _jumbleList(JumbleState *jstate, Node *node)
 			break;
 		case T_IntList:
 			foreach(l, expr)
-				JUMBLE_FIELD_SINGLE(lfirst_int(l));
+				AppendJumble32(jstate, (const unsigned char *) &lfirst_int(l));
 			break;
 		case T_OidList:
 			foreach(l, expr)
-				JUMBLE_FIELD_SINGLE(lfirst_oid(l));
+				AppendJumble32(jstate, (const unsigned char *) &lfirst_oid(l));
 			break;
 		case T_XidList:
 			foreach(l, expr)
-				JUMBLE_FIELD_SINGLE(lfirst_xid(l));
+				AppendJumble32(jstate, (const unsigned char *) &lfirst_xid(l));
 			break;
 		default:
 			elog(ERROR, "unrecognized list node type: %d",
-- 
2.43.0

#43Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#42)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Wed, Mar 26, 2025 at 11:56:50AM +1300, David Rowley wrote:

Here is the v8 version with the bug fix and performance stuff
separated out.

Why not. I assume that you would merge these together?

I also added the Assert code to ensure we always add
something to the jumble buffer when jumbling a node.

Thanks. This part looks good with its USE_ASSERT_CHECKING.

I didn't include the regression tests I saw in the v2b patch [1]. I
felt these were just marking the fact that there used to be a bug
here.

I disagree about the lack of tests. These are cheap to add. I'd
suggest not using the ones posted as part of v2 variant B, and instead
using the set I have posted in patch 0001 here; those do not require
the creation of a relation, so they can be made cheaper, and they
cover most of the main scenarios from the parser with these nodes in
the Query:
/messages/by-id/Z9z85Ui5lPkPn2hq@paquier.xyz

It is a bit disappointing that we require JumbleQuery() to jumble the
NULLs. There are discussions about making this code available not
only for Querys, but for Nodes, and the NULL computations would be
missed.
--
Michael

#44Bykov Ivan
i.bykov@modernsys.ru
In reply to: Michael Paquier (#43)

Hello, David!

I see you opted to only jumble the low-order byte of the pending null
counter. I used the full 4 bytes as I don't think the extra 3 bytes are
a problem. With v7, the memcpy to copy the pending_nulls into the
buffer is inlined into a single instruction. It's semi-conceivable
that you could have more than 255 NULLs given that a List can contain
Nodes. I don't know if we ever have any NULL items in Lists at the
moment, but I don't think that that's disallowed. I'd feel much safer
taking the full 4 bytes of the counter to help prevent future
surprises.

Yes, _jumbleList could theoretically handle cases with over 256 Node elements in the same list,
but most (or all) of them are non-NULL (as far as I know, only PlannedStmt->subplans accepts NULL elements).
Most of the foreach loops over query lists don't have any NULL-safety checks;
we just assume that the lists contain no NULL pointers.
I still believe storing just one byte is sufficient.

I'm happy to proceed with the v7 version. I'm also happy to credit you
as the primary author of it, given that you came up with the v2b
patch. It might be best to extract the performance stuff that's in v7
and apply that separately from the bug fix. If you're still interested
and have time in the next 7 hours to do it, would you be able to split
the v7 as described, making v8-0001 as the bug fix and v8-0002 as the
performance stuff? If so, could you also add the total_jumble_len to
v8-0001 like Michael's last patch had, but adjust so it's only
maintained for Assert builds. Michael is quite keen on having more
assurance that custom jumble functions don't decide to forego any
jumbling.

As I can see, you have already split the patches. I apologize for the delay; I don't have
much time to work on this issue. Thank you for your proposal to credit me as the patch author.

#45David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#43)
2 attachment(s)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Wed, 26 Mar 2025 at 14:04, Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Mar 26, 2025 at 11:56:50AM +1300, David Rowley wrote:

Here is the v8 version with the bug fix and performance stuff
separated out.

Why not. I assume that you would merge these together?

I think it's better it's two commits. They're for different purposes.

I disagree about the lack of tests. These are cheap to add. I'd
suggest to not use the ones posted as part of v2 variant B as these do
not require the creation of a relation so they can be made cheaper,
and instead use the set I have posted in patch 0001 here which covers
most of the main scenarios from the parser with this nodes in the
Query:
/messages/by-id/Z9z85Ui5lPkPn2hq@paquier.xyz

Ok, I won't protest. I just feel the bug is a relic of the old design
and I don't believe the tests are testing anything useful in the new
design. It feels unlikely that someone accidentally adjusts the code
and reverts it to the old design. Anyway, I've included the tests from
your patch.

It is a bit disappointing that we require JumbleQuery() to jumble the
NULLs. There are discussions about making this code available not
only for Querys, but for Nodes, and the NULL computations would be
missed.

I've adjusted the patch to add two static functions. InitJumble() and
DoJumble(). JumbleQuery() uses these.

David

Attachments:

v9-0001-Fix-query-jumbling-to-account-for-NULL-nodes.patch (application/octet-stream)
From 1820a25ee972859c049d30a959b6e5f6ae29a3ba Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Tue, 25 Mar 2025 18:49:15 +1300
Subject: [PATCH v9 1/2] Fix query jumbling to account for NULL nodes

Previously NULL nodes were ignored.  This could cause issues where the
computed query ID could match for queries where fields that are next to
each other in their Node struct where one field was NULL and the other
non-NULL.  For example, the Query struct had distinctClause and sortClause
next to each other.  If someone wrote;

SELECT DISTINCT c1 FROM t;

and then;

SELECT c1 FROM t ORDER BY c1;

these would produce the same query ID since, in the first query, we
ignored the NULL sortClause and appended the jumble bytes for the
distictClause.  In the latter query, since we did nothing for the NULL
distinctClause then jumble the non-NULL sortClause, and since the node
representation stored is the same in both cases, the query IDs were
identical.

Here we fix this by always accounting for NULL nodes by recording that
we saw a NULL in the jumble buffer.  This fixes the issue as the order that
the NULL is recorded isn't the same in the above two queries.

Author: Bykov Ivan <i.bykov@modernsys.ru>
Author: Michael Paquier <michael@paquier.xyz>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9%40localhost.localdomain
---
 .../pg_stat_statements/expected/select.out    |  87 ++++++++++++-
 contrib/pg_stat_statements/sql/select.sql     |  20 +++
 src/backend/nodes/queryjumblefuncs.c          | 116 +++++++++++++++---
 src/include/nodes/queryjumble.h               |  10 ++
 4 files changed, 216 insertions(+), 17 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 1eebc2898ab..09476a7b699 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -19,6 +19,86 @@ SELECT 1 AS "int";
    1
 (1 row)
 
+-- LIMIT and OFFSET patterns
+-- Try some query permutations which once produced identical query IDs
+SELECT 1 AS "int" LIMIT 1;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 1 AS "int" LIMIT 2;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 1 AS "int" OFFSET 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 1 LIMIT 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2 LIMIT 2;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" LIMIT 1 OFFSET 1;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" LIMIT 3 OFFSET 3;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 1 FETCH FIRST 2 ROW ONLY;
+ int 
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2 FETCH FIRST 3 ROW ONLY;
+ int 
+-----
+(0 rows)
+
+-- DISTINCT and ORDER BY patterns
+-- Try some query permutations which once produced identical query IDs
+SELECT DISTINCT 1 AS "int";
+ int 
+-----
+   1
+(1 row)
+
+SELECT DISTINCT 2 AS "int";
+ int 
+-----
+   2
+(1 row)
+
+SELECT 1 AS "int" ORDER BY 1;
+ int 
+-----
+   1
+(1 row)
+
+SELECT 2 AS "int" ORDER BY 1;
+ int 
+-----
+   2
+(1 row)
+
 /* this comment should not appear in the output */
 SELECT 'hello'
   -- but this one will appear
@@ -135,9 +215,14 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
      3 |    3 | SELECT $1 + $2 + $3 AS "add"
      1 |    1 | SELECT $1 AS "float"
      2 |    2 | SELECT $1 AS "int"
+     2 |    2 | SELECT $1 AS "int" LIMIT $2
+     2 |    0 | SELECT $1 AS "int" OFFSET $2
+     6 |    0 | SELECT $1 AS "int" OFFSET $2 LIMIT $3
+     2 |    2 | SELECT $1 AS "int" ORDER BY 1
      1 |    2 | SELECT $1 AS i UNION SELECT $2 ORDER BY i
      1 |    1 | SELECT $1 || $2
      1 |    1 | SELECT $1, $2 LIMIT $3
+     2 |    2 | SELECT DISTINCT $1 AS "int"
      0 |    0 | SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"
      1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
      1 |    2 | WITH t(f) AS (                                                              +
@@ -145,7 +230,7 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
        |      | )                                                                           +
        |      |   SELECT f FROM t ORDER BY f
      1 |    1 | select $1::jsonb ? $2
-(12 rows)
+(17 rows)
 
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t 
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index 9a35471b271..c5e0b84ee5b 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -12,6 +12,26 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 --
 SELECT 1 AS "int";
 
+-- LIMIT and OFFSET patterns
+-- Try some query permutations which once produced identical query IDs
+SELECT 1 AS "int" LIMIT 1;
+SELECT 1 AS "int" LIMIT 2;
+SELECT 1 AS "int" OFFSET 1;
+SELECT 1 AS "int" OFFSET 2;
+SELECT 1 AS "int" OFFSET 1 LIMIT 1;
+SELECT 1 AS "int" OFFSET 2 LIMIT 2;
+SELECT 1 AS "int" LIMIT 1 OFFSET 1;
+SELECT 1 AS "int" LIMIT 3 OFFSET 3;
+SELECT 1 AS "int" OFFSET 1 FETCH FIRST 2 ROW ONLY;
+SELECT 1 AS "int" OFFSET 2 FETCH FIRST 3 ROW ONLY;
+
+-- DISTINCT and ORDER BY patterns
+-- Try some query permutations which once produced identical query IDs
+SELECT DISTINCT 1 AS "int";
+SELECT DISTINCT 2 AS "int";
+SELECT 1 AS "int" ORDER BY 1;
+SELECT 2 AS "int" ORDER BY 1;
+
 /* this comment should not appear in the output */
 SELECT 'hello'
   -- but this one will appear
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 62d6cfb7ac1..9d206afef15 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -58,6 +58,9 @@ bool		query_id_squash_values = false;
  */
 bool		query_id_enabled = false;
 
+static JumbleState *InitJumble(void);
+static uint64 DoJumble(JumbleState *jstate, Node *node);
+static void FlushPendingNulls(JumbleState *jstate);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
 static void RecordConstLocation(JumbleState *jstate,
@@ -120,29 +123,22 @@ CleanQuerytext(const char *query, int *location, int *len)
 	return query;
 }
 
+/*
+ * JumbleQuery
+ *		Recursively process the given Query producing a 64-bit hash value by
+ *		hashing the relevant fields and record that value in the Query's queryId
+ *		field.  Return the JumbleState object used for jumbling the query.
+ */
 JumbleState *
 JumbleQuery(Query *query)
 {
-	JumbleState *jstate = NULL;
+	JumbleState *jstate;
 
 	Assert(IsQueryIdEnabled());
 
-	jstate = (JumbleState *) palloc(sizeof(JumbleState));
-
-	/* Set up workspace for query jumbling */
-	jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-	jstate->jumble_len = 0;
-	jstate->clocations_buf_size = 32;
-	jstate->clocations = (LocationLen *)
-		palloc(jstate->clocations_buf_size * sizeof(LocationLen));
-	jstate->clocations_count = 0;
-	jstate->highest_extern_param_id = 0;
+	jstate = InitJumble();
 
-	/* Compute query ID and mark the Query node with it */
-	_jumbleNode(jstate, (Node *) query);
-	query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
-													  jstate->jumble_len,
-													  0));
+	query->queryId = DoJumble(jstate, (Node *) query);
 
 	/*
 	 * If we are unlucky enough to get a hash of zero, use 1 instead for
@@ -172,6 +168,51 @@ EnableQueryId(void)
 		query_id_enabled = true;
 }
 
+/*
+ * InitJumble
+ *		Allocate a JumbleState object and make it ready to jumble.
+ */
+static JumbleState *
+InitJumble(void)
+{
+	JumbleState *jstate;
+
+	jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+	/* Set up workspace for query jumbling */
+	jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+	jstate->jumble_len = 0;
+	jstate->clocations_buf_size = 32;
+	jstate->clocations = (LocationLen *) palloc(jstate->clocations_buf_size *
+												sizeof(LocationLen));
+	jstate->clocations_count = 0;
+	jstate->highest_extern_param_id = 0;
+	jstate->pending_nulls = 0;
+	jstate->total_jumble_len = 0;
+
+	return jstate;
+}
+
+/*
+ * DoJumble
+ *		Jumble the given Node using the given JumbleState and return the resulting
+ *		jumble hash.
+ */
+static uint64
+DoJumble(JumbleState *jstate, Node *node)
+{
+	/* Compute query ID and mark the Query node with it */
+	_jumbleNode(jstate, node);
+
+	/* Flush any pending NULLs before doing the final hash */
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	return DatumGetUInt64(hash_any_extended(jstate->jumble,
+											jstate->jumble_len,
+											0));
+}
+
 /*
  * AppendJumble: Append a value that is substantive in a given query to
  * the current jumble.
@@ -205,10 +246,42 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 		jumble_len += part_size;
 		item += part_size;
 		size -= part_size;
+
+#ifdef USE_ASSERT_CHECKING
+		jstate->total_jumble_len += part_size;
+#endif
 	}
+
 	jstate->jumble_len = jumble_len;
 }
 
+/*
+ * AppendJumbleNull
+ *		For jumbling NULL pointers
+ */
+static pg_attribute_always_inline void
+AppendJumbleNull(JumbleState *jstate)
+{
+	jstate->pending_nulls++;
+}
+
+/*
+ * FlushPendingNulls
+ *		Incorporate the pending_null value into the jumble buffer.
+ *
+ * Note: Callers must ensure that there's at least 1 pending NULL.
+ */
+static pg_attribute_always_inline void
+FlushPendingNulls(JumbleState *jstate)
+{
+	Assert(jstate->pending_nulls > 0);
+
+	AppendJumble(jstate,
+				 (const unsigned char *) &jstate->pending_nulls, 4);
+	jstate->pending_nulls = 0;
+}
+
+
 /*
  * Record location of constant within query string of query tree that is
  * currently being walked.
@@ -335,6 +408,8 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
 do { \
 	if (expr->str) \
 		AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1); \
+	else \
+		AppendJumbleNull(jstate); \
 } while(0)
 /* Function name used for the node field attribute custom_query_jumble. */
 #define JUMBLE_CUSTOM(nodetype, item) \
@@ -385,9 +460,15 @@ static void
 _jumbleNode(JumbleState *jstate, Node *node)
 {
 	Node	   *expr = node;
+#ifdef USE_ASSERT_CHECKING
+	Size		prev_jumble_len = jstate->total_jumble_len;
+#endif
 
 	if (expr == NULL)
+	{
+		AppendJumbleNull(jstate);
 		return;
+	}
 
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();
@@ -435,6 +516,9 @@ _jumbleNode(JumbleState *jstate, Node *node)
 		default:
 			break;
 	}
+
+	/* Ensure we added something to the jumble buffer */
+	Assert(jstate->total_jumble_len > prev_jumble_len);
 }
 
 static void
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index 905f66bc0bd..9414fb151ee 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -54,6 +54,16 @@ typedef struct JumbleState
 
 	/* highest Param id we've seen, in order to start normalization correctly */
 	int			highest_extern_param_id;
+
+	/*
+	 * Count of the number of NULL nodes seen since last appending a value.
+	 * These are flushed out to the jumble buffer before subsequent appends
+	 * and before performing the final jumble hash.
+	 */
+	unsigned int pending_nulls;
+
+	/* The total number of bytes added to the jumble buffer */
+	Size		total_jumble_len;
 } JumbleState;
 
 /* Values for the compute_query_id GUC */
-- 
2.43.0

v9-0002-Optimize-Query-jumble.patch (application/octet-stream)
From 8d64fd04ca902f6760aafa5ec2ec14cfcfdddbec Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Wed, 26 Mar 2025 10:35:42 +1300
Subject: [PATCH v9 2/2] Optimize Query jumble

A recent change adjusted query jumbling so it no longer ignores NULL
nodes during the jumble.  This added some overhead.  Here we tune a few
things to make jumbling faster again.  This makes jumbling perform
similar or slightly faster than prior to that change.

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvreP04nhTKuYsPw0F-YN+4nr4f=L72SPeFb81jfv+2c7w@mail.gmail.com
---
 src/backend/nodes/queryjumblefuncs.c | 130 ++++++++++++++++++++++++---
 1 file changed, 117 insertions(+), 13 deletions(-)

diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 9d206afef15..a8c426b2483 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -214,25 +214,50 @@ DoJumble(JumbleState *jstate, Node *node)
 }
 
 /*
- * AppendJumble: Append a value that is substantive in a given query to
- * the current jumble.
+ * AppendJumbleInternal: Append a value that is substantive in a given query
+ * to the current jumble.
+ *
+ * Note: Callers must ensure that jstate->pending_nulls is zero first by
+ * calling FlushPendingNulls() when required.  Callers must also ensure that
+ * size > 0.
  */
-static void
-AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+static pg_attribute_always_inline void
+AppendJumbleInternal(JumbleState *jstate, const unsigned char *item,
+					 Size size)
 {
 	unsigned char *jumble = jstate->jumble;
 	Size		jumble_len = jstate->jumble_len;
 
+	/* Ensure the caller didn't mess up */
+	Assert(size > 0);
+
+	/*
+	 * Fast path for when there's enough space left in the buffer.  This is
+	 * worthwhile as it means the memcpy can be inlined into very efficient code
+	 * when 'size' is a compile-time constant.
+	 */
+	if (likely(size <= JUMBLE_SIZE - jumble_len))
+	{
+		memcpy(jumble + jumble_len, item, size);
+		jstate->jumble_len += size;
+
+#ifdef USE_ASSERT_CHECKING
+		jstate->total_jumble_len += size;
+#endif
+
+		return;
+	}
+
 	/*
 	 * Whenever the jumble buffer is full, we hash the current contents and
 	 * reset the buffer to contain just that hash value, thus relying on the
 	 * hash to summarize everything so far.
 	 */
-	while (size > 0)
+	do
 	{
 		Size		part_size;
 
-		if (jumble_len >= JUMBLE_SIZE)
+		if (unlikely(jumble_len >= JUMBLE_SIZE))
 		{
 			uint64		start_hash;
 
@@ -250,7 +275,7 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 #ifdef USE_ASSERT_CHECKING
 		jstate->total_jumble_len += part_size;
 #endif
-	}
+	} while (size > 0);
 
 	jstate->jumble_len = jumble_len;
 }
@@ -265,6 +290,74 @@ AppendJumbleNull(JumbleState *jstate)
 	jstate->pending_nulls++;
 }
 
+/*
+ * AppendJumble
+ *		Add 'size' bytes of the given jumble 'value' to the jumble state
+ */
+static pg_noinline void
+AppendJumble(JumbleState *jstate, const unsigned char *value, Size size)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, size);
+}
+
+/*
+ * AppendJumble8
+ *		Add the first byte from the given 'value' pointer to the jumble state
+ */
+static pg_noinline void
+AppendJumble8(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 1);
+}
+
+/*
+ * AppendJumble16
+ *		Add the first 2 bytes from the given 'value' pointer to the jumble
+ *		state.
+ */
+static pg_noinline void
+AppendJumble16(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 2);
+}
+
+/*
+ * AppendJumble32
+ *		Add the first 4 bytes from the given 'value' pointer to the jumble
+ *		state.
+ */
+static pg_noinline void
+AppendJumble32(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 4);
+}
+
+/*
+ * AppendJumble64
+ *		Add the first 8 bytes from the given 'value' pointer to the jumble
+ *		state.
+ */
+static pg_noinline void
+AppendJumble64(JumbleState *jstate, const unsigned char *value)
+{
+	if (jstate->pending_nulls > 0)
+		FlushPendingNulls(jstate);
+
+	AppendJumbleInternal(jstate, value, 8);
+}
+
 /*
  * FlushPendingNulls
  *		Incorporate the pending_null value into the jumble buffer.
@@ -276,8 +369,8 @@ FlushPendingNulls(JumbleState *jstate)
 {
 	Assert(jstate->pending_nulls > 0);
 
-	AppendJumble(jstate,
-				 (const unsigned char *) &jstate->pending_nulls, 4);
+	AppendJumbleInternal(jstate,
+						 (const unsigned char *) &jstate->pending_nulls, 4);
 	jstate->pending_nulls = 0;
 }
 
@@ -401,7 +494,18 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
 #define JUMBLE_LOCATION(location) \
 	RecordConstLocation(jstate, expr->location, false)
 #define JUMBLE_FIELD(item) \
-	AppendJumble(jstate, (const unsigned char *) &(expr->item), sizeof(expr->item))
+do { \
+	if (sizeof(expr->item) == 8) \
+		AppendJumble64(jstate, (const unsigned char *) &(expr->item)); \
+	else if (sizeof(expr->item) == 4) \
+		AppendJumble32(jstate, (const unsigned char *) &(expr->item)); \
+	else if (sizeof(expr->item) == 2) \
+		AppendJumble16(jstate, (const unsigned char *) &(expr->item)); \
+	else if (sizeof(expr->item) == 1) \
+		AppendJumble8(jstate, (const unsigned char *) &(expr->item)); \
+	else \
+		AppendJumble(jstate, (const unsigned char *) &(expr->item), sizeof(expr->item)); \
+} while (0)
 #define JUMBLE_FIELD_SINGLE(item) \
 	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
 #define JUMBLE_STRING(str) \
@@ -535,15 +639,15 @@ _jumbleList(JumbleState *jstate, Node *node)
 			break;
 		case T_IntList:
 			foreach(l, expr)
-				JUMBLE_FIELD_SINGLE(lfirst_int(l));
+				AppendJumble32(jstate, (const unsigned char *) &lfirst_int(l));
 			break;
 		case T_OidList:
 			foreach(l, expr)
-				JUMBLE_FIELD_SINGLE(lfirst_oid(l));
+				AppendJumble32(jstate, (const unsigned char *) &lfirst_oid(l));
 			break;
 		case T_XidList:
 			foreach(l, expr)
-				JUMBLE_FIELD_SINGLE(lfirst_xid(l));
+				AppendJumble32(jstate, (const unsigned char *) &lfirst_xid(l));
 			break;
 		default:
 			elog(ERROR, "unrecognized list node type: %d",
-- 
2.43.0

#46Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#45)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Thu, Mar 27, 2025 at 12:09:35PM +1300, David Rowley wrote:

I think it's better as two commits. They're for different purposes.

Fine for me.

Ok, I won't protest. I just feel the bug is a relic of the old design
and I don't believe the tests are testing anything useful in the new
design. It feels unlikely that someone accidentally adjusts the code
and reverts it to the old design. Anyway, I've included the tests from
your patch.

One habit that I've found really useful to do when it comes to this
area of the code is to apply the tests first to show what kind of
behavior we had before changing the jumbling, then apply the update to
the query jumbling. This has two benefits:
- If the update of the second phase gets reverted, we still have the
tests.
- It is possible to show in the git history how the behavior changes,
analyzing them in isolation.

So I'd suggest an extra split, with the tests committed first.

It is a bit disappointing that we require JumbleQuery() to jumble the
NULLs. There are discussions about making this code available not
only for Querys, but for Nodes, and the NULL computations would be
missed...

I've adjusted the patch to add two static functions. InitJumble() and
DoJumble(). JumbleQuery() uses these.

That's cleaner, thanks for the split. I don't think that we've
reached a consensus on the other thread about what to provide as an
interface, and that's likely too late for this release anyway. What
you are proposing here is a good step forward, and it will help avoid
messing up again with the problem of this thread in the future.

+#ifdef USE_ASSERT_CHECKING
+	Size		prev_jumble_len = jstate->total_jumble_len;
+#endif
[...]
+	/* The total number of bytes added to the jumble buffer */
+	Size		total_jumble_len;
 } JumbleState;

Maybe hide that in the header with one extra USE_ASSERT_CHECKING if we
are only going to increment it when assertions are enabled?

+	 * These are flushed out to the jumble buffer before subsequent appends
+	 * and before performing the final jumble hash.

This comment is slightly incorrect when 0001 is taken in isolation?
The flush of the NULL counter is done in the patch via
FlushPendingNulls() when jumbling the final value, not before more
appends? It becomes true with 0002, where FlushPendingNulls() is
called before appending any data.

Perhaps removing JUMBLE_FIELD_SINGLE() is better now.
--
Michael

#47Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#46)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

Maybe I am missing something, but when I applied the
v9-0001 patch alone, it did not solve the problem it
claims to and the pg_s_s regression test fails:

+++ /local/home/simseih/pghead/install1/postgresql/contrib/pg_stat_statements/results/select.out	2025-03-27 04:56:35.143855392 +0000
@@ -215,14 +215,12 @@
      3 |    3 | SELECT $1 + $2 + $3 AS "add"
      1 |    1 | SELECT $1 AS "float"
      2 |    2 | SELECT $1 AS "int"
-     2 |    2 | SELECT $1 AS "int" LIMIT $2
-     2 |    0 | SELECT $1 AS "int" OFFSET $2
+     4 |    2 | SELECT $1 AS "int" LIMIT $2
      6 |    0 | SELECT $1 AS "int" OFFSET $2 LIMIT $3
-     2 |    2 | SELECT $1 AS "int" ORDER BY 1
      1 |    2 | SELECT $1 AS i UNION SELECT $2 ORDER BY i
      1 |    1 | SELECT $1 || $2
      1 |    1 | SELECT $1, $2 LIMIT $3
-     2 |    2 | SELECT DISTINCT $1 AS "int"
+     4 |    4 | SELECT DISTINCT $1 AS "int"
     0 |    0 | SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"
      1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
      1 |    2 | WITH t(f) AS (
                      +
@@ -230,7 +228,7 @@
        |      | )
                      +
        |      |   SELECT f FROM t ORDER BY f
      1 |    1 | select $1::jsonb ? $2
-(17 rows)
+(15 rows)

but when v9-0002 is applied, the regress tests then
succeed.

--
Sami Imseih
Amazon Web Services (AWS)

#48David Rowley
dgrowleyml@gmail.com
In reply to: Sami Imseih (#47)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Thu, 27 Mar 2025 at 18:03, Sami Imseih <samimseih@gmail.com> wrote:

Maybe I am missing something, but when I applied the
v9-0001 patch alone, it did not solve the problem it
claims to and the pg_s_s regression test fails:

That was an artifact of me not having made the split quite in the
right place when dividing the patch into two. What's just been
committed fixed that. Thanks for looking.

David

#49David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#46)
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Thu, 27 Mar 2025 at 14:51, Michael Paquier <michael@paquier.xyz> wrote:

One habit that I've found really useful to do when it comes to this
area of the code is to apply the tests first to show what kind of
behavior we had before changing the jumbling, then apply the update to
the query jumbling. This has two benefits:
- If the update of the second phase gets reverted, we still have the
tests.
- It is possible to show in the git history how the behavior changes,
analyzing them in isolation.

So I'd suggest an extra split, with the tests committed first.

I didn't do that. I can't quite process the logic of adding tests to
check we get the wrong behaviour.

Maybe hide that in the header with one extra USE_ASSERT_CHECKING if we
are only going to increment it when assertions are enabled?

+        * These are flushed out to the jumble buffer before subsequent appends
+        * and before performing the final jumble hash.

I thought about that and had decided to leave it in. I had been
concerned about the struct growing and shrinking between
cassert/non-cassert builds. It's probably ok since it's expected to be
an internal field. I guess if a cassert extension used the field and
ran against a non-cassert PostgreSQL, then they'd have trouble. I did
end up removing it in what I just pushed. I felt I might be being
overly concerned since the field is at the end of the struct and is
commented that it's for Asserts only.

This comment is slightly incorrect when 0001 is taken in isolation?
The flush of the NULL counter is done in the patch via
FlushPendingNulls() when jumbling the final value, not before more
appends? It becomes true with 0002, where FlushPendingNulls() is
called before appending any data.

That was my mistake. I divided the patches incorrectly. There was
meant to be a flush in there.

Perhaps removing JUMBLE_FIELD_SINGLE() is better now.

Done and pushed.

Thanks.

David