RangeTblEntry jumble omissions

Started by Peter Eisentrautalmost 2 years ago7 messages
#1Peter Eisentraut
peter@eisentraut.org
1 attachment(s)

I think there are some fields from the RangeTblEntry struct missing in
the jumble (function _jumbleRangeTblEntry()). Probably, some of these
were really just forgotten, in other cases this might be an intentional
decision, but then it might be good to document it. This has come up in
thread [0]/messages/by-id/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org and there is a patch [1]/messages/by-id/attachment/154249/v2-0002-Remove-custom-_jumbleRangeTblEntry.patch, but I figured I'd start a new
thread here to get the attention of those who know more about
pg_stat_statements.

I think the following fields are missing. (See also attached patch.)

- alias

Currently, two queries like

SELECT * FROM t1 AS foo
SELECT * FROM t1 AS bar

are counted together by pg_stat_statements -- that might be ok, but they
both get listed under whichever one is run first, so here if you are
looking for the "AS bar" query, you won't find it.

- join_using_alias

Similar situation, currently

SELECT * FROM t1 JOIN t2 USING (a, b)
SELECT * FROM t1 JOIN t2 USING (a, b) AS x

are counted together.

- funcordinality

This was probably just forgotten. It should be included because the
WITH ORDINALITY clause changes the query result.

- lateral

Also probably forgotten. A query specifying LATERAL is clearly
different from one without it.

Thoughts? Anything else missing perhaps?

[0]: /messages/by-id/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org
/messages/by-id/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org
[1]: /messages/by-id/attachment/154249/v2-0002-Remove-custom-_jumbleRangeTblEntry.patch
/messages/by-id/attachment/154249/v2-0002-Remove-custom-_jumbleRangeTblEntry.patch

Attachments:

0001-Add-more-RangeTblEntry-fields-to-jumble.patchtext/plain; charset=UTF-8; name=0001-Add-more-RangeTblEntry-fields-to-jumble.patchDownload
From 8176e6f91abd8809e79e1c8e9335522031da2755 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Fri, 23 Feb 2024 16:10:57 +0100
Subject: [PATCH] Add more RangeTblEntry fields to jumble

---
 src/backend/nodes/queryjumblefuncs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 82f725baaa5..3b0012a720d 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -372,9 +372,11 @@ _jumbleRangeTblEntry(JumbleState *jstate, Node *node)
 			break;
 		case RTE_JOIN:
 			JUMBLE_FIELD(jointype);
+			JUMBLE_NODE(join_using_alias);
 			break;
 		case RTE_FUNCTION:
 			JUMBLE_NODE(functions);
+			JUMBLE_FIELD(funcordinality);
 			break;
 		case RTE_TABLEFUNC:
 			JUMBLE_NODE(tablefunc);
@@ -400,4 +402,6 @@ _jumbleRangeTblEntry(JumbleState *jstate, Node *node)
 			elog(ERROR, "unrecognized RTE kind: %d", (int) expr->rtekind);
 			break;
 	}
+	JUMBLE_NODE(alias);
+	JUMBLE_FIELD(lateral);
 }
-- 
2.43.2

#2Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Peter Eisentraut (#1)
Re: RangeTblEntry jumble omissions

On 2024-Feb-23, Peter Eisentraut wrote:

- alias

Currently, two queries like

SELECT * FROM t1 AS foo
SELECT * FROM t1 AS bar

are counted together by pg_stat_statements -- that might be ok, but they
both get listed under whichever one is run first, so here if you are looking
for the "AS bar" query, you won't find it.

Another, similar but not quite: if you do

SET search_path TO foo;
SELECT * FROM t1;
SET search_path TO bar;
SELECT * FROM t1;

and you have both foo.t1 and bar.t1, you'll get two identical-looking
queries in pg_stat_statements with different jumble IDs, with no way to
know which is which. Not sure if the jumbling of the RTE (which
includes the OID of the table in question) itself is to blame, or
whether we want to store the relevant schemas with the entry somehow, or
what. Obviously, failing to differentiate them would not be an
improvement.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#3Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#1)
Re: RangeTblEntry jumble omissions

Hi,

On Fri, Feb 23, 2024 at 04:26:53PM +0100, Peter Eisentraut wrote:

- alias

Currently, two queries like

SELECT * FROM t1 AS foo
SELECT * FROM t1 AS bar

are counted together by pg_stat_statements -- that might be ok, but they
both get listed under whichever one is run first, so here if you are looking
for the "AS bar" query, you won't find it.

I think this one is intentional. This alias won't change the query behavior or
the field names so it's good to avoid extraneous entries. It's true that you
then won't find something matching "AS bar", but it's not something you can
rely on anyway.

If you first execute "select * from t1 as foo" and then "SELECT * FROM t1 AS
foo" then you won't find anything matching "AS foo" either. There isn't even
any guarantee that the stored query text will be jumbled.

- join_using_alias

Similar situation, currently

SELECT * FROM t1 JOIN t2 USING (a, b)
SELECT * FROM t1 JOIN t2 USING (a, b) AS x

are counted together.

IMHO same as above.

- funcordinality

This was probably just forgotten. It should be included because the WITH
ORDINALITY clause changes the query result.

Agreed.

- lateral

Also probably forgotten. A query specifying LATERAL is clearly different
from one without it.

Agreed.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#3)
Re: RangeTblEntry jumble omissions

Julien Rouhaud <rjuju123@gmail.com> writes:

On Fri, Feb 23, 2024 at 04:26:53PM +0100, Peter Eisentraut wrote:

- funcordinality
This was probably just forgotten. It should be included because the WITH
ORDINALITY clause changes the query result.

Agreed.

Seems OK.

- lateral
Also probably forgotten. A query specifying LATERAL is clearly different
from one without it.

Agreed.

Nah ... I think that LATERAL should be ignored on essentially the
same grounds on which you argue for ignoring aliases. If it
affects the query's semantics, it's because there is a lateral
reference in the subject subquery or function, and that reference
already contributes to the query hash. If there is no such
reference, then LATERAL is a noise word. It doesn't help any that
LATERAL is actually optional for functions, making it certainly a
noise word there.

IIRC, the parser+planner cooperatively fix things so that the final
state of an RTE's lateral field reflects reality. But if we are
hashing before that's happened, it's not worth all that much.

regards, tom lane

#5Julien Rouhaud
rjuju123@gmail.com
In reply to: Alvaro Herrera (#2)
Re: RangeTblEntry jumble omissions

On Fri, Feb 23, 2024 at 11:00:41PM +0100, Alvaro Herrera wrote:

Another, similar but not quite: if you do

SET search_path TO foo;
SELECT * FROM t1;
SET search_path TO bar;
SELECT * FROM t1;

and you have both foo.t1 and bar.t1, you'll get two identical-looking
queries in pg_stat_statements with different jumble IDs, with no way to
know which is which. Not sure if the jumbling of the RTE (which
includes the OID of the table in question) itself is to blame, or
whether we want to store the relevant schemas with the entry somehow, or
what. Obviously, failing to differentiate them would not be an
improvement.

Yeah that's also a very old known problem. This has been raised multiple times
(on top of my head [1]/messages/by-id/8f54c609-17c6-945b-fe13-8b07c0866420@dalibo.com, [2]/messages/by-id/9baf5c06-d6ab-c688-010c-843348e3d98c@gmail.com, [3]/messages/by-id/3aa097d7-7c47-187b-5913-db8366cd4491@gmail.com). At this point I'm not exactly holding my
breath.

[1]: /messages/by-id/8f54c609-17c6-945b-fe13-8b07c0866420@dalibo.com
[2]: /messages/by-id/9baf5c06-d6ab-c688-010c-843348e3d98c@gmail.com
[3]: /messages/by-id/3aa097d7-7c47-187b-5913-db8366cd4491@gmail.com

#6Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#4)
Re: RangeTblEntry jumble omissions

On Fri, Feb 23, 2024 at 06:52:54PM -0500, Tom Lane wrote:

Julien Rouhaud <rjuju123@gmail.com> writes:

On Fri, Feb 23, 2024 at 04:26:53PM +0100, Peter Eisentraut wrote:

- funcordinality
This was probably just forgotten. It should be included because the WITH
ORDINALITY clause changes the query result.

Agreed.

Seems OK.

+1.

- lateral
Also probably forgotten. A query specifying LATERAL is clearly different
from one without it.

Agreed.

Nah ... I think that LATERAL should be ignored on essentially the
same grounds on which you argue for ignoring aliases. If it
affects the query's semantics, it's because there is a lateral
reference in the subject subquery or function, and that reference
already contributes to the query hash. If there is no such
reference, then LATERAL is a noise word. It doesn't help any that
LATERAL is actually optional for functions, making it certainly a
noise word there.

Sounds like a fair argument to me.

Btw, I think that you should add a few queries to the tests of
pg_stat_statements to track the change of behavior when you have
aliases, as an effect of the fields added in the jumbling.
--
Michael

#7Peter Eisentraut
peter@eisentraut.org
In reply to: Michael Paquier (#6)
Re: RangeTblEntry jumble omissions

On 26.02.24 02:08, Michael Paquier wrote:

On Fri, Feb 23, 2024 at 06:52:54PM -0500, Tom Lane wrote:

Julien Rouhaud <rjuju123@gmail.com> writes:

On Fri, Feb 23, 2024 at 04:26:53PM +0100, Peter Eisentraut wrote:

- funcordinality
This was probably just forgotten. It should be included because the WITH
ORDINALITY clause changes the query result.

Agreed.

Seems OK.

+1.

Ok, I have added funcordinality for the RTE_FUNCTION case, and left the
others alone.