Some oversights in query_id calculation

Started by Julien Rouhaudover 4 years ago7 messages
#1Julien Rouhaud
rjuju123@gmail.com
1 attachment(s)

Hi,

While doing some sanity checks on the regression tests, I found some queries
that are semantically different but end up with identical query_id.

Two are an old issues:

- the "ONLY" in FROM [ONLY] isn't hashed
- the agglevelsup field in GROUPING isn't hashed

Another one was introduced in pg13 with the WITH TIES not being hashed.

The last one new in pg14: the "DISTINCT" in "GROUP BY [DISTINCT]" isn't hash.

I'm attaching a patch that fixes those, with regression tests to reproduce each
problem.

There are also 2 additional debatable cases on whether this is a semantic
difference or not:

- aliases aren't hashed. That's usually not a problem, except when you use
row_to_json(), since you'll get different keys

- the NAME in XmlExpr (eg: xmlpi(NAME foo,...)) isn't hashed, so you generate
different elements

Attachments:

v1-0001-Fix-some-oversights-in-query_id-calculation.patchtext/x-diff; charset=us-asciiDownload
From cf4ee4b493cdb730b7ac98d65bcca785455af7ce Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 25 Apr 2021 15:49:32 +0800
Subject: [PATCH v1] Fix some oversights in query_id calculation.

---
 .../expected/pg_stat_statements.out           | 151 ++++++++++++++++++
 .../sql/pg_stat_statements.sql                |  52 ++++++
 src/backend/utils/misc/queryjumble.c          |   4 +
 3 files changed, 207 insertions(+)

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index fb97f68737..40b5109b55 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -916,4 +916,155 @@ SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%
  $$ LANGUAGE plpgsql   |          |       | 
 (3 rows)
 
+-- FROM [ONLY]
+CREATE TABLE tbl_inh(id integer);
+CREATE TABLE tbl_inh_1() INHERITS (tbl_inh);
+INSERT INTO tbl_inh_1 SELECT 1;
+SELECT * FROM tbl_inh;
+ id 
+----
+  1
+(1 row)
+
+SELECT * FROM ONLY tbl_inh;
+ id 
+----
+(0 rows)
+
+SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%FROM%tbl_inh%';
+ count 
+-------
+     2
+(1 row)
+
+-- WITH TIES
+CREATE TABLE limitoption AS SELECT 0 AS val FROM generate_series(1, 10);
+SELECT *
+FROM limitoption
+WHERE val < 2
+ORDER BY val
+FETCH FIRST 2 ROWS WITH TIES;
+ val 
+-----
+   0
+   0
+   0
+   0
+   0
+   0
+   0
+   0
+   0
+   0
+(10 rows)
+
+SELECT *
+FROM limitoption
+WHERE val < 2
+ORDER BY val
+FETCH FIRST 2 ROW ONLY;
+ val 
+-----
+   0
+   0
+(2 rows)
+
+SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%FETCH FIRST%';
+ count 
+-------
+     2
+(1 row)
+
+-- GROUP BY [DISTINCT]
+SELECT a, b, c
+FROM (VALUES (1, 2, 3), (4, NULL, 6), (7, 8, 9)) AS t (a, b, c)
+GROUP BY ROLLUP(a, b), rollup(a, c)
+ORDER BY a, b, c;
+ a | b | c 
+---+---+---
+ 1 | 2 | 3
+ 1 | 2 |  
+ 1 | 2 |  
+ 1 |   | 3
+ 1 |   | 3
+ 1 |   |  
+ 1 |   |  
+ 1 |   |  
+ 4 |   | 6
+ 4 |   | 6
+ 4 |   | 6
+ 4 |   |  
+ 4 |   |  
+ 4 |   |  
+ 4 |   |  
+ 4 |   |  
+ 7 | 8 | 9
+ 7 | 8 |  
+ 7 | 8 |  
+ 7 |   | 9
+ 7 |   | 9
+ 7 |   |  
+ 7 |   |  
+ 7 |   |  
+   |   |  
+(25 rows)
+
+SELECT a, b, c
+FROM (VALUES (1, 2, 3), (4, NULL, 6), (7, 8, 9)) AS t (a, b, c)
+GROUP BY DISTINCT ROLLUP(a, b), rollup(a, c)
+ORDER BY a, b, c;
+ a | b | c 
+---+---+---
+ 1 | 2 | 3
+ 1 | 2 |  
+ 1 |   | 3
+ 1 |   |  
+ 4 |   | 6
+ 4 |   | 6
+ 4 |   |  
+ 4 |   |  
+ 7 | 8 | 9
+ 7 | 8 |  
+ 7 |   | 9
+ 7 |   |  
+   |   |  
+(13 rows)
+
+SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%GROUP BY%ROLLUP%';
+ count 
+-------
+     2
+(1 row)
+
+-- GROUPING SET agglevelsup
+SELECT (
+  SELECT (
+    SELECT GROUPING(a,b) FROM (VALUES (1)) v2(c)
+  ) FROM (VALUES (1,2)) v1(a,b) GROUP BY (a,b)
+) FROM (VALUES(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
+ grouping 
+----------
+        0
+        0
+        0
+(3 rows)
+
+SELECT (
+  SELECT (
+    SELECT GROUPING(e,f) FROM (VALUES (1)) v2(c)
+  ) FROM (VALUES (1,2)) v1(a,b) GROUP BY (a,b)
+) FROM (VALUES(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
+ grouping 
+----------
+        3
+        0
+        1
+(3 rows)
+
+SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
+ count 
+-------
+     2
+(1 row)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 56d8526ccf..bc3b6493e6 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -385,4 +385,56 @@ END;
 $$ LANGUAGE plpgsql;
 SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
 
+-- FROM [ONLY]
+CREATE TABLE tbl_inh(id integer);
+CREATE TABLE tbl_inh_1() INHERITS (tbl_inh);
+INSERT INTO tbl_inh_1 SELECT 1;
+
+SELECT * FROM tbl_inh;
+SELECT * FROM ONLY tbl_inh;
+
+SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%FROM%tbl_inh%';
+
+-- WITH TIES
+CREATE TABLE limitoption AS SELECT 0 AS val FROM generate_series(1, 10);
+SELECT *
+FROM limitoption
+WHERE val < 2
+ORDER BY val
+FETCH FIRST 2 ROWS WITH TIES;
+
+SELECT *
+FROM limitoption
+WHERE val < 2
+ORDER BY val
+FETCH FIRST 2 ROW ONLY;
+
+SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%FETCH FIRST%';
+
+-- GROUP BY [DISTINCT]
+SELECT a, b, c
+FROM (VALUES (1, 2, 3), (4, NULL, 6), (7, 8, 9)) AS t (a, b, c)
+GROUP BY ROLLUP(a, b), rollup(a, c)
+ORDER BY a, b, c;
+SELECT a, b, c
+FROM (VALUES (1, 2, 3), (4, NULL, 6), (7, 8, 9)) AS t (a, b, c)
+GROUP BY DISTINCT ROLLUP(a, b), rollup(a, c)
+ORDER BY a, b, c;
+
+SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%GROUP BY%ROLLUP%';
+
+-- GROUPING SET agglevelsup
+SELECT (
+  SELECT (
+    SELECT GROUPING(a,b) FROM (VALUES (1)) v2(c)
+  ) FROM (VALUES (1,2)) v1(a,b) GROUP BY (a,b)
+) FROM (VALUES(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
+SELECT (
+  SELECT (
+    SELECT GROUPING(e,f) FROM (VALUES (1)) v2(c)
+  ) FROM (VALUES (1,2)) v1(a,b) GROUP BY (a,b)
+) FROM (VALUES(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
+
+SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
+
 DROP EXTENSION pg_stat_statements;
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index afd6d76ceb..1bb9fe20ea 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -230,6 +230,7 @@ JumbleQueryInternal(JumbleState *jstate, Query *query)
 	JumbleExpr(jstate, (Node *) query->onConflict);
 	JumbleExpr(jstate, (Node *) query->returningList);
 	JumbleExpr(jstate, (Node *) query->groupClause);
+	APP_JUMB(query->groupDistinct);
 	JumbleExpr(jstate, (Node *) query->groupingSets);
 	JumbleExpr(jstate, query->havingQual);
 	JumbleExpr(jstate, (Node *) query->windowClause);
@@ -237,6 +238,7 @@ JumbleQueryInternal(JumbleState *jstate, Query *query)
 	JumbleExpr(jstate, (Node *) query->sortClause);
 	JumbleExpr(jstate, query->limitOffset);
 	JumbleExpr(jstate, query->limitCount);
+	APP_JUMB(query->limitOption);
 	JumbleRowMarks(jstate, query->rowMarks);
 	JumbleExpr(jstate, query->setOperations);
 }
@@ -259,6 +261,7 @@ JumbleRangeTable(JumbleState *jstate, List *rtable)
 			case RTE_RELATION:
 				APP_JUMB(rte->relid);
 				JumbleExpr(jstate, (Node *) rte->tablesample);
+				APP_JUMB(rte->inh);
 				break;
 			case RTE_SUBQUERY:
 				JumbleQueryInternal(jstate, rte->subquery);
@@ -399,6 +402,7 @@ JumbleExpr(JumbleState *jstate, Node *node)
 				GroupingFunc *grpnode = (GroupingFunc *) node;
 
 				JumbleExpr(jstate, (Node *) grpnode->refs);
+				APP_JUMB(grpnode->agglevelsup);
 			}
 			break;
 		case T_WindowFunc:
-- 
2.30.1

#2Aleksander Alekseev
aleksander@timescale.com
In reply to: Julien Rouhaud (#1)
Re: Some oversights in query_id calculation

Hi Julien,

I'm attaching a patch that fixes those, with regression tests to reproduce each
problem.

I believe something could be not quite right with the patch. Here is what I did:

$ git apply ...
# revert the changes in the code but keep the new tests
$ git checkout src/backend/utils/misc/queryjumble.c
$ ./full-build.sh && single-install.sh && make installcheck-world

... where named .sh scripts are something I use to quickly check a patch [1]https://github.com/afiskon/pgscripts.

I was expecting that several tests will fail but they didn't. Maybe I
missed something?

[1]: https://github.com/afiskon/pgscripts

--
Best regards,
Aleksander Alekseev

#3Julien Rouhaud
rjuju123@gmail.com
In reply to: Aleksander Alekseev (#2)
Re: Some oversights in query_id calculation

Hi Aleksander,

On Wed, Apr 28, 2021 at 01:19:36PM +0300, Aleksander Alekseev wrote:

Hi Julien,

I'm attaching a patch that fixes those, with regression tests to reproduce each
problem.

I believe something could be not quite right with the patch. Here is what I did:

$ git apply ...
# revert the changes in the code but keep the new tests
$ git checkout src/backend/utils/misc/queryjumble.c
$ ./full-build.sh && single-install.sh && make installcheck-world

... where named .sh scripts are something I use to quickly check a patch [1].

I was expecting that several tests will fail but they didn't. Maybe I
missed something?

I think it's because installcheck-* don't run pg_stat_statements' tests, see
its Makefile:

# Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
# which typical installcheck users do not have (e.g. buildfarm clients).
NO_INSTALLCHECK = 1

You should see failures doing a check-world or simply a make -C
contrib/pg_stat_statements check

#4Aleksander Alekseev
aleksander@timescale.com
In reply to: Julien Rouhaud (#3)
Re: Some oversights in query_id calculation

Hi Julien,

You should see failures doing a check-world or simply a make -C
contrib/pg_stat_statements check

Sorry, my bad. I was running make check-world, but did it with -j4 flag
which was a mistake.

The patch is OK.

On Wed, Apr 28, 2021 at 1:27 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi Aleksander,

On Wed, Apr 28, 2021 at 01:19:36PM +0300, Aleksander Alekseev wrote:

Hi Julien,

I'm attaching a patch that fixes those, with regression tests to

reproduce each

problem.

I believe something could be not quite right with the patch. Here is

what I did:

$ git apply ...
# revert the changes in the code but keep the new tests
$ git checkout src/backend/utils/misc/queryjumble.c
$ ./full-build.sh && single-install.sh && make installcheck-world

... where named .sh scripts are something I use to quickly check a patch

[1].

I was expecting that several tests will fail but they didn't. Maybe I
missed something?

I think it's because installcheck-* don't run pg_stat_statements' tests,
see
its Makefile:

# Disabled because these tests require

"shared_preload_libraries=pg_stat_statements",

# which typical installcheck users do not have (e.g. buildfarm clients).
NO_INSTALLCHECK = 1

You should see failures doing a check-world or simply a make -C
contrib/pg_stat_statements check

--
Best regards,
Aleksander Alekseev

#5Julien Rouhaud
rjuju123@gmail.com
In reply to: Aleksander Alekseev (#4)
Re: Some oversights in query_id calculation

Hi Aleksander,

On Wed, Apr 28, 2021 at 03:22:39PM +0300, Aleksander Alekseev wrote:

Hi Julien,

You should see failures doing a check-world or simply a make -C
contrib/pg_stat_statements check

Sorry, my bad. I was running make check-world, but did it with -j4 flag
which was a mistake.

The patch is OK.

Thanks for reviewing!

#6Bruce Momjian
bruce@momjian.us
In reply to: Julien Rouhaud (#5)
Re: Some oversights in query_id calculation

On Sun, May 2, 2021 at 12:27:37PM +0800, Julien Rouhaud wrote:

Hi Aleksander,

On Wed, Apr 28, 2021 at 03:22:39PM +0300, Aleksander Alekseev wrote:

Hi Julien,

You should see failures doing a check-world or simply a make -C
contrib/pg_stat_statements check

Sorry, my bad. I was running make check-world, but did it with -j4 flag
which was a mistake.

The patch is OK.

Thanks for reviewing!

Patch applied, thanks.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#7Julien Rouhaud
rjuju123@gmail.com
In reply to: Bruce Momjian (#6)
Re: Some oversights in query_id calculation

On Mon, May 03, 2021 at 02:59:42PM -0400, Bruce Momjian wrote:

On Sun, May 2, 2021 at 12:27:37PM +0800, Julien Rouhaud wrote:

Hi Aleksander,

On Wed, Apr 28, 2021 at 03:22:39PM +0300, Aleksander Alekseev wrote:

Hi Julien,

You should see failures doing a check-world or simply a make -C
contrib/pg_stat_statements check

Sorry, my bad. I was running make check-world, but did it with -j4 flag
which was a mistake.

The patch is OK.

Thanks for reviewing!

Patch applied, thanks.

Thanks a lot Bruce!