BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure
The following bug has been logged on the website:
Bug reference: 17668
Logged by: Marcus Kempe
Email address: marcus.kempe@gmail.com
PostgreSQL version: 14.5
Operating system: Linux
Description:
Consider the following case:
postgres=# create function test(x bigint) returns void language sql as $$
select x ;$$;
CREATE FUNCTION
postgres=# select test(1);
test
------
(1 row)
postgres=# select calls,userid,dbid,queryid,query from
public.pg_stat_statements where query like 'select test%';
calls | userid | dbid | queryid | query
-------+--------+-------+----------------------+-----------------
1 | 10 | 13757 | -8729651821477628722 | select test($1)
(1 row)
postgres=# select test(12345678910); -- larger that INTMAX.
test
------
(1 row)
postgres=# select calls,userid,dbid,queryid,query from
public.pg_stat_statements where query like 'select test%';
calls | userid | dbid | queryid | query
-------+--------+-------+----------------------+-----------------
1 | 10 | 13757 | -8729651821477628722 | select test($1)
1 | 10 | 13757 | 1452143938866253601 | select test($1)
(2 rows)
So given the above 2 calls for the same function, there are 2 different
queryIds generated, seemingly based on the fact that in the first case the
parameter can be cast to integer, and in the second case it can not.
I would have expected both calls to be normalized and fingerprinted to one
and the same queryId.
This behavior was similar even before queryjumble.c was introduced in PG14.
Hi,
On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
postgres=# select calls,userid,dbid,queryid,query from
public.pg_stat_statements where query like 'select test%';
calls | userid | dbid | queryid | query
-------+--------+-------+----------------------+-----------------
1 | 10 | 13757 | -8729651821477628722 | select test($1)
1 | 10 | 13757 | 1452143938866253601 | select test($1)
(2 rows)So given the above 2 calls for the same function, there are 2 different
queryIds generated, seemingly based on the fact that in the first case the
parameter can be cast to integer, and in the second case it can not.I would have expected both calls to be normalized and fingerprinted to one
and the same queryId.
Agreed, and that's actually a known problem that is currently being worked on.
You can look at
/messages/by-id/36e5bffe-e989-194f-85c8-06e7bc88e6f7@amazon.com
for details about the discussion and in-progress fix.
Julien Rouhaud <rjuju123@gmail.com> writes:
On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
I would have expected both calls to be normalized and fingerprinted to one
and the same queryId.
Agreed, and that's actually a known problem that is currently being worked on.
You can look at
/messages/by-id/36e5bffe-e989-194f-85c8-06e7bc88e6f7@amazon.com
for details about the discussion and in-progress fix.
Um ... that seems unrelated. AFAICS the OP is complaining about
the fact that '42'::int4::int8 is not identical to '42'::int8.
Well, they're not. I seriously doubt that we would or should
consider trying to get queryjumble to mask that.
regards, tom lane
On Thu, Oct 27, 2022 at 02:28:17PM -0400, Tom Lane wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
I would have expected both calls to be normalized and fingerprinted to one
and the same queryId.Agreed, and that's actually a known problem that is currently being worked on.
You can look at
/messages/by-id/36e5bffe-e989-194f-85c8-06e7bc88e6f7@amazon.com
for details about the discussion and in-progress fix.Um ... that seems unrelated. AFAICS the OP is complaining about
the fact that '42'::int4::int8 is not identical to '42'::int8.
Well, they're not. I seriously doubt that we would or should
consider trying to get queryjumble to mask that.
Oh wow I totally misread the bug report, sorry about that. A good night sleep
and a coffee later I see what it's about, and I agree that we shouldn't try to
mask that even if it might seem surprising.
Appreciate the clarifications. The only thing I would then add is that if
we agree that it's unintuitive, this could be better reflected in the
documentation.
Currently the only mention is the following (pgstatstatements.html):
"queries with identical texts might appear as separate entries, if they
have different meanings as a result of factors such as different
search_path settings."
Thanks and best regards,
Marcus Kempe
On Fri, Oct 28, 2022 at 3:28 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Show quoted text
On Thu, Oct 27, 2022 at 02:28:17PM -0400, Tom Lane wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
I would have expected both calls to be normalized and fingerprinted
to one
and the same queryId.
Agreed, and that's actually a known problem that is currently being
worked on.
You can look at
/messages/by-id/36e5bffe-e989-194f-85c8-06e7bc88e6f7@amazon.com
for details about the discussion and in-progress fix.
Um ... that seems unrelated. AFAICS the OP is complaining about
the fact that '42'::int4::int8 is not identical to '42'::int8.
Well, they're not. I seriously doubt that we would or should
consider trying to get queryjumble to mask that.Oh wow I totally misread the bug report, sorry about that. A good night
sleep
and a coffee later I see what it's about, and I agree that we shouldn't
try to
mask that even if it might seem surprising.
Hello, I've seen possibly related case where that might be a problem.
PostgreSQL 12.8, pg_stat_statements.max=10000. There are queries like these:
insert into tablename (col1, ..., col28) values ($1, ..., $28), ...
($2605, ... , $2660) on conflict do nothing.
Problem is that not only we might insert different number of rows but
there might be many duplicates of the same exact query in
pg_stat_statements (userid and dbid are the same in all rows):
select md5(query), count(*), pg_size_pretty(sum(length(query))) from
pg_stat_statements
where query like 'insert into tablename%' group by 1 order by 2 desc
limit 10;
md5 | count | pg_size_pretty
----------+-------+----------------
170...84 | 1283 | 2679 kB
906...53 | 897 | 1724 kB
55e...78 | 676 | 12 MB
ea4...ef | 642 | 11 MB
944...0f | 629 | 11 MB
04e...ec | 530 | 9138 kB
572...27 | 476 | 8575 kB
5ea...3c | 430 | 7331 kB
be1...6c | 324 | 5704 kB
f69...47 | 313 | 550 kB
There are 8 bigint columns in the table, I suspect that we might insert
into some of them values both greater and less than INT_MAX.
In result pg_stat_statements buffer is full of such queries (they take
9k rows out of 10k right now, 10h since pg_stat_statements_reset call),
currently total sum of query lengths is 117MB (97% of total sum).
From monitoring perspective that might be a problem since calculating
metrics from pg_stat_statements became more expensive and we might lose
information about some queries when there is not enough space for them.
But I suppose such cases are pretty rare and some of them could be fixed
with COPY queries instead of inserts.
--
Thanks,
Alexey Ermakov
Show quoted text
On 2022-10-28 00:28, Tom Lane wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
I would have expected both calls to be normalized and fingerprinted to one
and the same queryId.Agreed, and that's actually a known problem that is currently being worked on.
You can look at
/messages/by-id/36e5bffe-e989-194f-85c8-06e7bc88e6f7@amazon.com
for details about the discussion and in-progress fix.Um ... that seems unrelated. AFAICS the OP is complaining about
the fact that '42'::int4::int8 is not identical to '42'::int8.
Well, they're not. I seriously doubt that we would or should
consider trying to get queryjumble to mask that.regards, tom lane