IDEA: pg_stat_statements tracking utility statements by tag?

Started by Martijn van Oosterhoutalmost 6 years ago7 messageshackers

Martijn van Oosterhout

kleptog@svana.org

almost 6 years ago

Hoi hackers,

We've been using the pg_stat_statements extension to get an idea of the
queries used in the database, but the table is being filled with entries
like:

SAVEPOINT sa_savepoint_NNN;
RELEASE SAVEPOINT sa_savepoint_NNN;
DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
FETCH FORWARD 250 FROM "c_7f9451b03908_5"

Since the unique id is different for each query, the aggregation does
nothing and there are quite a lot of these drowning out the normal queries
(yes, I'm aware this is an issue of itself). The only way to deal with this
is "pg_stat_statements.track_utility=off". However, it occurs to me that if
you just tracked the tags rather than the full query text you could at
least track the number of such queries and how much time they take. So the
above queries would be tracked under SAVEPOINT, RELEASE, DECLARE CURSOR and
(I guess) FETCH respectively. But it would also catch DDL.

Does this sound like something for which a patch would be accepted?

Have a nice day,
--
Martijn van Oosterhout <kleptog@gmail.com> http://svana.org/kleptog/

masao.fujii@gmail.com

almost 6 years ago

In reply to: Martijn van Oosterhout (#1)

Re: IDEA: pg_stat_statements tracking utility statements by tag?

On 2020/07/29 18:24, Martijn van Oosterhout wrote:

Hoi hackers,

We've been using the pg_stat_statements extension to get an idea of the queries used in the database, but the table is being filled with entries like:

SAVEPOINT sa_savepoint_NNN;
RELEASE SAVEPOINT sa_savepoint_NNN;
DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
FETCH FORWARD 250 FROM "c_7f9451b03908_5"

Since the unique id is different for each query, the aggregation does nothing and there are quite a lot of these drowning out the normal queries (yes, I'm aware this is an issue of itself). The only way to deal with this is "pg_stat_statements.track_utility=off". However, it occurs to me that if you just tracked the tags rather than the full query text you could at least track the number of such queries and how much time they take. So the above queries would be tracked under SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also catch DDL.

Does this sound like something for which a patch would be accepted?

Or, we should extend the existing query normalization to handle also DDL?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

rjuju123@gmail.com

almost 6 years ago

In reply to: Fujii Masao (#2)

Re: IDEA: pg_stat_statements tracking utility statements by tag?

On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/29 18:24, Martijn van Oosterhout wrote:

Hoi hackers,

We've been using the pg_stat_statements extension to get an idea of the queries used in the database, but the table is being filled with entries like:

SAVEPOINT sa_savepoint_NNN;
RELEASE SAVEPOINT sa_savepoint_NNN;
DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
FETCH FORWARD 250 FROM "c_7f9451b03908_5"

Since the unique id is different for each query, the aggregation does nothing and there are quite a lot of these drowning out the normal queries (yes, I'm aware this is an issue of itself). The only way to deal with this is "pg_stat_statements.track_utility=off". However, it occurs to me that if you just tracked the tags rather than the full query text you could at least track the number of such queries and how much time they take. So the above queries would be tracked under SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also catch DDL.

Does this sound like something for which a patch would be accepted?

Or, we should extend the existing query normalization to handle also DDL?

+1, introducing DDL normalization seems like a better way to go in the
long run. Defining what should and shouldn't be normalized can be
tricky though.

Martijn van Oosterhout

kleptog@svana.org

almost 6 years ago

In reply to: Julien Rouhaud (#3)

Re: IDEA: pg_stat_statements tracking utility statements by tag?

On Wed, 29 Jul 2020 at 15:40, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:

Or, we should extend the existing query normalization to handle also DDL?

+1, introducing DDL normalization seems like a better way to go in the
long run. Defining what should and shouldn't be normalized can be
tricky though.

In principle, the only thing that really needs to be normalised is
SAVEPOINT/CURSOR names which are essentially random strings which have no
effect on the result. Most other stuff is material to the query.

That said, I think "aggregate by tag" has value all by itself. Being able
to collapse all CREATE TABLES into a single line can be useful in some
situations.

Ideally the results of FETCH "cursor" should be combined with the DECLARE,
but I really don't know how to go about that.

Have a nice day,
--
Martijn van Oosterhout <kleptog@gmail.com> http://svana.org/kleptog/

rjuju123@gmail.com

almost 6 years ago

In reply to: Martijn van Oosterhout (#4)

Re: IDEA: pg_stat_statements tracking utility statements by tag?

On Wed, Jul 29, 2020 at 5:29 PM Martijn van Oosterhout
<kleptog@gmail.com> wrote:

On Wed, 29 Jul 2020 at 15:40, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Or, we should extend the existing query normalization to handle also DDL?

+1, introducing DDL normalization seems like a better way to go in the
long run. Defining what should and shouldn't be normalized can be
tricky though.

In principle, the only thing that really needs to be normalised is SAVEPOINT/CURSOR names which are essentially random strings which have no effect on the result. Most other stuff is material to the query.

That said, I think "aggregate by tag" has value all by itself. Being able to collapse all CREATE TABLES into a single line can be useful in some situations.

There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
PREPARED that should be normalized too. I also don't think that we
really want to have different entries for begin / Begin / BEGIN /
bEgin and similar for many other commands, as the hash is computed
based on the query text.

Michael Paquier

michael@paquier.xyz

almost 6 years ago

In reply to: Julien Rouhaud (#5)

Re: IDEA: pg_stat_statements tracking utility statements by tag?

On Wed, Jul 29, 2020 at 06:35:41PM +0200, Julien Rouhaud wrote:

There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
PREPARED that should be normalized too. I also don't think that we
really want to have different entries for begin / Begin / BEGIN /
bEgin and similar for many other commands, as the hash is computed
based on the query text.

Hmm. Do we really want to those commands fully normalized all the
time? There may be applications that care about the stats of some
commands that are for example prefixed the same way and would prefer
group those things together. By fully normalizing those commands all
the time, we would lose this option.

An example. The ODBC driver uses its own grammar for internal
savepoint names, aka _EXEC_SVP_%p. If you mix that with a second
application that has its own naming policy for savepoints it would not
be possible anymore to make the difference in the stats between what
one or the other do.
--
Michael

rjuju123@gmail.com

almost 6 years ago

In reply to: Michael Paquier (#6)

Re: IDEA: pg_stat_statements tracking utility statements by tag?

On Thu, Jul 30, 2020 at 3:54 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Jul 29, 2020 at 06:35:41PM +0200, Julien Rouhaud wrote:

There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
PREPARED that should be normalized too. I also don't think that we
really want to have different entries for begin / Begin / BEGIN /
bEgin and similar for many other commands, as the hash is computed
based on the query text.

Hmm. Do we really want to those commands fully normalized all the
time? There may be applications that care about the stats of some
commands that are for example prefixed the same way and would prefer
group those things together. By fully normalizing those commands all
the time, we would lose this option.

An example. The ODBC driver uses its own grammar for internal
savepoint names, aka _EXEC_SVP_%p. If you mix that with a second
application that has its own naming policy for savepoints it would not
be possible anymore to make the difference in the stats between what
one or the other do.

But if you have an OLTP application that uses ODBC, won't you already
have 80+% of pgss entries being savepoint orders, which is really not
helpful at all? We'd technically lose the ability to group such
commands together, but in most cases the current behavior is quite
harmful.