proposal: simple query profile and tracing API

Started by Pavel Stehule about 7 years ago · 7 messages
#1 Pavel Stehule
pavel.stehule@gmail.com

Hi

We have a good API and tools for monitoring slow queries, which is very
good.

But I miss monitoring of fast queries, which are usually the vast majority
and where a relatively small slowdown can produce unpleasant latencies in
the user interface. Moreover, slowdowns there can reveal issues with
database health - bloated tables and indexes, overloading, ..

Because these queries are usually fast, the proposed interface should not
add any significant overhead, and it should not be too complex, because
simple things are just better.

My idea is to collect a few metrics for every query in local memory - when
query tracing is enabled. Now I am thinking mainly about:

* session start time
* transaction start time
* query start time
* query signature
* planning interval
* lock interval
* execution interval
* finish time
* query status
.. maybe more

These metrics can be stored in local memory, and I think collecting these
numbers should be pretty fast. Some of the mentioned metrics can be
obtained already, but more than one hook would have to be used.
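
To make the list concrete, the metrics might be collected into a small
backend-local struct along these lines (a sketch only - nothing like this
exists in PostgreSQL today, and every name here is hypothetical):

#include "postgres.h"
#include "datatype/timestamp.h"

/* Hypothetical per-query metrics record, kept in backend-local
 * memory while query tracing is enabled. */
typedef struct QueryTraceMetrics
{
    TimestampTz session_start;   /* session start time */
    TimestampTz xact_start;      /* transaction start time */
    TimestampTz query_start;     /* query start time */
    uint64      queryid;         /* query signature */
    double      plan_ms;         /* planning interval */
    double      lock_wait_ms;    /* lock interval */
    double      exec_ms;         /* execution interval */
    TimestampTz finish_time;     /* finish time */
    int         status;          /* success / error / cancelled */
} QueryTraceMetrics;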

When the query finishes, a new hook can be executed with access to the
mentioned metrics. The hook should be evaluated inside the working
transaction, or in its own transaction if the previous query failed. This
API should work for failed queries, cancelled queries, and queries
cancelled by timeout too. A sketch of the registration follows.
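
To show the intended shape of the API, here is a minimal sketch of how an
extension might register such a hook. The query_trace_hook variable and
its type are my invention, modeled on existing PostgreSQL hooks such as
ExecutorEnd_hook; nothing like this exists in core yet - that is the
proposal:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* Hypothetical hook type: called once per finished query with the
 * collected metrics (QueryTraceMetrics as sketched above). */
typedef void (*query_trace_hook_type) (const QueryTraceMetrics *metrics);

extern query_trace_hook_type query_trace_hook;  /* would live in core */

static query_trace_hook_type prev_query_trace_hook = NULL;

static void
my_query_trace(const QueryTraceMetrics *metrics)
{
    /* e.g. write to a log, update counters, feed a histogram, ... */
    if (prev_query_trace_hook)
        prev_query_trace_hook(metrics);
}

void
_PG_init(void)
{
    prev_query_trace_hook = query_trace_hook;
    query_trace_hook = my_query_trace;
}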

Maybe similar hooks can exist for transaction end and session end - where
some metrics can be processed before they are replaced by a new
transaction or lost on disconnect.

What do you think about this proposal?

Regards

Pavel

#2 legrand legrand
legrand_legrand@hotmail.com
In reply to: Pavel Stehule (#1)
Re: proposal: simple query profile and tracing API

Hello Pavel,

What about using wait events and a trigger on pg_stat_activity?

Just:

* create a function to get the current query signature (queryid) for a pid

(not the top_level_query given for pl/pgsql blocks or triggers, but the
active one)

* add some kind of active events to track planning (in an extension with a
planning hook - see the sketch after this list)

and populate some continuous views as proposed by pipelinedb (a very
flexible solution).
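
For the planning part, a rough sketch of such an extension using the
existing planner_hook (real PostgreSQL API; signature as of PostgreSQL 11).
What to do with the measured time is left open in the comment:

#include "postgres.h"
#include "fmgr.h"
#include "optimizer/planner.h"
#include "portability/instr_time.h"

PG_MODULE_MAGIC;

static planner_hook_type prev_planner_hook = NULL;

static PlannedStmt *
timing_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
    instr_time  start, duration;
    PlannedStmt *result;

    INSTR_TIME_SET_CURRENT(start);
    if (prev_planner_hook)
        result = prev_planner_hook(parse, cursorOptions, boundParams);
    else
        result = standard_planner(parse, cursorOptions, boundParams);
    INSTR_TIME_SET_CURRENT(duration);
    INSTR_TIME_SUBTRACT(duration, start);

    /* INSTR_TIME_GET_MILLISEC(duration) could now be published as an
     * "active event" or fed into a continuous view */
    return result;
}

void
_PG_init(void)
{
    prev_planner_hook = planner_hook;
    planner_hook = timing_planner;
}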

Yes, I know a trigger is not possible, and the overhead of continuous
views has not been verified; in that case some high-frequency sampling of
pg_stat_activity could help (I can provide examples for
f_get_current_queryid(pid), an active event for the planning hook, and
continuous views).

Another solution: a customized version of pgsentinel (for high-frequency
sampling), see
https://www.postgresql-archive.org/Proposal-Add-accumulated-statistics-for-wait-event-td6030800.html#a6031384

Regards
PAscal


#3 Pavel Stehule
pavel.stehule@gmail.com
In reply to: legrand legrand (#2)
Re: proposal: simple query profile and tracing API

On Tue, Nov 13, 2018 at 13:12, legrand legrand
<legrand_legrand@hotmail.com> wrote:

> Hello Pavel,
>
> What about using wait events and a trigger on pg_stat_activity?

pg_stat_activity may not show fresh data, and using pg_stat_activity can
be too expensive for fast queries.

> Just:
>
> * create a function to get the current query signature (queryid) for a pid
>
> (not the top_level_query given for pl/pgsql blocks or triggers, but the
> active one)
>
> * add some kind of active events to track planning (in an extension with a
> planning hook)
>
> and populate some continuous views as proposed by pipelinedb (a very
> flexible solution).
>
> Yes, I know a trigger is not possible, and the overhead of continuous
> views has not been verified; in that case some high-frequency sampling of
> pg_stat_activity could help (I can provide examples for
> f_get_current_queryid(pid), an active event for the planning hook, and
> continuous views)
>
> Another solution: a customized version of pgsentinel (for high-frequency
> sampling):

I don't believe in the sampling method - I am talking about queries under
10 ms, where I would like to see 2-3 ms of planning time and 2-5 ms of
waits - and that means sampling at about a 2 ms interval, which is
expensive.

Regards

Pavel


> see
> https://www.postgresql-archive.org/Proposal-Add-accumulated-statistics-for-wait-event-td6030800.html#a6031384
>
> Regards
> PAscal


#4 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Pavel Stehule (#3)
Re: proposal: simple query profile and tracing API

On Tue, 2018-11-13 at 13:55 +0100, Pavel Stehule wrote:

> On Tue, Nov 13, 2018 at 13:12, legrand legrand
> <legrand_legrand@hotmail.com> wrote:
>
>> Hello Pavel,
>>
>> What about using wait events and a trigger on pg_stat_activity?
>
> pg_stat_activity may not show fresh data, and using pg_stat_activity can
> be too expensive for fast queries.

More importantly, how would you create a trigger on pg_stat_activity,
considering it's a system view backed by an SRF?

> ...
>> Another solution: a customized version of pgsentinel (for high-frequency
>> sampling):
>
> I don't believe in the sampling method - I am talking about queries under
> 10 ms, where I would like to see 2-3 ms of planning time and 2-5 ms of
> waits - and that means sampling at about a 2 ms interval, which is
> expensive.

You're quietly assuming that whatever alternative solution you end up
inventing will be cheaper than this sampling. Which is going to be
hard, if you want to do that for every execution of even the shortest
queries. I'd say that's doomed to fail.

Moreover, the sampling does not need to catch every query execution.
The idea is to do it "just often enough" for some desired accuracy. For
example you might pick a 10ms interval - it will hit even shorter queries
if they are executed often enough (and if they're not, who cares about
them?). And given the sample percentages and total time, you can do
some estimates for each query / phase.
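
(To illustrate with made-up numbers: a 2 ms query executed 100 times per
second keeps the backend busy about 20% of the time, so sampling every
10 ms - 100 samples per second - catches it roughly 20 times per second.
Over a minute that is ~1200 observations, plenty to estimate how the time
splits between planning, waiting, and execution.)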

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tomas Vondra (#4)
Re: proposal: simple query profile and tracing API

On Tue, Nov 13, 2018 at 20:38, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

> On Tue, 2018-11-13 at 13:55 +0100, Pavel Stehule wrote:
>
>> On Tue, Nov 13, 2018 at 13:12, legrand legrand
>> <legrand_legrand@hotmail.com> wrote:
>>
>>> Hello Pavel,
>>>
>>> What about using wait events and a trigger on pg_stat_activity?
>>
>> pg_stat_activity may not show fresh data, and using pg_stat_activity can
>> be too expensive for fast queries.
>
> More importantly, how would you create a trigger on pg_stat_activity,
> considering it's a system view backed by an SRF?
>
>> ...
>>> Another solution: a customized version of pgsentinel (for high-frequency
>>> sampling):
>>
>> I don't believe in the sampling method - I am talking about queries under
>> 10 ms, where I would like to see 2-3 ms of planning time and 2-5 ms of
>> waits - and that means sampling at about a 2 ms interval, which is
>> expensive.
>
> You're quietly assuming that whatever alternative solution you end up
> inventing will be cheaper than this sampling. Which is going to be
> hard, if you want to do that for every execution of even the shortest
> queries. I'd say that's doomed to fail.

My idea is very simple:

1. continuous collection of data - planning start, execution start,
waiting start, waiting end, query end

2. running a callback function after the query is finished; the collected
data are passed to it

I think everybody can do something different with these data. Sometimes a
simple sum is enough, sometimes you need to increment some sorted
counters, sometimes you need to store the data for a median or percentile
calculation.

I think it can be very simple and fast, because you work with just one
metrics vector. A small sketch of such a callback follows.
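
Purely as an illustration of point 2 - the callback signature and the
struct are hypothetical (a subset of the struct sketched in the first
message):

#include "postgres.h"

/* hypothetical metrics record, as sketched in the first message */
typedef struct QueryTraceMetrics
{
    double  plan_ms;        /* planning interval */
    double  lock_wait_ms;   /* lock interval */
    double  exec_ms;        /* execution interval */
} QueryTraceMetrics;

static double   total_exec_ms = 0.0;
static uint64   query_count = 0;

/* invoked once per finished query with the collected metrics */
static void
my_query_trace(const QueryTraceMetrics *m)
{
    /* "sometimes a simple sum is enough" */
    total_exec_ms += m->exec_ms;
    query_count++;

    /* a different consumer might instead push m->exec_ms into a
     * histogram or reservoir sample for percentile estimates */
}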

> Moreover, the sampling does not need to catch every query execution.
> The idea is to do it "just often enough" for some desired accuracy. For
> example you might pick a 10ms interval - it will hit even shorter queries
> if they are executed often enough (and if they're not, who cares about
> them?). And given the sample percentages and total time, you can do
> some estimates for each query / phase.

With 10ms sampling there will not be a big error, but 10ms sampling can
use too much CPU. So far I don't see an advantage of a sampling-based
method with more complex processing (because you have to process more
rows) over session-based processing.


> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6 legrand legrand
legrand_legrand@hotmail.com
In reply to: Pavel Stehule (#5)
Re: proposal: simple query profile and tracing API

Pavel Stehule wrote:

> On Tue, Nov 13, 2018 at 20:38, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>
> My idea is very simple:
>
> 1. continuous collection of data - planning start, execution start,
> waiting start, waiting end, query end
>
> 2. running a callback function after the query is finished; the collected
> data are passed to it
>
> I think everybody can do something different with these data. Sometimes a
> simple sum is enough, sometimes you need to increment some sorted
> counters, sometimes you need to store the data for a median or percentile
> calculation.
>
> I think it can be very simple and fast, because you work with just one
> metrics vector.

The same idea was already proposed for planning time in pg_stat_statements
by Thomas:
https://www.postgresql-archive.org/Planning-counters-in-pg-stat-statements-tt5990933.html#a6002416

Regards
PAscal


#7 Pavel Stehule
pavel.stehule@gmail.com
In reply to: legrand legrand (#6)
Re: proposal: simple query profile and tracing API

On Wed, Nov 14, 2018 at 14:06, legrand legrand
<legrand_legrand@hotmail.com> wrote:

> Pavel Stehule wrote:
>
>> On Tue, Nov 13, 2018 at 20:38, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>
>> My idea is very simple:
>>
>> 1. continuous collection of data - planning start, execution start,
>> waiting start, waiting end, query end
>>
>> 2. running a callback function after the query is finished; the collected
>> data are passed to it
>>
>> I think everybody can do something different with these data. Sometimes a
>> simple sum is enough, sometimes you need to increment some sorted
>> counters, sometimes you need to store the data for a median or percentile
>> calculation.
>>
>> I think it can be very simple and fast, because you work with just one
>> metrics vector.
>
> The same idea was already proposed for planning time in pg_stat_statements
> by Thomas:
> https://www.postgresql-archive.org/Planning-counters-in-pg-stat-statements-tt5990933.html#a6002416

It seems I can't be original every time :)

Regards

Pavel


> Regards
> PAscal
