POC: Extension for adding distributed tracing - pg_tracing
Hi hackers,
Currently, most distributed tracing solutions only provide details from the
client driver's point of view: we have no visibility into what happens once
the query reaches the database and how it is processed.
The goal of this project is to provide, as an extension, distributed
tracing within the database for more insight into query execution. The
idea and architecture are heavily inspired by pg_stat_statements and
auto_explain.
I have a working prototype of a pg_tracing extension and wanted some
feedback on the design and architecture.
Some definitions: (paraphrased from opentelemetry
https://opentelemetry.io/docs/concepts/signals/traces/)
A trace groups multiple spans, and a span represents a single specific
operation. It is defined by:
- a trace id
- a span id (randomly generated int64)
- a name
- a parent id (a span can have children)
- a start timestamp
- a duration
- attributes (key/value metadata)
We will use attributes to propagate the query's instrumentation details:
buffer usage, jit, wal usage, first tuple, plan cost...
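For concreteness, a span with those fields could be sketched in C like this
(a hypothetical layout for illustration only, not the struct actually used
by the extension):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical span layout; field names and sizes are illustrative only. */
typedef struct Span
{
    uint8_t     trace_id[16];   /* 128-bit trace id shared by all spans of a trace */
    uint64_t    span_id;        /* randomly generated 64-bit id */
    uint64_t    parent_id;      /* span_id of the parent, 0 for a root span */
    const char *name;           /* operation name, e.g. "Planner" or "SeqScan" */
    uint64_t    start_us;       /* start timestamp, microseconds since epoch */
    uint64_t    duration_ns;    /* span duration in nanoseconds */
    /* attributes (buffer usage, jit, wal usage...) would be attached here */
} Span;
```

The real structure would also need fixed offsets into the external text
file for the variable-sized fields, as discussed below.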
Triggering a trace:
===============
We rely on https://google.github.io/sqlcommenter/ to propagate trace
information.
A query with sqlcommenter will look like:
/*dddbs='postgres.db',traceparent='00-00000000000000000000000000000009-0000000000000005-01'*/
select 1;
The traceparent fields are detailed in
https://www.w3.org/TR/trace-context/#traceparent-header-field-values
The 3 fields are
00000000000000000000000000000009: trace id
0000000000000005: parent id
01: trace flags (01 == sampled)
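As a sketch, extracting those three fields from a traceparent value could
look like the following (illustrative code, not the extension's actual
parser):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Minimal sketch of extracting the three traceparent fields from a
 * "00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>" value.
 * Names are illustrative; this is not the extension's actual code.
 */
typedef struct TraceContext
{
    char trace_id[33];   /* 32 hex chars + NUL */
    char parent_id[17];  /* 16 hex chars + NUL */
    bool sampled;        /* true when the flags byte has bit 0 set */
} TraceContext;

static bool
parse_traceparent(const char *value, TraceContext *ctx)
{
    /* expected length: 2 + 1 + 32 + 1 + 16 + 1 + 2 = 55 */
    if (strlen(value) != 55 ||
        value[2] != '-' || value[35] != '-' || value[52] != '-')
        return false;
    memcpy(ctx->trace_id, value + 3, 32);
    ctx->trace_id[32] = '\0';
    memcpy(ctx->parent_id, value + 36, 16);
    ctx->parent_id[16] = '\0';
    /* "01" means sampled; only the low bit of the last hex digit matters */
    ctx->sampled = (value[54] - '0') & 1;
    return true;
}
```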
If the query is sampled, we instrument the query and generate spans for the
following operations:
- Query Span: The top span for a query. It is created after extracting the
trace id from traceparent or to represent a nested query. We always have
at least one and may have more with nested queries.
- Planner: We track the time spent in the planner and report the planner
counters
- Executor: We trace the different steps of the Executor: Start, Run,
Finish and End
- Node Span: Created from the planstate. The name is extracted from the
node type (IndexScan, SeqScan...). For the operation name, we generate
something similar to the EXPLAIN output.
Storing spans:
===============
Spans are stored in a shared memory buffer. The buffer size is fixed and if
it is full, no further spans are generated until the buffer is freed.
Spans contain several variable-sized text fields: names, parameter
values...
To avoid keeping those in shared memory, we store them in an external
file (similar to pg_stat_statements' query text).
Processing spans:
===============
Traces need to be sent to a trace collector. We offload this logic to an
external application which will:
- Collect the spans through the output of a new pg_tracing_spans() function
- Build spans in any format specific to the targeted trace collector
(opentelemetry, datadog...)
- Send it to the targeted trace collector using the appropriate protocol
(gRPC, http...)
I have an example of a span forwarder that I've been using in my tests
available here:
https://gist.github.com/bonnefoa/6ed24520bdac026d6a6a6992d308bd50.
This setup permits a lot of flexibility:
- The postgres extension doesn't need any vendor-specific knowledge
- The trace forwarder can reuse existing libraries and any language to
send spans
Buffer management:
===============
I've tried to keep the memory management simple. Creating spans will add
new elements to the shared_spans buffer.
Once a span is forwarded, there's no need to keep it in the shared memory.
Thus, pg_tracing_spans() has a "consume" parameter which completely
empties the shared buffer when called by the forwarder.
If the buffer is regularly full, we can:
- increase the forwarder's frequency
- increase the shared buffer size
- decrease the amount of generated spans
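The consume behavior described above can be modeled with a toy fixed-size
buffer (names, sizes, and the int stand-in for span entries are
illustrative, not the extension's API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the shared span buffer: writers drop new spans when the
 * buffer is full, and the forwarder empties it in a single consume call.
 */
#define MAX_SPANS 4

typedef struct SpanBuffer
{
    int    spans[MAX_SPANS];  /* stand-in for real span entries */
    size_t count;
    size_t dropped;           /* tracked for pg_tracing_info-style stats */
} SpanBuffer;

static bool
add_span(SpanBuffer *buf, int span)
{
    if (buf->count == MAX_SPANS)
    {
        buf->dropped++;       /* buffer full: stop storing until consumed */
        return false;
    }
    buf->spans[buf->count++] = span;
    return true;
}

/* Returns the number of spans handed to the forwarder; empties the buffer. */
static size_t
consume_spans(SpanBuffer *buf)
{
    size_t n = buf->count;
    buf->count = 0;
    return n;
}
```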
Statistics are available through pg_tracing_info to keep track of the
number of generated spans and dropped events.
Overhead:
===============
I haven't run benchmarks yet, however, we will have multiple ways to
control the overhead.
Traces rely heavily on sampling to keep the overhead low: we want extensive
information on representative samples of our queries.
For now, we leave the sampling decision to the caller (through the sampled
flag), but I will add a parameter to apply further sampling if the rate of
traced queries is too high.
Stopping tracing when the buffer is full is also a good safeguard: if an
application is misconfigured and starts sampling every query, this will
stop all tracing and won't impact query performance.
Query Instrumentation:
===============
I have been able to rely on most of the existing query instrumentation
except for one missing piece of information: I needed the start time of a
node, as we currently only have the first tuple time and the duration. I
added firsttime to Instrumentation. This is the only change made outside
the extension's code.
Current status:
===============
The current code is able to generate spans which, once forwarded to a
trace collector, produce traces with execution details carrying a similar
amount of information to what an explain (analyze, buffers) would provide.
I have provided screenshots of the result.
There's a lot of work left in cleaning, commenting, handling edge cases and
adding tests.
Thoughts and feedback are welcome.
Regards,
Anthonin Bonnefoy
Hi Anthonin,
I have a working prototype of a pg_tracing extension and wanted some feedback on the design and architecture.
The patch looks very interesting, thanks for working on it and for
sharing. The facts that the patch doesn't change the core except for
two lines in instrument.{c,h} and that it uses a pull-based model:
- Collect the spans through a new pg_tracing_spans() function output
... IMO were the right design decisions. The patch lacks documentation,
but this is OK for a PoC.
I added the patch to the nearest CF [1]. Let's see what the rest of
the community thinks.
[1]: https://commitfest.postgresql.org/44/4456/
--
Best regards,
Aleksander Alekseev
Hi!
This patch looks very interesting, I'm working on the same subject too. But
I've used
another approach - I'm using C wrapper for C++ API library from
OpenTelemetry, and
handle span storage and output to this library. There are some nuances
though, but it
is possible. Have you tried to use OpenTelemetry APIs instead of
implementing all
functionality around spans?
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Nikita,
This patch looks very interesting, I'm working on the same subject too. But I've used
another approach - I'm using C wrapper for C++ API library from OpenTelemetry, and
handle span storage and output to this library. There are some nuances though, but it
is possible. Have you tried to use OpenTelemetry APIs instead of implementing all
functionality around spans?
I don't think that PostgreSQL accepts that kind of C++ code, not to
mention the fact that the PostgreSQL license is not necessarily
compatible with Apache 2.0 (I'm not a lawyer; this is not legal
advice). Such a design decision would probably require separate
compile flags, since the user doesn't necessarily have the corresponding
dependency installed, similarly to how we do with LLVM, OpenSSL, etc.
So -1 to the OpenTelemetry C++ library and +1 to the properly licensed
C implementation without 3rd party dependencies from me. Especially
considering the fact that the implementation seems to be rather
simple.
--
Best regards,
Aleksander Alekseev
I've initially thought of sending the spans from PostgreSQL since this is
the usual behavior of tracing libraries.
However, this created a lot of potential issues:
- Protocol support and differences between trace collectors. OpenTelemetry
seems to use gRPC, others use http, and those would require additional
libraries (plus gRPC support in C doesn't look good); any change in the
protobuf definition would require updating the extension.
- Do we send the spans within the query hooks? This means we could
block the process if the trace collector is slow to answer or we can't
connect. Sending spans from a background process sounded rather complex
and resource-heavy.
Moving to a pull model fixed those issues and felt more natural as this is
the way PostgreSQL exposes its metrics.
Hi!
I've tried to test the extension, but got errors calling make check and
cannot install the extension.
I've applied the patch onto current master and configured it as:
./configure --enable-debug --enable-cassert --enable-depend
--enable-tap-tests
Could you please advise if I'm doing something wrong?
For make check please see attached log.
When installing the extension after setting the shared preload library,
I got an error in the Postgres logfile:
2023-07-27 13:41:52.324 MSK [12738] LOG: database system is shut down
2023-07-27 13:41:52.404 MSK [14126] FATAL: could not load library
"/usr/local/pgsql/lib/pg_tracing.so": /usr/local/pgsql/lib/pg_tracing.so:
undefined symbol: get_operation_name
Thank you!
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Hi,
Also, FYI: there are build warnings because the functions
const char * get_span_name(const Span * span, const char *qbuffer)
and
const char * get_operation_name(const Span * span, const char *qbuffer)
do not have a default case inside the switch and no return outside of the
switch.
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Hi,
Also, FYI: there are build warnings because the functions
const char * get_span_name(const Span * span, const char *qbuffer)
and
const char * get_operation_name(const Span * span, const char *qbuffer)
do not have a default case inside the switch and no return outside of the
switch.
You are right, there are a few warnings:
```
[1566/1887] Compiling C object contrib/pg_tracing/pg_tracing.so.p/span.c.o
../contrib/pg_tracing/span.c: In function ‘get_span_name’:
../contrib/pg_tracing/span.c:210:1: warning: control reaches end of
non-void function [-Wreturn-type]
210 | }
| ^
../contrib/pg_tracing/span.c: In function ‘get_operation_name’:
../contrib/pg_tracing/span.c:249:1: warning: control reaches end of
non-void function [-Wreturn-type]
249 | }
| ^
```
Here is the patch v2 with a quick fix.
but got errors calling make check and cannot install the extension
Agree, something goes wrong when using Autotools (but not Meson) on
both Linux and macOS. I didn't investigate the issue though.
--
Best regards,
Aleksander Alekseev
Attachments:
v2-0001-TODO-FIXME-write-a-proper-commit-message-this-is-.patch
Hi,
I've fixed the Autotools build, please check patch below (v2).
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Attachments:
pg-tracing-v2.patch
Agree, something goes wrong when using Autotools (but not Meson) on
both Linux and MacOS. I didn't investigate the issue though.
I was only using Meson and forgot to keep the Automake files up to date
when I split pg_tracing.c into multiple files (span.c, explain.c...).
Hi,
I'd keep the Autotools build up to date, because Meson is very limited
on not-very-up-to-date OS versions. Anyway, it's OK now. I'm currently
playing with the patch and reviewing the sources; if you need any
cooperation, please let us know. I'm +1 for committing this patch after
review.
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Hi,
I'd keep Autotools build up to date
Definitely. The patch is still in a rough state and updating Autotools fell
through the cracks.
I'm currently playing with the patch and
reviewing sources, if you need any cooperation - please let us know.
The goal for me was to get validation on the design and to check whether
anything would be a blocker before committing more time to the project.
There are already several things I was planning on improving:
- The parse span is something I've added recently. I've realised that I
could rely on stmtStartTimestamp to get the time spent in parsing in the
post parse hook, so the code around this is still rough.
- There are probably race conditions on the reads and writes of the query
file that need to be fixed
- I'm using the same Span structure for everything while Executor spans
only need a small subset which is a bit wasteful. I've tried to use two
different shared buffers (base spans and spans with counters) but it was
making things more confusing.
- Finish commenting code and start writing the doc
So any comments, missing features and reviews on the current state of the
project are welcome.
Regards,
Anthonin
Hi!
What do you think about using INSTR_TIME_SET_CURRENT, INSTR_TIME_SUBTRACT
and INSTR_TIME_GET_MILLISEC
macros for timing calculations?
Also, have you thought about a way to trace existing (running) queries
without directly instrumenting them? It would be much more usable for
maintenance and support personnel, because in production environments you
can rarely change the query text directly. For the current state, the
simplest solution is to switch tracing on and off by calling an SQL
function, and possibly to switch tracing for a prepared statement the same
way, but not for an arbitrary query.
I'll check the patch for the race conditions.
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
What do you think about using INSTR_TIME_SET_CURRENT, INSTR_TIME_SUBTRACT
and INSTR_TIME_GET_MILLISEC
macros for timing calculations?
If you're talking about the two instances where I'm modifying the
instr_time's ticks, it's because I can't use the macros there.
The first case is the parse span. I only have the start timestamp from
GetCurrentStatementStartTimestamp and don't have access to the start
instr_time, so I need to build the duration from two timestamps.
The second case is when building node spans from the planstate, where I
directly have the duration from Instrumentation.
I guess one fix would be to use an int64 for the span duration to directly
store nanoseconds instead of an instr_time, but I do use the
instrumentation macros outside of those two cases to get the duration of
other spans.
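The timestamp-based case can be sketched as follows, assuming PostgreSQL's
TimestampTz representation as an int64 microsecond count (the typedef here
and the helper name are illustrative, not the patch's code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of building a span duration in nanoseconds from two timestamps,
 * as for the parse span where no starting instr_time is available.
 * PostgreSQL's TimestampTz is an int64 count of microseconds, so the
 * conversion is a plain multiplication.
 */
typedef int64_t TimestampTz;     /* microseconds, as in PostgreSQL */

static int64_t
duration_ns_from_timestamps(TimestampTz start, TimestampTz end)
{
    return (end - start) * 1000; /* us -> ns */
}
```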
Also, have you thought about a way to trace existing (running) queries
without directly instrumenting them?
That's a good point. I was focusing on leaving the sampling decision to
the caller through the sampled flag and only recently added the
pg_tracing_sample_rate parameter to give more control. It should be
straightforward to add an option to create standalone traces based on the
sample rate alone. This way, setting the sample rate to 1 would force all
queries running in the session to be traced.
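A minimal sketch of such a combined sampling decision (function and
parameter names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the combined decision: a query is traced either because the
 * caller propagated the sampled flag through traceparent, or because it
 * is picked up by the standalone sample rate. With sample_rate = 1.0,
 * every query in the session is traced.
 */
static bool
should_trace(bool caller_sampled, double sample_rate, double random01)
{
    if (caller_sampled)
        return true;               /* caller already decided via traceparent */
    return random01 < sample_rate; /* standalone sampling, no trace context */
}
```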
Hi,
Here's a new patch with changes from the previous discussion:
- I'm now directly storing the nanosecond duration in the span instead of
an instr_time. Using the instr_time macros was a bit awkward as the
durations I generate don't necessarily have a starting and ending
instr_time. Moving to straight nanoseconds made things clearer from my
point of view.
- I've added an additional sample rate pg_tracing.sample_rate (on top of
the pg_tracing.caller_sample_rate). This one will allow queries to be
sampled even without trace information propagated from the caller.
Setting this sample rate to 1 will basically trace everything. For now,
this will only work when going through the post parse hook. I will add
support for prepared statements and cached plans for the next patch.
- I've improved how parse spans are created. It's a bit challenging to get
the start of a parse, as there's no pre-parse hook or instrumentation
around parsing, so it's only an estimation.
Regards,
Anthonin
Attachments:
pg-tracing-v2.patch
Hi!
Thanks for the improvements!
Here's a new patch with changes from the previous discussion:
- I'm now directly storing nanoseconds duration in the span instead of the
instr_time. Using the instr_time macros was a bit awkward as the
durations I generate don't necessarily have a starting and ending
instr_time.
Moving to straight nanoseconds made things clearer from my point of view.
Cool. It could easily be cast to ms for the user.
- I've added an additional sample rate pg_tracing.sample_rate (on top of
the pg_tracing.caller_sample_rate). This one will allow queries to be
sampled even without trace information propagated from the caller.
Setting this sample rate to 1 will basically trace everything. For now,
this will only work when going through the post parse hook. I will add
support for prepared statements and cached plans for the next patch.
Cool, I've just made the same improvement and wanted to send a patch a bit
later after tests.
- I've improved how parse spans are created. It's a bit challenging to get
the start of a parse as there's no pre parse hook or instrumentation around
parse so it's only an estimation.
I've also added a query id field to the span, and made a table and an SQL
function that flushes spans to this table instead of returning a set of
records; it is more convenient for maintenance to query the table.
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Hi!
Please check some suggested improvements -
1) query_id added to the span, to be able to join it with pg_stat_activity
and pg_stat_statements;
2) a table for storing spans added, to flush the span buffer for
maintenance reasons (to keep track of spans), with an SQL function that
flushes the buffer into the table instead of returning a recordset;
3) a setter function added for the sampling_rate GUC, to tweak it
on-the-fly without restart.
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Attachments:
0001_pg_tracing_v2_1.patch
Hi!
1) query_id added so span to be able to join it with pg_stat_activity and
pg_stat_statements;
Sounds good, I've added your changes with my code.
2) table for storing spans added, to flush spans buffer
I'm not sure about this. It means that this would only be available on
primaries, as replicas won't be able to write data to the table. It would
also make version updates and migrations much more complex, and I haven't
seen a similar pattern in other extensions.
3) added setter function for sampling_rate GUC to tweak it on-the-fly
without restart
ok, I've added this in my branch.
On my side, I've made the following changes:
1) All spans are now kept in palloc'd buffers and only copied to the
shared buffer during end_tracing. This way, we limit contention on the
shared_spans lock.
2) I've added a pg_tracing.drop_on_full_buffer parameter to drop all spans
when the buffer is full. This can be useful to always keep the latest
spans when the consuming app is not fast enough. It is also useful for
testing.
3) I'm testing more complex queries. Most of my previous tests were using
the simple query protocol, but the extended protocol introduces
differences that break some assumptions I made. For example, with a
multi-statement transaction like
BEGIN;
SELECT 1;
SELECT 2;
the parse of SELECT 2 will happen before the ExecutorEnd (and the
end_tracing) of SELECT 1. For now, I'm skipping the post parse hook if we
still have an ongoing trace.
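That workaround can be sketched as a simple guard flag around the hooks
(the flag and function names are illustrative, not the extension's actual
code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the guard: with the extended protocol, the parse of the next
 * statement can arrive before ExecutorEnd of the previous one, so the
 * post-parse hook is skipped while a trace is still ongoing.
 */
static bool tracing_ongoing = false;

/* Returns true when tracing was started for this statement. */
static bool
post_parse_hook(void)
{
    if (tracing_ongoing)
        return false;           /* a previous statement is still traced */
    tracing_ongoing = true;     /* start tracing this statement */
    return true;
}

static void
end_tracing(void)
{
    tracing_ongoing = false;    /* ExecutorEnd: tracing finished */
}
```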
I've also started running https://github.com/anse1/sqlsmith on a db with
full sampling; it's currently failing some assertions and I'm working to
fix those.
Attachments:
pg-tracing-v3.patch
Hi!
Storing spans is a tricky question. Monitoring systems often use a pull
model with probes or an SQL API for pulling data, so from this point of
view it is much more convenient to keep spans in a separate table. But
then we come to another issue: how to flush this data into the table?
Automatic flushing on a full buffer would randomly (from the user's point
of view) significantly affect query performance. I'd rather make a GUC,
turned off by default, to store spans in the table instead of the buffer.
3) I'm testing more complex queries. Most of my previous tests were using
simple query protocol but extended protocol introduces
differences that break some assumptions I did. For example, with multi
statement transaction like
BEGIN;
SELECT 1;
SELECT 2;
The parse of SELECT 2 will happen before the ExecutorEnd (and the
end_tracing) of SELECT 1. For now, I'm skipping the post parse
hook if we still have an ongoing tracing.
I've checked this behavior before and haven't noticed the case you
mentioned, but for
loops like
for i in 1..2 loop
StartTime := clock_timestamp();
insert into t2 values (i, a_data);
EndTime := clock_timestamp();
Delta := 1000 * ( extract(epoch from EndTime) - extract(epoch from
StartTime) );
end loop;
I've got the following call sequence:
psql:/home/postgres/tests/trace.sql:52: NOTICE:
pg_tracing_post_parse_analyze hook <1>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_planner_hook <1>
psql:/home/postgres/tests/trace.sql:52: NOTICE:
pg_tracing_post_parse_analyze hook <2>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_planner_hook <2>
psql:/home/postgres/tests/trace.sql:52: NOTICE:
pg_tracing_post_parse_analyze hook <StartTime := clock_timestamp()>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_planner_hook
<StartTime := clock_timestamp()>
psql:/home/postgres/tests/trace.sql:52: NOTICE:
pg_tracing_post_parse_analyze hook <insert into t2 values (i, a_data)>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_planner_hook
<insert into t2 values (i, a_data)>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_ExecutorStart
<insert into t2 values (i, a_data)>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_ExecutorRun
<insert into t2 values (i, a_data)>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_ExecutorFinish
<insert into t2 values (i, a_data)>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_ExecutorEnd
<insert into t2 values (i, a_data)>
psql:/home/postgres/tests/trace.sql:52: NOTICE:
pg_tracing_post_parse_analyze hook <EndTime := clock_timestamp()>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_planner_hook
<EndTime := clock_timestamp()>
psql:/home/postgres/tests/trace.sql:52: NOTICE:
pg_tracing_post_parse_analyze hook <Delta := 1000 * ( extract(epoch from
EndTime) - extract(epoch from StartTime) )>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_planner_hook
<Delta := 1000 * ( extract(epoch from EndTime) - extract(epoch from
StartTime) )>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_planner_hook
<insert into t2 values (i, a_data)>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_ExecutorStart
<insert into t2 values (i, a_data)>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_ExecutorRun
<insert into t2 values (i, a_data)>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_ExecutorFinish
<insert into t2 values (i, a_data)>
psql:/home/postgres/tests/trace.sql:52: NOTICE: pg_tracing_ExecutorEnd
<insert into t2 values (i, a_data)>
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Hi!
Storing spans is a tricky question. Monitoring systems often use a pull
model, with probes or a SQL API for pulling data, so from this point of view
it is much more convenient to keep spans in a separate table. But then we
come to another issue: how to flush this data into the table? Automatic
flushing on a full buffer would (from the user's point of view) randomly and
significantly affect query performance. I'd rather have a GUC, off by
default, to store spans in a table instead of the buffer.
My main issue with having the extension flush spans to a table is that
it would only work on primaries. Replicas can't write to tables, and a
flush function that only works on primaries is likely to introduce more
confusion.
From my point of view, processing the spans should be done by an
external application. Similar to my initial example which forwards spans
to a trace collector
(https://gist.github.com/bonnefoa/6ed24520bdac026d6a6a6992d308bd50#file-main-go),
this application could instead store spans in a dedicated table. This
way, the creation of the table is outside pg_tracing's scope, and the
span_store application would be able to store spans on both replicas and
primaries.
How frequently this application should pull data to avoid a full buffer
and dropped spans is the tricky part. We have stats, so we can monitor
whether drops are happening and either adjust the span buffer size or
increase the application's pull frequency. Another possibility (though
I'm not familiar with it) could be to use notifications: the application
would listen on a channel and pg_tracing would notify it when a
configurable threshold is reached (e.g. when the buffer reaches 50%
usage, send a notification).
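To make the pull-frequency idea concrete, here is a minimal sketch of the loop such a span_store application could run. The `pg_tracing_consume_spans()` function name is an assumption about an eventual SQL API, not something the extension provides today; the database call is injected as a callable so only the adaptive logic is shown.

```python
# Sketch of a span-consumer loop for a hypothetical span_store application.
# `run_query` stands in for a real database call through a driver; `forward`
# is whatever sink stores or exports the spans. pg_tracing_consume_spans()
# is an assumed, not confirmed, pg_tracing API.

def drain_spans(run_query, forward):
    """Pull all buffered spans and hand them to the sink.
    Returns the number of spans consumed."""
    rows = run_query("SELECT * FROM pg_tracing_consume_spans()")
    for row in rows:
        forward(row)
    return len(rows)

def next_interval(current_interval, buffer_usage, high=0.5, low=0.1):
    """Adapt the pull interval to buffer pressure: pull twice as often
    when the buffer is above `high`, back off when it is below `low`."""
    if buffer_usage > high:
        return current_interval / 2
    if buffer_usage < low:
        return current_interval * 2
    return current_interval
```

With the drop/usage stats exposed by the extension, the application could shrink its interval before drops start happening instead of only reacting to them.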
I've checked this behavior before and haven't noticed the case you
mentioned, but for loops like

for i in 1..2 loop
    StartTime := clock_timestamp();
    insert into t2 values (i, a_data);
    EndTime := clock_timestamp();
    Delta := 1000 * ( extract(epoch from EndTime) - extract(epoch from StartTime) );
end loop;
Was this run through psql? In that case, you would be using the simple
protocol, which always drops the portal at the end of the statement and
does not trigger the issue.
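For what it's worth, recent psql can exercise the extended protocol directly: since PostgreSQL 16, the \bind meta-command sends the query through Parse/Bind/Execute, so something like the following (untested sketch) should make the overlap reproducible from psql:

```sql
BEGIN;
SELECT 1 \bind \g
SELECT 2 \bind \g
COMMIT;
```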
With the extended protocol and a multi-statement transaction, I get the
following backtrace:
* frame #0: 0x0000000104581318 postgres`ExecutorEnd(queryDesc=0x00000001420083b0) at execMain.c:471:6
  frame #1: 0x00000001044e1644 postgres`PortalCleanup(portal=0x0000000152031d00) at portalcmds.c:299:4
  frame #2: 0x0000000104b0e77c postgres`PortalDrop(portal=0x0000000152031d00, isTopCommit=false) at portalmem.c:503:3
  frame #3: 0x0000000104b0e3e8 postgres`CreatePortal(name="", allowDup=true, dupSilent=true) at portalmem.c:194:3
  frame #4: 0x000000010487a308 postgres`exec_bind_message(input_message=0x000000016bb4b398) at postgres.c:1745:12
  frame #5: 0x000000010487846c postgres`PostgresMain(dbname="postgres", username="postgres") at postgres.c:4685:5
  frame #6: 0x0000000104773144 postgres`BackendRun(port=0x0000000141704080) at postmaster.c:4433:2
  frame #7: 0x000000010477044c postgres`BackendStartup(port=0x0000000141704080) at postmaster.c:4161:3
  frame #8: 0x000000010476d6fc postgres`ServerLoop at postmaster.c:1778:6
  frame #9: 0x000000010476c260 postgres`PostmasterMain(argc=3, argv=0x0000600001cf72c0) at postmaster.c:1462:11
  frame #10: 0x0000000104625ca8 postgres`main(argc=3, argv=0x0000600001cf72c0) at main.c:198:3
  frame #11: 0x00000001a61dbf28 dyld`start + 2236
At this point, the new statement is already parsed, and it is only
during the bind that the previous statement's portal is dropped and
ExecutorEnd is called. Thus, we have overlapping statements, which are
tricky to handle.
On my side, I've made the following changes:
- Skip nodes that are not executed
- Fix query_id propagation; it should now be associated with all the
spans of a query when available
I had forgotten to add the span buffer in the shmem hook, so it would
crash when pg_tracing.max_span was too high. Now it's fixed and we can
request more than 2000 spans. Currently, a Span is 320 bytes, so 50000
spans will take ~15MB of memory.
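The arithmetic behind that estimate:

```python
# Memory footprint of the span buffer: span size times pg_tracing.max_span.
SPAN_SIZE_BYTES = 320   # current size of the Span struct, as reported above
MAX_SPANS = 50_000      # pg_tracing.max_span setting

total_bytes = SPAN_SIZE_BYTES * MAX_SPANS
total_mib = total_bytes / (1024 * 1024)
print(f"{total_bytes} bytes ≈ {total_mib:.1f} MiB")  # 16000000 bytes ≈ 15.3 MiB
```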
I've added the subplan/initplan spans and started experimenting with
deparsing the plan to add more details on the nodes. Instead of
'NestedLoop', we will have something like 'NestedLoop|Join Filter: (oid =
relnamespace)'. One consequence is that this creates a high number of
distinct operation names, which is something we want to avoid with trace
collectors. I will probably move this out of the operation name or find
a way to make it more stable.
I've started initial benchmarks and profiling. On a default
installation, running a default `pgbench -T15` gives the following:
Without pg_tracing: tps = 953.118759 (without initial connection time)
With pg_tracing: tps = 652.953858 (without initial connection time)
Most of the time is spent in process_query_desc. I haven't tried to
optimize performance yet, so there's probably a lot of room for
improvement.
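For context, those two numbers work out to roughly a 31% throughput drop:

```python
# Relative overhead of pg_tracing on the pgbench run quoted above.
baseline_tps = 953.118759  # pgbench -T15 without pg_tracing
traced_tps = 652.953858    # same run with pg_tracing enabled

overhead = (baseline_tps - traced_tps) / baseline_tps
print(f"throughput drop: {overhead:.1%}")  # throughput drop: 31.5%
```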
On Mon, Aug 14, 2023 at 11:17 AM Nikita Malakhov <hukutoc@gmail.com> wrote,
quoting my earlier report of the extended protocol issue:
3) I'm testing more complex queries. Most of my previous tests were using
the simple query protocol, but the extended protocol introduces differences
that break some assumptions I made. For example, with a multi-statement
transaction like
BEGIN;
SELECT 1;
SELECT 2;
the parse of SELECT 2 will happen before the ExecutorEnd (and the
end_tracing) of SELECT 1. For now, I'm skipping the post parse hook if we
still have an ongoing trace.