libpq debug log
Hi,
I'm going to propose libpq debug log for analysis of queries on the application side.
I think that it is useful to determine whether the cause is on the application side or the server side when a slow query occurs.
The provided information is "date and time" at which execution of processing is started, "query", "application side processing", which is picked up information from PQtrace(), and "connection id", which is for uniquely identifying the connection.
To collect the log, set the connection string or environment variable.
- logdir or PGLOGDIR : directory where log file written
- logsize or PGLOGSIZE : maximum log size
What do you think about this? Do you think that such feature is necessary?
Regards,
Aya Iwata
"Iwata, Aya" <iwata.aya@jp.fujitsu.com> writes:
I'm going to propose libpq debug log for analysis of queries on the application side.
I think that it is useful to determine whether the cause is on the application side or the server side when a slow query occurs.
Hm, how will you tell that really? And what's the advantage over the
existing server-side query logging capability?
The provided information is "date and time" at which execution of processing is started, "query", "application side processing", which is picked up information from PQtrace(), and "connection id", which is for uniquely identifying the connection.
PQtrace() is utterly useless for anything except debugging libpq
internals, and it's not tremendously useful even for that. Don't
bother with that part.
Where will you get a "unique connection id" from?
How will you deal with asynchronously-executed queries --- either
the PQgetResult style, or the single-row-at-a-time style?
regards, tom lane
On Fri, 24 Aug 2018 04:38:22 +0000
"Iwata, Aya" <iwata.aya@jp.fujitsu.com> wrote:
Hi,
I'm going to propose libpq debug log for analysis of queries on the application side.
I think that it is useful to determine whether the cause is on the application side or the server side when a slow query occurs.
Do you mean you want to monitor the protocol message exchange at
the client side to analyze performance issues, right? Actually,
this might be useful to determin where is the problem, for example,
the client application, the backend PostgreSQL, or somewhere in the
network.
Such logging can be implemented in the application, but if libpq
provides the standard way, it would be helpful to resolve a problem
without modifying the application code.
The provided information is "date and time" at which execution of processing is started, "query", "application side processing", which is picked up information from PQtrace(), and "connection id", which is for uniquely identifying the connection.
I couldn't image how this is like. Could you show us a sample of log lines
in your head?
To collect the log, set the connection string or environment variable.
- logdir or PGLOGDIR : directory where log file written
- logsize or PGLOGSIZE : maximum log size
How we can specify the log file name? What should happen if a file size
exceeds to PGLOGSIZE?
Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>
On Fri, Aug 24, 2018 at 9:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
PQtrace() is utterly useless for anything except debugging libpq
internals, and it's not tremendously useful even for that. Don't
bother with that part.
I think that improving the output format could help with that a lot.
What it current produces is almost unreadable; adjusting it to emit
one line per protocol message would, I think, help a lot. There are
probably other improvements that could be made at the same time.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Aug 24, 2018 at 9:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
PQtrace() is utterly useless for anything except debugging libpq
internals, and it's not tremendously useful even for that. Don't
bother with that part.
I think that improving the output format could help with that a lot.
What it current produces is almost unreadable; adjusting it to emit
one line per protocol message would, I think, help a lot. There are
probably other improvements that could be made at the same time.
I wouldn't mind throwing it out and reimplementing it ;-) ... tracing
at the logical message level rather than the byte level would help.
But still, I'm not really convinced that it has very much to do with
what you'd want in a user-level debug log.
Part of what's really bad about the PQtrace output is that in v2 protocol
the same output can be repeated several times as we try to parse a message
and conclude we don't have it all yet. I believe that problem is gone
in v3, but it may be hard to do a consistent redesign until we nuke libpq's
v2 support. Still, it might be past time for the latter, seeing that
we no longer promise pre-7.4 compatibility in either psql or pg_dump.
regards, tom lane
"Iwata, Aya" <iwata.aya@jp.fujitsu.com> writes:
I'm going to propose libpq debug log for analysis of queries on the application
side.
I think that it is useful to determine whether the cause is on the application
side or the server side when a slow query occurs.
Hm, how will you tell that really? And what's the advantage over the existing
server-side query logging capability?
The log I would like to propose is used when the performance issue happen,
system administrator knows the process of application internally and check if there is any problem.
"debug" is not the correct description of the feature. The correct one should be "trace". Should I create another thread?
The provided information is "date and time" at which execution of processing
is started, "query", "application side processing", which is picked up
information from PQtrace(), and "connection id", which is for uniquely
identifying the connection.PQtrace() is utterly useless for anything except debugging libpq internals,
and it's not tremendously useful even for that. Don't bother with that part.
My initial intention was to get only useful information from PQTrace () since it acquires a lot of information.
Is there another way to obtain application side information besides PQTrace() ?
Where will you get a "unique connection id" from?
When collecting trace log file in the application side,
I think it is necessary to identify different connection.
In order to do this, when new empty PQconn structure is created, new connection id is also created.
Then we output it in the trace log file for one application.
How will you deal with asynchronously-executed queries --- either the
PQgetResult style, or the single-row-at-a-time style?
In my understanding, PQgetResult style outputs logs of multiple result queries at once,
While the single-row-at-a-time style outputs log for each query. Is this correct?
I think PQgetResult style is better,
because this style traces the internal process of the application.
Regards,
Aya Iwata
I'm going to propose libpq debug log for analysis of queries on the application
side.
I think that it is useful to determine whether the cause is on the application
side or the server side when a slow query occurs.
Do you mean you want to monitor the protocol message exchange at the client
side to analyze performance issues, right? Actually, this might be useful to
determin where is the problem, for example, the client application, the backend
PostgreSQL, or somewhere in the network.Such logging can be implemented in the application, but if libpq provides the
standard way, it would be helpful to resolve a problem without modifying the
application code.
Since I'd like to monitor the information the server and the client exchange,
I think monitoring protocol messages is good.
When a slow query is occurs, we check this client side trace log.
The purpose of this log acquisition I thought is to identify where is the problem:
server side, application side or traffic.
And if the problem is in application side, checking the trace log to identify what is the problem.
The provided information is "date and time" at which execution of processing
is started, "query", "application side processing", which is picked up
information from PQtrace(), and "connection id", which is for uniquely
identifying the connection.I couldn't image how this is like. Could you show us a sample of log lines in
your head?
I am roughly thinking as follows;
...
START : 2018/09/03 18:16:35.357 CONNECTION(1)
STATUS : Connection
SEND MESSAGE : 2018/09/03 18:16:35.358
<information send to backend...>
RECEIVE MESSAGE : 2018/09/03 18:16:35.359
<information receive from backend...>
END : 2018/09/03 18:16:35.360
...
START : 2018/09/03 18:16:35.357 CONNECTION(1)
QUERY : DECLARE myportal CURSOR FOR select * from pg_database
SEND MESSAGE : 2018/09/03 18:16:35.358
<information send to backend...>
RECEIVE MESSAGE : 2018/09/03 18:16:35.359
<information receive from backend...>
END : 2018/09/03 18:16:35.360
...
To collect the log, set the connection string or environment variable.
- logdir or PGLOGDIR : directory where log file written
- logsize or PGLOGSIZE : maximum log sizeHow we can specify the log file name? What should happen if a file size exceeds
to PGLOGSIZE?
The log file name is determined as follow.
libpq-%ApplicationName-%Y-%m-%d_%H%M%S.log
When the log file size exceeds to PGLOGSIZE, the log is output to another file.
Regards,
Aya Iwata
"Iwata, Aya" <iwata.aya@jp.fujitsu.com> writes:
The purpose of this log acquisition I thought is to identify where is the problem:
server side, application side or traffic.
TBH, I think the sort of logging you're proposing would be expensive
enough that *it* would be the bottleneck in a lot of cases. A lot
of people find that the existing server-side "log_statement" support
is too expensive to keep turned on in production --- and that logs only
received SQL queries, not the returned data, and certainly not every
message passed over the wire.
regards, tom lane
On 04/09/2018 02:29, Iwata, Aya wrote:
Since I'd like to monitor the information the server and the client exchange,
I think monitoring protocol messages is good.When a slow query is occurs, we check this client side trace log.
The purpose of this log acquisition I thought is to identify where is the problem:
server side, application side or traffic.
And if the problem is in application side, checking the trace log to identify what is the problem.
Between perf/systemtap/dtrace and wireshark, you can already do pretty
much all of that. Have you looked at those and found anything missing?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Let me explain this trace log in a bit more detail.
This log is not turned on in the system by default.
Turn it on when an issue occurs and you want to check the information in the application in order to clarify the cause.
I will present three use cases for this log.
1. Confirmation on cause of out-of-memory
We assume that Out-of-memory occurred in the process of storing the data received from the database.
However, the contents or length of the data is not known.
A trace log is obtained to find these out and what kind of data you have in which part (i.e. query result when receiving from database).
2. Protocol error confirmation
When there is an error in the protocol transmitted from the client and an error occurs in the database server, the protocol sent by the client is checked.
When the network is unstable, log is checked whether the database server is receiving protocol messages.
3. Processing delay survey
If the processing in the application is slow and the processing in the database is fast, investigate the cause of the processing delay.
4 kind of time can be obtained by the log;
Timestamp when SQL started
Timestamp when information began to be sent to the backend
Timestamp when information is successfully received in the application
Timestamp when SQL ended
Then the difference between the time is checked to find out which part of process takes time.
I reconfirm the function I proposed.
If get the trace log, set PGLOGDIR/logdir and PGLOGSIZE/logsize.
These parameters are set in the environment variable or the connection service file.
- logdir or PGLOGDIR : directory where log file written
- logsize or PGLOGSIZE : maximum log size. When the log file size exceeds to PGLOGSIZE, the log is output to another file.
The log file name is determined as follow.
libpq-%ConnectionID-%Y-%m-%d_%H%M%S.log
This is a trace log example;
...
Start: 2018-09-03 18:16:35.357 Connection(1)
Status: Connection
Send message: 2018-09-03 18:16:35.358
<information send to backend...>
Receive message: 2018-09-03 18:16:35.359
<information receive from backend...>
End: 2018-09-03 18:16:35.360
...
Start: 2018-09-03 18:16:35.357 Connection(1) ...(1), (2)
Query: DECLARE myportal CURSOR FOR select * from pg_database ...(3)
Send message: 2018-09-03 18:16:35.358 ...(4)
<information send to backend...> ...(5)
Receive message: 2018/09/03 18:16:35.359 ...(6)
<information receive from backend...> ...(7)
End: 2018-09-03 18:16:35.360 ...(8)
...
(1) Timestamp when SQL started
(2) Connection ID (Identify the connection)
(3) SQL statement
(4) Timestamp when information began to be sent to the backend
(5) send message to backend (Result of query, Protocol messages)
(6) Timestamp when information is successfully received in the application
(7) receive message from backend (Result of query, Protocol messages)
(8) Timestamp when SQL ended
Regards,
Iwata Aya
Hi,
Sorry for my late response.
Between perf/systemtap/dtrace and wireshark, you can already do pretty much
all of that. Have you looked at those and found anything missing?
These commands provide detailed information to us.
However, I think the trace log is necessary from the following point.
1. ease of use for users
It is necessary to take information that is easy to understand for database users.
This trace log is retrieved on the application server side.
Not only the database developer but also application developer will get and check this log.
Also, some of these commands return detailed PostgreSQL function names.
The trace log would be useful for users who do not know the inside of PostgreSQL (e.g. application developers)
2. obtain the same information on all OS
Some commands depend on the OS.
I think that it is important that the trace log information is compatible to each OS.
Regards,
Aya Iwata
Hi,
I create a first libpq trace log patch.
In this patch,
- All message that PQtrace() gets are output to the libpq trace log file
(I maybe select more effective message in the future patch)
- Trace log output style is changed slightly from previously proposed
This patch not include documentation,
but you can see parameter detail and how to use it by looking at my previous e-mail.
If get the trace log, set PGLOGDIR/logdir and PGLOGSIZE/logsize.
These parameters are set in the environment variable or the connection service
file.
- logdir or PGLOGDIR : directory where log file written
- logsize or PGLOGSIZE : maximum log size(M). When the log file size exceeds to
PGLOGSIZE, the log is output to another file.
The log file name is determined as follow.
libpq-%ProcessID-%Y-%m-%d_%H%M%S.log
Trace log example;
Start : 2018/10/30 08:02:24.433 ... time(a)
Query: SELECT pg_catalog.set_config('search_path', '', false)
To backend> Msg Q
To backend> "SELECT pg_catalog.set_config('search_path', '', false)"
To backend> Msg complete, length 60
Start sending message to backend: 2018/10/30 08:02:24.433 ... time(b)
End sending message to backend: 2018/10/30 08:02:24.433 ... time(c)
Start receiving message from backend: 2018/10/30 08:02:24.434 ... time(d)
End receiving message from backend: 2018/10/30 08:02:24.434 ... time(e)
From backend> T
From backend (#4)> 35
From backend (#2)> 1
From backend> "set_config"
From backend (#4)> 0
From backend (#2)> 0
From backend (#4)> 25
From backend (#2)> 65535
From backend (#4)> -1
From backend (#2)> 0
From backend> D
From backend (#4)> 10
From backend (#2)> 1
From backend (#4)> 0
From backend> C
From backend (#4)> 13
From backend> "SELECT 1"
From backend> Z
From backend (#4)> 5
From backend> Z
From backend (#4)> 5
From backend> I
End : 2018/10/30 08:02:24.435 ... time(f)
From time(a) to time(b): time for libpq processing
From time(b) to time(c): time for traffic
From time(c) to time(d): time for backend processing
From time(d) to time(e): time for traffic
From time(e) to time(f): time for libpq processing
Regards,
Aya Iwata
Attachments:
v1-0001-libpq-trace-log.patchapplication/octet-stream; name=v1-0001-libpq-trace-log.patchDownload+279-32
Greetings,
This is my first attempt at a patch review, so I will take a pass at the
low hanging fruit.
Initial Pass
============
+ Patch applies
+ Patch builds
+ Patch behaves as described in the thread
I tried a few small things:
When I set a relative path for `PGLOGDIR`, the files were correctly
written to the directory.
When I set a path for `PGLOGDIR` that didn't exist or was not
write-able, the patch writes no files, and does not alert the user that
no files are being written.
Performance
===========
I ran two permutations of make check, one with the patch applied but not
activated, and the other with with the files being written to disk. Each
permutation was run ten times, and the stats are below (times are in
seconds):
min max median mean
not logging 50.4 57.6 53.3 53.4
logging 58.3 77.7 65.0 65.8
Cheers,
Jim Doty
On Tue, Oct 30, 2018 at 2:39 AM Iwata, Aya <iwata.aya@jp.fujitsu.com> wrote:
I create a first libpq trace log patch.
Couple additional thoughts from a read-through of the patch:
- PQtrace() and the new trace-logging machinery overlap in some places
but not others -- and if both are set, PQtrace() will take precedence.
It seems like the two should not be separate.
- It isn't immediately clear to me how the new information in the logs
is intended to be used in concert with the old information. Maybe this
goes back to the comments by Tom and Robert higher in the thread --
that an overhaul of the PQtrace system is due. This patch as presented
would make things a little worse before they got better, I think.
That said, I think the overall idea -- application performance
information that can be enabled via the environment, without requiring
debugging privileges on a machine or the need to manually correlate
traces made by other applications -- is a good one, and something I
would use.
--Jacob
On Tue, Oct 30, 2018 at 8:38 PM Iwata, Aya <iwata.aya@jp.fujitsu.com> wrote:
Hi,
I create a first libpq trace log patch.
In this patch,
- All message that PQtrace() gets are output to the libpq trace log file
(I maybe select more effective message in the future patch)
- Trace log output style is changed slightly from previously proposedThis patch not include documentation,
but you can see parameter detail and how to use it by looking at my
previous e-mail.If get the trace log, set PGLOGDIR/logdir and PGLOGSIZE/logsize.
These parameters are set in the environment variable or the connection
service
file.
- logdir or PGLOGDIR : directory where log file written
- logsize or PGLOGSIZE : maximum log size(M). When the log file size
exceeds to
PGLOGSIZE, the log is output to another file.The log file name is determined as follow.
libpq-%ProcessID-%Y-%m-%d_%H%M%S.logTrace log example;
Start : 2018/10/30 08:02:24.433
... time(a)
Query: SELECT pg_catalog.set_config('search_path', '', false)
To backend> Msg Q
To backend> "SELECT pg_catalog.set_config('search_path', '', false)"
To backend> Msg complete, length 60
Start sending message to backend: 2018/10/30 08:02:24.433
... time(b)
End sending message to backend: 2018/10/30 08:02:24.433
... time(c)
Start receiving message from backend: 2018/10/30 08:02:24.434 ... time(d)
End receiving message from backend: 2018/10/30 08:02:24.434 ... time(e)
From backend> T
From backend (#4)> 35
From backend (#2)> 1
From backend> "set_config"
From backend (#4)> 0
From backend (#2)> 0
From backend (#4)> 25
From backend (#2)> 65535
From backend (#4)> -1
From backend (#2)> 0
From backend> D
From backend (#4)> 10
From backend (#2)> 1
From backend (#4)> 0
From backend> C
From backend (#4)> 13
From backend> "SELECT 1"
From backend> Z
From backend (#4)> 5
From backend> Z
From backend (#4)> 5
From backend> I
End : 2018/10/30 08:02:24.435 ... time(f)From time(a) to time(b): time for libpq processing
From time(b) to time(c): time for traffic
From time(c) to time(d): time for backend processing
From time(d) to time(e): time for traffic
From time(e) to time(f): time for libpq processing
Thanks for the patch.
I have some comments related to the trace output that is getting
printed. The amount of log it is generating may not be understood
to many of the application developers. IMO, this should print
only the necessary information that can understood by any one
by default and full log with more configuration?
Regards,
Haribabu Kommi
Fujitsu Australia
Hi Jim Doty san,
Thank you for review! I'm sorry my late reply...
Initial Pass
============+ Patch applies + Patch builds + Patch behaves as described in the thread
Thank you for your check.
When I set a path for `PGLOGDIR` that didn't exist or was not write-able,
the patch writes no files, and does not alert the user that no files are being
written.
I understand. I think it means that it is necessary to confirm how the setting is going well.
There is no warning method when connection string or the environment variable is wrong.
So I added following document:
+ If the setting of the file path by the connection string or the environment variable is
+ incorrect, the log file is not created in the intended location.
+ The maximum log file size you set is output to the beginning of the file, so you can check it.
And I added the process. Please see my v2 patch.
Performance
===========I ran two permutations of make check, one with the patch applied but not
activated, and the other with with the files being written to disk. Each
permutation was run ten times, and the stats are below (times are in
seconds):min max median mean
not logging 50.4 57.6 53.3 53.4
logging 58.3 77.7 65.0 65.8
Thank you for your measurement.
I'm thinking about adding a logging level so that only the necessary information can be printed by default. It was pointed out by Haribabu san's e-mail.
This minimizes the impact of logging on performance.
Regards,
Aya Iwata
Attachments:
v2-0001-libpq-trace-log.patchapplication/octet-stream; name=v2-0001-libpq-trace-log.patchDownload+386-32
Hi Hari san,
Thank you for your comment! And sorry my late reply…
I have some comments related to the trace output that is getting
printed. The amount of log it is generating may not be understood
to many of the application developers. IMO, this should print
only the necessary information that can understood by any one
by default and full log with more configuration?
I understand. And I agree your opinion.
I will add feature called “log level” that changes the amount of log output information with my future version patch.
Regards,
Aya Iwata
Hi Jacob san,
Thank you for your comment! And sorry for late reply...
Couple additional thoughts from a read-through of the patch:
- PQtrace() and the new trace-logging machinery overlap in some places but
not others -- and if both are set, PQtrace() will take precedence.
It seems like the two should not be separate.
I understand. This log is acquired for the purpose of investigating the cause part (server side or application side) when performance is bad.
So I will search whether getting other place of PQtrace() is necessary or not.
And I will reply after the research, please wait for a while a little.
- It isn't immediately clear to me how the new information in the logs is
intended to be used in concert with the old information. Maybe this goes back
to the comments by Tom and Robert higher in the thread -- that an overhaul
of the PQtrace system is due. This patch as presented would make things a
little worse before they got better, I think.That said, I think the overall idea -- application performance information
that can be enabled via the environment, without requiring debugging
privileges on a machine or the need to manually correlate traces made by other
applications -- is a good one, and something I would use.
Thank you. I think so, too. Some applications cannot be stopped to add the PQtrace() code.
Regards,
Aya Iwata
Hi,
I created a new version patch. Please find attached my patch.
Changes:
Since v2 patch
I forgot to write the detail of v2 patch changes. So I write these.
- Fixed the " Start receiving message from backend:" message location. Because I found that message located at outside of "retry4".
- Changed location where output "start :" / "end :" messages and timestamp. The reason of the change is that v1 patch did not correspond to Asynchronous Command Processing.
- Added documentation
- Added documentation how to check mistake of logdir and/or logsize. (Based on review comment of Jim san's)
Since v3 patch
- Fixed my mistake on fe-connect.c. Time and message were output at the place where does not output in originally PQtrace(). These messages are needed only new trace log. So I fixed it.
- Fixed two points so that new trace log overlaps all places in PQtrace(). (Based on review comment of Jacob san's)
TODO:
I will add the feature called "logging level" on next version patch. I will attach it as soon as possible.
I'm marking it as "Needs review".
Regards,
Aya Iwata
Attachments:
v3-0001-libpq-trace-log.patchapplication/octet-stream; name=v3-0001-libpq-trace-log.patchDownload+392-35
On 27/11/2018 08:42, Iwata, Aya wrote:
I created a new version patch. Please find attached my patch.
This does not excite me. It seems mostly redundant with using tcpdump.
If I were to debug networking problems, I wouldn't even trust this very
much because it relies on the willpower of all future PostgreSQL
developers to keep this accurately up to date, whereas tcpdump gives me
the truth from the kernel.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services