[BUG] pg_stat_statements and extended query protocol
Doing some work with extended query protocol, I encountered the same
issue that was discussed in [1]. It appears when a client is using
extended query protocol and sends an Execute message to a portal with
max_rows, and a portal is executed multiple times,
pg_stat_statements does not correctly track rows and calls.
The attached JDBC script, TEST.java, reproduces the issue with a
setFetchSize of 100 and autocommit set to OFF. We can
see that although pg_class has 414 rows, the reported total is 1 call
and 14 rows returned. The first 4 * 100 fetches did not get accounted for.
postgres=# select calls, rows, query from pg_stat_statements
postgres-# where queryid = '-1905758228217333571';
 calls | rows |         query
-------+------+------------------------
     1 |   14 | select * from pg_class
(1 row)
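For reference, the arithmetic behind the expected counts (a minimal sketch in plain Java, separate from the attached TEST.java): with 414 rows and a fetch size of 100, draining the portal takes ceil(414/100) Execute messages.

```java
public class FetchMath {
    // Each Execute ('E') message returns up to fetchSize rows; the last
    // one returns the remainder, so ceil(totalRows / fetchSize) Execute
    // messages are needed to drain the portal.
    static int expectedCalls(int totalRows, int fetchSize) {
        return (totalRows + fetchSize - 1) / fetchSize;
    }

    public static void main(String[] args) {
        // pg_class here: 414 rows, setFetchSize(100)
        // -> 4 fetches of 100 plus a final fetch of 14
        System.out.println(expectedCalls(414, 100)); // prints 5
    }
}
```

So with correct accounting one would expect 5 calls and 414 rows, which is what the experimental patch reports.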
The execution workflow goes something like this:
ExecutorStart
ExecutorRun – which will be called multiple times to fetch from the
portal until the caller Closes the portal or the portal
runs out of rows.
ExecutorFinish
ExecutorEnd – portal is closed & pg_stat_statements stores the final rows processed
Where this breaks for pg_stat_statements is during ExecutorRun,
es_processed is reset to 0 every iteration. So by the time the portal
is closed, es_processed will only show the total from the last execute
message.
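To make the reset concrete, here is a toy model in Java (hypothetical field names mirroring es_processed; a simplification, not the actual executor code):

```java
public class PortalModel {
    long esProcessed;      // existing counter: reset on every ExecutorRun
    long esTotalProcessed; // proposed counter: accumulates across runs
    long esCalls;          // proposed counter: one increment per run

    // Models one ExecutorRun invocation that returned rowsReturned tuples.
    void executorRun(long rowsReturned) {
        esProcessed = 0;                  // the reset that loses history
        esProcessed += rowsReturned;
        esTotalProcessed += rowsReturned; // proposed fix keeps the total
        esCalls++;
    }

    public static void main(String[] args) {
        PortalModel m = new PortalModel();
        // 414 rows drained in batches of at most 100
        for (long batch : new long[]{100, 100, 100, 100, 14})
            m.executorRun(batch);
        System.out.println(m.esProcessed);      // 14  (all pgss sees today)
        System.out.println(m.esTotalProcessed); // 414 (what it should see)
        System.out.println(m.esCalls);          // 5
    }
}
```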
This appears to be only an issue for portals fetched
through extended query protocol and not explicit cursors
that go through simple query protocol (i.e., FETCH <cursor>).
I attached a JDBC script to repro the issue.
One potential fix I see is to introduce two new counters in
EState which will track the total rows processed
and the number of calls. These counters can then be used
by pg_stat_statements. Attached is an experimental patch
which shows the correct number of rows and number of
calls.
postgres=# select calls, rows, query from pg_stat_statements
postgres-# where queryid = '-1905758228217333571';
 calls | rows |         query
-------+------+------------------------
     5 |  414 | select * from pg_class
(1 row)
[1]: /messages/by-id/c90890e7-9c89-c34f-d3c5-d5c763a34bd8@dunslane.net
Thanks
--
Sami Imseih
Amazon Web Services (AWS)
On Wed, Jan 25, 2023 at 11:22:04PM +0000, Imseih (AWS), Sami wrote:
Doing some work with extended query protocol, I encountered the same
issue that was discussed in [1]. It appears when a client is using
extended query protocol and sends an Execute message to a portal with
max_rows, and a portal is executed multiple times,
pg_stat_statements does not correctly track rows and calls.
Well, it is one of these areas where it seems to me we have never been
able to put a definition on what should be the correct behavior when
it comes to pg_stat_statements. Could it be possible to add some
regression tests using the recently-added \bind command and see how
this affects things? I would suggest splitting these into their own
SQL file, following an effort I have been doing recently for the
regression tests of pg_stat_statements. It would be good to know the
effects of this change for pg_stat_statements.track = (top|all), as
well.
@@ -657,7 +657,9 @@ typedef struct EState
List *es_tupleTable; /* List of TupleTableSlots */
- uint64 es_processed; /* # of tuples processed */
+ uint64 es_processed; /* # of tuples processed at the top level only */
+ uint64 es_calls; /* # of calls */
+ uint64 es_total_processed; /* total # of tuples processed */
So the root of the logic is here. Anything that makes the executor
structures larger freaks me out, FWIW, and that's quite an addition.
--
Michael
Well, it is one of these areas where it seems to me we have never been
able to put a definition on what should be the correct behavior when
it comes to pg_stat_statements.
What needs to be defined here is how pgss should account for # of rows
processed when A) a select goes through extended query (EP) protocol, and
B) it requires multiple executes to complete a portal.
The patch being suggested will treat every 'E' (Execute message) sent to the same
portal as a new call (pgss will increment calls), and the number of rows
processed will be accumulated for every 'E' message.
Currently, only the rows fetched in the last 'E' call to the portal are tracked by
pgss. This is incorrect.
Could it be possible to add some
regression tests using the recently-added \bind command and see how
this affects things?
\bind alone will not be enough as we also need a way to fetch from
a portal in batches. The code that needs to be exercised
as part of the test is exec_execute_message with max_rows != 0.
\bind will call exec_execute_message with max_rows = 0 to fetch
all the rows.
I would suggest splitting these into their own
SQL file, following an effort I have been doing recently for the
regression tests of pg_stat_statements. It would be good to know the
effects of this change for pg_stat_statements.track = (top|all), as
well.
Yes, I agree that proper test coverage is needed here. Will think
about how to accomplish this.
- uint64 es_processed; /* # of tuples processed */
+ uint64 es_processed; /* # of tuples processed at the top level only */
+ uint64 es_calls; /* # of calls */
+ uint64 es_total_processed; /* total # of tuples processed */
So the root of the logic is here. Anything that makes the executor
structures larger freaks me out, FWIW, and that's quite an addition.
I am not sure how to get around the changes to EState while fixing this
issue.
We could potentially keep only the es_total_processed field and
continue to track calls in pgss;
es_total_processed in EState, however, would still be needed.
Regards,
--
Sami Imseih
Amazon Web Services (AWS)
Yes, I agree that proper test coverage is needed here. Will think
about how to accomplish this.
Tried to apply this patch to current master branch and the build was ok,
however it crashed during initdb with a message like below.
"performing post-bootstrap initialization ... Segmentation fault (core
dumped)"
If I remove this patch and recompile again, then "initdb -D $PGDATA" works.
Thanks,
David
If I remove this patch and recompile again, then "initdb -D $PGDATA" works.
It appears you must "make clean; make install" to correctly compile after
applying the patch.
Regards,
Sami Imseih
Amazon Web Services (AWS)
On Sat, Mar 11, 2023 at 11:55:22PM +0000, Imseih (AWS), Sami wrote:
It appears you must "make clean; make install" to correctly compile after
applying the patch.
In a git repository, I've learnt to rely on this simple formula, even
if it means extra cycles when running ./configure:
git clean -d -x -f
--
Michael
Hi,
On 3/2/23 8:27 AM, Michael Paquier wrote:
On Wed, Jan 25, 2023 at 11:22:04PM +0000, Imseih (AWS), Sami wrote:
Doing some work with extended query protocol, I encountered the same
issue that was discussed in [1]. It appears when a client is using
extended query protocol and sends an Execute message to a portal with
max_rows, and a portal is executed multiple times,
pg_stat_statements does not correctly track rows and calls.
Well, it is one of these areas where it seems to me we have never been
able to put a definition on what should be the correct behavior when
it comes to pg_stat_statements. Could it be possible to add some
regression tests using the recently-added \bind command and see how
this affects things? I would suggest splitting these into their own
SQL file, following an effort I have been doing recently for the
regression tests of pg_stat_statements. It would be good to know the
effects of this change for pg_stat_statements.track = (top|all), as
well.
@@ -657,7 +657,9 @@ typedef struct EState
List *es_tupleTable; /* List of TupleTableSlots */
- uint64 es_processed; /* # of tuples processed */
+ uint64 es_processed; /* # of tuples processed at the top level only */
+ uint64 es_calls; /* # of calls */
+ uint64 es_total_processed; /* total # of tuples processed */
So the root of the logic is here. Anything that makes the executor
structures larger freaks me out, FWIW, and that's quite an addition.
--
Michael
I wonder if we can't "just" make use of the "count" parameter passed to the
ExecutorRun_hook.
Something like?
- Increment an "es_total_processed" counter in pgss based on the count received in pgss_ExecutorRun()
- In pgss_ExecutorEnd(): subtract the last count we received in pgss_ExecutorRun() and add queryDesc->estate->es_processed? (we'd
need to be able to distinguish when we should apply this rule or not).
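A rough Java model of that bookkeeping (hypothetical names, just to illustrate the arithmetic; "count" stands for the parameter pgss_ExecutorRun would receive, and es_processed for the executor's counter at ExecutorEnd):

```java
public class CountApproach {
    long total;     // sum of the "count" arguments seen by the run hook
    long lastCount; // the most recent count argument

    void executorRun(long count) {
        total += count;
        lastCount = count;
    }

    // At ExecutorEnd: the last run may have asked for more rows than were
    // left, so swap its requested count for the actual es_processed.
    long executorEnd(long esProcessed) {
        return total - lastCount + esProcessed;
    }

    public static void main(String[] args) {
        CountApproach c = new CountApproach();
        for (int i = 0; i < 5; i++)
            c.executorRun(100);                // five Execute messages, count = 100
        System.out.println(c.executorEnd(14)); // 500 - 100 + 14 = 414
    }
}
```

The caveat noted above still applies: we would need a way to tell when this correction rule is applicable.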
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
It appears you must "make clean; make install" to correctly compile after
applying the patch.
In a git repository, I've learnt to rely on this simple formula, even
if it means extra cycles when running ./configure:
git clean -d -x -f
Thank you all for pointing out that it needs make clean first. After
make clean followed by a recompile with the patch, both make check
from the regression tests and the pg_stat_statements extension report all
tests passed. So the current existing test cases can't really detect any
change from this patch; it would be better to add some test cases
to cover this.
Best regards,
David
Sorry about the delay in response about this.
I was thinking about this and it seems to me we can avoid
adding new fields to EState. I think a better place to track
rows and calls is in the Instrumentation struct.
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -88,6 +88,8 @@ typedef struct Instrumentation
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
WalUsage walusage; /* total WAL usage */
+ int64 calls;
+ int64 rows_processed;
} Instrumentation;
If this is more palatable, I can prepare the patch.
Thanks for your feedback.
Regards.
Sami Imseih
Amazon Web Services (AWS)
On Mon, Mar 20, 2023 at 09:41:12PM +0000, Imseih (AWS), Sami wrote:
I was thinking about this and it seems to me we can avoid
adding new fields to Estate. I think a better place to track
rows and calls is in the Instrumentation struct.
If this is more palatable, I can prepare the patch.
This indeed feels a bit more natural seen from here, after looking at
the code paths using an Instrumentation in the executor and explain,
for example. At least, this stresses me much less than adding 16
bytes to EState for something restricted to the extended protocol when
it comes to monitoring capabilities.
--
Michael
This indeed feels a bit more natural seen from here, after looking at
the code paths using an Instrumentation in the executor and explain,
for example. At least, this stresses me much less than adding 16
bytes to EState for something restricted to the extended protocol when
it comes to monitoring capabilities.
Attached is the patch that uses Instrumentation.
I did not add any new tests, as we currently have no way
of setting a row count when going through the Execute message.
I think this may need to be addressed separately, since there
seems to be a gap in extended query protocol testing.
For this fix, however, the JDBC test does show correct results.
Regards,
Sami Imseih
Amazon Web Services (AWS)
Attachments:
0001-Correct-accumulation-of-counters-for-extended-query-.patch
Hi,
On 3/21/23 2:16 PM, Imseih (AWS), Sami wrote:
This indeed feels a bit more natural seen from here, after looking at
the code paths using an Instrumentation in the executor and explain,
for example. At least, this stresses me much less than adding 16
bytes to EState for something restricted to the extended protocol when
it comes to monitoring capabilities.
Attached is the patch that uses Instrumentation.
Thanks, I think this new approach makes sense.
- const BufferUsage *bufusage,
+ int64 calls, const BufferUsage *bufusage,
What about using an uint64 for calls? That seems more appropriate to me (even if
queryDesc->totaltime->calls will be passed (which is int64), but that's already
also the case for the "rows" argument and queryDesc->totaltime->rows_processed)
@@ -88,6 +88,8 @@ typedef struct Instrumentation
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
WalUsage walusage; /* total WAL usage */
+ int64 calls; /* # of total calls to ExecutorRun */
+ int64 rows_processed; /* # of total rows processed in ExecutorRun */
I'm not sure it's worth mentioning that the new counters are "currently" used with the ExecutorRun.
What about just "total calls" and "total rows processed" (or "total rows", see below)?
Also, I wonder if "rows" (and not rows_processed) would not be a better naming.
Those last comments regarding the Instrumentation are done because ISTM that at the end their usage
could vary depending of the use case of the Instrumentation.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
What about using an uint64 for calls? That seems more appropriate to me (even if
queryDesc->totaltime->calls will be passed (which is int64), but that's already
also the case for the "rows" argument and queryDesc->totaltime->rows_processed)
That's fair
I'm not sure it's worth mentioning that the new counters are "currently" used with the ExecutorRun.
Sure, I suppose these fields could be used outside of ExecutorRun. Good point.
Also, I wonder if "rows" (and not rows_processed) would not be a better naming.
Agree.
I went with rows_processed initially, since it was accumulating es_processed,
but as the previous point, this instrumentation could be used outside of
ExecutorRun.
v3 addresses the comments.
Regards,
--
Sami Imseih
Amazon Web Services (AWS)
Attachments:
v3-0001-Correct-accumulation-of-counters-for-extended-query-.patch
Hi,
On 3/22/23 10:35 PM, Imseih (AWS), Sami wrote:
What about using an uint64 for calls? That seems more appropriate to me (even if
queryDesc->totaltime->calls will be passed (which is int64), but that's already
also the case for the "rows" argument and queryDesc->totaltime->rows_processed)
That's fair
I'm not sure it's worth mentioning that the new counters are "currently" used with the ExecutorRun.
Sure, I suppose these fields could be used outside of ExecutorRun. Good point.
Also, I wonder if "rows" (and not rows_processed) would not be a better naming.
Agree.
I went with rows_processed initially, since it was accumulating es_processed,
but as the previous point, this instrumentation could be used outside of
ExecutorRun.
v3 addresses the comments.
Thanks! LGTM and also do confirm that, with the patch, the JDBC test does show the correct results.
That said, not having a test (for the reasons you explained up-thread) associated with the patch worry me a bit.
But, I'm tempted to say that adding new tests could be addressed separately though (as this patch looks pretty straightforward).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, Mar 23, 2023 at 09:33:16AM +0100, Drouvot, Bertrand wrote:
Thanks! LGTM and also do confirm that, with the patch, the JDBC test
does show the correct results.
How does JDBC test that? Does it have a dependency on
pg_stat_statements?
That said, not having a test (for the reasons you explained
up-thread) associated with the patch worry me a bit.
Same impression here.
But, I'm tempted to say that adding new tests could be addressed
separately though (as this patch looks pretty straightforward).
Even small patches can have gotchas. I think that this should have
tests in-core rather than just depend on JDBC and hope for the best.
Even if \bind does not allow that, we could use an approach similar to
libpq_pipeline, for example, depending on pg_stat_statements for the
validation with a test module in src/test/modules/?
--
Michael
How does JDBC test that? Does it have a dependency on
pg_stat_statements?
No, at the start of the thread, a sample jdbc script was attached.
But I agree, we need to add test coverage. See below.
But, I'm tempted to say that adding new tests could be addressed
separately though (as this patch looks pretty straightforward).
Even small patches can have gotchas. I think that this should have
tests in-core rather than just depend on JDBC and hope for the best.
Even if \bind does not allow that, we could use an approach similar to
libpq_pipeline, for example, depending on pg_stat_statements for the
validation with a test module in src/test/modules/?
Yes, that is possible but we will need to add a libpq API
that allows the caller to pass in a "fetch size".
PQsendQueryParams does not take in a "fetch size",
so it returns all rows:
https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1882
Adding such an API that takes in a "fetch size" will be beneficial
not just for this test, but I can see it enabling another psql meta command,
similar to \bind but that takes in a "fetch size".
Regards
--
Sami Imseih
Amazon Web Services (AWS)
On Thu, 23 Mar 2023 09:33:16 +0100
"Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote:
Hi,
On 3/22/23 10:35 PM, Imseih (AWS), Sami wrote:
What about using an uint64 for calls? That seems more appropriate to me (even if
queryDesc->totaltime->calls will be passed (which is int64), but that's already
also the case for the "rows" argument and queryDesc->totaltime->rows_processed)
That's fair
I'm not sure it's worth mentioning that the new counters are "currently" used with the ExecutorRun.
Sure, I suppose these fields could be used outside of ExecutorRun. Good point.
Also, I wonder if "rows" (and not rows_processed) would not be a better naming.
Agree.
I went with rows_processed initially, since it was accumulating es_processed,
but as the previous point, this instrumentation could be used outside of
ExecutorRun.
v3 addresses the comments.
I wonder whether this patch changes the meaning of "calls" in the pg_stat_statements
view a bit; previously it was "Number of times the statement was executed", as
described in the documentation, but now it means "Number of times the
portal was executed". I'm worried that this may confuse users. For example,
a user may think the average number of rows returned by a statement is given by
rows/calls, but that is not always correct, because some statements could be executed
with multiple portal runs.
Although it might not be a big issue for users, I think it is better to add an
explanation to the docs for clarification.
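To illustrate with the numbers from upthread (one statement, five Execute messages, 414 rows):

```java
public class AvgRows {
    public static void main(String[] args) {
        long rows = 414;
        long calls = 5; // with the patch: one per portal run, not per statement
        double avg = (double) rows / calls;
        System.out.println(avg); // 82.8 rows per portal run, even though the
                                 // single statement execution returned 414 rows
    }
}
```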
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
I wonder whether this patch changes the meaning of "calls" in the pg_stat_statements
view a bit; previously it was "Number of times the statement was executed", as
described in the documentation, but now it means "Number of times the
portal was executed". I'm worried that this may confuse users. For example,
a user may think the average number of rows returned by a statement is given by
rows/calls, but that is not always correct, because some statements could be executed
with multiple portal runs.
I don't think it changes the meaning of "calls" in pg_stat_statements, since every
time the app fetches X rows from a portal, that is still done in a separate
execution, and thus a separate call.
I agree that the meaning of "calls" should be clarified in the docs.
Regards,
Sami Imseih
Amazon Web Services (AWS)
On Thu, Mar 23, 2023 at 01:54:05PM +0000, Imseih (AWS), Sami wrote:
Yes, that is possible but we will need to add a libpq API
that allows the caller to pass in a "fetch size".
PQsendQueryParams does not take in a "fetch size",
so it returns all rows, through PQsendQueryParams:
https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1882
Adding such an API that takes in a "fetch size" will be beneficial
not just for this test, but I can see it enabling another psql meta command,
similar to \bind but that takes in a "fetch size".
So... The idea here is to set a custom fetch size so that the number of
calls can be deterministic in the tests, still more than 1 for the
tests we'd have. And your point is that libpq always enforces 0 when
sending the EXECUTE message, causing it to always return all the rows
for any caller of PQsendQueryGuts().
The extended protocol allows that, so you would like a libpq API to
have more control of what we send with EXECUTE:
https://www.postgresql.org/docs/current/protocol-overview.html#PROTOCOL-QUERY-CONCEPTS
The extended query protocol would require multiple 'E' messages, but
we would not need multiple describe or bind messages, meaning that
this cannot just be an extra flavor of PQsendQueryParams(). Am I getting
that right? The correct API design seems tricky, to say the least.
Perhaps requiring this much extra work in libpq for the purpose of
having some tests in this thread is not a brilliant idea. Or perhaps
we could just do it and have something a-la-JDBC with two routines?
That would be one libpq routine for describe/bind and one for execute
where the limit can be given by the caller in the latter case, similar
to sendDescribeStatement() and sendExecute() in
QueryExecutorImpl.java.
--
Michael
So... The idea here is to set a custom fetch size so as the number of
calls can be deterministic in the tests, still more than 1 for the
tests we'd have. And your point is that libpq enforces always 0 when
sending the EXECUTE message causing it to always return all the rows
for any caller of PQsendQueryGuts().
That is correct.
The extended protocol allows that, so you would like a libpq API to
have more control of what we send with EXECUTE:
https://www.postgresql.org/docs/current/protocol-overview.html#PROTOCOL-QUERY-CONCEPTS
The extended query protocol would require multiple 'E' messages, but
we would not need multiple describe or bind messages, meaning that
this cannot just be an extra flavor of PQsendQueryParams(). Am I getting
that right?
Correct, there will need to be separate APIs for Parse/Bind, Execute
and Close of a Portal.
The correct API design seems tricky, to say the least.
Perhaps requiring this much extra work in libpq for the purpose of
having some tests in this thread is not a brilliant idea.. Or perhaps
we could just do it and have something a-la-JDBC with two routines?
That would be one libpq routine for describe/bind and one for execute
where the limit can be given by the caller in the latter case, similar
to sendDescribeStatement() and sendExecute() in
QueryExecutorImpl.java.
I am not too clear on your point here. ISTM you are suggesting adding
new libpq APIs similar to JDBC's, which is what I am also suggesting.
Did I understand correctly?
--
Sami Imseih
Amazon Web Services (AWS)