Trouble with FETCH_COUNT and combined queries in psql
Hi,
When FETCH_COUNT is set, queries combined in a single request don't work
as expected:
\set FETCH_COUNT 10
select pg_sleep(2) \; select 1;
No result is displayed, the pg_sleep(2) is not run, and no error
is shown. That's disconcerting.
The sequence that is sent under the hood is:
#1 BEGIN
#2 DECLARE _psql_cursor NO SCROLL CURSOR FOR
select pg_sleep(2) ; select 1;
#3 CLOSE _psql_cursor
#4 ROLLBACK
The root problem is in deciding that a statement can be run
through a cursor if the query text starts with "select" or "values"
(in is_select_command() in common.c), but not knowing about multiple
queries in the buffer, which are not compatible with the cursor thing.
When sending #2, psql expects the PQexec("DECLARE...") to yield a
PGRES_COMMAND_OK, but it gets a PGRES_TUPLES_OK instead. Given
that, it abandons the cursor, rollbacks the transaction (if
it opened it), and clears out the results of the second select
without displaying them.
If there was already a transaction open, the problem is worse because
it doesn't rollback and we're silently missing an SQL statement that
was possibly meant to change the state of the data, as in
BEGIN; SELECT compute_something() \; select get_results(); END;
Does anyone have thoughts about how to fix this?
ATM I don't see a plausible fix that does not involve the parser
to store the information that it's a multiple-query command and pass
it down somehow to is_select_command().
Or a more modern approach could be to give up on the
cursor-based method in favor of PQsetSingleRowMode().
That might be too big a change for a bug fix though,
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Bonjour Daniel,
When FETCH_COUNT is set, queries combined in a single request don't work
as expected:\set FETCH_COUNT 10
select pg_sleep(2) \; select 1;No result is displayed, the pg_sleep(2) is not run, and no error
is shown. That's disconcerting.
Indeed.
Does anyone have thoughts about how to fix this?
ATM I don't see a plausible fix that does not involve the parser
to store the information that it's a multiple-query command and pass
it down somehow to is_select_command().
The lexer (not parser) is called by psql to know where the query stops
(i.e. waiting for ";"), so it could indeed know whether there are several
queries.
I added some stuff to extract embedded "\;" for pgbench "\cset", which has
been removed though, but it is easy to add back a detection of "\;", and
also to detect select. If the position of the last select is known, the
cursor can be declared in the right place, which would also solve the
problem.
Or a more modern approach could be to give up on the cursor-based method
in favor of PQsetSingleRowMode().
Hmmm. I'm not sure that row count is available under this mode? ISTM that
the FETCH_COUNT stuff should really batch fetching result by this amount.
I'm not sure of the 1 by 1 row approach.
--
Fabien.
Fabien COELHO wrote:
I added some stuff to extract embedded "\;" for pgbench "\cset", which has
been removed though, but it is easy to add back a detection of "\;", and
also to detect select. If the position of the last select is known, the
cursor can be declared in the right place, which would also solve the
problem.
Thanks, I'll extract the necessary bits from your patch.
I don't plan to go as far as injecting a DECLARE CURSOR inside
the query, but rather just forbid the use of the cursor in
the combined-queries case.
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
"Daniel Verite" <daniel@manitou-mail.org> writes:
Fabien COELHO wrote:
I added some stuff to extract embedded "\;" for pgbench "\cset", which has
been removed though, but it is easy to add back a detection of "\;", and
also to detect select. If the position of the last select is known, the
cursor can be declared in the right place, which would also solve the
problem.
Thanks, I'll extract the necessary bits from your patch.
I don't plan to go as far as injecting a DECLARE CURSOR inside
the query, but rather just forbid the use of the cursor in
the combined-queries case.
Keep in mind that a large part of the reason why the \cset patch got
bounced was exactly that its detection of \; was impossibly ugly
and broken. Don't expect another patch using the same logic to
get looked on more favorably.
I'm not really sure how far we should go to try to make this case "work".
To my mind, use of \; in this way represents an intentional defeat of
psql's algorithms for deciding where end-of-query is. If that ends up
in behavior you don't like, ISTM that's your fault not psql's.
Having said that, I did like the idea of maybe going over to
PQsetSingleRowMode instead of using an explicit cursor. That
would represent a net decrease of cruftiness here, instead of
layering more cruft on top of what's already a mighty ugly hack.
However ... that'd require using PQsendQuery, which means that the
case at hand with \; would result in a server error rather than
surprising client-side behavior. Is that an acceptable outcome?
regards, tom lane
Hello Tom,
Keep in mind that a large part of the reason why the \cset patch got
bounced was exactly that its detection of \; was impossibly ugly
and broken. Don't expect another patch using the same logic to
get looked on more favorably.
Although I do not claim that my implementation was very good, I must admit
that I'm not sure why there would be an issue if the lexer API is made
aware of the position of embedded \;, if this information is useful for a
feature.
I'm not really sure how far we should go to try to make this case "work".
To my mind, use of \; in this way represents an intentional defeat of
psql's algorithms for deciding where end-of-query is.
I do not understand: the lexer knows where all queries ends, it just does
not say so currently?
If that ends up in behavior you don't like, ISTM that's your fault not
psql's.
ISTM that the user is not responsible for non orthogonal features provided
by psql or pgbench (we implemented this great feature, but not in all
cases).
Having said that, I did like the idea of maybe going over to
PQsetSingleRowMode instead of using an explicit cursor. That
would represent a net decrease of cruftiness here, instead of
layering more cruft on top of what's already a mighty ugly hack.
Possibly, but, without having looked precisely at the implementation, I'm
afraid that it would result in more messages. Maybe I'm wrong. Also, I'm
ensure how returning new pg results for each row (as I understood the mode
from the doc) would interact with \;, i.e. how to know whether the current
query has changed. Maybe all this has simple answer.
However ... that'd require using PQsendQuery, which means that the
case at hand with \; would result in a server error rather than
surprising client-side behavior. Is that an acceptable outcome?
I've sent a patch to replace the existing PQexec by PQsendQuery for a not
directly related feature, see https://commitfest.postgresql.org/23/2096/
--
Fabien.
Tom Lane wrote:
Keep in mind that a large part of the reason why the \cset patch got
bounced was exactly that its detection of \; was impossibly ugly
and broken. Don't expect another patch using the same logic to
get looked on more favorably.
Looking at the end of the discussion about \cset, it seems what
you were against was not much how the detection was done rather
than how and why it was used thereafter.
In the case of the present bug, we just need to know whether there
are any \; query separators in the command string.
If yes, then SendQuery() doesn't get to use the cursor technique to
avoid any risk with that command string, despite FETCH_COUNT>0.
PFA a simple POC patch implementing this.
Having said that, I did like the idea of maybe going over to
PQsetSingleRowMode instead of using an explicit cursor. That
would represent a net decrease of cruftiness here, instead of
layering more cruft on top of what's already a mighty ugly hack.
It would also work with queries that start with a CTE, and queries
like (UPDATE/INSERT/DELETE.. RETURNING), that the current way
with the cursor cannot handle. But that looks like a project for PG13,
whereas a fix like the attached could be backpatched.
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Attachments:
fetch-count-fix.difftext/x-patch; name=fetch-count-fix.diffDownload
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index bd28444..78641e0 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1405,7 +1405,8 @@ SendQuery(const char *query)
results = NULL; /* PQclear(NULL) does nothing */
}
else if (pset.fetch_count <= 0 || pset.gexec_flag ||
- pset.crosstab_flag || !is_select_command(query))
+ pset.crosstab_flag || pset.num_semicolons > 0 ||
+ !is_select_command(query))
{
/* Default fetch-it-all-and-print mode */
instr_time before,
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index 3ae4470..e5af7a2 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -426,6 +426,7 @@ MainLoop(FILE *source)
/* execute query unless we're in an inactive \if branch */
if (conditional_active(cond_stack))
{
+ pset.num_semicolons = psql_scan_get_escaped_semicolons(scan_state);
success = SendQuery(query_buf->data);
slashCmdStatus = success ? PSQL_CMD_SEND : PSQL_CMD_ERROR;
pset.stmt_lineno = 1;
@@ -502,6 +503,7 @@ MainLoop(FILE *source)
/* should not see this in inactive branch */
Assert(conditional_active(cond_stack));
+ pset.num_semicolons = psql_scan_get_escaped_semicolons(scan_state);
success = SendQuery(query_buf->data);
/* transfer query to previous_buf by pointer-swapping */
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 5be5091..2755927 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -110,6 +110,8 @@ typedef struct _psqlSettings
char *inputfile; /* file being currently processed, if any */
uint64 lineno; /* also for error reporting */
uint64 stmt_lineno; /* line number inside the current statement */
+ int num_semicolons; /* number of query separators (\;) found
+ inside the current statement */
bool timing; /* enable timing of all queries */
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 850754e..ee32547 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -694,8 +694,15 @@ other .
* substitution. We want these before {self}, also.
*/
-"\\"[;:] {
- /* Force a semi-colon or colon into the query buffer */
+"\\"; {
+ /* Count semicolons in compound commands */
+ cur_state->escaped_semicolons++;
+ /* Force a semicolon into the query buffer */
+ psqlscan_emit(cur_state, yytext + 1, 1);
+ }
+
+"\\": {
+ /* Force a colon into the query buffer */
psqlscan_emit(cur_state, yytext + 1, 1);
}
@@ -1066,6 +1073,9 @@ psql_scan(PsqlScanState state,
/* Set current output target */
state->output_buf = query_buf;
+ /* Reset number of escaped semicolons seen */
+ state->escaped_semicolons = 0;
+
/* Set input source */
if (state->buffer_stack != NULL)
yy_switch_to_buffer(state->buffer_stack->buf, state->scanner);
@@ -1210,6 +1220,16 @@ psql_scan_reset(PsqlScanState state)
}
/*
+ * Return the number of escaped semicolons in the lexed string seen by the
+ * previous psql_scan call.
+ */
+int
+psql_scan_get_escaped_semicolons(PsqlScanState state)
+{
+ return state->escaped_semicolons;
+}
+
+/*
* Reselect this lexer (psqlscan.l) after using another one.
*
* Currently and for foreseeable uses, it's sufficient to reset to INITIAL
diff --git a/src/include/fe_utils/psqlscan.h b/src/include/fe_utils/psqlscan.h
index c3a1d58..395ac78 100644
--- a/src/include/fe_utils/psqlscan.h
+++ b/src/include/fe_utils/psqlscan.h
@@ -83,6 +83,8 @@ extern PsqlScanResult psql_scan(PsqlScanState state,
extern void psql_scan_reset(PsqlScanState state);
+extern int psql_scan_get_escaped_semicolons(PsqlScanState state);
+
extern void psql_scan_reselect_sql_lexer(PsqlScanState state);
extern bool psql_scan_in_quote(PsqlScanState state);
diff --git a/src/include/fe_utils/psqlscan_int.h b/src/include/fe_utils/psqlscan_int.h
index 42a738f..752cc94 100644
--- a/src/include/fe_utils/psqlscan_int.h
+++ b/src/include/fe_utils/psqlscan_int.h
@@ -112,6 +112,7 @@ typedef struct PsqlScanStateData
int start_state; /* yylex's starting/finishing state */
int paren_depth; /* depth of nesting in parentheses */
int xcdepth; /* depth of nesting in slash-star comments */
+ int escaped_semicolons; /* number of embedded (\;) semicolons */
char *dolqstart; /* current $foo$ quote start string */
/*
Keep in mind that a large part of the reason why the \cset patch got
bounced was exactly that its detection of \; was impossibly ugly
and broken. Don't expect another patch using the same logic to
get looked on more favorably.Looking at the end of the discussion about \cset, it seems what
you were against was not much how the detection was done rather
than how and why it was used thereafter.In the case of the present bug, we just need to know whether there
are any \; query separators in the command string.
If yes, then SendQuery() doesn't get to use the cursor technique to
avoid any risk with that command string, despite FETCH_COUNT>0.PFA a simple POC patch implementing this.
Indeed it does not look that bad.
Note a side effect of simply counting: "SELECT 1 \; ;" is detected as
compound, but internal misplaced optimization would result in only one
result as the empty one is removed, so the cursor trick would work.
In some earlier version, not sure whether I sent it, I tried to keep their
position with some int array and detect empty queries, which was a lot of
(ugly) efforts.
--
Fabien.