syntax error position "CREATE FUNCTION" bug fix
Dear patchers,
Please find attached a patch to fix the "CREATE FUNCTION" syntax error
position bug I reported a few days ago.
As the exact query being processed in only known to the backend (it may be
the initial query, it may be a subset of the initial query, it may be some
generated query?), the offending string is returned with the error
position, which is expressed with respect to this query (that has always
been the case by the way).
In order to do that, I did the following:
1. appended a new "query" field into the ErrorData structure,
which is set with an added errquery function.
2. modified the error reporting part of the protocol (version 3).
As the protocol implementation in libpq is fuzzy enough, there is
not fix on the client reception part, only the server sending
part is modified with a new field for the query (Q). Thus this
addition should not harm old clients.
3. fixed yyerror to return the processed query on errors.
a copy of the buffer is needed as the scanner buffer is modified
and the convention about buffer termination are not the same.
4. fixed the psql position reporting code to use this reported query
instead of the one it sent. Tom's quick fix around the problem is removed.
5. updated comments where it seemd appropriate in the code.
6. finally updated the protocol documentation for the error reporting
part by adding the Q field and fixing the P field.
I dit it like that because I don't think it is elegantly feasible to
update the position to get back to the initial query, as escapes may have
been processed within the string, so a simple offset would not fix the
bug.
It validates for me.
Have a nice day,
--
Fabien Coelho - coelho@cri.ensmp.fr
Attachments:
error_pos_fix.patchtext/plain; charset=US-ASCII; name=error_pos_fix.patchDownload
*** ./doc/src/sgml/protocol.sgml.orig Wed Mar 10 15:56:59 2004
--- ./doc/src/sgml/protocol.sgml Thu Mar 18 10:19:24 2004
***************
*** 3890,3901 ****
<VarListEntry>
<Term>
<literal>P</>
</Term>
<ListItem>
<Para>
Position: the field value is a decimal ASCII integer, indicating
! an error cursor position as an index into the original query string.
The first character has index 1, and positions are measured in
characters not bytes.
</Para>
--- 3890,3916 ----
<VarListEntry>
<Term>
+ <literal>Q</>
+ </Term>
+ <ListItem>
+ <Para>
+ Query: an optional string that contains the processed query string
+ on syntax errors. The position field is expressed with respect to
+ this string. In most cases this the the initial query sent by
+ the client, however in some cases such as with
+ <command>CREATE FUNCTION</> this may be a subset of the initial query.
+ </Para>
+ </ListItem>
+ </VarListEntry>
+
+ <VarListEntry>
+ <Term>
<literal>P</>
</Term>
<ListItem>
<Para>
Position: the field value is a decimal ASCII integer, indicating
! an error cursor position as an index into the processed query string.
The first character has index 1, and positions are measured in
characters not bytes.
</Para>
*** ./src/backend/parser/scan.l.orig Tue Feb 24 22:45:18 2004
--- ./src/backend/parser/scan.l Thu Mar 18 10:09:57 2004
***************
*** 68,73 ****
--- 68,74 ----
/* Handles to the buffer that the lexer uses internally */
static YY_BUFFER_STATE scanbufhandle;
static char *scanbuf;
+ static char *initial_scanbuf; /* for syntax errors, as scanbuf is touched */
unsigned char unescape_single_char(unsigned char c);
***************
*** 623,628 ****
--- 624,630 ----
(errcode(ERRCODE_SYNTAX_ERROR),
/* translator: %s is typically "syntax error" */
errmsg("%s at end of input", message),
+ errquery(initial_scanbuf),
errposition(cursorpos)));
}
else
***************
*** 631,636 ****
--- 633,639 ----
(errcode(ERRCODE_SYNTAX_ERROR),
/* translator: first %s is typically "syntax error" */
errmsg("%s at or near \"%s\"", message, loc),
+ errquery(initial_scanbuf),
errposition(cursorpos)));
}
}
***************
*** 658,663 ****
--- 661,671 ----
scanbuf[slen] = scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
scanbufhandle = yy_scan_buffer(scanbuf, slen + 2);
+ /*
+ * Make a copy in case of error, as scanbuf may be touched.
+ */
+ initial_scanbuf = pstrdup(str);
+
/* initialize literal buffer to a reasonable but expansible size */
literalalloc = 128;
literalbuf = (char *) palloc(literalalloc);
***************
*** 675,680 ****
--- 683,689 ----
{
yy_delete_buffer(scanbufhandle);
pfree(scanbuf);
+ pfree(initial_scanbuf);
}
*** ./src/backend/utils/error/elog.c.orig Mon Mar 15 17:57:28 2004
--- ./src/backend/utils/error/elog.c Thu Mar 18 09:51:24 2004
***************
*** 111,116 ****
--- 111,117 ----
char *detail; /* detail error message */
char *hint; /* hint message */
char *context; /* context message */
+ char *query; /* offending query string */
int cursorpos; /* cursor index into query string */
int saved_errno; /* errno at entry */
} ErrorData;
***************
*** 793,798 ****
--- 794,820 ----
}
/*
+ * errquery --- give offending query, as it may not be exactly the one send.
+ */
+ int
+ errquery(const char * query)
+ {
+ ErrorData *edata = &errordata[errordata_stack_depth];
+ MemoryContext oldcontext;
+
+ recursion_depth++;
+ CHECK_STACK_DEPTH();
+ oldcontext = MemoryContextSwitchTo(ErrorContext);
+
+ edata->query = pstrdup(query);
+
+ MemoryContextSwitchTo(oldcontext);
+ recursion_depth--;
+
+ return 0; /* return value does not matter */
+ }
+
+ /*
* errposition --- add cursor position to the current error
*/
int
***************
*** 1353,1358 ****
--- 1375,1386 ----
pq_sendstring(&msgbuf, edata->context);
}
+ if (edata->query)
+ {
+ pq_sendbyte(&msgbuf, PG_DIAG_STATEMENT);
+ pq_sendstring(&msgbuf, edata->query);
+ }
+
if (edata->cursorpos > 0)
{
snprintf(tbuf, sizeof(tbuf), "%d", edata->cursorpos);
*** ./src/bin/psql/common.c.orig Mon Mar 15 16:13:12 2004
--- ./src/bin/psql/common.c Thu Mar 18 09:55:11 2004
***************
*** 347,381 ****
/*
* on errors, print syntax error position if available.
- *
- * the query is expected to be in the client encoding.
*/
static void
! ReportSyntaxErrorPosition(const PGresult *result, const char *query)
{
#define DISPLAY_SIZE 60 /* screen width limit, in screen cols */
#define MIN_RIGHT_CUT 10 /* try to keep this far away from EOL */
int loc = 0;
! const char *sp;
int clen, slen, i, *qidx, *scridx, qoffset, scroffset, ibeg, iend,
loc_line;
char *wquery;
bool beg_trunc, end_trunc;
PQExpBufferData msg;
if (query == NULL)
! return; /* nothing to do */
sp = PQresultErrorField(result, PG_DIAG_STATEMENT_POSITION);
if (sp == NULL)
return; /* no syntax error location */
- /*
- * We punt if the report contains any CONTEXT. This typically means that
- * the syntax error is from inside a function, and the cursor position
- * is not relevant to the original query string.
- */
- if (PQresultErrorField(result, PG_DIAG_CONTEXT) != NULL)
- return;
if (sscanf(sp, "%d", &loc) != 1)
{
--- 347,382 ----
/*
* on errors, print syntax error position if available.
*/
static void
! ReportSyntaxErrorPosition(const PGresult *result)
{
#define DISPLAY_SIZE 60 /* screen width limit, in screen cols */
#define MIN_RIGHT_CUT 10 /* try to keep this far away from EOL */
int loc = 0;
! const char *sp, *query;
int clen, slen, i, *qidx, *scridx, qoffset, scroffset, ibeg, iend,
loc_line;
char *wquery;
bool beg_trunc, end_trunc;
PQExpBufferData msg;
+ /* get the query from the error report.
+ *
+ * indeed, although most of the time the query processed was the
+ * query passed by the client, which is still available to the client,
+ * in some cases such as with CREATE FUNCTION the query processed
+ * is only a part of the initial query sent. As the position is given
+ * wrt the query being processed, this one must be given by the backend.
+ */
+ query = PQresultErrorField(result, PG_DIAG_STATEMENT);
if (query == NULL)
! return; /* no query, nothing to do */
!
sp = PQresultErrorField(result, PG_DIAG_STATEMENT_POSITION);
if (sp == NULL)
return; /* no syntax error location */
if (sscanf(sp, "%d", &loc) != 1)
{
***************
*** 384,389 ****
--- 385,394 ----
return;
}
+ /* It would be great to have the length of the offending token,
+ * so that highlighting may do some different jobs.
+ */
+
/* Make a writable copy of the query, and a buffer for messages. */
wquery = pg_strdup(query);
***************
*** 565,571 ****
* Returns true for valid result, false for error state.
*/
static bool
! AcceptResult(const PGresult *result, const char *query)
{
bool OK = true;
--- 570,576 ----
* Returns true for valid result, false for error state.
*/
static bool
! AcceptResult(const PGresult *result)
{
bool OK = true;
***************
*** 596,602 ****
if (!OK)
{
psql_error("%s", PQerrorMessage(pset.db));
! ReportSyntaxErrorPosition(result, query);
CheckConnection();
}
--- 601,607 ----
if (!OK)
{
psql_error("%s", PQerrorMessage(pset.db));
! ReportSyntaxErrorPosition(result);
CheckConnection();
}
***************
*** 660,666 ****
res = PQexec(pset.db, query);
! if (!AcceptResult(res, query) && res)
{
PQclear(res);
res = NULL;
--- 665,671 ----
res = PQexec(pset.db, query);
! if (!AcceptResult(res) && res)
{
PQclear(res);
res = NULL;
***************
*** 906,912 ****
results = PQexec(pset.db, query);
/* these operations are included in the timing result: */
! OK = (AcceptResult(results, query) && ProcessCopyResult(results));
if (pset.timing)
GETTIMEOFDAY(&after);
--- 911,917 ----
results = PQexec(pset.db, query);
/* these operations are included in the timing result: */
! OK = (AcceptResult(results) && ProcessCopyResult(results));
if (pset.timing)
GETTIMEOFDAY(&after);
*** ./src/include/postgres_ext.h.orig Sat Nov 29 23:40:53 2003
--- ./src/include/postgres_ext.h Thu Mar 18 09:23:38 2004
***************
*** 58,63 ****
--- 58,64 ----
#define PG_DIAG_MESSAGE_PRIMARY 'M'
#define PG_DIAG_MESSAGE_DETAIL 'D'
#define PG_DIAG_MESSAGE_HINT 'H'
+ #define PG_DIAG_STATEMENT 'Q'
#define PG_DIAG_STATEMENT_POSITION 'P'
#define PG_DIAG_CONTEXT 'W'
#define PG_DIAG_SOURCE_FILE 'F'
*** ./src/include/utils/elog.h.orig Mon Mar 15 17:57:29 2004
--- ./src/include/utils/elog.h Thu Mar 18 09:35:38 2004
***************
*** 131,136 ****
--- 131,137 ----
extern int errfunction(const char *funcname);
extern int errposition(int cursorpos);
+ extern int errquery(const char *query);
/*----------
*** ./src/test/regress/output/create_function_1.source.orig Mon Mar 15 10:22:20 2004
--- ./src/test/regress/output/create_function_1.source Thu Mar 18 10:06:51 2004
***************
*** 57,62 ****
--- 57,64 ----
AS 'not even SQL';
ERROR: syntax error at or near "not" at character 1
CONTEXT: SQL function "test1"
+ LINE 1: not even SQL
+ ^
CREATE FUNCTION test1 (int) RETURNS int LANGUAGE sql
AS 'SELECT 1, 2, 3;';
ERROR: return type mismatch in function declared to return integer
Fabien COELHO <coelho@cri.ensmp.fr> writes:
Please find attached a patch to fix the "CREATE FUNCTION" syntax error
position bug I reported a few days ago.
This strikes me as much too intrusive to be worthwhile ... the problem
is simply not important enough to justify a protocol change. Moreover,
I don't like the proposed protocol change anyway. This approach only
solves the problem for psql-like error reporting, namely clients that
are going to regurgitate a string in their error message and aren't
very picky about whether that string really matches the original input.
One of the design goals for the error reporting feature is that GUI-type
clients should be able to mark syntax error positions by highlighting in
the original input window. This proposal abandons that goal.
regards, tom lane
Dear Tom,
Please find attached a patch to fix the "CREATE FUNCTION" syntax error
position bug I reported a few days ago.This strikes me as much too intrusive to be worthwhile ... the problem
is simply not important enough to justify a protocol change.
Then I can suggest to happend the string after the position.
NO protocol change, but the convention that the string is shown after
the position. Small difference, I sent a proposal later.
Moreover, I don't like the proposed protocol change anyway. This
approach only solves the problem for psql-like error reporting, namely
clients that are going to regurgitate a string in their error message
and aren't very picky about whether that string really matches the
original input. One of the design goals for the error reporting feature
is that GUI-type clients should be able to mark syntax error positions
by highlighting in the original input window. This proposal abandons
that goal.
I cannot see your point.
Any GUI can take advantage of the returned string to show it in a window
with fancy colors and do any hilighting around the position.
I've implemented the stuff for psql (with your help btw), but I cannot see
why it couldn't be used with other interfaces?
The issue is that the error position is not enough in some cases.
I proposed the only possible fix for that, which is to provide the
missing information to the client, which will do something or nothing
about it. The current status is that the information is not available,
so nothing can be done.
Moreover, I had problems with syntax errors in string-embedded queries in
the past, and this is trickier to solve, so I don't see why the error
position should not be fixed for those errors.
So my point is that I agree that the protocole should not be changed, but
I disagree that the "bug" should remain as it is because of some
principles.
Have a nice day,
--
Fabien Coelho - coelho@cri.ensmp.fr
Fabien COELHO <coelho@cri.ensmp.fr> writes:
Moreover, I don't like the proposed protocol change anyway. This
approach only solves the problem for psql-like error reporting, namely
clients that are going to regurgitate a string in their error message
and aren't very picky about whether that string really matches the
original input. One of the design goals for the error reporting feature
is that GUI-type clients should be able to mark syntax error positions
by highlighting in the original input window. This proposal abandons
that goal.
I cannot see your point.
Any GUI can take advantage of the returned string to show it in a window
with fancy colors and do any hilighting around the position.
But it cannot (easily) match it up with the *original input* and put the
cursor in the right place in the *input* window. You are envisioning
something that's no better than psql with window borders. My idea of a
GUI syntax error report is something that puts my editing cursor in the
right place.
I've implemented the stuff for psql (with your help btw), but I cannot see
why it couldn't be used with other interfaces?
It's not the right thing for other interfaces. If it were the right
thing for all interfaces, we'd have put the functionality in the backend
to begin with.
regards, tom lane
Dear Tom,
Any GUI can take advantage of the returned string to show it in a window
with fancy colors and do any hilighting around the position.But it cannot (easily) match it up with the *original input*
Sure. Even the parser in the backend cannot do it, that's the problem!;-)
and put the cursor in the right place in the *input* window. You are
envisioning something that's no better than psql with window borders.
I try to envision what is achievable with a reasonnable effort;-)
If I read you correctly, it is all interfaces or none... As a mostly;
psql user, I'm not lucky;-)
I don't think it is really easy to compute the good position wrt to the
original query if you want to keep into account escapes that are eaten by
the first parsing. I can provide a fix that would catch simple cases,
but not all of them.
Would you accept a "it works sometimes, but it may be wrong others" hack?
My idea of a GUI syntax error report is something that puts my editing
cursor in the right place.
Thus you decided that you prefer that NO interface should be able to show
the correct position, rather than having at least one to do it, and other
being able to display something, because you decided that the only place
to show something in a GUI is in the initial window or never. You don't
like dialog box, I guess;-)
Have a nice day,
--
Fabien Coelho - coelho@cri.ensmp.fr
Fabien COELHO <coelho@cri.ensmp.fr> writes:
But it cannot (easily) match it up with the *original input*
Sure. Even the parser in the backend cannot do it, that's the problem!;-)
We *can* do it, it's just a matter of the pain level. For instance we
could have the main lexer save aside an indication of the number of
quoting characters it discarded at each position. The problem in my
mind is to not put a heavy overhead on the main lexing path for
information that will only get used in an error case.
I was toying with the idea of having the CREATE FUNCTION error callback
go back to the original command string (which I believe is now always
saved in the Portal, and certainly could be if it isn't) and re-parse
it using a slower variant of the lexer that keeps this info. Not sure
what all is involved in that approach, but in principle this would let
us meet the original design goal without any overhead except in the
error case.
Would you accept a "it works sometimes, but it may be wrong others" hack?
I think it'd be okay if it gets the common cases right and doesn't
generate lies in the cases it can't get right.
Also, I think it might be an acceptable compromise to give the position
correctly only when the function body is quoted using dollar-quoting.
(Everybody's gonna be using dollar quoting to write their functions in
7.5, right? ;-)) In that case it is trivial to convert the "inside"
syntax error offset to an offset in the original string literal, and you
only have to know the position of that literal in the original command
to finish up.
BTW, it's worth pointing out that the client will have to special-case
the situation where a CONTEXT entry is present anyhow, since that most
likely means that the syntax error is inside some function called by the
query, and not in the query itself. So the hack in psql doesn't go away
in any case; all that we do differently is not send CONTEXT from the SQL
function parse error callback if we are able to convert the syntax error
offset to something relative to the original command. (One could
envision a really smart client pulling back the text of the function
identified by the topmost CONTEXT entry and putting the cursor on that
--- of course this would have to be in a popup window not the original
query, but it's doable in principle.)
My idea of a GUI syntax error report is something that puts my editing
cursor in the right place.
Thus you decided that you prefer that NO interface should be able to show
the correct position, rather than having at least one to do it, and other
being able to display something, because you decided that the only place
to show something in a GUI is in the initial window or never.
No, I say that we shouldn't put in a kluge that gets it sort-of right in
a simple interface and makes it impossible for better interfaces to get
it really right. I think we can do better than that.
regards, tom lane
Attached is a proposed patch that fixes the
cursor-position-in-CREATE-FUNCTION issue per my earlier suggestion.
Since I complained that your proposal was too grotty, it's only fair to
throw this out to let people take potshots at it ;-). The main
grottiness here is providing access to the executing Portal so that the
callback function can get at the original command string. I don't think
this is unreasonably bad, since we already have a global PortalContext
that points to the portal's memory context --- I just added parallel
code that updates a new global ActivePortal.
The re-parsing of the original command is simplistic but will handle all
normal cases.
regards, tom lane
*** src/backend/catalog/pg_proc.c.orig Sat Mar 13 20:58:41 2004
--- src/backend/catalog/pg_proc.c Thu Mar 18 18:20:20 2004
***************
*** 23,31 ****
--- 23,33 ----
#include "executor/executor.h"
#include "fmgr.h"
#include "miscadmin.h"
+ #include "mb/pg_wchar.h"
#include "parser/parse_coerce.h"
#include "parser/parse_expr.h"
#include "parser/parse_type.h"
+ #include "tcop/pquery.h"
#include "tcop/tcopprot.h"
#include "utils/acl.h"
#include "utils/builtins.h"
***************
*** 45,50 ****
--- 47,56 ----
static Datum create_parameternames_array(int parameterCount,
const char *parameterNames[]);
static void sql_function_parse_error_callback(void *arg);
+ static int match_prosrc_to_query(const char *prosrc, const char *queryText,
+ int cursorpos);
+ static bool match_prosrc_to_literal(const char *prosrc, const char *literal,
+ int cursorpos, int *newcursorpos);
/* ----------------------------------------------------------------
***************
*** 768,774 ****
* location is inside the function body, not out in CREATE FUNCTION.
*/
sqlerrcontext.callback = sql_function_parse_error_callback;
! sqlerrcontext.arg = proc;
sqlerrcontext.previous = error_context_stack;
error_context_stack = &sqlerrcontext;
--- 774,780 ----
* location is inside the function body, not out in CREATE FUNCTION.
*/
sqlerrcontext.callback = sql_function_parse_error_callback;
! sqlerrcontext.arg = tuple;
sqlerrcontext.previous = error_context_stack;
error_context_stack = &sqlerrcontext;
***************
*** 800,821 ****
}
/*
! * error context callback to let us supply a context marker
*/
static void
sql_function_parse_error_callback(void *arg)
{
! Form_pg_proc proc = (Form_pg_proc) arg;
/*
! * XXX it'd be really nice to adjust the syntax error position to
! * account for the offset from the start of the statement to the
! * function body string, not to mention any quoting characters in
! * the string, but I can't see any decent way to do that...
*
! * In the meantime, put in a CONTEXT entry that can cue clients
! * not to trust the syntax error position completely.
*/
! errcontext("SQL function \"%s\"",
! NameStr(proc->proname));
}
--- 806,976 ----
}
/*
! * error context callback to let us adjust syntax errors from SQL functions
*/
static void
sql_function_parse_error_callback(void *arg)
{
! HeapTuple tuple = (HeapTuple) arg;
! Form_pg_proc proc = (Form_pg_proc) GETSTRUCT(tuple);
! int origerrposition;
! const char *queryText;
! bool isnull;
! Datum tmp;
! char *prosrc;
! int newerrposition;
!
! /*
! * Nothing to do unless we are dealing with a syntax error that has
! * a cursor position. In that case, we need to try to adjust the
! * position to match the original query, not the text of the function.
! */
! origerrposition = geterrposition();
! if (origerrposition <= 0)
! return;
!
! /*
! * We can get the original query text from the active portal (hack...)
! */
! Assert(ActivePortal && ActivePortal->portalActive);
! queryText = ActivePortal->sourceText;
!
! /*
! * Try to locate the function text in the original query.
! */
! tmp = SysCacheGetAttr(PROCOID, tuple, Anum_pg_proc_prosrc, &isnull);
! if (isnull)
! elog(ERROR, "null prosrc");
! prosrc = DatumGetCString(DirectFunctionCall1(textout, tmp));
!
! newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
!
! if (newerrposition > 0)
! {
! /* Successful, so fix the error position */
! errposition(newerrposition);
! }
! else
! {
! /*
! * If unsuccessful, put in a CONTEXT entry that will cue clients
! * not to treat the error position as relative to the original query.
! */
! errcontext("SQL function \"%s\"",
! NameStr(proc->proname));
! }
! }
!
! /*
! * Try to locate the string literal containing the function body in the
! * given text of the CREATE FUNCTION command. If successful, return the
! * character (not byte) index within the command corresponding to the
! * given character index within the literal. If not successful, return 0.
! */
! static int
! match_prosrc_to_query(const char *prosrc, const char *queryText,
! int cursorpos)
! {
! /*
! * Rather than fully parsing the CREATE FUNCTION command, we just scan
! * the command looking for $prosrc$ or 'prosrc'. This could be fooled
! * (though not in any very probable scenarios), so fail if we find
! * more than one match.
! */
! int prosrclen = strlen(prosrc);
! int querylen = strlen(queryText);
! int matchpos = 0;
! int curpos;
! int newcursorpos;
!
! for (curpos = 0; curpos < querylen-prosrclen; curpos++)
! {
! if (queryText[curpos] == '$' &&
! strncmp(prosrc, &queryText[curpos+1], prosrclen) == 0 &&
! queryText[curpos+1+prosrclen] == '$')
! {
! /*
! * Found a $foo$ match. Since there are no embedded quoting
! * characters in a dollar-quoted literal, we don't have to do
! * any fancy arithmetic; just offset by the starting position.
! */
! if (matchpos)
! return 0; /* multiple matches, fail */
! matchpos = pg_mbstrlen_with_len(queryText, curpos+1)
! + cursorpos;
! }
! else if (queryText[curpos] == '\'' &&
! match_prosrc_to_literal(prosrc, &queryText[curpos+1],
! cursorpos, &newcursorpos))
! {
! /*
! * Found a 'foo' match. match_prosrc_to_literal() has adjusted
! * for any quotes or backslashes embedded in the literal.
! */
! if (matchpos)
! return 0; /* multiple matches, fail */
! matchpos = pg_mbstrlen_with_len(queryText, curpos+1)
! + newcursorpos;
! }
! }
!
! return matchpos;
! }
!
! /*
! * Try to match the given source text to a single-quoted literal.
! * If successful, adjust newcursorpos to correspond to the character
! * (not byte) index corresponding to cursorpos in the source text.
! *
! * At entry, literal points just past a ' character. We must check for the
! * trailing quote.
! */
! static bool
! match_prosrc_to_literal(const char *prosrc, const char *literal,
! int cursorpos, int *newcursorpos)
! {
! int newcp = cursorpos;
! int chlen;
/*
! * This implementation handles backslashes and doubled quotes in the
! * string literal. It does not handle the SQL syntax for literals
! * continued across line boundaries.
*
! * We do the comparison a character at a time, not a byte at a time,
! * so that we can do the correct cursorpos math.
*/
! while (*prosrc)
! {
! cursorpos--; /* characters left before cursor */
! /*
! * Check for backslashes and doubled quotes in the literal; adjust
! * newcp when one is found before the cursor.
! */
! if (*literal == '\\')
! {
! literal++;
! if (cursorpos > 0)
! newcp++;
! }
! else if (*literal == '\'')
! {
! if (literal[1] != '\'')
! return false;
! literal++;
! if (cursorpos > 0)
! newcp++;
! }
! chlen = pg_mblen(prosrc);
! if (strncmp(prosrc, literal, chlen) != 0)
! return false;
! prosrc += chlen;
! literal += chlen;
! }
!
! *newcursorpos = newcp;
!
! if (*literal == '\'' && literal[1] != '\'')
! return true;
! return false;
}
*** src/backend/commands/portalcmds.c.orig Sat Nov 29 14:51:47 2003
--- src/backend/commands/portalcmds.c Thu Mar 18 16:57:10 2004
***************
*** 270,275 ****
--- 270,276 ----
PersistHoldablePortal(Portal portal)
{
QueryDesc *queryDesc = PortalGetQueryDesc(portal);
+ Portal saveActivePortal;
MemoryContext savePortalContext;
MemoryContext saveQueryContext;
MemoryContext oldcxt;
***************
*** 311,316 ****
--- 312,319 ----
/*
* Set global portal context pointers.
*/
+ saveActivePortal = ActivePortal;
+ ActivePortal = portal;
savePortalContext = PortalContext;
PortalContext = PortalGetHeapMemory(portal);
saveQueryContext = QueryContext;
***************
*** 342,347 ****
--- 345,351 ----
/* Mark portal not active */
portal->portalActive = false;
+ ActivePortal = saveActivePortal;
PortalContext = savePortalContext;
QueryContext = saveQueryContext;
*** src/backend/tcop/postgres.c.orig Mon Mar 15 15:01:58 2004
--- src/backend/tcop/postgres.c Thu Mar 18 16:56:55 2004
***************
*** 2710,2715 ****
--- 2710,2716 ----
*/
MemoryContextSwitchTo(TopMemoryContext);
MemoryContextResetAndDeleteChildren(ErrorContext);
+ ActivePortal = NULL;
PortalContext = NULL;
QueryContext = NULL;
*** src/backend/tcop/pquery.c.orig Fri Mar 5 15:57:31 2004
--- src/backend/tcop/pquery.c Thu Mar 18 16:56:56 2004
***************
*** 23,28 ****
--- 23,35 ----
#include "utils/memutils.h"
+ /*
+ * ActivePortal is the currently executing Portal (the most closely nested,
+ * if there are several).
+ */
+ Portal ActivePortal = NULL;
+
+
static uint32 RunFromStore(Portal portal, ScanDirection direction, long count,
DestReceiver *dest);
static long PortalRunSelect(Portal portal, bool forward, long count,
***************
*** 395,400 ****
--- 402,408 ----
char *completionTag)
{
bool result;
+ Portal saveActivePortal;
MemoryContext savePortalContext;
MemoryContext saveQueryContext;
MemoryContext oldContext;
***************
*** 433,438 ****
--- 441,448 ----
/*
* Set global portal context pointers.
*/
+ saveActivePortal = ActivePortal;
+ ActivePortal = portal;
savePortalContext = PortalContext;
PortalContext = PortalGetHeapMemory(portal);
saveQueryContext = QueryContext;
***************
*** 508,513 ****
--- 518,524 ----
/* Mark portal not active */
portal->portalActive = false;
+ ActivePortal = saveActivePortal;
PortalContext = savePortalContext;
QueryContext = saveQueryContext;
***************
*** 925,930 ****
--- 936,942 ----
DestReceiver *dest)
{
long result;
+ Portal saveActivePortal;
MemoryContext savePortalContext;
MemoryContext saveQueryContext;
MemoryContext oldContext;
***************
*** 948,953 ****
--- 960,967 ----
/*
* Set global portal context pointers.
*/
+ saveActivePortal = ActivePortal;
+ ActivePortal = portal;
savePortalContext = PortalContext;
PortalContext = PortalGetHeapMemory(portal);
saveQueryContext = QueryContext;
***************
*** 972,977 ****
--- 986,992 ----
/* Mark portal not active */
portal->portalActive = false;
+ ActivePortal = saveActivePortal;
PortalContext = savePortalContext;
QueryContext = saveQueryContext;
*** src/backend/utils/error/elog.c.orig Mon Mar 15 15:01:58 2004
--- src/backend/utils/error/elog.c Thu Mar 18 17:36:58 2004
***************
*** 808,813 ****
--- 808,829 ----
return 0; /* return value does not matter */
}
+ /*
+ * geterrposition --- return the currently set error position (0 if none)
+ *
+ * This is only intended for use in error callback subroutines, since there
+ * is no other place outside elog.c where the concept is meaningful.
+ */
+ int
+ geterrposition(void)
+ {
+ ErrorData *edata = &errordata[errordata_stack_depth];
+
+ /* we don't bother incrementing recursion_depth */
+ CHECK_STACK_DEPTH();
+
+ return edata->cursorpos;
+ }
/*
* elog_finish --- finish up for old-style API
*** src/include/tcop/pquery.h.orig Sat Nov 29 17:41:14 2003
--- src/include/tcop/pquery.h Thu Mar 18 16:56:49 2004
***************
*** 17,22 ****
--- 17,25 ----
#include "utils/portal.h"
+ extern DLLIMPORT Portal ActivePortal;
+
+
extern void ProcessQuery(Query *parsetree,
Plan *plan,
ParamListInfo params,
*** src/include/utils/elog.h.orig Mon Mar 15 15:02:09 2004
--- src/include/utils/elog.h Thu Mar 18 17:36:51 2004
***************
*** 132,137 ****
--- 132,139 ----
extern int errfunction(const char *funcname);
extern int errposition(int cursorpos);
+ extern int geterrposition(void);
+
/*----------
* Old-style error reporting API: to be used in this way:
Dear Tom,
Attached is a proposed patch that fixes the
cursor-position-in-CREATE-FUNCTION issue per my earlier suggestion.The re-parsing of the original command is simplistic but will handle all
normal cases.
[...]
That's quite a demonstration;-)
However, I still stick with my "bad" simple idea because the simpler the
better, and also because of the following example:
psql> CREATE OR REPLACE FUNCTION count_tup(TEXT) RETURNS INTEGER AS '
DECLARE
n RECORD;
BEGIN
FOR n IN EXECUTE \'SELECT COUNT(*) AS c FRM \' || $1
LOOP
RETURN n.c;
END LOOP;
RETURN NULL;
END;'
LANGUAGE plpgsql;
psql> SELECT count_tup('pg_shadow');
ERROR: syntax error at or near "FRM" at character 22
CONTEXT: PL/pgSQL function "count_tup" line 4 at for over execute statement
As you can notice, the extract is not in the submitted query, so there
is no point to show it there.
Maybe real PL/pgSQL programmers will never have syntax errors with their
SQL stuff.
Thus I really think that the parser should return the processed query,
at least in some cases.
Anyway, have a nice day!
--
Fabien Coelho - coelho@cri.ensmp.fr
[ moving thread to hackers ]
Fabien COELHO <coelho@cri.ensmp.fr> writes:
However, I still stick with my "bad" simple idea because the simpler the
better, and also because of the following example:
...
psql> SELECT count_tup('pg_shadow');
ERROR: syntax error at or near "FRM" at character 22
CONTEXT: PL/pgSQL function "count_tup" line 4 at for over execute statement
As you can notice, the extract is not in the submitted query, so there
is no point to show it there.
Yeah. However, I dislike your solution because it confuses the cases of
a syntax error in the actually submitted query, and a syntax error in an
internally generated query. We should keep these cases clearly separate
because clients may want to do different things. For a syntax error in
the submitted input, what you probably want to do is edit and resubmit
the original query --- that's the case I was thinking about in saying
that a GUI client like pgadmin would want to set the editing cursor in
the original input window. But this action is nonsensical if the syntax
error is from a generated query. Perhaps the GUI client could be smart
enough to pop up a new window in which one could edit and resubmit the
erroneous function definition.
Even in psql's simplistic error handling, you want to distinguish the
two cases. There's no point in showing the entire original query; one
line worth of context is plenty. But you very probably do want to see
all of a generated query. So I don't want the backend sending back
error reports that look the same in both cases.
The original design concept for the 'P' (position) error field is that
it would be used to locate syntax errors in the *original query*, and
so its presence is a cue to the client code to go in the direction of
setting the editing cursor. (Note the protocol specification says
"index into the original query string".) We have in fact misimplemented
it, because it is being set for syntax errors in internally generated
queries too.
I was already planning to modify plpgsql to send back the full text of
generated queries when there is an error. My intention was to supply
this just as part of the CONTEXT stack, that is instead of your example
of
ERROR: syntax error at or near "FRM" at character 22
CONTEXT: PL/pgSQL function "count_tup" line 4 at for over execute statement
you'd get something like
ERROR: syntax error at or near "FRM" at character 22
CONTEXT: Executing command "SELECT COUNT(*) AS c FRM pg_shadow"
PL/pgSQL function "count_tup" line 4 at for over execute statement
However it might be better to invent a new error-message field that
carries just the text of the SQL command, rather than stuffing it into
CONTEXT. (This is similar to your original patch, but different in
detail because I'm envisioning sending back generated queries, never the
submitted query. Regurgitating the submitted query is just a waste of
bandwidth.) The plus side of that would be that it'd be easy to extract
for syntax-error highlighting. The minuses are that existing clients
would fail to print such a field (the protocol spec says to ignore
unknown fields), and that there is no good way to cope with nested
queries.
A possible compromise is to put the text of the generated SQL command
into a new field only if the error is a syntax error, and put it into
the CONTEXT stack otherwise. Syntax errors couldn't be nested so at
least that problem goes away. This seems a bit messy though.
The other thing to think about is whether we should invent a new field
to carry syntax error position for generated queries, rather than making
'P' do double duty as it does now. If we don't do that then we have to
change the protocol specification to reflect reality. In any case I
think it has to be possible to tell very easily from the error message
whether the 'P' position refers to the submitted query or a generated
query.
Comments anyone?
regards, tom lane
Dear Tom,
However, I still stick with my "bad" simple idea because the simpler the
better, and also because of the following example:
...
psql> SELECT count_tup('pg_shadow');
ERROR: syntax error at or near "FRM" at character 22
CONTEXT: PL/pgSQL function "count_tup" line 4 at for over execute statementAs you can notice, the extract is not in the submitted query, so there
is no point to show it there.Yeah. However, I dislike your solution because it confuses the cases of
a syntax error in the actually submitted query, and a syntax error in an
internally generated query. We should keep these cases clearly separate
because clients may want to do different things.
[...]
I agree with you that both reports should not look the same.
The good news is that they already do not look the same, thanks
to the CONTEXT information. However, the context information is
informal (it is just plain English), to it is not easy for a client
to take that into account.
The original design concept for the 'P' (position) error field is that
it would be used to locate syntax errors in the *original query*, and
so its presence is a cue to the client code to go in the direction of
setting the editing cursor. (Note the protocol specification says
"index into the original query string".)
Yes, I noted that.
In my proposed patch, I changed it to the "processed" query, which
may or may not be the initial query.
We have in fact misimplemented it, because it is being set for syntax
errors in internally generated queries too.
Well, from the parser point of view, it is a little bit messy to have
to do different things for error reporting in different context. That
why I would try a one-fit-all solution. Maybe I'm a little bit lazy, but
sometimes it is a quality in programming, if it help keep things simple.
I was already planning to modify plpgsql to send back the full text of
generated queries when there is an error. My intention was to supply
this just as part of the CONTEXT stack, that is instead of your example
ofERROR: syntax error at or near "FRM" at character 22
CONTEXT: PL/pgSQL function "count_tup" line 4 at for over execute statementyou'd get something like
ERROR: syntax error at or near "FRM" at character 22
CONTEXT: Executing command "SELECT COUNT(*) AS c FRM pg_shadow"
PL/pgSQL function "count_tup" line 4 at for over execute statementHowever it might be better to invent a new error-message field that
carries just the text of the SQL command, rather than stuffing it into
CONTEXT.
I'm not sure I would like the CONTEXT field for that? as it may be
usefull for a lot of things. In your above example, the CONTEXT is filled
with 2 different informations that are just messed up for the client.
If localisation is used, there would be no way for a client to seperate
them. Different information should require different fields.
More over, I have other ideas for CONTEXT, which should really be a stack.
But that is another issue.
(This is similar to your original patch, but different in detail because
I'm envisioning sending back generated queries, never the submitted
query. Regurgitating the submitted query is just a waste of bandwidth.)
Well, if it is the same I agree. On the other hand, it is a rare case,
during an exception. I would expect submitted queries to work most of the
time, because the client is just some program which uses the database.
The plus side of that would be that it'd be easy to extract
for syntax-error highlighting. The minuses are that existing clients
would fail to print such a field (the protocol spec says to ignore
unknown fields
That's a really good design idea, because it allows to include new fields
and still to have old client compatible with the new protocol.
I think it is no big deal if existing clients don't make use if the new
information. It was not there before anyway, so it is just as is was
before.
Moreover, one should distinguish between fields for the human and fields
for the client. reporting the query and the position is more for the
client, giving the context is more for the human. One is use for
some automatic processing, the other is a high level semantical
information to be interpreted by the user.
), and that there is no good way to cope with nested queries.
A possible compromise is to put the text of the generated SQL command
into a new field only if the error is a syntax error, and put it into
the CONTEXT stack otherwise. Syntax errors couldn't be nested so at,
least that problem goes away. This seems a bit messy though.
I don't like it, because above comments.
The other thing to think about is whether we should invent a new field
to carry syntax error position for generated queries, rather than making
'P' do double duty as it does now. If we don't do that then we have to
change the protocol specification to reflect reality. In any case I
think it has to be possible to tell very easily from the error message
whether the 'P' position refers to the submitted query or a generated
query.
I think a new field is alas necessary. I thought about a
P 22 SELECT something FRM table
but psql simply uses the raw "P" field content in the message, so you
would get:
... at or near character 22 SELECT something FRM table
with old clients:-(
--
Fabien Coelho - coelho@cri.ensmp.fr
Fabien COELHO <coelho@cri.ensmp.fr> writes:
I agree with you that both reports should not look the same.
The good news is that they already do not look the same, thanks
to the CONTEXT information.
Right, but you quite properly didn't like my quick-hack to psql that
assumes that the presence of any CONTEXT means it's not a top-level
syntax error. I would like to replace that hack with something cleaner.
We have in fact misimplemented it, because it is being set for syntax
errors in internally generated queries too.
Well, from the parser point of view, it is a little bit messy to have
to do different things for error reporting in different context. That
why I would try a one-fit-all solution.
The parser needn't do anything different. What I'm imagining is that
for internally submitted queries, there will always be an
error_context_stack routine that can transform the error report into
whatever we decide the appropriate format is.
However it might be better to invent a new error-message field that
carries just the text of the SQL command, rather than stuffing it into
CONTEXT.
I'm not sure I would like the CONTEXT field for that? as it may be
usefull for a lot of things. In your above example, the CONTEXT is filled
with 2 different informations that are just messed up for the client.
If localisation is used, there would be no way for a client to seperate
them. Different information should require different fields.
The point is that CONTEXT is essentially a record of "how we got here".
In a situation where the actual error occurs inside a couple of levels
of nesting, you want to be able to report the outer queries as well as
the one that directly caused the error. I agree that there's probably
little hope of clients automatically making sense of the CONTEXT info.
But we need to provide it to help people debug complex functions.
More over, I have other ideas for CONTEXT, which should really be a stack.
It already is a stack.
The other thing to think about is whether we should invent a new field
to carry syntax error position for generated queries, rather than making
'P' do double duty as it does now.
I think a new field is alas necessary.
I'm leaning in that direction also. How about
'P': used only for syntax error position in the client-submitted query.
'p': syntax error position in an internally-generated query.
'q': text of an internally-generated query ('p' and 'q' would always
appear together).
In the case of a non-syntax error in an internally generated query, we
should stick the query text into the CONTEXT stack, since it might not
be the most closely nested context item anyway.
An existing client will ignore the 'p' and 'q' fields, thus providing
behavior that's no worse than what you get with 7.4 now.
regards, tom lane
Dear Tom,
The point is that CONTEXT is essentially a record of "how we got here".
Yes, but for human eyes.
In a situation where the actual error occurs inside a couple of levels
of nesting, you want to be able to report the outer queries as well as
the one that directly caused the error. I agree that there's probably
little hope of clients automatically making sense of the CONTEXT info.
But we need to provide it to help people debug complex functions.
Yes.
More over, I have other ideas for CONTEXT, which should really be a stack.
It already is a stack.
Ok, I agree that there is a "push", but I'm still looking fot the "pop".
Maybe I missed something, but it seemed to me that strings are appended
on to the other, and there is no way back.
That's what I mean by a "real" stack. If a student of mine comes to me
with his or her "stack" without a "pop", he or she will not have a very
good grade;-)
So when more information are to be fed into the context, they are to be
fed when they are actually needed on the error, maybe with callbacks?
They cannot be fed prevently.
What I would have looked for, is a stack on which functions could push
and pop informations as they want, so that the stack would be always
available for any error or warning or debug trace down the callgraph.
If it is the case already, as I said above, I haven't seen the "pop"
feature yet.
I think a new field is alas necessary.
I'm leaning in that direction also. How about
'P': used only for syntax error position in the client-submitted query.
'p': syntax error position in an internally-generated query.
'q': text of an internally-generated query ('p' and 'q' would always
appear together).
Why not.
Taking your very idea a little step further, I would suggest the
following, which is pretty similar, but even more general:
1/ field conventions:
- upper case letter fields are for humans.
they are meant to be shown to the user, even new ones?
- lower case letter fields are for machines / clients
they are never meant to be shown to the user!
but they should give some information to the client on how to
process an error report.
2/ new fields:
- p: character [length] [reserved for future use]
where the "parser error" occured, with the length of the
offending token if available (for better hilighting?)
this is basically the current P, but not meant to the user,
and with possible additions.
- q: select foo frm bla;
the raw sql query that was being processed, whatever the case.
I think syntax error are rare occurences, so it is no big deal
to return the query anyway. this may make some clients code easier
because there is no difference in the processing for different kind
of errors.
- c: context-description
some context for the client, not the human, so no localisation.
something like:
c: lang=sql query
c: lang=sql query-part
c: lang=sql function 'foo' 4
it could be adapted to other syntax errors, something like.
c: lang=plpgsql function 'foo' 4
c: lang=perl function 'bla' 43
or separate fields: l:sql for the language, f: for the function...
in such cases, the q: field would not necessarily contains sql.
the client may open an editor for plpgsql function of it sees it fits...
or whatever. The information is available, that is the point.
Have a nice day,
--
Fabien Coelho - coelho@cri.ensmp.fr
Fabien COELHO <coelho@cri.ensmp.fr> writes:
More over, I have other ideas for CONTEXT, which should really be a stack.
It already is a stack.
Ok, I agree that there is a "push", but I'm still looking fot the "pop".
Maybe I missed something, but it seemed to me that strings are appended
on to the other, and there is no way back.
But the string list is not constructed until the error actually occurs.
You don't need a pop at that point --- the call stack is what it is.
I think you are imagining that outer-level context hooks should be able
to editorialize on what inner-level ones said (or perhaps vice versa?)
but I honestly cannot think of a valid use-case for that.
What I would have looked for, is a stack on which functions could push
and pop informations as they want, so that the stack would be always
available for any error or warning or debug trace down the callgraph.
Look at the existing examples of adjusting the error_context_stack.
They already do all that, they just don't bother to compute the error
strings unless actually needed. I'm not willing to push very much cost
into the non-error path of control when there's no visible payoff.
regards, tom lane
Maybe I missed something, but it seemed to me that strings are appended
on to the other, and there is no way back.But the string list is not constructed until the error actually occurs.
You don't need a pop at that point --- the call stack is what it is.I think you are imagining that outer-level context hooks should be able
to editorialize on what inner-level ones said (or perhaps vice versa?)
No.
I just missed the "error_context_stack" by focussing on "errcontext()",
which does not managed a stack at all, so I was quite puzzled. Now
it is clearer.
Thanks,
--
Fabien Coelho - coelho@cri.ensmp.fr