array_in sub function ReadArrayDimensions error message

Started by jian heover 1 year ago10 messages
#1jian he
jian.universality@gmail.com

while reviewing the json query doc,
I found out the following error message was not quite right.

select '[1,2]'::int[];
ERROR: malformed array literal: "[1,2]"
LINE 1: select '[1,2]'::int[];
^
DETAIL: Missing "]" after array dimensions.
should it be:
"Missing delimiter ":" while specifying array dimensions."
?
Because here, the first problem is the comma should be colon.

also
"DETAIL: Missing "]" after array dimensions."
should be
DETAIL: Missing "]" while specifying array dimensions.
?

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: jian he (#1)
Re: array_in sub function ReadArrayDimensions error message

jian he <jian.universality@gmail.com> writes:

while reviewing the json query doc,
I found out the following error message was not quite right.

select '[1,2]'::int[];
ERROR: malformed array literal: "[1,2]"
LINE 1: select '[1,2]'::int[];
^
DETAIL: Missing "]" after array dimensions.

should it be:
"Missing delimiter ":" while specifying array dimensions."

That's presuming quite a lot about the nature of the error.
All the code knows is that what follows the "1" should be
either ":" or "]", and when it sees "," instead it throws
this error. I agree the existing message isn't great, but
trying to be more specific could confuse people even more
if the more-specific message doesn't apply either.

One possibility could be

         if (*p != ']')
             ereturn(escontext, false,
                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                      errmsg("malformed array literal: \"%s\"", origStr),
+                     (strchr(p, ']') != NULL) ?
+                     errdetail("Array dimensions have invalid syntax.") :
                      errdetail("Missing \"%s\" after array dimensions.",
                                "]")));

that is, only say "Missing "]"" if there's no ']' anywhere, and
otherwise just say the dimensions are wrong. This could be fooled
by a ']' that's part of some string in the data, but even then the
errdetail isn't wrong.

Or we could just say "Array dimensions have invalid syntax."
unconditionally.

regards, tom lane

#3jian he
jian.universality@gmail.com
In reply to: Tom Lane (#2)
1 attachment(s)
Re: array_in sub function ReadArrayDimensions error message

On Mon, Jul 8, 2024 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

jian he <jian.universality@gmail.com> writes:

while reviewing the json query doc,
I found out the following error message was not quite right.

select '[1,2]'::int[];
ERROR: malformed array literal: "[1,2]"
LINE 1: select '[1,2]'::int[];
^
DETAIL: Missing "]" after array dimensions.

should it be:
"Missing delimiter ":" while specifying array dimensions."

That's presuming quite a lot about the nature of the error.
All the code knows is that what follows the "1" should be
either ":" or "]", and when it sees "," instead it throws
this error. I agree the existing message isn't great, but
trying to be more specific could confuse people even more
if the more-specific message doesn't apply either.

One possibility could be

if (*p != ']')
ereturn(escontext, false,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed array literal: \"%s\"", origStr),
+                     (strchr(p, ']') != NULL) ?
+                     errdetail("Array dimensions have invalid syntax.") :
errdetail("Missing \"%s\" after array dimensions.",
"]")));

that is, only say "Missing "]"" if there's no ']' anywhere, and
otherwise just say the dimensions are wrong. This could be fooled
by a ']' that's part of some string in the data, but even then the
errdetail isn't wrong.

Or we could just say "Array dimensions have invalid syntax."
unconditionally.

regards, tom lane

we can
if (*p == ':')
{
....

if (*p != ']')
ereturn(escontext, false,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed array literal: \"%s\"", origStr),
errdetail("Missing \"%s\" after array dimensions.",
"]")));
}
else
{
if (*p != ']')
ereturn(escontext, false,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed array literal: \"%s\"", origStr),
errdetail("Missing delimiter \"%s\" while specifying array dimensions.",
":")));
}

different error message in IF ELSE blocks.
please check attached.

Attachments:

improve_ReadArrayDimensions_message.difftext/x-patch; charset=US-ASCII; name=improve_ReadArrayDimensions_message.diffDownload
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index d6641b57..c5653a6a 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -453,19 +453,25 @@ ReadArrayDimensions(char **srcptr, int *ndim_p, int *dim, int *lBound,
 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 						 errmsg("malformed array literal: \"%s\"", origStr),
 						 errdetail("Missing array dimension value.")));
+			if (*p != ']')
+				ereturn(escontext, false,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						errmsg("malformed array literal: \"%s\"", origStr),
+						errdetail("Missing \"%s\" after array dimensions.",
+								"]")));
 		}
 		else
 		{
 			/* [n] format */
 			lBound[ndim] = 1;
 			ub = i;
+			if (*p != ']')
+				ereturn(escontext, false,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						errmsg("malformed array literal: \"%s\"", origStr),
+						errdetail("Missing delimiter \"%s\" while specifying array dimensions.",
+								":")));
 		}
-		if (*p != ']')
-			ereturn(escontext, false,
-					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("malformed array literal: \"%s\"", origStr),
-					 errdetail("Missing \"%s\" after array dimensions.",
-							   "]")));
 		p++;
 
 		/*
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: jian he (#3)
Re: array_in sub function ReadArrayDimensions error message

jian he <jian.universality@gmail.com> writes:

On Mon, Jul 8, 2024 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

One possibility could be
...
that is, only say "Missing "]"" if there's no ']' anywhere, and
otherwise just say the dimensions are wrong. This could be fooled
by a ']' that's part of some string in the data, but even then the
errdetail isn't wrong.

Or we could just say "Array dimensions have invalid syntax."
unconditionally.

please check attached.

Meh. This just replaces one possibly-off-point error message
with a different one that will be off-point in a different set
of cases. I think we should simply reduce the specificity of
the message.

BTW, I think we have pretty much the same issue with respect
to the check for "=" later on. For instance, if someone
is confused about whether to insert commas:

=# select '[1:1],[1:2]={{3,4}}'::int[];
ERROR: malformed array literal: "[1:1],[1:2]={{3,4}}"
LINE 1: select '[1:1],[1:2]={{3,4}}'::int[];
^
DETAIL: Missing "=" after array dimensions.

Here again, the problem is not a missing "=", it's invalid
syntax somewhere before that.

Another thing we could consider doing here (and similarly
for your original case) is

DETAIL: Expected "=" not "," after array dimensions.

The advantage of this is that it provides a little more
clarity as to exactly where things went wrong.

regards, tom lane

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#4)
Re: array_in sub function ReadArrayDimensions error message

On Tue, Jul 9, 2024 at 8:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here again, the problem is not a missing "=", it's invalid
syntax somewhere before that.

Another thing we could consider doing here (and similarly
for your original case) is

DETAIL: Expected "=" not "," after array dimensions.

The advantage of this is that it provides a little more
clarity as to exactly where things went wrong.

One possibility all this ignores is that what we are calling
array-dimensions are in reality most likely a user using json array syntax
in our SQL arrays. That seems eminently more likely than someone
mis-typing this niche incantation of building an array literal nowadays.
I'd add a hint if the first symbol is [ and we fail to get to the point of
actually seeing the equal sign or the first subsequent unquoted symbol is a
comma instead of a colon.

David J.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#5)
Re: array_in sub function ReadArrayDimensions error message

"David G. Johnston" <david.g.johnston@gmail.com> writes:

One possibility all this ignores is that what we are calling
array-dimensions are in reality most likely a user using json array syntax
in our SQL arrays. That seems eminently more likely than someone
mis-typing this niche incantation of building an array literal nowadays.

Yeah, that's a good point.

I'd add a hint if the first symbol is [ and we fail to get to the point of
actually seeing the equal sign or the first subsequent unquoted symbol is a
comma instead of a colon.

That seems closely related to my suggestion of applying strchr() to
see if the character we are expecting to see actually appears
anywhere. Or did you have something else in mind? Please be
specific, both about the test you are thinking of and how the
message should be worded.

regards, tom lane

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#6)
Re: array_in sub function ReadArrayDimensions error message

On Tue, Jul 9, 2024 at 8:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

I'd add a hint if the first symbol is [ and we fail to get to the point

of

actually seeing the equal sign or the first subsequent unquoted symbol

is a

comma instead of a colon.

That seems closely related to my suggestion of applying strchr() to
see if the character we are expecting to see actually appears
anywhere. Or did you have something else in mind? Please be
specific, both about the test you are thinking of and how the
message should be worded.

Pseudo-code, the syntax for adding a conditional errhint I didn't try to
learn along with figuring out the logic. Also not totally sure on
the ReadDimensionInt behavior interplay here.

In short, the ambiguous first dimension [n] case is cleared by seeing
either a second dimension or the equal sign separator. (ToDo: see how
end-of-string get resolved)

The [n:m] first dimension case clears as soon as the colon is seen.

The normal, undimensioned case clears once the first non-blank character is
not a [

If we error before clearing we assume the query author provided an SQL
array using json syntax and point out that the delimiters for SQL arrays
are {}.
We also assume, in the single bounds case, that a too-large upper-bound
value means they did intend to supply a single number in a json array
format. We would need to modify these tests so they occur after checking
whether the next part of the string is [ or = but, in the [ case, before
processing the next dimension.

diff --git a/src/backend/utils/adt/arrayfuncs.c
b/src/backend/utils/adt/arrayfuncs.c
index d6641b570d..0ac1eabba1 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -404,7 +404,7 @@ ReadArrayDimensions(char **srcptr, int *ndim_p, int
*dim, int *lBound,
 {
        char       *p = *srcptr;
        int                     ndim;
-
+    bool couldBeJson = true;
        /*
         * Dimension info takes the form of one or more [n] or [m:n]
items.  This
         * loop iterates once per dimension item.
@@ -422,8 +422,19 @@ ReadArrayDimensions(char **srcptr, int *ndim_p, int
*dim, int *lBound,
                 */
                while (scanner_isspace(*p))
                        p++;
+               if (*p == '=') {
+                       //if we have an equals sign we are not dealing with
a json array
+                       couldBeJson = false;
+                       break; // and we are at the end of the bounds
specification for the SQL array literal we do have
+               }
                if (*p != '[')
-                       break;                          /* no more
dimension items */
+               {
+                       couldBeJson = false; // json arrays will start with
[
+                       break;
+                       // on subsequent passes we better have either this
or an equal sign and the later is covered above
+               }                       /* no (more?) dimension items */
+               if (ndim > 0)
+                       couldBeJson = false; //multi-dimensional arrays
specs are invalid json arrays
                p++;
                if (ndim >= MAXDIM)
                        ereturn(escontext, false,
@@ -438,11 +449,18 @@ ReadArrayDimensions(char **srcptr, int *ndim_p, int
*dim, int *lBound,
                        ereturn(escontext, false,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                         errmsg("malformed array literal:
\"%s\"", origStr),
-                                        errdetail("\"[\" must introduce
explicitly-specified array dimensions.")));
+                                        errdetail("\"[\" must introduce
explicitly-specified array dimensions.")),
+                                        //if it isn't a digit and we might
still have a json array its a good bet it is one
+                                        //with non-numeric content
+                                        couldBeJson ? errhint("SQL array
vaLues are delimited by {}" : null));
                if (*p == ':')
                {
                        /* [m:n] format */
+                       //if we have numbers as the first entry, the
presence of a colon,
+                       //which is not a valid json separator, in a number
literal or an array,
+                       //means we have a bounds specification in an SQL
array
+                       couldBeJson = false;
                        lBound[ndim] = i;
                        p++;
                        q = p;
@@ -454,18 +472,22 @@ ReadArrayDimensions(char **srcptr, int *ndim_p, int
*dim, int *lBound,
                                                 errmsg("malformed array
literal: \"%s\"", origStr),
                                                 errdetail("Missing array
dimension value.")));
                }
-               else
+               else if (*p == ']')
                {
                        /* [n] format */
+                       //single digit doesn't disprove json with single
number element
                        lBound[ndim] = 1;
                        ub = i;
                }
-               if (*p != ']')
+               else
+               {
                        ereturn(escontext, false,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                         errmsg("malformed array literal:
\"%s\"", origStr),
                                         errdetail("Missing \"%s\" after
array dimensions.",
-                                                          "]")));
+                                                          "]")),
+                                       couldBeJson ? errhint("SQL array
values are delimited by {}" : null));
+               }
                p++;
                /*
@@ -475,29 +497,37 @@ ReadArrayDimensions(char **srcptr, int *ndim_p, int
*dim, int *lBound,
                 * would be equivalent.  Given the lack of field demand,
there seems
                 * little point in allowing such cases.
                 */
+               //not possible in the single bound case so cannot be json
                if (ub < lBound[ndim])
                        ereturn(escontext, false,

(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
errmsg("upper bound cannot be less
than lower bound")));

/* Upper bound of INT_MAX must be disallowed, cf
ArrayCheckBounds() */
+ // the singular json number may very well be larger than an
integer...
if (ub == INT_MAX)
ereturn(escontext, false,

(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                                        errmsg("array upper bound is too
large: %d", ub)));
+                                        errmsg("array upper bound is too
large: %d", ub),
+                                        couldBeJson ? errhint("SQL array
values are delimited by {}" : null)));

/* Compute "ub - lBound[ndim] + 1", detecting overflow */
+ //a whatever this threshold is a random number passed in a
json array likely will exceed it
if (pg_sub_s32_overflow(ub, lBound[ndim], &ub) ||
pg_add_s32_overflow(ub, 1, &ub))
ereturn(escontext, false,

(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                                         errmsg("array size exceeds the
maximum allowed (%d)",
-                                                       (int)
MaxArraySize)));
+                                                       (int) MaxArraySize,
+                                       couldBeJson ? errhint("SQL array
values are delimited by {}" : null))));

dim[ndim] = ub;
ndim++;
}

+    if (couldBeJson == true)
+           assert('not reachable, need to disprove json array literal
prior to returning');
+
        *srcptr = p;
        *ndim_p = ndim;
        return true;

David J.

#8jian he
jian.universality@gmail.com
In reply to: Tom Lane (#4)
Re: array_in sub function ReadArrayDimensions error message

On Tue, Jul 9, 2024 at 11:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

jian he <jian.universality@gmail.com> writes:

On Mon, Jul 8, 2024 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

One possibility could be
...
that is, only say "Missing "]"" if there's no ']' anywhere, and
otherwise just say the dimensions are wrong. This could be fooled
by a ']' that's part of some string in the data, but even then the
errdetail isn't wrong.

Or we could just say "Array dimensions have invalid syntax."
unconditionally.

please check attached.

Meh. This just replaces one possibly-off-point error message
with a different one that will be off-point in a different set
of cases. I think we should simply reduce the specificity of
the message.

After playing around, I agree with you that reducing the specificity of
the message here is better.

BTW, I think we have pretty much the same issue with respect
to the check for "=" later on. For instance, if someone
is confused about whether to insert commas:

=# select '[1:1],[1:2]={{3,4}}'::int[];
ERROR: malformed array literal: "[1:1],[1:2]={{3,4}}"
LINE 1: select '[1:1],[1:2]={{3,4}}'::int[];
^
DETAIL: Missing "=" after array dimensions.

Here again, the problem is not a missing "=", it's invalid
syntax somewhere before that.

Another thing we could consider doing here (and similarly
for your original case) is

DETAIL: Expected "=" not "," after array dimensions.

The advantage of this is that it provides a little more
clarity as to exactly where things went wrong.

this looks good to me.

I also found other issues, like.
select '[-2:2147483644]={3}'::int[];
ERROR: malformed array literal: "[-2:2147483644]={3}"
LINE 1: select '[-2:2147483644]={3}'::int[];
^
DETAIL: Specified array dimensions do not match array contents.

select '[-2:2147483645]={3}'::int[];
ERROR: array size exceeds the maximum allowed (134217727)
LINE 1: select '[-2:2147483645]={3}'::int[];
^

Both cases exceed the maximum allowed (134217727), but error is different.
also
"ERROR: array size exceeds the maximum allowed (134217727)"
is weird. we are still validating the dimension info function, we
don't even know the actual size of the array.
maybe
"ERROR: array dimensions specified array size exceeds the maximum
allowed (134217727)"

This customized dimension info is kind of obscure, even the
documentation of it is buried
in the third paragraph of
https://www.postgresql.org/docs/current/arrays.html#ARRAYS-IO

to address David G. Johnston concern,
in ReadArrayDimensions, some error message can unconditionally mention
errhint like:
errhint("array dimension template is \"[lower_bound:upper_bound]\", we
can optional have 6 of this"

#9David G. Johnston
david.g.johnston@gmail.com
In reply to: jian he (#8)
Re: array_in sub function ReadArrayDimensions error message

On Tuesday, July 9, 2024, jian he <jian.universality@gmail.com> wrote:

I also found other issues, like.
select '[-2:2147483644]={3}'::int[];
ERROR: malformed array literal: "[-2:2147483644]={3}"
LINE 1: select '[-2:2147483644]={3}'::int[];
^
DETAIL: Specified array dimensions do not match array contents.

select '[-2:2147483645]={3}'::int[];
ERROR: array size exceeds the maximum allowed (134217727)
LINE 1: select '[-2:2147483645]={3}'::int[];
^

Both cases exceed the maximum allowed (134217727), but error is different.
also

They do not both exceed (i.e., strictly greater than) 134217727 - the first
case equals it which is why it gets past the dimension specification parser.

"ERROR: array size exceeds the maximum allowed (134217727)"

is weird. we are still validating the dimension info function, we
don't even know the actual size of the array.

maybe

"ERROR: array dimensions specified array size exceeds the maximum
allowed (134217727)"

How about:

“array dimension %d declared size exceeds the maximum allowed %d”, ndim,
(int) MaxArraySize

But while that message fits the code aren’t we supposed to be checking,
after processing all dimensions, whether the combined number of cells is
greater than MaxArraySize? Obviously if any one dimension is the whole
thing will be, so this specific check and error is still useful.

to address David G. Johnston concern,

in ReadArrayDimensions, some error message can unconditionally mention
errhint like:
errhint("array dimension template is \"[lower_bound:upper_bound]\", we
can optional have 6 of this"

I suppose, but if they are writing json array syntax that hint is going to
mean little to them. Pointing out that their use of [] brackets is wrong
and that {} should be used seems more helpful. The extent we need to
consider people writing literal dimensions to set their lower bound to
something other than one seems close to none and not needing a hint, IMO.

That said, it isn’t making it back to us if our users are actually having
this confusion and would benefit meaningfully from such a hint.

David J.

#10David G. Johnston
david.g.johnston@gmail.com
In reply to: David G. Johnston (#9)
Re: array_in sub function ReadArrayDimensions error message

On Tue, Jul 9, 2024 at 10:31 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

That said, it isn’t making it back to us if our users are actually having
this confusion and would benefit meaningfully from such a hint.

None of us ever put forth an actual patch for this so it seems to have been
forgotten about. It seemed we agree some change is needed here, just not
exactly what.

I was thinking of adding a tracking entry in Drafts, even without the
patch, but then realized I didn't start this thread. So I decided to jump
ping it instead.

David J.