Allow to_date() and to_timestamp() to accept localized names

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Juan José Santamaría Flecha (#2)

Re: Allow to_date() and to_timestamp() to accept localized names

On 2019-Aug-22, Juan Josï¿½ Santamarï¿½a Flecha wrote:

On Sun, Aug 18, 2019 at 10:42 AM Juan Josï¿½ Santamarï¿½a Flecha
<juanjo.santamaria@gmail.com> wrote:

Going through the current items in the wiki's todo list, I have been
looking into: "Allow to_date () and to_timestamp () to accept
localized month names".

I have gone through a second take on this, trying to give it a better
shape but it would surely benefit from some review, so I will open an
item in the commitfest.

I'm confused why we acquire the MONTH_DIM / etc definitions. Can't we
just use lengthof() of the corresponding array? AFAICS it should work
just as well.

I wonder if the "compare first char" thing (seq_search_localized) really
works when there are multibyte chars in the day/month names. I think
the code compares just the first char ... but what if the original
string uses those funny Unicode non-normalized letters and the locale
translation uses normalized letters? My guess is that the first-char
comparison will fail, but surely you'll want the name to match.
(There's no month/day name in Spanish that doesn't start with an ASCII
letter, but I bet there are some in other languages.) I think the
localized comparison should avoid the first-char optimization, just
compare the whole string all the time, and avoid possible weird issues.

But then, I'm not 100% sure of this code. formatting.c is madness
distilled.

Thanks,

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

juanjo.santamaria@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#4)

Re: Allow to_date() and to_timestamp() to accept localized names

On Fri, Sep 13, 2019 at 10:31 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Thanks for taking a look at this.

I'm confused why we acquire the MONTH_DIM / etc definitions. Can't we
just use lengthof() of the corresponding array? AFAICS it should work
just as well.

It was because of the length difference between ascii-name arrays,
which were all null-ended, and localized-name arrays. The attached
version uses lengthof().

I wonder if the "compare first char" thing (seq_search_localized) really
works when there are multibyte chars in the day/month names. I think
the code compares just the first char ... but what if the original
string uses those funny Unicode non-normalized letters and the locale
translation uses normalized letters? My guess is that the first-char
comparison will fail, but surely you'll want the name to match.
(There's no month/day name in Spanish that doesn't start with an ASCII
letter, but I bet there are some in other languages.) I think the
localized comparison should avoid the first-char optimization, just
compare the whole string all the time, and avoid possible weird issues.

The localized search is reformulated in this version to deal with
multibyte normalization. A regression test for this issue is included.

Regards,

Juan José Santamaría Flecha

juanjo.santamaria@gmail.com

over 6 years ago

In reply to: Juan José Santamaría Flecha (#5)

Re: Allow to_date() and to_timestamp() to accept localized names

On Wed, Sep 18, 2019 at 11:09 AM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:

On Fri, Sep 13, 2019 at 10:31 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Thanks for taking a look at this.

I'm confused why we acquire the MONTH_DIM / etc definitions. Can't we
just use lengthof() of the corresponding array? AFAICS it should work
just as well.

It was because of the length difference between ascii-name arrays,
which were all null-ended, and localized-name arrays. The attached
version uses lengthof().

I wonder if the "compare first char" thing (seq_search_localized) really
works when there are multibyte chars in the day/month names. I think
the code compares just the first char ... but what if the original
string uses those funny Unicode non-normalized letters and the locale
translation uses normalized letters? My guess is that the first-char
comparison will fail, but surely you'll want the name to match.
(There's no month/day name in Spanish that doesn't start with an ASCII
letter, but I bet there are some in other languages.) I think the
localized comparison should avoid the first-char optimization, just
compare the whole string all the time, and avoid possible weird issues.

The localized search is reformulated in this version to deal with
multibyte normalization. A regression test for this issue is included.

This version is updated to optimize the need for dynamic allocation.

Show quoted text

Regards,

Juan José Santamaría Flecha

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Juan José Santamaría Flecha (#6)

Re: Allow to_date() and to_timestamp() to accept localized names

This patch no longer applies. Can you please rebase?

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

juanjo.santamaria@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#7)

Re: Allow to_date() and to_timestamp() to accept localized names

On Wed, Sep 25, 2019 at 9:57 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

This patch no longer applies. Can you please rebase?

Thank you for the notification.

The patch rot after commit 1a950f3, a rebased version is attached.

Regards,

Juan José Santamaría Flecha

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Juan José Santamaría Flecha (#8)

Re: Allow to_date() and to_timestamp() to accept localized names

On Thu, Sep 26, 2019 at 08:36:25PM +0200, Juan Josï¿½ Santamarï¿½a Flecha wrote:

On Wed, Sep 25, 2019 at 9:57 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

This patch no longer applies. Can you please rebase?

Thank you for the notification.

The patch rot after commit 1a950f3, a rebased version is attached.

Thanks. I did a quick review of this patch, and I think it's almost RFC.
I only found a couple of minor issue:

- In func.sgml, it seems we've lost this bit:

<para>
<literal>TM</literal> does not include trailing blanks.
<function>to_timestamp</function> and <function>to_date</function> ignore
the <literal>TM</literal> modifier.
</para>

Does that mean the function no longer ignore the TM modifier? That
would be somewhat problematic (i.e. it might break some user code).
But looking at the code I don't see why the patch would have this
effect, so I suppose it's merely a doc bug.

- I don't think we need to include examples how to_timestmap ignores
case, I'd say just stating the fact is clear enough. But if we want to
have examples, I think we should not inline in the para but use the
established pattern:

<para>
Some examples:
<programlisting>
...
</programlisting>
</para>

which is used elsewhere in the func.sgml file.

- In formatting.c the "common/unicode_norm.h" should be right after
includes from "catalog/" to follow the alphabetic order (unless
there's a reason why that does not work).

- I rather dislike the "dim" parameter name, because I immediately think
"dimension" which makes little sense. I suggest renaming to "nitems"
or "nelements" or something like that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10

[1]: https://www.postgresql.org/docs/current/functions-formatting.html

juanjo.santamaria@gmail.com

over 6 years ago

In reply to: Tomas Vondra (#9)

Re: Allow to_date() and to_timestamp() to accept localized names

On Sat, Jan 11, 2020 at 5:06 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

Thanks. I did a quick review of this patch, and I think it's almost RFC.

Thanks for reviewing.

- In func.sgml, it seems we've lost this bit:

<para>
<literal>TM</literal> does not include trailing blanks.
<function>to_timestamp</function> and <function>to_date</function>
ignore
the <literal>TM</literal> modifier.
</para>

Does that mean the function no longer ignore the TM modifier? That
would be somewhat problematic (i.e. it might break some user code).
But looking at the code I don't see why the patch would have this
effect, so I suppose it's merely a doc bug.

It is intentional. This patch uses the TM modifier to identify the usage of
localized names as input for to_timestamp() and to_date().

- I don't think we need to include examples how to_timestmap ignores
case, I'd say just stating the fact is clear enough. But if we want to
have examples, I think we should not inline in the para but use the
established pattern:

<para>
Some examples:
<programlisting>
...
</programlisting>
</para>

which is used elsewhere in the func.sgml file.

I was trying to match the style surrounding the usage notes for date/time
formatting [1]https://www.postgresql.org/docs/current/functions-formatting.html. Agreed that it is not worth an example on its own, so
dropped.

- In formatting.c the "common/unicode_norm.h" should be right after
includes from "catalog/" to follow the alphabetic order (unless
there's a reason why that does not work).

Fixed.

- I rather dislike the "dim" parameter name, because I immediately think
"dimension" which makes little sense. I suggest renaming to "nitems"
or "nelements" or something like that.

Agreed, using "nelements" as a better style matchup.

Please, find attached a version addressing the above mentioned.

Regards,

Juan José Santamaría Flecha

Show quoted text

#11

Artur Zakirov

zaartur@gmail.com

over 6 years ago

In reply to: Juan José Santamaría Flecha (#10)

Re: Allow to_date() and to_timestamp() to accept localized names

Hello!

On 2020/01/13 21:04, Juan José Santamaría Flecha wrote:

Please, find attached a version addressing the above mentioned.

I have some couple picky notes.

+ if (name_len != norm_len)
+ pfree(norm_name);

I'm not sure here. Is it save to assume that if it was allocated a new
norm_name name_len and norm_len always will differ?

static const char *const months_full[] = {
"January", "February", "March", "April", "May", "June", "July",
-	"August", "September", "October", "November", "December", NULL
+	"August", "September", "October", "November", "December"
};

Unfortunately I didn't see from the patch why it was necessary to remove
last null element from months_full, days_short, adbc_strings,
adbc_strings_long and other arrays. I think it is not good because
unnecessarily increases the patch and adds code like the following:

+				from_char_seq_search(&value, &s, months, localized_names,
+									 ONE_UPPER, MAX_MON_LEN, n, have_error,
+									 lengthof(months_full));

Here it passes "months" from datetime.c to from_char_seq_search() and
calculates its length using "months_full" array from formatting.c.

--
Arthur

#12

juanjo.santamaria@gmail.com

over 6 years ago

In reply to: Artur Zakirov (#11)

Re: Allow to_date() and to_timestamp() to accept localized names

On Wed, Jan 15, 2020 at 11:20 AM Arthur Zakirov <zaartur@gmail.com> wrote:

I have some couple picky notes.

Thanks for the review.

+ if (name_len != norm_len)
+ pfree(norm_name);

I'm not sure here. Is it save to assume that if it was allocated a new
norm_name name_len and norm_len always will differ?

Good call, that check can be more robust.

static const char *const months_full[] = {
"January", "February", "March", "April", "May", "June", "July",
-     "August", "September", "October", "November", "December", NULL
+     "August", "September", "October", "November", "December"
};
Unfortunately I didn't see from the patch why it was necessary to remove
last null element from months_full, days_short, adbc_strings,
adbc_strings_long and other arrays. I think it is not good because
unnecessarily increases the patch and adds code like the following:

+ from_char_seq_search(&value, &s, months,

localized_names,

+

ONE_UPPER, MAX_MON_LEN, n, have_error,

+

lengthof(months_full));

Here it passes "months" from datetime.c to from_char_seq_search() and
calculates its length using "months_full" array from formatting.c.

With the introduction of seq_search_localized() that can be avoided,
minimizing code churn.

Please, find attached a version addressing the above mentioned.

Regards,

Juan José Santamaría Flecha

Show quoted text

#13

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Juan José Santamaría Flecha (#12)

Re: Allow to_date() and to_timestamp() to accept localized names

=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= <juanjo.santamaria@gmail.com> writes:

[ 0001-Allow-localized-month-names-to_date-v6.patch ]

I took a quick look through this.

One thing I completely don't understand is why it's sufficient for
seq_search_localized to do "initcap" semantics. Shouldn't it cover
the all-upper and all-lower cases as well, as the existing seq_search
code does? (That is, why is it okay to ignore the "type" argument?)

On the other hand, you probably *should* be ignoring the "max"
argument, because AFAICS the values that are passed for that are
specific to the English ASCII spellings of the day and month names.
It seems very likely that they could be too small for some sets of
non-English names.

Related to that, the business with

+	mb_max = max * pg_encoding_max_length(encoding);
+	name_len = strlen(name);
+	name_len = name_len < mb_max ? name_len : mb_max;

seems wrong even if we assume that "max" is a sane limit on the number of
characters to consider. This coding is likely to truncate a long "name"
string in the middle of a multibyte character, leading to bad-encoding
errors from later functions.

I also am confused by the "if (mb_max > max ...)" bit. It looks to me
like that's an obscure way of writing "if (pg_encoding_max_length() > 1)",
which is a rather pointless check considering that the if() then goes on
to insist on encoding == PG_UTF8.

A bit further down, you have an "if (name_wlen > norm_wlen)"
condition that seems pretty fishy. Is it really guaranteed that
unicode_normalize_kc cannot transform the string without shortening it?
I don't see that specified in its header comment, for sure.

Also, it's purely accidental if this:

norm_name = (char *) palloc((norm_wlen + 1) * sizeof(pg_wchar));

allocates a long enough string for the conversion back to multibyte form,
because that output is not pg_wchars. I think you want something more
like "norm_wlen * MAX_MULTIBYTE_CHAR_LEN + 1". (I wonder whether we
need to be worrying about integer overflow in any of this.)

It seems like you could eliminate a lot of messiness by extending
localized_abbrev_days[] and related arrays by one element that's
always NULL. That would waste, hmm, four 8-byte pointers on typical
machines --- but you're eating a lot more than 32 bytes of code to
pass around the array lengths, plus it's really ugly that the plain
and localized arrays are handled differently.

I don't think the way you're managing the "localized_names" variable
in DCH_from_char is very sensible. The reader has to do a lot of
reverse engineering just to discover that the value is constant NULL
in most references, something that you've not helped by declaring
it outside the loop rather than inside. I think you could drop
the variable and write constant NULL in most places, with something
like
S_TM(n->suffix) ? localized_full_days : NULL
in those from_char_seq_search calls where it's relevant.

In general I'm displeased with the lack of attention to comments.
Notably, you added arguments to from_char_seq_search without updating
its header comment ... not that that comment is a great API spec, but
at the least you shouldn't be making it worse. I think the bug I
complained of above is directly traceable to the lack of a clear spec
here for what "max" means, so failure to keep these comments up to
speed does have demonstrable bad consequences.

regards, tom lane

#14

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Tom Lane (#13)

Re: Allow to_date() and to_timestamp() to accept localized names

I wrote:

One thing I completely don't understand is why it's sufficient for
seq_search_localized to do "initcap" semantics. Shouldn't it cover
the all-upper and all-lower cases as well, as the existing seq_search
code does? (That is, why is it okay to ignore the "type" argument?)

I took a closer look at this and decided you were probably doing that
just because the seq_search code uses initcap-style case folding to
match month and day names, relying on the assumption that the constant
arrays it's passed are cased that way. But we shouldn't (and the patch
doesn't) assume that the localized names we'll get from pg_locale.c are
cased that way.

However ... it seems to me that the way seq_search is defined is
plenty bogus. In the first place, it scribbles on its source string,
which is surely not okay. You can see that in error messages that
are thrown on match failures:

regression=# select to_date('3 JANUZRY 1999', 'DD MONTH YYYY');
ERROR: invalid value "JanuzRY 1" for "MONTH"
DETAIL: The given value did not match any of the allowed values for this field.

Now, maybe somebody out there thinks this is a cute way of highlighting
how much of the string we examined, but it looks more like a bug from
where I sit. It's mere luck that what's being clobbered is a local
string copy created by do_to_timestamp(), and not the original input
data passed to to_date().

In the second place, there's really no need at all for seq_search to
rely on any assumptions about the case state of the array it's
searching. pg_toupper() is pretty cheap already, and we can make it
guaranteed cheap if we use pg_ascii_toupper() instead. So I think
we ought to just remove the "type" argument altogether and have
seq_search dynamically convert all the strings it examines to upper
case (or lower case would work as well, at least for English).

On the other hand, you probably *should* be ignoring the "max"
argument, because AFAICS the values that are passed for that are
specific to the English ASCII spellings of the day and month names.
It seems very likely that they could be too small for some sets of
non-English names.

Closer examination shows that the "max" argument is pretty bogus as
well. It doesn't do anything except confuse the reader, because there
are no cases where the value passed is less than the maximum array entry
length, so it never acts to change seq_search's behavior. So we should
just drop that behavior from seq_search, too, and redefine "max" as
having no purpose except to specify how much of the string to show in
error messages. There's still a question of what that should be for
non-English cases, but at least we now have a clear idea of what we
need the value to do.

I also noted while I was looking at it that from_char_seq_search()
would truncate at "max" bytes even when dealing with multibyte input.
That has a clear risk of generating an invalidly-encoded error message.

The 0001 patch attached cleans up all the above complaints. I felt
that given the business about scribbling on strings we shouldn't,
it would also be wise to const-ify the string pointers if possible.
That seems not too painful, per 0002 attached.

I'm tempted to go a bit further than 0001 does, and remove the 'max'
argument from from_char_seq_search() altogether. Since we only need
'max' in error cases, which don't need to be super-quick, we could drop
the requirement for the caller to specify that and instead compute it
when we do need it as the largest of the constant array's string
lengths. That'd carry over into the localized-names case in a
reasonably straightforward way (though we might need to count characters
not bytes for that to work nicely).

Because of the risk of incorrectly-encoded error messages, I'm rather
tempted to claim that these patches (or at least 0001) are a bug fix
and should be back-patched. In any case I think we should apply this
to HEAD and then rebase the localized-names patch over it. It makes
a whole lot more sense, IMO, for the localized-names comparison logic
to match what this is doing than what seq_search does today.

Comments?

regards, tom lane

#15

Artur Zakirov

zaartur@gmail.com

over 6 years ago

In reply to: Tom Lane (#14)

Re: Allow to_date() and to_timestamp() to accept localized names

On 2020/01/23 7:11, Tom Lane wrote:

Closer examination shows that the "max" argument is pretty bogus as
well. It doesn't do anything except confuse the reader, because there
are no cases where the value passed is less than the maximum array entry
length, so it never acts to change seq_search's behavior. So we should
just drop that behavior from seq_search, too, and redefine "max" as
having no purpose except to specify how much of the string to show in
error messages. There's still a question of what that should be for
non-English cases, but at least we now have a clear idea of what we
need the value to do.

Shouldn't we just show all remaining string instead of truncating it?
For example I get the following output:

=# select to_timestamp('3 ЯНВАРЯ 1999', 'DD MONTH YYYY');
ERROR: invalid value "ЯНВА" for "MONTH"
DETAIL: The given value did not match any of the allowed values for
this field.

This behavior is reproduced without the patch though (on Postgres 12).

And the English case might confuse too I think:

=# select to_date('3 JANUZRY 1999', 'DD MONTH YYYY');
ERROR: invalid value "JANUZRY 1" for "MONTH"
DETAIL: The given value did not match any of the allowed values for
this field.

Users might think what means "1" in "JANUZRY 1" string.

I attached the draft patch, which shows all remaining string. So the
query above will show the following:

=# select to_timestamp('3 ЯНВАРЯ 1999', 'DD MONTH YYYY');
ERROR: invalid value "ЯНВАРЯ 1999" for "MONTH"
DETAIL: The remaining value did not match any of the allowed values for
this field.

The 0001 patch attached cleans up all the above complaints. I felt
that given the business about scribbling on strings we shouldn't,
it would also be wise to const-ify the string pointers if possible.
That seems not too painful, per 0002 attached.

+1 on the patch.

--
Arthur

#16

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Artur Zakirov (#15)

Re: Allow to_date() and to_timestamp() to accept localized names

Arthur Zakirov <zaartur@gmail.com> writes:

On 2020/01/23 7:11, Tom Lane wrote:

Closer examination shows that the "max" argument is pretty bogus as
well. It doesn't do anything except confuse the reader, because there
are no cases where the value passed is less than the maximum array entry
length, so it never acts to change seq_search's behavior. So we should
just drop that behavior from seq_search, too, and redefine "max" as
having no purpose except to specify how much of the string to show in
error messages. There's still a question of what that should be for
non-English cases, but at least we now have a clear idea of what we
need the value to do.

Shouldn't we just show all remaining string instead of truncating it?

That would avoid a bunch of arbitrary decisions, for sure.
Anybody have an objection?

regards, tom lane

#17

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Tom Lane (#16)

Re: Allow to_date() and to_timestamp() to accept localized names

On 2020-Jan-22, Tom Lane wrote:

Arthur Zakirov <zaartur@gmail.com> writes:

On 2020/01/23 7:11, Tom Lane wrote:

Closer examination shows that the "max" argument is pretty bogus as
well. It doesn't do anything except confuse the reader, because there
are no cases where the value passed is less than the maximum array entry
length, so it never acts to change seq_search's behavior. So we should
just drop that behavior from seq_search, too, and redefine "max" as
having no purpose except to specify how much of the string to show in
error messages. There's still a question of what that should be for
non-English cases, but at least we now have a clear idea of what we
need the value to do.

Shouldn't we just show all remaining string instead of truncating it?

That would avoid a bunch of arbitrary decisions, for sure.
Anybody have an objection?

I think it would be clearer to search for whitespace starting at the
start of the bogus token and stop there. It might not be perfect,
particularly if any language has whitespace in a month etc name (I don't
know any example but I guess it's not impossible for it to exist;
Portuguese weekday names maybe?), but it seems better than either of the
behaviors shown above.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#18

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Alvaro Herrera (#17)

Re: Allow to_date() and to_timestamp() to accept localized names

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2020-Jan-22, Tom Lane wrote:

Arthur Zakirov <zaartur@gmail.com> writes:

Shouldn't we just show all remaining string instead of truncating it?

That would avoid a bunch of arbitrary decisions, for sure.
Anybody have an objection?

I think it would be clearer to search for whitespace starting at the
start of the bogus token and stop there. It might not be perfect,
particularly if any language has whitespace in a month etc name (I don't
know any example but I guess it's not impossible for it to exist;
Portuguese weekday names maybe?), but it seems better than either of the
behaviors shown above.

I'm okay with that. It won't work so well for cases like
"1-Januzry-1999", but it's still better than what happens now:

regression=# select to_date('1-Januzry-1999', 'DD MONTH YYYY');
ERROR: invalid value "Januzry-1" for "MONTH"

That particular case could be improved by stopping at a dash ... but
since this code is also used to match strings like "A.M.", we can't
just exclude punctuation in general. Breaking at whitespace seems
like a reasonable compromise.

regards, tom lane

#19

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Tom Lane (#18)

Re: Allow to_date() and to_timestamp() to accept localized names

On 2020-Jan-23, Tom Lane wrote:

That particular case could be improved by stopping at a dash ... but
since this code is also used to match strings like "A.M.", we can't
just exclude punctuation in general. Breaking at whitespace seems
like a reasonable compromise.

Yeah, and there are cases where dashes are used in names -- browsing
through glibc for example I quickly found Akan, for which the month names are:

mon "Sanda-<U0186>p<U025B>p<U0254>n";/
"Kwakwar-<U0186>gyefuo";/
"Eb<U0254>w-<U0186>benem";/

and so on. Even whitespace is problematic for some languages, such as Afan,

mon "Qunxa Garablu";/
"Naharsi Kudo";/
"Ciggilta Kudo";/
(etc) but I think whitespace-splitting is going to be more comprehensible
in the vast majority of cases, even if not perfect.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Alvaro Herrera (#19)

Re: Allow to_date() and to_timestamp() to accept localized names

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Even whitespace is problematic for some languages, such as Afan,

mon "Qunxa Garablu";/
"Naharsi Kudo";/
"Ciggilta Kudo";/

(etc) but I think whitespace-splitting is going to be more comprehensible
in the vast majority of cases, even if not perfect.

Interesting. Given that to_date et al are often willing to ignore
whitespace in input, I wonder whether we won't have some other issues
with names like that --- not so much that basic cases wouldn't work,
as that people might reasonably expect that, say, we'd be flexible
about the amount of whitespace. Seems like a problem for another day
though.

In the meantime, I agree that "truncate at whitespace" is a better
heuristic for the error messages than what we've got. I'll go make it so.

regards, tom lane

#21

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Alvaro Herrera (#19)

#22

juanjo.santamaria@gmail.com

over 6 years ago

In reply to: Tom Lane (#21)

#23

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Juan José Santamaría Flecha (#22)

#24

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Juan José Santamaría Flecha (#22)

#25

peter_e@gmx.net

over 6 years ago

In reply to: Alvaro Herrera (#23)

#26

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Peter Eisentraut (#25)

#27

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Alvaro Herrera (#26)

#28

peter_e@gmx.net

over 6 years ago

In reply to: Tom Lane (#27)

#29

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Peter Eisentraut (#28)

#30

juanjo.santamaria@gmail.com

over 6 years ago

In reply to: Tom Lane (#29)

#31

peter_e@gmx.net

over 6 years ago

In reply to: Juan José Santamaría Flecha (#30)

#32

peter_e@gmx.net

over 6 years ago

In reply to: Peter Eisentraut (#31)

#33

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Peter Eisentraut (#32)

#34

mark.dilger@enterprisedb.com

over 6 years ago

In reply to: Tom Lane (#33)

#35

peter_e@gmx.net

over 6 years ago

In reply to: Mark Dilger (#34)

#36

peter_e@gmx.net

over 6 years ago

In reply to: Tom Lane (#33)

#37

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Peter Eisentraut (#35)

#38

mark.dilger@enterprisedb.com

over 6 years ago

In reply to: Tom Lane (#37)

#39

juanjo.santamaria@gmail.com

over 6 years ago

In reply to: Mark Dilger (#38)

#40

mark.dilger@enterprisedb.com

over 6 years ago

In reply to: Juan José Santamaría Flecha (#39)

#41

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Peter Eisentraut (#35)

#42

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Mark Dilger (#38)

#43

mark.dilger@enterprisedb.com

over 6 years ago

In reply to: Tom Lane (#42)

#44

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Mark Dilger (#43)

#45

peter_e@gmx.net

over 6 years ago

In reply to: Juan José Santamaría Flecha (#39)

#46

juanjo.santamaria@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#41)

#47

juanjo.santamaria@gmail.com

over 6 years ago

In reply to: Peter Eisentraut (#45)

#48

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Juan José Santamaría Flecha (#47)

#49

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Tom Lane (#48)

#50

juanjo.santamaria@gmail.com

about 6 years ago

In reply to: Tom Lane (#49)

#51

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Juan José Santamaría Flecha (#50)

#52

jtc331@gmail.com

about 6 years ago

In reply to: Tom Lane (#51)

#53

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: James Coleman (#52)

#54

jtc331@gmail.com

about 6 years ago

In reply to: Tom Lane (#53)

#55

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: James Coleman (#54)

#56

juanjo.santamaria@gmail.com

about 6 years ago

In reply to: Tom Lane (#55)

#57

jtc331@gmail.com

about 6 years ago

In reply to: Tom Lane (#55)

#58

jtc331@gmail.com

about 6 years ago

In reply to: Juan José Santamaría Flecha (#56)

#59

jtc331@gmail.com

about 6 years ago

In reply to: James Coleman (#58)

#60

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: James Coleman (#59)

#61

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Tom Lane (#60)

#62

jtc331@gmail.com

about 6 years ago

In reply to: Tom Lane (#61)

#63

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: James Coleman (#62)

#64