Support tab completion for upper character inputs in psql
Hi Hackers,
When using psql, I found that there is no tab completion for upper-case input. This is sometimes really inconvenient, so I tried to fix the problem in the attached patch.
Here are some examples showing what the patch does.
Action:
1. connect the db using psql
2. input SQL command
3. press the TAB key (twice the very first time)
Results:
[master]
postgres=# set a
all allow_system_table_mods application_name array_nulls
postgres=# set A
postgres=# set A
[patched]
postgres=# set a
all allow_system_table_mods application_name array_nulls
postgres=# set A
ALL ALLOW_SYSTEM_TABLE_MODS APPLICATION_NAME ARRAY_NULLS
postgres=# set A
Please take a look at this patch. Any comments are welcome.
Regards,
Tang
Attachments:
0001-Support-tab-completion-for-upper-character-inputs-in.patch (application/octet-stream, +37 -8)
"Tang, Haiying" <tanghy.fnst@cn.fujitsu.com> writes:
When using psql, I found that there is no tab completion for upper-case input. This is sometimes really inconvenient, so I tried to fix the problem in the attached patch.
This looks like you're trying to force case-insensitive behavior
whether that is appropriate or not. Does not sound like a good
idea.
regards, tom lane
At Sun, 07 Feb 2021 13:55:00 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in
"Tang, Haiying" <tanghy.fnst@cn.fujitsu.com> writes:
When using psql, I found that there is no tab completion for upper-case input. This is sometimes really inconvenient, so I tried to fix the problem in the attached patch.
This looks like you're trying to force case-insensitive behavior
whether that is appropriate or not. Does not sound like a good
idea.
Agreed. However, I'm not sure exactly what the OP wants. \set behaves
in a different but similar way.
=# \set c[tab]
=# \set COMP_KEYWORD_CASE _
However set doesn't. If it is what is wanted, the following change on
Query_for_list_of_set_vars works (only for the case of SET/RESET
commands).
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5f0e775fd3..5c2a263785 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -725,7 +725,8 @@ static const SchemaQuery Query_for_list_of_statistics = {
" UNION ALL SELECT 'role' "\
" UNION ALL SELECT 'tablespace' "\
" UNION ALL SELECT 'all') ss "\
-" WHERE substring(name,1,%d)='%s'"
+" WHERE substring(name,1,%1$d)='%2$s' "\
+" OR pg_catalog.lower(substring(name,1,%1$d))=pg_catalog.lower('%2$s')"
#define Query_for_list_of_show_vars \
"SELECT name FROM "\
=# set AP[tab]
=# set application_name _
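The rewritten WHERE clause relies on positional conversion specifications (%1$d, %2$s) so that each argument can be referenced twice. Below is a minimal standalone sketch of that formatting behavior. It assumes a POSIX printf family (positional arguments are a POSIX extension, not ISO C), and build_filter is a hypothetical helper, not a function in tab-complete.c. It also assumes the prefix has already been escaped for use inside an SQL literal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not psql code: build a filter clause in which each
 * argument is referenced twice through POSIX positional specifiers.
 * Returns the number of characters snprintf wanted to write.
 */
static int
build_filter(char *buf, size_t buflen, const char *prefix)
{
    int         len;

    assert(buf != NULL && prefix != NULL);
    len = (int) strlen(prefix);

    /* %1$d is always argument 1 (len); %2$s is always argument 2 (prefix). */
    return snprintf(buf, buflen,
                    "substring(name,1,%1$d)='%2$s'"
                    " OR lower(substring(name,1,%1$d))=lower('%2$s')",
                    len, prefix);
}
```

Calling build_filter(buf, sizeof(buf), "AP") fills buf with a clause that compares both the raw and the lower-cased two-character prefix, mirroring the shape of the query in the diff.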
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Sun, 07 Feb 2021 13:55:00 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in
This looks like you're trying to force case-insensitive behavior
whether that is appropriate or not. Does not sound like a good idea.
Thanks for your reply.
I raised this issue because I thought all SQL commands should be case-insensitive.
The SET/RESET/SHOW commands work whether the configuration parameter is given in upper or lower case.
My modification is not good enough, but I really think it would be more convenient if tab completion supported upper-case input.
=# set APPLICATION_NAME to test;
SET
=# show APPLICATION_name;
application_name
------------------
test
(1 row)
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Sent: Monday, February 8, 2021 5:02 PM
However set doesn't. If it is what is wanted, the following change on Query_for_list_of_set_vars works (only for the case of SET/RESET commands).
Thanks for your update. I applied your patch, and it works well for the SET/RESET commands.
I made the same modification for the SHOW command. The new patch (V2) supports tab completion for upper-case input in psql for the SET/RESET/SHOW commands.
Regards,
Tang
Attachments:
V2-0001-Support-tab-completion-for-upper-character-inputs-in.patch (application/octet-stream, +4 -3)
At Sun, 07 Feb 2021 13:55:00 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in
This looks like you're trying to force case-insensitive behavior
whether that is appropriate or not. Does not sound like a good idea.
I'm still confused about the APPROPRIATE behavior of tab completion.
It seems ALTER table/tablespace <name> SET/RESET is already case-insensitive.
For example
# alter tablespace dbspace set(e[tab]
# alter tablespace dbspace set(effective_io_concurrency
# alter tablespace dbspace set(E[tab]
# alter tablespace dbspace set(EFFECTIVE_IO_CONCURRENCY
The above behavior is exactly the same as what the patch (attached to the following message) does for SET/RESET etc.
/messages/by-id/a63cbd45e3884cf9b3961c2a6a95dcb7@G08CNEXMBPEKD05.g08.fujitsu.local
If anyone could share some cases that show where forcing case-insensitive input in psql is inappropriate,
I'd be grateful.
Regards,
Tang
On 09.02.21 15:48, Tang, Haiying wrote:
I'm still confused about the APPROPRIATE behavior of tab completion.
It seems ALTER table/tablespace <name> SET/RESET is already case-insensitive.
For example
# alter tablespace dbspace set(e[tab]
# alter tablespace dbspace set(effective_io_concurrency
# alter tablespace dbspace set(E[tab]
# alter tablespace dbspace set(EFFECTIVE_IO_CONCURRENCY
This case completes with a hardcoded list, which is done
case-insensitively by default. The cases that complete with a query
result are not case insensitive right now. This affects things like
UPDATE T<tab>
as well. I think your first patch was basically right. But we need to
understand that this affects all completions with query results, not
just the one you wanted to fix. So you should analyze all the callers
and explain why the proposed change is appropriate.
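The difference between the two completion paths described above can be sketched as a prefix match that is either case sensitive or not. This is an illustration only: first_match is a hypothetical name, the candidate list is made up, and POSIX strncasecmp stands in for PostgreSQL's pg_strncasecmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>            /* strncasecmp (POSIX) */

/*
 * Hypothetical helper: return the first candidate whose prefix matches
 * "text", or NULL if none does.  "items" is a NULL-terminated array.
 */
static const char *
first_match(const char *const *items, const char *text, bool case_sensitive)
{
    size_t      len;

    assert(items != NULL && text != NULL);
    len = strlen(text);

    for (; *items != NULL; items++)
    {
        int         cmp = case_sensitive
            ? strncmp(text, *items, len)
            : strncasecmp(text, *items, len);

        if (cmp == 0)
            return *items;
    }
    return NULL;
}
```

With a list containing "effective_io_concurrency", the input "E" finds nothing when case_sensitive is true and finds the entry when it is false, which is the behavior gap between query-based and list-based completion that the message points out.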
On Tuesday, March 16, 2021 5:20 AM, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
The cases that complete with a query
result are not case insensitive right now. This affects things like
UPDATE T<tab>
as well. I think your first patch was basically right. But we need to
understand that this affects all completions with query results, not
just the one you wanted to fix. So you should analyze all the callers
and explain why the proposed change is appropriate.
Thanks for your review and suggestion. Please find attached patch V3, which is based on the first patch [1].
Difference from the first patch is:
Add tab completion support for all query results in psql.
complete_from_query
+complete_from_versioned_query
+complete_from_schema_query
+complete_from_versioned_schema_query
[1]: /messages/by-id/a63cbd45e3884cf9b3961c2a6a95dcb7@G08CNEXMBPEKD05.g08.fujitsu.local
The modification to support case-insensitive matching in "_complete_from_query" is based on "complete_from_const" and "complete_from_list".
Please let me know if you find anything insufficient.
Regards,
Tang
Attachments:
V3-0001-Support-tab-completion-with-a-query-result-for-upper.patch (application/octet-stream, +40 -8)
Hi Tang,
Thanks a lot for the patch.
I did a quick test based on the latest patch V3 on latest master branch
"commit 4753ef37e0eda4ba0af614022d18fcbc5a946cc9".
Case 1: before patch
1 postgres=# set a
2 all allow_system_table_mods
application_name array_nulls
3 postgres=# set A
4
5 postgres=# create TABLE tbl (data text);
6 CREATE TABLE
7 postgres=# update tbl SET DATA =
8
9 postgres=# update T
10
11 postgres=#
Case 2: after patch
1 postgres=# set a
2 all allow_system_table_mods
application_name array_nulls
3 postgres=# set A
4 ALL ALLOW_SYSTEM_TABLE_MODS
APPLICATION_NAME ARRAY_NULLS
5 postgres=# create TABLE tbl (data text);
6 CREATE TABLE
7
8 postgres=# update tbl SET DATA =
9
10 postgres=# update TBL SET
11
12 postgres=#
So, as you can see the difference is between line 8 and 10 in case 2. It
looks like the lowercase can auto complete more than the uppercase;
secondly, if you can add some test cases, it would be great.
Best regards,
David
--
David
Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
On Wednesday, March 31, 2021 4:05 AM, David Zhang <david.zhang@highgo.ca> wrote
8 postgres=# update tbl SET DATA =
9
10 postgres=# update TBL SET
11
12 postgres=#
So, as you can see the difference is between line 8 and 10 in case 2. It
looks like the lowercase can auto complete more than the uppercase;
secondly, if you can add some test cases, it would be great.
Thanks for your test. I fixed the bug and added some tests for it.
Please find attached the latest patch V4.
Differences from V3 are:
* fix an issue reported by Zhang [1] where a scenario was found in which tab completion with a query still did not work.
* add some TAP tests.
[1]: /messages/by-id/3140db2a-9808-c470-7e60-de39c431b3ab@highgo.ca
Regards,
Tang
Attachments:
V4-0001-Support-tab-completion-with-a-query-result-for-upper.patch (application/octet-stream, +52 -10)
On 01.04.21 11:40, tanghy.fnst@fujitsu.com wrote:
On Wednesday, March 31, 2021 4:05 AM, David Zhang <david.zhang@highgo.ca> wrote
8 postgres=# update tbl SET DATA =
9
10 postgres=# update TBL SET
11
12 postgres=#
So, as you can see the difference is between line 8 and 10 in case 2. It
looks like the lowercase can auto complete more than the uppercase;
secondly, if you can add some test cases, it would be great.
Thanks for your test. I fixed the bug and added some tests for it.
Please find attached the latest patch V4.
Differences from V3 are:
* fix an issue reported by Zhang [1] where a scenario was found in which tab completion with a query still did not work.
* add some TAP tests.
Seeing the tests you provided, it's pretty obvious that the current
behavior is insufficient. I think we could probably think of a few more
tests, for example exercising the "If case insensitive matching was
requested initially, adjust the case according to setting." case, or
something with quoted identifiers. I'll push this to the next commit
fest for now. I encourage you to keep working on it.
On Thursday, April 8, 2021 4:14 PM, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote
Seeing the tests you provided, it's pretty obvious that the current
behavior is insufficient. I think we could probably think of a few more
tests, for example exercising the "If case insensitive matching was
requested initially, adjust the case according to setting." case, or
something with quoted identifiers.
Thanks for your review and suggestions on my patch.
I've added more tests in the latest patch V5. The added tests helped me find some bugs in my patch, which I fixed.
The patch now supports not only SET/SHOW [PARAMETER] but also UPDATE ["aTable"|ATABLE] and UPDATE atable SET ["aColumn"|ACOLUMN].
I really hope someone can suggest more tests for my patch, or kindly test it and let me know if any bugs turn up.
Differences from V4 are:
* fix some bugs related to quoted identifiers.
* add some TAP tests.
Regards,
Tang
Attachments:
V5-0001-Support-tab-completion-with-a-query-result-for-upper.patch (application/octet-stream, +94 -8)
On Wed, Apr 14, 2021 at 11:34 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
On Thursday, April 8, 2021 4:14 PM, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote
Seeing the tests you provided, it's pretty obvious that the current
behavior is insufficient. I think we could probably think of a few more
tests, for example exercising the "If case insensitive matching was
requested initially, adjust the case according to setting." case, or
something with quoted identifiers.
Thanks for your review and suggestions on my patch.
I've added more tests in the latest patch V5. The added tests helped me find some bugs in my patch, which I fixed.
The patch now supports not only SET/SHOW [PARAMETER] but also UPDATE ["aTable"|ATABLE] and UPDATE atable SET ["aColumn"|ACOLUMN].
I really hope someone can suggest more tests for my patch, or kindly test it and let me know if any bugs turn up.
Differences from V4 are:
* fix some bugs related to quoted identifiers.
* add some TAP tests.
I tried playing a bit with your psql patch V5 and I did not find any
problems - it seemed to work as advertised.
Below are a few code review comments.
====
1. Patch applies with whitespace warnings.
[postgres@CentOS7-x64 oss_postgres_2PC]$ git apply
../patches_misc/V5-0001-Support-tab-completion-with-a-query-result-for-upper.patch
../patches_misc/V5-0001-Support-tab-completion-with-a-query-result-for-upper.patch:130:
trailing whitespace.
}
warning: 1 line adds whitespace errors.
====
2. Unrelated "code tidy" fixes maybe should be another patch?
I noticed there are a couple of "code tidy" fixes combined with this
patch - e.g. passing fixes to some code comments and blank lines etc
(see below). Although they are all good improvements, they maybe don't
really have anything to do with your feature/bugfix so I am not sure
if they should be included here. Maybe post a separate patch for these
ones?
@@ -1028,7 +1032,7 @@ static const VersionedQuery
Query_for_list_of_subscriptions[] = {
};
/*
- * This is a list of all "things" in Pgsql, which can show up after CREATE or
+ * This is a list of all "things" in pgsql, which can show up after CREATE or
* DROP; and there is also a query to get a list of them.
*/
@@ -4607,7 +4642,6 @@ complete_from_list(const char *text, int state)
if (completion_case_sensitive)
return pg_strdup(item);
else
-
/*
* If case insensitive matching was requested initially,
* adjust the case according to setting.
@@ -4660,7 +4694,6 @@ complete_from_const(const char *text, int state)
if (completion_case_sensitive)
return pg_strdup(completion_charp);
else
-
/*
* If case insensitive matching was requested initially, adjust
* the case according to setting.
====
3. Unnecessary NULL check?
@@ -4420,16 +4425,37 @@ _complete_from_query(const char *simple_query,
PQclear(result);
result = NULL;
- /* Set up suitably-escaped copies of textual inputs */
+ /* Set up suitably-escaped copies of textual inputs,
+ * then change the textual inputs to lower case.
+ */
e_text = escape_string(text);
+ if(e_text != NULL)
+ {
+ if(e_text[0] == '"')
+ completion_case_sensitive = true;
+ else
+ e_text = pg_string_tolower(e_text);
+ }
Perhaps that check "if(e_text != NULL)" is unnecessary. That function
hardly looks capable of returning a NULL, and other callers are not
checking the return like this.
====
4. Memory not freed in multiple places?
@@ -4420,16 +4425,37 @@ _complete_from_query(const char *simple_query,
PQclear(result);
result = NULL;
- /* Set up suitably-escaped copies of textual inputs */
+ /* Set up suitably-escaped copies of textual inputs,
+ * then change the textual inputs to lower case.
+ */
e_text = escape_string(text);
+ if(e_text != NULL)
+ {
+ if(e_text[0] == '"')
+ completion_case_sensitive = true;
+ else
+ e_text = pg_string_tolower(e_text);
+ }
if (completion_info_charp)
+ {
e_info_charp = escape_string(completion_info_charp);
+ if(e_info_charp[0] == '"')
+ completion_case_sensitive = true;
+ else
+ e_info_charp = pg_string_tolower(e_info_charp);
+ }
else
e_info_charp = NULL;
if (completion_info_charp2)
+ {
e_info_charp2 = escape_string(completion_info_charp2);
+ if(e_info_charp2[0] == '"')
+ completion_case_sensitive = true;
+ else
+ e_info_charp2 = pg_string_tolower(e_info_charp2);
+ }
else
e_info_charp2 = NULL;
The function escape_string has a comment saying "The returned value
has to be freed." but in the above code you are overwriting the
escape_string result with the strdup'ed pg_string_tolower but without
free-ing the original e_text/e_info_charp/e_info_charp2.
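The ownership problem in comment 4 can be sketched as follows. This is a simplified, ASCII-only illustration under stated assumptions: lower_copy stands in for a copying helper such as pg_string_tolower, replace_with_lower is a hypothetical wrapper, and plain malloc/free are used instead of psql's allocation routines.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a copying lower-case helper (ASCII only): returns a new string. */
static char *
lower_copy(const char *s)
{
    size_t      len = strlen(s);
    char       *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) s[i]);
    out[len] = '\0';
    return out;
}

/*
 * Replace an owned string with its lower-cased copy, releasing the original.
 * On allocation failure the original is returned untouched.
 */
static char *
replace_with_lower(char *owned)
{
    char       *lowered;

    assert(owned != NULL);
    lowered = lower_copy(owned);
    if (lowered == NULL)
        return owned;
    free(owned);                /* the step the review found missing */
    return lowered;
}
```

Writing e_text = replace_with_lower(e_text) keeps exactly one live allocation per variable, whereas assigning the copy directly over the escape_string result loses the only pointer to the original buffer.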
======
5. strncmp replacement?
@@ -4464,7 +4490,7 @@ _complete_from_query(const char *simple_query,
*/
if (strcmp(schema_query->catname,
"pg_catalog.pg_class c") == 0 &&
- strncmp(text, "pg_", 3) != 0)
+ strncmp(pg_string_tolower(text), "pg_", 3) != 0)
{
appendPQExpBufferStr(&query_buffer,
" AND c.relnamespace <> (SELECT oid FROM"
Why not use strnicmp for case insensitive compare here instead of
strdup'ing another string (and not freeing it)?
Or maybe use pg_strncasecmp.
======
6. byte_length == 0?
@@ -4556,7 +4582,16 @@ _complete_from_query(const char *simple_query,
while (list_index < PQntuples(result) &&
(item = PQgetvalue(result, list_index++, 0)))
if (pg_strncasecmp(text, item, byte_length) == 0)
- return pg_strdup(item);
+ {
+ if (byte_length == 0 || completion_case_sensitive)
+ return pg_strdup(item);
+ else
+ /*
+ * If case insensitive matching was requested initially,
+ * adjust the case according to setting.
+ */
+ return pg_strdup_keyword_case(item, text);
+ }
}
The byte_length was not being checked before, so why is the check needed now?
======
7. test typo "ralation"
+# check query command completion for upper character ralation name
+check_completion("update TAB1 SET \t", qr/update TAB1 SET \af/,
"complete column name for TAB1");
======
8. test typo "case-insensitiveq"
+# check schema query(upper case) which is case-insensitiveq
+check_completion("select oid from Pg_cla\t", qq/select oid from
Pg_cla\b\b\b\b\bG_CLASS /, "complete schema query with uppper case
string");
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wednesday, April 21, 2021 1:24 PM, Peter Smith <smithpb2250@gmail.com> Wrote
I tried playing a bit with your psql patch V5 and I did not find any
problems - it seemed to work as advertised.
Below are a few code review comments.
Thanks for your review. I've updated the patch to V6 according to your comments.
1. Patch applies with whitespace warnings.
Fixed.
2. Unrelated "code tidy" fixes maybe should be another patch?
Agreed. Will post this modification on another thread.
3. Unnecessary NULL check?
Agreed. NULL check removed.
4. Memory not freed in multiple places?
oops. Memory free added.
5. strncmp replacement?
Agreed. Thanks for your advice. Since this modification has little to do with my patch here,
I will merge it with comment (2) and post it as a separate patch.
6. byte_length == 0?
The byte_length was not being checked before, so why is the check needed now?
We need to make sure that empty input stays case sensitive, as before (HEAD).
For example
CREATE TABLE onetab1 (f1 int);
update onetab1 SET [tab]
Without the "byte_length == 0" check, pg_strdup_keyword_case would turn the column name "f1" into upper-case "F1".
That is, the output would be "update onetab1 SET F1", which is not good.
I added some tests for this empty-input case, too.
7. test typo "ralation"
8. test typo "case-insensitiveq"
Thanks, typo fixed.
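The empty-input rule explained under comment 6 can be sketched roughly as below. This is a deliberately simplified, hypothetical model: adjust_candidate_case is not a psql function, it handles ASCII only, and it follows the case of the first typed character instead of consulting COMP_KEYWORD_CASE the way pg_strdup_keyword_case does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd copy of "item" whose case follows
 * what the user typed.  Empty input leaves the candidate exactly as stored,
 * which corresponds to the "byte_length == 0" branch in the patch.
 */
static char *
adjust_candidate_case(const char *item, const char *text)
{
    size_t      len;
    char       *out;
    bool        to_upper;

    assert(item != NULL && text != NULL);
    len = strlen(item);
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Only a non-empty, upper-case-initial input requests upper case. */
    to_upper = text[0] != '\0' && isupper((unsigned char) text[0]);

    for (size_t i = 0; i < len; i++)
        out[i] = to_upper ? (char) toupper((unsigned char) item[i]) : item[i];
    out[len] = '\0';
    return out;
}
```

In this model, completing "f1" from empty input yields "f1", while completing it from "F" yields "F1".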
Any further comment is very welcome.
Regards,
Tang
Attachments:
V6-0001-Support-tab-completion-with-a-query-result-for-upper.patch (application/octet-stream, +115 -5)
At Thu, 22 Apr 2021 12:43:42 +0000, "tanghy.fnst@fujitsu.com" <tanghy.fnst@fujitsu.com> wrote in
On Wednesday, April 21, 2021 1:24 PM, Peter Smith <smithpb2250@gmail.com> wrote:
4. Memory not freed in multiple places?
oops. Memory free added.
None of the usages of pg_string_tolower need a copy.
So why don't we change the function to an in-place converter?
6. byte_length == 0?
The byte_length was not being checked before, so why is the check needed now?
We need to make sure that empty input stays case sensitive, as before (HEAD).
For example
CREATE TABLE onetab1 (f1 int);
update onetab1 SET [tab]
Without the "byte_length == 0" check, pg_strdup_keyword_case would turn the column name "f1" into upper-case "F1".
That is, the output would be "update onetab1 SET F1", which is not good.
I added some tests for this empty-input case, too.
7. test typo "ralation"
8. test typo "case-insensitiveq"
Thanks, typo fixed.
Any further comment is very welcome.
if (completion_info_charp)
+ {
e_info_charp = escape_string(completion_info_charp);
+ if(e_info_charp[0] == '"')
+ completion_case_sensitive = true;
+ else
+ {
+ le_str = pg_string_tolower(e_info_charp);
It seems right to lower-case completion_info_charp and ..2, but it is
not right to change completion_case_sensitive here, since it only affects
the returned candidates. This change prevents the following operation
from getting the expected completion candidates.
=# create table "T" (a int) partition by range(a);
=# create table c1 partition of "T" for values from (0) to (10);
=# alter table "T" drop partition C<tab>
Is there any reason for doing that?
+ if (byte_length == 0 || completion_case_sensitive)
Is the condition "byte_length == 0 ||" right?
This results in a maybe-unexpected behavior,
=# \set COMP_KEYWORD_CASE upper
=# create table t (a int) partition by range(a);
=# create table d1 partition of t for values from (0) to (10);
=# alter table t drop partition <tab>
This results in
=# alter table t drop partition d1
I think we are expecting D1 as the result.
By the way COMP_KEYWORD_CASE suggests that *keywords* are completed
following the setting. However, they are not keywords, but
identifiers. And some people (including me) might dislike that
keywords and identifiers follow the same setting. Specifically I
sometimes want keywords to be upper-cased but identifiers (always) be
lower-cased.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
None of the usages of pg_string_tolower need a copy.
So why don't we change the function to an in-place converter?
Doesn't seem like a good idea, because that locks us into an assumption
that the downcasing conversion doesn't change the string's physical
length. There are a lot of counterexamples to that :-(. I'm not sure
that we actually implement such cases correctly today, but let's not
build APIs that prevent it from being fixed.
regards, tom lane
At Fri, 23 Apr 2021 11:58:12 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
Any further comment is very welcome.
Oh, I accidentally found a dubious behavior.
=# alter table public.<tab>
public.c1 public.d1 public."t" public.t public."tt"
The "t" and "tt" are needlessly lower-cased.
# \d
List of relations
Schema | Name | Type | Owner
--------+--------------------+-------------------+----------
public | T | partitioned table | horiguti
public | TT | table | horiguti
public | c1 | table | horiguti
public | d1 | table | horiguti
public | t | partitioned table | horiguti
=# alter table public."<tab>
=# alter table public."t -- candidates are "t" and "tt"?
=# alter table public."tt<tab> -- nothing happens
=# alter table public."TT<tab> -- also nothing happens
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Thu, 22 Apr 2021 23:17:19 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
None of the usages of pg_string_tolower need a copy.
So why don't we change the function to an in-place converter?
Doesn't seem like a good idea, because that locks us into an assumption
that the downcasing conversion doesn't change the string's physical
length. There are a lot of counterexamples to that :-(. I'm not sure
Mmm. I didn't know of that.
that we actually implement such cases correctly today, but let's not
build APIs that prevent it from being fixed.
Agreed. Thanks for the knowledge.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
At Thu, 22 Apr 2021 23:17:19 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
Doesn't seem like a good idea, because that locks us into an assumption
that the downcasing conversion doesn't change the string's physical
length. There are a lot of counterexamples to that :-(. I'm not sure
Mmm. I didn't know of that.
The two examples I know of offhand are in German (eszett "ß" downcases to
"ss") and Turkish (dotted "İ" downcases to "i", likewise dotless "I"
downcases to "ı"; one of each of those pairs is an ASCII letter, the
other is not). Depending on which encoding is in use, these
transformations *could* be the same number of bytes, but they could
equally well not be. There are probably other examples.
regards, tom lane
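The length concern raised above can be checked concretely for the Turkish pair in UTF-8. The sketch below spells out the byte sequences explicitly so that it does not depend on the source file's encoding. same_byte_length and the macro names are hypothetical, and no locale-aware case conversion is performed here; the block only compares the encoded sizes of the two sides of each pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* UTF-8 encodings of the two non-ASCII Turkish letters. */
#define UTF8_CAPITAL_I_WITH_DOT "\xC4\xB0"  /* U+0130 */
#define UTF8_SMALL_DOTLESS_I    "\xC4\xB1"  /* U+0131 */

/* Hypothetical helper: do two encoded strings occupy the same number of bytes? */
static bool
same_byte_length(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strlen(a) == strlen(b);
}
```

In UTF-8, U+0130 takes two bytes while its lower-case partner "i" takes one, and "I" takes one byte while U+0131 takes two, so a converter that writes into the input buffer could not represent either mapping in general.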
FWIW...
At Fri, 23 Apr 2021 00:17:35 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
At Thu, 22 Apr 2021 23:17:19 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
Doesn't seem like a good idea, because that locks us into an assumption
that the downcasing conversion doesn't change the string's physical
length. There are a lot of counterexamples to that :-(. I'm not sure
Mmm. I didn't know of that.
The two examples I know of offhand are in German (eszett "ß" downcases to
"ss") and Turkish (dotted "İ" downcases to "i", likewise dotless "I"
According to Wikipedia, "ss" is equivalent to "ß" and their upper case
letters are "SS" and "ẞ" respectively. (I didn't even know of the
existence of "ẞ". AFAIK no word begins with eszett, but it seems that
"ẞ" can appear when a word is spelled entirely in capital letters.)
downcases to "ı"; one of each of those pairs is an ASCII letter, the
other is not). Depending on which encoding is in use, these
Upper dotless "I" and lower dotted "i" are in ASCII (or English
alphabet?). That's interesting.
transformations *could* be the same number of bytes, but they could
equally well not be. There are probably other examples.
Yeah. Agreed.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, 2021-04-23 at 14:44 +0900, Kyotaro Horiguchi wrote:
The two examples I know of offhand are in German (eszett "ß" downcases to
"ss") and Turkish (dotted "İ" downcases to "i", likewise dotless "I"
According to Wikipedia, "ss" is equivalent to "ß" and their upper case
letters are "SS" and "ẞ" respectively. (I didn't even know of the
existence of "ẞ". AFAIK no word begins with eszett, but it seems that
"ẞ" can appear when a word is spelled entirely in capital letters.)
This "capital sharp s" is a recent invention that has never got much
traction. I notice that on my Fedora 32 system with glibc 2.31 and de_DE.utf8,
SELECT lower(E'\u1E9E') = E'\u00DF', upper(E'\u00DF') = E'\u1E9E';
?column? │ ?column?
══════════╪══════════
t │ f
(1 row)
which to me as a German speaker makes no sense.
But Tom's example was the wrong way around: "ß" is a lower case letter,
and the traditional upper case translation is "SS".
But the Turkish example is correct:
downcases to "ı"; one of each of those pairs is an ASCII letter, the
other is not). Depending on which encoding is in use, these
Upper dotless "I" and lower dotted "i" are in ASCII (or English
alphabet?). That's interesting.
Yes. In languages other than Turkish, "i" is the lower case version of "I",
and both are ASCII. Only Turkish has an "ı" (U+0131) and an "İ" (U+0130).
That causes annoyance for Turks who create a table named KADIN and find
that PostgreSQL turns it into "kadin".
Yours,
Laurenz Albe