Why format() adds double quote?

Started by Tatsuo Ishiialmost 10 years ago28 messages

ishii@postgresql.org

almost 10 years ago

test=# select format('%I', t) from t1;
format
----------
aaa
"AAA"
"あいう"
(3 rows)

Why is the text value of the third line needed to be double quoted?
(note that it is a multi byte character). Same thing can be said to
quote_ident().

We treat identifiers made of the multi byte characters without double
quotation (non delimited identifier) in other places.

test=# create table t2(あいう text);
CREATE TABLE
test=# insert into t2 values('aaa');
INSERT 0 1
test=# select あいう from t2;
あいう
--------
aaa
(1 row)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Tatsuo Ishii (#1)

Re: Why format() adds double quote?

2016-01-20 3:47 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

test=# select format('%I', t) from t1;
format
----------
aaa
"AAA"
"あいう"
(3 rows)

Why is the text value of the third line needed to be double quoted?
(note that it is a multi byte character). Same thing can be said to
quote_ident().

We treat identifiers made of the multi byte characters without double
quotation (non delimited identifier) in other places.

test=# create table t2(あいう text);
CREATE TABLE
test=# insert into t2 values('aaa');
INSERT 0 1
test=# select あいう from t2;
あいう
--------
aaa
(1 row)

format uses same routine as quote_ident. So quote_ident should be fixed
first.

Regards

Pavel

Show quoted text

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

ishii@postgresql.org

almost 10 years ago

In reply to: Pavel Stehule (#2)

1 attachment(s)

Re: Why format() adds double quote?

2016-01-20 3:47 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

test=# select format('%I', t) from t1;
format
----------
aaa
"AAA"
"あいう"
(3 rows)

Why is the text value of the third line needed to be double quoted?
(note that it is a multi byte character). Same thing can be said to
quote_ident().

We treat identifiers made of the multi byte characters without double
quotation (non delimited identifier) in other places.

test=# create table t2(あいう text);
CREATE TABLE
test=# insert into t2 values('aaa');
INSERT 0 1
test=# select あいう from t2;
あいう
--------
aaa
(1 row)

format uses same routine as quote_ident. So quote_ident should be fixed
first.

Yes, I had that in my mind too.

Attached is the proposed patch to fix the bug.
Regression tests passed.

Here is an example after the patch. Note that the third row is not
quoted any more.

test=# select format('%I', あいう) from t2;
format
--------
aaa
"AAA"
あああ
(3 rows)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Attachments:

ruleutils.c.difftext/x-patch; charset=us-asciiDownload

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 3783e97..b93fc27 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -9405,7 +9405,7 @@ quote_identifier(const char *ident)
 	 * would like to use <ctype.h> macros here, but they might yield unwanted
 	 * locale-specific results...
 	 */
-	safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_');
+	safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_' || IS_HIGHBIT_SET(ident[0]));
 
 	for (ptr = ident; *ptr; ptr++)
 	{
@@ -9413,7 +9413,8 @@ quote_identifier(const char *ident)
 
 		if ((ch >= 'a' && ch <= 'z') ||
 			(ch >= '0' && ch <= '9') ||
-			(ch == '_'))
+			(ch == '_') ||
+			(IS_HIGHBIT_SET(ch)))
 		{
 			/* okay */
 		}

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Tatsuo Ishii (#3)

Re: Why format() adds double quote?

Hi

2016-01-20 7:20 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

2016-01-20 3:47 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

test=# select format('%I', t) from t1;
format
----------
aaa
"AAA"
"あいう"
(3 rows)

Why is the text value of the third line needed to be double quoted?
(note that it is a multi byte character). Same thing can be said to
quote_ident().

We treat identifiers made of the multi byte characters without double
quotation (non delimited identifier) in other places.

test=# create table t2(あいう text);
CREATE TABLE
test=# insert into t2 values('aaa');
INSERT 0 1
test=# select あいう from t2;
あいう
--------
aaa
(1 row)

format uses same routine as quote_ident. So quote_ident should be fixed
first.

Yes, I had that in my mind too.

Attached is the proposed patch to fix the bug.
Regression tests passed.

Here is an example after the patch. Note that the third row is not
quoted any more.

test=# select format('%I', あいう) from t2;
format
--------
aaa
"AAA"
あああ
(3 rows)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/src/backend/utils/adt/ruleutils.c
b/src/backend/utils/adt/ruleutils.c
index 3783e97..b93fc27 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -9405,7 +9405,7 @@ quote_identifier(const char *ident)
* would like to use <ctype.h> macros here, but they might yield
unwanted
* locale-specific results...
*/
-       safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_');
+       safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_' ||
IS_HIGHBIT_SET(ident[0]));
for (ptr = ident; *ptr; ptr++)
{
@@ -9413,7 +9413,8 @@ quote_identifier(const char *ident)
if ((ch >= 'a' && ch <= 'z') ||
(ch >= '0' && ch <= '9') ||
-                       (ch == '_'))
+                       (ch == '_') ||
+                       (IS_HIGHBIT_SET(ch)))
{
/* okay */
}

This patch ls simply - I remember I was surprised, so we allow any
multibyte char few months ago.

+1

Pavel

ishii@postgresql.org

almost 10 years ago

In reply to: Pavel Stehule (#4)

Re: Why format() adds double quote?

Hi

2016-01-20 7:20 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:
2016-01-20 3:47 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

test=# select format('%I', t) from t1;
format
----------
aaa
"AAA"
"あいう"
(3 rows)

Why is the text value of the third line needed to be double quoted?
(note that it is a multi byte character). Same thing can be said to
quote_ident().

We treat identifiers made of the multi byte characters without double
quotation (non delimited identifier) in other places.

test=# create table t2(あいう text);
CREATE TABLE
test=# insert into t2 values('aaa');
INSERT 0 1
test=# select あいう from t2;
あいう
--------
aaa
(1 row)

format uses same routine as quote_ident. So quote_ident should be fixed
first.

Yes, I had that in my mind too.

Attached is the proposed patch to fix the bug.
Regression tests passed.

Here is an example after the patch. Note that the third row is not
quoted any more.

test=# select format('%I', あいう) from t2;
format
--------
aaa
"AAA"
あああ
(3 rows)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/src/backend/utils/adt/ruleutils.c
b/src/backend/utils/adt/ruleutils.c
index 3783e97..b93fc27 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -9405,7 +9405,7 @@ quote_identifier(const char *ident)
* would like to use <ctype.h> macros here, but they might yield
unwanted
* locale-specific results...
*/
-       safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_');
+       safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_' ||
IS_HIGHBIT_SET(ident[0]));
for (ptr = ident; *ptr; ptr++)
{
@@ -9413,7 +9413,8 @@ quote_identifier(const char *ident)
if ((ch >= 'a' && ch <= 'z') ||
(ch >= '0' && ch <= '9') ||
-                       (ch == '_'))
+                       (ch == '_') ||
+                       (IS_HIGHBIT_SET(ch)))
{
/* okay */
}
This patch ls simply - I remember I was surprised, so we allow any
multibyte char few months ago.

+1

If we would go this way, question is if we should back patch this or
not since the patch apparently changes the existing
behaviors. Comments? I would think we should not.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Tatsuo Ishii (#5)

Re: Why format() adds double quote?

2016-01-20 10:17 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

Hi

2016-01-20 7:20 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

2016-01-20 3:47 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

test=# select format('%I', t) from t1;
format
----------
aaa
"AAA"
"あいう"
(3 rows)

Why is the text value of the third line needed to be double quoted?
(note that it is a multi byte character). Same thing can be said to
quote_ident().

We treat identifiers made of the multi byte characters without double
quotation (non delimited identifier) in other places.

test=# create table t2(あいう text);
CREATE TABLE
test=# insert into t2 values('aaa');
INSERT 0 1
test=# select あいう from t2;
あいう
--------
aaa
(1 row)

format uses same routine as quote_ident. So quote_ident should be

fixed
first.

Yes, I had that in my mind too.

Attached is the proposed patch to fix the bug.
Regression tests passed.

Here is an example after the patch. Note that the third row is not
quoted any more.

test=# select format('%I', あいう) from t2;
format
--------
aaa
"AAA"
あああ
(3 rows)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/src/backend/utils/adt/ruleutils.c
b/src/backend/utils/adt/ruleutils.c
index 3783e97..b93fc27 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -9405,7 +9405,7 @@ quote_identifier(const char *ident)
* would like to use <ctype.h> macros here, but they might yield
unwanted
* locale-specific results...
*/
-       safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] ==
'_');

+ safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_'

||
IS_HIGHBIT_SET(ident[0]));

for (ptr = ident; *ptr; ptr++)
{
@@ -9413,7 +9413,8 @@ quote_identifier(const char *ident)
if ((ch >= 'a' && ch <= 'z') ||
(ch >= '0' && ch <= '9') ||
-                       (ch == '_'))
+                       (ch == '_') ||
+                       (IS_HIGHBIT_SET(ch)))
{
/* okay */
}
This patch ls simply - I remember I was surprised, so we allow any
multibyte char few months ago.

+1
If we would go this way, question is if we should back patch this or
not since the patch apparently changes the existing
behaviors. Comments? I would think we should not.

I am sure, so we should not backport this change. This can breaks customer
regress tests - and the current behave isn't 100% correct, but it is safe.

Pavel

Show quoted text

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

robertmhaas@gmail.com

almost 10 years ago

In reply to: Pavel Stehule (#6)

Re: Why format() adds double quote?

On Wed, Jan 20, 2016 at 4:20 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

If we would go this way, question is if we should back patch this or
not since the patch apparently changes the existing
behaviors. Comments? I would think we should not.

I am sure, so we should not backport this change. This can breaks customer
regress tests - and the current behave isn't 100% correct, but it is safe.

Quite. This is not a bug fix. It's a behavior change, perhaps for the better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

ishii@postgresql.org

almost 10 years ago

In reply to: Robert Haas (#7)

Re: Why format() adds double quote?

On Wed, Jan 20, 2016 at 4:20 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

If we would go this way, question is if we should back patch this or
not since the patch apparently changes the existing
behaviors. Comments? I would think we should not.

I am sure, so we should not backport this change. This can breaks customer
regress tests - and the current behave isn't 100% correct, but it is safe.

Quite. This is not a bug fix. It's a behavior change, perhaps for the better.

Added to the commitfest 2016-03.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Dickson S. Guedes

listas@guedesoft.net

almost 10 years ago

In reply to: Tatsuo Ishii (#8)

Re: Why format() adds double quote?

2016-01-24 8:04 GMT-02:00 Tatsuo Ishii <ishii@postgresql.org>:

On Wed, Jan 20, 2016 at 4:20 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

If we would go this way, question is if we should back patch this or
not since the patch apparently changes the existing
behaviors. Comments? I would think we should not.

I am sure, so we should not backport this change. This can breaks customer
regress tests - and the current behave isn't 100% correct, but it is safe.

Quite. This is not a bug fix. It's a behavior change, perhaps for the better.

Added to the commitfest 2016-03.

Hi,

I gone ahead a little and tested this patch and it works like was
proposed, I agree that it's not a bug fix but a new behavior so -1 for
backport.

While applying patch against master
(1129c2b0ad2732f301f696ae2cf98fb063a4c1f8) it offsets two hunks.

Since format() has regression tests I suggest that one should be added
to cover this. It could worth to add the new behavior to the docs,
since there no explicit example for %I.

I performed the follow tests that works as expected using some Portuguese words:

postgres=# create table test (nome varchar, endereço text, "UF"
varchar(2), título varchar);
CREATE TABLE
Time: 80,769 ms
postgres=# select format('%I', attname) from pg_attribute join
pg_class on (attrelid = oid) where relname = 'test';
format
----------
"UF"
cmax
cmin
ctid
endereço
nome
tableoid
título
xmax
xmin
(10 rows)

Time: 1,728 ms
postgres=# select format('%I', 'endereco');
format
----------
endereco
(1 row)

Time: 0,098 ms
postgres=# select format('%I', 'endereço');
format
----------
endereço
(1 row)

Time: 0,088 ms
postgres=# select format('%I', 'あああ');
format
--------
あああ
(1 row)

Time: 0,072 ms
postgres=# select format('%I', 'título');
format
--------
título
(1 row)

Time: 0,051 ms
postgres=# select format('%I', 'título e');
format
------------
"título e"
(1 row)

Time: 0,051 ms
postgres=# select format('%I', 'título_e');
format
----------
título_e
(1 row)

Time: 0,051 ms
postgres=# select format('%I', '_título');
format
---------
_título
(1 row)

Time: 0,047 ms
postgres=# select format('%I', '1_título');
format
------------
"1_título"
(1 row)

Time: 0,046 ms

Thank you for this!

Best regards,
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

ishii@postgresql.org

almost 10 years ago

In reply to: Dickson S. Guedes (#9)

Re: Why format() adds double quote?

2016-01-24 8:04 GMT-02:00 Tatsuo Ishii <ishii@postgresql.org>:

On Wed, Jan 20, 2016 at 4:20 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

If we would go this way, question is if we should back patch this or
not since the patch apparently changes the existing
behaviors. Comments? I would think we should not.

I am sure, so we should not backport this change. This can breaks customer
regress tests - and the current behave isn't 100% correct, but it is safe.

Quite. This is not a bug fix. It's a behavior change, perhaps for the better.

Added to the commitfest 2016-03.

Hi,

I gone ahead a little and tested this patch and it works like was
proposed, I agree that it's not a bug fix but a new behavior so -1 for
backport.

IMO, it's a bug or at least an inconsistency but I admit it's too late
to back patch to existing stable branches.

While applying patch against master
(1129c2b0ad2732f301f696ae2cf98fb063a4c1f8) it offsets two hunks.

Since format() has regression tests I suggest that one should be added
to cover this.

I don't think it's doable. The test requires to handle multiple
database encodings. The regression test framework handles only one
database encoding. Probably adding to the existing mb test is the
easiest.

It could worth to add the new behavior to the docs,
since there no explicit example for %I.

I performed the follow tests that works as expected using some Portuguese words:

I assume you used UTF-8 encoding database.
Great.

postgres=# create table test (nome varchar, endereço text, "UF"
varchar(2), título varchar);
CREATE TABLE
Time: 80,769 ms
postgres=# select format('%I', attname) from pg_attribute join
pg_class on (attrelid = oid) where relname = 'test';
format
----------
"UF"
cmax
cmin
ctid
endereço
nome
tableoid
título
xmax
xmin
(10 rows)

Time: 1,728 ms
postgres=# select format('%I', 'endereco');
format
----------
endereco
(1 row)

Time: 0,098 ms
postgres=# select format('%I', 'endereço');
format
----------
endereço
(1 row)

Time: 0,088 ms
postgres=# select format('%I', 'あああ');
format
--------
あああ
(1 row)

Time: 0,072 ms
postgres=# select format('%I', 'título');
format
--------
título
(1 row)

Time: 0,051 ms
postgres=# select format('%I', 'título e');
format
------------
"título e"
(1 row)

Time: 0,051 ms
postgres=# select format('%I', 'título_e');
format
----------
título_e
(1 row)

Time: 0,051 ms
postgres=# select format('%I', '_título');
format
---------
_título
(1 row)

Time: 0,047 ms
postgres=# select format('%I', '1_título');
format
------------
"1_título"
(1 row)

Time: 0,046 ms

Thank you for this!

Best regards,
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

daniel@manitou-mail.org

almost 10 years ago

In reply to: Tatsuo Ishii (#10)

Re: Why format() adds double quote?

Tatsuo Ishii wrote:

IMO, it's a bug or at least an inconsistency

Personally I don't see this change being good for everything.

Let's play devil's advocate:

create table abc(U&"foo\2003" int);

U+2003 is 'EM SPACE', in Unicode's General Punctuation block.

With the current version, format('%I', attname) on this column is:
"foo "

With the patched version, it produces this:
foo

So the visual hint that there are more characters at the end is lost.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Dickson S. Guedes

listas@guedesoft.net

almost 10 years ago

In reply to: Tatsuo Ishii (#10)

Re: Why format() adds double quote?

2016-01-26 5:29 GMT-02:00 Tatsuo Ishii <ishii@postgresql.org>:

I assume you used UTF-8 encoding database.

Yes, I do.

--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Dickson S. Guedes

listas@guedesoft.net

almost 10 years ago

In reply to: Daniel Verite (#11)

Re: Why format() adds double quote?

2016-01-26 18:00 GMT-02:00 Daniel Verite <daniel@manitou-mail.org>:

...
create table abc(U&"foo\2003" int);

U+2003 is 'EM SPACE', in Unicode's General Punctuation block.

With the current version, format('%I', attname) on this column is:
"foo "

With the patched version, it produces this:
foo

So the visual hint that there are more characters at the end is lost.

Thanks for advocate, I see here that it even produces that output with
simple spaces.

postgres=# create table x ("aí " text);
CREATE TABLE
postgres=# \d x
Tabela "public.x"
Coluna | Tipo | Modificadores
----------+------+---------------
aí | text |

This will break copy&paste user actions and scripts that parses that output.

Maybe the patch should consider left/right non-printable chars to
choose whether to show or not the " ?

[]s
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

ishii@postgresql.org

almost 10 years ago

In reply to: Dickson S. Guedes (#13)

Re: Why format() adds double quote?

Thanks for advocate, I see here that it even produces that output with
simple spaces.

postgres=# create table x ("aí " text);
CREATE TABLE
postgres=# \d x
Tabela "public.x"
Coluna | Tipo | Modificadores
----------+------+---------------
aí | text |

This will break copy&paste user actions and scripts that parses that output.

Maybe the patch should consider left/right non-printable chars to
choose whether to show or not the " ?

This is a totally different story from the topic discussed in this
thread. psql never adds double quotations to column name even with
upper case col names.

test=# create table t6("ABC" int);
CREATE TABLE
test=# \d t6
Table "public.t6"
Column | Type | Modifiers
--------+---------+-----------
ABC | integer |

If you want to change the existing psql's behavior, propose it
yourself.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

ishii@postgresql.org

almost 10 years ago

In reply to: Daniel Verite (#11)

Re: Why format() adds double quote?

IMO, it's a bug or at least an inconsistency

Personally I don't see this change being good for everything.

Let's play devil's advocate:

create table abc(U&"foo\2003" int);

U+2003 is 'EM SPACE', in Unicode's General Punctuation block.

With the current version, format('%I', attname) on this column is:
"foo "

With the patched version, it produces this:
foo

So the visual hint that there are more characters at the end is lost.

What is the "visual hint"? If you are talking about psql's output, it
never adds "visual hint" (double quotations).

If you are talking about the string handling in a program, what kind
of program cares about "visiual"?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Daniel Verite (#11)

Re: Why format() adds double quote?

2016-01-26 21:00 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:

Tatsuo Ishii wrote:

IMO, it's a bug or at least an inconsistency

Personally I don't see this change being good for everything.

Let's play devil's advocate:

create table abc(U&"foo\2003" int);

U+2003 is 'EM SPACE', in Unicode's General Punctuation block.

With the current version, format('%I', attname) on this column is:
"foo "

With the patched version, it produces this:
foo

So the visual hint that there are more characters at the end is lost.

I can agree, so current behave can be useful in some cases, but still it is
bug (inconsistency) between PostgreSQL parser and PostgreSQL escaping
functions.

Currently, any multibyte char can be unescaped identifier (only apostrophes
are tested). We should to test white chars too.

Regards

Pavel

Show quoted text

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

ishii@postgresql.org

almost 10 years ago

In reply to: Pavel Stehule (#16)

Re: Why format() adds double quote?

2016-01-26 21:00 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:

Tatsuo Ishii wrote:

IMO, it's a bug or at least an inconsistency

Personally I don't see this change being good for everything.

Let's play devil's advocate:

create table abc(U&"foo\2003" int);

U+2003 is 'EM SPACE', in Unicode's General Punctuation block.

With the current version, format('%I', attname) on this column is:
"foo "

With the patched version, it produces this:
foo

So the visual hint that there are more characters at the end is lost.

I can agree, so current behave can be useful in some cases, but still it is
bug (inconsistency) between PostgreSQL parser and PostgreSQL escaping
functions.

Currently, any multibyte char can be unescaped identifier (only apostrophes
are tested). We should to test white chars too.

Really? I thought we do that test.

test=# create table t6("あいうえお" int);
CREATE TABLE
test=# \d t6
Table "public.t6"
Column | Type | Modifiers
-------------+---------+-----------
あいうえお | integer |
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Tatsuo Ishii (#17)

Re: Why format() adds double quote?

2016-01-27 6:13 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

2016-01-26 21:00 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:

Tatsuo Ishii wrote:

IMO, it's a bug or at least an inconsistency

Personally I don't see this change being good for everything.

Let's play devil's advocate:

create table abc(U&"foo\2003" int);

U+2003 is 'EM SPACE', in Unicode's General Punctuation block.

With the current version, format('%I', attname) on this column is:
"foo "

With the patched version, it produces this:
foo

So the visual hint that there are more characters at the end is lost.

I can agree, so current behave can be useful in some cases, but still it

is

bug (inconsistency) between PostgreSQL parser and PostgreSQL escaping
functions.

Currently, any multibyte char can be unescaped identifier (only

apostrophes

are tested). We should to test white chars too.

Really? I thought we do that test.

what you are expecting from this test? UTF single quotes are tested only in
quote functions probably.

Pavel

Show quoted text

test=# create table t6("あいうえお" int);
CREATE TABLE
test=# \d t6
Table "public.t6"
Column | Type | Modifiers
-------------+---------+-----------
あいうえお | integer |
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

ishii@postgresql.org

almost 10 years ago

In reply to: Pavel Stehule (#18)

Re: Why format() adds double quote?

I can agree, so current behave can be useful in some cases, but still it

is

bug (inconsistency) between PostgreSQL parser and PostgreSQL escaping
functions.

Currently, any multibyte char can be unescaped identifier (only

apostrophes

are tested). We should to test white chars too.

Really? I thought we do that test.

what you are expecting from this test? UTF single quotes are tested only in
quote functions probably.

I just wanted to demonstrate multibyte chars including ASCII white
spaces can be an identifier.

We should to test white chars too.

What do you exactly propose regarding white chars and multibyte chars
here? Maybe you propose to consider non ASCII white spaces (treate
them as ASCII white spaces)?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Pavel

test=# create table t6("あいうえお" int);
CREATE TABLE
test=# \d t6
Table "public.t6"
Column | Type | Modifiers
-------------+---------+-----------
あいうえお | integer |
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Tatsuo Ishii (#19)

Re: Why format() adds double quote?

2016-01-27 6:24 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

I can agree, so current behave can be useful in some cases, but still

it

is

bug (inconsistency) between PostgreSQL parser and PostgreSQL escaping
functions.

Currently, any multibyte char can be unescaped identifier (only

apostrophes

are tested). We should to test white chars too.

Really? I thought we do that test.

what you are expecting from this test? UTF single quotes are tested only

in

quote functions probably.

I just wanted to demonstrate multibyte chars including ASCII white
spaces can be an identifier.

I understand now.

We should to test white chars too.

What do you exactly propose regarding white chars and multibyte chars
here? Maybe you propose to consider non ASCII white spaces (treate
them as ASCII white spaces)?

I propose the work with UTF white chars should be same like ASCII white
chars. The current design is too simple - with possible pretty bad issues.
Daniel's example is good - there is big gap in design.

Regards

Pavel

Show quoted text

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Pavel

test=# create table t6("あいうえお" int);
CREATE TABLE
test=# \d t6
Table "public.t6"
Column | Type | Modifiers
-------------+---------+-----------
あいうえお | integer |
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

ishii@postgresql.org

almost 10 years ago

In reply to: Pavel Stehule (#20)

Re: Why format() adds double quote?

What do you exactly propose regarding white chars and multibyte chars
here? Maybe you propose to consider non ASCII white spaces (treate
them as ASCII white spaces)?

I propose the work with UTF white chars should be same like ASCII white
chars. The current design is too simple - with possible pretty bad issues.
Daniel's example is good - there is big gap in design.

I think we should consider followings before going forward:

1) Does PostgreSQL treat non ASCII white spaces same as ASCII white
spaces anyware in the system? If not, there's no reason we should
think format() and quote_indent() are exception.

2) What does the SQL standard say? Do they say that non ASCII white
spaces should be treated as ASCII white spaces?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Tatsuo Ishii (#21)

Re: Why format() adds double quote?

2016-01-27 8:25 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

What do you exactly propose regarding white chars and multibyte chars
here? Maybe you propose to consider non ASCII white spaces (treate
them as ASCII white spaces)?

I propose the work with UTF white chars should be same like ASCII white
chars. The current design is too simple - with possible pretty bad

issues.

Daniel's example is good - there is big gap in design.

I think we should consider followings before going forward:

1) Does PostgreSQL treat non ASCII white spaces same as ASCII white
spaces anyware in the system? If not, there's no reason we should
think format() and quote_indent() are exception.

+1

2) What does the SQL standard say? Do they say that non ASCII white
spaces should be treated as ASCII white spaces?

I am not sure, if SQL standard say some about it. But I am sure, so using
unescaped or unclosed UTF8 spaces is bug, dangerous, wrong

Pavel

Show quoted text

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

daniel@manitou-mail.org

almost 10 years ago

In reply to: Tatsuo Ishii (#21)

Re: Why format() adds double quote?

Tatsuo Ishii wrote:

2) What does the SQL standard say? Do they say that non ASCII white
spaces should be treated as ASCII white spaces?

I've used white space in the example, but I'm concerned about
punctuation too.

unicode.org has this helpful paper:
http://www.unicode.org/L2/L2000/00260-sql.pdf
which studies Unicode in SQL-99 identifiers.

The relevant BNF they extracted from the standard looks like this:

identifier body> ::=
<identifier start>
[ { <underscore> | <identifier part> }... ]

<identifier start> ::=
<initial alphabetic character>
| <ideographic character>

<identifier part> ::=
<alphabetic character>
| <ideographic character>
| <decimal digit character>
| <identifier combining character>
| <underscore>
| <alternate underscore>
| <extender character>
| <identifier ignorable character>
| <connector character>

<delimited identifier> ::=
<double quote> <delimited identifier body> <double quote>

<delimited identifier body> ::= <delimited identifier part>...

<delimited identifier part> ::=
<nondoublequote character>
| <doublequote symbol>

========

The current version of quote_ident() plays it safe by implementing
the rule that, as soon it encounters a character outside
of US-ASCII, it surrounds the identifier with double quotes, no matter
to which category or block this character belongs.
So its output is guaranteed to be compatible with the above grammar.

The change in the patch is that multibyte characters just don't imply
quoting.

But according to the points 1 and 2 of the paper, the first character
must have the Unicode alphabetic property, and it must not
have the Unicode combining property.

I'm mostly ignorant in Unicode so I'm not sure of the precise
implications of having such Unicode properties, but still my
understanding is that the new quote_ident() ignores these rules,
so in this sense it could produce outputs that wouldn't be
compatible with SQL-99.

Also, here's what we say in the manual about non quoted identifiers:
http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html

"SQL identifiers and key words must begin with a letter (a-z, but also
letters with diacritical marks and non-Latin letters) or an underscore
(_). Subsequent characters in an identifier or key word can be
letters, underscores, digits (0-9), or dollar signs ($)"

So it explicitly allows letters in general (and also seems less
strict than SQL-99 about underscore), but it makes no promise about
Unicode punctuation or spaces, for instance, even though in practice
the parser seems to accept them just fine.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

daniel@manitou-mail.org

almost 10 years ago

In reply to: Tatsuo Ishii (#15)

Re: Why format() adds double quote?

Tatsuo Ishii wrote:

What is the "visual hint"? If you are talking about psql's output, it
never adds "visual hint" (double quotations).

If you are talking about the string handling in a program, what kind
of program cares about "visiual"?

Sure. The scenario I'm thinking about would be someone inspecting
generated queries, say for controlling or troubleshooting,
the queries containing identifiers injected with the help of
quote_ident/format. That could be from the logs, or
monitoring or audit tools.

If identifiers contain weird Unicode characters, currently
that's relatively easy to spot because they get surrounded by
double quotes.

If I see something like this: UPDATE "test․table" SET...
I immediately think that there's something fishy. It looks like test
should be a schema, but the surrounding quotes say otherwise.
In any case, it's clear that it updates a table in the current schema.

But if I see that: UPDATE test․table SET...
is seems legit and seems to update "table" inside the "test" schema
even though that's not what it does, it actually updates the
"test․table" table in the current schema, because the dot between
test and table is not the US-ASCII U+002E, it's U+2024,
'ONE DOT LEADER'
On my display, they are almost indiscernible.

This boils down to the fact that the current quote_ident gives:

=# select quote_ident('test․table');
quote_ident
--------------
"test․table"

whereas the quote_ident patched as proposed gives:

=# select quote_ident('test․table');
quote_ident
-------------
test․table

So this is what I don't feel good about.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

almost 10 years ago

In reply to: Daniel Verite (#24)

Re: Why format() adds double quote?

"Daniel Verite" <daniel@manitou-mail.org> writes:

This boils down to the fact that the current quote_ident gives:

=# select quote_ident('test․table');
quote_ident
--------------
"test․table"

whereas the quote_ident patched as proposed gives:

=# select quote_ident('test․table');
quote_ident
-------------
test․table

So this is what I don't feel good about.

This patch was originally proposed as a simple, cost-free change,
but it's becoming obvious that it is no such thing. I think
we should probably reject it and move on.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Dickson S. Guedes

listas@guedesoft.net

almost 10 years ago

In reply to: Tatsuo Ishii (#14)

Re: Why format() adds double quote?

2016-01-26 23:40 GMT-02:00 Tatsuo Ishii <ishii@postgresql.org>:

Thanks for advocate, I see here that it even produces that output with
simple spaces.

postgres=# create table x ("aí " text);
CREATE TABLE
postgres=# \d x
Tabela "public.x"
Coluna | Tipo | Modificadores
----------+------+---------------
aí | text |

This will break copy&paste user actions and scripts that parses that output.

Maybe the patch should consider left/right non-printable chars to
choose whether to show or not the " ?

This is a totally different story from the topic discussed in this
thread. psql never adds double quotations to column name even with
upper case col names.

Indeed, you are right.

If you want to change the existing psql's behavior, propose it
yourself.

It could be interesting, maybe using a \pset quote_columns_char, I'll
think about, thank you.

Best regards.
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

ishii@postgresql.org

almost 10 years ago

In reply to: Daniel Verite (#23)

Re: Why format() adds double quote?

I've used white space in the example, but I'm concerned about
punctuation too.

unicode.org has this helpful paper:
http://www.unicode.org/L2/L2000/00260-sql.pdf
which studies Unicode in SQL-99 identifiers.

The relevant BNF they extracted from the standard looks like this:

identifier body> ::=
<identifier start>
[ { <underscore> | <identifier part> }... ]

<identifier start> ::=
<initial alphabetic character>
| <ideographic character>

<identifier part> ::=
<alphabetic character>
| <ideographic character>
| <decimal digit character>
| <identifier combining character>
| <underscore>
| <alternate underscore>
| <extender character>
| <identifier ignorable character>
| <connector character>

<delimited identifier> ::=
<double quote> <delimited identifier body> <double quote>

<delimited identifier body> ::= <delimited identifier part>...

<delimited identifier part> ::=
<nondoublequote character>
| <doublequote symbol>

========

The current version of quote_ident() plays it safe by implementing
the rule that, as soon it encounters a character outside
of US-ASCII, it surrounds the identifier with double quotes, no matter
to which category or block this character belongs.
So its output is guaranteed to be compatible with the above grammar.

The change in the patch is that multibyte characters just don't imply
quoting.

But according to the points 1 and 2 of the paper, the first character
must have the Unicode alphabetic property, and it must not
have the Unicode combining property.

Good point.

I'm mostly ignorant in Unicode so I'm not sure of the precise
implications of having such Unicode properties, but still my
understanding is that the new quote_ident() ignores these rules,
so in this sense it could produce outputs that wouldn't be
compatible with SQL-99.

Also, here's what we say in the manual about non quoted identifiers:
http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html

"SQL identifiers and key words must begin with a letter (a-z, but also
letters with diacritical marks and non-Latin letters) or an underscore
(_). Subsequent characters in an identifier or key word can be
letters, underscores, digits (0-9), or dollar signs ($)"

So it explicitly allows letters in general (and also seems less
strict than SQL-99 about underscore), but it makes no promise about
Unicode punctuation or spaces, for instance, even though in practice
the parser seems to accept them just fine.

You could arbitary extend your point, not only with Unicode
punctuation or spaces, There are number of characters look-alike "-"
in Unicode, for example.　Do we want to treat them like ASCII "-"?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

ishii@postgresql.org

almost 10 years ago

In reply to: Tom Lane (#25)

Re: Why format() adds double quote?

"Daniel Verite" <daniel@manitou-mail.org> writes:

This boils down to the fact that the current quote_ident gives:

=# select quote_ident('test․table');
quote_ident
--------------
"test․table"

whereas the quote_ident patched as proposed gives:

=# select quote_ident('test․table');
quote_ident
-------------
test․table

So this is what I don't feel good about.

This patch was originally proposed as a simple, cost-free change,
but it's becoming obvious that it is no such thing. I think
we should probably reject it and move on.

It seems I opend a can of worms. I'm going to reject my proposal
myself.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers