UTF8 conversion

Started by PG Bug reporting formover 5 years ago4 messagesdocsgeneral
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org
docsgeneral

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/11/functions-string.html
Description:

I cannot get psql to accept an UTF8 string, link update .. set ... =
'josé';
In the docpage on encodings, jose is used in examples for UTF8.
Plainly this makes no sense. If you give examples, use the specifics in
question, like a proper UTF8 string such as 'josé'.
That would help me, and probably quite a few others, greatly.
I can go in a song and dance about my settings. Believe me, I a proper
client_encoding and database encoding, and done umpteen attempts, but using
UTF8 in psql queries (without UGLY \u notation) appears completely
impossible.
Example: select 'jose'; Works fine. select 'josé'; will just hang, even if
embellished with ::utf8 or ::bytea.
When executing the query, psql does not return to its prompt until using ^c.
One might believe it is executing the query, but that simply is not so.
Then it shows it has not found the end of the string.
This mis-behavior irritates me wildly in an otherwise great product.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
docsgeneral
Re: UTF8 conversion

PG Doc comments form <noreply@postgresql.org> writes:

I cannot get psql to accept an UTF8 string, link update .. set ... =
'josé';

What this sounds like is some confusion in your locale and/or encoding
settings. While perhaps we could improve the documentation about that
area, it's impossible to guess what would help you, since it's not
clear exactly what's wrong.

I can go in a song and dance about my settings. Believe me, I a proper
client_encoding and database encoding, and done umpteen attempts, but using
UTF8 in psql queries (without UGLY \u notation) appears completely
impossible.

Without those details it's really impossible to say much. But I'm
suspicious that this isn't a Postgres issue at all really, but something
not right about your terminal settings and/or locale environment
variables.

Anyway, a documentation comment box isn't the right place for this kind of
support, and I seriously doubt that there's any bug here either. Please
send more detail to the pgsql-general mailing list, and we'll see if we
can work this out.

regards, tom lane

#3John Lumby
johnlumby@hotmail.com
In reply to: Tom Lane (#2)
docsgeneral
Set-Returning functions in a select list

I am unclear about the rules which postgresql uses in building a result
set when the select list ( see note ) contains several different
Set-Returning functions,  possibly with sets of different sizes.
At a very high level,   what I see happening is that each Set-Returning
expression causes the result set to be expanded with additional rows, 
one per member of the set.
When there is only one such expression,     this is clear  -   the size
of the final result set is N times the size of the output of the final
operator where N is the size of the set   (i.e. N times the size if that
expression had been a scalar).
But it becomes unclear where there is more than one such expression.

I can't find much on this point in the doc  (I am looking in PostgreSQL
13.2 Documentation).    There is one small example in
"9.25. Set Returning Functions"
but in this example there is only one Set-Returning expression in the
select list.

Also the description of the SELECT command mentions something called a
"row group" :
"The actual output rows are computed using the SELECT output expressions
for each selected row or row group. (See SELECT List below.)"
but no mention of Set-Returning expressions in the referenced section.

I tried an experiment :

<code>
echo -e "DROP SCHEMA testing cascade;
CREATE SCHEMA testing;
CREATE TABLE testing.set_returning_in_select_list (id int4 , bitmask_a
int , bitmask_b int );
insert into testing.set_returning_in_select_list values ( 1 ,
x'1010'::int , x'0101'::int ) , ( 2 , x'2020'::int , x'0202'::int ) , (
3 , x'3030'::int , x'0303'::int );
select id , to_hex(bitmask_a) as hex_a , unnest( array [ ( bitmask_a &
16 ) ,  ( bitmask_a & 32 ) , ( bitmask_a & 48 ) ] ) as bits_of_a ,
               to_hex(bitmask_b) as hex_b from
testing.set_returning_in_select_list order by bits_of_a;
select id , to_hex(bitmask_a) as hex_a , unnest( array [ ( bitmask_a &
16 ) ,  ( bitmask_a & 32 ) , ( bitmask_a & 48 ) ] ) as bits_of_a ,
               to_hex(bitmask_b) as hex_b , unnest( array [ ( bitmask_b
& 1 ) ,  ( bitmask_b & 2 ) , ( bitmask_b & 3 ) ] ) as bits_of_b
 from testing.set_returning_in_select_list order by bits_of_a;
select id , to_hex(bitmask_a) as hex_a , unnest( array [ ( bitmask_a &
16 ) ,  ( bitmask_a & 32 ) , ( bitmask_a & 48 ) ] ) as bits_of_a ,
               to_hex(bitmask_b) as hex_b , unnest( array [ ( bitmask_b
& 3 ) ,  ( bitmask_b & 2 ) , ( bitmask_b & 1 ) ] ) as bits_of_b ,
               unnest( array [ ( bitmask_b & 3 ) ,  ( bitmask_b & 2 ) ,
( bitmask_b & 1 ) , ( bitmask_b & 17 ) ] ) as extraneous_bits_of_b
 from testing.set_returning_in_select_list order by bits_of_a;" | psql -e
</code>

The output of the last query is,  firstly,  different on postgresql-9.6
than on 13.3,   and secondly,     hard to explain in either case.
my outputs appended below.

I *think* :

 . postgresql-9.6 applies a kind of triple cartesian join on the
elements of each set, joining one set with the other set and then again
with itself,   ending up with many duplicates
 . postgresql-13.3 applies a kind of null-padding extension of the
elements of the smaller set and then an arbitrary linear pairing of the
two sets,   ending up with one duplicate.

Which one,  if any,  is correct?    And why?

(note : I am not asking about the case of Set-Returning functions in the
FROM clause,  where things seem clear(er).)

Cheers,   John Lumby

My outputs of the final query :

 . postgresql-9.6

<code>

 id | hex_a | bits_of_a | hex_b | bits_of_b | extraneous_bits_of_b
----+-------+-----------+-------+-----------+----------------------
  1 | 1010  |         0 | 101   |         0 |                    0
  1 | 1010  |         0 | 101   |         0 |                    1
  1 | 1010  |         0 | 101   |         0 |                    1
  1 | 1010  |         0 | 101   |         0 |                    1
  2 | 2020  |         0 | 202   |         2 |                    2
  2 | 2020  |         0 | 202   |         2 |                    2
  2 | 2020  |         0 | 202   |         2 |                    0
  2 | 2020  |         0 | 202   |         2 |                    0
  1 | 1010  |        16 | 101   |         1 |                    1
  1 | 1010  |        16 | 101   |         1 |                    1
  1 | 1010  |        16 | 101   |         1 |                    1
  1 | 1010  |        16 | 101   |         1 |                    0
  1 | 1010  |        16 | 101   |         1 |                    1
  1 | 1010  |        16 | 101   |         1 |                    1
  1 | 1010  |        16 | 101   |         1 |                    0
  1 | 1010  |        16 | 101   |         1 |                    1
  3 | 3030  |        16 | 303   |         3 |                    3
  3 | 3030  |        16 | 303   |         3 |                    1
  3 | 3030  |        16 | 303   |         3 |                    1
  3 | 3030  |        16 | 303   |         3 |                    2
  2 | 2020  |        32 | 202   |         2 |                    0
  2 | 2020  |        32 | 202   |         2 |                    2
  2 | 2020  |        32 | 202   |         2 |                    0
  2 | 2020  |        32 | 202   |         0 |                    0
  2 | 2020  |        32 | 202   |         2 |                    2
  2 | 2020  |        32 | 202   |         0 |                    2
  2 | 2020  |        32 | 202   |         0 |                    0
  2 | 2020  |        32 | 202   |         0 |                    2
  3 | 3030  |        32 | 303   |         2 |                    2
  3 | 3030  |        32 | 303   |         2 |                    3
  3 | 3030  |        32 | 303   |         2 |                    1
  3 | 3030  |        32 | 303   |         2 |                    1
  3 | 3030  |        48 | 303   |         1 |                    3
  3 | 3030  |        48 | 303   |         1 |                    2
  3 | 3030  |        48 | 303   |         1 |                    1
  3 | 3030  |        48 | 303   |         1 |                    1
(36 rows)

</code>

 . postgresql-13.3

<code>

 id | hex_a | bits_of_a | hex_b | bits_of_b | extraneous_bits_of_b
----+-------+-----------+-------+-----------+----------------------
  1 | 1010  |         0 | 101   |         0 |                    0
  2 | 2020  |         0 | 202   |         2 |                    2
  1 | 1010  |        16 | 101   |         1 |                    1
  1 | 1010  |        16 | 101   |         1 |                    1
  3 | 3030  |        16 | 303   |         3 |                    3
  2 | 2020  |        32 | 202   |         0 |                    0
  2 | 2020  |        32 | 202   |         2 |                    2
  3 | 3030  |        32 | 303   |         2 |                    2
  3 | 3030  |        48 | 303   |         1 |                    1
  1 | 1010  |           | 101   |           |                    1
  3 | 3030  |           | 303   |           |                    1
  2 | 2020  |           | 202   |           |                    0
(12 rows)
</code>

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Lumby (#3)
docsgeneral
Re: Set-Returning functions in a select list

J Lumby <johnlumby@hotmail.com> <DM5PR06MB3082AEC288F546F7AF5393E3A3E69@DM5PR06MB3082.namprd06.prod.outlook.com> writes:

I am unclear about the rules which postgresql uses in building a result
set when the select list ( see note ) contains several different
Set-Returning functions,  possibly with sets of different sizes.

The docs you want exist, but they're a bit buried ... see

https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET

In particular, the NOTE at the bottom of that section explains that
the behavior changed in v10.

regards, tom lane