per-database locale: createdb switches

Started by Alvaro Herrera about 17 years ago, 22 messages
#1 Alvaro Herrera
alvherre@commandprompt.com

Hi,

I just noticed that the interface for choosing a different locale at db
creation time is
createdb --lc-collate=X --lc-ctype=X. Is there a reason for having
these two separate switches? It seems awkward; why can't we just have a
single --locale switch that selects both settings at once?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#2 Teodor Sigaev
teodor@sigaev.ru
In reply to: Alvaro Herrera (#1)
Re: per-database locale: createdb switches

Alvaro Herrera wrote:

Hi,

I just noticed that the interface for choosing a different locale at db
creation time is
createdb --lc-collate=X --lc-ctype=X. Is there a reason for having
these two separate switches? It seems awkward; why can't we just have a
single --locale switch that selects both settings at once?

Sometimes you need to use C collation with a non-C ctype. But for most
users a single locale switch is enough. What about a
[--locale=X | --lc-collate=X --lc-ctype=X] option?
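
The combination mentioned above (C collation with a different ctype) is possible because the C library treats LC_COLLATE and LC_CTYPE as separate categories. A minimal sketch, assuming only the standard setlocale() and strcoll() calls; the helper names set_collate_and_ctype and sorts_before are hypothetical and not part of PostgreSQL:

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: set the collation and character-classification
 * categories of the current process independently of each other.
 * Returns 0 on success, -1 if the C library rejects either locale name.
 */
int
set_collate_and_ctype(const char *collate, const char *ctype)
{
	/* A NULL name would only query the current setting, so forbid it. */
	assert(collate != NULL && ctype != NULL);

	if (setlocale(LC_COLLATE, collate) == NULL)
		return -1;
	if (setlocale(LC_CTYPE, ctype) == NULL)
		return -1;
	return 0;
}

/* Returns 1 if a sorts before b under the current LC_COLLATE setting. */
int
sorts_before(const char *a, const char *b)
{
	return strcoll(a, b) < 0;
}
```

With LC_COLLATE set to "C", comparisons follow byte order, so "B" sorts before "a", whatever LC_CTYPE is set to. Which non-C locale names are accepted depends on what is installed on the system.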

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#2)
Re: per-database locale: createdb switches

Teodor Sigaev <teodor@sigaev.ru> writes:

Alvaro Herrera wrote:

It seems awkward; why can't we just have a
single --locale switch that selects both settings at once?

Sometimes you need to use C collation with a non-C ctype. But for most
users a single locale switch is enough. What about a
[--locale=X | --lc-collate=X --lc-ctype=X] option?

Seems to me there's one there already.

regards, tom lane

#4 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#3)
Re: per-database locale: createdb switches

Tom Lane wrote:

Teodor Sigaev <teodor@sigaev.ru> writes:

Alvaro Herrera wrote:

It seems awkward; why can't we just have a
single --locale switch that selects both settings at once?

Sometimes you need to use C collation with a non-C ctype. But for most
users a single locale switch is enough. What about a
[--locale=X | --lc-collate=X --lc-ctype=X] option?

Seems to me there's one there already.

You're thinking of initdb maybe? I'm talking about createdb.

$ LC_ALL=C createdb --version
createdb (PostgreSQL) 8.4devel

$ LC_ALL=C createdb --help
createdb creates a PostgreSQL database.

Usage:
createdb [OPTION]... [DBNAME] [DESCRIPTION]

Options:
-D, --tablespace=TABLESPACE default tablespace for the database
-E, --encoding=ENCODING encoding for the database
--lc-collate=LOCALE LC_COLLATE setting for the database
--lc-ctype=LOCALE LC_CTYPE setting for the database
-O, --owner=OWNER database user to own the new database
-T, --template=TEMPLATE template database to copy
-e, --echo show the commands being sent to the server
--help show this help, then exit
--version output version information, then exit

Connection options:
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port
-U, --username=USERNAME user name to connect as
-W, --password force password prompt

By default, a database with the same name as the current user is created.

Report bugs to <pgsql-bugs@postgresql.org>.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#4)
Re: per-database locale: createdb switches

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

Seems to me there's one there already.

You're thinking of initdb maybe? I'm talking about createdb.

Oh, okay. But how often is someone going to be changing locales during
createdb? I think the most common case might well be like Teodor said,
where you need to tweak them individually anyway.

regards, tom lane

#6 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#5)
Re: per-database locale: createdb switches

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

Seems to me there's one there already.

You're thinking of initdb maybe? I'm talking about createdb.

Oh, okay. But how often is someone going to be changing locales during
createdb? I think the most common case might well be like Teodor said,
where you need to tweak them individually anyway.

Frequently, I think. In fact I think creating a database in a different
language is going to be more frequent than tweaking the settings
individually.

I like Teodor's proposal; I'll see about implementing that.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#7 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Alvaro Herrera (#6)
1 attachment(s)
Re: per-database locale: createdb switches

Alvaro Herrera wrote:

I like Teodor's proposal; I'll see about implementing that.

Attached.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachments:

createdb-locale.patch (text/x-diff; charset=us-ascii)
Index: src/bin/scripts/createdb.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/bin/scripts/createdb.c,v
retrieving revision 1.27
diff -c -p -r1.27 createdb.c
*** src/bin/scripts/createdb.c	23 Sep 2008 09:20:38 -0000	1.27
--- src/bin/scripts/createdb.c	10 Nov 2008 15:45:09 -0000
*************** main(int argc, char *argv[])
*** 34,39 ****
--- 34,40 ----
  		{"encoding", required_argument, NULL, 'E'},
  		{"lc-collate", required_argument, NULL, 1},
  		{"lc-ctype", required_argument, NULL, 2},
+ 		{"locale", required_argument, NULL, 'l'},
  		{NULL, 0, NULL, 0}
  	};
  
*************** main(int argc, char *argv[])
*** 54,59 ****
--- 55,61 ----
  	char	   *encoding = NULL;
  	char	   *lc_collate = NULL;
  	char	   *lc_ctype = NULL;
+ 	char	   *locale = NULL;
  
  	PQExpBufferData sql;
  
*************** main(int argc, char *argv[])
*** 65,71 ****
  
  	handle_help_version_opts(argc, argv, "createdb", help);
  
! 	while ((c = getopt_long(argc, argv, "h:p:U:WeqO:D:T:E:", long_options, &optindex)) != -1)
  	{
  		switch (c)
  		{
--- 67,73 ----
  
  	handle_help_version_opts(argc, argv, "createdb", help);
  
! 	while ((c = getopt_long(argc, argv, "h:p:U:WeqO:D:T:E:l:", long_options, &optindex)) != -1)
  	{
  		switch (c)
  		{
*************** main(int argc, char *argv[])
*** 105,110 ****
--- 107,115 ----
  			case 2:
  				lc_ctype = optarg;
  				break;
+ 			case 'l':
+ 				locale = optarg;
+ 				break;
  			default:
  				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
  				exit(1);
*************** main(int argc, char *argv[])
*** 129,134 ****
--- 134,157 ----
  			exit(1);
  	}
  
+ 	if (locale)
+ 	{
+ 		if (lc_ctype)
+ 		{
+ 			fprintf(stderr, _("%s: only one of --locale and --lc-ctype can be specified\n"),
+ 					progname);
+ 			exit(1);
+ 		}
+ 		if (lc_collate)
+ 		{
+ 			fprintf(stderr, _("%s: only one of --locale and --lc-collate can be specified\n"),
+ 					progname);
+ 			exit(1);
+ 		}
+ 		lc_ctype = locale;
+ 		lc_collate = locale;
+ 	}
+ 
  	if (encoding)
  	{
  		if (pg_char_to_encoding(encoding) < 0)
*************** help(const char *progname)
*** 226,231 ****
--- 249,255 ----
  	printf(_("  -E, --encoding=ENCODING      encoding for the database\n"));
  	printf(_("  --lc-collate=LOCALE          LC_COLLATE setting for the database\n"));
  	printf(_("  --lc-ctype=LOCALE            LC_CTYPE setting for the database\n"));
+ 	printf(_("  -l, --locale=LOCALE          locale settings for the database\n"));
  	printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
  	printf(_("  -T, --template=TEMPLATE      template database to copy\n"));
  	printf(_("  -e, --echo                   show the commands being sent to the server\n"));
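
The conflict check added by the patch can be read in isolation as a small function. The following is a minimal sketch rather than the committed code: struct locale_opts and resolve_locale_opts are hypothetical names, and where createdb.c prints a message and exits, the sketch returns the name of the conflicting switch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the three command-line values. */
struct locale_opts
{
	const char *locale;		/* value of --locale, or NULL */
	const char *lc_collate;	/* value of --lc-collate, or NULL */
	const char *lc_ctype;	/* value of --lc-ctype, or NULL */
};

/*
 * Apply the rule from the patch: --locale may not be combined with either
 * individual switch; when given alone it fills in both settings.
 * Returns NULL on success, or the name of the switch that conflicts with
 * --locale. On conflict the struct is left unchanged.
 */
const char *
resolve_locale_opts(struct locale_opts *opts)
{
	assert(opts != NULL);

	if (opts->locale == NULL)
		return NULL;			/* nothing to expand */
	if (opts->lc_ctype != NULL)
		return "--lc-ctype";
	if (opts->lc_collate != NULL)
		return "--lc-collate";

	opts->lc_ctype = opts->locale;
	opts->lc_collate = opts->locale;
	return NULL;
}
```

As in the patch, the --lc-ctype conflict is reported first when both individual switches are given together with --locale.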

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#7)
Re: per-database locale: createdb switches

Alvaro Herrera <alvherre@commandprompt.com> writes:

Alvaro Herrera wrote:

I like Teodor's proposal; I'll see about implementing that.

Attached.

You missed updating the sgml docs, and personally I'd be inclined to
list -l before the individual --lc switches; otherwise it looks fine.

regards, tom lane

#9 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#8)
Re: per-database locale: createdb switches

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Alvaro Herrera wrote:

I like Teodor's proposal; I'll see about implementing that.

Attached.

You missed updating the sgml docs, and personally I'd be inclined to
list -l before the individual --lc switches; otherwise it looks fine.

Thanks, committed that way. I noticed that --lc-ctype and --lc-collate
were forgotten in SGML docs, so I added them too.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#10 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alvaro Herrera (#9)
Re: per-database locale: createdb switches

Alvaro Herrera wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Alvaro Herrera wrote:

I like Teodor's proposal; I'll see about implementing that.

Attached.

You missed updating the sgml docs, and personally I'd be inclined to
list -l before the individual --lc switches; otherwise it looks fine.

Thanks, committed that way. I noticed that --lc-ctype and --lc-collate
were forgotten in SGML docs, so I added them too.

Should we have a shorthand CREATE DATABASE option like that as well?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#11 Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#10)
Re: per-database locale: createdb switches

Heikki Linnakangas wrote:

Alvaro Herrera wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Alvaro Herrera wrote:

I like Teodor's proposal; I'll see about implementing that.

Attached.

You missed updating the sgml docs, and personally I'd be inclined to
list -l before the individual --lc switches; otherwise it looks fine.

Thanks, committed that way. I noticed that --lc-ctype and --lc-collate
were forgotten in SGML docs, so I added them too.

Should we have a shorthand CREATE DATABASE option like that as well?

createdb is really about convenience; not sure it is warranted for
CREATE DATABASE.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#12 Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#11)
Re: per-database locale: createdb switches

Bruce Momjian wrote:

Heikki Linnakangas wrote:

Alvaro Herrera wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Alvaro Herrera wrote:

I like Teodor's proposal; I'll see about implementing that.

Attached.

You missed updating the sgml docs, and personally I'd be inclined to
list -l before the individual --lc switches; otherwise it looks fine.

Thanks, committed that way. I noticed that --lc-ctype and --lc-collate
were forgotten in SGML docs, so I added them too.

Should we have a shorthand CREATE DATABASE option like that as well?

createdb is really about convenience; not sure it is warranted for
CREATE DATABASE.

I think unless you are doing something completely funny, you would
usually want to have COLLATE and CTYPE equal. The fact that you now
have to enter both to get that result could be pretty annoying in
practice, I would think.

#13 Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#12)
Re: per-database locale: createdb switches

Peter Eisentraut wrote:

Bruce Momjian wrote:

Heikki Linnakangas wrote:

Alvaro Herrera wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Alvaro Herrera wrote:

I like Teodor's proposal; I'll see about implementing that.

Attached.

You missed updating the sgml docs, and personally I'd be inclined to
list -l before the individual --lc switches; otherwise it looks fine.

Thanks, committed that way. I noticed that --lc-ctype and --lc-collate
were forgotten in SGML docs, so I added them too.

Should we have a shorthand CREATE DATABASE option like that as well?

createdb is really about convenience; not sure it is warranted for
CREATE DATABASE.

I think unless you are doing something completely funny, you would
usually want to have COLLATE and CTYPE equal. The fact that you now
have to enter both to get that result could be pretty annoying in
practice, I would think.

I agree but I can't think of many cases where we offer one option which
controls two other options; can you?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#14 Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#13)
Re: per-database locale: createdb switches

Bruce Momjian wrote:

You missed updating the sgml docs, and personally I'd be inclined to
list -l before the individual --lc switches; otherwise it looks fine.

Thanks, committed that way. I noticed that --lc-ctype and --lc-collate
were forgotten in SGML docs, so I added them too.

Should we have a shorthand CREATE DATABASE option like that as well?

createdb is really about convenience; not sure it is warranted for
CREATE DATABASE.

I think unless you are doing something completely funny, you would
usually want to have COLLATE and CTYPE equal. The fact that you now
have to enter both to get that result could be pretty annoying in
practice, I would think.

I agree but I can't think of many cases where we offer one option which
controls two other options; can you?

We have cases like that:

initdb --locale
createdb --locale

It looks to me, however, that there is possible confusion about what
createdb --locale (as well as any possible option to be added to CREATE
DATABASE) really affects:

initdb --locale controls --lc-ctype, --lc-collate, --lc-messages,
--lc-monetary, --lc-numeric, --lc-time.

createdb --locale only controls --lc-ctype and --lc-collate. The
functionality to have database-specific settings of the other locale
categories already exists, so why shouldn't those be set as well?

Which raises yet another question, why CTYPE and COLLATE have to be
hardcoded settings and catalog columns instead of being stored in
datconfig as database-startup-only settings?
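
The difference in scope between the two --locale switches, as listed above, can be written down as two tables. This is only an illustrative sketch: the array and function names are hypothetical, and the category lists are taken from this message rather than from the initdb or createdb sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Categories that initdb --locale sets, per the list above. */
static const char *const initdb_locale_categories[] = {
	"lc-ctype", "lc-collate", "lc-messages",
	"lc-monetary", "lc-numeric", "lc-time"
};

/* Categories that createdb --locale sets, per the list above. */
static const char *const createdb_locale_categories[] = {
	"lc-ctype", "lc-collate"
};

/* Returns 1 if category appears in list[0..n-1], else 0. */
static int
category_in(const char *const *list, size_t n, const char *category)
{
	size_t		i;

	assert(list != NULL && category != NULL);
	for (i = 0; i < n; i++)
	{
		if (strcmp(list[i], category) == 0)
			return 1;
	}
	return 0;
}

int
initdb_locale_covers(const char *category)
{
	return category_in(initdb_locale_categories,
					   sizeof(initdb_locale_categories) /
					   sizeof(initdb_locale_categories[0]),
					   category);
}

int
createdb_locale_covers(const char *category)
{
	return category_in(createdb_locale_categories,
					   sizeof(createdb_locale_categories) /
					   sizeof(createdb_locale_categories[0]),
					   category);
}
```

A category such as lc-messages is covered by the first table but not the second, which is the source of the possible confusion.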

#15 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Eisentraut (#14)
Re: per-database locale: createdb switches

Peter Eisentraut wrote:

Which raises yet another question, why CTYPE and COLLATE have to be
hardcoded settings and catalog columns instead of being stored in
datconfig as database-startup-only settings?

Because changing CTYPE or COLLATE in an existing database would render
indexes broken.

Perhaps we could've put them in datconfig, and forbidden changing them
after CREATE DATABASE. Then again, encoding is a similar setting too,
and that's stored in a catalog column.
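
Why a changed collation breaks indexes can be shown with a toy example: a sorted array stands in for a btree index, and a binary search stands in for an index lookup. This is a sketch under stated assumptions, not PostgreSQL code. All names are hypothetical, and a case-insensitive ordering stands in for a non-C locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef int (*collate_fn) (const char *, const char *);

/* Stand-in for a non-C collation: compare ignoring ASCII case. */
int
collate_ci(const char *a, const char *b)
{
	const unsigned char *s = (const unsigned char *) a;
	const unsigned char *t = (const unsigned char *) b;

	while (*s != '\0' && tolower(*s) == tolower(*t))
	{
		s++;
		t++;
	}
	return tolower(*s) - tolower(*t);
}

/* Stand-in for the C collation: plain byte order. */
int
collate_bytes(const char *a, const char *b)
{
	return strcmp(a, b);
}

/* Toy "index": keys stored in collate_ci order when it was built. */
static const char *const toy_index[] = {
	"apple", "Banana", "cherry", "Date", "elder"
};

/* Binary search of toy_index using cmp; returns 1 if key is found. */
int
index_lookup(const char *key, collate_fn cmp)
{
	size_t		lo = 0;
	size_t		hi = sizeof(toy_index) / sizeof(toy_index[0]);

	assert(key != NULL && cmp != NULL);
	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;
		int			c = cmp(key, toy_index[mid]);

		if (c == 0)
			return 1;
		if (c < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return 0;
}
```

Looking up "apple" with collate_ci finds the key, while the same lookup with collate_bytes misses it even though it is present: in byte order "apple" sorts after "Banana", so the search moves away from the slot where the key is stored.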

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#16 Peter Eisentraut
peter_e@gmx.net
In reply to: Heikki Linnakangas (#15)
Re: per-database locale: createdb switches

Heikki Linnakangas wrote:

Peter Eisentraut wrote:

Which raises yet another question, why CTYPE and COLLATE have to be
hardcoded settings and catalog columns instead of being stored in
datconfig as database-startup-only settings?

Because changing CTYPE or COLLATE in an existing database would render
indexes broken.

Perhaps we could've put them in datconfig, and forbidden changing them
after CREATE DATABASE. Then again, encoding is a similar setting too,
and that's stored in a catalog column.

Yeah, it's a tricky case somewhere in between all the facilities that we
already have.

I notice in the documentation that the createdb --lc-ctype sets the
lc_ctype setting for the database, but the corresponding parameter for
CREATE DATABASE is CTYPE, but the global GUC setting is lc_ctype.
Should that be more consistent?

#17 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Eisentraut (#16)
Re: per-database locale: createdb switches

Peter Eisentraut wrote:

Heikki Linnakangas wrote:

Peter Eisentraut wrote:

Which raises yet another question, why CTYPE and COLLATE have to be
hardcoded settings and catalog columns instead of being stored in
datconfig as database-startup-only settings?

Because changing CTYPE or COLLATE in an existing database would render
indexes broken.

Perhaps we could've put them in datconfig, and forbidden changing them
after CREATE DATABASE. Then again, encoding is a similar setting too,
and that's stored in a catalog column.

Yeah, it's a tricky case somewhere in between all the facilities that we
already have.

I notice in the documentation that the createdb --lc-ctype sets the
lc_ctype setting for the database, but the corresponding parameter for
CREATE DATABASE is CTYPE, but the global GUC setting is lc_ctype. Should
that be more consistent?

Hmm, I remember I pondered for a long time if it should be COLLATE and
CTYPE or LC_COLLATE and LC_CTYPE. I think the rationale in the end was
that a) COLLATE/CTYPE looks nicer and b) if we add support for ICU or
some other collation implementation, the association with LC_*
environment variables becomes misleading.

Being consistent would be nice, though.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#18 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Heikki Linnakangas (#17)
Re: per-database locale: createdb switches

Heikki Linnakangas wrote:

Peter Eisentraut wrote:

I notice in the documentation that the createdb --lc-ctype sets the
lc_ctype setting for the database, but the corresponding parameter for
CREATE DATABASE is CTYPE, but the global GUC setting is lc_ctype.
Should that be more consistent?

Hmm, I remember I pondered for a long time if it should be COLLATE and
CTYPE or LC_COLLATE and LC_CTYPE. I think the rationale in the end was
that a) COLLATE/CTYPE looks nicer and b) if we add support for ICU or
some other collation implementation, the association with LC_*
environment variables becomes misleading.

Being consistent would be nice, though.

I think consistency could be reached by renaming the GUC setting to
ctype. We could add a "lc_ctype" synonym for backwards compatibility
(like sort_mem) -- or maybe not.

Since the createdb setting is new as of 8.4, we should just rename that
to ctype as well.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#18)
Re: per-database locale: createdb switches

Alvaro Herrera <alvherre@commandprompt.com> writes:

Heikki Linnakangas wrote:

Hmm, I remember I pondered for a long time if it should be COLLATE and
CTYPE or LC_COLLATE and LC_CTYPE. I think the rationale in the end was
that a) COLLATE/CTYPE looks nicer and b) if we add support for ICU or
some other collation implementation, the association with LC_*
environment variables becomes misleading.

Being consistent would be nice, though.

I think consistency could be reached by renaming the GUC setting to
ctype.

I think this is a bad idea, particularly if you also rename the other
GUC to COLLATE (which is a reserved word that we're going to have to
implement someday). People know what LC_CTYPE and LC_COLLATE do,
at least if they've heard of Unix locale support at all (and if not
they can google those names successfully).

If we want consistency then the right answer is to rename the *new*
things to lc_xxx, not break compatibility on the names of the
existing things.

regards, tom lane

#20 Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#14)
Re: per-database locale: createdb switches

Peter Eisentraut wrote:

Bruce Momjian wrote:

You missed updating the sgml docs, and personally I'd be inclined to
list -l before the individual --lc switches; otherwise it looks fine.

Thanks, committed that way. I noticed that --lc-ctype and --lc-collate
were forgotten in SGML docs, so I added them too.

Should we have a shorthand CREATE DATABASE option like that as well?

createdb is really about convenience; not sure it is warranted for
CREATE DATABASE.

I think unless you are doing something completely funny, you would
usually want to have COLLATE and CTYPE equal. The fact that you now
have to enter both to get that result could be pretty annoying in
practice, I would think.

I agree but I can't think of many cases where we offer one option which
controls two other options; can you?

We have cases like that:

initdb --locale
createdb --locale

It looks to me, however, that there is possible confusion about what
createdb --locale (as well as any possible option to be added to CREATE
DATABASE) really affects:

initdb --locale controls --lc-ctype, --lc-collate, --lc-messages,
--lc-monetary, --lc-numeric, --lc-time.

createdb --locale only controls --lc-ctype and --lc-collate. The
functionality to have database-specific settings of the other locale
categories already exists, so why shouldn't those be set as well?

Which raises yet another question, why CTYPE and COLLATE have to be
hardcoded settings and catalog columns instead of being stored in
datconfig as database-startup-only settings?

I was asking for cases where _SQL_ commands have one parameter that
controls two others, not command-line examples. Can you think of any?

FYI, I am fine adding the SQL-level option, I was just asking.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#21 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#19)
Re: per-database locale: createdb switches

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Heikki Linnakangas wrote:

Hmm, I remember I pondered for a long time if it should be COLLATE and
CTYPE or LC_COLLATE and LC_CTYPE. I think the rationale in the end was
that a) COLLATE/CTYPE looks nicer and b) if we add support for ICU or
some other collation implementation, the association with LC_*
environment variables becomes misleading.

Being consistent would be nice, though.

I think consistency could be reached by renaming the GUC setting to
ctype.

I think this is a bad idea, particularly if you also rename the other
GUC to COLLATE (which is a reserved word that we're going to have to
implement someday). People know what LC_CTYPE and LC_COLLATE do,
at least if they've heard of Unix locale support at all (and if not
they can google those names successfully).

If we want consistency then the right answer is to rename the *new*
things to lc_xxx, not break compatibility on the names of the
existing things.

Is anyone working on resolving this?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#22 Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#21)
Re: per-database locale: createdb switches

Bruce Momjian wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Heikki Linnakangas wrote:

Hmm, I remember I pondered for a long time if it should be COLLATE and
CTYPE or LC_COLLATE and LC_CTYPE. I think the rationale in the end was
that a) COLLATE/CTYPE looks nicer and b) if we add support for ICU or
some other collation implementation, the association with LC_*
environment variables becomes misleading.

Being consistent would be nice, though.

I think consistency could be reached by renaming the GUC setting to
ctype.

I think this is a bad idea, particularly if you also rename the other
GUC to COLLATE (which is a reserved word that we're going to have to
implement someday). People know what LC_CTYPE and LC_COLLATE do,
at least if they've heard of Unix locale support at all (and if not
they can google those names successfully).

If we want consistency then the right answer is to rename the *new*
things to lc_xxx, not break compatibility on the names of the
existing things.

Is anyone working on resolving this?

I think we can just leave it for now.