[WIP] patch - Collation at database level

Started by Radek Strnadover 17 years ago31 messages

radek.strnad@gmail.com

over 17 years ago

1 attachment(s)

Hello,

I'm sending part of the code that I've done and is available for reviewing
and I'm asking for your comments and some help because I'm new to
PostgreSQL.

Proposal: http://archives.postgresql.org/pgsql-hackers/2008-05/msg00857.php

The code contains changes of parser, pg_collation catalog, parsenodes.h for
CREATE COLLATION... and DROP COLLATION statements.

Questions that I would like to know answers:

1) Should I move DROP statement from DropCollationStmt to DropStmt? Some
statements are seperated and some are included in DropStmt? Is there any
reason for that?
2) What type should all names in CREATE and DROP statement in gram.y have?
I've chosen qualified_name but I know it's not the best choice.
3) All collations are created from existing collations. How do I ensure that
the collation already exists? Is there any possibility to define it in
gram.y?
4) For further functionality development is there anything more needed than
adding T_CreateCollationStmt and T_DropCollationStmt to
/src/backend/tcop/utility.c, and write functionality into collation.c?
5) Also can you look at the pg_catalog and tell me if anything is wrong with
it?

Thank you for all your replies in advance.

Regards

Radek Strnad

Attachments:

collation.patchtext/x-diff; name=collation.patchDownload

Index: backend/parser/keywords.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/parser/keywords.c,v
retrieving revision 1.194
diff -c -r1.194 keywords.c
*** backend/parser/keywords.c	1 Jan 2008 19:45:50 -0000	1.194
--- backend/parser/keywords.c	1 Jul 2008 14:29:26 -0000
***************
*** 33,38 ****
--- 33,39 ----
  	/* name, value, category */
  	{"abort", ABORT_P, UNRESERVED_KEYWORD},
  	{"absolute", ABSOLUTE_P, UNRESERVED_KEYWORD},
+ 	{"accent", ACCENT, UNRESERVED_KEYWORD},
  	{"access", ACCESS, UNRESERVED_KEYWORD},
  	{"action", ACTION, UNRESERVED_KEYWORD},
  	{"add", ADD_P, UNRESERVED_KEYWORD},
***************
*** 81,87 ****
  	{"close", CLOSE, UNRESERVED_KEYWORD},
  	{"cluster", CLUSTER, UNRESERVED_KEYWORD},
  	{"coalesce", COALESCE, COL_NAME_KEYWORD},
! 	{"collate", COLLATE, RESERVED_KEYWORD},
  	{"column", COLUMN, RESERVED_KEYWORD},
  	{"comment", COMMENT, UNRESERVED_KEYWORD},
  	{"commit", COMMIT, UNRESERVED_KEYWORD},
--- 82,89 ----
  	{"close", CLOSE, UNRESERVED_KEYWORD},
  	{"cluster", CLUSTER, UNRESERVED_KEYWORD},
  	{"coalesce", COALESCE, COL_NAME_KEYWORD},
! 	{"collate", COLLATE, UNRESERVED_KEYWORD},
! 	{"collation", COLLATION, UNRESERVED_KEYWORD},
  	{"column", COLUMN, RESERVED_KEYWORD},
  	{"comment", COMMENT, UNRESERVED_KEYWORD},
  	{"commit", COMMIT, UNRESERVED_KEYWORD},
***************
*** 206,211 ****
--- 208,215 ----
  	{"language", LANGUAGE, UNRESERVED_KEYWORD},
  	{"large", LARGE_P, UNRESERVED_KEYWORD},
  	{"last", LAST_P, UNRESERVED_KEYWORD},
+ 	{"lccollate", LCCOLLATE, UNRESERVED_KEYWORD},
+ 	{"lcctype", LCCTYPE, UNRESERVED_KEYWORD},
  	{"leading", LEADING, RESERVED_KEYWORD},
  	{"least", LEAST, COL_NAME_KEYWORD},
  	{"left", LEFT, TYPE_FUNC_NAME_KEYWORD},
***************
*** 270,275 ****
--- 274,280 ----
  	{"overlay", OVERLAY, COL_NAME_KEYWORD},
  	{"owned", OWNED, UNRESERVED_KEYWORD},
  	{"owner", OWNER, UNRESERVED_KEYWORD},
+ 	{"pad", PAD, UNRESERVED_KEYWORD},
  	{"parser", PARSER, UNRESERVED_KEYWORD},
  	{"partial", PARTIAL, UNRESERVED_KEYWORD},
  	{"password", PASSWORD, UNRESERVED_KEYWORD},
***************
*** 317,322 ****
--- 322,328 ----
  	{"second", SECOND_P, UNRESERVED_KEYWORD},
  	{"security", SECURITY, UNRESERVED_KEYWORD},
  	{"select", SELECT, RESERVED_KEYWORD},
+ 	{"sensitive", SENSITIVE, UNRESERVED_KEYWORD},
  	{"sequence", SEQUENCE, UNRESERVED_KEYWORD},
  	{"serializable", SERIALIZABLE, UNRESERVED_KEYWORD},
  	{"session", SESSION, UNRESERVED_KEYWORD},
***************
*** 329,334 ****
--- 335,341 ----
  	{"simple", SIMPLE, UNRESERVED_KEYWORD},
  	{"smallint", SMALLINT, COL_NAME_KEYWORD},
  	{"some", SOME, RESERVED_KEYWORD},
+ 	{"space", SPACE, UNRESERVED_KEYWORD},
  	{"stable", STABLE, UNRESERVED_KEYWORD},
  	{"standalone", STANDALONE_P, UNRESERVED_KEYWORD},
  	{"start", START, UNRESERVED_KEYWORD},
***************
*** 337,342 ****
--- 344,350 ----
  	{"stdin", STDIN, UNRESERVED_KEYWORD},
  	{"stdout", STDOUT, UNRESERVED_KEYWORD},
  	{"storage", STORAGE, UNRESERVED_KEYWORD},
+ 	{"strcolfn", STRCOLFN, UNRESERVED_KEYWORD},
  	{"strict", STRICT_P, UNRESERVED_KEYWORD},
  	{"strip", STRIP_P, UNRESERVED_KEYWORD},
  	{"substring", SUBSTRING, COL_NAME_KEYWORD},
Index: backend/parser/gram.y
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.605
diff -c -r2.605 gram.y
*** backend/parser/gram.y	1 Jan 2008 19:45:50 -0000	2.605
--- backend/parser/gram.y	1 Jul 2008 14:29:25 -0000
***************
*** 149,168 ****
  }
  
  %type <node>	stmt schema_stmt
! 		AlterDatabaseStmt AlterDatabaseSetStmt AlterDomainStmt AlterGroupStmt
! 		AlterObjectSchemaStmt AlterOwnerStmt AlterSeqStmt AlterTableStmt
! 		AlterUserStmt AlterUserSetStmt AlterRoleStmt AlterRoleSetStmt
! 		AnalyzeStmt ClosePortalStmt ClusterStmt CommentStmt
! 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
! 		CreateDomainStmt CreateGroupStmt CreateOpClassStmt
! 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
  		CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
  		CreateAssertStmt CreateTrigStmt CreateUserStmt CreateRoleStmt
! 		CreatedbStmt DeclareCursorStmt DefineStmt DeleteStmt DiscardStmt
! 		DropGroupStmt DropOpClassStmt DropOpFamilyStmt DropPLangStmt DropStmt
! 		DropAssertStmt DropTrigStmt DropRuleStmt DropCastStmt DropRoleStmt
! 		DropUserStmt DropdbStmt DropTableSpaceStmt ExplainStmt FetchStmt
! 		GrantStmt GrantRoleStmt IndexStmt InsertStmt ListenStmt LoadStmt
  		LockStmt NotifyStmt ExplainableStmt PreparableStmt
  		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
  		RemoveFuncStmt RemoveOperStmt RenameStmt RevokeStmt RevokeRoleStmt
--- 149,167 ----
  }
  
  %type <node>	stmt schema_stmt
! 		AlterDatabaseStmt AlterDatabaseSetStmt AlterDomainStmt AlterGroupStmt 
! 		AlterObjectSchemaStmt AlterOwnerStmt AlterSeqStmt AlterTableStmt 
! 		AlterUserStmt AlterUserSetStmt AlterRoleStmt AlterRoleSetStmt AnalyzeStmt 
! 		ClosePortalStmt ClusterStmt CommentStmt	ConstraintsSetStmt CopyStmt 
! 		CreateAsStmt CreateCastStmt	CreateCollationStmt CreateDomainStmt 
! 		CreateGroupStmt CreateOpClassStmt CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
  		CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
  		CreateAssertStmt CreateTrigStmt CreateUserStmt CreateRoleStmt
! 		CreatedbStmt DeclareCursorStmt DefineStmt DeleteStmt DiscardStmt 
! 		DropCollationStmt DropGroupStmt DropOpClassStmt DropOpFamilyStmt 
! 		DropPLangStmt DropStmt DropAssertStmt DropTrigStmt DropRuleStmt DropCastStmt 
! 		DropRoleStmt DropUserStmt DropdbStmt DropTableSpaceStmt ExplainStmt 
! 		FetchStmt GrantStmt GrantRoleStmt IndexStmt InsertStmt ListenStmt LoadStmt
  		LockStmt NotifyStmt ExplainableStmt PreparableStmt
  		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
  		RemoveFuncStmt RemoveOperStmt RenameStmt RevokeStmt RevokeRoleStmt
***************
*** 290,295 ****
--- 289,297 ----
  %type <list>	OptSeqList
  %type <defelt>	OptSeqElem
  
+ %type <list>	OptCollationList
+ %type <defelt>	OptCollationElem
+ 
  %type <istmt>	insert_rest
  
  %type <vsetstmt> set_rest SetResetClause
***************
*** 366,372 ****
   */
  
  /* ordinary key words in alphabetical order */
! %token <keyword> ABORT_P ABSOLUTE_P ACCESS ACTION ADD_P ADMIN AFTER
  	AGGREGATE ALL ALSO ALTER ALWAYS ANALYSE ANALYZE AND ANY ARRAY AS ASC
  	ASSERTION ASSIGNMENT ASYMMETRIC AT AUTHORIZATION
  
--- 368,374 ----
   */
  
  /* ordinary key words in alphabetical order */
! %token <keyword> ABORT_P ABSOLUTE_P ACCENT ACCESS ACTION ADD_P ADMIN AFTER
  	AGGREGATE ALL ALSO ALTER ALWAYS ANALYSE ANALYZE AND ANY ARRAY AS ASC
  	ASSERTION ASSIGNMENT ASYMMETRIC AT AUTHORIZATION
  
***************
*** 375,381 ****
  
  	CACHE CALLED CASCADE CASCADED CASE CAST CHAIN CHAR_P
  	CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
! 	CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
  	COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
  	CONTENT_P CONVERSION_P COPY COST CREATE CREATEDB
  	CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
--- 377,383 ----
  
  	CACHE CALLED CASCADE CASCADED CASE CAST CHAIN CHAR_P
  	CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
! 	CLUSTER COALESCE COLLATE COLLATION COLUMN COMMENT COMMIT
  	COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
  	CONTENT_P CONVERSION_P COPY COST CREATE CREATEDB
  	CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
***************
*** 404,410 ****
  
  	KEY
  
! 	LANCOMPILER LANGUAGE LARGE_P LAST_P LEADING LEAST LEFT LEVEL
  	LIKE LIMIT LISTEN LOAD LOCAL LOCALTIME LOCALTIMESTAMP LOCATION
  	LOCK_P LOGIN_P
  
--- 406,412 ----
  
  	KEY
  
! 	LANCOMPILER LANGUAGE LARGE_P LAST_P LCCOLLATE LCCTYPE LEADING LEAST LEFT LEVEL
  	LIKE LIMIT LISTEN LOAD LOCAL LOCALTIME LOCALTIMESTAMP LOCATION
  	LOCK_P LOGIN_P
  
***************
*** 417,423 ****
  	OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OR
  	ORDER OUT_P OUTER_P OVERLAPS OVERLAY OWNED OWNER
  
! 	PARSER PARTIAL PASSWORD PLACING PLANS POSITION
  	PRECISION PRESERVE PREPARE PREPARED PRIMARY
  	PRIOR PRIVILEGES PROCEDURAL PROCEDURE
  
--- 419,425 ----
  	OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OR
  	ORDER OUT_P OUTER_P OVERLAPS OVERLAY OWNED OWNER
  
! 	PAD PARSER PARTIAL PASSWORD PLACING PLANS POSITION
  	PRECISION PRESERVE PREPARE PREPARED PRIMARY
  	PRIOR PRIVILEGES PROCEDURAL PROCEDURE
  
***************
*** 427,437 ****
  	REPEATABLE REPLACE REPLICA RESET RESTART RESTRICT RETURNING RETURNS REVOKE
  	RIGHT ROLE ROLLBACK ROW ROWS RULE
  
! 	SAVEPOINT SCHEMA SCROLL SEARCH SECOND_P SECURITY SELECT SEQUENCE
! 	SERIALIZABLE SESSION SESSION_USER SET SETOF SHARE
! 	SHOW SIMILAR SIMPLE SMALLINT SOME STABLE STANDALONE_P START STATEMENT
! 	STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING SUPERUSER_P
! 	SYMMETRIC SYSID SYSTEM_P
  
  	TABLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
  	TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
--- 429,439 ----
  	REPEATABLE REPLACE REPLICA RESET RESTART RESTRICT RETURNING RETURNS REVOKE
  	RIGHT ROLE ROLLBACK ROW ROWS RULE
  
! 	SAVEPOINT SCHEMA SCROLL SEARCH SECOND_P SECURITY SELECT SENSITIVE
! 	SEQUENCE SERIALIZABLE SESSION SESSION_USER SET SETOF SHARE
! 	SHOW SIMILAR SIMPLE SMALLINT SOME SPACE STABLE STANDALONE_P START STATEMENT
! 	STATISTICS STDIN STDOUT STORAGE STRCOLFN STRICT_P STRIP_P SUBSTRING 
! 	SUPERUSER_P	SYMMETRIC SYSID SYSTEM_P
  
  	TABLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
  	TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
***************
*** 552,557 ****
--- 554,560 ----
  			| CreateAsStmt
  			| CreateAssertStmt
  			| CreateCastStmt
+ 			| CreateCollationStmt
  			| CreateConversionStmt
  			| CreateDomainStmt
  			| CreateFunctionStmt
***************
*** 575,580 ****
--- 578,584 ----
  			| DiscardStmt
  			| DropAssertStmt
  			| DropCastStmt
+ 			| DropCollationStmt
  			| DropGroupStmt
  			| DropOpClassStmt
  			| DropOpFamilyStmt
***************
*** 2410,2415 ****
--- 2414,2497 ----
  				}
  		;
  
+ /*****************************************************************************
+  *
+  *		QUERY :
+  *
+  * CREATE COLLATION <collation name> FOR <character set specification> 
+  * 		FROM <existing collation name> [STRCOLFN <fn name>] 
+  *		[ <pad characteristic> ] [ <case sensitive> ] [ <accent sensitive> ] 
+  *		[ LCCOLLATE <lc_collate> ] [ LCCTYPE <lc_ctype> ]
+  *
+  * DROP COLLATION <collation name>
+  *
+  *****************************************************************************/
+ 
+ CreateCollationStmt:
+ 			CREATE COLLATION qualified_name FOR qualified_name FROM qualified_name OptCollationList
+ 				{
+ 					CreateCollationStmt *n = makeNode(CreateCollationStmt);
+ 					n->name = $3;
+ 					n->for_charset = $5;
+ 					n->from_collation = $7;
+ 					n->options = $8;
+ 					$$ = (Node *)n;
+ 				}
+ 		;
+ 
+ OptCollationList: 
+ 			OptCollationList OptCollationElem		{ $$ = lappend($1, $2); }
+ 			| /*EMPTY*/								{ $$ = NIL; }
+ 		;
+ 
+ OptCollationElem:
+ 			NO PAD
+ 				{
+ 					$$ = makeDefElem("pad_characteristic", (Node *)makeInteger(FALSE));				
+ 				}
+ 			| PAD SPACE
+ 				{
+ 					$$ = makeDefElem("pad_characteristic", (Node *)makeInteger(TRUE));				
+ 				}
+ 			| CASE SENSITIVE
+ 				{
+ 					$$ = makeDefElem("case_sensitive", (Node *)makeInteger(TRUE));				
+ 				}
+ 			| CASE INSENSITIVE
+ 				{
+ 					$$ = makeDefElem("case_sensitive", (Node *)makeInteger(FALSE));								
+ 				}
+ 			| ACCENT SENSITIVE
+ 				{
+ 					$$ = makeDefElem("accent_sensitive", (Node *)makeInteger(TRUE));								
+ 				}
+ 			| ACCENT INSENSITIVE
+ 				{
+ 					$$ = makeDefElem("accent_sensitive", (Node *)makeInteger(FALSE));								
+ 				}
+ 			| LCCOLLATE qualified_name
+ 				{
+ 					$$ = makeDefElem("lc_collate_name", (Node *)makeString($2));				
+ 				}
+ 			| LCCTYPE qualified_name
+ 				{
+ 					$$ = makeDefElem("lc_ctype_name", (Node *)makeString($2));				
+ 				}
+ 			| STRCOLFN qualified_name
+ 				{
+ 					$$ = makeDefElem("str_col_function", (Node *)makeString($2));				
+ 				}
+ 		;
+ 
+ DropCollationStmt:
+ 			DROP COLLATION name
+ 				{
+ 					DropCollationStmt *n = makeNode(DropCollationStmt);
+ 					n->name = $3;
+ 					$$ = (Node *)n;
+ 				}
+ 		;
+ 
  
  /*****************************************************************************
   *
***************
*** 5349,5354 ****
--- 5431,5444 ----
  				{
  					$$ = makeDefElem("encoding", NULL);
  				}
+ 			| COLLATE opt_equal qualified_name
+ 				{
+ 					$$ = makeDefElem("collate", (Node *)makeString($3));
+ 				}
+ 			| COLLATE opt_equal DEFAULT
+ 				{
+ 					$$ = makeDefElem("collate", NULL);
+ 				}
  			| CONNECTION LIMIT opt_equal SignedIconst
  				{
  					$$ = makeDefElem("connectionlimit", (Node *)makeInteger($4));
***************
*** 9001,9006 ****
--- 9091,9097 ----
  unreserved_keyword:
  			  ABORT_P
  			| ABSOLUTE_P
+ 			| ACCENT
  			| ACCESS
  			| ACTION
  			| ADD_P
***************
*** 9027,9032 ****
--- 9118,9125 ----
  			| CLASS
  			| CLOSE
  			| CLUSTER
+ 			| COLLATE
+ 			| COLLATION
  			| COMMENT
  			| COMMIT
  			| COMMITTED
***************
*** 9106,9111 ****
--- 9199,9206 ----
  			| LANGUAGE
  			| LARGE_P
  			| LAST_P
+ 			| LCCOLLATE
+ 			| LCCTYPE
  			| LEVEL
  			| LISTEN
  			| LOAD
***************
*** 9142,9147 ****
--- 9237,9243 ----
  			| OPTION
  			| OWNED
  			| OWNER
+ 			| PAD
  			| PARSER
  			| PARTIAL
  			| PASSWORD
***************
*** 9179,9184 ****
--- 9275,9281 ----
  			| SEARCH
  			| SECOND_P
  			| SECURITY
+ 			| SENSITIVE
  			| SEQUENCE
  			| SERIALIZABLE
  			| SESSION
***************
*** 9186,9191 ****
--- 9283,9289 ----
  			| SHARE
  			| SHOW
  			| SIMPLE
+ 			| SPACE
  			| STABLE
  			| STANDALONE_P
  			| START
***************
*** 9194,9199 ****
--- 9292,9298 ----
  			| STDIN
  			| STDOUT
  			| STORAGE
+ 			| STRCOLFN
  			| STRICT_P
  			| STRIP_P
  			| SUPERUSER_P
***************
*** 9345,9351 ****
  			| CASE
  			| CAST
  			| CHECK
- 			| COLLATE
  			| COLUMN
  			| CONSTRAINT
  			| CREATE
--- 9444,9449 ----
Index: include/nodes/nodes.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/nodes/nodes.h,v
retrieving revision 1.205
diff -c -r1.205 nodes.h
*** include/nodes/nodes.h	1 Jan 2008 19:45:58 -0000	1.205
--- include/nodes/nodes.h	1 Jul 2008 14:29:26 -0000
***************
*** 313,318 ****
--- 313,320 ----
  	T_CreateEnumStmt,
  	T_AlterTSDictionaryStmt,
  	T_AlterTSConfigurationStmt,
+ 	T_CreateCollationStmt,
+ 	T_DropCollationStmt,
  
  	/*
  	 * TAGS FOR PARSE TREE NODES (parsenodes.h)
Index: include/nodes/parsenodes.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/nodes/parsenodes.h,v
retrieving revision 1.358
diff -c -r1.358 parsenodes.h
*** include/nodes/parsenodes.h	1 Jan 2008 19:45:58 -0000	1.358
--- include/nodes/parsenodes.h	1 Jul 2008 14:29:31 -0000
***************
*** 1198,1203 ****
--- 1198,1227 ----
  	bool		skip_validation;	/* skip validation of existing rows? */
  } FkConstraint;
  
+ /* ----------------------
+  *		Create/Drop Collation Statements
+  * ----------------------
+  */
+  
+  typedef struct CreateCollationStmt
+  {
+  	NodeTag		type;	
+  	char	   *name;
+  	char	   *for_charset;
+  	char	   *from_collation;
+  	List	   *options;
+  	
+  } CreateCollationStmt;
+  
+  typedef struct DropCollationStmt
+  {
+  	NodeTag		type;	
+  	char	   *name;
+  	
+  } DropCollationStmt;
+  
+ 
+ 
  
  /* ----------------------
   *		Create/Drop Table Space Statements
Index: src/include/catalog/pg_collation.h
===================================================================
RCS file: src/include/catalog/pg_collation.h
diff -N src/include/catalog/pg_collation.h
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- src/include/catalog/pg_collation.h	1 Jan 1970 00:00:00 -0000
***************
*** 0 ****
--- 1,101 ----
+ /*-------------------------------------------------------------------------
+  *
+  * pg_collation.h
+  *	  definition of the system "collation" relation (pg_collation)
+  *	  along with the relation's initial contents.
+  *
+  *
+  * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * $PostgreSQL: pgsql/src/include/catalog/pg_depend.h,v 1.1 2008/04/10 19:45:56 momjian Exp $
+  *
+  * NOTES
+  *	  the genbki.sh script reads this file and generates .bki
+  *	  information from the DATA() statements.
+  *
+  *-------------------------------------------------------------------------
+  */
+ 
+ #ifndef PG_COLLATION_H
+ #define PG_COLLATION_H
+ 
+ /* ----------------
+  *		postgres.h contains the system type definitions and the
+  *		CATALOG(), BKI_BOOTSTRAP and DATA() sugar words so this file
+  *		can be read by both genbki.sh and the C compiler.
+  * ----------------
+  */
+ 
+ /* ----------------
+  *		pg_collation definition.  cpp turns this into
+  *		typedef struct FormData_pg_depend
+  * ----------------
+  */
+  
+ #define CollationRelationId  2978
+ 
+ enum {
+ 	PAD_CHARACTERISTIC_NO_PAD = 0;
+ 	PAD_CHARACTERISTIC_PAD_SPACE = 1;
+ 	
+ }	CollationPadCharacteristic;
+ 
+ enum {
+ 	CASE_INSENSITIVE = 0;
+ 	CASE_SENSITIVE = 1;
+ 		
+ }	CollationCaseSensitive;
+ 
+ enum {
+ 	ACCENT_INSENSITIVE = 0;
+ 	ACCENT_SENSITIVE = 1;
+ 	
+ }	CollationAccentSensitive;
+ 
+ 
+ CATALOG (pg_collation, 2978)
+ {
+   NameData    	colname;        		/* collation name */
+   Oid        	colschema;        		/* collation schema */
+   NameData   	colcharset;    			/*  character set specification */
+   Oid         	colexistingcollation; 	/* existing collation */
+   bool        	colpadattribute;    	/* pad attribute */
+   bool        	colcasesensitive;    	/* case sensitive */
+   bool        	colaccentsensitive;		/* accent sensitive */
+   NameData    	colcollate;    			/* lc_collate */
+   NameData    	colctype; 				/* lc_ctype */
+   regproc       colfunc;        		/* used collation function */	
+ 	
+ } FormData_pg_collation;
+ 
+ /* ----------------
+  *		Form_pg_collation corresponds to a pointer to a tuple with
+  *		the format of pg_collation relation.
+  * ----------------
+  */
+ typedef FormData_pg_collation *Form_pg_collation;
+ 
+ /* ----------------
+  *		compiler constants for pg_collation
+  * ----------------
+  */
+ #define Natts_pg_collation						10
+ #define Anum_pg_collation_colname				1
+ #define Anum_pg_collation_colschema				2
+ #define	Anum_pg_collation_colcharset			3
+ #define	Anum_pg_collation_colexistingcollation	4
+ #define	Anum_pg_collation_colpadattribute		5
+ #define	Anum_pg_collation_colcasesensitive		6
+ #define	Anum_pg_collation_colaccentsensitive	7
+ #define	Anum_pg_collation_colcollate			8
+ #define	Anum_pg_collation_colctype				9
+ #define	Anum_pg_collation_colfunc				10
+ 
+ /* ----------------
+  *		initial contents of pg_collation
+  * ----------------
+  */
+  
+  
+ #endif   /* PG_COLLATION_H */

Alvaro Herrera

alvherre@commandprompt.com

over 17 years ago

In reply to: Radek Strnad (#1)

Re: [WIP] patch - Collation at database level

Radek Strnad escribiï¿½:

2) What type should all names in CREATE and DROP statement in gram.y have?
I've chosen qualified_name but I know it's not the best choice.

I think it should be ColId.

3) All collations are created from existing collations. How do I ensure that
the collation already exists? Is there any possibility to define it in
gram.y?

Certainly not -- shouldn't they come from a catalog? In that case, it
must come in parse analysis (parser/analyze.c I guess) or perhaps later,
when you actually execute the function to create the new collation.

5) Also can you look at the pg_catalog and tell me if anything is wrong with
it?

Why does a collation have a schema?

What's the "existing collation"?

It seems a bit silly to have enum for what are basically boolean
variables. Why not just use "true" and "false"?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Alvaro Herrera (#2)

Re: [WIP] patch - Collation at database level

Alvaro Herrera <alvherre@commandprompt.com> writes:

Why does a collation have a schema?

Because the SQL spec says so. Also, if we don't put them in schemas,
we have no nice way to distinguish built-in and user-defined collations,
which creates a problem for pg_dump.

regards, tom lane

Alvaro Herrera

alvherre@commandprompt.com

over 17 years ago

In reply to: Tom Lane (#3)

Re: [WIP] patch - Collation at database level

Tom Lane escribiï¿½:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Why does a collation have a schema?

Because the SQL spec says so. Also, if we don't put them in schemas,
we have no nice way to distinguish built-in and user-defined collations,
which creates a problem for pg_dump.

Oh, I see :-) In that case, qualified_name would seem the right symbol
to use in the parser.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Gregory Stark

stark@enterprisedb.com

over 17 years ago

In reply to: Tom Lane (#3)

Re: [WIP] patch - Collation at database level

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Why does a collation have a schema?

Because the SQL spec says so. Also, if we don't put them in schemas,
we have no nice way to distinguish built-in and user-defined collations,
which creates a problem for pg_dump.

Out of curiosity, what is a "user-defined collation"? Are there SQL statements
to go around declaring what order code points should be sorted in? That seems
like it would be... quite tedious!

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

Martijn van Oosterhout

kleptog@svana.org

over 17 years ago

In reply to: Gregory Stark (#5)

Re: [WIP] patch - Collation at database level

On Wed, Jul 02, 2008 at 07:22:10PM +0100, Gregory Stark wrote:

Because the SQL spec says so. Also, if we don't put them in schemas,
we have no nice way to distinguish built-in and user-defined collations,
which creates a problem for pg_dump.

Out of curiosity, what is a "user-defined collation"? Are there SQL statements
to go around declaring what order code points should be sorted in? That seems
like it would be... quite tedious!

Not that we'll ever use it, but ICU for example allows users to say:
"use collation X but move this code point somewhere else", essentially
allowing users tweak the collation on a small scale. In any case,
whatever collation library is used, we're unlikely to predefine every
possible collation in the system, there's too many (assuming they're
denumerable).

Have a niceday,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

Radek Strnad

radek.strnad@gmail.com

over 17 years ago

In reply to: Gregory Stark (#5)

Re: [WIP] patch - Collation at database level

My patch should be "sort of" wrapper that will implement guts for further
development (collation at column level) like catalogs, creating collations
etc. When creating collation user will be able to choose which function to
use (by statement STRCOLFN - not in SQL standard). In the first stage I'll
implement function that will use system locales. Adding ICU or any other
library won't be that big deal.

Radek Strnad

On Wed, Jul 2, 2008 at 8:22 PM, Gregory Stark <stark@enterprisedb.com>
wrote:

Show quoted text

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Why does a collation have a schema?

Because the SQL spec says so. Also, if we don't put them in schemas,
we have no nice way to distinguish built-in and user-defined collations,
which creates a problem for pg_dump.

Out of curiosity, what is a "user-defined collation"? Are there SQL
statements
to go around declaring what order code points should be sorted in? That
seems
like it would be... quite tedious!

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Gregory Stark (#5)

Re: [WIP] patch - Collation at database level

Gregory Stark <stark@enterprisedb.com> writes:

Out of curiosity, what is a "user-defined collation"? Are there SQL statements
to go around declaring what order code points should be sorted in? That seems
like it would be... quite tedious!

Hm, that's a good point. SQL99 has

<collation definition> ::=
CREATE COLLATION <collation name> FOR
<character set specification>
FROM <existing collation name>
[ <pad characteristic> ]

<pad characteristic> ::=
NO PAD
| PAD SPACE

which seems pretty stupid if you ask me --- all the mechanism required
to manage a new object type, just to enable PAD SPACE or not?
(Especially when PAD SPACE itself is an utterly broken, useless concept
... but I digress.) You might as well just provide all the standard
collations in both variants and be done with it.

The statement looks the same in last year's 200n draft, so it's not
like they were just about to add some more capability.

We might be best off to treat collations like index access methods,
ie, they're theoretically add-able but there's no infrastructure for
managing them, and what's expected is that all the ones you need are
created by initdb.

regards, tom lane

Zdenek Kotala

Zdenek.Kotala@Sun.COM

over 17 years ago

In reply to: Tom Lane (#8)

Re: [WIP] patch - Collation at database level

Tom Lane napsal(a):

Gregory Stark <stark@enterprisedb.com> writes:

Out of curiosity, what is a "user-defined collation"? Are there SQL statements
to go around declaring what order code points should be sorted in? That seems
like it would be... quite tedious!

Hm, that's a good point. SQL99 has

<collation definition> ::=
CREATE COLLATION <collation name> FOR
<character set specification>
FROM <existing collation name>
[ <pad characteristic> ]

<existing collation name> ::= <collation name>

<pad characteristic> ::=
NO PAD
| PAD SPACE

which seems pretty stupid if you ask me --- all the mechanism required
to manage a new object type, just to enable PAD SPACE or not?
(Especially when PAD SPACE itself is an utterly broken, useless concept
... but I digress.) You might as well just provide all the standard
collations in both variants and be done with it.

The statement looks the same in last year's 200n draft, so it's not
like they were just about to add some more capability.

The proposed syntax of CREATE COLLATION is:

CREATE COLLATION <collation name> FOR <character set specification>
FROM <existing collation name> [STRCOLFN <fn name>]
[ <pad characteristic> ] [ <case sensitive> ] [ <accent sensitive> ]
[ LCCOLLATE <lc_collate> ] [ LCCTYPE <lc_ctype> ]

Which extends ANSI specification.

We might be best off to treat collations like index access methods,
ie, they're theoretically add-able but there's no infrastructure for
managing them, and what's expected is that all the ones you need are
created by initdb.

I think you cannot create all collation at bootstrap. You can only create record
for actual LC_COLLATION, because I there is not standard way how to obtain
complete list of supported collations and there is also problem if you install
new locales after initdb.

When I looked to another DB (MS SQL, MySQL, DB2, Firebird) then only Firebird
supports CREATE COLLATION command. Other databases has hard coded list of
locales. Hardcoded solution means to use some lib (e.g ICU) with unified names
or has locale name mapping for all supported OS.

I personally prefer open solution when I can create own collation and specify
collation function to handle it.

Zdenek

#10

Zdenek Kotala

Zdenek.Kotala@Sun.COM

over 17 years ago

In reply to: Tom Lane (#8)

Re: [WIP] patch - Collation at database level

Tom Lane napsal(a):

Gregory Stark <stark@enterprisedb.com> writes:

Out of curiosity, what is a "user-defined collation"? Are there SQL statements
to go around declaring what order code points should be sorted in? That seems
like it would be... quite tedious!

<snip>

We might be best off to treat collations like index access methods,
ie, they're theoretically add-able but there's no infrastructure for
managing them, and what's expected is that all the ones you need are
created by initdb.

I though more about it and I discussed it with Radek yesterday. The problem is
that collation must be created before user want to use CREATE DATABASE ...
COLLATE ... command. It inclines to have have pg_collation as a global catalog,
but ANSI specifies to use schema name in collation specification and schemes are
database specific ... It means that pg_collation have to be non-shared catalog
and new database only inherits collation from template db. And CREATE DATABASE
have to check list of collation in template database :(.

My conclusion is that CREATE COLLATION does not make much sense. I see two
possible solutions:

1) have global an local catalog for collation and have modified variants of
create cmd:

CREATE COLLATION ... GLOBAL|LOCAL

CREATE DATABASE will use only collation from global catalog

Local catalog will be useful when full support of collation will be available
mostly for specifying case sensitivity of collation.

2) Use Tom's suggested approach. Create list of collations in initdb phase. But
there is problem how to obtain list of supported collation on the server. I
think, only what is possible to do is to use default locale for creating default
collation for template1 database.

Any suggestion?

thanks Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#11

Zdenek Kotala

Zdenek.Kotala@Sun.COM

over 17 years ago

In reply to: Zdenek Kotala (#10)

Re: [WIP] patch - Collation at database level

Zdenek Kotala napsal(a):

I though more about it and I discussed it with Radek yesterday. The
problem is that collation must be created before user want to use CREATE
DATABASE ... COLLATE ... command. It inclines to have have pg_collation
as a global catalog, but ANSI specifies to use schema name in collation
specification and schemes are database specific ... It means that
pg_collation have to be non-shared catalog and new database only
inherits collation from template db. And CREATE DATABASE have to check
list of collation in template database :(.

thinking more ...
It must be shared catalog because pg_database will depend on it.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#12

Martijn van Oosterhout

kleptog@svana.org

over 17 years ago

In reply to: Zdenek Kotala (#11)

Re: [WIP] patch - Collation at database level

On Tue, Jul 08, 2008 at 11:27:35AM +0200, Zdenek Kotala wrote:

Zdenek Kotala napsal(a):

I though more about it and I discussed it with Radek yesterday. The
problem is that collation must be created before user want to use CREATE
DATABASE ... COLLATE ... command. It inclines to have have pg_collation
as a global catalog, but ANSI specifies to use schema name in collation
specification and schemes are database specific ... It means that
pg_collation have to be non-shared catalog and new database only
inherits collation from template db. And CREATE DATABASE have to check
list of collation in template database :(.

thinking more ...
It must be shared catalog because pg_database will depend on it.

Not necessarily. pg_class is not shared yet without it you can't even
find pg_database. Same deal with pg_type. All it means is that
pg_collation in template1 must contain all the collations used in
template1, which shouldn't be hard to arrange.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#13

Zdenek Kotala

Zdenek.Kotala@Sun.COM

over 17 years ago

In reply to: Martijn van Oosterhout (#12)

Re: [WIP] patch - Collation at database level

Martijn van Oosterhout napsal(a):

On Tue, Jul 08, 2008 at 11:27:35AM +0200, Zdenek Kotala wrote:

Zdenek Kotala napsal(a):

I though more about it and I discussed it with Radek yesterday. The
problem is that collation must be created before user want to use CREATE
DATABASE ... COLLATE ... command. It inclines to have have pg_collation
as a global catalog, but ANSI specifies to use schema name in collation
specification and schemes are database specific ... It means that
pg_collation have to be non-shared catalog and new database only
inherits collation from template db. And CREATE DATABASE have to check
list of collation in template database :(.

thinking more ...
It must be shared catalog because pg_database will depend on it.

Not necessarily. pg_class is not shared yet without it you can't even
find pg_database. Same deal with pg_type. All it means is that
pg_collation in template1 must contain all the collations used in
template1, which shouldn't be hard to arrange.

I think, Collation situation is different, becasue pg_database will contains
column colname. pg_class (and all bootstrap catalog) only contains row which
specify that shared table exists and content is cloned to the new database from
template database. In corner case you can get context specific dependency for
example if Czech collation will have oid=10 in database test01 and Swedish
collation will have oid=10 in database test02. How to handle CREATE DATABASE and
connect database? OK it shouldn't happen in normal situation but ...

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#14

Martijn van Oosterhout

kleptog@svana.org

over 17 years ago

In reply to: Zdenek Kotala (#13)

Re: [WIP] patch - Collation at database level

On Tue, Jul 08, 2008 at 12:00:34PM +0200, Zdenek Kotala wrote:

Not necessarily. pg_class is not shared yet without it you can't even
find pg_database. Same deal with pg_type. All it means is that
pg_collation in template1 must contain all the collations used in
template1, which shouldn't be hard to arrange.

I think, Collation situation is different, becasue pg_database will
contains column colname. pg_class (and all bootstrap catalog) only contains
row which specify that shared table exists and content is cloned to the new
database from template database. In corner case you can get context
specific dependency for example if Czech collation will have oid=10 in
database test01 and Swedish collation will have oid=10 in database test02.
How to handle CREATE DATABASE and connect database? OK it shouldn't happen
in normal situation but ...

Oh I see, you're referring to the storage of the default collation for
a database. I was jumping ahead to the per-column collation state, when
the collation default is attached to columns, types and domains, and
not at the database level. So there the problem does not exist.

To be honest, I'd suggest storing the collation in pg_database as a
string, rather than as an identifier. This sidesteps the problem
entirly.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#15

Zdenek Kotala

Zdenek.Kotala@Sun.COM

over 17 years ago

In reply to: Martijn van Oosterhout (#14)

Re: [WIP] patch - Collation at database level

Martijn van Oosterhout napsal(a):

Oh I see, you're referring to the storage of the default collation for
a database. I was jumping ahead to the per-column collation state, when
the collation default is attached to columns, types and domains, and
not at the database level. So there the problem does not exist.

Yeah, but you still need one source/one collation list for database, scheme,
table and column. And of course shared tables need also collation for their indexes.

To be honest, I'd suggest storing the collation in pg_database as a
string, rather than as an identifier. This sidesteps the problem
entirly.

I don't think that string is good idea. You need to use same approach on all
levels by my opinion.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#16

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Zdenek Kotala (#13)

Re: [WIP] patch - Collation at database level

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

Martijn van Oosterhout napsal(a):

Not necessarily. pg_class is not shared yet without it you can't even
find pg_database. Same deal with pg_type. All it means is that
pg_collation in template1 must contain all the collations used in
template1, which shouldn't be hard to arrange.

I think, Collation situation is different,

All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec. I wonder why we are allowing a
nonstandard half-measure to drive our thinking, rather than solving the
real problem which is column-level collations.

regards, tom lane

#17

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Zdenek Kotala (#15)

Re: [WIP] patch - Collation at database level

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

... And of course shared tables need also collation for their indexes.

No, they don't, because the only textual indexes on shared catalogs are
on "name" columns, which are intentionally not locale aware, and
wouldn't be collation aware either.

regards, tom lane

#18

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Tom Lane (#16)

Re: [WIP] patch - Collation at database level

Tom Lane wrote:

All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec. I wonder why we are allowing a
nonstandard half-measure to drive our thinking, rather than solving the
real problem which is column-level collations.

Agreed. Are we even sure that we want per-database collations as a
half-way house? Unless we can be sure that we want all the required
catalog changes for the full requirement, it seems to me a rather messy
way of getting to where we want to go.

cheers

andrew

#19

Martijn van Oosterhout

kleptog@svana.org

over 17 years ago

In reply to: Andrew Dunstan (#18)

Re: [WIP] patch - Collation at database level

On Tue, Jul 08, 2008 at 10:54:28AM -0400, Andrew Dunstan wrote:

Agreed. Are we even sure that we want per-database collations as a
half-way house? Unless we can be sure that we want all the required
catalog changes for the full requirement, it seems to me a rather messy
way of getting to where we want to go.

Given that the current projected timeframe for full COLLATE support is
something like 2-3 major releases, I don't really see the problem with
doing this now. Who knows, it might prompt people to do something
sooner.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#20

Josh Berkus

josh@agliodbs.com

over 17 years ago

In reply to: Andrew Dunstan (#18)

Re: [WIP] patch - Collation at database level

Andrew, Tom,

Agreed. Are we even sure that we want per-database collations as a
half-way house? Unless we can be sure that we want all the required
catalog changes for the full requirement, it seems to me a rather messy
way of getting to where we want to go.

Given that we don't have a delivery date for table or column level collations,
we don't want to turn down database-level collations. If nothing else,
Radek's work will expose what areas of our code are collation-dependant and
hopefully make the work of more granular collations easier. And if it takes
us 3 years to get more granular collations, at least people can use
database-level ones in the meantime so that they don't need to have separate
PostgreSQL binaries for every language they want to support fully.

Also ... this is a Summer of Code Project, which we accepted, which at least
in Google and the student's eyes means we're not going to discard the entire
premise of the patch. I'm not exaggerating when I say doing something like
that could get PostgreSQL permanently banned from Google SoC.

Tom, I think you need to be on the SoC committee in the future, just to raise
objections. Some 15+ PostgreSQL contributors on the SoC committee approved
Radek's project.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

#21

Michael Paesold

mpaesold@gmx.at

over 17 years ago

In reply to: Tom Lane (#16)

Re: [WIP] patch - Collation at database level

Tom Lane wrote:

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

Martijn van Oosterhout napsal(a):

Not necessarily. pg_class is not shared yet without it you can't
even
find pg_database. Same deal with pg_type. All it means is that
pg_collation in template1 must contain all the collations used in
template1, which shouldn't be hard to arrange.

I think, Collation situation is different,

All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec. I wonder why we are allowing a
nonstandard half-measure to drive our thinking, rather than solving
the
real problem which is column-level collations.

Wouldn't you still need per-database and per-table default collations?
At least MySQL does have such a concept.

Best Regards
Michael Paesold

#22

Zdenek Kotala

Zdenek.Kotala@Sun.COM

over 17 years ago

In reply to: Tom Lane (#17)

Re: [WIP] patch - Collation at database level

Tom Lane napsal(a):

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

... And of course shared tables need also collation for their indexes.

No, they don't, because the only textual indexes on shared catalogs are
on "name" columns, which are intentionally not locale aware, and
wouldn't be collation aware either.

Yeah, name uses strcmp, which is not locale aware but from ANSI perspective
there is collation SQL_IDENTIFIER for it which is fortunately implementation
defined.

What I see now as the problem is that we need also to know correct collation for
ORDER - for example:

select * from pg_shdescription order by description;

...thinking...

but it should solve by collation per column which will work well with
pg_attribute cloning for new database as Martijn mentioned.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#23

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Josh Berkus (#20)

Re: [WIP] patch - Collation at database level

Josh Berkus <josh@agliodbs.com> writes:

Andrew, Tom,

Agreed. Are we even sure that we want per-database collations as a
half-way house? Unless we can be sure that we want all the required
catalog changes for the full requirement, it seems to me a rather messy
way of getting to where we want to go.

Given that we don't have a delivery date for table or column level collations,
we don't want to turn down database-level collations.

I am one hundred percent prepared to turn them down, if they end up
contorting the design in a way that we will have to undo (with
consequent backwards-compatibility problems) to get to the full feature.

If it's a partial implementation of the full feature, that's fine, but
I'm not getting good vibes about that from the discussions so far.

regards, tom lane

#24

Zdenek Kotala

Zdenek.Kotala@Sun.COM

over 17 years ago

In reply to: Tom Lane (#16)

Re: [WIP] patch - Collation at database level

Tom Lane napsal(a):

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

Martijn van Oosterhout napsal(a):

Not necessarily. pg_class is not shared yet without it you can't even
find pg_database. Same deal with pg_type. All it means is that
pg_collation in template1 must contain all the collations used in
template1, which shouldn't be hard to arrange.

I think, Collation situation is different,

All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec.

Yeah, it is not required, but by my opinion it should be derived from CREATE
SCHEMA statement. There is following item:

--- SQL ANSI 2003 page 520 ---

5) If <schema character set specification> is not specified, then a <schema
character set specification> that specifies an implementation-defined character
set that contains at least every character that is in <SQL language character>
is implicit.

----

It is not for collation directly, but if I understand it correctly when you want
to create schema then default charset is inherit from parent instance which is
database (catalog).

Following sentence specified that pg_collation should be database specific.

---- SQL ANSI 2003 page 15 ---
Character sets defined by standards or by SQL-implementations reside in the
Information Schema (named INFORMATION_SCHEMA) in each catalog, as do collations
defined by standards and collations, transliterations, and transcodings defined
by SQL-implementations.
----

I wonder why we are allowing a
nonstandard half-measure to drive our thinking, rather than solving the
real problem which is column-level collations.

I try to determine how to implement collation itself - collation catalog
structure and content and how to create new collation. Column-level collation is
nice but until we will not have basic infrastructure we cannot start implemented it.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#25

Zdenek Kotala

Zdenek.Kotala@Sun.COM

over 17 years ago

In reply to: Andrew Dunstan (#18)

Re: [WIP] patch - Collation at database level

Andrew Dunstan napsal(a):

Tom Lane wrote:

All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec. I wonder why we are allowing a
nonstandard half-measure to drive our thinking, rather than solving the
real problem which is column-level collations.

Agreed. Are we even sure that we want per-database collations as a
half-way house? Unless we can be sure that we want all the required
catalog changes for the full requirement, it seems to me a rather messy
way of getting to where we want to go.

Andrew,
I would like also to see full collation implementation rather then collation per
database. But from my point of view split collation into small parts is much
better. Radek's work is mostly about creating infrastructure for full collation
support. When it will be finished, then "only" changes in parser, executor...
will be necessary to complete a job.

If you look on most discussion about collation they fell into ICU yes/no problem
without any real decision how to implemented the feature.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#26

Martijn van Oosterhout

kleptog@svana.org

over 17 years ago

In reply to: Zdenek Kotala (#24)

Re: [WIP] patch - Collation at database level

On Tue, Jul 08, 2008 at 09:05:11PM +0200, Zdenek Kotala wrote:

All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec.

Yeah, it is not required, but by my opinion it should be derived from
CREATE SCHEMA statement. There is following item:
--- SQL ANSI 2003 page 520 ---
5) If <schema character set specification> is not specified, then a <schema

Careful, this is a 'character set specification" which has (almost)
nothing to do with collation. It's closer to the encoding field, which
is already in pg_database.

The issue with having a "default database collation" is that it's
unclear where it would be used. In the end the collation is defined by
the types and domains. Columns inherit from the types. I think the only
senseible definition is to decide that all the text/varchar/char types
inherit from the database. It's not in the spec but I think it does
make easier to decide what the default collation is. As an alternative
to:

ALTER TYPE text SET DEFAULT COLLATION TO foo;
<repeat for all other text types>

I try to determine how to implement collation itself - collation catalog
structure and content and how to create new collation. Column-level
collation is nice but until we will not have basic infrastructure we cannot
start implemented it.

I agree that this patch includes much basic work that needs to be done
for full collation support.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#27

Zdenek Kotala

Zdenek.Kotala@Sun.COM

over 17 years ago

In reply to: Martijn van Oosterhout (#26)

Re: [WIP] patch - Collation at database level

Martijn van Oosterhout napsal(a):

On Tue, Jul 08, 2008 at 09:05:11PM +0200, Zdenek Kotala wrote:
All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec.

Yeah, it is not required, but by my opinion it should be derived from
CREATE SCHEMA statement. There is following item:
--- SQL ANSI 2003 page 520 ---
5) If <schema character set specification> is not specified, then a <schema
Careful, this is a 'character set specification" which has (almost)
nothing to do with collation. It's closer to the encoding field, which
is already in pg_database.

Yeah, I'm confuse why for character set is defined behavior when is not
specified but there is not definition how to handle it for collation.

Maybe because charset has default collation.

The issue with having a "default database collation" is that it's
unclear where it would be used. In the end the collation is defined by
the types and domains. Columns inherit from the types. I think the only
senseible definition is to decide that all the text/varchar/char types
inherit from the database. It's not in the spec but I think it does
make easier to decide what the default collation is. As an alternative
to:

ALTER TYPE text SET DEFAULT COLLATION TO foo;
<repeat for all other text types>

It is good point I thought about following situation

select 'xxx' < 'yyy';

But how you mentioned because both are text and text type has default collation
there is no reason to have collation per database.

Hmm, It seems to me that expectation should be create all string data types with
default collation during initdb phase. It means modify BKI types definition
according to default locale setting.

Zdenek

#28

Peter Eisentraut

peter_e@gmx.net

over 17 years ago

In reply to: Radek Strnad (#1)

Re: [WIP] patch - Collation at database level

Am Tuesday, 1. July 2008 schrieb Radek Strnad:

I'm sending part of the code that I've done and is available for reviewing
and I'm asking for your comments and some help because I'm new to
PostgreSQL.

Proposal: http://archives.postgresql.org/pgsql-hackers/2008-05/msg00857.php

The code contains changes of parser, pg_collation catalog, parsenodes.h for
CREATE COLLATION... and DROP COLLATION statements.

This patch is a small start on the way to adding a catalog and some SQL
statements that add and remove entries from it. But I don't see any
collation support here, which is about 99% of the work that is left to do.
Where are the collations going to come from? Have the various build and
distributions issues been thought about? How are they going to be configured
(not the SQL syntax, but how will the configuration be applied)? How are the
collations going to be applied at run-time? How are you going to handle
locale and encoding conflicts? I also think that the clauses you have
attached to your CREATE COLLATION statement (case-insensitive,
accent-insensitive) are an oversimplification of reality. I suggest you look
up the Unicode collation algorithm to learn about who collations work in
practice.

In my opinion, you are starting this project from the wrong end. I would
suggest you approach it like this:

- Find some collation implementations
- Patch PostgreSQL to link with them
- Patch PostgreSQL to apply them for comparison
- Implement system catalog to configure them
- Implement SQL statements to manipulate the system catalog

#29

Gregory Stark

stark@enterprisedb.com

over 17 years ago

In reply to: Peter Eisentraut (#28)

Re: [WIP] patch - Collation at database level

"Peter Eisentraut" <peter_e@gmx.net> writes:

In my opinion, you are starting this project from the wrong end. I would
suggest you approach it like this:

- Find some collation implementations
- Patch PostgreSQL to link with them

Well I think the feeling is that we may as well start with the lowest common
denominator of libc's collation implementation. It's the only one everyone's
going to have. Later adding compile-time options to use a different library
and different function calls might be useful but a lot of people are unhappy
about the idea of *requiring* a major outside library for this.

- Patch PostgreSQL to apply them for comparison

Er, yes. Well we do that already but the tricky bit is keeping track of
multiple collations and applying the right one for each comparison.

So we do need the concept of multiple collations and the syntax to select a
collation for each ordering operation.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

#30

Radek Strnad

radek.strnad@gmail.com

over 17 years ago

In reply to: Peter Eisentraut (#28)

Re: [WIP] patch - Collation at database level

Hello,

the main reason why I've submitted the patch was to start a discussion and
know other people's opinion on this problem.

On Tue, Jul 29, 2008 at 10:41 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

Where are the collations going to come from?

There will be two new catalogs - pg_collate and pg_charset. Each of them
will be filled with ANSI standard collations and charsets (ISO8BIT, LATIN1,
UTF-8..) and alternatively with default collation set when creating. For
instance if you create database cluster with initdb and specify en_US.utf8
there will be standard rows (ISO8BIT, LATIN1, UTF-8..) + one row with
en_US.utf8 in template0. Then you can connect to template0 and create other
collations if your POSIX locales support them and use them one per each
database.

Have the various build and distributions issues been thought about?

Yes. Since POSIX locales doesn't guarantee any collation there will be
hard-coded collations implemented regarding ANSI collation standard. Others
can be set by command CREATE COLLATION.

How are they going to be configured (not the SQL syntax, but how will the

configuration be applied)?

pg_type, pg_attribute, pg_namespace of each database will be extended with
collation oid column that will be specifying collation.

How are the collations going to be applied at run-time?

Collation will be set when connecting to the database with
setlocale(LC_COLLATION, XXX) and setlocale(LC_CTYPE, XXX)

How are you going to handle locale and encoding conflicts?

Since I'm currently implementing collation support per database I don't
think this is an issue. (It will be in the future I know.)

I also think that the clauses you have attached to your CREATE COLLATION
statement (case-insensitive,
accent-insensitive) are an oversimplification of reality. I suggest you
look
up the Unicode collation algorithm to learn about who collations work in
practice.

I already did in the very beginning of the development. The reason why I'm
not implementing the whole Unicode collation algorithm is that this patch
shold be sort of framework. You'll be able to use different collation
functions not only POSIX locales so further development towards full Unicode
collation algorithm is possible.

At the end of the next week I'll publish my bachelor thesis concerning this
topic where everything will be explained in details so stay tuned.

Regards

Radek Strnad

#31

Martijn van Oosterhout

kleptog@svana.org

over 17 years ago

In reply to: Radek Strnad (#30)

Re: [WIP] patch - Collation at database level

On Sat, Aug 02, 2008 at 03:39:18PM +0200, Radek Strnad wrote:

I also think that the clauses you have attached to your CREATE
COLLATION statement (case-insensitive, accent-insensitive) are an
oversimplification of reality. I suggest you look up the Unicode
collation algorithm to learn about who collations work in practice.

I already did in the very beginning of the development. The reason why I'm
not implementing the whole Unicode collation algorithm is that this patch
shold be sort of framework. You'll be able to use different collation
functions not only POSIX locales so further development towards full Unicode
collation algorithm is possible.

Agreed. Ofcourse it's a simplification of reality. POSIX locales are a
simplification of reality, but its the only form of collation currently
available to us. And quite frankly, I don't beleive postgresql should
be in the business of writing collation algorithms, we don't have the
expertese.

FWIW, I think case-insensitive and accent-insensitive are useful modifiers
that we should aim to support in the future.

At the end of the next week I'll publish my bachelor thesis concerning this
topic where everything will be explained in details so stay tuned.

Good luck!

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.